A novel phantom technique for evaluating the performance of PET auto-segmentation methods in delineating heterogeneous and irregular lesions

Berthon, B; Marshall, C; Holmes, R; Spezi, E

doi:10.1186/s40658-015-0116-1

Open access
Published: 27 June 2015

EJNMMI Physics volume 2, Article number: 13 (2015)

Abstract

Background

Positron Emission Tomography (PET)-based automatic segmentation (PET-AS) methods can improve tumour delineation for radiotherapy treatment planning, particularly for Head and Neck (H&N) cancer. Thorough validation of PET-AS on relevant data is currently needed. Printed subresolution sandwich (SS) phantoms allow modelling heterogeneous and irregular tracer uptake, while providing reference uptake data. This work aimed to demonstrate the usefulness of the printed SS phantom technique in recreating complex realistic H&N radiotracer uptake for evaluating several PET-AS methods.

Methods

Ten SS phantoms were built from printouts representing 2-mm-spaced slices of modelled H&N uptake, printed using black ink mixed with ¹⁸F-fluorodeoxyglucose, and stacked between 2-mm-thick plastic sheets. Spherical lesions were modelled for two contrasted uptake levels, and irregular and spheroidal tumours were modelled for homogeneous and heterogeneous uptake, including necrotic patterns. The PET scans acquired were segmented with ten custom PET-AS methods: adaptive iterative thresholding (AT), region growing, clustering applied to 2 to 8 clusters, and watershed transform-based segmentation. The difference between the resulting contours and the ground truth from the image template was evaluated using the Dice Similarity Coefficient (DSC), Sensitivity and Positive Predictive Value.

Results

Realistic H&N images were obtained within 90 min of preparation. The sensitivity of binary PET-AS and clustering using small numbers of clusters dropped for highly heterogeneous spheres. The accuracy of PET-AS methods dropped between 4% and 68% for irregular lesions compared to spheres of the same volume. For each geometry and uptake modelled with the SS phantoms, we report the number of clusters resulting in optimal segmentation. Radioisotope distributions representing necrotic uptakes proved most challenging for most methods. Two PET-AS methods did not include the necrotic region in the segmented volume.

Conclusions

Printed SS phantoms allowed identifying advantages and drawbacks of the different methods, determining the most robust PET-AS for the segmentation of heterogeneities and complex geometries, and quantifying differences across methods in the delineation of necrotic lesions. The printed SS phantom technique provides key advantages in the development and evaluation of PET segmentation methods and has a future in the field of radioisotope imaging.

Background

Positron emission tomography (PET) imaging using ¹⁸F-fluorodeoxyglucose (¹⁸F-FDG) allows the observation of metabolic pathways in the human body and is therefore increasingly used for gross tumour volume (GTV) delineation for a number of cancers, including head and neck (H&N). The use of PET-based automatic segmentation (PET-AS) methods could be useful in radiotherapy treatment planning and in the prediction of response to therapy, for which accurate segmentation of the tumours is crucial. Some studies have shown that PET-AS methods which perform well with homogeneous lesions show poor accuracy in the case of more realistic inhomogeneous and irregular clinical lesions, using clinical or simulated data [1, 2], in particular when using fixed thresholding methods, which are highly dependent on the image type [3]. The use of advanced PET-AS beyond thresholding was recommended to reduce dosimetry errors, especially in the case of heterogeneous tumours [4]. Although an increasingly large number of studies have investigated and compared the performance of existing PET segmentation methods, the target objects used are most frequently obtained with plastic fillable phantoms, including inserts of spherical geometry [5, 6]. Plastic phantoms combine the advantage of a known ground truth and a physical object, which can be scanned using patient protocols. However, these phantoms are limited to modelling simplified and clinically unrealistic uptake patterns. Furthermore, due to their fixed regular geometry, they do not allow modelling intra-tumour heterogeneity, which is a key element of clinical lesions. In addition, we have shown in a previous work that the presence of thick plastic walls encompassing the target object has an important effect on the evaluation of PET-AS methods [7]. 
Therefore, such phantoms are not adequate for studies requiring accurate modelling of patient metabolic uptake [8, 9], particularly in the H&N where the intricate anatomy and heterogeneity occurring in both background and tumour make the task of delineating the GTV very challenging. A small number of phantom studies have used deformed objects or molecular sieves to model non-spherical lesions [10–13] or have included absorbent material into their inserts to model inhomogeneities [14]. However, these techniques did not allow modelling combined heterogeneity and geometrical complexity in a controlled and reproducible manner and most still included the presence of glass or plastic walls. To our knowledge, heterogeneity and complex geometry have not yet been modelled in combination in realistic phantoms.

The use of printed radioactive uptake patterns has been investigated in the literature as a promising technique for generating radioactive sources for PET [15–17]. This allows modelling any desired tracer distribution while providing reference data or ground truth useful for a number of quality assurance purposes. A quantitative calibration study of the printing method was described in detail by Markiewicz et al. [17] for generating single-slice patterns with applications to brain imaging studies. However, the stacking of several printed patterns to produce a 3D object for quantitative applications was not investigated. Recent work by Holmes et al. used a 3D-printed phantom, named subresolution sandwich (SS) phantom, for the generation of realistic SPECT brain images [18]. However, to our knowledge, the use of stacked ¹⁸F-FDG-printed uptake patterns to generate a 3D PET phantom has not yet been investigated nor used for the evaluation of PET segmentation techniques.

This work aimed at demonstrating the advantages of using irregular and heterogeneous target objects to evaluate and compare the performance of PET-AS methods. For this purpose, we calibrated and used a novel 3D-printed SS phantom technique to acquire realistic image data. We used the PET images obtained by scanning the 3D-printed SS phantoms to evaluate and compare a set of ten PET-AS methods representing different medical image segmentation approaches. We have investigated the benefits of using the printed SS phantom compared to a standard plastic fillable phantom for testing PET-AS methods intended for radiotherapy treatment planning.

Methods

Experimental method and reproducibility

Preparation of the SS phantom

The printed SS phantom structure consists of 120 oval poly(methyl methacrylate) (PMMA) sheets of 2-mm thickness, corresponding to axial slices, which can be assembled using three plastic rods attached to a cylindrical PMMA support. The radioactive part of the phantom, when containing radioactive printouts, can reach a maximum length of 240 mm. The paper and PMMA are held together by a thick plastic sheet, which is screwed on top of the phantom once assembled, allowing it to be scanned as a 3D physical object. A picture of the assembled 3D phantom is shown in Fig. 1a, along with the position of the phantom in the scanner in Fig. 1b.

Plain A4 80-mg paper was used, cut to 168 mm × 197 mm to fit into the phantom and hole-punched so that it could be assembled on the rods. Uptake printouts were generated as grey-level 3D images in Matlab (The MathWorks Inc., Natick, USA), resampled to 2-mm slices and printed on an HP Deskjet 990cxi, using drop-on-demand thermal inkjet printing. The advantage of this type of equipment is its use of refillable ink cartridges, making it possible to add the desired quantity of radiotracer to the same cartridge before each set of experiments. The printing settings “normal” and “black & white” were chosen in order to minimise the printing time (and therefore the radiotracer decay and user exposure to gamma emissions) while ensuring good printing quality. The corresponding printing speed is 6.5 pages per minute. The printing resolution used throughout this work was 600 × 600 dpi.

The cartridge was filled with the desired ¹⁸F-FDG volume and topped with black ink. Various ¹⁸F-FDG activity concentrations were used for the different experiments. The images were printed in a hot cell (Gravatom Engineering Systems Ltd, Southampton, UK), after leaving the cartridge with its dispensing head down for 20 min to homogenize its contents, as recommended by the manufacturer. All operations including filling the ink cartridge and assembling the phantom were done behind a lead glass shield (Bright Technologies Ltd, Sheffield, UK). Any inaccuracy in the positioning of the pattern on the paper was corrected for by aligning markers printed as part of the pattern to reference markers drawn on the PMMA sheet. The cross-shaped markers were printed with the same radioactive ink as the printout and were visible on the PET image obtained. The phantom was scanned immediately after assembling on a GE 690 Discovery PET/CT scanner for two bed positions with the protocol used for clinical whole body diagnostic scans, given in Table 1. Both low-dose CT (used for attenuation correction) and high-resolution CT were acquired. Operator exposure to the radioactive tracer was controlled using standard safety equipment (e.g. lead glass shields, shielded syringe carriers, hot cell) and monitored with electronic portable dosimeters (RAD-60S, RADOS Technology, Oy, Finland). We assessed the homogeneity and reproducibility of the printing to ensure reliable printing of the desired uptake distributions.

Table 1 Parameters used for the acquisition and reconstruction of PET scans

The printing, assembling and scanning of the SS phantom took approximately 90 min for each experiment. This included (a) filling the cartridge (10 min), (b) leaving the contents of the cartridge to homogenize (20 min), (c) printing (30 min), (d) assembling (20 min) and (e) scanning (10 min). The whole body radiation dose to the operator for one session with a single scan was 4 μSv.

Printing quality

To assess the printing homogeneity, we printed two 30 mm × 200 mm stripes with a mixture of black ink and radiotracer along both width and length of an A4 paper. The number of counts was measured along these stripes, using thin layer chromatography (TLC) (iScan, Canberra, Uppsala, Sweden) at a speed of 1 mm/s.

The printing reproducibility was assessed using a 100 mm × 100 mm homogeneous square. This was printed with the same grey level and radioactive ink mixture 66 consecutive times. The phantom obtained by stacking these printouts was then scanned, and the resulting PET image was analysed. A region of interest (ROI) positioned at the centre of each square was reproduced on 60 consecutive slices of the PET image (the superior and inferior edges of the phantom were excluded), and the mean intensity of each ROI was measured.
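
As an illustration of how the per-slice ROI means could be summarised, the sketch below computes their coefficient of variation. The text does not state which statistic was used, so the choice of the coefficient of variation, the function name and the sample values are all assumptions made for this example.

```python
import statistics


def coefficient_of_variation(roi_means):
    """Return sample standard deviation divided by mean for the ROI means.

    Assumption: the coefficient of variation is used here only as one
    plausible summary of slice-to-slice reproducibility.
    """
    mean = statistics.mean(roi_means)
    return statistics.stdev(roi_means) / mean


# Hypothetical mean ROI intensities for three consecutive slices
example_means = [9.0, 10.0, 11.0]
example_cv = coefficient_of_variation(example_means)
```

A value close to zero would indicate that consecutive printouts carry nearly the same activity.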

Printer calibration

Additional experiments were carried out to determine the relationship between the grey levels specified to the printer and those obtained on the PET image, and to derive an adequate calibration ensuring that the desired tissue uptake ratios were obtained. Ten grey levels ranging from 10 to 100 % of the maximum printed intensity were defined and, for each grey level, a 140 mm × 160 mm homogeneous rectangle was printed five times with the same mixture of black ink and ¹⁸F-FDG. The paper was weighed before and after printing to measure the amount of ink added by the printer. The weight of ink printed for each grey level, averaged over the five instances, was then plotted against the grey-level values specified. Furthermore, 20 distinct homogeneous 30 mm × 30 mm squares with grey-level values evenly spaced between 5 and 100 % were printed with the radioactive ink mixture. The number of counts detected across the different squares was then measured using the iScan TLC. Correction for radioactive decay was applied to compare all readings at the same time point. This process was repeated with three different activity concentrations in the ink at the time of measurement, corresponding to different volumes of black ink added to 2 mL of the same radiotracer solution. The relationship between counts and the amount of ink printed on the paper was then derived.
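
The decay correction mentioned above can be sketched as follows. This is a minimal illustration, assuming simple exponential decay with the physical half-life of ¹⁸F (109.77 min); the function name and the sample readings are hypothetical and are not taken from the study.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def decay_correct(counts, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Return the counts that would have been read at the reference time.

    elapsed_min is the time between the reference time point and the
    actual reading (positive if the reading was taken later).
    """
    decay_constant = math.log(2) / half_life_min
    return counts * math.exp(decay_constant * elapsed_min)


# Hypothetical readings: (raw counts, minutes after the reference time)
readings = [(1000.0, 0.0), (880.0, 20.0), (500.0, 109.77)]
corrected = [decay_correct(c, t) for c, t in readings]
```

After this correction, readings taken at different times can be compared directly, since each is expressed as if it had been measured at the reference time point.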

In all experiments, the accuracy of the paper positioning in the phantom was assessed using radioactive cross-shaped markers printed at the top (T), left (L) and right (R) of the printout. The position of each marker on the acquired PET image was determined for each slice as the highest-intensity voxel in a 5 × 5 voxel square drawn around the imaged marker. For each of the T, L and R markers, the deviation of each position from the average marker position was measured.
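
The marker analysis can be illustrated with the short sketch below. It assumes each PET slice is available as a list of rows of intensities and that an approximate marker centre is known; the function names and the toy image are hypothetical.

```python
def locate_marker(slice_2d, centre_row, centre_col, half_width=2):
    """Return (row, col) of the brightest voxel in a square window.

    half_width=2 gives the 5 x 5 voxel window described in the text.
    slice_2d is a list of rows (lists of intensities); the window is
    clipped at the image border.
    """
    rows = range(max(0, centre_row - half_width),
                 min(len(slice_2d), centre_row + half_width + 1))
    cols = range(max(0, centre_col - half_width),
                 min(len(slice_2d[0]), centre_col + half_width + 1))
    return max(((r, c) for r in rows for c in cols),
               key=lambda rc: slice_2d[rc[0]][rc[1]])


def deviations_from_mean(positions):
    """Return per-slice (d_row, d_col) offsets from the mean position."""
    n = len(positions)
    mean_row = sum(r for r, _ in positions) / n
    mean_col = sum(c for _, c in positions) / n
    return [(r - mean_row, c - mean_col) for r, c in positions]


# Toy 7 x 7 slice with a single bright voxel standing in for a marker
toy_slice = [[0.0] * 7 for _ in range(7)]
toy_slice[3][4] = 9.0
marker_position = locate_marker(toy_slice, 3, 3)
```

Applying `locate_marker` to every slice and passing the resulting positions to `deviations_from_mean` gives offsets in voxels, which would be multiplied by the voxel size to obtain millimetres.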

Generation of realistic 3D uptake maps

A first uptake map was generated to model six spherical tumours of diameters 10, 13, 17, 22, 28 and 38 mm, named S1, S2, S3, S4, S5 and S6, respectively. Each sphere had two levels of intensity, such that the difference between the highest (central) uptake and the lowest tumour uptake was equal to the difference between the lowest tumour uptake and the background. This uptake pattern is shown in Fig. 2b. The methods described in the next section were applied to the six images obtained.
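
The two-level pattern can be sketched as a radial grey-level function. This is only an illustration: the size of the high-uptake core (`core_fraction`) and the background and tumour values are hypothetical, as the text does not specify them.

```python
def two_level_uptake(distance_mm, radius_mm, background, tumour,
                     core_fraction=0.5):
    """Grey level at a given distance from the sphere centre.

    core_fraction (radius of the high-uptake core relative to the sphere
    radius) is a hypothetical parameter; the text does not give it.
    """
    centre = tumour + (tumour - background)  # equal steps between levels
    if distance_mm <= core_fraction * radius_mm:
        return centre
    if distance_mm <= radius_mm:
        return tumour
    return background


# Central row of a 2-mm-spaced template through a 22-mm sphere (S4),
# with hypothetical background and tumour grey levels of 1.0 and 4.0
profile = [two_level_uptake(abs(x), 11.0, background=1.0, tumour=4.0)
           for x in range(-14, 15, 2)]
```

The central level is derived from the other two, so the step from background to tumour always matches the step from tumour to core.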

We further aimed at using the printed SS phantom to generate realistic irregular and heterogeneous target lesions. For this purpose, a clinical tumour outline was extracted from an available H&N PET/CT scan using manual delineation. The background uptake was modelled by segmenting normal anatomical structures on the CT scan and assigning to each structure a grey-level value corresponding to its mean ¹⁸F-FDG uptake, measured on the PET image. Ellipsoidal outlines were also used for different experiments at the same locations as the irregular tumour outlines on the background printout template. These target lesions were modelled with a volume of 11 mL, which is large enough to allow better investigation of highly heterogeneous uptake patterns, such as necrotic centres encountered in large lymph nodes. The different images printed corresponded to the background image, in which one of the volumes (irregular tumour or ellipsoid) was inserted with a grey-level value representing the desired ¹⁸F-FDG uptake. The resulting templates were resampled to 2-mm slices in the superior-inferior direction of the H&N scan, in order to match the thickness of the PMMA sheets. This process allowed the retrieval of the modelled tumour contour from the final printout template, providing a ground truth for the evaluation of segmentation results on the PET image. Various tumour uptake distributions of the irregular and ellipsoidal lesions were modelled for a tumour-to-background ratio (TBR) of 4. These are shown for the irregular lesion on Fig. 2. The different uptake patterns included:

a) Homogeneous uptake
b) Two-level uptake as described above for the spherical lesions (only used for the irregular lesion)
c) Heterogeneous Gaussian smoothed uptake: addition to the background uptake map of a homogeneous uptake smoothed with a Gaussian filter to model higher uptake at the centre
d) Necrotic: homogeneous high uptake with no uptake at the centre of the tumour
e) Necrotic Gaussian: necrotic uptake smoothed with a Gaussian filter

The phantoms obtained for each case were scanned with an activity concentration in the cartridge of about 6000 kBq/mL, as this provided a PET image with activities corresponding to the original PET scan.

Evaluation of PET-AS methods

In order to evaluate the performance of state-of-the-art PET-AS methods on heterogeneous target objects of complex geometry, we selected four advanced PET-AS approaches (Table 2) from the recent literature to represent some of the categories described by Bankman et al. [19]. One or more custom implementations of each approach were written and optimised in house within a common framework using the Matlab package, with the Image Processing Toolbox available for testing. All approaches were implemented as fully automatic 3D algorithms except for WT, since previous work had shown better performance when implemented in 2D [20, 21]. The resulting segmentation methods have been described in more detail in previous work [22]. The clustering approach was implemented for a total number of clusters ranging between 2 and 8, leading to PET-AS methods named GCM2, GCM3, GCM4, GCM5, GCM6, GCM7 and GCM8 in this work. In a final step, each of these clustering algorithms identifies the lowest intensity cluster as the background and the remaining clusters as the tumour, and provides a single contour for the tumour. This approach is used because the aim of the segmentation in this study is to identify the whole lesion outline and because no heterogeneities are modelled in the close neighbourhood of the lesions.
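
The final merging step of the clustering methods can be sketched as below. This is not the in-house Matlab implementation: it assumes cluster labels have already been assigned to each voxel and interprets "lowest intensity cluster" as the cluster with the lowest mean intensity. The function name and sample values are hypothetical.

```python
def merge_clusters(intensities, labels):
    """Collapse a multi-cluster labelling into a binary tumour mask.

    intensities and labels are flat, equal-length lists (one entry per
    voxel). Assumption: the cluster with the lowest mean intensity is
    treated as background; every other cluster is counted as tumour.
    """
    sums, counts = {}, {}
    for value, label in zip(intensities, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    background = min(sums, key=lambda lab: sums[lab] / counts[lab])
    return [label != background for label in labels]


# Hypothetical three-cluster result for five voxels
voxel_intensities = [1.0, 1.2, 4.0, 4.1, 8.0]
voxel_labels = [0, 0, 1, 1, 2]
tumour_mask = merge_clusters(voxel_intensities, voxel_labels)
```

With this rule, increasing the number of clusters changes where the boundary between background and tumour falls, while the output remains a single binary volume.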

Table 2 Description and name of PET-AS methods used in this study. The references correspond to recent publications using similar PET-AS algorithms

The resulting ten PET-AS methods were applied, for all target lesions, to the region of the original scan obtained by extending the bounding box of the true contour by a 10-mm margin. The segmentation accuracy of each PET-AS method was assessed by comparing the contour obtained to the true contour (extracted from the printout template) using the Dice similarity coefficient (DSC) [23], which quantifies the similarity between the reference and evaluated volumes and returns a score between 0 and 1. A DSC above 0.7 was taken as an indicator of good overlap. The DSC is defined as:

$$ \mathrm{DSC} = \frac{2\,\left|A \cap B\right|}{\left|A\right| + \left|B\right|} $$

(1)

where A is the set of voxels in the reference volume and B is the set of voxels in the evaluated volume.

In addition, the sensitivity (S) and positive predictive value (PPV) were calculated with the following equations:

$$ S = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{\left|A \cap B\right|}{\left|A\right|} $$

(2)

$$ \mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{\left|A \cap B\right|}{\left|B\right|} $$

(3)

with TP the true positives (voxels included in both the true contour A and the contour B), FN the false negatives (voxels in true contour A not included in B) and FP the false positives (voxels in contour B not included in true contour A).
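
The three metrics in Eqs. 1 to 3 can be computed directly from two sets of voxel indices, as in the sketch below. It assumes each volume is represented as a set of (slice, row, column) tuples; the function name and the toy volumes are hypothetical.

```python
def overlap_metrics(reference, evaluated):
    """Return (DSC, sensitivity, PPV) for two sets of voxel indices."""
    a, b = set(reference), set(evaluated)
    if not a or not b:
        raise ValueError("both volumes must contain at least one voxel")
    tp = len(a & b)                   # voxels in both A and B
    dsc = 2 * tp / (len(a) + len(b))  # Eq. 1
    sensitivity = tp / len(a)         # Eq. 2
    ppv = tp / len(b)                 # Eq. 3
    return dsc, sensitivity, ppv


# Toy example: 4 reference voxels, 3 evaluated voxels, 2 in common
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
evaluated = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
dsc, sens, ppv = overlap_metrics(reference, evaluated)
```

Reporting sensitivity and PPV alongside the DSC separates under-segmentation (low sensitivity) from over-segmentation (low PPV), which a single overlap score cannot distinguish.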

For comparison purposes, the performance of the PET-AS methods was also evaluated using the commonly used NEMA IEC body phantom with spherical plastic inserts. In particular, the results obtained for the irregular lesion which had a volume of 5.9 mL were compared with the segmentation results obtained for the 5.6 mL sphere of the NEMA IEC body phantom scanned at a TBR of 4.