Variability in PET image quality and quantification measured with a permanently filled 68Ge-phantom: a multi-center study

Sipilä, O.; Liukkonen, J.; Halme, H.-L.; Tolvanen, T.; Sohlberg, A.; Hakulinen, M.; Manninen, A.-L.; Tahvanainen, K.; Tunninen, V.; Ollikainen, T.; Kangasmaa, T.; Kangasmäki, A.; Vuorela, J.

doi:10.1186/s40658-023-00551-w

Original research
Open access
Published: 16 June 2023


O. Sipilä ORCID: orcid.org/0009-0005-2471-8820¹,
J. Liukkonen²,
H.-L. Halme¹,
T. Tolvanen³,
A. Sohlberg⁴,
M. Hakulinen^5,6,
A.-L. Manninen^7,8,
K. Tahvanainen¹,
V. Tunninen⁹,
T. Ollikainen¹⁰,
T. Kangasmaa¹¹,
A. Kangasmäki¹² &
J. Vuorela¹³

EJNMMI Physics volume 10, Article number: 38 (2023)


Abstract

Background

This study evaluated, as a snapshot, the variability in quantification and image quality (IQ) of the clinically utilized PET [¹⁸F]FDG whole-body protocols in Finland using a NEMA/IEC IQ phantom permanently filled with ⁶⁸Ge.

Methods

The phantom was imaged on 14 PET-CT scanners, including a variety of models from two major vendors. The variability of the recovery coefficients (RC_max, RC_mean and RC_peak) of the hot spheres as well as percent background variability (PBV), coefficient of variation of the background (COV_BG) and accuracy of corrections (AOC) were studied using images from clinical and standardized protocols with 20 repeated measurements. The ranges of the RCs were also compared to the limits of the EARL ¹⁸F standards 2 accreditation (EARL2). The impact of image noise on these parameters was studied using averaged images (AVIs).

Results

The largest variability in RC values of the routine protocols was found for the RC_max with a range of 68% and with 10% intra-scanner variability, decreasing to 36% when excluding protocols with suspected cross-calibration failure or without point-spread-function (PSF) correction. The RC ranges of individual hot spheres in routine or standardized protocols or AVIs fulfilled the EARL2 ranges with two minor exceptions, but fulfilling the exact EARL2 limits for all hot spheres was variable. RC_peak was less dependent on averaging and reconstruction parameters than RC_max and RC_mean. The PBV, COV_BG and AOC varied between 2.3–11.8%, 9.6–17.8% and 4.8–32.0%, respectively, for the routine protocols. The RC ranges, PBV and COV_BG were decreased when using AVIs. With AOC, when excluding routine protocols without PSF correction, the maximum value dropped to 15.5%.

Conclusion

The maximum variability of the RC values for the [¹⁸F]FDG whole-body protocols was about 60%. The RC ranges of properly cross-calibrated scanners with PSF correction fitted within the EARL2 RC ranges for individual sphere sizes, but fulfilling the exact RC limits would have required further optimization. RC_peak was the most robust RC measure. In addition to COV_BG, the RCs and PBV were also sensitive to image noise.

Background

Positron emission tomography (PET) measures quantitative information on the distribution of a radioactive tracer in a patient, presented as activity concentration (Bq/ml) in the images. These measurements are used to compute standard uptake values (SUVs) by normalizing them with patient weight (or lean body mass) and injected activity [1]. The SUVs are commonly used for classifying abnormal tracer uptake as benign or malignant, as well as for following disease progress [2,3,4,5]. Thus, besides visual image quality, the measured activity concentration should not vary significantly between the PET scanners on which a patient might be imaged. Moreover, reference SUV values for disease stages should be reliably usable on all PET scanners. In practice, the choice of technical settings, including imaging, image reconstruction and post-processing parameters, could account for up to 55% variability in the measured activity concentration [6]. In addition, variations in practical implementation, including patient preparation, may affect the result of the imaging study.
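The normalization described above can be illustrated with a minimal Python sketch of a body-weight SUV. The function name, the argument names and the example numbers are hypothetical and are not taken from the study; the sketch assumes that the injected activity has already been decay-corrected to the scan time and that 1 ml of tissue weighs 1 g.

```python
def suv_body_weight(conc_bq_per_ml, injected_bq, weight_kg):
    """Body-weight SUV: measured concentration divided by injected activity per gram.

    Assumptions (not from the study): injected_bq is decay-corrected to scan
    time, and tissue density is 1 g/ml so that Bq/ml and Bq/g are interchangeable.
    """
    weight_g = weight_kg * 1000.0
    return conc_bq_per_ml / (injected_bq / weight_g)


# Hypothetical example: 5 kBq/ml measured, 350 MBq injected, 70 kg patient.
print(suv_body_weight(5000.0, 350e6, 70.0))
```

Because the SUV is directly proportional to the measured activity concentration, any scanner-dependent bias in that concentration carries over to the SUV by the same factor.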

To test the performance characteristics of a PET scanner, the metrics in the National Electrical Manufacturers Association (NEMA) NU 2 standards have been widely adopted, e.g. [7,8,9,10]. The newest version of this standard was published in 2018 [11]. To facilitate multicenter quantitative imaging studies, several programs and software tools for harmonizing recovery coefficients (RCs) of activity concentration in small hot objects have been implemented [12,13,14,15,16]. The RC is defined as the ratio of the activity concentration measured from the PET image to the known activity concentration of the object. Most commonly, the hot spheres in the NEMA/International Electrotechnical Commission (IEC) NU2 image quality (IQ) phantom [11] have been used for the RC measurements. Several definitions for RCs exist, including maximum, mean and peak RC values [17]. Calibration of the activity meter (dose calibrator) used for cross-calibration of the PET scanner (e.g. [18]) as well as PET image noise and resolution may have a strong influence on the RC values [19].

Besides hot object contrast, image noise and cold object contrast have a major effect on visual image quality and thus on valid interpretation of the image. Widely used PET IQ parameters include the coefficient of variation of the background voxel values (COV_BG) [12, 18, 20], percent background variability (PBV) [11] and accuracy of corrections (AOC) [11]. Noise equivalent count rate (NECR) has been studied for noise level optimization of patient images, e.g. [21, 22], although with modern iterative reconstruction methods the results have been quite variable. Moreover, radiomics models, as in e.g. [23], might be used for estimating IQ features directly from the clinical images.

When using relatively short-lived PET isotopes, often ¹⁸F with a half-life of 1.8 h, separate filling of a phantom is usually required for every measurement session, and the measurement time is limited by the decay of the activity. Thus, the variance in the measurement results may be influenced by differences in phantom filling processes and in activity measurements. To avoid these limitations, NEMA IQ phantoms permanently filled with the relatively long-lived isotope ⁶⁸Ge (half-life 271 d) have been used to study, e.g., repeatability and reproducibility of serial PET measurements [24], noise and signal properties of reconstructions including point spread function (PSF) correction [25], as well as the feasibility of using them in IQ assessment in multicenter clinical trials [26, 27].

In our work, a NEMA IQ phantom permanently filled with ⁶⁸Ge was imaged in Finnish PET centers to study differences in quantification and in image quality. The 14 PET scanners included in the study varied from older models without PSF correction or time of flight (TOF) available to digital systems from two major vendors. All scanners had integrated computed tomography (CT). Variations in RCs as well as in IQ parameters measurable with the NEMA IQ phantom, including COV_BG, PBV and AOC, were evaluated from the routinely used whole-body imaging protocols of each PET center and from standardized protocols. The ranges of RC values, proportional to the ranges of SUV values among Finnish PET centers, were evaluated, as well as the comparability of these ranges to the limits of EARL ¹⁸F standards 2 accreditation [28], which are referred to as EARL2 limits in the rest of the article. In addition, the impact of image noise on the RC and IQ results was studied.

Material and methods

Phantom imaging

A NEMA 2018 IQ phantom with ⁶⁸Ge was imaged in 11 Finnish PET centers during June 2019–January 2020. The total activity of the phantom varied from 30.6 to 17.5 MBq. The measurements were performed using 14 PET-CT scanners, including analog and digital systems from two major vendors (Tables 1, 2). The phantom included six hot spheres with diameters of 10, 13, 17, 22, 28 and 37 mm. The activity concentration ratio of the spheres to the background was 4:1. In addition, a cold lung insert was included.
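As a plausibility check of the activities quoted above, the decay of the phantom can be sketched in Python. The 271-day half-life and the two activities come from the text; the function names are illustrative, and the calculation assumes simple exponential decay of ⁶⁸Ge with no other losses.

```python
import math

GE68_HALF_LIFE_DAYS = 271.0  # half-life quoted in the text


def decayed_activity(a0_mbq, elapsed_days, half_life_days=GE68_HALF_LIFE_DAYS):
    """Activity remaining after elapsed_days of pure exponential decay."""
    return a0_mbq * 2.0 ** (-elapsed_days / half_life_days)


def days_between(a0_mbq, a_mbq, half_life_days=GE68_HALF_LIFE_DAYS):
    """Time needed for the activity to decay from a0_mbq to a_mbq."""
    return half_life_days * math.log2(a0_mbq / a_mbq)


# Time implied by the drop from 30.6 MBq to 17.5 MBq
print(round(days_between(30.6, 17.5)), "days")
```

The implied interval is a little over seven months, which agrees with the stated measurement period of June 2019 to January 2020.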

Table 1 Scanners and imaging parameters for routine protocols 1r–14r


Table 2 Standard protocols 1s–13s


Routine protocols

On every PET-CT scanner, the local clinical imaging protocol for whole-body [¹⁸F]FDG studies was used. CT was used for attenuation correction. The main parameters of the 14 protocols are listed in Table 1. The routine protocols were numbered 1r–14r. PET imaging was repeated 20 times during the same imaging session, except for protocol 14r, for which there were only 10 repetitions for technical reasons. In addition, the imaging session for protocol 1r was repeated five times and that for protocol 13r three times during a period of 5.5 months to estimate the impact of intra-scanner variations on the measurements.

The imaging time for each session was adjusted according to the average activity concentration of the phantom on the imaging day, which varied from 3.2 to 1.8 MBq/kg (Table 1). In addition, the imaging time of the phantom was linearly scaled according to the clinically used patient activity concentration (MBq/kg) of the whole-body [¹⁸F]FDG studies in the particular PET center. The scaling also included the effect of the slightly variable patient resting times used in different centers. The goal was to preserve the differences in the relative count rates of routine imaging protocols between centers with different optimization strategies and scanners available. The scaled imaging times are also listed in Table 1. The time-activity-product (TAP) of the routine protocols varied between 4.6 and 9.0 min*MBq/kg. In scanners with stationary bed positions, two positions were imaged with the overlapping region placed in the middle of the hot spheres of the phantom. In Fig. 1a, axial slices in the middle plane of the hot spheres from six different routine protocols are shown.
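One way to read the linear scaling described above is that the product of imaging time and activity concentration is kept the same for the phantom as for a clinical scan. The Python sketch below works under that assumption only; the exact procedure used in the study, including the handling of patient resting times, is not specified in enough detail here, and all function names and numbers are hypothetical.

```python
def time_activity_product(time_min, activity_mbq_per_kg):
    """TAP in min*MBq/kg."""
    return time_min * activity_mbq_per_kg


def scaled_phantom_time(clinical_time_min, clinical_mbq_per_kg, phantom_mbq_per_kg):
    """Phantom imaging time that keeps the TAP equal to the clinical value.

    Assumption: the clinical activity concentration is already expressed at
    scan time, i.e. decay during the patient resting period is accounted for.
    """
    tap = time_activity_product(clinical_time_min, clinical_mbq_per_kg)
    return tap / phantom_mbq_per_kg


# Hypothetical example: 2 min per bed position at 3.0 MBq/kg clinically,
# phantom at 2.4 MBq/kg on the imaging day.
print(scaled_phantom_time(2.0, 3.0, 2.4))
```

Under this reading, a weaker phantom is simply imaged for proportionally longer, so that the count statistics stay comparable to those of the clinical protocol.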

Standard protocols

In addition, on scanners enabling PSF correction, the phantom was imaged 20 times with standardized imaging parameters. The standard protocols were numbered 1s–6s and 8s–13s. The number refers to the same scanner as in the routine protocols. Two of the scanners (7 and 14) did not enable PSF correction. In the standard protocols, only one bed position, with its middle plane in the middle of the hot spheres, was imaged. The imaging time was 5 min for the one bed position, or the corresponding bed speed was used in scanners with continuous bed motion. The images were reconstructed using ordered-subsets expectation–maximization (OSEM), TOF, PSF correction, a matrix size of 256 × 256 and no filtering. Scatter, random and dead time corrections were enabled. The product of the number of iterations and the number of subsets, as well as the slice thicknesses and pixel sizes, were standardized as accurately as possible. The slight variations in these parameters can be found in Table 2.

Data processing

Image analysis was conducted using in-house developed automated MATLAB scripts (R2019b; The MathWorks, Inc., Natick, Massachusetts, USA).

Data sets

As every PET imaging protocol was repeated 20 (or 10) times during an imaging session, the analyses were performed 20 (or 10) times and the final result was reported as the mean value of these 20 (or 10) repetitions. In addition, an average image (AVI) of the 20 (or 10) repeated PET images was computed to simulate a very low noise image to be utilized in some of the analyses.
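The averaging step can be sketched in Python as a voxel-wise mean over the repeated images. Images are represented here as flat lists of voxel values purely for illustration; the study's own MATLAB implementation is not shown in the text. The square-root rule in the second function holds only under the assumption of statistically independent noise between repetitions.

```python
import math


def average_image(images):
    """Voxel-wise mean of equally sized images given as flat lists of voxel values."""
    n = len(images)
    return [sum(voxels) / n for voxels in zip(*images)]


def expected_noise_reduction(n_images):
    """Factor by which the noise SD drops when averaging n independent images."""
    return math.sqrt(n_images)


# Two tiny hypothetical "images" with three voxels each
print(average_image([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
print(expected_noise_reduction(20), expected_noise_reduction(10))
```

For 20 repetitions the idealized noise reduction factor is about 4.5, and for 10 repetitions about 3.2.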

Coefficient of variation (COV)

The coefficient of variation was used to compare the results from different data sets. It was computed as the standard deviation of the results divided by their mean and multiplied by 100%.
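A minimal Python sketch of this definition follows. The text does not state whether the sample or the population standard deviation was used; the sample form (denominator n − 1) is assumed here, and the function name is illustrative.

```python
import statistics


def cov_percent(values):
    """Coefficient of variation in percent: SD / mean * 100.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical RC values of one sphere from three protocols
print(cov_percent([0.9, 1.0, 1.1]))
```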

Background bias correction

To exclude the impact of the cross-calibration of the scanner and/or calibration of the activity meter used in an individual PET center, images were corrected for the calibration biases before further analysis, unless stated otherwise. For the correction, a background bias correction factor (BBCF) was computed as the mean phantom background value from the PET image divided by the decay-corrected true activity concentration in the phantom background as stated in the calibration certificate of the phantom. The mean phantom background value was computed as the mean voxel value of the 60 background regions of interest (ROIs) with a 37 mm diameter (C_{B, 37 mm}), which are also used in the NEMA IQ analysis [11] (Fig. 1b).
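The BBCF definition can be sketched in Python as follows. The text defines the factor but does not spell out how it was applied; dividing the voxel values by the factor is assumed here because it restores the certified background concentration. Function names and example values are hypothetical.

```python
def bbcf(mean_background_measured, true_background_conc):
    """Background bias correction factor: measured mean / decay-corrected true value."""
    return mean_background_measured / true_background_conc


def apply_background_bias_correction(voxels, factor):
    """Assumed correction: divide every voxel value by the BBCF."""
    return [v / factor for v in voxels]


def within_tolerance(factor, tolerance=0.10):
    """True if the BBCF deviates less than `tolerance` from the nominal value of 1."""
    return abs(factor - 1.0) < tolerance


# Hypothetical example: background measured 5% low
factor = bbcf(1.9, 2.0)
print(factor, apply_background_bias_correction([1.9, 7.6], factor), within_tolerance(factor))
```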

Recovery coefficients (RCs)

To compute an RC, the maximum, mean or peak activity concentration [17] of a hot sphere was measured from a PET image and divided by the known activity concentration. The maximum activity concentration was computed as the maximum voxel value in the volume of interest (VOI) including all voxels inside a hot sphere. The mean activity concentration was computed as the mean voxel value in the VOI including voxels with values ≥ 50% of the maximum voxel value inside a hot sphere. The peak activity concentration was computed as the highest mean value in a spherical VOI with a diameter of 12 mm and the center voxel inside a hot sphere.

For all sphere sizes, the computed RCs from the 20 (or 10) repeated PET series were averaged to obtain the final RC_max, RC_mean and RC_peak values. The corresponding RC values were also computed from the AVIs.
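The three RC definitions can be sketched in Python as below. This is an illustrative simplification, not the study's MATLAB code: voxels are passed as plain lists, voxel positions are given as coordinates in millimetres, and the peak VOI is approximated by all voxel centres lying within 6 mm of a candidate centre voxel. All names are hypothetical.

```python
import math


def rc_max(sphere_values, true_conc):
    """Maximum voxel value inside the sphere VOI divided by the known concentration."""
    return max(sphere_values) / true_conc


def rc_mean(sphere_values, true_conc, threshold=0.5):
    """Mean of voxels at or above threshold * maximum, divided by the known concentration."""
    cutoff = threshold * max(sphere_values)
    kept = [v for v in sphere_values if v >= cutoff]
    return sum(kept) / len(kept) / true_conc


def rc_peak(points, centres, true_conc, diameter_mm=12.0):
    """Highest mean over spherical VOIs centred on voxels inside the hot sphere.

    points:  list of ((x, y, z), value) for the sphere and its surroundings (mm)
    centres: coordinates of voxels inside the hot sphere (candidate VOI centres)
    """
    radius = diameter_mm / 2.0
    best = -math.inf
    for c in centres:
        vals = [v for p, v in points if math.dist(p, c) <= radius]
        best = max(best, sum(vals) / len(vals))
    return best / true_conc


# Hypothetical one-dimensional row of voxels, 4 mm apart
points = [((0.0, 0.0, 0.0), 1.0), ((4.0, 0.0, 0.0), 4.0),
          ((8.0, 0.0, 0.0), 4.0), ((12.0, 0.0, 0.0), 1.0)]
centres = [(4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
print(rc_max([4.0, 3.0, 2.0, 1.0], 4.0),
      rc_mean([4.0, 3.0, 2.0, 1.0], 4.0),
      rc_peak(points, centres, 4.0))
```

The example shows why the peak value is less sensitive to single noisy voxels than the maximum: it always averages over a fixed volume, at the cost of including background voxels when the sphere is smaller than the VOI.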

Comparison of the RCs in the routine protocols

To estimate the range in SUV values between scanners and protocols routinely used in Finnish PET centers, the maximum ranges of the RC_max, RC_mean and RC_peak values from the routine protocols were computed without background bias correction.

Intra-scanner variability vs. inter-scanner variability for the RC_max, RC_mean and RC_peak was studied by comparing the mean COV of the corresponding RCs from the five scanning sessions of protocol 1r and the three scanning sessions of protocol 13r to the mean COV from all 14 different protocols (1r–14r). The mean COVs were computed as the mean of the COVs for the six differently sized hot spheres.

Comparison of the routine and standard protocols to EARL2 limits

Routine and standard protocols with PSF correction and a BBCF deviating less than 10% from the nominal value of 1 were included in the comparison to the EARL2 accreditation limits for the RC_max, RC_mean and RC_peak [28], which can be found in Table 3. The ranges between the maximum and minimum limits for the RC_max, RC_mean and RC_peak are also tabulated. The number of routine and standard protocols, as well as of the corresponding AVIs, fulfilling the EARL2 limits was reported. Moreover, the number of RC results fulfilling the EARL2 limits for an individual sphere size was counted. For the individual spheres, it was also checked whether the range of the RCs fitted within the range of the corresponding EARL2 limits (but not necessarily within the exact upper and lower limits of EARL2).
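The two kinds of comparison described above, against the exact limits and against the width of the limit window, can be sketched in Python as follows. The numeric limits in the example are placeholders invented for illustration and are not the EARL2 values of Table 3; the function names are likewise hypothetical.

```python
def within_limits(rc_curve, lower, upper):
    """Per-sphere flags: True where lower <= RC <= upper."""
    return [lo <= rc <= hi for rc, lo, hi in zip(rc_curve, lower, upper)]


def fulfils_all(rc_curve, lower, upper):
    """True only if every sphere of one protocol lies inside its limits."""
    return all(within_limits(rc_curve, lower, upper))


def range_fits(rc_curves, lower, upper):
    """Per-sphere flags: True where the spread of RCs across protocols
    is no wider than the window between the lower and upper limit."""
    flags = []
    for j, (lo, hi) in enumerate(zip(lower, upper)):
        values = [curve[j] for curve in rc_curves]
        flags.append(max(values) - min(values) <= hi - lo)
    return flags


# Placeholder limits for two spheres (NOT the EARL2 values)
lower, upper = [0.4, 0.9], [0.6, 0.95]
curves = [[0.5, 1.0], [0.65, 1.1]]  # hypothetical RCs of two protocols
print(within_limits(curves[0], lower, upper))
print(range_fits(curves, lower, upper))
```

A set of protocols can pass the range check for a sphere while individual protocols fail the exact limits, because the range check ignores where the window sits.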

Table 3 Upper and lower limits of RCs and their ranges in EARL2 [28]


Besides direct comparison of the RCs to the EARL2 limits and ranges, the COVs of the RC_max, RC_mean and RC_peak from the included protocols were computed for each of the six differently sized hot spheres. The mean value of the COVs for all six spheres (meanCOV_max, meanCOV_mean, meanCOV_peak) was reported as a measure of the similarity of the RC values of the included protocols. In addition, the mean values of the RC_max, RC_mean and RC_peak over all sphere sizes for a single protocol (MCR, as in Ref. [29]) were computed, and the COVs of these MCRs (COVMCR_max, COVMCR_mean, COVMCR_peak) were reported as a measure of the similarity of the shape of the RC curves.
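The two summary measures can be sketched in Python as below. Each RC curve is a list with one RC per sphere size; the sample standard deviation is assumed for the COV, and the function names and example curves are hypothetical.

```python
import statistics


def cov_percent(values):
    """Coefficient of variation in percent (sample SD assumed)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def mean_cov(rc_curves):
    """meanCOV: COV across protocols for each sphere, averaged over the spheres."""
    per_sphere = [cov_percent(list(column)) for column in zip(*rc_curves)]
    return statistics.mean(per_sphere)


def cov_of_mcr(rc_curves):
    """COVMCR: COV across protocols of each protocol's mean RC over all spheres."""
    mcrs = [statistics.mean(curve) for curve in rc_curves]
    return cov_percent(mcrs)


# Two hypothetical protocols with two spheres each
curves = [[0.9, 1.0], [1.1, 1.0]]
print(mean_cov(curves), cov_of_mcr(curves))
```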

PBV and COV_BG

For the routine and standard protocols and AVIs, the PBVs for each sphere diameter j were computed as the N_js in the corresponding NEMA 2018 test [11]

$$N_{j}=\frac{\mathrm{SD}_{j}}{C_{B,j}}\times 100\%, \tag{1}$$

where C_B,j is the average value of the voxel values in the K (= 60) circular background ROIs with diameter j (10, 13, 17, 22, 28 and 37 mm) and SD_j the standard deviation of the average values of the K individual background ROIs with diameter j

$$\mathrm{SD}_{j}=\sqrt{\frac{\sum_{k=1}^{K}\left(C_{B,j,k}-C_{B,j}\right)^{2}}{K-1}} \tag{2}$$

To estimate the effect of averaging on N_j, the results from the single images were divided by the results from the AVIs for each diameter j. In addition, for each protocol the effect of the diameter was reported as the ratio of the maximum and minimum N_j.
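Equations (1) and (2) translate directly into a short Python sketch. The input is the list of mean values of the K background ROIs of one diameter; with equally sized ROIs, the mean of these ROI means equals C_B,j. The function names and example numbers are illustrative only.

```python
import statistics


def percent_background_variability(roi_means):
    """N_j = SD_j / C_B,j * 100, with SD_j computed over the K ROI means (K - 1 denominator)."""
    c_b = statistics.mean(roi_means)
    sd = statistics.stdev(roi_means)
    return sd / c_b * 100.0


def max_min_ratio(n_by_diameter):
    """Ratio of the largest to the smallest N_j of one protocol."""
    values = list(n_by_diameter.values())
    return max(values) / min(values)


# Hypothetical ROI means for one diameter, and hypothetical N_j per diameter (mm)
print(percent_background_variability([9.0, 10.0, 11.0]))
print(max_min_ratio({10: 9.0, 22: 4.5, 37: 3.0}))
```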

The results from the routine protocols were also compared to the 10% limit required by the Finnish Radiation and Nuclear Safety Authority [30].

COV_BG was computed using voxel values from the 60 circular background ROIs with a diameter of 37 mm specified in the NEMA IQ test [11] for the routine protocols and AVIs. The results were compared to the 15% limit used as a criterion for sufficient clinical image quality in Refs. [31, 32].
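In contrast to the PBV, which uses ROI mean values, COV_BG works on individual voxel values. The Python sketch below assumes that the voxels of all background ROIs are pooled before the standard deviation and mean are taken, and that the sample standard deviation is used; neither detail is stated explicitly in the text. The names and numbers are hypothetical.

```python
import statistics


def cov_bg_percent(rois):
    """COV of the background voxel values, pooled over all ROIs (assumed pooling)."""
    pooled = [v for roi in rois for v in roi]
    return statistics.stdev(pooled) / statistics.mean(pooled) * 100.0


def exceeds_limit(cov_bg, limit_percent=15.0):
    """True if the COV_BG is above the 15% criterion mentioned in the text."""
    return cov_bg > limit_percent


# Two hypothetical ROIs with identical means but noisy voxels
rois = [[8.0, 12.0], [9.0, 11.0]]
value = cov_bg_percent(rois)
print(value, exceeds_limit(value))
```

In this example both ROI means are 10, so a measure based on ROI means would report no variability at all, while the voxel-level COV_BG is about 18%.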

Accuracy of corrections (AOC)

As in the NEMA Accuracy of Corrections test, ΔC_lung,i was first computed for every slice i in the axial range of the lung insert of the phantom and excluding those slices nearer than 30 mm from the axial edges of the insert [11]:

$$\Delta C_{\mathrm{lung},i}=\frac{C_{\mathrm{lung},i}}{C_{B,37\,\mathrm{mm}}}\times 100\% \tag{3}$$

C_lung,i was the mean voxel value of a circular ROI with a diameter of 30 mm inside the lung insert in slice i and C_B, 37 mm the mean voxel value of the 60 background ROIs with a diameter of 37 mm (Fig. 1b). The final accuracy of corrections (AOC) was computed as the mean of ΔC_lung,i over all slices i.
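The AOC computation can be sketched in Python as follows. Slice positions are given as axial coordinates in millimetres; the function names, the inclusive treatment of slices lying exactly 30 mm from an edge, and the example numbers are assumptions made for illustration.

```python
def eligible_slices(z_positions_mm, insert_start_mm, insert_end_mm, margin_mm=30.0):
    """Indices of slices at least margin_mm away from both axial edges of the lung insert."""
    return [i for i, z in enumerate(z_positions_mm)
            if z - insert_start_mm >= margin_mm and insert_end_mm - z >= margin_mm]


def accuracy_of_corrections(lung_roi_means, background_mean):
    """Mean over slices of C_lung,i / C_B,37mm * 100 (residual in percent)."""
    residuals = [c / background_mean * 100.0 for c in lung_roi_means]
    return sum(residuals) / len(residuals)


# Hypothetical slice positions, lung ROI means per slice, and background mean
z = [0.0, 30.0, 60.0, 90.0, 120.0]
lung = [4.0, 1.0, 2.0, 3.0, 5.0]
keep = eligible_slices(z, 0.0, 120.0)
print(keep, accuracy_of_corrections([lung[i] for i in keep], 20.0))
```

A value of 0% would mean that the cold lung insert shows no residual activity at all, so smaller values indicate more accurate scatter and attenuation corrections.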

Results

Background bias correction

The BBCFs varied between 0.91 and 0.98, except for protocols 6r and 6s, for which the factors were 1.11 and 1.10, respectively (Tables 1 and 2).

Comparison of the RCs in the routine protocols

The RC_max, RC_mean and RC_peak values for routine protocols 1r–14r without background bias correction are presented in Fig. 2. The maximum ranges of the RC_max values were 0.36, 0.62, 0.68, 0.57, 0.47 and 0.49 for the sphere sizes of 10 mm, 13 mm, 17 mm, 22 mm, 28 mm and 37 mm, respectively. The corresponding ranges for the RC_mean were 0.23, 0.43, 0.47, 0.40, 0.32 and 0.30, and for the RC_peak 0.21, 0.38, 0.48, 0.47, 0.35 and 0.33. When computing the RC_max, RC_mean and RC_peak values, the biggest SD from averaging the 20 (10) individual results for each protocol and sphere size was 0.15.

The inter-scanner mean COV of the RC_max values from the routine protocols 1r–14r was 15.3%. The corresponding intra-scanner mean COVs were 1.8% and 1.6% for protocols 1r and 13r, respectively. For the RC_mean, the inter-scanner mean COV was 15.3%, and the intra-scanner mean COV was 1.4% for both protocols 1r and 13r. For the RC_peak, the corresponding inter-scanner value was 12.3% and the intra-scanner values were 0.7% and 1.0%. Thus, the intra-scanner mean COVs were about 10% of the corresponding inter-scanner mean COVs, and about 10% of the inter-scanner variability could be attributed to repeatability issues between different measurement sessions.
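The "about 10%" statement can be checked with simple arithmetic on the COV values quoted above. The Python sketch below only divides the reported intra-scanner values by the reported inter-scanner values; the dictionary layout and names are illustrative.

```python
# Mean COVs in percent, as quoted in the text
inter = {"RC_max": 15.3, "RC_mean": 15.3, "RC_peak": 12.3}
intra = {"RC_max": (1.8, 1.6), "RC_mean": (1.4, 1.4), "RC_peak": (0.7, 1.0)}  # (1r, 13r)


def intra_to_inter_ratios(intra_cov, inter_cov):
    """Intra-scanner COV as a fraction of the inter-scanner COV, per RC type."""
    return {name: tuple(v / inter_cov[name] for v in values)
            for name, values in intra_cov.items()}


ratios = intra_to_inter_ratios(intra, inter)
for name, (r_1r, r_13r) in ratios.items():
    print(f"{name}: protocol 1r {r_1r:.1%}, protocol 13r {r_13r:.1%}")
```

The individual ratios lie between roughly 6% and 12%, which is consistent with the rounded figure of about 10%.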

Comparison of the routine and standard protocols to EARL2 limits

Routine protocols 6r, 7r and 14r and standard protocol 6s were excluded from further analysis of the RC values, because the BBCFs of protocols 6r and 6s were more than 10% over the nominal value and protocols 7r and 14r did not include PSF correction. For the remaining 11 routine and standard protocols and the corresponding AVIs, the RC_max, RC_mean and RC_peak values are presented in Fig. 3. For the 11 routine protocols, the maximum ranges of the RC_max values were 0.30, 0.36, 0.30, 0.25, 0.21 and 0.21 for the sphere sizes of 10 mm, 13 mm, 17 mm, 22 mm, 28 mm, and 37 mm, respectively. The corresponding ranges for the RC_mean were 0.19, 0.24, 0.21, 0.15, 0.10 and 0.09, and for the RC_peak 0.12, 0.16, 0.17, 0.11, 0.06 and 0.05. The percentages of the protocols fulfilling the EARL2 limits for all six sphere sizes as well as separately for the individual spheres are presented in Table 4.

Table 4 The percentages of the routine and standard protocols and AVIs fulfilling the limits of the EARL2 of RC_max, RC_mean and RC_peak for all the six sphere sizes as well as separately for individual sphere sizes. In addition, the meanCOVs and COVMCRs of the different protocols are listed


The range of the RC results for individual sphere diameters from all 11 routine or standard protocols or AVIs fitted into the range of the EARL2 limits in almost 100% of the cases. The only exceptions were the ranges of the RC_mean values from the 13 and 17 mm spheres in the routine protocols, which exceeded the corresponding EARL2 ranges by 5.4 and 0.7%, respectively.

The results for the meanCOVs and COVMCRs can be found in Table 4.

PBV and COV_BG

The PBV varied between 0.9 and 11.8% for the routine images and AVIs (Fig. 4a) and 0.7–9.1% for the standard images and AVIs (Fig. 4b) depending on the ROI size and averaging. The variability for the same sized ROIs was 1.8–4.5 and 2.0–4.6 times smaller in routine and standard AVIs than in the corresponding results computed from single images (routine and standard protocols), respectively, with bigger changes for smaller ROI sizes.

When computing the ratio of the maximum and minimum N_j for each protocol, the maximum value was always found for the diameter of 10 mm and the minimum for the diameter of 37 mm, as can also be observed from Fig. 4. For all the protocols including AVIs, the ratio of the maximum and the minimum values varied between 1.3 and 3.3.

Every routine protocol had N_j less than 10% for ROIs with a diameter of 17 mm or more. With a diameter of 13 mm, one protocol slightly exceeded the 10% limit (10.6%). With the smallest diameter of 10 mm, four routine protocols exceeded the 10% limit (10.1–11.8%).

The COV_BG values for the routine protocols and AVIs varied between 9.6–17.8% and 3.0–5.6%, respectively (Fig. 5). In four routine protocols, the COV_BG exceeded 15%. If the AVI of protocol 14r with only 10 averaged images was excluded, the maximum COV_BG for the AVIs was 4.1%.

Accuracy of corrections (AOC)

The AOCs are presented in Fig. 6 for both the routine and standard protocols. For all routine protocols, the AOC varied between 4.8–32.0% with SDs of 0.5–2.4%. When excluding routine protocols without PSF correction, the maximum AOC dropped to 15.5%. For the standard images, the AOCs ranged between 2.9–12.8% with SDs of 0.5–1.7%.

Discussion

In this study, a NEMA 2018 IQ phantom permanently filled with ⁶⁸Ge was imaged in almost all Finnish PET centers, including a variety of scanner models from two major vendors. After decay correction, the phantom had the same activity concentrations in every measurement, thus excluding the uncertainty of filling the phantom separately for every measurement session. In addition, long measurement sessions with several repetitions were possible. The phantom was imaged with the routine whole-body [¹⁸F]FDG imaging protocol of each PET center, as well as with a standardized protocol if the scanner enabled PSF correction. The variability in the results of the activity concentration measurements of the small hot spheres as well as image quality parameters measurable with the NEMA IQ phantom were studied.

When using the routine protocol of each PET center, the greatest RC difference without background bias correction was 0.68 for the RC_max of the 17 mm sphere, the range being 0.70–1.38. Thus, taking into account the intra-scanner variability of about 10%, the SUV_max for a similar-sized small object could vary by about 60% among the routine whole-body protocols used in the Finnish PET centers due to the variability in the imaging protocols and scanner properties. As can be seen from Fig. 2, the RC results of protocol 6r, whose BBCF diverged from those of the other routine protocols, and the RC results of protocols 7r and 14r without PSF correction deviated from the rest, as expected. When excluding protocols 6s, 6r, 7r and 14r, the RC_peak values had smaller ranges than the RC_max and RC_mean for every sphere size. Similar, more robust behaviour of RC_peak has also been noted, e.g., in Ref. [33].

When excluding protocols 6r, 6s, 7r and 14r, the rest of the protocols fitted into the RC ranges of EARL2 for every sphere size, with two minor exceptions. The majority of the spheres in different protocols also fulfilled the EARL2 upper and lower limits for an individual sphere size, but fulfilling the EARL2 limits for all sphere sizes of a protocol was less common. Thus, it seemed that the ranges of the EARL2 upper and lower RC limits for a sphere size were wide enough to include the results from properly calibrated scanners with PSF correction in the imaging protocol, without any further optimization of the imaging parameters. However, the shape of the RC curves did not necessarily match that of the EARL2 requirements, depending at least on overall averaging (imaging time), the cut-off frequency in filtering and possibly on dissimilarity of other parameters (Fig. 3). It could also be observed that the RC_peak curves were less dependent on these factors, especially on the overall averaging, than the RC_max and RC_mean curves. On the other hand, the shape of the RC_max curves was the most dependent on the overall averaging. In this study, the overshoot of RC_max values for sphere sizes 13 mm and 17 mm was not as pronounced as in the EARL2 limits.

As can be observed from Table 4, the similarity of the RCs (meanCOV_max, meanCOV_mean, meanCOV_peak) as well as of the shape of the RC curves (COVMCR_max, COVMCR_mean, COVMCR_peak) was improved by standardizing the imaging parameters and by lowering the overall noise level. Still, these changes did not necessarily improve the fulfillment of the exact RC limits defined by the EARL organization. A practical approach for reaching the required shapes of the RC curves would probably be to change the cut-off frequency in filtering during reconstruction as necessary, instead of or in addition to standardization and lowering the overall noise level. This approach has been suggested, e.g., in Refs. [13, 14], with the cut-off frequency adjusted for SUVs on the fly without changing the visual image quality. With the meanCOV and COVMCR results, RC_peak again seemed to be the most robust measure among the RCs.

In the PBV test, most of the variability N_j seemed to be due to image noise, as the variability dropped with increasing ROI size and was minimized in the AVIs (Fig. 4). It should be noted that the AVI for protocol 14r had only 10 averaged images while the others had 20, which probably affected the result. Besides image noise, the PBV may have reflected spatial variation of the noise, which can be due to the iterative reconstruction methods and corrections used [34]. The routine and corresponding standard protocols could not be directly compared, because the imaging time in the standard protocols was longer.

Annual measurement of PBV is also required by the Finnish Radiation and Nuclear Safety Authority (STUK) with an acceptance level of 10% [30], although it is not specified whether the requirement concerns all sizes (j) of the background ROIs. Using low noise images (AVIs), the 10% limit could be achieved for every ROI size in every protocol used in this study.

As expected, the COV_BG values of routine protocols depended mostly on the noise, with 3–4 times smaller values when using AVIs. Part of the COV_BG values was probably due to the background variability. The parameters of routine protocols as well as the generations of scanners were quite diverse. Besides imaging time and sensitivity of the scanner, the voxel sizes (8.2–97.8 mm³), reconstruction methods and parameters and filtering had distinctive differences, which were reflected in the image noise and thus in the COV_BG results.

The COV_BG results from the routine and standard protocols could not be compared because of the different imaging times. In addition, the imaging time was the same (5 min) for every standard protocol regardless of the activity of the phantom.

EARL considers a COV_BG of 15% or smaller to be an acceptable noise level for clinical image interpretation [31]. Some of the routine protocols in our study produced COV_BG values exceeding the 15% threshold, which might suggest slightly increasing the imaging time in these protocols. As all exceeding results were from scanners with fixed bed positions, the overlapping region of the bed positions, which has lower sensitivity, may have contributed to the local noise level, as found in Refs. [17, 35].

The AOC results seemed to depend mostly on the generation of the scanner, with better results for protocols with PSF correction available, which has also been observed in other studies, e.g. in Ref. [8]. As can be seen in Fig. 6, the best results were obtained for the newest digital scanners.

There were some non-optimal protocol- or phantom-related issues in our study that should be noted when reviewing the results. Although in the routine protocols the imaging and reconstruction parameters as well as the imaging time were chosen to mimic the whole-body [¹⁸F]FDG protocols and timing practicalities clinically used in each individual PET center, the results of the phantom experiments cannot be directly applied to patient studies. The equivalency between count rates in patient and phantom studies cannot be claimed due to different photon flux environments [36, 38]. Moreover, data processing and corrections by a PET system may not have been fully comparable due to the different isotopes (¹⁸F vs. ⁶⁸Ge, which decays through ⁶⁸Ga) [39]. On the other hand, only small differences in RCs and IQ parameters were found using ¹⁸F and ⁶⁸Ga in Ref. [37], and the use of ⁶⁸Ge-filled NEMA IQ phantoms for IQ assessment in multicenter clinical trials has been successfully demonstrated in Refs. [26, 27].

Due to the materials used in the phantom, exact homogeneity of the known activity concentrations in every part of the phantom could not be guaranteed. In particular, possible inhomogeneities in the background activity concentration may have had an impact on the PBV and COV_BG. However, the possible structures and relative activity concentrations were the same in every imaging session. The results from the PBV and AOC tests could not be directly compared to results from the corresponding NEMA NU2 2018 tests, since the scatter phantom required to be placed next to the IQ phantom in the NEMA setup was not available in our measurements and the imaging time was not defined according to the NEMA standard [11].

Regarding the computation of the peak activity concentration, the volume used for averaging was bigger than the smallest hot sphere in the phantom. Thus, better PET scanner capabilities, e.g. resolution, might not have been reflected as a more truthful RC_peak value for the smallest hot sphere.

In this study, no long-term information was assessed. A snapshot of the variations in RCs and IQ accumulated from different sources was obtained, and thus factors affecting stability of the results, such as drifting of an activity meter or calibration of a PET scanner [38, 39], have not been considered.

In conclusion, the largest range of the RC (and thus SUV) values of small hot objects due to differences in PET scanners, imaging protocols and parameters was found to be 68%, of which about 10% can be attributed to intra-scanner variability between imaging sessions. The largest ranges were found in the RC_max values. The RC ranges from properly calibrated scanners with PSF correction fitted within the EARL2 RC ranges for individual sphere sizes. However, fulfilling the exact upper and lower RC limits and especially the shape of the RC curves would have required further optimization of the imaging parameters, e.g. the cut-off frequency in filtering, in most of the image sets included in this study. RC_peak was found to be less dependent on the noise level in the image as well as on other variations in the imaging parameters than RC_max and RC_mean. Most of the RC and IQ results in this study were sensitive to image noise. Thus, if the purpose of the phantom tests is to estimate the performance of PET in clinical use, the image noise level of the clinical protocol should be preserved when choosing the imaging parameters, e.g. the imaging time.

Availability of data and materials

The data analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AOC: Accuracy of corrections
AVI: Averaged image
BBFC: Background bias correction factor
COV: Coefficient of variation
COV_BG: Coefficient of variation of the background voxel values
CT: Computed tomography
EARL2: EARL ¹⁸F standards 2 accreditation
IEC: International Electrotechnical Commission
IQ: Image quality
NECR: Noise equivalent count rates
NEMA: National Electrical Manufacturers Association
OSEM: Ordered-subsets expectation–maximization
PBV: Percent background variability
PET: Positron emission tomography
PSF: Point spread function
RC: Recovery coefficient
ROI: Region of interest
SUV: Standard uptake value
TAP: Time-activity-product
TOF: Time of flight
VOI: Volume of interest

References

1. Kinahan PE, Fletcher JW. Positron emission tomography–computed tomography standardized uptake values in clinical practice and assessing response to therapy. Semin Ultrasound CT MRI. 2010;31:496–505. https://doi.org/10.1053/j.sult.2010.10.001.
2. Adams MC, Turkington TG, Wilson JM, Wong TZ. A systematic review of the factors affecting accuracy of SUV measurements. AJR. 2010;195:310–20. https://doi.org/10.2214/AJR.10.4923.
3. Gallamini A, Zwarthoed C, Borra A. Positron emission tomography (PET) in oncology. Cancers. 2014;6:1821–89. https://doi.org/10.3390/cancers6041821.
4. Treglia G, Giovanella L, editors. Evidence-based positron emission tomography. Summary of recent meta-analyses on PET. Berlin: Springer; 2020. https://doi.org/10.1007/978-3-030-47701-1.
5. Weber WA. Assessing tumor response to therapy. J Nucl Med. 2009;50:1S-10S. https://doi.org/10.2967/jnumed.108.057174.
6. Boellaard R. Standards for PET image acquisition and quantitative data analysis. J Nucl Med. 2009;50:11S-20S. https://doi.org/10.2967/jnumed.108.057182.
7. Vandendriessche D, Uribe J, Bertin H, De Geeter F. Performance characteristics of silicon photomultiplier based 15-cm AFOV TOF PET/CT. EJNMMI Phys. 2019;6:8. https://doi.org/10.1186/s40658-019-0244-0.
8. Rausch I, Cal-González J, Dapra D, Gallowitsch HJ, Lind P, Beyer T, Minear G. Performance evaluation of the Biograph mCT Flow PET/CT system according to the NEMA NU2-2012 standard. EJNMMI Phys. 2015;2:26. https://doi.org/10.1186/s40658-015-0132-1.
9. van Sluis J, de Jong J, Schaar J, Noordzij W, van Snick P, Dierckx R, Borra R, Willemsen A, Boellaard R. Performance characteristics of the digital biograph vision PET/CT system. J Nucl Med. 2019;60(7):1031–6. https://doi.org/10.2967/jnumed.118.215418.
10. Hsu DFC, Ilan E, Peterson WT, Uribe J, Lubberink M, Levin CS. Studies of a next-generation silicon-photomultiplier-based time-of-flight PET/CT system. J Nucl Med. 2017;58:1511–8. https://doi.org/10.2967/jnumed.117.189514.
11. NEMA Standards Publication NU 2-2018. Performance measurements of positron emission tomographs (PET). National Electrical Manufacturers Association, 2018.
12. Kaalep A, Sera T, Oyen W, et al. EANM/EARL FDG-PET/CT accreditation: summary results from the first 200 accredited imaging systems. Eur J Nucl Med Mol Imaging. 2018;45:412–22. https://doi.org/10.1007/s00259-017-3853-7.
13. Ferretti A, Chondrogiannis S, Rampin L, et al. How to harmonize SUVs obtained by hybrid PET/CT scanners with and without point spread function correction. Phys Med Biol. 2018;63:235010. https://doi.org/10.1088/1361-6560/aaee27.
14. Quak E, et al. Harmonizing FDG PET quantification while maintaining optimal lesion detection: prospective multicentre validation in 517 oncology patients. Eur J Nucl Med Mol Imaging. 2015;42:2072–82. https://doi.org/10.1007/s00259-015-3128-0.
15. Tsutsui Y, Daisaki H, Akamatsu G, Umeda T, Ogawa M, Kajiwara H, et al. Multicentre analysis of PET SUV using vendor-neutral software: the Japanese Harmonization Technology (J-Hart) study. EJNMMI Res. 2018;8:83. https://doi.org/10.1186/s13550-018-0438-9.
16. Orlhac F, Boughdad S, Philippe C, Stalla-Bourdillon H, Nioche C, Champion L, Soussan M, Frouin F, Frouin V, Buvat I. A postreconstruction harmonization method for multicenter radiomic studies in PET. J Nucl Med. 2018;59:1321–8. https://doi.org/10.2967/jnumed.117.199935.
17. Boellaard R, Delgado-Bolton R, Oyen WJG, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–54. https://doi.org/10.1007/s00259-014-2961-x.
18. Huizing, et al. Multicentre quantitative ⁶⁸Ga PET/CT performance harmonization. EJNMMI Phys. 2019;6:19. https://doi.org/10.1186/s40658-019-0253-z.
19. Boellaard R, Krak NC, Hoekstra OS, Lammertsma AA. Effects of noise, image resolution, and ROI definition on the accuracy of standard uptake values: a simulation study. J Nucl Med. 2004;45:1519–27.
20. Gnesin S, Kieffer C, Zeimpekis K, Papazyan JP, Guignard R, Prior JO, Verdun FR, Lima TVM. Phantom-based image quality assessment of clinical ¹⁸F-FDG protocols in digital PET/CT and comparison to conventional PMT-based PET/CT. EJNMMI Phys. 2020;7:1. https://doi.org/10.1186/s40658-019-0269-4.
21. Chang T, Chang G, Clark JW, Diab RH, Rohren E, Mawlawi OR. Reliability of predicting image signal-to-noise ratio using noise equivalent count rate in PET imaging. Med Phys. 2012;39:5891–900. https://doi.org/10.1118/1.4750053.
22. Carlier T, Ferrer L, Necib H, Bodet-Milin C, Rousseau C, Kraeber-Bodéré F. Clinical NECR in 18F-FDG PET scans: optimization of injected activity and variable acquisition time. Relationship with SNR. Phys Med Biol. 2014;59:6417–30. https://doi.org/10.1088/0031-9155/59/21/6417.
23. Reynés-Llompart G, Sabaté-Llobera A, Llinares-Tello E, et al. Image quality evaluation in a modern PET system: impact of new reconstructions methods and a radiomics approach. Sci Rep. 2019;9:10640. https://doi.org/10.1038/s41598-019-46937-8.
24. Doot RK, Scheuermann JS, Christian PE, Karp JS, Kinahan PE. Instrumentation factors affecting variance and bias of quantifying tracer uptake with PET/CT. Med Phys. 2010;37:6035–46. https://doi.org/10.1118/1.3499298.
25. Tong S, Alessio AM, Kinahan PE. Noise and signal properties in PSF-based fully 3D PET image reconstruction: an experimental evaluation. Phys Med Biol. 2010;55:1453–73. https://doi.org/10.1088/0031-9155/55/5/013.
26. Vallot D, De Ponti E, Morzenti S, et al. Evaluation of PET quantitation accuracy among multiple discovery IQ PET/CT systems via NEMA image quality test. EJNMMI Phys. 2020;7:30. https://doi.org/10.1186/s40658-020-00294-y.
27. Chauvie S, Bergesio F, Fioroni F, et al. The ⁶⁸Ge phantom-based FDG-PET site qualification program for clinical trials adopted by FIL (Italian Foundation on Lymphoma). Phys Med. 2016;32:651–6. https://doi.org/10.1016/j.ejmp.2016.04.004.
28. ¹⁸F Accreditation Specifications. In: Accreditation. EARL. https://earl.eanm.org/accreditation-specifications/. Accessed 20 April 2022.
29. Kaalep A, et al. Feasibility of state of the art PET/CT systems performance harmonisation. Eur J Nucl Med Mol Imaging. 2018;45:1344–61. https://doi.org/10.1007/s00259-018-3977-4.
30. STUK. Säteilyturvakeskuksen määräys säteilylähteiden käytönaikaisesta säteilyturvallisuudesta ja säteilylähteiden ja käyttötilojen poistamisesta käytöstä. Määräys STUK S/5/2019, 2019. https://www.stuklex.fi/fi/maarays/stuk-s-5-2019. Accessed 16 Jan 2020.
31. Boellaard R, Willemsen AT, Arends B, Visser EP. EARL FDG PET/CT optimization procedure: EARL procedure for assessing PET/CT system specific patient FDG activity preparations for quantitative FDG PET/CT studies. In: Accreditation, Guidelines and Publications. EARL. https://earl.eanm.org/guidelines-and-publications/. Accessed 20 April 2022.
32. Gnesin S, Kieffer C, Zeimpekis K, et al. Phantom-based image quality assessment of clinical 18F-FDG protocols in digital PET/CT and comparison to conventional PMT-based PET/CT. EJNMMI Phys. 2020;7:1. https://doi.org/10.1186/s40658-019-0269-4.
33. Lodge MA, Chaudry MA, Wahl RL. Noise considerations for PET quantification using maximum and peak standardized uptake value. J Nucl Med. 2012;53:1041–7. https://doi.org/10.2967/jnumed.111.101733.
34. Kueng R, Driscoll B, Manser P, Fix MK, Stampanoni M, Keller H. Quantification of local image noise variation in PET images for standardization of noise-dependent analysis metrics. Biomed Phys Eng Express. 2017;3:025007. https://doi.org/10.1088/2057-1976/3/2/025007.
35. McKeown C, Gillen G, Dempsey MF, Findlay C. Influence of slice overlap on positron emission tomography image quality. Phys Med Biol. 2016;61:1259–77. https://doi.org/10.1088/0031-9155/61/3/1259.
36. Watson CC, Casey ME, Beyer T, Bruckbauer T, Townsend DW, Brasse D. Evaluation of clinical PET count rate performance. IEEE Trans Nucl Sci. 2003;50:1379–85. https://doi.org/10.1109/TNS.2003.817314.
37. Soderlund AT, Chaal J, Tjio G, Totman JJ, Conti M, Townsend DW. Beyond 18F-FDG: characterization of PET/CT and PET/MR scanners for a comprehensive set of positron emitters of growing application—18F, 11C, 89Zr, 124I, 68Ga, and 90Y. J Nucl Med. 2015;56:1285–91. https://doi.org/10.2967/jnumed.115.156711.
38. Byrd D, Christopfel R, Arabasz G, et al. Measuring temporal stability of positron emission tomography standardized uptake value bias using long-lived sources in a multicenter network. J Med Imaging (Bellingham). 2018;5:011016. https://doi.org/10.1117/1.JMI.5.1.011016. Erratum in: J Med Imaging (Bellingham). 2019;6:019801.
39. Doot RK, Pierce LA 2nd, Byrd D, et al. Biases in multicenter longitudinal PET standardized uptake value measurements. Transl Oncol. 2014;7:48–54. https://doi.org/10.1593/tlo.13850.

Acknowledgements

Not applicable.

Funding

Open Access funding provided by University of Helsinki including Helsinki University Central Hospital.

Author information

Authors and Affiliations

HUS Diagnostic Center, Clinical Physiology and Nuclear Medicine, Helsinki University Hospital and University of Helsinki, P. O. Box 442, 00029, Helsinki, Finland
O. Sipilä, H.-L. Halme & K. Tahvanainen
Radiation and Nuclear Safety Authority, Vantaa, Finland
J. Liukkonen
Turku PET Centre, Turku University Hospital, Turku, Finland
T. Tolvanen
Department of Nuclear Medicine, Päijät-Häme Central Hospital, Lahti, Finland
A. Sohlberg
Department of Clinical Physiology and Nuclear Medicine, Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland
M. Hakulinen
Department of Applied Physics, University of Eastern Finland, Kuopio, Finland
M. Hakulinen
OYS Department of Nuclear Medicine and Radiology, Oulu University Hospital, Oulu, Finland
A.-L. Manninen
Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
A.-L. Manninen
Department of Clinical Physiology and Nuclear Medicine, Satakunta Central Hospital, Pori, Finland
V. Tunninen
Clinical Physiology and Neurophysiology, North Karelia Central Hospital, Joensuu, Finland
T. Ollikainen
Department of Clinical Physiology and Nuclear Medicine, Vaasa Central Hospital, Wellbeing Services County of Ostrobothnia, Vaasa, Finland
T. Kangasmaa
Department of Imaging and Radiotherapy, Docrates Cancer Center, Helsinki, Finland
A. Kangasmäki
Clinical Physiology and Nuclear Medicine, Central Finland Health Care District, Jyväskylä, Finland
J. Vuorela

Contributions

OS, JL, HLH and TT contributed to the study conception and design. Material preparation and data collection were performed by all authors. OS and HLH contributed to the software scripts used for data analysis, which was performed by OS. The first draft of the manuscript was written by OS, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript except KT, who read and approved the second-to-last version of the manuscript with a few minor suggestions.

Corresponding author

Correspondence to O. Sipilä.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

K. Tahvanainen: deceased during the final preparation of the manuscript.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sipilä, O., Liukkonen, J., Halme, HL. et al. Variability in PET image quality and quantification measured with a permanently filled ⁶⁸Ge-phantom: a multi-center study. EJNMMI Phys 10, 38 (2023). https://doi.org/10.1186/s40658-023-00551-w

Received: 29 December 2022
Accepted: 15 May 2023
Published: 16 June 2023
DOI: https://doi.org/10.1186/s40658-023-00551-w
