Phantom-based image quality assessment of clinical 18F-FDG protocols in digital PET/CT and comparison to conventional PMT-based PET/CT

Background We assessed and compared the image quality obtained with clinical 18F-FDG whole-body oncologic PET protocols on three different state-of-the-art digital PET/CT devices and two conventional PMT-based PET/CT devices. Our goal was to evaluate the improved trade-off between administered activity (patient dose exposure/signal-to-noise ratio) and acquisition time (patient comfort), at preserved diagnostic information, that is achievable with the recently introduced digital detector technology compared to previous analogue PET technology. Methods We performed list-mode (LM) PET acquisitions of a NEMA/IEC NU2 phantom, with activity concentrations of 5 kBq/mL and 25 kBq/mL in the background (9.5 L) and sphere inserts, respectively. For each device, reconstructions were obtained by varying the image statistics (10, 30, 60, 90, 120, 180, and 300 s from LM data) and the number of iterations (range 1 to 10), in addition to the local clinical protocol setup. For each reconstructed dataset, we measured the quantitative cross-calibration, the image noise on the uniform background assessed by the coefficient of variation (COV), and the recovery coefficients (RCs) in the hot spheres. Additionally, we compared across datasets the characteristic time-activity product (TAP), defined as the product of scan time per bed position and administered mass-activity (in min·MBq/kg). Results Good system cross-calibration was obtained for all tested datasets, with < 6% deviation from the expected value. For all clinical protocol settings, image noise was compatible with clinical interpretation (COV < 15%). Digital PET showed an improved background signal-to-noise ratio compared to conventional PMT-based PET. RCs were comparable between digital and PMT-based PET datasets. Compared to PMT-based PET, digital systems provided comparable image quality with a lower TAP (~40% to 70% lower).
Conclusions This study compared the achievable clinical image quality in three state-of-the-art digital PET/CT devices (from different vendors) and in two conventional PMT-based PET/CT devices. The reported results show that comparable image quality is achievable with a TAP reduction of ~40% in digital PET. This could lead to a substantial reduction of the administered mass-activity and/or scan time, with direct benefits in terms of dose exposure and patient comfort.


Background
Positron emission tomography (PET) coupled with computed tomography (CT) is an established quantitative imaging technique playing a key role in clinical oncology [1,2]. In particular, quantitative or semi-quantitative 18F-FDG-PET/CT examinations cover a large part of PET indications, such as oncological, cardiac, and neurological imaging [3][4][5].
To guide clinical protocol validation and optimization, reference methodologies make use of phantoms with known geometry and activity preparation, representing a reasonable approximation of patient morphology and activity distribution [6]. To reproduce patient-relevant conditions and to assess the signal recovery in small structures, the National Electrical Manufacturers Association (NEMA)/International Electrotechnical Commission (IEC) NU2 phantom is currently a standard reference [7]. Phantoms with a more anthropomorphic shape also exist, but they have not been widely tested so far and lack standardization [8].
In the last decade, the clinical introduction of time-of-flight (TOF) technology and point spread function (PSF) correction has substantially enhanced the achievable image quality [9][10][11][12].
In high-end commercial PET/CT devices, conventional analogue photomultiplier tubes (PMTs) have been replaced by solid-state technology, aiming to improve timing resolution, event collection (and consequently system sensitivity), event localization, and counting efficiency [13].
In this evolving scenario, standardization and harmonization of 18F-FDG-PET protocols are essential to promote inter-machine and multi-center PET studies. Accordingly, image protocols have been proposed to satisfy the European Association of Nuclear Medicine (EANM)/Research 4 Life (EARL) recommendations [14,15]. However, current EANM/EARL recommendations were derived for analogue PET systems and are expected to be updated to account for the performance of digital PET [16].
To the best of our knowledge, the image quality obtained with the three recently available commercial digital PET/CT devices using clinical whole-body oncologic 18F-FDG protocols has not yet been measured, characterized, and compared in a single publication. Furthermore, the clinical image quality obtained with digital PET devices has not been extensively compared with that of analogue PET devices in a controlled and standardized approach.
Our aim was to present, characterize, and compare the clinical implementation of 18F-FDG oncologic PET protocols across different PET technologies (digital vs. analogue). Accordingly, we performed NEMA/IEC NU2 phantom acquisitions on three recently installed digital TOF PET/CT systems (from three different vendors) and compared the results with measurements performed on two analogue TOF PET/CT systems.
In addition, we compared the signal recovery obtained in the hot sphere inserts of the NEMA/IEC NU2 phantom with current EANM/EARL recommendations [17].
The phantom's main volume (background) of 9.5 L mimics the shape of the human abdomen. It includes six spherical inserts with diameters of 10, 13, 17, 22, 28, and 37 mm, and a lung insert (5-cm diameter, 16-cm long cylinder filled with plastic material mimicking a lung density of 0.3 g/mL) positioned in the center of the phantom to reproduce lung tissue attenuation.
The phantom was filled with a background activity concentration of 5 kBq/mL and a five times higher activity concentration (25 kBq/mL) in the spherical inserts. The background activity concentration reproduced the average hepatic activity concentration measured in patients undergoing 18F-FDG oncologic PET 1 h after administration of a mass-activity of 3.5 MBq/kg, corresponding to the dose reference level recommended in Switzerland at the time of this study for this specific examination [22]. For each phantom experiment, on each tested PET/CT device, the net background activity concentration at the start of the image acquisition was calculated from the net total activity injected into the known background volume.
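The background preparation described above amounts to a decay correction followed by a division by the known background volume. The sketch below assumes that the net activity (syringe activity minus residual) is known at a single reference time and that only the physical decay of 18F matters (half-life taken as 109.77 min); the function names are illustrative and are not taken from any software used in the study.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, minutes


def decay_factor(elapsed_min: float, half_life_min: float = F18_HALF_LIFE_MIN) -> float:
    """Fraction of activity remaining after `elapsed_min` minutes."""
    return math.exp(-math.log(2.0) * elapsed_min / half_life_min)


def background_concentration_kbq_ml(net_activity_mbq: float, volume_ml: float,
                                    minutes_to_scan_start: float) -> float:
    """Net activity (MBq, at its reference time) decayed to the scan start and
    divided by the background volume; returns kBq/mL."""
    activity_at_start_mbq = net_activity_mbq * decay_factor(minutes_to_scan_start)
    return activity_at_start_mbq * 1000.0 / volume_ml  # 1 MBq = 1000 kBq


def required_net_activity_mbq(target_kbq_ml: float, volume_ml: float,
                              minutes_to_scan_start: float) -> float:
    """Net activity to dispense so the background reaches `target_kbq_ml` at scan start."""
    return target_kbq_ml * volume_ml / 1000.0 / decay_factor(minutes_to_scan_start)


# Hypothetical preparation: 9.5-L background, 5 kBq/mL target, scan starting 60 min later.
dispensed = required_net_activity_mbq(5.0, 9500.0, 60.0)
print(f"dispense {dispensed:.1f} MBq")
print(f"check: {background_concentration_kbq_ml(dispensed, 9500.0, 60.0):.3f} kBq/mL")
```

With these assumed values, about 69 MBq would have to be dispensed 60 min before the scan starts; the 60-min delay is a hypothetical choice, not a parameter from Table 2.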

Clinical acquisition/reconstruction parameters
We performed step-and-shoot, single-bed, 300 s long list-mode (LM) PET acquisitions of the phantom in five PET centers in Switzerland. The phantom was placed on the PET bed with the equatorial plane of the spherical inserts at the center of the device field-of-view where the system sensitivity is expected to be maximal.
The LM data were reconstructed according to the local clinical protocol used for whole-body oncologic 18 F-FDG PET examinations reported in Table 1.
To investigate the influence of the image statistics, additional reconstructions were performed using time subsets of 10, 30, 60, 120, and 180 s obtained from the original 300 s long LM data.
Supplementary reconstructions were performed by varying the number of iterations from 1 to 10 to characterize the evolution of the signal recovery in background and spheres. Pertinent image corrections (normalization, dead time, activity decay, random coincidence, attenuation, and scatter corrections) were applied.
Some clinical reconstruction protocols do not use image smoothing. Therefore, to aid the comparison of image quality across tested devices, when applicable, image reconstruction without smoothing was also performed.
All devices used an ordered subset expectation maximization (OSEM) iterative reconstruction algorithm, parameterized by the number of iterations × subsets. Additionally, the Discovery-MI data were also reconstructed with the Q.Clear algorithm [23] to reflect local clinical practice. The Q.Clear reconstruction algorithm is a block sequential regularized EM algorithm with a single relaxation parameter and is not directly comparable with other algorithms in terms of the number of iterative updates.
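To make the iterations × subsets bookkeeping concrete, the following toy OSEM sketch (NumPy) counts one image update per subset, so that the number of updates is UPD = iterations × subsets, the quantity used for the convergence analysis of the spheres. It is a generic textbook OSEM on a small, hypothetical dense system matrix with noiseless data; it omits TOF, PSF modelling, blob basis functions, regularization, and all data corrections, and it does not represent any vendor's implementation or Q.Clear.

```python
import numpy as np


def osem(A, y, n_iter, n_subsets):
    """Toy OSEM on a dense system matrix A (bins x voxels).

    Rows are split into `n_subsets` contiguous blocks (stand-ins for groups of
    projection views); each sub-iteration is one image update, so the total
    number of updates is UPD = n_iter * n_subsets.
    """
    n_bins, n_vox = A.shape
    x = np.ones(n_vox)  # uniform, strictly positive initial image
    subsets = np.array_split(np.arange(n_bins), n_subsets)
    updates = 0
    for _ in range(n_iter):
        for rows in subsets:
            A_b, y_b = A[rows], y[rows]
            sens = A_b.sum(axis=0)                      # subset sensitivity, A_b^T 1
            proj = A_b @ x                               # forward projection
            ratio = np.divide(y_b, proj, out=np.zeros_like(proj), where=proj > 0)
            back = A_b.T @ ratio                         # back-projected data/model ratio
            safe_sens = np.where(sens > 0, sens, 1.0)    # avoid division by zero
            x = np.where(sens > 0, x * back / safe_sens, x)  # multiplicative EM update
            updates += 1
    return x, updates


rng = np.random.default_rng(0)
# Four hypothetical "views" of six voxels: near-identity blocks with small positive cross-talk.
A = np.vstack([np.eye(6) + rng.uniform(0.0, 0.2, size=(6, 6)) for _ in range(4)])
x_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = A @ x_true  # noiseless, consistent data

for n_iter in (1, 3, 10):
    x_hat, upd = osem(A, y, n_iter, n_subsets=4)
    print(f"iterations={n_iter:2d}  UPD={upd:3d}  max |error|={np.abs(x_hat - x_true).max():.2e}")
```

Splitting the data into subsets lets each pass over the data perform several image updates, which is why convergence is better described by UPD than by the number of iterations alone.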
In this study, we used the product of time per bed position (min) and mass-activity (MBq/kg), the TAP, as a metric for protocol characterization. Table 1 also reports the TAP characteristic of each tested PET protocol. This parameter reflects the emission signal available for a given PET acquisition as the product of the scan duration and the specific injected activity, the two key parameters defining the clinical implementation of a PET procedure.
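Because TAP is a plain product, it can also be inverted to find the scan time per bed position that a target TAP requires at a given mass-activity. A minimal sketch, with hypothetical helper names:

```python
def tap(minutes_per_bed: float, mbq_per_kg: float) -> float:
    """Time-activity product (min·MBq/kg) = scan time per bed position x mass-activity."""
    return minutes_per_bed * mbq_per_kg


def minutes_per_bed_for_tap(target_tap: float, mbq_per_kg: float) -> float:
    """Scan time per bed position (min) needed to reach `target_tap` at a given mass-activity."""
    return target_tap / mbq_per_kg


print(tap(1.5, 3.5))                                   # 5.25
print(round(minutes_per_bed_for_tap(3.0, 3.5) * 60))   # 51 (seconds per bed position)
```

For example, a hypothetical protocol at 3.5 MBq/kg with 1.5 min per bed position has a TAP of 5.25 min·MBq/kg, while reaching a TAP of 3 min·MBq/kg at the same mass-activity would require about 0.86 min (≈ 51 s) per bed position.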
It is worth noting that different image matrices, different field of view (FOV) sizes, and therefore different pixel sizes were used across tested image protocols and PET devices.

Background characterization
The PET-to-local dose calibrator cross-calibration ($BG_{cal}$) was tested by calculating the ratio between the measured ($\bar{A}_{c,bg}$) and the expected ($A_{c,bg}$) average activity concentration in the homogeneous phantom background:

$$BG_{cal} = \frac{\bar{A}_{c,bg}}{A_{c,bg}}$$

$\bar{A}_{c,bg}$ was obtained by averaging the signal from the voxels contained in four cubic volumes of interest (VOIs; side of 40 mm) placed in the homogeneous background region surrounding the spheres. We considered a deviation of < 0.1 from the ideal $BG_{cal} = 1$ acceptable. The coefficient of variation (COV) used for image noise assessment was defined as the ratio between the standard deviation ($SD_{bg}$) over all the voxels contained in the four cubic background VOIs and $\bar{A}_{c,bg}$:

$$COV = \frac{SD_{bg}}{\bar{A}_{c,bg}}$$

The background signal-to-noise ratio (SNR) is the reciprocal of the COV. We considered a COV ≤ 15% (background SNR ≥ 6.7) an acceptable noise level for clinical image interpretation, as suggested in the EARL procedure [24]; although this value is somewhat arbitrary, it has already been used as a reference value in previously published works [14,25,26], which provides a term of comparison for 18F-FDG PET image quality assessments. COV as a function of TAP was assessed to investigate possible margins of optimization in terms of administered activity and/or scan duration.
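The background metrics can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the PMOD workflow used in the study: the VOI positions, the (z, y, x) voxel size, the rounding of the cube to whole voxels, and the use of the sample standard deviation (ddof = 1) over the pooled voxels are all assumptions, and the image is synthetic Gaussian noise around 5000 Bq/mL.

```python
import numpy as np


def cubic_voi(image, center_vox, side_mm, voxel_mm):
    """Voxels of a cube of `side_mm` centred on `center_vox` (z, y, x indices).

    `voxel_mm` is the (z, y, x) voxel size; the half-width is rounded to whole voxels.
    """
    half = [int(round(side_mm / (2.0 * v))) for v in voxel_mm]
    sl = tuple(slice(c - h, c + h + 1) for c, h in zip(center_vox, half))
    return image[sl].ravel()


def background_metrics(vois, expected_conc):
    """BG_cal, COV and SNR computed over the pooled voxels of all background VOIs."""
    voxels = np.concatenate([np.asarray(v, dtype=float).ravel() for v in vois])
    mean = voxels.mean()            # measured average concentration
    sd = voxels.std(ddof=1)         # sample SD over the pooled voxels (assumption)
    bg_cal = mean / expected_conc
    cov = sd / mean
    return bg_cal, cov, (1.0 / cov if cov > 0 else float("inf"))


rng = np.random.default_rng(1)
image = rng.normal(5000.0, 600.0, size=(60, 120, 120))  # synthetic noisy background, Bq/mL
centers = [(30, 30, 30), (30, 30, 90), (30, 90, 30), (30, 90, 90)]  # hypothetical VOI centres
vois = [cubic_voi(image, c, 40.0, (2.0, 4.0, 4.0)) for c in centers]

bg_cal, cov, snr = background_metrics(vois, expected_conc=5000.0)
print(f"BG_cal={bg_cal:.3f}  COV={cov:.1%}  SNR={snr:.1f}")
print("cross-calibration OK:", abs(bg_cal - 1.0) < 0.1, "| noise OK:", cov <= 0.15)
```

The number of voxels per cube depends on the voxel size, which differed across the tested devices and protocols, so the same 40-mm cube pools a different number of voxels on each system.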
COV values at TAP values between the measured points, and the TAP values corresponding to COV = 15%, were calculated by linear interpolation between neighboring measured values.
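The linear interpolation used to read the COV at a given TAP, and the TAP at COV = 15%, can be sketched as below. The scan times mirror the LM time subsets, but the COV values are invented for illustration and are not measurements from this study; a mass-activity of 3.5 MBq/kg is assumed to convert scan time into TAP.

```python
import numpy as np


def cov_at_tap(taps, covs, tap_query):
    """COV at `tap_query` by linear interpolation between measured (TAP, COV) points."""
    order = np.argsort(taps)
    t = np.asarray(taps, dtype=float)[order]
    c = np.asarray(covs, dtype=float)[order]
    return float(np.interp(tap_query, t, c))


def tap_at_cov(taps, covs, cov_target=0.15):
    """TAP at which COV crosses `cov_target`, interpolating linearly between the two
    neighbouring measurements that bracket it; returns None if no pair brackets it."""
    order = np.argsort(taps)
    t = np.asarray(taps, dtype=float)[order]
    c = np.asarray(covs, dtype=float)[order]
    for i in range(len(t) - 1):
        c0, c1 = c[i], c[i + 1]
        if c0 != c1 and (c0 - cov_target) * (c1 - cov_target) <= 0:
            return float(t[i] + (cov_target - c0) * (t[i + 1] - t[i]) / (c1 - c0))
    return None


# Scan times per bed (s) converted to TAP at an assumed 3.5 MBq/kg; COV values are synthetic.
seconds = [10, 30, 60, 120, 180, 300]
taps = [s / 60.0 * 3.5 for s in seconds]
covs = [0.45, 0.27, 0.19, 0.135, 0.11, 0.085]
print(round(tap_at_cov(taps, covs), 2))        # TAP COV-15 for the synthetic curve
print(round(cov_at_tap(taps, covs, 5.0), 3))   # COV at a TAP of 5 min·MBq/kg
```

With these synthetic values the 15% crossing lies between the 60-s and 120-s points. Searching for the bracketing pair explicitly, rather than inverting with `np.interp`, avoids relying on COV being strictly monotonic in TAP.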
PET protocol setups were characterized by their specific TAP value. In particular, we reported and compared TAP obtained with clinical setup (TAP clinic ) and TAP obtained for a matched image noise level by considering a COV = 15% (TAP COV-15 ).

Spheres characterization
A cubic volume of interest (VOI), side of 50 mm, was centered on each spherical insert (j = 1,…,6) of the NEMA/IEC NU2 phantom. Maximum and background-adapted recovery coefficients (RC) were obtained as follows:

$$RC_{j,max} = \frac{a_{c,sph,j,max}}{A_{c,sph}}, \qquad RC_{j,A50} = \frac{a_{c,sph,j,A50}}{A_{c,sph}}$$

where $A_{c,sph}$ is the expected activity concentration in the spheres, $a_{c,sph,j,max}$ is the measured maximum voxel value (in Bq/mL) for a given spherical insert, and $a_{c,sph,j,A50}$ is the average voxel value in each hot insert VOI defined by a 3D iso-contour adapted for background, as defined in [27] and recommended by the EANM guidelines for FDG tumor PET imaging [28]. RCs were compared with reference values provided by the EANM/EARL accreditation protocol [17]. We tested the robustness of $RC_{max}$ and $RC_{A50}$ as a function of time per bed position by comparing the measured values to the reference value obtained for the 300-s acquisition. Additional spherical VOIs, matching the actual insert volume, were segmented on the co-registered CT to derive mean RCs:

$$RC_{j,mean} = \frac{a_{c,sph,j,mean}}{A_{c,sph}}$$

where $a_{c,sph,j,mean}$ is the average voxel value within the CT-defined spherical VOI. Convergence of signal recovery in spheres (j = 1,…,6) as a function of the number of iterative updates (UPD = iterations × subsets) was studied using the normalized value of $RC_{mean}$:

$$RC_{j,mean,N}(UPD) = \frac{RC_{j,mean}(UPD)}{\max_{UPD}(RC_{j,mean})}$$

where $\max_{UPD}(RC_{j,mean})$ is the maximum $RC_{mean}$ value obtained for a given sphere (j) across the tested numbers of updates.
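The RC definitions can be sketched as follows. `rc_a50` assumes that the background-adapted 50% iso-contour is the threshold halfway between the VOI maximum and the background concentration, 0.5 × (max + background), and it ignores connected-component selection; the exact implementation in [27] and in PMOD may differ. All input values are synthetic.

```python
import numpy as np


def rc_max(sphere_voi, a_sph):
    """RC_max = maximum voxel value in the sphere VOI / true sphere concentration."""
    return float(np.max(sphere_voi)) / a_sph


def rc_a50(sphere_voi, a_sph, bg_conc):
    """RC_A50 = mean of voxels above a background-adapted 50% threshold / a_sph.

    Assumption: threshold = 0.5 * (max + bg_conc); no connected-component step.
    """
    voi = np.asarray(sphere_voi, dtype=float)
    threshold = 0.5 * (voi.max() + bg_conc)
    return float(voi[voi >= threshold].mean()) / a_sph


def rc_mean(ct_sphere_voxels, a_sph):
    """RC_mean = mean PET value inside the CT-defined sphere of true volume / a_sph."""
    return float(np.mean(ct_sphere_voxels)) / a_sph


def normalized_rc_mean(rc_mean_by_upd):
    """RC_mean,N(UPD) = RC_mean(UPD) / maximum RC_mean over the tested UPD values."""
    peak = max(rc_mean_by_upd.values())
    return {upd: rc / peak for upd, rc in rc_mean_by_upd.items()}


# Synthetic VOI (kBq/mL): true sphere concentration 25, background 5.
voi = np.array([5.0, 5.0, 10.0, 20.0, 24.0])
print(rc_max(voi, 25.0))         # 0.96
print(rc_a50(voi, 25.0, 5.0))    # 0.88 (voxels >= 14.5 kBq/mL: 20 and 24)
# Hypothetical RC_mean of one sphere for increasing UPD = iterations x subsets.
print(normalized_rc_mean({16: 0.42, 32: 0.50, 48: 0.48}))
```

Because $RC_{max}$ and $RC_{A50}$ are anchored to the single hottest voxel, noise alone can push them upward, whereas $RC_{mean}$ over a CT-defined volume is less sensitive to noise and better suited to tracking convergence.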
Image segmentation on PET data was performed using the PMOD (release 3.903) software (PMOD Technologies Ltd., Zurich, Switzerland).
Transaxial views across the equatorial plane of spherical inserts of the NEMA/IEC phantom, obtained for the tested clinical setups, are reported.

Phantom experiment preparation
Parameters describing the experimental phantom preparation at the start of the PET acquisitions across the five tested PET devices are listed in Table 2.

Background characterization
The system cross-calibration (BG cal ) as a function of the acquired statistics (by varying the time per bed position at matched total activity in the phantom) and the number of iterations used in the iterative reconstruction process, for the tested acquisition and reconstruction setups, is shown in Fig. 1.
Measured COV values are reported in Fig. 2. The dashed black line indicates a 15% COV level (SNR = 6.7) used as an upper threshold defining an acceptable level of noise for clinical image interpretation. All tested clinical PET setups (Table 1) are characterized by a COV ≤ 15%. Figure 3 shows COV as a function of the TAP parameter. All clinical tested setups were characterized by a COV close to 15%. COV values corresponding to local clinical TAP and TAP values for a COV = 15% (TAP COV-15 ) are reported in Table 3.
Among the tested PET FDG protocols, two different image matrix sizes were used clinically with the Philips Vereos: 144 × 144 and 288 × 288. The TOF list-mode reconstruction [29,30] with the finer image discretization (288 × 288) was characterized by a higher noise level: COV = 19% vs. 13.2% (clinical TAP of 3 min × MBq/kg). For a given device and the same acquisition setup, lower COV levels were obtained with Gaussian image smoothing than without. Across clinical protocol setups, only the Vereos with the 288 × 288 image matrix had a clinical TAP lower than the TAP value corresponding to a 15% COV level (3 min × MBq/kg vs. 4.5 min × MBq/kg).

Signal recovery in spheres
Figure 4 shows RC max and RC A50 values as a function of increasing sphere size for the PET setups tested using clinical reconstruction parameters (iterations × subsets and acquisition time), regardless of image smoothing. The convergence of the signal recovery in spheres of different sizes as a function of the number of iterative updates is shown in Additional file 1: Figures S1 and S2.
As reported in Table 4, for the number of iterative updates used in clinical reconstruction setups, the normalized RC mean values for the 10-mm (smallest) and 17-mm (medium-size) spheres are at least 89% and 95%, respectively, of the maximum RC mean values. Convergence was faster for larger spheres.
The robustness of RC max and RC A50 according to the PET scan length was assessed for decreasing scan times (Additional file 1: Figure S2). Tested setups showed RCs to be stable (less than 15% variation compared to the reference value obtained for the 300-s bed acquisition scan time) for time per bed position ≥ 60 s.
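The stability criterion (less than 15% deviation from the 300-s reference) can be turned into a small helper that returns the shortest time per bed position from which the RC remains within tolerance for all longer times. The RC values in the example are hypothetical, and reading the criterion as "this time and every longer time" is an assumption.

```python
def shortest_stable_time(rc_by_time, ref_time=300, tol=0.15):
    """Shortest time per bed (s) from which RC stays within `tol` (relative)
    of the RC at `ref_time`, for that time and every longer time tested."""
    ref = rc_by_time[ref_time]
    stable = None
    for t in sorted(rc_by_time, reverse=True):  # walk from the longest to the shortest time
        if abs(rc_by_time[t] / ref - 1.0) < tol:
            stable = t
        else:
            break  # first violation ends the stable range
    return stable


# Hypothetical RC_max values for one sphere versus time per bed position (s).
rc_max_by_time = {10: 1.45, 30: 1.20, 60: 1.12, 120: 1.05, 180: 1.02, 300: 1.00}
print(shortest_stable_time(rc_max_by_time))  # 60
```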
Transaxial views across the equatorial plane of the spherical inserts of the NEMA/ IEC phantom, obtained for the tested clinical setups, are displayed in Fig. 5.

Discussion
This study was the result of a collaboration among five PET centers in Switzerland. Data were collected from five different PET/CT devices: three recently installed (2017-2018) digital PET/CT systems and two analogue PMT-based PET/CT systems. To the best of our knowledge, this is the first study comparing the image quality of the three currently available digital PET systems with that of the previous analogue generation. Although absolute system performance has been compared elsewhere in the literature [19], the use of different acquisition and reconstruction parameters (e.g., image matrix and pixel size, number of iterative updates) and of vendor-specific reconstruction algorithms makes it difficult to disentangle the specific contribution of each parameter to the final image quality.
This study aimed to investigate and characterize the image quality of clinical whole-body oncologic 18F-FDG protocols. All tested setups included TOF information and PSF correction. Improved TOF capabilities and system sensitivity of digital PET have been measured and reported in recent publications [13,19,31]. In particular, the gain in system sensitivity results from the interplay of the new digital technology with the extended axial field of view of the PET detector adopted in some of the available models.
We based our study on PET acquisitions and reconstructions of a NEMA/IEC NU2 body phantom, which is a standard in PET image quality assessment. The phantom [17] was prepared with good reproducibility across centers, as reported in Table 2. PET datasets were obtained by varying the number of iterations, to verify signal recovery convergence, and the scan acquisition time, to verify image quality stability as a function of the collected statistics. To remove the influence of the partial volume effect (PVE) due to image smoothing, we also produced PET data without post-reconstruction smoothing when the local reconstruction setup applied it. All tested devices and reconstruction setups demonstrated good cross-calibration with the local dose calibrator. Deviations from BG cal = 1 were always less than 6%, regardless of the time per bed position (10-300 s) and the number of iterations (1 to 10). Quantitative bias increased at low count density (Fig. 1); this behavior has already been documented and characterized in the literature [32][33][34]. Furthermore, at low count density the bias tended toward lower values when a list-mode-based reconstruction was used (Vereos system), whereas it tended toward higher values with sinogram-based reconstruction methods. This behavior has also been described in the literature under low-count conditions such as 90Y PET [35], PET for ion-beam therapy monitoring [36], and low-dose 18F-FDG PET [37].
As expected, image noise increased with the number of iterative reconstruction updates (Fig. 2a, c). In the tested conditions, digital PET systems exhibited a lower noise level than analogue PET systems. This was more evident when comparing reconstruction setups without Gaussian filtering (Fig. 2c, d). This feature may be attributed to the combined improvement in system sensitivity and TOF performance of digital devices compared to analogue PET. Image noise as a function of the acquired statistics (Fig. 2b, d) also confirmed the superior noise properties of digital PET images vs. analogue devices. We used a 15% COV level as a reference maximum noise level for clinical evaluation, as suggested in the literature [14,25,27]. We found all tested clinical protocol setups characterized by a COV ≤ 15% for the adopted experimental setup, representing a mass-activity administration of 3.5 MBq/kg. We reported the COV as a function of the TAP (Fig. 3). TAP values characteristic of local clinical image protocols (TAP clinic, summarized in Tables 1 and 3) resulted in COV close to 15% (range 9-19%). Based on this result, we can infer that the tested setups satisfy the requirements for clinical interpretation. Nevertheless, the assumed reference limit, COV = 15%, is somewhat arbitrary; the particular image pattern, the signal recovery in lesions, and different clinical experience between sites and devices could motivate different optimal COV values for clinical image evaluation.

Table 4 Normalized RC mean for the number of iterative updates used in clinical reconstruction setups (and maximum RC mean values) obtained for the smallest sphere insert (diameter of 10 mm) and a medium-size insert (diameter of 17 mm) characteristic of the tested PET FDG procedures. Reconstruction protocol setups used in the clinic are labeled with (c).
It is also worth noting that COV alone is not the most meaningful metric for comparing image quality across devices and protocol setups, since it depends not only on overall device performance and reconstruction parameters but also on the injected mass-activity and the scan duration per bed position. For this reason, we adopted TAP COV-15 as a term of comparison between technologies. The COV obtained at clinical TAP was nevertheless reported to characterize the different clinical protocols. Our results confirm that lower TAP clinic values (range 3-4 min·MBq/kg) are currently used with digital PET devices than with the tested analogue PET devices (TAP clinic = 5.25 min × MBq/kg or higher). On average, a 40% TAP reduction in favor of digital PET was observed in clinical configurations.
When considering TAP COV-15, if we exclude the mCT device (thus considering it an outlier), analogue systems are represented by a single value of 3.7 min × MBq/kg. We should also consider that the clinical setup adopted for the Discovery 690 includes Gaussian smoothing (FWHM = 5 mm), which helps reduce TAP COV-15 values, whereas, except for the Discovery MI (M256, FWHM = 6.4 mm) setup, none of the other clinical setups adopted on digital devices used Gaussian smoothing.
The improvement offered by the new systems is even more evident when comparing similar setups without Gaussian smoothing. For instance, according to the data reported in Table 3, comparing the Discovery 690 with the Discovery-MI using the same image matrix (256 × 256) and iterations × subsets (3 × 16), the TAP COV-15 was 13.2 min × MBq/kg and 3.9 min × MBq/kg, respectively, corresponding to a 70% TAP COV-15 reduction in favor of the digital PET system. This translates into a lower mass-activity administration and/or shorter scan times at matched image noise levels. Accordingly, patient comfort (at matched image quality) can be improved and/or dose exposure can be reduced, as discussed in the recent clinical studies of Behr et al. [38] and Van Sluis et al. [39].
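The percentage reductions quoted above follow from a simple ratio of TAP values. The sketch below reproduces the ~70% figure from the Discovery 690 and Discovery-MI values in Table 3, and the ~43% obtained when comparing a TAP clinic of 3 with 5.25 min·MBq/kg. Since TAP is a product, the same ratio also gives the factor by which mass-activity (at fixed scan time) or scan time (at fixed mass-activity) would be scaled; the helper names are illustrative.

```python
def tap_reduction(tap_reference: float, tap_new: float) -> float:
    """Fractional TAP reduction of `tap_new` relative to `tap_reference`."""
    return 1.0 - tap_new / tap_reference


def scaled_parameter(value: float, tap_reference: float, tap_new: float) -> float:
    """Scale mass-activity (at fixed scan time) or scan time (at fixed mass-activity)
    so that the TAP moves from `tap_reference` to `tap_new`."""
    return value * tap_new / tap_reference


# TAP COV-15 without Gaussian smoothing (Table 3): Discovery 690 vs. Discovery-MI.
print(f"{tap_reduction(13.2, 3.9):.0%}")   # 70%
# TAP clinic: analogue 5.25 vs. lowest digital 3 min·MBq/kg.
print(f"{tap_reduction(5.25, 3.0):.0%}")   # 43%
```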
RC max and RC A50 (Fig. 4) higher than the present reference EANM/EARL levels were commonly obtained in all clinical protocol setups tested. These values are typical of PET reconstructions adopting TOF and PSF corrections [26].
EARL proposed a target range of RCs to promote inter-device and inter-center comparison of quantitative PET data. This is not always the purpose of local clinical setups. Most often, local clinical demand favors image contrast and spatial resolution (reduced PVE), resulting in higher RC values compared to the proposed EARL range.
Lower RC levels were observed for the clinical setups of the Discovery MI, as a consequence of the 6.4-mm Gaussian filter applied, and of the Vereos system adopting the 144 × 144 matrix size, which results in large voxels and consequently a large PVE in small structures. By definition, RC max and RC A50 depend on the voxel with the maximum value and are intrinsically sensitive to the image noise level. Accordingly, we observed that they increased with the number of iterations and with decreasing image statistics (Additional file 1: Figures S3 and S4), especially for reconstruction protocols without image smoothing. For the tested conditions, substantial deviations of RC max and RC A50 (higher RCs) can arise for scan times shorter than 60 s.
A normalized RC mean was used to test the signal recovery convergence as a function of the number of iterative updates. Across the tested clinical protocol setups, a reasonable level of convergence (RC mean,N ≥ 89%) was obtained even for the smallest spherical insert (10 mm in diameter). In particular, two systems exhibited a faster convergence rate: the Biograph Vision, probably owing to its best TOF timing resolution (214 ps), and the Vereos, whose TOF timing resolution (316 ps) is intermediate between those of the Siemens and GE digital systems, probably thanks to the favorable convergence properties of its blob-based OSEM iterative reconstruction algorithm [40]. We also observed faster convergence for reconstruction setups adopting Gaussian smoothing than for those without smoothing. This behavior can be attributed to the suppression of high spatial frequencies by image smoothing: high spatial frequencies (typical of small structures) are known to require more iterations to converge than low spatial frequencies (characteristic of large structures), which is also why this behavior is more evident for the smaller spheres.
Compared to the tested OSEM iterative reconstruction setups, the Q.Clear algorithm implemented in the Discovery-MI PET/CT showed at least comparable performance, providing a good level of signal recovery coupled with favorable noise properties. A systematic characterization of the Q.Clear algorithm was beyond the scope of this work and has been discussed elsewhere in the literature [24,40].
Concerning signal recovery, we did not observe major differences between conventional PMT-based PET and recently introduced digital PET devices (all reconstructions used PSF correction). Kaalep et al. [26] pointed out the advantage of adopting a new range of recovery coefficients that all PET devices can achieve thanks to the inclusion of PSF. Kaalep et al. tested analogue PET devices, but in light of the results presented here, their methodology and results are in principle transferable to the recently available digital PET systems.
We also noticed that, thanks to the improved system sensitivity and TOF capabilities, clinical protocols implemented on digital PET devices tend to avoid image smoothing. This, coupled with the use of a relatively small voxel size (e.g., 1.65 × 1.65 × 2 mm³ for the tested Biograph Vision device), can help reduce partial volume effects. Consequently, based on our results, we expect that, at a matched activity distribution across the device FOV and a matched acquisition time, digital PET can potentially provide higher contrast-to-noise ratios, thus possibly improving lesion detection and quantitative accuracy.

Limitations
Despite the limited number of tested PET devices, digital PET (n = 3) and analogue PET (n = 2), we found reasonable indications of the potential of operating digital devices at a lower TAP than conventional analogue ones at matched image quality. Furthermore, matched image quality (e.g., COV = 15%, as used in our study) was achievable in digital PET without additional image smoothing and/or with smaller voxel sizes, with a potential benefit in reducing PVE.

Conclusion
This work is the result of a collaboration among different PET centers in Switzerland and was, to the best of our knowledge, the first study comparing the image quality obtained for clinical whole-body oncologic 18F-FDG PET protocols using the three recently introduced digital PET devices. We further extended the comparison to two analogue PET devices equipped with conventional PMTs. The methodology, based on a well-characterized NEMA/IEC NU2 phantom, highlighted the improved signal-to-noise ratios achievable with the new digital PET devices compared to conventional ones. With appropriate protocol optimization in terms of acquisition and reconstruction parameters, we found that appreciable improvements in patient comfort (reduced scan time at matched image quality) and/or dose exposure (reduced administered activity) are achievable.