Impact of point spread function modelling and time of flight on FDG uptake measurements in lung lesions using alternative filtering strategies

Background The use of maximum standardised uptake value (SUVmax) is commonplace in oncology positron emission tomography (PET). Point spread function (PSF) modelling and time-of-flight (TOF) reconstructions have a significant impact on SUVmax, presenting a challenge for centres with defined protocols for lesion classification based on SUVmax thresholds. This has perhaps led to the slow adoption of these reconstructions. This work evaluated the impact of PSF and/or TOF reconstructions on SUVmax, SUVpeak and total lesion glycolysis (TLG) under two different schemes of post-filtering. Methods Post-filters to match voxel variance or SUVmax were determined using a NEMA NU-2 phantom. Images from 68 consecutive lung cancer patients were reconstructed with the standard iterative algorithm along with TOF; PSF modelling - Siemens HD·PET (HD); and combined PSF modelling and TOF - Siemens ultraHD·PET (UHD) with the two post-filter sets. SUVmax, SUVpeak, TLG and signal-to-noise ratio of tumour relative to liver (SNR(T-L)) were measured in 74 lesions for each reconstruction. Relative differences in uptake measures were calculated, and the clinical impact of any changes was assessed using published guidelines and local practice. Results When matching voxel variance, SUVmax increased substantially (mean increase +32% and +49% for HD and UHD, respectively), potentially impacting outcome in the majority of patients. Increases in SUVpeak were less notable (mean increase +17% and +23% for HD and UHD, respectively). Increases with TOF alone were far less for both measures. Mean changes to TLG were <10% for all algorithms for either set of post-filters. SNR(T-L) were greater than ordered subset expectation maximisation (OSEM) in all reconstructions using both post-filtering sets. 
Conclusions Matching image voxel variance with PSF and/or TOF reconstructions, particularly with PSF modelling and in small lesions, resulted in considerable increases in SUVmax, inhibiting the use of defined protocols for lesion classification based on SUVmax. However, reduced partial volume effects may increase lesion detectability. Matching SUVmax in phantoms translated well to patient studies for PSF reconstruction but less well with TOF, where a small positive bias was observed in patient images. Matching SUVmax significantly reduced voxel variance and potential variability of uptake measures. Finally, TLG may be less sensitive to reconstruction methods compared with either SUVmax or SUVpeak. Electronic supplementary material The online version of this article (doi:10.1186/s40658-014-0099-3) contains supplementary material, which is available to authorized users.

Background
[18F]2-Fluoro-2-deoxy-D-glucose (FDG) positron emission tomography (PET) has been shown to play a key role in the management of patients with non-small cell lung cancer in terms of staging and prognosis [1][2][3][4][5] and monitoring response to therapy [6]. In these applications, the uptake of FDG expressed as standardised uptake value (SUV) is of key importance, with SUVmax being the most commonly reported measure [7]. The use of SUVmax for discrimination between benign and malignant disease in soft tissue masses and lymph nodes has been demonstrated for lung cancer patients [8,9], and changes in SUVmax have been used as an indicator of response to therapy [10].
While the use of SUV max is commonplace, it is known to be sensitive to both reconstruction parameters [11] and the amount of statistical image noise, leading to poorer test-retest consistency relative to other SUV-based metrics [12,13]. Consequently, alternative metrics such as SUV peak [14] and total lesion glycolysis (TLG), the product of SUVmean and metabolic tumour volume derived from the PET images, have been suggested for use, particularly in monitoring response to therapy [6,15]. Recently, TLG has also been shown to offer superior prognostic information than SUV max [16][17][18][19][20].
In recent years, there have been significant advances in iterative image reconstruction algorithms and scanner hardware. Consequently, reconstruction algorithms that include point spread function (PSF) modelling [20,21] and time of flight (TOF) [22] have become commercially available on PET/CT scanners, with TOF also available on PET/MR [23].
The use of PSF modelling, with and without TOF, has been shown to improve signal-to-noise ratio (SNR) [24][25][26][27] and lesion detectability [28][29][30] partly through decreasing voxel variance. However, the implementation of PSF modelling, both within projection space and image space, from different manufacturers and also academic institutions has been shown to produce Gibbs artefacts [21][31][32][33][34][35] (Nick Vennart, personal communication). In patient imaging, the Gibbs artefact, combined with reduced partial volume effects, has a significant impact on SUVmax [36][37][38]. This is particularly evident with minimal or no post-reconstruction filtering, which has been shown in phantom studies with numerical observers to provide greater lesion detectability [28][29][30]. Changes to SUVmax as a consequence of PSF modelling present a challenge, as changes to defined local practice for reporting may be required, such as changing the thresholds used for the discrimination of malignancy. The scanner used in this study has been part of a multi-site network of scanners for routine FDG oncology imaging since 2009. SUVmax is the reported uptake metric, and the consensus amongst local reporting clinicians within the network is that lesions with SUVmax > 5.0 are considered highly suspicious of malignant disease.
It is necessary, in practice, to smooth clinical images to provide image quality that is deemed acceptable for clinical reporting. This degrades the spatial resolution but increases signal to noise. The degree of smoothing applied at any given centre is heavily influenced by the experience and personal preferences of the reporting clinicians, informed by the advice of physicists providing scientific support. Where several PET scanners serve the same patient population, it is also advantageous to match imaging performance across the network in terms of visual image quality and quantitative characteristics.
A trade-off curve of signal enhancement versus noise reduction when using PSF and/or TOF algorithms can be established by applying a range of reconstruction post-filters.
It has been demonstrated that it is possible to match SUVmax from PSF-based reconstruction with traditional non-PSF algorithms by applying a particular post-filter. Lasnon et al. [39] showed that a 7.0-mm full-width-half-maximum (FWHM) post-filter with PSF reconstruction gave comparable recovery coefficients in phantom data to non-PSF reconstructions and brought the recovery coefficients in line with European recommendations [40]. Another study proposed the application of a post-filter for the purpose of quantification [41]. This study also demonstrated that despite a spatially dependent PSF, this approach of using a single post-filter choice was adequate for all lesions irrespective of their location in the field of view. The application of a relatively broad post-filter to PSF modelling images may seem counterintuitive as it will undo the improvements in partial volume effect, but there are likely to be other benefits that have not been reported, such as a reduction in voxel variance in the images.
Another potential solution may be to use alternative uptake metrics to SUV max . One study [37] suggested that TLG may be more stable when comparing PSF to non-PSF reconstruction, but this study only assessed ten lung lesions. Another study [38] has suggested the move to SUV mean based upon a 50% isocontour of SUV max . To our knowledge, there are currently no studies that investigate the impact of these reconstructions with PSF modelling and TOF on TLG and SUV peak .
The primary aim of this study was to evaluate the impact of PSF modelling and TOF on SUV max -based lesion classification as implemented at the local institution. This was performed using Siemens reconstruction software including implementations for TOF and PSF modelling (HD, UHD). Implementations of reconstruction algorithms can differ, and therefore, the results might be specific to HD and UHD; however, we feel it is likely that findings may be generalisable to other reconstruction implementations with similar philosophies. Any change in FDG uptake measurements across different reconstruction protocols can hopefully allow other centres to assess how such changes may impact their approaches to lesion classification. Two set criteria for post-filtering the images were assessed based upon characteristic locations on a signal enhancement versus noise reduction trade-off curve. These two points are 1) matching image noise (voxel variance) which was expected to enhance signal and 2) matching signal (SUV max ) which, based on previous studies [39,41], was anticipated to require greater levels of post-filtering and hence reduce image noise. This latter approach is aimed to be particularly relevant to centres that wish to maintain uptake quantification for practical purposes, which is particularly important in multi-site imaging networks. In addition, this work aimed to expand on the results of previous studies [36][37][38] with the addition of TOF, evaluation of other uptake metrics such as SUV peak and TLG, and determining gains in SNR for the two strategies.

PET/CT scanner
The PET scanner used in this study was a Siemens Biograph mCT with 64 slice CT (Siemens Medical Solutions, Erlangen, Germany). The scanner has a four-ring extended axial field of view of 21.6 cm (TrueV) and includes options for PSF modelling (Siemens HD·PET) and combined PSF modelling with TOF (Siemens ultraHD·PET) in the image reconstruction. Performance data for the scanner has been published previously [42].

Phantom acquisitions
A NEMA NU-2 image quality (IQ) phantom (PTW, Freiburg, Germany) was filled with [18F]FDG so that the background compartment and all six hot spheres had activity concentrations of 5.19 and 41.7 kBq/ml, respectively. This 8:1 contrast was chosen to mimic lung lesion contrast, which is generally high. In order to divide the data into ten replicate datasets, a gated 60-min list-mode acquisition was performed using an ECG simulator as the gating input. Each replicate image contained 30 million (±0.2%) net true coincidences, as this was typical of the number of counts measured over the thorax in our standard patient acquisitions. Images were reconstructed using four methods: standard 3-D ordinary Poisson ordered subset expectation maximisation (OSEM) reconstruction; OSEM with TOF (TOF); OSEM with PSF modelling - Siemens HD·PET (HD); and OSEM with both PSF and TOF - Siemens ultraHD·PET (UHD). For non-TOF reconstructions, 3 iterations and 24 subsets (3i24s) were used, while for TOF reconstructions, 2 iterations and 21 subsets (2i21s) were used.
Two iterations were chosen for TOF reconstructions as TOF has been shown to provide faster convergence with comparable signal to noise achieved in fewer iterations than non-TOF [27,43], and it has been shown in published performance data for the scanner that one fewer iteration with TOF is optimal [42], providing similar background variability and marginally superior contrast recovery in smaller objects. However, it is not possible to exactly match the number of subsets for TOF and non-TOF reconstructions. All images were reconstructed into a 256 × 256 matrix with voxel sizes of 3.2 mm × 3.2 mm × 2.0 mm. As is routinely performed with patient data, a 5.0-mm FWHM Gaussian post-filter was applied to the OSEM images. The baseline parameters of 3 iterations and 24 subsets and 5.0-mm post-filter for OSEM reconstruction have been in routine use since the scanner was commissioned in 2009. These parameters were selected to align SUV max quantification and voxel variance with other scanners in the local oncology imaging network.
A variety of Gaussian post-filters was applied to the TOF, HD and UHD images, with kernel widths ranging from 0 to 10 mm FWHM in steps of 0.1 mm.
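The filter sweep described above can be sketched as follows. This is a minimal illustration, not the vendor's implementation: it only converts each candidate FWHM to a Gaussian standard deviation (sigma = FWHM / (2 sqrt(2 ln 2))) and expresses it in voxel units for the 3.2 mm × 3.2 mm × 2.0 mm voxels stated in the text. The function names are hypothetical.

```python
import math

# Voxel dimensions stated in the text (mm); order is (x, y, z).
VOXEL_SIZE_MM = (3.2, 3.2, 2.0)


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_in_voxels(fwhm_mm, voxel_size_mm=VOXEL_SIZE_MM):
    """Per-axis sigma in voxel units, the form most filter routines expect."""
    sigma_mm = fwhm_to_sigma(fwhm_mm)
    return tuple(sigma_mm / v for v in voxel_size_mm)


def candidate_fwhms(start_mm=0.0, stop_mm=10.0, step_mm=0.1):
    """List of kernel widths from start to stop (inclusive) in fixed steps."""
    n_steps = round((stop_mm - start_mm) / step_mm)
    return [round(start_mm + i * step_mm, 1) for i in range(n_steps + 1)]


if __name__ == "__main__":
    widths = candidate_fwhms()
    print(len(widths), widths[0], widths[-1])
    print(sigma_in_voxels(5.0))
```

Generating the widths by integer step index, rather than by repeated addition of 0.1, avoids accumulated floating-point drift across the 101 candidate kernels.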

Noise matching
Twelve circular regions of interest (ROIs) of 37-mm diameter were placed in the phantom background over five separate slices (60 ROIs in total) of the IQ phantom image in accordance with the NEMA NU-2-2007 standard [44]. For each image replicate, the average coefficient of variation (COV) over the 60 ROIs was calculated as
$$\mathrm{COV}_R = \frac{1}{60} \sum_{k=1}^{60} \frac{\sigma_{k,R}}{\mu_{k,R}}$$
where $\sigma_{k,R}$ and $\mu_{k,R}$ are the voxel standard deviation and mean, respectively, within ROI $k$ and replicate $R$. The mean and standard deviation of $\mathrm{COV}_R$ were determined across all ten replicate images. The OSEM 3i24s 5.0-mm post-filter image was used to compute the reference COV value. For the three other reconstruction methods, the post-filter that gave the smallest difference in COV, relative to the OSEM image, was determined.
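As a rough sketch of the noise-matching step, the code below averages the per-ROI ratio of voxel standard deviation to voxel mean for one replicate and then picks the kernel width whose COV lies closest to the reference. It assumes the sample standard deviation is used within each ROI (the text does not specify), and the COV values in the usage example are invented for illustration only.

```python
from statistics import mean, stdev


def replicate_cov(rois):
    """Average COV over the background ROIs of one replicate.

    `rois` is a list of voxel-value lists, one per ROI (60 in the study).
    Assumption: sample standard deviation within each ROI.
    """
    return mean(stdev(voxels) / mean(voxels) for voxels in rois)


def best_noise_match(reference_cov, cov_by_fwhm):
    """Return the FWHM (mm) whose COV is closest to the reference COV."""
    return min(cov_by_fwhm, key=lambda f: abs(cov_by_fwhm[f] - reference_cov))


if __name__ == "__main__":
    # Two toy ROIs, each with a COV of 0.1.
    toy_rois = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0]]
    print(replicate_cov(toy_rois))
    # Invented COV values for three candidate kernel widths.
    print(best_noise_match(0.100, {3.7: 0.104, 3.8: 0.101, 3.9: 0.097}))
```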

SUVmax matching
SUV max is the uptake measure used in our routine patient reports and so was the measure chosen to match across the reconstruction algorithms. To achieve this, SUV max was measured in each hot sphere in the phantom for the OSEM images using a 3-D volume of interest, equal in diameter to each true sphere size and centred on the sphere. As with the COV matching, a post-filter was incremented in 0.1-mm steps on the other three reconstructions until the summed squared difference of SUV max for the six hot spheres relative to those in the OSEM image was minimised.
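The matching criterion can be illustrated with a short sketch: for each candidate kernel width, the squared differences in SUVmax relative to OSEM are summed over the six spheres, and the width with the smallest sum is kept. The function names and the recovery values in the usage example are hypothetical, not data from the study.

```python
def summed_squared_difference(candidate, reference):
    """Sum of squared SUVmax differences over the spheres."""
    return sum((c - r) ** 2 for c, r in zip(candidate, reference))


def best_suvmax_match(reference_suvmax, suvmax_by_fwhm):
    """Return the FWHM (mm) minimising the summed squared difference.

    `reference_suvmax` holds the six OSEM sphere values; `suvmax_by_fwhm`
    maps each candidate FWHM to the six values for that filter.
    """
    return min(
        suvmax_by_fwhm,
        key=lambda f: summed_squared_difference(suvmax_by_fwhm[f], reference_suvmax),
    )


if __name__ == "__main__":
    # Invented sphere values, largest to smallest sphere.
    osem = [1.00, 0.90, 0.80, 0.70, 0.50, 0.30]
    candidates = {
        6.5: [1.05, 0.95, 0.85, 0.75, 0.55, 0.35],
        6.6: [1.00, 0.90, 0.80, 0.70, 0.52, 0.31],
        6.7: [0.95, 0.85, 0.75, 0.65, 0.45, 0.25],
    }
    print(best_suvmax_match(osem, candidates))
```

Because the criterion is an unweighted sum, spheres with larger absolute SUVmax differences dominate the fit; a relative (percentage) difference would weight the small spheres more heavily, which is a design choice the text does not discuss.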

PET/CT acquisitions
The PET acquisition was performed from eyes to mid-thigh for all patients, requiring six or seven bed positions. The acquisition time for each bed position was 2.5 min. Attenuation correction was performed using a non-contrast CT acquisition performed prior to the PET acquisition. Scatter and random corrections were applied to all images. All images were reconstructed with OSEM 3i24s and 5.0-mm post-filter as the reference, along with the phantom-determined TOF, HD and UHD protocols, which match either voxel COV or SUV max .

Uptake measurements
All images were viewed and the uptake quantified using Siemens TrueD image display software (Siemens Medical Solutions, Erlangen, Germany). In each patient, a 3-cm-diameter spherical volume of interest (VOI) was placed within an area of uniform FDG distribution in the liver, and the COV of the voxels within the VOI was calculated. Three FDG uptake measurements were derived for each identified lesion within the lung: SUVmax, SUVpeak (as defined in the PET response criteria in solid tumours (PERCIST) protocol [14]) and TLG. SUV was normalised to patient body weight only. Volume delineation for TLG was performed using a 40% threshold of SUVmax (TLG-40). Recent meta-analyses [16,17] have highlighted several methods for volume delineation, using either percentage or absolute SUV thresholds. The choice of a percentage threshold in this study was based on a hypothesis that as the magnitude of the partial volume effect varied with different reconstructions, the impact on the tumour volume and SUVmean would be inversely related. This may result in a more stable value for the TLG. It should be noted that other methods of delineation are likely to produce alternative results. Lesion volume was measured on the OSEM image using a 40% threshold of SUVmax.
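The uptake measures can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the TrueD implementation: SUV is taken as tissue concentration divided by injected activity per gram of body weight (decay correction is assumed to have been applied already), and TLG-40 is computed from a flat list of lesion voxel SUVs, with voxels at or above 40% of SUVmax counted as inside the volume. A real delineation would also require the thresholded voxels to form a connected region around the lesion.

```python
def suv_body_weight(conc_kbq_per_ml, injected_kbq, weight_kg):
    """Body-weight SUV: concentration / (injected activity per gram)."""
    return conc_kbq_per_ml / (injected_kbq / (weight_kg * 1000.0))


def tlg_threshold(suv_voxels, voxel_volume_ml, fraction=0.40):
    """Return (SUVmax, SUVmean, volume in ml, TLG) for a percentage threshold.

    Assumption: voxels with SUV >= fraction * SUVmax are inside the volume.
    """
    suv_max = max(suv_voxels)
    inside = [v for v in suv_voxels if v >= fraction * suv_max]
    volume_ml = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return suv_max, suv_mean, volume_ml, suv_mean * volume_ml


if __name__ == "__main__":
    # 3.2 mm x 3.2 mm x 2.0 mm voxel = 20.48 mm^3 = 0.02048 ml.
    voxel_ml = 3.2 * 3.2 * 2.0 / 1000.0
    # Invented lesion voxel values for illustration.
    print(tlg_threshold([10.0, 8.0, 6.0, 3.0, 1.0], voxel_ml))
    print(suv_body_weight(5.0, 350000.0, 70.0))
```

The sketch makes the stated hypothesis easy to see: a sharper reconstruction raises SUVmax, which raises the 40% cut-off and shrinks the delineated volume, so the product of SUVmean and volume can change less than either factor.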

Signal to noise
It is difficult to estimate SNR directly in a lesion due to inhomogeneous uptake; therefore, we have adopted the use of the liver as a source for the background and noise measurement. This technique has been performed previously [25] and is considered a reasonable relative surrogate for SNR in the lesion. For lesions with SUVmax above the PERCIST threshold of 1.5 times the mean SUV in the liver VOI + 2 standard deviations of the voxels within the liver VOI [14], the signal-to-noise ratio of the tumour relative to the liver, SNR(T-L), was calculated as
$$\mathrm{SNR}_{(T-L)} = \frac{\mathrm{Tumour} - \mathrm{Liver}}{\sigma_L}$$
where Tumour refers to SUVmax in the lung lesion, Liver is the mean SUV measured in the liver VOI and $\sigma_L$ is the standard deviation of voxel values measured in the liver VOI. This method allows comparison to other studies that have used the same metric [25,42]. SNR(T-L) of all qualifying lesions was determined for each reconstruction using the two filtering schemes of matched voxel COV and matched SUVmax. The gain in SNR(T-L) was expressed for the TOF, HD and UHD reconstructions as the ratio to the SNR(T-L) measurements from the standard OSEM images of the same patient.
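The lesion selection and SNR calculation can be sketched as below. The sketch assumes the conventional contrast-over-noise form, SNR(T-L) = (Tumour - Liver) / sigma_L, built from the three quantities defined in the text; the function names and the numbers in the usage example are illustrative only.

```python
def percist_threshold(liver_mean, liver_sd):
    """PERCIST minimum uptake: 1.5 x liver mean SUV + 2 x liver SD."""
    return 1.5 * liver_mean + 2.0 * liver_sd


def snr_tumour_liver(tumour_suvmax, liver_mean, liver_sd):
    """Assumed form: (lesion SUVmax - liver mean SUV) / liver voxel SD."""
    return (tumour_suvmax - liver_mean) / liver_sd


def snr_gain(snr_new, snr_osem):
    """Gain expressed as the ratio to the OSEM value for the same patient."""
    return snr_new / snr_osem


if __name__ == "__main__":
    # Invented liver statistics and lesion uptake.
    liver_mean, liver_sd = 2.0, 0.3
    lesion_suvmax = 5.0
    if lesion_suvmax > percist_threshold(liver_mean, liver_sd):
        print(snr_tumour_liver(lesion_suvmax, liver_mean, liver_sd))
```

With this definition, a post-filter that leaves lesion SUVmax unchanged but lowers the liver standard deviation raises SNR(T-L) directly, which is why heavier smoothing can produce larger gains in this metric.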

Statistical analysis
Relative percentage differences of the uptake metrics relative to OSEM were expressed as mean with 95% confidence intervals. Bland-Altman analysis was also performed on the data. Relative changes of >25% for SUV max and >30% for SUV peak were considered clinically significant based upon EORTC [10] and PERCIST [14] guidelines respectively. In addition, hypothetical changes to patient management as a consequence of SUV max based on local practice were recorded. Differences in voxel COV in the liver VOI and gains in SNR (T-L) were assessed using a paired t test with a p value <0.05 considered to be significant.
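A minimal sketch of these statistics using only the Python standard library is given below. It makes two simplifications that should be read as assumptions: the 95% confidence interval uses a normal approximation (a t-based interval would be slightly wider for small samples), and the paired t test is reduced to its test statistic, leaving the p value to a statistics package or table. Bland-Altman limits are taken as the mean difference ± 1.96 standard deviations.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def relative_percent_differences(new, reference):
    """Per-lesion percentage change of a metric relative to OSEM."""
    return [100.0 * (n - r) / r for n, r in zip(new, reference)]


def mean_with_ci(values, confidence=0.95):
    """Mean and normal-approximation confidence interval (lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


def bland_altman_limits(differences):
    """Lower limit, bias and upper limit of agreement."""
    bias, sd = mean(differences), stdev(differences)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


def paired_t_statistic(a, b):
    """t statistic of the paired differences (degrees of freedom = n - 1)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


if __name__ == "__main__":
    # Invented SUVmax values for three lesions.
    osem = [5.0, 10.0, 4.0]
    psf = [6.0, 11.0, 5.0]
    diffs = relative_percent_differences(psf, osem)
    print(mean_with_ci(diffs))
    print(bland_altman_limits(diffs))
    print(paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```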

Phantom images
The FWHM of the post-filters obtained for matching voxel COV to OSEM 3i24s and a 5.0-mm post-filter were 4.4, 3.8 and 2.9 mm for TOF, HD and UHD, respectively. The FWHM of the post-filters obtained for matching SUVmax were 4.8, 6.6 and 6.5 mm for TOF, HD and UHD, respectively. To provide an illustration of the underlying impact of each algorithm, SUVmax, expressed as a percentage of the true activity concentration, and noise data are first shown with no post-filter in Table 1. Data are then presented with the two post-filter sets as described in Table 2. From the data, it is seen that there is a considerable increase in SUVmax in the two smallest spheres with HD and UHD with matched voxel COV. The variability of SUVmax was greater in the two smallest spheres at matched voxel COV, particularly with HD and UHD; the positive bias in the larger spheres with OSEM and TOF at matched voxel COV is likely to be due to image voxel variance, while with HD and UHD at matched voxel COV, Gibbs artefacts are also expected to contribute. This can be seen in Figure 1, which shows profiles through the centre of the 37-, 22- and 13-mm spheres.
With post-filters to match SUVmax recovery, variability is comparable or less with HD and UHD compared with OSEM. To verify the cross-calibration between the dose calibrator and scanner, the activity concentration, averaged across the 60 background ROIs, was measured as 5.14 ± 0.1 kBq/ml.
Table 2. SUVmax in each of the image quality spheres expressed as a percentage of the true activity concentration, and voxel COV in the phantom background. Data are shown for OSEM (reference reconstruction) and the PSF and TOF-based reconstructions with the two post-filter sets. Values are mean and standard deviation (SD) obtained from the replicates, with the latter shown in parentheses. For clarity, the SD shown is the SD across the replicates expressed as a percentage of the true activity concentration in the sphere.
Figure 2 shows images from a single representative female patient with a BMI of 37 kg/m2. The image has been cropped to show only the lung lesion and liver. Voxel COV within the liver VOI was 16.3%, 15.0%, 16.5% and 15.4% for OSEM, TOF, HD and UHD, respectively, with matched voxel COV post-filters and 13.5%, 10.8% and 7.95% for TOF, HD and UHD, respectively, with matched SUVmax post-filters. SUVmax for the lesion in the right lung was 5.4, 6.0, 8.2 and 10.1 for OSEM, TOF, HD and UHD, respectively, with matched noise post-filters and 5.2, 5.7 and 5.7 for TOF, HD and UHD, respectively, with matched SUVmax post-filters. The visual reduction in voxel variance within the liver is evident in the HD and UHD images with the matched SUVmax protocol.
Table 3 shows the voxel COV data measured in the VOI within the patient livers. There were no significant differences for the PSF and TOF-based reconstructions versus OSEM when using the matched voxel COV post-filters. As with the phantom data, significant reductions of voxel COV were measured for PSF and TOF-based reconstructions compared with OSEM using the post-filters to match SUVmax recovery. The mean measurements of voxel COV in the liver VOI for TOF, HD and UHD were 90%, 65% and 56%, respectively, of the value measured using OSEM.

FDG uptake measurements
Tables 4 and 5 summarise the changes of the three uptake measures observed using the PSF and TOF-based reconstructions relative to OSEM. The data in Table 5 for the number of lesions with a change in SUV max greater than 25% occurred in lesions with very low grade uptake (SUV max <2.5). Bland-Altman plots for the relative differences are shown in Figures 3, 4 and 5, which, in addition to data in Tables 4 and 5, show that the smaller values of SUV max and SUV peak experience the greatest increase with matched voxel COV (Figure 3a,b,c and Figure 4a,b,c). For matched SUV max filters, this is still present with TOF algorithms (Figure 3d,f and Figure 4d,f) but not with HD reconstruction. For matched voxel COV, the increase in both SUV max and SUV peak ratio for PSF and TOF-based reconstructions versus OSEM was inversely related to lesion volume as shown in Figure 6. This reflects what was seen in the image quality phantom measurements. The gains in SUV max were most pronounced with UHD, which is likely to be a consequence of reduced post-filtering compared with HD when voxel COV was matched (2.9 mm for UHD and 3.8 mm for HD). Differences in TLG-40 were not dependent on lesion volume. No relationship between SUV difference and lesion volume was observed for matched SUV max post-filters.
Out of the 74 lesions, 59 had a SUV max of >5.0 using OSEM reconstruction. No change to patient management would occur in these instances as a result of an increase of SUV max when using the PSF and TOF-based reconstructions. A key group of ten patients was identified with low or borderline SUV max (<5.0) for suspicion of malignancy using this institute's practice. The SUV max for these 15 lesions in each of the reconstruction algorithms are shown in Table 6. The table shows that, with matched voxel COV, several of these lesions would change classification with HD and UHD, as would be expected from data in previous tables and figures. With matched SUV max filters, there is only one lesion that would have changed classification according to local practice and only with the TOF reconstruction.
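The two checks applied to each lesion, crossing the local SUVmax cut-off of 5.0 and exceeding a 25% relative increase, can be sketched as follows. The function name and the returned dictionary layout are hypothetical; the usage example reuses the OSEM and UHD values quoted in the text for the representative patient (5.4 and 10.1) alongside one invented borderline lesion.

```python
MALIGNANCY_CUTOFF = 5.0      # local practice: SUVmax > 5.0 is highly suspicious
EORTC_CHANGE_PERCENT = 25.0  # relative SUVmax change treated as significant


def flag_lesion(osem_suvmax, new_suvmax,
                cutoff=MALIGNANCY_CUTOFF, change_limit=EORTC_CHANGE_PERCENT):
    """Compare one lesion between OSEM and another reconstruction."""
    percent_change = 100.0 * (new_suvmax - osem_suvmax) / osem_suvmax
    return {
        "percent_change": percent_change,
        # True when the lesion moves across the cut-off in either direction.
        "reclassified": (osem_suvmax > cutoff) != (new_suvmax > cutoff),
        "exceeds_change_limit": percent_change > change_limit,
    }


if __name__ == "__main__":
    # Representative patient from the text: already above the cut-off on OSEM.
    print(flag_lesion(5.4, 10.1))
    # Invented borderline lesion that crosses the cut-off.
    print(flag_lesion(4.0, 5.6))
```

The first case shows why lesions above 5.0 on OSEM do not change management even with a large percentage increase, whereas a borderline lesion can be reclassified by a comparatively modest absolute change.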

Signal-to-noise gains
Fifty-nine lesions were found to have SUVmax above the threshold based on the liver uptake as measured on the OSEM images. Significant SNR(T-L) gains were found for PSF and TOF-based reconstructions with both matched voxel COV and matched SUVmax. With the addition of PSF modelling, either to OSEM or OSEM + TOF images, there is a more marked gain in SNR(T-L). For matched voxel COV, SNR(T-L) ratios relative to OSEM were 1.10 ± 0.11, 1.43 ± 0.23 and 1.67 ± 0.41 for TOF, HD and UHD, respectively, and for matched SUVmax, they were 1.19 ± 0.12, 1.58 ± 0.16 and 1.94 ± 0.29, respectively. For each reconstruction algorithm, the improvement in SNR(T-L) with matched SUVmax versus matched noise was also significant.
Table 3. Image noise, expressed as coefficient of variation (COV), measured in the liver for each reconstruction for matched voxel COV and matched SUVmax post-filters. Values are mean and standard deviation, with the latter shown in parentheses.

Discussion
The deployment of PSF and TOF-based reconstruction methods into routine clinical practice for FDG imaging presents a challenge, particularly in centres or collaborative imaging networks with a defined protocol for classification of malignancy based upon SUV data. To our knowledge, this is the first study that has evaluated the performance of PSF and TOF-based reconstruction algorithms with two post-filtering strategies based on the objective criteria of matched image noise (voxel COV) or matched SUVmax, quantifying the impact on SUVmax, SUVpeak, TLG and SNR(T-L). Specific findings are applicable to Siemens HD and ultraHD reconstruction algorithms using the parameters applied in the study. It is clear from the data in Tables 1 and 2 and Figure 3 that quantification differences occur in the phantom data for all algorithms applied in this study. There are several factors that will contribute to the differences: the effect of statistical noise, partial volume effect, the size (and hence number of voxels) of the region of interest and, for the HD and UHD algorithms, Gibbs artefacts. The contributions from these factors to the measurements of SUVmax will differ as reconstruction parameters are varied. We believe that the interactions between the various factors are complex and not completely separable. As such, we do not feel that it is possible to identify one single phenomenon as the source of quantification differences for any of the algorithms used.
Table 4. Mean percentage changes and 95% confidence intervals of the three uptake measures relative to OSEM reconstruction. Also shown are the number of lesions with a greater than 25% and 30% increase in SUVmax and SUVpeak, respectively. Data in the table are from images using post-filters to match image voxel COV.
Table 5. Mean percentage changes and 95% confidence intervals of the three uptake measures relative to OSEM reconstruction. Also shown are the number of lesions with a greater than 25% and 30% increase in SUVmax and SUVpeak, respectively. Data in the table are from images using post-filters to match SUVmax.
It can be seen that overestimation occurs for all four reconstruction algorithms (Table 1) and requires the application of a post-filter to reduce this (Table 2). The smaller filter kernel applied to HD and UHD to match noise, combined with voxel correlation, leads to a lesser reduction of this overestimation. There appears to be a particular size of object where an overestimation with HD and UHD is particularly prominent with no or minimal levels of post-filtering, which, in part, may be due to overlapping Gibbs edge artefacts. Despite this, it can be seen from HD recovery data in Table 2 that, with matched voxel variance, there is very little dependence of recovery on sphere size for the 13- to 37-mm spheres, which is a desirable property. This highlights the importance of establishing a full understanding of the impact of these algorithms, and it is the duty of medical physics experts to educate clinicians on the changes to quantification that can be expected.
Ideally, the implementation of PSF modelling would not lead to Gibbs artefacts, but given the necessary compromises for PET imaging with limited statistics, an improvement in one area such as in image resolution is almost certainly going to lead to a deterioration in other aspects. Overall, whether the changes are desirable is application dependent, with our data showing smaller absolute errors for smaller spheres (but not for large spheres) and reduced dependency on quantification with lesion size.
Matching image noise produces marked increases in SUVmax, particularly with PSF reconstructions, that are potentially clinically significant, depending on local practice. This highlights the pitfalls of using uptake metrics such as SUVmax, which are so sensitive to partial volume effects and reconstruction parameters, with fixed thresholds for malignancy. The largest increases in SUVmax occur for small lesions, which typically have low SUVmax (less than 5), which is consistent with other studies [36,37]. One potential solution may be to modify thresholds based on estimated tumour volume. It would be useful to extend the matching of SUVmax to smaller objects, but this is not possible due to the limitation of the current NEMA phantom, with 10 mm being the diameter of the smallest sphere insert. It is these small lesions, with SUVmax close to the typical cut-offs for discrimination of benign and malignant disease, that are arguably the most critical lesions for lung cancer staging as they are likely to be possible additional pulmonary nodules or lymph nodes. Determining whether a lymph node is malignant, particularly those in the mediastinum, has a considerable influence on the overall staging and will play a major role in patient management. This change in SUVmax is expected to require an adaptation of locally used thresholds for discrimination of disease. It was also noted from the phantom studies that variability of SUVmax was worse for PSF-based algorithms in the small spheres, which suggests worse test-retest performance in clinical data. This is suspected to be due to increased inter-voxel correlation that is introduced when using PSF-based algorithms [21]. This increased correlation results in a reduction of voxel variance (and hence the voxel COV as used in this study as a noise metric), but it has been shown to potentially result in larger variability of uptake metrics within small ROIs [45].
We feel that the impact of PSF modelling on variability for clinical data has yet to be explored fully, and while this is beyond the scope of this study, it is recommended that caution is observed when applying PSF modelling for assessing response to treatment with follow-up scans. Despite this, the reduced levels of post-filtering required with PSF and PSF + TOF have been shown to improve lesion visualisation [28][29][30].
With matched voxel COV, SUVpeak experiences similar differences to those seen for SUVmax, albeit to a lesser extent. Quantification of peak uptake implicitly includes an additional filtering operation with a spherical kernel. The small mean relative differences for TLG suggest that it is a relatively robust uptake metric when comparing against OSEM images for either filtering strategy. The large degree of variability seen in the relative changes, as highlighted by the confidence intervals in Tables 4 and 5, may be concerning. However, it should also be noted that the total range of TLG observed in this study is approximately a full order of magnitude greater than SUVmax and SUVpeak. The use of TLG has been reported in assessment of therapy response and, recently, for prognosis in a small number of studies. The increased stability of TLG with a volume delineation based on a percentage of SUVmax suggests the metric may be more appropriate than SUVmax for staging and prognosis as the evidence base for this metric is established. We believe that this is the first time that the dependence of TLG on reconstruction algorithm has been explored in the literature. Alternatively, post-filters for PSF and TOF-based algorithms can be determined to give SUVmax that, according to this institute's practice, would not alter the outcome of the study. For all lesions with borderline SUVmax for suspicion of malignancy, relative changes with PSF and TOF-based reconstructions were less than 20%.
Table 6. SUVmax values for patients with low uptake (<5.0) for suspicion of malignancy and the SUVmax data obtained from the PSF and TOF-based algorithms using the two post-filter strategies. The left column shows the lesion volume as measured from the OSEM PET image using a 40% threshold. Values in bold represent lesions that would have changed classification using a strict SUVmax cut-off of 5.0. Values in italics represent increases of greater than 25%.
Matching SUV max between PSF-based algorithms and OSEM has been demonstrated previously [39]. However, our study has also shown that matching SUV max will significantly reduce the voxel variance in the image compared with OSEM, which we believe has yet to be demonstrated quantitatively. Combined with increased voxel correlation, this reduction of voxel variance alters the image appearance quite considerably and may be perceived as over-smoothing of images. Findings from this study are based upon an image matrix of 256 × 256 voxels, whereas other centres may use different parameters such as 200 × 200 or 400 × 400 voxels, which are common choices on the mCT due to the system's intrinsic 400 × 400 matrix. We believe that, when Gaussian post-filtering is applied, the dependence of both image noise and SUV max on matrix choice is diminished. It has also been shown that the thickness of the walls of the fillable spheres of the NEMA phantom has an impact on SUV max quantification [46,47]. This is only seen to cause appreciable error with low sphere-to-background contrast and small spheres, and hence, we expect that the impact on the test objects used in this study is likely to be minimal.
It is noted that the degree of post-filtering for the HD and UHD algorithms (6.6 and 6.5 mm, respectively) will reduce spatial resolution for these PSF-based algorithms that are intended to provide superior spatial resolution. However, we feel that this approach may be beneficial when deploying a new PET/CT scanner to an existing clinical setting, comparing patient scans for follow-up with other systems or supporting the transition to a 'new' imaging facility with a catalogue or library of images with higher resolution.
In this study, the addition of TOF increased the variation in ratio values of image voxel variance for both phantom and patient data with either matched noise or matched SUVmax. In the patient data only, TOF appeared to introduce a slight positive bias and a wider distribution of differences in the SUVmax data. This was not seen in the phantom studies, and its cause is unclear. It could be due to a dependence on patient size, as TOF is associated with SNR gains that increase with the diameter of the object [48]. However, in this study and others [20], this did not appear to apply in lung images, where the majority of tissue in the image has low density with very low uptake of FDG.
We believe this is the first study to demonstrate SNR gains with PSF and/or TOF using lesion uptake as a measure of signal with two different criteria for choosing post-filtering. A recent study has shown reductions in voxel variance and gains in SNR but measured only in uniform areas of uptake within patient livers [27]. One study has evaluated SNR gains using lesion uptake as the signal [25] but only compared images reconstructed with PSF and PSF + TOF, with the intention of demonstrating the SNR gains brought about by TOF. It was expected that SNR gains would be seen for PSF and TOF-based algorithms compared with conventional OSEM. However, it was not anticipated that the gains in SNR would be greater when parameters are chosen to match SUVmax. This may be of particular relevance for low-contrast lesions elsewhere in the body, such as the abdomen, which do not have the inherently high lesion-to-background contrast of lung lesions. The notion that increased levels of post-filtering may be superior in terms of SNR gains seems slightly at odds with published work on lesion detection, which suggests that less post-filtering results in optimal lesion detection [28,29]. This may be due to the fact that the definition of SNR in this study is not a direct indicator of lesion detectability.
This study has two limitations, which future work is planned to address. Firstly, no histological correlation with FDG uptake measured in the lesions was performed as in other studies [36]. Therefore, it is not possible to determine cut-off values and diagnostic accuracy of the uptake metrics in the two strategies of implementation. This is arguably outside the scope of this study as the purpose was not to determine such data. Secondly, we have only assessed lung lesions, and from other studies [25], it is likely that reconstruction will perform differently in other areas of the body.
The effect of PSF and TOF-based reconstruction on quantification, particularly SUVmax, has limited their introduction into routine clinical use despite demonstrated improvements in lesion detectability. This study extends existing studies [39], which have shown that the impact on SUVmax can be addressed with appropriate post-filters, by demonstrating that the same approach can be used for TOF reconstructions and also with alternative uptake metrics such as SUVpeak or TLG. Furthermore, we have demonstrated that this additional filtering to match SUVmax actually provides added gains in SNR over parameters chosen to match image voxel COV. However, if the additional smoothing is visually undesirable, an alternative methodology can be used in which the additional filtering required to match SUVmax is applied only for quantification and is not visualised [41].

Conclusions
This work evaluated the impact of reconstructions that include PSF modelling and/or TOF on lesion classification according to a local protocol by assessing changes in FDG uptake measurements. Two objective strategies for post-filtering were investigated: matching image voxel COV versus matching SUVmax. For matched voxel COV, considerable increases in SUVmax and SUVpeak were observed compared with OSEM. Using post-filters to match SUVmax reduced the discrepancies in both SUVmax and SUVpeak across reconstructions, particularly with PSF modelling. This also resulted in a considerable reduction in voxel variance. Some small discrepancies in patient data remained when TOF was incorporated; these were not seen in phantom data and warrant further investigation. The TLG metric appears to be more robust in either scheme of post-filtering despite a slightly larger variation in the amount of change, which may be less of a problem considering the large range of TLG data observed. This suggests TLG may be a more suitable metric to adopt instead of SUVmax as the evidence base develops. Gains in SNR were seen in both implementations, with the greatest gains seen for matched SUVmax post-filters.

Competing interests
This study was performed as part of the first author's (IA) PhD project, which receives financial support (course fees) from Siemens Healthcare that is paid to the nuclear medicine department and then directly to the University of Manchester.

Authors' contributions
IA managed and processed all image data and wrote the manuscript. MK assisted with data analysis (MATLAB code) and critically appraised and modified the draft manuscript. HW critically appraised and modified the draft manuscript. JM is a PhD supervisor and critically appraised and modified the draft manuscript. All authors read and approved the final manuscript.