A multicentre and multi-national evaluation of the accuracy of quantitative Lu-177 SPECT/CT imaging performed within the MRTDosimetry project

Purpose Patient-specific dosimetry is required to ensure the safety of molecular radiotherapy and to predict response. Dosimetry involves several steps, the first of which is the determination of the activity of the radiopharmaceutical taken up by an organ/lesion over time. As uncertainties propagate along each of the subsequent steps (integration of the time–activity curve, absorbed dose calculation), establishing a reliable activity quantification is essential. The MRTDosimetry project was a European initiative to bring together expertise in metrology and nuclear medicine research, with one main goal of standardizing quantitative 177Lu SPECT/CT imaging based on a calibration protocol developed and tested in a multicentre inter-comparison. This study presents the setup and results of this comparison exercise. Methods The inter-comparison included nine SPECT/CT systems. Each site performed a set of three measurements with the same setup (system, acquisition and reconstruction): (1) Determination of an image calibration for conversion from counts to activity concentration (large cylinder phantom), (2) determination of recovery coefficients for partial volume correction (IEC NEMA PET body phantom with sphere inserts), (3) validation of the established quantitative imaging setup using a 3D printed two-organ phantom (ICRP110-based kidney and spleen). In contrast to previous efforts, traceability of the activity measurement was required for each participant, and all participants were asked to calculate uncertainties for their SPECT-based activities. Results Similar combinations of imaging system and reconstruction lead to similar image calibration factors. The activity ratio results of the anthropomorphic phantom validation demonstrate significant harmonization of quantitative imaging performance between the sites with all sites falling within one standard deviation of the mean values for all inserts. 
Activity recovery was underestimated for total kidney, spleen, and kidney cortex, while it was overestimated for the medulla. Conclusion This international comparison exercise demonstrates that harmonization of quantitative SPECT/CT is feasible when following very specific instructions of a dedicated calibration protocol, as developed within the MRTDosimetry project. While quantitative imaging performance demonstrates significant harmonization, the over- and underestimation of the activity recovery highlights the limitations of any partial volume correction in the presence of spill-in and spill-out between two adjacent volumes of interest.


Introduction
Tomographic molecular imaging methods such as single photon emission computed tomography in combination with x-ray computed tomography (SPECT/CT) provide powerful image quantification techniques with applications for diagnostics and therapy optimization [1][2][3][4]. Accurate quantitative imaging (QI) is an essential input to absorbed dose calculations in molecular radiotherapy (MRT), enabling an assessment of the activity distribution in a patient [5]. This, in turn, allows the calculation of absorbed doses to organs, tissues, or tumours of interest that can be exploited for optimizing the treatment.
In contrast to the more common qualitative use of SPECT/CT, quantitative imaging with SPECT/CT allows a direct measurement of the activity distribution within a given source region. Such direct measurement requires a calibration to relate the detected counts to the activity of a specific radionuclide, commonly defined by an image calibration factor (ICF) expressed in counts per second per MBq (cps/MBq). Calculation of an ICF typically involves the preparation of a test object (phantom) with a known activity concentration of a radionuclide for imaging on the SPECT/CT system to be calibrated [6]. This process can have multiple sources of uncertainty [7,8] related to the accuracy and traceability of the phantom preparation, artifacts or errors in the SPECT/CT image reconstruction [9], and the choice of image reconstruction parameters.
The uncertainties in QI calibration, including uncertainties related to the determination of activity distributions in relatively small organs and tumour volumes [10], propagate directly to subsequent absorbed dose calculations [7,11]. Accurate evaluation of the uncertainty on the complete measurement chain is therefore essential when optimizing therapies based on these calculations. For clinical trials involving dosimetry, the comparability of dosimetry results is of importance. Currently, the calibration of SPECT/CT QI is highly site-dependent, even among those using the same SPECT/CT systems and reconstruction parameters, and few efforts have been undertaken to establish traceability of activity quantification across sites. The importance of traceability for MRT dosimetry was highlighted in a recent review of multicentre studies on standardized quantitative imaging and dosimetry for radionuclide therapies by Lassmann et al. [12]. Only three studies describing the use of 177 Lu were identified [13][14][15], of which only one made use of traceable activities [14].
The "Metrology for clinical implementation of dosimetry in molecular radiotherapy" project (MRTDosimetry) was a joint research project (JRP) within the European Metrology Programme for Innovation and Research (EMPIR), which ran for 3 years, finishing on 31 May 2019. This initiative brought together expertise in metrology and nuclear medicine research to address the problem of assessing the radiation absorbed dose to individual patients who are undergoing MRT. A main part of the project was the development of a protocol for commissioning and quality control of quantitative 177 Lu SPECT/CT imaging, the feasibility of which was tested in a multicentre clinical SPECT/CT imaging comparison exercise among the partners of the consortium.
In this study, the results of the 177 Lu SPECT/CT QI comparison exercise from the MRTDosimetry project are presented. The experimental protocol used to harmonize QI across the participating sites is outlined. The protocol includes (1) determination of an appropriate ICF, (2) correction of partial volume effects, and (3) validation of QI using a 3D printed two-organ phantom (based on kidney and spleen models from ICRP110 [16]). The results for activity recovery in realistic organ volumes are presented. The harmonization of image quantification between centres and the potential for the measurement protocol to be used for commissioning of quantitative 177 Lu SPECT/CT imaging are discussed. To allow clinical centres to implement the techniques presented in this work and benchmark QI against these results, the standard operating procedure for the comparison exercise and the designs used for the phantom fabrication (in the STL file format) have been made available [17].

Methods
A standard protocol was followed to calculate an image calibration factor, to provide a correction for partial volume effects, and to acquire SPECT/CT data of a custom-designed 3D printed two-organ phantom. A description of the comparison exercise and details of the data acquisition and image analysis are given in the following sections.

Comparison exercise
Eight members of the MRTDosimetry consortium (Azienda Unità Sanitaria Locale di Reggio Emilia, The Christie NHS Foundation Trust, Lund University, National Physical Laboratory (NPL), Oxford University Hospitals NHS Foundation Trust, Royal Surrey County Hospital, "THEAGENIO" Anticancer Hospital, and University of Würzburg) participated in the comparison exercise. In total, nine systems were included in the study; details of the camera models and reconstruction software used for this study are given in Table 1.
When harmonizing results in any nuclear medicine comparison, it is essential to ensure that the various methods used at the sites for radionuclide activity measurement are all traceable to an appropriate primary standard for 177 Lu. The methods used in this study included measurements with (i) a High Purity Germanium detector (HPGe) previously calibrated against primary standards, (ii) radionuclide calibrators and secondary standard ionization chambers previously calibrated against primary standards for the radionuclide of interest and a well-defined geometry [20,21], and (iii) for sites where traceability had not previously been shown, a sample from the stock solution, which was sent to the National Physical Laboratory for calibration to ensure traceability to primary standards.
All participants submitted results for phantom activities and counts in volumes of interest for the three phantom measurements (described in the following sections) using a common reporting template for centralized analysis. For one site with access to a range of different reconstruction software (S2a-S2c), an individual dataset was submitted for each reconstruction setup. One site (S3) submitted an incomplete dataset, and these data were excluded from the comparison.

Image calibration factor for 177 Lu
To determine the image calibration factor, a cylindrical phantom (Jaszczak phantom), with nominal volume 6.9 L [22], was filled with a uniform distribution of 177 Lu (target activity of 400 MBq). To ensure a uniform activity distribution of 177 Lu in the phantom, the use of a lutetium chloride carrier solution (10 μg·g⁻¹ of inactive lutetium dissolved in 0.1 M hydrochloric acid) was recommended. The filling volume was determined by weighing (difference between filled and empty phantom). The activity dispensed by the participants varied between 387 and 410 MBq (average 400 MBq, standard deviation 11 MBq) at the start of the SPECT acquisitions. SPECT/CT data were acquired with the phantom positioned in the centre of the SPECT field of view using the acquisition parameters described in Table 2.
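The preparation bookkeeping described here (filling volume from weighing, activity referenced to the start of the SPECT acquisition) can be sketched as follows. This is only an illustration: the function names, the half-life value of 6.647 d and the density of 1 g/mL are assumptions that do not come from the protocol text, and sites should use the values stated on their own calibration certificates.

```python
import math

# Assumed 177Lu half-life in days; replace with the value from your own
# nuclear data source or calibration certificate.
LU177_HALF_LIFE_DAYS = 6.647


def decay_correct(activity_mbq, elapsed_days, half_life_days=LU177_HALF_LIFE_DAYS):
    """Activity after `elapsed_days` (use a negative value to correct backwards)."""
    return activity_mbq * math.exp(-math.log(2.0) * elapsed_days / half_life_days)


def activity_concentration(activity_mbq, mass_filled_g, mass_empty_g,
                           density_g_per_ml=1.0):
    """Activity concentration (MBq/mL) from the filled/empty mass difference.

    The density of 1 g/mL is an assumption for a dilute aqueous solution.
    """
    volume_ml = (mass_filled_g - mass_empty_g) / density_g_per_ml
    if volume_ml <= 0.0:
        raise ValueError("filled mass must exceed empty mass")
    return activity_mbq / volume_ml


if __name__ == "__main__":
    # Hypothetical numbers: 410 MBq measured 4 h before the acquisition start.
    at_scan = decay_correct(410.0, 4.0 / 24.0)
    print(f"activity at acquisition start: {at_scan:.1f} MBq")
    print(f"concentration: {activity_concentration(at_scan, 7900.0, 1000.0):.4f} MBq/mL")
```

Referencing both the dispensed activity and the acquired counts to the same time point keeps the calibration independent of the delay between phantom preparation and imaging.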

Partial volume correction
Resolution and partial volume effects were assessed using the six-sphere insert of the IEC NEMA PET body phantom (NEMA phantom) [23] with uniform activity distribution in the inserts and a water-filled background. Despite the relatively small volumes of these inserts, the NEMA phantom was chosen as it was readily available at all imaging centres. The inserts were filled with a uniform activity distribution of 177 Lu (target activity concentration of 2.0 MBq/mL). The filling volume of each sphere was determined by weighing (difference between filled and empty phantom). The dispensed activity concentration by the participants for this phantom was between 1.8 MBq/mL and 2.3 MBq/mL (mean 2.1 MBq/mL, standard deviation 0.2 MBq/mL) at the start of the SPECT acquisition. SPECT/CT data were acquired with the phantom positioned in the centre of the SPECT field of view using the acquisition parameters described in Table 2.

3D printed two-organ phantom
In the last part of the exercise, the quantitative imaging setup was validated using a 3D printed anthropomorphic phantom (two-organ phantom) modelled based on the adult female kidney and spleen from ICRP 110 [16]. The kidney inserts contained individual compartments corresponding to the cortex and medulla. The inserts were printed with polylactide (PLA) using fused deposition modelling as described in [24][25][26]. Each centre was equipped with a copy of the phantom designed to be attached inside the previously used Jaszczak phantom using a laser-cut mounting plate and support rods (see Fig. 1). Each organ compartment was filled with a uniform activity of 177 Lu using two stock solutions to model reduced activity uptake in the renal medulla compared with the cortex [27,28]. Details of the organ volumes and activity concentrations are provided in Table 3. The filling procedure for the kidney insert is outlined in Fig. 2. SPECT/CT data were acquired with the phantom positioned in the centre of the SPECT field of view using the acquisition parameters described in Table 2.

Image reconstruction
All phantom data acquired during the exercise were reconstructed locally at the sites using common reconstruction parameters with setup-specific choices of scatter correction and resolution recovery (as used for clinical imaging). The SPECT/CT reconstruction parameters are given in Table 4. Data were reconstructed by all sites with a range of iterations (see Table 4) to ensure a sufficient total number of counts in the reconstructed image for convergence. For all sites, stable quantification was observed with 2 subsets and 25 iterations, with the total number of reconstructed counts increasing by < 0.7% for higher numbers of iterations. The target activities of 177 Lu used for phantom measurements in this study were chosen to ensure negligible acquisition dead-time [29]. Decay corrections for the acquired counts (e.g., decay correction of the count number in each projection to the start of the SPECT acquisition) were applied as implemented by the manufacturer for all data to be reconstructed. For all systems, no reconstruction processing options that produce nonlinear responses were enabled.
Table 2 (notes) Step-and-shoot. CT: standard low-dose protocol. a According to [11], reasonable quantitative accuracy can be achieved using only the 208 keV peak or including the 113 keV peak for a medium energy collimator. b Scatter windows may be adjusted as required by a specific SPECT system or scatter correction method (as used for clinical measurements).

Image calibration factor for 177 Lu
The reconstructed Jaszczak phantom data were used to determine a setup-specific image calibration factor for 177 Lu:
$$\mathrm{ICF} = \frac{C}{T \cdot A_{\mathrm{Calibrator}}} \qquad (1)$$
Here, C is the counts in the reconstructed image within a cylindrical volume of interest (VOI) corresponding to 130% of the radius and 120% of the height of the phantom, T is the acquisition duration (unit: s), and A_Calibrator (unit: Bq) is the activity dispensed in the phantom. The standard uncertainty u(ICF) was calculated according to the multiplicative variant of the law of propagation of uncertainty [30]:
$$\frac{u(\mathrm{ICF})}{\mathrm{ICF}} = \sqrt{\left(\frac{u(C)}{C}\right)^{2} + \left(\frac{u(T)}{T}\right)^{2} + \left(\frac{u(A_{\mathrm{Calibrator}})}{A_{\mathrm{Calibrator}}}\right)^{2}} \qquad (2)$$
The standard uncertainty in the counts within any volume can be approximated by Poisson statistics and calculated as the square root of the number of counts [31]. The standard uncertainty on activity measurements was provided by each site according to the methods used. A standard uncertainty of 1 s was assumed for all acquisition durations.
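As a worked illustration of this calibration step, the sketch below computes an image calibration factor and its standard uncertainty. It assumes that the ICF is the VOI counts divided by the product of acquisition duration and dispensed activity, and that the fractional uncertainties of counts (Poisson), duration (1 s) and activity combine in quadrature, as the text describes. The function names and the example numbers are hypothetical, and the activity is entered in MBq so that the result is in cps/MBq.

```python
import math


def image_calibration_factor(counts, duration_s, activity_mbq):
    """ICF in cps/MBq: counts in the VOI per second per MBq in the phantom."""
    return counts / (duration_s * activity_mbq)


def icf_standard_uncertainty(counts, duration_s, activity_mbq,
                             u_activity_mbq, u_duration_s=1.0):
    """Standard uncertainty of the ICF from fractional uncertainties in quadrature.

    u(counts) is approximated by sqrt(counts) (Poisson statistics);
    u(duration) defaults to the 1 s stated in the text.
    """
    icf = image_calibration_factor(counts, duration_s, activity_mbq)
    rel_counts = math.sqrt(counts) / counts
    rel_duration = u_duration_s / duration_s
    rel_activity = u_activity_mbq / activity_mbq
    return icf * math.sqrt(rel_counts ** 2 + rel_duration ** 2 + rel_activity ** 2)


if __name__ == "__main__":
    # Hypothetical acquisition: 8e6 counts in the VOI, 1000 s, 400 MBq +/- 2%.
    icf = image_calibration_factor(8.0e6, 1000.0, 400.0)
    u_icf = icf_standard_uncertainty(8.0e6, 1000.0, 400.0, u_activity_mbq=8.0)
    print(f"ICF = {icf:.2f} +/- {u_icf:.2f} cps/MBq")
```

With count totals of this order, the Poisson term is negligible and the activity measurement dominates the combined uncertainty, which is one reason traceable activity measurements matter for the calibration.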

Partial volume correction
VOIs corresponding to each of the six sphere inserts were drawn based on a sphere of known diameter positioned using the CT. Recovery coefficients were calculated by dividing the SPECT/CT-based activity in each of the spheres of nominal volume V_i by the activity dispensed to the sphere known from the phantom experiment preparation [10,32]. The recovery coefficients were fitted to a two-parameter model using a weighted nonlinear regression model with the Levenberg-Marquardt algorithm in MATLAB 2020a [33,34]. The weights of the NEMA recovery coefficients were calculated as the reciprocals of the product of the fractional standard uncertainties of the recovery coefficients and the fractional standard uncertainties of the sphere volumes. The weights were adjusted by the uncertainties in the drawn volume, calculated according to the analytical approach of the EANM guidelines [7], to account for a higher degree of uncertainty in the recovery of small volumes.
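The fitting step can be illustrated with a small self-contained sketch. Several things here are assumptions rather than the study's implementation: the two-parameter form R(V) = 1 / (1 + (α/V)^β) is taken as a plausible model in the spirit of the EANM uncertainty guidance, the weights are passed in already computed, and a hand-written Levenberg-Marquardt loop stands in for the MATLAB routine used by the authors. The sphere volumes and "true" parameters in the example are synthetic.

```python
import math


def recovery(volume_ml, alpha, beta):
    """Assumed two-parameter recovery model R(V) = 1 / (1 + (alpha / V) ** beta)."""
    return 1.0 / (1.0 + (alpha / volume_ml) ** beta)


def fit_recovery(volumes, rcs, weights, alpha=5.0, beta=1.0, max_iter=200):
    """Weighted least-squares fit of (alpha, beta) with a basic Levenberg-Marquardt loop."""
    def cost(a, b):
        return sum(w * (y - recovery(v, a, b)) ** 2
                   for v, y, w in zip(volumes, rcs, weights))

    lam = 1e-3                      # damping parameter
    current = cost(alpha, beta)
    for _ in range(max_iter):
        h11 = h12 = h22 = g1 = g2 = 0.0
        for v, y, w in zip(volumes, rcs, weights):
            x = (alpha / v) ** beta
            common = -x / (1.0 + x) ** 2
            d_alpha = common * beta / alpha          # dR/d(alpha)
            d_beta = common * math.log(alpha / v)    # dR/d(beta)
            resid = y - recovery(v, alpha, beta)
            h11 += w * d_alpha * d_alpha
            h12 += w * d_alpha * d_beta
            h22 += w * d_beta * d_beta
            g1 += w * d_alpha * resid
            g2 += w * d_beta * resid
        # Solve the damped 2x2 normal equations for the parameter step.
        a11, a22 = h11 * (1.0 + lam), h22 * (1.0 + lam)
        det = a11 * a22 - h12 * h12
        if det == 0.0:
            break
        new_alpha = alpha + (a22 * g1 - h12 * g2) / det
        new_beta = beta + (a11 * g2 - h12 * g1) / det
        if new_alpha > 0.0 and new_beta > 0.0 and cost(new_alpha, new_beta) < current:
            alpha, beta = new_alpha, new_beta        # accept step, relax damping
            current = cost(alpha, beta)
            lam /= 10.0
        else:
            lam *= 10.0                              # reject step, increase damping
            if lam > 1e10:
                break
    return alpha, beta


if __name__ == "__main__":
    # Synthetic check: approximate volumes (mL) of six spheres, noise-free data.
    volumes = [0.52, 1.15, 2.57, 5.57, 11.49, 26.52]
    data = [recovery(v, 8.0, 0.9) for v in volumes]
    print(fit_recovery(volumes, data, [1.0] * len(volumes)))
```

In practice a library routine (for example a nonlinear least-squares solver that also returns the parameter covariance) would normally be preferred, since the covariance is needed for the uncertainty budget.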

Validation of quantitative imaging
VOIs corresponding to the spleen and kidney (total, cortex and medulla) inserts were defined for the two-organ phantom data using the CT images (depending on each individual site's clinical setup, polygon- and threshold-based VOIs were used). The partial volume corrected SPECT/CT-based activity for each organ VOI was then calculated using Eqs. (1) and (3), where C is the total counts in the reconstructed image within the organ VOI of volume V_Organ. The recovery coefficients R(V_Organ) were calculated according to Eq. (3). The standard uncertainty in the ratio between the organ activity calculated from SPECT/CT and as measured in the radionuclide calibrator was calculated as follows: Here, the uncertainty in the recovery at a given organ volume V_Organ was calculated following the law of propagation of uncertainties: The uncertainties in the volumes were determined following EANM guidelines under the assumption of spherical organs and the definition of target VOIs on SPECT imaging to reflect a more realistic clinical situation [7].
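The validation calculation can be sketched as follows, under explicitly stated assumptions: the organ activity is taken as counts divided by (duration × ICF × R(V_Organ)), the recovery model is the assumed two-parameter form R(V) = 1 / (1 + (α/V)^β), and the uncertainty in R is propagated from α, β and V to first order with the covariance between the fit parameters ignored. The study's own equations may contain further terms; all names and numbers are hypothetical.

```python
import math


def recovery(volume_ml, alpha, beta):
    """Assumed recovery model R(V) = 1 / (1 + (alpha / V) ** beta)."""
    return 1.0 / (1.0 + (alpha / volume_ml) ** beta)


def pvc_activity_mbq(counts, duration_s, icf_cps_per_mbq, volume_ml, alpha, beta):
    """Partial-volume-corrected activity: counts / (T * ICF * R(V))."""
    return counts / (duration_s * icf_cps_per_mbq * recovery(volume_ml, alpha, beta))


def recovery_uncertainty(volume_ml, alpha, beta, u_alpha, u_beta, u_volume_ml):
    """First-order propagation of u(alpha), u(beta), u(V) into u(R).

    Returns the combined u(R) and the individual components, analogous to
    the R(alpha), R(beta) and R(V) contributions of an uncertainty budget.
    Covariance between alpha and beta is ignored (an assumption).
    """
    x = (alpha / volume_ml) ** beta
    common = x / (1.0 + x) ** 2
    components = {
        "alpha": abs(-common * beta / alpha) * u_alpha,
        "beta": abs(-common * math.log(alpha / volume_ml)) * u_beta,
        "volume": abs(common * beta / volume_ml) * u_volume_ml,
    }
    total = math.sqrt(sum(c ** 2 for c in components.values()))
    return total, components


if __name__ == "__main__":
    # Hypothetical organ VOI of 150 mL with fit parameters alpha = 8 mL, beta = 0.9.
    activity = pvc_activity_mbq(2.5e6, 1000.0, 20.0, 150.0, 8.0, 0.9)
    u_r, parts = recovery_uncertainty(150.0, 8.0, 0.9, 0.8, 0.05, 10.0)
    print(f"A_SPECT = {activity:.1f} MBq, u(R) = {u_r:.4f}, components = {parts}")
```

Keeping the three components separate makes it easy to see whether the fit parameters or the VOI volume dominates the recovery uncertainty for a given organ size.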

Results
A representative example of the reconstructed phantom data is shown in Fig. 3. Each participating centre acquired corresponding datasets for identical phantoms. The centres reported results for phantom activities and counts in VOIs as described in the previous section.
A centralized analysis of ICF, activity recovery and partial volume correction was performed for all results. A total of 10 datasets were included in the comparison. The values of ICF for the datasets are shown in Fig. 4. Results of the partial volume analysis, with the resulting partial volume correction (Eq. (3)) are shown in Fig. 5 for each dataset with the associated 95% confidence intervals. Data points corresponding to the six spheres used for determining the parameters of the recovery curve (black points) and the fitted recovery coefficients for the Two-Organ validation phantom inserts (red points) are shown.
The ratio of activity determined from SPECT/CT imaging (A_SPECT) to the measured activity in the phantom (A_Calibrator) is shown in Fig. 6 for the total kidney, renal medulla, renal cortex, and spleen inserts. For each setup, data are shown for activity recovery with partial volume correction (PVC) applied (black crosses) and without (blue circles); see Eq. (3). For each case, the mean ratio is shown as a horizontal line with the shaded area representing one standard deviation. The combined standard uncertainty and contributions from each component are given in Tables 5, 6, 7 and 8 for the renal medulla, renal cortex, total kidney and spleen, respectively.
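The summary used for Fig. 6 (mean activity ratio, a one-standard-deviation band, and which setups fall inside it) can be reproduced with a few lines of code. This sketch assumes the sample standard deviation is used, which the text does not specify, and the setup labels and ratios in the example are invented for illustration.

```python
import statistics


def harmonization_summary(ratios_by_setup):
    """Mean, sample standard deviation, and setups outside mean +/- one SD.

    `ratios_by_setup` maps a setup label to its A_SPECT / A_Calibrator ratio.
    Using the sample (n - 1) standard deviation is an assumption.
    """
    values = list(ratios_by_setup.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    outside = [label for label, ratio in ratios_by_setup.items()
               if abs(ratio - mean) > sd]
    return mean, sd, outside


if __name__ == "__main__":
    # Invented ratios for four hypothetical setups.
    example = {"A": 0.90, "B": 1.00, "C": 1.10, "D": 1.40}
    mean, sd, outside = harmonization_summary(example)
    print(f"mean = {mean:.2f}, SD = {sd:.2f}, outside one SD: {outside}")
```

Agreement with the group mean measures harmonization between sites, whereas agreement with a ratio of 1 measures accuracy; the two should be reported separately.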

Quantitative imaging comparison
The ICF values reported in this work (Fig. 4) are comparable when the same SPECT camera and reconstruction software are used (Siemens: S1 and S8; GE: S2, S5, S6, and S7). The influence of reconstruction software on the ICF is clearly demonstrated by the results for S2a, S2b and S2c where the same SPECT projections have been reconstructed with different software resulting in a large variation in ICF. It should, however, be noted that this apparent reduction in sensitivity is a normalization effect in the reconstruction rather than a change in intrinsic sensitivity. In contrast, the increased ICF value for S9 (compared with the otherwise identical systems S1 and S8) reflects the additional sensitivity from the thicker crystal built into this system. Whilst these variations suggest that a setup-specific calibration is required for accurate QI, the values presented in Fig. 4 can be a useful guide when benchmarking QI calibration of specific system and reconstruction software combinations (see Table 1). The activity ratio results (Fig. 6) demonstrate significant harmonization of quantitative imaging performance between the sites. For datasets without PVC (blue data points in Fig. 6), all setups are within one standard deviation of the mean values for all inserts, with the exception of S4 which, in contrast to the other datasets, had no resolution recovery applied. When PVC is applied, all setups (including S4) fall within one standard deviation of the mean values. However, all setups underestimated activity recovery from SPECT/CT imaging with PVC applied for the kidney (mean value 0.90 ± 0.06), spleen (mean value 0.94 ± 0.06) and renal cortex (mean value 0.79 ± 0.05). Overestimated activity recovery was reported by all sites for the renal medulla (mean value 1.97 ± 0.34). There were 3 out of 10 setups with activity recovery within one standard deviation of 100% for the kidney, and 4 setups for the larger spleen. 
An additional 2 setups (kidney) and 4 setups (spleen) were within two standard deviations. The uncertainty in the final activity quantification (a crucial input to subsequent absorbed dose calculations) was consistent across all datasets with a mean value of 7.2 ± 2.5% for all inserts (see Tables 5, 6, 7 and 8 for details).
Table 5 Uncertainty budget for renal medulla activity ratios (A_SPECT: SPECT-based activity, A_Calibrator: radionuclide calibrator-based activity, R(α) / R(β) / R(V): uncertainty components introduced by the recovery originating from α, β and V, respectively)
Whilst the establishment of a common protocol has clearly demonstrated harmonization across the datasets, the accuracy of the resulting activity recovery was found to be limited by the choice of PVC. This is not unexpected with previous studies highlighting both the importance of accurate PVC for QI [10] and the difficulty of implementing more advanced algorithms in a clinical context [35]. Specifically, the choice of relatively small spheres for the partial volume determination (volumes < 30 mL in comparison to volumes of > 100 mL used for validation) limits the accuracy of the PVC. This decision was made with the aim of focusing on the most commonly clinically available phantoms. However, larger size inserts may be advantageous to improve the accuracy of the PVC methodology for larger volumes. In this current study, it is notable that for all setups, the application of partial volume correction improved activity recovery for inserts where no activity was present outside most of the insert. The mean increase in activity recovery was 0.11 (kidney), 0.13 (renal cortex) and 0.11 (spleen). In contrast, activity recovery was significantly overestimated with PVC when activity was present outside the insert, with an average increase of 0.47 for the medulla. 
This highlights the limitations of any volume-based PVC method where there is significant spill-in from outside a VOI as for the kidney medulla, which is enclosed by the kidney cortex with a 1.8-fold larger volume in combination with a 3-fold higher activity concentration than the medulla. In these cases, no volume-based correction method will manage to yield an exact correction. The data for kidney and spleen obtained without resolution recovery in the case of S4 allow a direct comparison with theoretical and experimental activity recovery values reported in previous work [24,25,36]. The values for the kidney (0.60 ± 0.01) and spleen inserts (0.61 ± 0.01) agree (within measurement uncertainties) with values reported by [25] for kidney and spleen inserts corresponding to mathematical models from [37] for measurements with a GE system. The values for the kidney inserts also agree with the values reported in [36] for patient-specific inserts based on CT imaging. A similar trend in activity recovery for 177 Lu in kidney phantom inserts corresponding to models from [38] is reported in [24] for measurements with a Siemens system. Repetition of this comparison exercise for data reconstructed without resolution recovery would allow the generality of these findings to be further investigated.
Table 6 Uncertainty budget for renal cortex activity ratios (A_SPECT: SPECT-based activity, A_Calibrator: radionuclide calibrator-based activity, R(α) / R(β) / R(V): uncertainty components introduced by the recovery originating from α, β and V, respectively)
In general, whilst the application of resolution recovery increases activity recovery for the organ-sized volumes in this study, all data still require additional PVC (see Fig. 6). For the medulla, however, spill-in from the cortex leads to an overestimation of the activity concentration which is then further increased by the partial volume correction. In fact, it is notable that the application of PVC without resolution recovery (S4) is one out of only 3 setups with activity recovery within one standard deviation of 100%, in contrast to the majority of setups where PVC was applied to data with resolution recovery already applied. However, it should be noted that for data that include resolution recovery, the uncertainty contribution from the PVC (R(α), R(β), R(V) in Tables 6, 7 and 8) is substantially smaller than for data without resolution recovery. This reflects the better convergence in terms of activity recovery for smaller volumes, as seen in Fig. 5, resulting in an order-of-magnitude reduction in the fit parameter α when compared with S4, with a corresponding increase in the sensitivity of the volume component of the uncertainty, R(V), for S4. It should also be noted that S4 reported an order-of-magnitude lower uncertainty for the activity measurements compared with all other setups (average 0.6% compared to 4.0% for other setups), which balanced the increased PVC uncertainty in the final combined uncertainty. These results clearly demonstrate the complex interconnection between corrections used for quantitative imaging and the benefits of a comprehensive uncertainty analysis for understanding and prioritizing these corrections.

Protocol for commissioning SPECT/CT QI
The quantitative imaging comparison performed in this study has demonstrated that harmonization of quantitative SPECT/CT imaging across multiple international sites is feasible when following a common protocol. Ensuring traceability of activity measurements to an agreed primary standard is an essential underpinning step to obtain accurate quantitative radionuclide imaging. The protocol set out for commissioning QI in this study is self-contained, requiring no previous setup for 177 Lu QI at a site, and provides clinically relevant validation of QI for a specific scanner. This study has demonstrated that the absolute accuracy of SPECT/CT QI is highly dependent on the PVC methodology. Whilst this dependence may present challenges in validating QI at a single site following this protocol, the robustness of the results presented in this study provides a valuable benchmark for sites commissioning SPECT/CT QI for 177 Lu. As such, the adoption of a common (although unoptimized) PVC methodology for validation of QI has clear benefits. After such validation, further optimization of PVC and other imaging corrections, applied identically to both calibration and clinical imaging and based on local clinical requirements and SPECT/CT equipment, may be beneficial.
Adopting common, clinically realistic test objects for validation, such as the two-organ phantom used in this study, gives increased confidence in QI commissioning and cross-site harmonization. Such test objects can also be powerful in understanding the limitations of QI methodologies (as demonstrated by the significant overestimation of activity in the kidney medulla observed in this study) and their potential optimization. The reported designs for the two-organ phantom (in the STL file format) are offered as a freely available option for validation [17].
The methods for SPECT QI outlined in this study can provide the basis of a calibration and commissioning protocol, supported by the presented inter-comparison results and phantom designs. A next step will be to define a protocol for commissioning SPECT/CT QI, which utilizes the baseline results from this publication to allow validation of single site SPECT/CT QI for 177 Lu. The adoption of such a protocol is an essential step in supporting larger-scale clinical trials involving SPECT QI.

Conclusion
This comparison exercise shows that reliable quantitative SPECT/CT is feasible when following the very specific recommendations of the dedicated calibration protocol developed within the MRTDosimetry project. It is of high value as it was conducted in an international setting, bringing together the expertise of clinical sites and metrology institutes to ensure traceability and assess uncertainties in the activity determination, two aspects that are rarely considered in comparison exercises in the field of nuclear medicine. For an anthropomorphic two-organ phantom (ICRP kidney and spleen), the count loss due to spill-out was successfully compensated by using a standardized partial volume correction based on sphere recovery coefficients. In conclusion, this work shows that, given a detailed and standardized protocol for the measurements to be performed, quantitative SPECT/CT systems can be commissioned to deliver comparable activity values, which is important for larger-scale clinical trials.