Artificial intelligence for reduced dose 18F-FDG PET examinations: A real-world deployment through a standardized framework and business case assessment

Background To determine whether artificial intelligence (AI) processed PET/CT images, acquired in a single center with administered 18F-FDG activity reduced by 33%, were non-inferior to native scans and, if so, to assess the potential impact of commercialization. Methods SubtlePET™ AI was introduced in a PET/CT center in Italy. Eligible patients referred for 18F-FDG PET/CT were prospectively enrolled. Administered 18F-FDG was reduced to two-thirds of the standard dose. Patients underwent one low-dose CT and two sequential PET scans: ‘PET-processed’, with reduced dose and standard acquisition time, and ‘PET-native’, with a prolonged acquisition time per bed position to simulate standard acquisition time and dose. PET-processed images were reconstructed using SubtlePET™. PET-native images were defined as the standard of reference. The datasets were anonymized and independently evaluated in random order by four blinded readers. The evaluation included subjective image quality (IQ) assessment, lesion detectability and assessment of business benefits. Results From February to April 2020, 61 patients were prospectively enrolled. Subjective IQ was not significantly different between datasets (4.62±0.23, p=0.237) for all scanner models, with ‘almost perfect’ inter-reader agreement. There was no significant difference between datasets in lesion detectability, target lesion mean SUVmax value and liver mean SUVmean value (182.75/181.75 [SD: 0.71], 9.8/11.4 [SD: 1.13], 2.1/1.9 [SD: 0.14], respectively). No false-positive lesions were reported in PET-processed examinations. The agreed SubtlePET™ price per examination was 15-20% of FDG savings. Conclusion This is the first real-world study to demonstrate the non-inferiority of AI processed 18F-FDG PET/CT examinations obtained with 66% of the standard dose and to present a methodology to define the AI solution price.


Introduction
Positron emission tomography/computed tomography (PET/CT) is widely used in various clinical applications, including the investigation of oncological and neurological disorders [1]. The evolution of technology and the extended clinical applications of PET/CT have led to a notable worldwide increase in the number of scans performed [2]. Due to the leading role of PET/CT in the evaluation of systemic therapy response, a significant number of patients undergo more than one PET/CT scan per year, thus increasing patients' radiation exposure. Radiation dose has been associated with a slight increase in patients' lifetime risk of developing cancer [3]. Legal frameworks, such as the EURATOM 2013/59 Directive [4], have been published for the optimization of patients' radiation exposure from imaging practices.
Recently, reconstruction algorithms have been developed to improve the quality of images acquired with reduced administered radiotracer. However, these procedures are complicated, time-consuming and do not produce satisfactory outcomes when the injected activity is significantly lower than the standard dose suggested by the procedure guidelines [5]. Machine learning methods have been developed to resolve these issues [6] by utilizing paired low-dose and standard-dose images to train models that can predict standard-dose images from low-dose inputs [7].
The value proposition of Artificial Intelligence (AI) in healthcare has been well described [8]; however, real-world use in clinical practice remains limited [9]. A multinational healthcare organization developed a nine-stage framework (Fig. 1) to deploy AI solutions into its network. The framework was designed to allow objective assessment of where the clinical and business benefits lie [10,11]. Using this framework, a machine learning solution enabling PET/CT scans with reduced administered radiotracer activity was introduced into a single center in Italy. The purpose of this study was to determine whether the solution was able to produce images of adequate diagnostic confidence that were non-inferior to native scans with two different PET/CT scanner models and, if so, to assess the potential impact of commercialization to the business.

Materials And Methods
According to the framework, the SubtlePET™ AI solution (Subtle Medical, Menlo Park, CA, USA) was selected to be assessed in a single, European Association of Nuclear Medicine (EANM) Research Ltd (EARL) [12] accredited, PET/CT center with three scanners. Appropriate legal review was undertaken to ensure that the solution was certified with the correct Medical Device classification and was compliant with the European General Data Protection Regulation (GDPR). In parallel, the technical architecture (Fig. 2) for the integration of the solution was reviewed. Once the legal and technical requirements were validated, the software was installed, configured and verified, and the personnel were trained on its use and limitations.
SubtlePET™ software uses a convolutional neural network-based algorithm to reduce noise and improve image quality of fluorodeoxyglucose (FDG) and amyloid PET and PET/CT images [6,13]. Even though SubtlePET™ is certified and validated for clinical use for all major PET/CT vendors and many different models, according to the framework, a clinical assessment had to be undertaken. This prospective analysis was designed to verify the performance of the algorithm using real-world data.

Patient enrollment
Patients referred for 18F-FDG PET/CT during diagnostic work-up for oncological disease were screened for prospective enrollment. Inclusion criteria were: (a) age > 18 years; (b) FDG-avid malignancy; (c) glycemia < 180 mg/dL; (d) adequate physical condition to allow them to remain still for approximately 40 minutes, for two consecutive PET scans. Claustrophobic patients were excluded. Patients meeting these criteria were approached to participate in the study.
According to GDPR and institutional procedures related to the information provided to patients for the examination process, all the patients signed an informed consent form prior to any study procedures.

Examination protocol
PET images were acquired with three different 3D PET/CT scanners (Discovery ST-4: PET scanner 1; Discovery ST-16: PET scanner 2; Discovery IQ: PET scanner 3) from the same manufacturer (GE Healthcare, Milwaukee, WI, United States) without time-of-flight (TOF) technology.
18F-FDG was provided by Advanced Accelerator Applications pharmaceuticals (AAA by Novartis, Saint-Genis-Pouilly, France) in compliance with Good Manufacturing Practice (GMP) and in accordance with EANM procedure guidelines [5].
For the purposes of this study, FDG doses were reduced by one-third compared to the standard injected dose for a patient with the same body weight, according to institutional procedure guidelines. All doses were injected via peripheral venous catheter.
During the same day, patients underwent two sequential PET scans in continuous-bed-mode: a reduced dose acquisition scan (PET-processed) and a reference acquisition scan (PET-native). The PET-processed scan was acquired first, at 60 minutes post-injection, from skull base to mid-thigh. In order to simulate normal acquisition time and reduced injected dose, PET images for scanners 1 and 2 were acquired at 2.5 minutes per bed position, while for scanner 3, images were acquired at 1.5 minutes per bed position, in accordance with institutional procedure guidelines.
Following the PET-processed scan, the PET-native scan was acquired for the same region without moving the patient. The PET-native images were acquired with a prolonged acquisition time, increasing the minutes per bed position, to simulate normal acquisition time and standard injected dose. To define the PET emission acquisition time that simulated a full dose examination, a phantom study was performed on each PET/CT scanner using cancer imaging conditions, applying the following equation: acquisition time per bed = standard acquisition time per bed × exp(900 × λ) × 1.25 (seconds).
Patients underwent one low-dose CT prior to PET-processed acquisition, for attenuation correction and anatomical correlation of PET findings.
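The acquisition-time equation above can be illustrated with a minimal Python sketch. It assumes that λ is the physical decay constant of 18F (ln 2 divided by a half-life of about 109.77 minutes), that 900 is a delay in seconds, and that the "standard acquisition time per bed" corresponds to the 2.5 and 1.5 minute values quoted for the PET-processed scan; none of these interpretations is stated explicitly in the text, and the function name is hypothetical.

```python
import math

# Assumption: physical half-life of 18F, about 109.77 minutes, in seconds.
F18_HALF_LIFE_S = 109.77 * 60.0


def native_time_per_bed(standard_time_s, delay_s=900.0, factor=1.25,
                        half_life_s=F18_HALF_LIFE_S):
    """Return the prolonged acquisition time per bed position in seconds.

    Implements: standard_time * exp(delay * lambda) * factor,
    with lambda = ln(2) / half-life (assumed meaning of the symbol).
    """
    decay_constant = math.log(2.0) / half_life_s
    return standard_time_s * math.exp(delay_s * decay_constant) * factor


if __name__ == "__main__":
    # Standard times per bed quoted in the text: 2.5 min and 1.5 min.
    for label, minutes in (("scanners 1 and 2", 2.5), ("scanner 3", 1.5)):
        seconds = native_time_per_bed(minutes * 60.0)
        print(f"{label}: {seconds:.0f} s per bed position")
```

Under these assumptions the exponential term compensates for radioactive decay during the delay between the two scans, while the factor 1.25 lengthens the acquisition to collect more counts.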
Emission data was corrected for randoms, dead time, scatter, and attenuation and was reconstructed iteratively by an ordered-subsets expectation maximization (OSEM) algorithm.
According to institutional processes, images were reviewed for artifacts by the technologist before the patient was discharged. Upon confirmation, PET-processed acquisitions were sent by the radiographer from the modality to the Subtle server (SubtleEdge) for processing. Incoming images were automatically anonymized and quality controlled (QC) according to the SubtlePET™ process. Images that passed QC were processed and sent automatically to the Picture Archiving and Communication System (PACS) in an average time of ten minutes.

Image Quality assessment
The PET-native images were defined as the standard of reference and were reviewed by two independent physicians who had access to all the clinical, imaging and reconstruction data, to reach a consensus report that was delivered to the patient within 24 hours.
The PET-processed and PET-native datasets were anonymized, separated and randomized, allowing independent assessment of each dataset over a four-week period by four blinded board-certified nuclear medicine physicians with more than five years' experience (EP and VA > 15 years; GP and AI > 5 years).
Each reviewer assessed all datasets. They were blinded regarding image acquisition, reconstruction technique and clinical information. 18F-FDG PET/CT images were reported according to EANM procedure guidelines [5].
For image quality, the PET datasets were rated on a 5-point scale (1: very poor/non-diagnostic; 2: poor; 3: moderate; 4: good; and 5: excellent), with scores 4 and 5 considered adequate to provide diagnostic confidence.
Furthermore, each reviewer had to give their opinion as to whether they were reviewing the PET-processed or the PET-native dataset, or whether this was indeterminate.
Lastly, the detectability of all lesions was evaluated in a per-lesion analysis. In patients with ten lesions or fewer, all lesions were assessed by the reviewers, while in patients with more than ten lesions, the ten with the highest maximum standardized uptake value (SUVmax) were included in the analysis. In the two datasets, the SUVmax of the largest lesion and the SUVmean of the liver were measured. SUV was defined as activity concentration (Bq/mL) divided by injected activity (Bq) normalized to body weight. The highest voxel value (SUVmax) and the mean voxel value (SUVmean) were obtained in a volume of interest (VOI) covering the entire tumor as defined by each reviewer. Considering that all PET-native images were acquired after the PET-processed images, a correction factor for the SUV values was calculated according to Appendix 1 and Supplemental Table 1 [14]. Lesions not detected by a reviewer in a specific dataset were assigned a value of 0.
Once the independent analysis of the native and processed datasets was complete, the results were logged and unblinded. The whole dataset was scrutinized to determine whether the processed scans were non-inferior to the native scans. The quantitative assessment covered lesion detectability and SUV levels; the qualitative assessment covered subjective image quality. Inter-observer variability for image quality assessment was evaluated. Differences in results between PET/CT scanner models were also assessed.
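The SUV definition given above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the voxel values are invented for the example, a tissue density of 1 g/mL is assumed so that SUV is dimensionless, and neither decay correction nor the study's own correction factor (Appendix 1) is included.

```python
def suv_body_weight(activity_bq_per_ml, injected_bq, weight_kg):
    """SUV = activity concentration / (injected activity / body weight).

    Body weight is converted to grams; assuming 1 g/mL tissue density,
    the result is dimensionless. No decay correction is applied here.
    """
    return activity_bq_per_ml * (weight_kg * 1000.0) / injected_bq


def voi_suv_stats(voxels_bq_per_ml, injected_bq, weight_kg):
    """Return (SUVmax, SUVmean) over the voxels of a volume of interest."""
    suvs = [suv_body_weight(v, injected_bq, weight_kg)
            for v in voxels_bq_per_ml]
    return max(suvs), sum(suvs) / len(suvs)


if __name__ == "__main__":
    # Median injected dose (131 MBq) and weight (67 kg) from the Results;
    # the voxel concentrations below are hypothetical.
    voi = [4000.0, 5000.0, 12000.0, 9000.0]
    suv_max, suv_mean = voi_suv_stats(voi, injected_bq=131e6, weight_kg=67.0)
    print(f"SUVmax = {suv_max:.2f}, SUVmean = {suv_mean:.2f}")
```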

Statistical analysis
Descriptive statistics for categorical variables were presented as relative/absolute frequencies, and those for continuous variables as the median (range). Inferential analyses for categorical and continuous variables were performed with Fisher's exact test and the Mann-Whitney test, respectively. The degree of agreement among reviewers for evaluating image quality was assessed using intraclass correlation coefficients (ICC) and their 95% CI, using a 2-way mixed, single measure, consistency model. ICC was interpreted according to the interpretation scale of Landis and Koch [15] (0.0: poor; 0.0-0.20: slight; 0.21-0.40: fair; 0.41-0.60: moderate; 0.61-0.80: substantial; 0.81-1.00: almost-perfect reproducibility). To analyze lesion detectability in the two PET datasets, the detection rate was calculated for each reviewer based on the total number of suspected lesions determined by the standard of reference. All p values were obtained by the two-sided exact method at the conventional 5% significance level. Data were analyzed with R 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org).
Once the analysis of the outcomes from the clinical evaluation was completed, an assessment of the business benefits was performed. The potential net savings from the use of SubtlePET™ were calculated using data from the whole PET/CT network, not just the single center, assuming replicability of results. A percentage of these savings was then agreed as a fair price for the AI solution.
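As an illustration of the agreement analysis described above, the following Python sketch computes a two-way, single-measure, consistency ICC (often written ICC(3,1)) from the usual ANOVA mean squares and maps it onto the interpretation scale quoted in the text. It is not the authors' R code: the function names and example scores are hypothetical, and the 95% CI is not computed.

```python
def icc_consistency_single(ratings):
    """Two-way, single-measure, consistency ICC.

    `ratings` is a list of rows, one per examination, each holding one
    score per reviewer. ICC = (MSR - MSE) / (MSR + (k - 1) * MSE).
    Undefined (division by zero) if all scores are identical.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def landis_label(icc):
    """Map an ICC value onto the scale quoted in the text."""
    if icc <= 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if icc <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical 5-point image quality scores: 5 exams x 4 reviewers.
    scores = [[5, 5, 4, 5], [4, 4, 4, 5], [5, 5, 5, 5],
              [3, 4, 3, 4], [4, 5, 4, 5]]
    icc = icc_consistency_single(scores)
    print(f"ICC = {icc:.2f} ({landis_label(icc)})")
```

The consistency form ignores systematic offsets between reviewers, so a reviewer who scores every examination one point higher than a colleague does not lower the coefficient.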

Results
From February to April 2020, 1167 patients were referred for an 18F-FDG PET/CT examination at our facility. Of the referred patients, 107 (9.2%) met the eligibility criteria, of whom 46 did not consent to participate in the study. Sixty-one patients, 36 (59.1%) female and 25 (40.9%) male, were prospectively enrolled in the study. The median age was 66 years (33-84 years), the median weight was 67 kg (40-100 kg) and ten tumor sub-types were investigated. Patients were injected with a median dose of 131 MBq (99-199 MBq) of 18F-FDG, 66% of the median standard institutional dose of 196 MBq (148-296 MBq). Twenty examinations were performed with PET scanner 1, 21 with PET scanner 2 and 20 with PET scanner 3. The mean time interval between tracer injection and the second PET scan time point was 75 min for PET scanner 1 and 80 min for PET scanners 2 and 3. Data are presented in Table 1 and Supplemental Table 2.
The mean SUVmax value measured in the target lesions (lesion with highest SUVmax) was 9.8 ± 8.8 in the PET-processed dataset and 11.4 ± 9.8 in the PET-native dataset. The mean SUVmean value of the liver was 2.1 ± 0.7 in the PET-processed dataset and 1.9 ± 0.7 in the PET-native dataset. The difference was not statistically significant in either case (all p > 0.05).

PET scanners reproducibility
The evaluation of image quality and lesion detectability was analyzed in three sub-cohorts, stratifying the overall population by the three PET scanner models. Neither the quantitative nor the qualitative results showed any significant differences between the scanner models; they are presented in Table 5 (comparison between the three PET scanner models).

Business model calculation
The number of 18F-FDG examinations performed in the facility in 2019 and the respective cost of the ordered radiopharmaceutical were used to calculate the potential net savings from the introduction of SubtlePET™. The business model projected 25% savings instead of 33%, to account for a possible increase in the radiopharmaceutical price from the supplier due to the reduction of the total amount ordered. The annual cost of the use of SubtlePET™ agreed by both parties was 15-20% of the gross annual radiopharmaceutical savings.
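The pricing logic described above can be sketched as follows. This is a hypothetical illustration, not the contract model: the function name and the annual cost figure are invented, and the sketch assumes that the projected 25% applies to the annual radiopharmaceutical cost and represents the gross savings on which the 15-20% share is computed.

```python
def ai_price_range(annual_fdg_cost, saving_fraction=0.25,
                   price_share=(0.15, 0.20)):
    """Return gross savings, AI price range and resulting net savings.

    saving_fraction: projected radiopharmaceutical saving (25% in the text).
    price_share: share of gross savings paid for the AI solution (15-20%).
    """
    gross = annual_fdg_cost * saving_fraction
    price_low, price_high = (gross * share for share in price_share)
    return {
        "gross_savings": gross,
        "price_low": price_low,
        "price_high": price_high,
        "net_savings_low": gross - price_high,
        "net_savings_high": gross - price_low,
    }


if __name__ == "__main__":
    # Hypothetical annual 18F-FDG spend of 1,000,000 currency units.
    for key, value in ai_price_range(1_000_000).items():
        print(f"{key}: {value:,.0f}")
```

Expressing the price as a share of savings ties the AI cost per scan to local 18F-FDG pricing and examination volume.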

Discussion
This is the first real-world study to assess the clinical use of AI for dose-reduced 18F-FDG PET/CT examinations and to present a methodology to define a pricing model.
A single center study on the feasibility of 18F-FDG dose reduction in PET/MR examinations was recently published, in which imaging of reduced doses was simulated by reconstruction with different percentages of the original data of 20 patients [16]. The study concluded that the detection rate and semiquantitative analysis results were not affected by 50% dose reduction in 18F-FDG PET/MR 6 min/bed whole body examinations. The key differences between the two studies are that we used real rather than simulated data, reducing the administered dose to 66% of standard, and that we focused on 18F-FDG PET/CT, with a greater number of patients and systems, blinded reviewers, and details of a business case.
In our study, the decision to reduce the administered dose of 18F-FDG by 33% was based on the assumption that, given the prolonged examination required for two consecutive PET scans, this reduction would be less likely to lead to patient movement artefacts and more likely to prove non-inferior than the previously described simulations reducing activity by 50%. Although the reviewers could identify whether they were reviewing the PET-processed or PET-native datasets in almost 98% of the cases, the mean image quality score of the datasets was not significantly different and there was 'almost-perfect' agreement among reviewers. There was no significant difference in the mean number of lesions detected by the four reviewers compared to the reference standard, with no reviewer reporting false-positive lesions in the PET-processed examinations. There were also no statistically significant differences in the mean number of lesions detected per patient, the mean SUVmax value measured in the target lesions and the mean SUVmean value of the liver. Therefore, the AI-processed, 33% dose-reduced images were deemed to be 'non-inferior' to native images.
In addition, there were no significant differences in the subjective image quality of the datasets or the number of lesions detected between the different PET scanners.
Based upon the equivalence of processed and native images, a business case was defined in which the cost of the algorithm was determined to be 15-20% of the overall cost saving in 18F-FDG ordered. This commercial arrangement was agreed between the AI developer and the service provider for the network, with differing AI cost per scan based upon local 18F-FDG pricing and examination volume.
Ideally, a direct comparison of dose-reduced/processed images would be made to native images obtained from a scan performed according to the routine clinical protocol. A limitation of the study was that only one dose of 18F-FDG could be administered due to radiation protection limitations. This meant that the native images had to be acquired sequentially with a lower administered dose but longer acquisition times. A correction factor was used to account for this, but the approach may have affected the quality of the native images acquired, thereby minimizing the differences in image quality.
A further limitation was that the pathology was not controlled for, with ten different cancer types included in the 61-patient cohort. Although different pathological entities have differing avidities for 18F-FDG, the methodology chosen should have controlled for this variability.
Although the reviewers were blinded, they were still able to accurately determine whether the scans were processed or native. This presents a potential bias, should the reviewers decide to assign a more favorable or worse image quality to the images they know are processed, depending on their own prejudice regarding artificial intelligence.

Conclusion
To our knowledge, this study provides the first 'real world' data comparing actual dose-reduced, AI processed images with native scans, rather than simulation. Given the robust methodology involving a prospective cohort of 61 patients, multiple blinded reviewers and three different systems, we can be confident that, for adult patients undergoing 18F-FDG PET, reducing the dose to 66% of standard produces images with non-inferior image quality when processed by SubtlePET™, in a reproducible manner. Further studies should be undertaken to determine the lower limit of 18F-FDG administration that AI processing will allow while still preserving diagnostic image quality, and also whether these results are reproducible in the paediatric population, in which dose reduction is even more pertinent.

Abbreviations
AAA: Advanced Accelerator Applications

Table 2
Image quality assessment using a 5-point scale. The analysis was performed considering the mean score of the four reviewers and the score assigned by each individual reviewer.

According to the standard of reference, 183 lesions were detected in 46 examinations. In 15 examinations, no lesions were detected. The mean number of lesions detected by the four reviewers was 182.75 ± 3.1 for the PET-processed examinations and 181.75 ± 2.8 for the PET-native images. No reviewer reported false-positive lesions in the PET-processed examinations. Results are presented in Table 3 (lesion detectability for PET-processed and PET-native datasets).

Table 3
The analysis was performed considering the mean number of lesions detected by the four reviewers and by each individual reviewer.
A mean of 2.98 ± 0.05 lesions per patient was detected by all reviewers in PET-processed images and 2.99 ± 0.05 in PET-native images. Results are presented in Table 4 (number of lesions detected per patient for PET-processed and PET-native datasets).

Table 4
The analysis was performed considering the mean number of lesions detected by the four reviewers and by each individual reviewer.

Table 5
Image quality assessment and SUV values for the three different PET scanner models