Patients
The data in this study were acquired in two groups. The first group was a validation cohort of fifteen patients diagnosed with malignant melanoma who underwent whole-body imaging, consisting of a clinically indicated 18F-FDG-PET/CT examination followed by a whole-body PET/MRI acquisition. No additional radiotracer was injected for the PET/MRI acquisition. The second group was used for model development and consisted of 11 patients scanned over the head and neck (2 bed positions) and 20 patients scanned over the thorax and pelvis (2 bed positions). The imaging protocols and detailed information regarding these patients are described in previous studies [9, 29].
Written informed consent was obtained from all patients before the examination.
Image acquisition
CT data
Each whole-body 18F-FDG-PET/CT examination (Biograph mCT, Siemens Healthineers) was performed with the arms positioned alongside the body. Approximately 3 MBq/kg of 18F-FDG was injected intravenously about 60 min prior to image acquisition. The standardized CT examination protocol included weight-adapted intravenous administration of 90–120 ml of CT contrast agent as part of the clinical routine. CT imaging was performed with a tube voltage of 120 kV, a voxel size of 0.98 × 0.98 × 2 mm³, and a reference tube current–time product of 240 mAs using CARE Dose 4D. Data were acquired from the vertex of the skull to the toes using continuous table motion, except for two patients who were scanned from the skull to mid-thigh.
18F-FDG-PET/MRI data
All PET/MRI examinations were performed on a 3 T PET/MRI whole-body hybrid imaging system (Biograph mMR, Siemens Healthineers) covering the same anatomy as in PET/CT. PET data were acquired in list mode over seven to eight bed positions with an acquisition time of 3 min per bed position.
For MR-based AC, the standard transaxial two-point Dixon, three-dimensional (3D), volume-interpolated, T1-weighted breath-hold MRI sequence (VIBE) was acquired using a head/neck RF coil, a spine-array RF coil, and five flexible body-array RF coils. The sequence was acquired with a repetition time (TR) of 3.85 ms, echo times (TE) of 1.23 and 2.46 ms, a flip angle of 10°, and an in-plane matrix of 384 × 312 pixels. It provided four sets of MR images, namely T1-weighted in-phase, opposed-phase, fat, and water images, with a transaxial MRI FOV of 500 mm × 408 mm.
To prevent truncation of the peripheral body parts due to the limited transaxial MRI FOV, the HUGE method was applied [21]. The sequence was acquired in the left and right directions with a repetition time (TR) of 1610 ms, an echo time (TE) of 27 ms, a flip angle of 180°, an in-plane resolution of 2.3 × 2.3 mm², and a slice thickness of 8 mm.
Data preprocessing
Images were preprocessed using Python (version 3.7) and MINC Toolkit (McConnell Brain Imaging Centre, Montreal).
Truncation correction
The composed Dixon in-phase and opposed-phase images were combined with images acquired with the HUGE technique to correct for truncation. HUGE images from the left and right sides of the body were resampled to match the resolution of the Dixon images using trilinear interpolation. Histogram normalization was then performed twice using the inormalize tool (version 1.5.1 for OS X, part of the MINC toolkit) to match the intensities of the Dixon in-phase and opposed-phase images. The resampled and normalized HUGE images were then used to replace voxels in the Dixon images where the arms were truncated. The Dixon images were extended by 18 voxels on each side, yielding a total transaxial FOV of 576 mm × 374 mm.
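As an illustration, a minimal Python sketch of this replacement step is given below, using SimpleITK rather than the MINC tools actually used in the study; the file names, the use of histogram matching as a stand-in for the two inormalize passes, and the zero-intensity truncation mask are assumptions, and the 18-voxel FOV extension is omitted.

```python
import SimpleITK as sitk

# Load the composed Dixon in-phase volume and one HUGE volume (file names are hypothetical).
dixon_ip = sitk.ReadImage("dixon_in_phase.nii.gz", sitk.sitkFloat32)
huge_left = sitk.ReadImage("huge_left.nii.gz", sitk.sitkFloat32)

# Resample the HUGE image onto the Dixon grid with trilinear interpolation.
huge_resampled = sitk.Resample(huge_left, dixon_ip, sitk.Transform(),
                               sitk.sitkLinear, 0.0, sitk.sitkFloat32)

# Histogram matching stands in for the inormalize-based intensity normalization.
huge_matched = sitk.HistogramMatching(huge_resampled, dixon_ip)

# Replace truncated voxels: copy HUGE intensities wherever the Dixon volume
# has no signal but HUGE does (a simplistic stand-in for the truncation mask).
dixon_arr = sitk.GetArrayFromImage(dixon_ip)
huge_arr = sitk.GetArrayFromImage(huge_matched)
mask = (dixon_arr == 0) & (huge_arr > 0)
dixon_arr[mask] = huge_arr[mask]

corrected = sitk.GetImageFromArray(dixon_arr)
corrected.CopyInformation(dixon_ip)
sitk.WriteImage(corrected, "dixon_truncation_corrected.nii.gz")
```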
CT to MRI registration
Whole-body co-registration for the validation cohort was challenging due to the different positioning of patients between the two scans. To ensure accurate co-registration of paired whole-body CT and MRI images, the Dixon in-phase images were cropped into sub-volumes and registration was performed independently for each anatomical site. Sub-volumes were delimited by joints with a high degree of freedom, yielding anatomical regions such as the head, neck, torso, pelvis, upper arms, lower arms, upper legs, lower legs, and feet. For each sub-volume, registration was performed in two steps. First, the CT images were rigidly aligned to the corresponding Dixon in-phase images using a set of landmarks. The resulting transformation was then used to initialize a deformable registration with the freely available registration package NiftyReg [36] (Centre for Medical Image Computing, University College London). Sub-volumes were drawn with an overlap of two voxels, within which the co-registered CT was averaged. Finally, the co-registered sub-volumes were stitched together to form the whole-body co-registered CT volume. A thorough visual inspection was performed to validate each individual registration.
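A sketch of the per-region two-step registration is shown below, driving the NiftyReg command-line tools from Python. The intensity-based reg_aladin stands in for the landmark-based rigid alignment used in the study, and all file names are hypothetical; the subsequent stitching with two-voxel overlap averaging is omitted.

```python
import subprocess

# Sub-volume names are hypothetical; each pair was cropped from the
# whole-body CT and Dixon in-phase volumes along high-mobility joints.
regions = ["head", "neck", "torso", "pelvis",
           "upper_arm", "lower_arm", "upper_leg", "lower_leg", "feet"]

for region in regions:
    ref = f"dixon_ip_{region}.nii.gz"   # fixed image (MRI)
    flo = f"ct_{region}.nii.gz"         # moving image (CT)

    # Step 1: rigid initialization. The paper used manual landmarks;
    # NiftyReg's intensity-based reg_aladin is shown here as a stand-in.
    subprocess.run(["reg_aladin", "-rigOnly", "-ref", ref, "-flo", flo,
                    "-aff", f"rigid_{region}.txt",
                    "-res", f"ct_{region}_rigid.nii.gz"], check=True)

    # Step 2: deformable registration initialized with the rigid transform.
    subprocess.run(["reg_f3d", "-ref", ref, "-flo", flo,
                    "-aff", f"rigid_{region}.txt",
                    "-cpp", f"cpp_{region}.nii.gz",
                    "-res", f"ct_{region}_deformed.nii.gz"], check=True)
```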
Registration between CT and MRI for regional data was less challenging due to similar positioning between the two scans. However, registration was performed for each bed position separately using the method described above.
Network structure
The deep CNN with 3D U-Net architecture [37] used in this study consists of convolutional encoder and decoder paths for generating sCT volumes in LAC from MR images (Additional file 1: Fig. S1). The overall architecture is a slight modification of the DeepDixon network presented by Ladefoged et al. [23]. Dixon in-phase and opposed-phase MR images are the inputs to the network and are integrated by the first convolutional layer with 32 different kernels. Both the encoder and decoder paths use 3 × 3 × 3 kernels, with each convolution followed by batch normalization (BN) for faster convergence and a rectified linear unit (ReLU) activation function. At the end of each encoder stage, a similar convolutional layer with a stride of 2 performs downsampling.
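A condensed Keras sketch of this encoder/decoder layout is given below. The number of resolution levels, two convolutions per block, the filter progression beyond the first 32-kernel layer, and the use of transposed convolutions for upsampling are assumptions here, as the full DeepDixon configuration is detailed in [23].

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3x3 convolutions, each followed by batch norm and ReLU."""
    for _ in range(2):
        x = layers.Conv3D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet3d(input_shape=(16, 192, 288, 2), base_filters=32, depth=3):
    # 2-channel input: Dixon in-phase and opposed-phase patches.
    inputs = tf.keras.Input(shape=input_shape)
    skips, x = [], inputs
    # Encoder: conv block, then a strided convolution for downsampling.
    for level in range(depth):
        x = conv_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.Conv3D(base_filters * 2 ** level, 3,
                          strides=2, padding="same")(x)
    x = conv_block(x, base_filters * 2 ** depth)
    # Decoder: transposed convolution, skip concatenation, conv block.
    for level in reversed(range(depth)):
        x = layers.Conv3DTranspose(base_filters * 2 ** level, 2,
                                   strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[level]])
        x = conv_block(x, base_filters * 2 ** level)
    # Linear output: one channel of sCT expressed as LACs at 511 keV.
    outputs = layers.Conv3D(1, 1, activation=None)(x)
    return tf.keras.Model(inputs, outputs)
```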
Model training
The model was trained using pairs of Dixon MRI and CT-based AC volumes. Prior to training, images were resampled to an isotropic voxel size of 2 mm and normalized to zero mean and unit standard deviation. A binary mask derived from the Dixon in-phase images was used to remove elements outside the body from the CT images before the voxels were transformed into LACs at 511 keV. Subsequently, we extracted 3D patches from the 288 × 192 × S volumes with a stride of 4 (each patch comprising 16 adjacent transaxial slices, as described below), where S refers to the number of slices and varies between patients depending on their height. The networks were implemented in TensorFlow (version 2.1.0). The model used the mean absolute error as the loss function and the Adam optimizer with a learning rate of 5 × 10⁻⁵, and was trained for 1000 epochs with a batch size of 16 (random selection of patches). Computations were performed on two IBM Power9 systems, each equipped with four Nvidia Tesla V100 GPUs, and a Lenovo SR650_v2 with four Nvidia A40 GPUs. The networks took 3D MRI volumes as a 2-channel input consisting of 16 adjacent transaxial slices (288 × 192 × 16 voxels) and output the corresponding slices of sCT in LAC (288 × 192 × 16 voxels). The first model was trained on the regional data covering the thorax, pelvis, and head and neck regions, with transfer learning from a model pretrained on 811 brain scans [23]. Subsequently, we created an updated model using the whole-body database with transfer learning from the first model, employing a leave-one-out cross-validation approach. To avoid artificially increased bias, CT slices with metal artifacts in the pelvis and knees were excluded from the training. For each whole-body dataset, the sCT volume was generated from the MRI volumes.
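The sketch below illustrates the patch extraction and optimizer settings described above, reusing build_unet3d from the previous sketch. The toy data, the commented-out transfer-learning step, and the data wiring are placeholders, and only a single epoch is run here, whereas the study trained for 1000 epochs.

```python
import numpy as np
import tensorflow as tf

def extract_patches(mri, ct, n_slices=16, stride=4):
    """Yield 16-slice patches from (S, 192, 288, C) volume pairs with stride 4."""
    for z in range(0, mri.shape[0] - n_slices + 1, stride):
        yield mri[z:z + n_slices], ct[z:z + n_slices]

model = build_unet3d()                        # from the previous sketch
# model.load_weights("brain_pretrained.h5")   # transfer learning; file name hypothetical
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss="mae")                     # mean absolute error

# Toy stand-in data so the snippet runs end to end; real training pools
# normalized Dixon volumes (input) and LAC-converted CT volumes (target).
mri = np.random.randn(64, 192, 288, 2).astype("float32")
ct = np.random.randn(64, 192, 288, 1).astype("float32")
xs, ys = zip(*extract_patches(mri, ct))
patches_x, patches_y = np.stack(xs), np.stack(ys)

model.fit(patches_x, patches_y, batch_size=16, epochs=1, shuffle=True)
```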
Image reconstruction
PET images from raw data acquired on the PET/MRI scanner were reconstructed offline (e7-tools, Siemens Healthineers) using the 3D ordinary Poisson ordered-subset expectation maximization (3D OP-OSEM) algorithm with 3 iterations and 21 subsets, a 344 × 344 image matrix, and a Gaussian filter with 4 mm full width at half maximum (FWHM). For each patient, PET images were reconstructed using three different attenuation maps: the co-registered CT-based AC map serving as the standard of reference (PETCT), the deep learning-derived sCT map (PETsCT), and the vendor-provided atlas-based map (PETAtlas).
Analysis
Data analysis involved resampling all reconstructed PET images to match the voxel size of the attenuation map.
sCT evaluation
The generated sCT images in LAC were converted back to HU and compared to CT on a voxel-wise basis.
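The HU-to-LAC mapping itself is not specified in this section; the sketch below uses a commonly applied bilinear scaling for 120 kVp CT and its inverse. The breakpoint and slope values shown are illustrative assumptions rather than the coefficients used in the study.

```python
import numpy as np

# Illustrative bilinear HU -> LAC (cm^-1 at 511 keV) conversion; the
# breakpoint and bone-branch coefficients below are assumptions chosen
# so the two branches join continuously at the breakpoint.
MU_WATER = 0.096      # LAC of water at 511 keV, cm^-1
BREAK_HU = 47.0       # soft-tissue/bone breakpoint (illustrative)
SLOPE_BONE = 5.1e-5   # cm^-1 per HU above the breakpoint (illustrative)
OFFSET_BONE = 4.71e-2 # cm^-1 offset above the breakpoint (illustrative)

def hu_to_lac(hu):
    """Map CT numbers in HU to linear attenuation coefficients at 511 keV."""
    hu = np.asarray(hu, dtype=np.float64)
    soft = MU_WATER * (hu + 1000.0) / 1000.0
    bone = SLOPE_BONE * (hu + 1000.0) + OFFSET_BONE
    return np.where(hu <= BREAK_HU, soft, bone)

def lac_to_hu(lac):
    """Invert the bilinear mapping to express sCT LACs back in HU."""
    lac = np.asarray(lac, dtype=np.float64)
    soft = lac * 1000.0 / MU_WATER - 1000.0
    bone = (lac - OFFSET_BONE) / SLOPE_BONE - 1000.0
    return np.where(lac <= hu_to_lac(BREAK_HU), soft, bone)
```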
For each patient, the accuracy of sCT relative to CT was assessed by measuring the mean absolute error (MAE) within the body contour. The ability of the methods to correctly estimate bony tissue was evaluated using the Dice similarity coefficient, which measures the overlap between the bones segmented on the sCT and CT volumes. In both images, voxels with HU higher than 300 were classified as bone.
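For concreteness, a minimal sketch of the two metrics, assuming NumPy arrays in HU and a precomputed boolean body mask:

```python
import numpy as np

def mae_in_body(sct_hu, ct_hu, body_mask):
    """Mean absolute error in HU restricted to the body contour."""
    return np.mean(np.abs(sct_hu[body_mask] - ct_hu[body_mask]))

def bone_dice(sct_hu, ct_hu, threshold=300.0):
    """Dice overlap of bone segmentations (voxels above 300 HU)."""
    bone_sct = sct_hu > threshold
    bone_ct = ct_hu > threshold
    intersection = np.logical_and(bone_sct, bone_ct).sum()
    return 2.0 * intersection / (bone_sct.sum() + bone_ct.sum())
```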
PET evaluation
For quantitative assessment, the intensity values in the PET images were converted to standardized uptake values (SUV). Considering PETCT as the ground truth, the performance of the MR-based AC methods in quantifying radiotracer uptake in PETsCT and PETAtlas was compared for the entire body as well as for specific regions including the lung, liver, spinal cord, femoral head, iliac bone, and aorta. These regions were manually segmented on the reference CT. Quantitative accuracy was compared using the relative difference (Rel%) and absolute relative difference (Abs%) over all voxels within the above-mentioned regions, using the following formulas:
$$\text{Rel}\% = \frac{\text{PET}_{X} - \text{PET}_{\text{CT}}}{\text{PET}_{\text{CT}}} \times 100\%$$
$$\text{Abs}\% = \frac{\left| \text{PET}_{X} - \text{PET}_{\text{CT}} \right|}{\text{PET}_{\text{CT}}} \times 100\%$$
For a fair comparison, PET slices with metal artifacts in the CT-based AC map were excluded from the voxel-wise calculation of relative differences. The mean SUV (SUVmean) was calculated for the segmented regions.
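A minimal sketch of these voxel-wise metrics, assuming NumPy arrays and a hypothetical list of artifact-affected slice indices:

```python
import numpy as np

def rel_abs_diff(pet_x, pet_ct, region_mask, artifact_slices=()):
    """Voxel-wise Rel% and Abs% of PET_X vs. PET_CT within a region,
    skipping slices with metal artifacts in the CT-based AC map."""
    mask = region_mask.copy()
    for z in artifact_slices:   # slice indices are hypothetical inputs;
        mask[z] = False         # assumes axis 0 indexes transaxial slices
    rel = (pet_x[mask] - pet_ct[mask]) / pet_ct[mask] * 100.0
    return rel.mean(), np.abs(rel).mean()   # Rel%, Abs%
```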
Moreover, all voxels within the specified regions were pooled over all subjects, and the accuracy of the two MR-based AC maps for PET quantification was compared to PETCT using a joint histogram.
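A sketch of such a joint histogram with Matplotlib, using toy stand-in data; the variable names, binning, and log color scale are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# suv_ct / suv_x: SUVs pooled over all subjects within the segmented
# regions; toy data below stands in for the real pooled voxel values.
suv_ct = np.random.rand(100_000) * 10
suv_x = suv_ct + np.random.randn(100_000) * 0.3

# Log-scaled 2D histogram against the identity line visualizes agreement.
plt.hist2d(suv_ct, suv_x, bins=200, norm=LogNorm(), cmap="viridis")
plt.plot([0, 10], [0, 10], "w--", linewidth=1)  # identity line
plt.xlabel("SUV (PET$_{CT}$)")
plt.ylabel("SUV (PET$_X$)")
plt.colorbar(label="voxel count")
plt.show()
```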