Algorithms for joint activity–attenuation estimation from positron emission tomography scatter

Berker, Yannick; Schulz, Volkmar; Karp, Joel S.

doi:10.1186/s40658-019-0254-y

Original Research
Open access
Published: 28 October 2019

EJNMMI Physics volume 6, Article number: 18 (2019)

Abstract

Background

Attenuation correction in positron emission tomography remains challenging in the absence of measured transmission data. Scattered emission data may contribute missing information, but quantitative scatter-to-attenuation (S2A) reconstruction needs to input the reconstructed activity image. Here, we study S2A reconstruction as a building block for joint estimation of activity and attenuation.

Methods

We study two S2A reconstruction algorithms, maximum-likelihood expectation maximization (MLEM) with one-step-late attenuation (MLEM-OSL) and a maximum-likelihood gradient ascent (MLGA). We study theoretical properties of these algorithms with a focus on convergence and convergence speed and compare convergence speeds and the impact of object size in simulations using different spatial scale factors. Then, we propose joint estimation of activity and attenuation from scattered and nonscattered (true) emission data, combining MLEM-OSL or MLGA with scatter-MLEM as well as trues-MLEM and the maximum-likelihood transmission (MLTR) algorithm.

Results

Shortcomings of MLEM-OSL inhibit convergence to the true solution with high attenuation; these shortcomings are related to the linearization of a nonlinear measurement equation and can be linked to a new numerical criterion allowing geometrical interpretations in terms of low and high attenuation. Comparisons using simulated data confirm that while MLGA converges largely independently of the attenuation scale, MLEM-OSL converges if low-attenuation data dominate, but not with high attenuation. Convergence of MLEM-OSL can be improved by isolating data satisfying the aforementioned low-attenuation criterion. In joint estimation of activity and attenuation, scattered data help avoid local minima that cannot be avoided with nonscattered data alone. Combining MLEM-OSL with trues-MLEM may be sufficient for low-attenuation objects, while MLGA, scatter-MLEM, and MLTR may additionally be needed with higher attenuation.

Conclusions

The performance of S2A algorithms depends on spatial scales. MLGA provides lower computational complexity and convergence in more diverse setups than MLEM-OSL. Finally, scattered data may provide additional information to joint estimation of activity and attenuation through S2A reconstruction.

Introduction

Positron emission tomography (PET) is an important noninvasive medical imaging modality for clinical and research applications [1], with particular strengths in sensitive detection of photon pairs emitted by a radiotracer and quantitative reconstruction of the radiotracer activity image λ. PET image reconstruction is usually based on linear models, involving the Radon transform $\mathcal {R} \lambda $ in the analytic case or discrete mappings of a vector $\vec {\lambda }$ in the numerical case, respectively.

For quantitative reconstruction of the activity image, attenuation correction (AC) is essential, compensating for a lack of detected photon pairs along lines of response (LORs) due to the photoelectric effect and Compton scattering in the patient. A complementary step, scatter correction (SC), computes an estimate of extraneous photon pairs along broken LORs, which are generated through Compton scattering. Both corrections usually input the spatial distribution of the electron density ρ in the form of a map of linear attenuation coefficients μ or, for AC purposes, the so-called attenuation sinogram $\mathcal {R} \mu $. Given μ, both AC and SC are state of the art using well-validated algorithms [2, 3], but vast research efforts had to be, and still are, directed to the determination of μ.

Determination of an attenuation map

Depending on the level of integration of PET with other modalities (standalone or multi-modality PET), information from radionuclide transmission sources [4, 5], X-ray computed tomography (CT) [6], or magnetic resonance imaging (MRI) [7] can be used. However, radionuclide transmission data suffer from low signal-to-noise ratio, necessitating segmentation to prevent noise in the transmission data from impacting activity images. In PET/CT, 4-D attenuation correction of PET data acquired from a moving subject remains limited due to concerns over radiation doses induced by cine CT imaging. In PET/MRI outside of the head/neck area, MRI is often incapable of distinguishing bone from air in reasonable scan times [8].

More universal approaches to determine μ do not depend on multi-modality information. Popularized through maximum-likelihood reconstruction of attenuation and activity (MLAA, [9–11]), these algorithms use only PET emission data, replacing the optimization problem in λ by a joint problem in (λ,μ). A recently proposed group of algorithms jointly estimates the activity image and the attenuation sinogram $\mathcal {R} \mu $, using either alternation [12, 13] or simultaneous updates [14].

Time-of-flight (TOF) PET emission data determine the attenuation sinogram $\mathcal {R} \mu $, but only on LORs with activity ($\mathcal {R}\lambda > 0$) and only up to an unknown offset [15]. The former limitation is not a severe issue for AC, where other values of $\mathcal {R} \mu $ are not needed. However, it complicates reconstruction of μ from $\mathcal {R} \mu $ and therefore is a problem for SC, where an image-space μ-map is usually required. The latter limitation translates into an unknown scaling factor in the reconstructed λ. For these reasons, AC and SC using only PET emission data are still impractical [16].

Another type of available data is low-energy, object-scattered PET emission data, which may contain enough additional information to address both aforementioned limitations [17], particularly in a joint reconstruction scheme [18]. Similar opportunities arise in single-photon emission computed tomography [19–22]. Unfortunately, the model of the measured PET scatter data is neither based on the regular Radon transform nor linear in μ. A maximum-likelihood gradient ascent algorithm for scatter-to-attenuation (S2A) reconstruction has therefore been proposed [23, 24] but has so far not been used in joint estimation. Most recently, a Broyden–Fletcher–Goldfarb–Shanno (BFGS)-based algorithm has been proposed for attenuation reconstruction from coincidences in a lower energy window [25, 26].

The problem of estimating attenuation from scattered PET photons shares similarities with Compton scatter imaging, in which external gamma sources are used to probe an object’s electron density for medical [27] or industrial [28] applications. While it is known from the latter that the nonlinearity of the problem favors thin, low-density objects, the impact of object size in scatter-based PET attenuation correction remains to be studied.

Objectives

This paper is thus concerned with characterizing S2A reconstruction as a building block in joint estimation of activity and attenuation (joint estimation). We follow three objectives: (1) further understand fundamental properties impacting convergence and convergence speed of S2A algorithms; (2) compare S2A algorithms using simulated data, specifically, in terms of convergence speed, the impact of object size, and improved performance of one algorithm by reducing its input data; and (3) study joint estimation, which implies dropping the assumption of known radiotracer activity images in S2A reconstruction [17, 24]. Therefore, we integrate scatter data into joint estimation by interleaving S2A reconstruction with trues-to-activity reconstruction, as proposed before [18, 29], as well as with trues-to-attenuation and scatter-to-activity reconstruction.

In this algorithmically oriented proof of concept, studies are carried out using 2-D digital phantoms and simulations restricted to single scattering without TOF information. Furthermore, we assume perfect energy resolution that enables ideal separation of scattered and nonscattered events and noise-free data.

After statement of the problem, introducing required imaging models for use in S2A reconstruction and joint estimation, we summarize and propose algorithms for both and describe the evaluation data used and the experiments carried out, before presenting and discussing our results.

Problem statement

This section summarizes notation and models for scattered and unscattered data.

Scattered data for S2A reconstruction

Scatter-to-attenuation reconstruction requires a model of the low-energy, scattered data. To this end, we identify the coincident detection of two photons along an LOR by the involved detector pair. If exactly one of two detected photons has been object-scattered exactly once, the coincidence is said to be single-scattered and the energy of that photon is denoted E. We denote the respective detector d_s (scattered) and the other d_n (nonscattered). Thus, a tuple i=(d_s,d_n,E) comprises all properties of a single-scattered coincidence used in this work; l=(d_s,d_n) denotes a regular LOR.

The trajectory of both photons is a broken LOR as shown in Fig. 1, connecting a scattering location $\vec {x}_{s}$ with both detector locations. Unfortunately, many different broken LORs, in particular, having different scattering locations, yield the same apparent LOR l, so that the photon trajectory, and in particular, the scattering location, cannot be determined from l. It is known, however, that the true $\vec {x}_{s}$ lies on an American-football-shaped surface of revolution, with the pointed ends in the detector locations and the radius determined by E. We use i to denote this surface of response (SOR), comprising all possible scattering locations for a single-scattered coincidence^{Footnote 1}. For each scattering location $\vec {x}_{s}$ in SOR i, the pair (i,s) describes one potential broken LOR.

In list-mode acquisition, the raw scatter data is a sequence (i₁,i₂,…); after histogramming, the data is the number of detected single-scattered coincidences for each possible i. Here, yⁱ denotes the simulated or measured data on SOR i, while $\bar {y}^{{i}}$ is used for the expected data. The dimension of the data space is $N_{{i}} \leq N_{d}^{2} \times N_{E}$, with N_d detectors and N_E energy bins (or equivalently, energy windows)^{Footnote 2}.

Voxels are indexed according to their physical roles using e (emitting), s (scattering), and t (transmitting). A 2-D matrix A_λ (with entries $a^{{i}}_{s}$) describes the sensitivity of the PET camera on SOR i for radiation scattered in a voxel s in the absence of attenuation; it integrates both normalized camera sensitivity (scatter geometry, photon detection efficiency) and the object’s source density λ, as detailed in the Appendix. A 3-D tensor $\underline {\boldsymbol {K}}$ (with entries $k^{{i}}_{s,t}$) represents the attenuating path length of that radiation through a voxel t, independent of the object.

The expected number of low-energy scatter coincidences $\vec {\bar {y}}$, which is linear in the activity $\vec {\lambda }$, is modeled according to a discretized variant of the scatter-measurement equation (23), a generalization of the single scatter simulation (SSS) equation [3]. Using the notation in Table 1, we write the discrete measurement equation as:

$$ \bar{y}^{{i}} = \sum_{s}\left(\sum_{e} b^{{i}}_{s,e} \lambda^{e}\right) \exp \left(- \sum_{t} k^{{i}}_{s,t} \rho^{t}\right) \rho^{s} = \sum_{s} a^{{i}}_{s} \exp \left(- \sum_{t} k^{{i}}_{s,t} \rho^{t}\right) \rho^{s} $$

(1)

Table 1 Notation used for mathematics, image space, measurement space, and physics

or, in matrix notation, denoting the element-wise operations by ⊙ and $\overset {\circ }{\text {exp}}$:

$$ \vec{\bar{y}} (\vec{\lambda}, \vec{\rho}) = \left(\boldsymbol{A}_{\lambda} \odot \overset{\circ}{\text{exp}} (- \underline{\boldsymbol{K}} \vec{\rho})\right) \vec{\rho}. $$

(2)

The above expressions are equivalent to (41) and (42), respectively. Their derivation is subject to the following assumptions:

$k^{{i},e}_{s,t} = k^{{i}}_{s,t}$: effective attenuation lengths of a voxel t seen by photons along a broken LOR (i,s) are independent of the point of emission e along that broken LOR. This is a common assumption in PET that has been fundamental in showing that TOF PET data determine the attenuation sinogram up to a constant [15].

$\mu \propto \rho $: the linear attenuation coefficient μ is proportional to the electron density ρ. This is approximately true because in biological (low-Z) materials at PET energies, Compton scattering is the dominant interaction preventing gamma photon pairs from being detected. At fixed energy, the ratio μ/ρ can be formulated to depend on the mass attenuation coefficient (μ/ρ_m) and the quotient of mass density and electron density (ρ_m/ρ). The former is fairly constant across human tissues at PET energies ([30], Fig. 3), and the latter is almost perfectly constant for materials less dense than water and deviates a maximum of 10% for materials three times as dense ([31], Fig. 1). Note that μ/ρ may well depend on the photon energy; no assumption about the energy dependence of μ is implied (see the discussion around (35)).

If we assume that the electron density $\vec {\rho }$ is known accurately enough to approximate attenuation effects, as we will for one algorithm, we can simplify (2) further. That is, with an estimate $\vec {\rho }^{\,\text {est}}$ and using the abbreviation $\boldsymbol {\tilde {A}}_{\lambda,\rho } := \boldsymbol {A}_{\lambda } \odot \overset {\circ }{\text {exp}} (- \underline {\boldsymbol {K}} \vec {\rho })$, we find the linear mapping:

$$ \vec{\bar{y}}' = \boldsymbol{\tilde{A}}_{\lambda,\rho^{\,\text{est}}} \vec{\rho} \iff \bar{y}'^{{i}} = \sum_{{j}} \tilde{a}^{{i}}_{{j}} \rho^{{j}}. $$

(3)

We denote attenuated system matrices by a tilde (see Table 1; $\tilde {a}$ for components) and refer to (3) as the linearization of the scatter measurement equation.
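The relation between the measurement equation (1) and its linearization (3) can be illustrated with a minimal pure-Python sketch. The nested-list layout (`a[i][s]`, `k[i][s][t]`), the function names, and the toy numbers are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math

def attenuated_matrix(a, k, rho_est):
    """a_tilde[i][s] = a[i][s] * exp(-sum_t k[i][s][t] * rho_est[t])."""
    return [[a_is * math.exp(-sum(k_t * r for k_t, r in zip(k_i[s], rho_est)))
             for s, a_is in enumerate(a_i)]
            for a_i, k_i in zip(a, k)]

def expected_scatter(a, k, rho):
    """Eq. (1): rho enters both the exponent and the linear factor."""
    y_bar = []
    for a_i, k_i in zip(a, k):
        total = 0.0
        for s, a_is in enumerate(a_i):
            path = sum(k_t * r for k_t, r in zip(k_i[s], rho))
            total += a_is * math.exp(-path) * rho[s]
        y_bar.append(total)
    return y_bar

def linearized_scatter(a_tilde, rho):
    """Eq. (3): attenuation is frozen inside a_tilde, so the map is linear in rho."""
    return [sum(at * r for at, r in zip(row, rho)) for row in a_tilde]

# Hypothetical toy problem: 2 SORs, 2 voxels.
a = [[1.0, 2.0], [0.5, 1.5]]
k = [[[0.1, 0.2], [0.3, 0.0]], [[0.0, 0.4], [0.2, 0.1]]]
rho = [1.0, 0.5]

y_full = expected_scatter(a, k, rho)
# With rho_est equal to rho, the linearization reproduces the full model.
y_lin = linearized_scatter(attenuated_matrix(a, k, rho), rho)
```

The two models agree only when the estimate used inside the attenuated matrix equals the density being mapped; for any other estimate, `linearized_scatter` returns a different data vector.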

Unscattered data for joint estimation

Our joint estimation approach requires, additionally, a model of the nonscattered data and three well-known algorithms. We assume the LOR model with:

$$ \begin{aligned} \vec{\bar{z}} &= (\boldsymbol{U} \vec{\lambda}) \odot \exp (- \boldsymbol{U} \vec{\rho})= \boldsymbol{\tilde{U}}_{\rho} \vec{\lambda}\\ &\Longleftrightarrow \quad \bar{z}^{{l}} = \left(\sum_{{j}} u^{l}_{j} \lambda^{j}\right) \exp \left(- \sum_{j} u^{l}_{j} \rho^{j}\right) = \sum_{j} \tilde{u}^{l}_{j} \lambda^{j}, \end{aligned} $$

(4)

where $\vec {\bar {z}}$ is the expected nonscattered data, U is the LOR system matrix without attenuation, i.e., the usual system matrix applied for the usual PET reconstruction (see Appendix), and $\boldsymbol {\tilde {U}}_{\rho }$ the attenuated one; $\smash {u^{{l}}_{{j}}}$ and $\smash {\tilde {u}^{{l}}_{{j}}}$ represent entries of U and $\boldsymbol {\tilde {U}}_{\rho }$, respectively, for LOR l and voxel $\smash {{j}}$.

These nonscattered (true) data are used by maximum-likelihood expectation-maximization [32], which we refer to as trues-MLEM:

$$ \vec{\lambda}^{\text{new}} = \vec{\lambda} \odot \left(\boldsymbol{\tilde{U}}_{\rho}^{{\top}} \left(\vec{z} \oslash \vec{\bar{z}}\right)\right) \oslash \left(\boldsymbol{\tilde{U}}_{\rho}^{{\top}} \vec{1}_{[{l}]}\right), $$

(5)

and by the relaxed maximum-likelihood transmission algorithm [33] (trues-MLTR):

$$ \vec{\rho}^{\text{new}} = \vec{\rho} + \eta \cdot \left(1 - \left(\boldsymbol{U}^{\top} \vec{z}\right) \oslash \left(\boldsymbol{U}^{{\top}} \vec{\bar{z}}\right)\right). $$

(6)

In addition, scatter-MLEM [34] will be used, of which a brief derivation is given in the Appendix. This algorithm’s update equation reads:

$$ \vec{\lambda}^{\text{new}} = \vec{\lambda} \odot \left(\boldsymbol{\tilde{A}}_{\rho}^{\top} \left(\vec{y} \oslash \vec{\bar{y}}\right)\right) \oslash \left(\boldsymbol{\tilde{A}}_{\rho}^{\top} \vec{1}_{[{i}]}\right). $$

(7)
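As a hedged illustration of the update equations (5) and (6), the following pure-Python sketch applies one trues-MLEM and one trues-MLTR step to a hypothetical three-LOR, two-voxel problem. The function names, the list-of-rows layout of U, and all numbers are assumptions for illustration only; the scatter-MLEM update (7) has the same multiplicative form as (5) with the attenuated scatter matrix in place of the attenuated LOR matrix.

```python
import math

def expected_trues(u, lam, rho):
    """Eq. (4): z_bar[l] = (sum_j u[l][j] lam[j]) * exp(-sum_j u[l][j] rho[j])."""
    return [sum(x * v for x, v in zip(row, lam))
            * math.exp(-sum(x * r for x, r in zip(row, rho)))
            for row in u]

def trues_mlem_step(u, z, lam, rho):
    """Eq. (5): multiplicative activity update with the attenuated matrix."""
    z_bar = expected_trues(u, lam, rho)
    atten = [math.exp(-sum(x * r for x, r in zip(row, rho))) for row in u]
    n_lor = len(u)
    new = []
    for j, lam_j in enumerate(lam):
        num = sum(u[l][j] * atten[l] * z[l] / z_bar[l] for l in range(n_lor))
        den = sum(u[l][j] * atten[l] for l in range(n_lor))
        new.append(lam_j * num / den)
    return new

def trues_mltr_step(u, z, lam, rho, eta):
    """Eq. (6): relaxed additive attenuation update."""
    z_bar = expected_trues(u, lam, rho)
    n_lor = len(u)
    new = []
    for j, rho_j in enumerate(rho):
        back_z = sum(u[l][j] * z[l] for l in range(n_lor))
        back_z_bar = sum(u[l][j] * z_bar[l] for l in range(n_lor))
        new.append(rho_j + eta * (1.0 - back_z / back_z_bar))
    return new

# Hypothetical toy data; noise-free measurements generated from the model.
u = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lam_true, rho_true = [2.0, 1.0], [0.3, 0.1]
z = expected_trues(u, lam_true, rho_true)

lam_next = trues_mlem_step(u, z, [1.0, 1.0], rho_true)
rho_next = trues_mltr_step(u, z, lam_true, [0.2, 0.05], eta=0.03)
```

In this sketch, an underestimated electron density makes the expected data exceed the measured data on every LOR, so the MLTR step raises each entry of ρ.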

Methods and materials

In this section, we summarize two recent S2A algorithms and then look at fundamental differences between them. We then propose two novel joint estimation approaches using these algorithms and present our evaluation strategy.

Gradient-based algorithms for S2A reconstruction

In previous work, we introduced the two-branch back-projection (2BP) algorithm [17] which chooses between a positive and a negative update of ρ in a binary random-walk fashion. Since we found this algorithm to be impractical for most applications [24], we focus on two gradient-ascent-based algorithms here.

The Poisson log-likelihood (LL) of some expected data $\vec {\bar {y}}$, given the data $\vec {y}$ and omitting terms that do not depend on $\vec {\bar {y}}$, reads:

$$ \mathcal{L}_{y}(\vec{\bar{y}}) = \sum_{{i}} \left(y^{{i}} \log \bar{y}^{{i}} - \bar{y}^{{i}}\right), $$

(8)

with its gradient with respect to a vector $\vec {\rho }$

$$ \vec{\nabla}_{\vec{\rho}} \mathcal{L}_{y} = \left(\vec{\nabla}_{\vec{\rho}} \otimes \vec{\bar{y}}\right) \left(\vec{y} \oslash \vec{\bar{y}} - \vec{1}_{[{i}]}\right) \quad\iff\quad \frac{\partial \mathcal{L}_{y}}{\partial \rho^{{j}}} = \sum_{{i}} \frac{\partial \bar{y}^{{i}}}{\partial \rho^{{j}}} \left(\frac{y^{{i}}}{\bar{y}^{{i}}}- 1\right). $$

(9)

For the linearization (3), since $\boldsymbol {\tilde {A}}_{\lambda,\rho ^{\,\text {est}}}$ does not depend on $\vec {\rho }$, we find the gradient of the expected data to be:

$$ \vec{\nabla}_{\vec{\rho}} \otimes \vec{\bar{y}} = \boldsymbol{\tilde{A}}_{\lambda,\rho^{\,\text{est}}}^{{\top}} \quad\iff\quad \frac{\partial \bar{y}^{{i}}}{\partial \rho^{{j}}} = \tilde{a}^{{i}}_{{j}}. $$

(10)

By contrast, observing the double dependence of (2) on $\vec {\rho }$, one finds:

$$\begin{array}{*{20}l} \vec{\nabla}_{\vec{\rho}} \otimes \vec{\bar{y}} &= \left[\boldsymbol{A}_{\lambda}\odot\overset{\circ}{\text{exp}}(- \underline{\boldsymbol{K}}\vec{\rho})-\vec{\rho}^{\top} \left\{\underline{\boldsymbol{K}}\odot\left(\left[\boldsymbol{A}_{\lambda}\odot\overset{\circ}{\text{exp}}(-\underline{\boldsymbol{K}}\vec{\rho}) \right]\otimes\vec{1}_{[t]}\right)\right\}\right]^{\top} \end{array} $$

(11a)

$$\begin{array}{*{20}l} &= \left[\boldsymbol{\tilde{A}}_{\lambda,\rho} - \vec{\rho}^{\top} \left\{\underline{\boldsymbol{K}}\odot\left(\boldsymbol{\tilde{A}}_{\lambda,\rho}\otimes\vec{1}_{[t]}\right)\right\}\right]^{\top} \end{array} $$

(11b)

instead, which simplifies to (10) only under $\vec {\rho } = 0$ or $\underline {\boldsymbol {K}}=0$ (nonscattering or nonattenuating object) and $\vec {\rho }^{\,\text {est}} = \vec {\rho }$. This vectorial expression lends itself particularly well to an implementation in MATLAB (The MathWorks, Natick, MA); note that the multiplication with $\vec {\rho }^{{\top }}$ (from the left) denotes a summation over the scattering voxels s, as indicated in the component-wise expression:

$$\begin{array}{*{20}l} \frac{\partial \bar{y}^{i}}{\partial \rho^{j}} & =a^{i}_{j} \exp \left(- \sum_{t} k^{i}_{{j},t} \rho^{t}\right) -\sum_{s} \rho^{s} k^{i}_{s,{j}} a^{i}_{s} \exp \left(- \sum_{t} k^{i}_{s,t} \rho^{t}\right) \end{array} $$

(12a)

$$\begin{array}{*{20}l} & = \tilde{a}^{i}_{j} - \sum_{s} \rho^{s} k^{i}_{s,{j}} \tilde{a}^{{i}}_{s}. \end{array} $$

(12b)
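The component-wise gradient (12b) can be written down directly and compared against the gradient of the linearization (10). The sketch below is a minimal pure-Python version; the data layout (`a[i][s]`, `k[i][s][t]`), the function names, and the toy values are illustrative assumptions.

```python
import math

def attenuated_matrix(a, k, rho):
    """a_tilde[i][s] = a[i][s] * exp(-sum_t k[i][s][t] * rho[t])."""
    return [[a_is * math.exp(-sum(k_t * r for k_t, r in zip(k_i[s], rho)))
             for s, a_is in enumerate(a_i)]
            for a_i, k_i in zip(a, k)]

def expected_scatter(a, k, rho):
    """Eq. (1), evaluated through the attenuated matrix."""
    a_t = attenuated_matrix(a, k, rho)
    return [sum(at * r for at, r in zip(row, rho)) for row in a_t]

def jacobian_linearized(a, k, rho_est):
    """Eq. (10): d y_bar[i] / d rho[j] = a_tilde[i][j]."""
    return attenuated_matrix(a, k, rho_est)

def jacobian_full(a, k, rho):
    """Eq. (12b): a_tilde[i][j] - sum_s rho[s] * k[i][s][j] * a_tilde[i][s]."""
    a_t = attenuated_matrix(a, k, rho)
    n_vox = len(rho)
    return [[a_t[i][j]
             - sum(rho[s] * k[i][s][j] * a_t[i][s] for s in range(n_vox))
             for j in range(n_vox)]
            for i in range(len(a))]

# Hypothetical toy problem: 2 SORs, 2 voxels.
a = [[1.0, 2.0], [0.5, 1.5]]
k = [[[0.1, 0.2], [0.3, 0.0]], [[0.0, 0.4], [0.2, 0.1]]]
rho = [1.0, 0.5]

jac_lin = jacobian_linearized(a, k, rho)
jac_full = jacobian_full(a, k, rho)
```

Since all entries of ρ and K are nonnegative, every entry of `jac_full` is at most the corresponding entry of `jac_lin`; the two coincide when K vanishes.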

Scatter-to-attenuation MLEM with one-step-late attenuation (MLEM-OSL)

This algorithm, called MLEM by its authors [18], is based on subsuming attenuation effects under the system matrix in the linearized measurement equation (3), yielding the MLEM update [32]:

$$ \vec{\rho}^{\text{new}} = \vec{\rho}\odot\left(\boldsymbol{\tilde{A}}_{\lambda,\rho}^{\top}\left(\vec{y} \oslash \vec{\bar{y}}\right)\right) \oslash \left(\boldsymbol{\tilde{A}}_{\lambda,\rho}^{\top}\vec{1}_{[{i}]}\right) \quad\Longleftrightarrow\quad \rho^{{\text{new}},{j}} = \rho^{j} \frac{\sum_{i} \tilde{a}^{i}_{j} {y^{i}} / {\bar{y}^{i}}}{\sum_{i} \tilde{a}^{i}_{j}}. $$

(13)

However, this update ignores the fact that $\boldsymbol {\tilde {A}}_{\lambda,\rho }$ depends on $\vec {\rho }$, and $\boldsymbol {\tilde {A}}_{\lambda,\rho }$ has to be updated after every iteration: Eq. (13) follows the spirit of the so-called one-step-late (OSL) algorithms [35], and we will refer to it as MLEM-OSL here.
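A single MLEM-OSL iteration according to (13) might be sketched as follows, with the attenuated matrix recomputed from the current estimate on every call. This is a pure-Python illustration under assumed names and toy values, not the implementation used in the paper.

```python
import math

def attenuated_matrix(a, k, rho):
    """a_tilde[i][s] = a[i][s] * exp(-sum_t k[i][s][t] * rho[t])."""
    return [[a_is * math.exp(-sum(k_t * r for k_t, r in zip(k_i[s], rho)))
             for s, a_is in enumerate(a_i)]
            for a_i, k_i in zip(a, k)]

def mlem_osl_step(a, k, y, rho):
    """Eq. (13): MLEM update with attenuation taken from the previous estimate."""
    a_t = attenuated_matrix(a, k, rho)           # one-step-late attenuation
    n_sor = len(y)
    y_bar = [sum(at * r for at, r in zip(row, rho)) for row in a_t]
    new = []
    for j, rho_j in enumerate(rho):
        num = sum(a_t[i][j] * y[i] / y_bar[i] for i in range(n_sor))
        den = sum(a_t[i][j] for i in range(n_sor))
        new.append(rho_j * num / den)
    return new

# Hypothetical toy problem without attenuation (K = 0), where (13) is plain MLEM.
a = [[1.0, 0.5], [0.5, 1.0]]
k0 = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
rho_true = [1.0, 2.0]
y = [2.0, 2.5]                                   # equals a @ rho_true

rho_next = mlem_osl_step(a, k0, y, [1.5, 1.5])
```

In this attenuation-free toy case, the step moves both voxels from the flat start toward the true values.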

Maximum-likelihood gradient ascent (MLGA)

The MLEM(-OSL) update (13) can be written as a scaled gradient ascent, with the gradient given by Eqs. (9) and (10) and a vector-valued step size [36]:

$$\begin{array}{*{20}l} \vec{\rho}^{\text{new}} & = \vec{\rho} + \underbrace{\vec{\rho}\oslash\left(\boldsymbol{\tilde{A}}_{\lambda,\rho}^{\top} \vec{1}_{[{i}]}\right)}_{\mathrm{step~size}} \odot \vec{\nabla}_{\vec{\rho}} \mathcal{L}_{y} &\quad\Longleftrightarrow \quad \rho^{{\text{new}},{j}} &= \rho^{j} + \underbrace{\frac{\rho^{j}}{\sum_{i}\tilde{a}^{i}_{j}}}_{\mathrm{step~size}}\frac{\partial \mathcal{L}_{y}}{\partial \rho^{j}} \end{array} $$

(14a)

$$\begin{array}{*{20}l} & = \vec{\rho} + \vec{s} \odot \vec{\nabla}_{\vec{\rho}} \mathcal{L}_{y} && = \rho^{{j}} + s^{j} \frac{\partial \mathcal{L}_{y}}{\partial \rho^{j}}. \end{array} $$

(14b)

The closed-form expression of MLGA [23, 24] is obtained from (14b) by inserting the full log-likelihood gradient, (9) and (11), and choosing a step size. In this work, we focus on a step size inspired by the MLEM update equation (14a):

$$ \vec{s} = \gamma \cdot \vec{\rho} \oslash \left(\boldsymbol{\tilde{A}}_{\lambda,\rho}^{\top}\vec{1}_{[{i}]}\right) \quad\Longleftrightarrow\quad s^{j} = \gamma \frac{\rho^{j}}{\sum_{i} \tilde{a}^{i}_{j}}. $$

(15)

In addition to this MLEM-like step size, two additional step sizes have been tested: the constant step size, proposed before [24], and the scaled nonuniform step size:

$$ \vec{s}^{\prime} = \alpha \cdot \vec{1}_{[{j}]} \quad\text{and}\quad \vec{s}^{\prime\prime} = \beta \cdot \vec{\rho}. $$

(16)

Step-size constants α, β, and γ have been optimized empirically for fastest, yet stable convergence with our data.
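One MLGA iteration with the MLEM-like step size (15) could be sketched as below, combining the full gradient (9) and (12b) with the update (14b). The code is a pure-Python illustration; the names, the toy data, and the value γ=0.01 are assumptions and do not correspond to the empirically optimized constants.

```python
import math

def attenuated_matrix(a, k, rho):
    """a_tilde[i][s] = a[i][s] * exp(-sum_t k[i][s][t] * rho[t])."""
    return [[a_is * math.exp(-sum(k_t * r for k_t, r in zip(k_i[s], rho)))
             for s, a_is in enumerate(a_i)]
            for a_i, k_i in zip(a, k)]

def expected_scatter(a, k, rho):
    """Eq. (1), evaluated through the attenuated matrix."""
    a_t = attenuated_matrix(a, k, rho)
    return [sum(at * r for at, r in zip(row, rho)) for row in a_t]

def poisson_ll(y, y_bar):
    """Eq. (8): Poisson log-likelihood without terms independent of y_bar."""
    return sum(y_i * math.log(yb_i) - yb_i for y_i, yb_i in zip(y, y_bar))

def mlga_step(a, k, y, rho, gamma):
    """Eq. (14b) with the full gradient (9), (12b) and the step size (15)."""
    a_t = attenuated_matrix(a, k, rho)
    n_sor, n_vox = len(y), len(rho)
    y_bar = [sum(a_t[i][s] * rho[s] for s in range(n_vox)) for i in range(n_sor)]
    residual = [y[i] / y_bar[i] - 1.0 for i in range(n_sor)]
    new = []
    for j in range(n_vox):
        grad_j = sum(
            (a_t[i][j] - sum(rho[s] * k[i][s][j] * a_t[i][s] for s in range(n_vox)))
            * residual[i]
            for i in range(n_sor))
        step_j = gamma * rho[j] / sum(a_t[i][j] for i in range(n_sor))
        new.append(rho[j] + step_j * grad_j)
    return new

# Hypothetical toy problem; gamma is an arbitrary small value for illustration.
a = [[1.0, 0.5], [0.5, 1.0]]
k = [[[0.1, 0.2], [0.3, 0.0]], [[0.0, 0.4], [0.2, 0.1]]]
rho_true = [1.0, 2.0]
y = expected_scatter(a, k, rho_true)

rho_start = [1.5, 1.5]
rho_next = mlga_step(a, k, y, rho_start, gamma=0.01)
```

Setting `gamma=1.0` and dropping the second term of the gradient recovers the MLEM-OSL update (13), which is the sense in which (15) is MLEM-like.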

Validity of linearizing the scatter measurement equation

The amount of scatter as a function of electron density may not be sufficiently well represented by a linearized measurement equation, and (2) may require more careful treatment. To explore the limits of the linearization, we derive a geometrical interpretation as well as a numerical criterion. This criterion is used to distinguish data that can be used for algorithms based on linearization (here, MLEM-OSL) from data that cannot; further, it is linked to the log-likelihood gradient (11).

The most basic, one-dimensional (1-D) simplification of (2):

$$ \bar{y}= a_{\lambda} \exp(- k \rho) \rho \quad (a_{\lambda}, k, \rho > 0), $$

(17)

confirms that no possible linearization in the form of (3):

$$ \bar{y}' = a_{\lambda,\rho^{\,\text{est}}} \rho, $$

(18)

reflects the behavior of (17) with high attenuation (see Fig. 2): in particular, the derivative of (18) misrepresents the sign of the derivative of (17) for kρ>1. This may have drastic implications for gradient-based algorithms using the linearization to compute the gradient: in particular, if ρⁿ>ρ^true>1/k, an iteration of MLEM-OSL yields ρⁿ⁺¹>ρⁿ regardless of the value of ρ^est.
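The 1-D behavior can be checked numerically with a short Python sketch. It assumes that the attenuation factor of the linearization is evaluated at the current iterate (the one-step-late choice) and uses hypothetical values for k, ρ^true, and ρⁿ.

```python
import math

def mlem_osl_1d(rho_n, rho_true, k, a_lambda=1.0):
    """One scalar MLEM-OSL update with attenuation evaluated at rho_n."""
    # Noise-free datum from the 1-D nonlinear model a * exp(-k * rho) * rho.
    y = a_lambda * math.exp(-k * rho_true) * rho_true
    # Attenuated coefficient of the linearization, frozen at the current iterate.
    a_tilde = a_lambda * math.exp(-k * rho_n)
    y_bar = a_tilde * rho_n
    # Scalar form of update (13): rho * (a_tilde * y / y_bar) / a_tilde.
    return rho_n * (a_tilde * y / y_bar) / a_tilde

# High attenuation: k * rho_true = 2 > 1, iterate starts above the truth.
high = mlem_osl_1d(rho_n=2.5, rho_true=2.0, k=1.0)
# Low attenuation: k * rho_true = 0.2 < 1, same starting point.
low = mlem_osl_1d(rho_n=2.5, rho_true=2.0, k=0.1)
```

Under these assumptions the update reduces to ρ^true·exp(k(ρⁿ−ρ^true)), so in the high-attenuation case the iterate moves away from the true value, whereas in the low-attenuation case it moves toward it.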

Comparison of the gradient of the linearization (10) with the full gradient (11) reveals the advantage of MLGA over MLEM-OSL; the difference term $- \vec {\rho }^{{\top }} \{\underline {\boldsymbol {K}}\odot [\ldots ]\}$ reverses the direction of the full gradient (only) with high attenuation, all components of $\vec {\rho }$ and $\underline {\boldsymbol {K}}$ being nonnegative^{Footnote 3}.

A multi-voxel interpretation of the high-attenuation situation is presented in the Appendix: it is of importance in patients with great attenuation-length–electron-density products $\vec {\rho }^{\top } \underline {\boldsymbol {K}}$^{Footnote 4}. One conclusion from the arguments in the Appendix is that it is not straightforward to downsample (or downsize) S2A experiments, as that can transform low attenuation into high attenuation (by downsampling), or vice versa (by downsizing)^{Footnote 5}.

For MLEM-OSL, the linearization of the measurement equation may only be appropriate whenever attenuation effects do not reverse the sign of $\partial \bar {y}^{{i}} / \partial \rho ^{{j}}$. Since this may be true for some SORs i, but not for others, it may be appropriate to remove the latter from the data and apply MLEM-OSL to the reduced data set: (46) represents an approximate inclusion criterion used later.

n-algorithms for joint estimation

Up to this point, we have focused on dedicated S2A reconstruction algorithms, assuming knowledge of the activity distribution; in this section, we drop this assumption and extend our studies to joint estimation of activity and attenuation using scattered as well as nonscattered data. The added value of combining scattered and nonscattered data is visualized in the Appendix (Fig. 16).

We use five building blocks: the aforementioned MLGA with MLEM-like step size for S2A reconstruction, henceforth referred to as scatter-MLGA; scatter-MLEM-OSL as an alternative; scatter-MLEM for scatter-to-activity reconstruction; trues-MLEM for trues-to-activity reconstruction; and trues-MLTR for trues-to-attenuation reconstruction. Combinations of n individual algorithms form n-algorithms.

As for S2A reconstruction, we distinguish two main cases: low and high attenuation. The general data flow per iteration is similar for both cases and is visualized in Fig. 3; radiotracer activity distribution λ and electron density ρ are repeatedly updated using the current estimate of the respective other quantity. For both cases, we re-optimized the MLGA step sizes to achieve stability.

2-algorithms for low attenuation

The idea of this subsection has been presented before [18, 29]. For low attenuation situations (e.g., with a spatial scale factor of 0.2 in Fig. 4), we interleave trues-MLEM with scatter-MLGA. The data flow in this part is similar to that proposed earlier [18]. We start with initial guesses for ρ and λ. In each iteration, plugging the current electron-density estimate $\vec {\rho }$ into $\boldsymbol {\tilde {U}}_{\rho }$, we use trues-MLEM to update the current activity estimate $\vec {\lambda }$ using the nonscattered data $\vec {z}$; then, we use the updated activity estimate to compute scatter-MLGA updates of $\vec {\rho }$.

In this part of the study, we aim to minimize the number of computationally expensive updates of the system matrix $\boldsymbol {\tilde {A}_{\lambda }}$. Therefore, we run 10 iterations of attenuation-corrected (using the current $\vec {\rho }$) trues-OSEM (with 4 data subsets) at a time, followed by 10 iterations of scatter-MLGA with 4 data subsets (the use of subsets in scatter-MLGA being studied in detail elsewhere [24]). This low-attenuation 2-algorithm is summarized as $(\text {trues-OSEM}^{10}_{4}) + (\text {scatter-MLGA}^{10}_{4})$, with a total of 20 sub-iterations per iteration. Since MLEM-OSL can replace scatter-MLGA for low attenuation, we also run $(\text {trues-OSEM}^{10}_{4}) + (\text {MLEM-OSL}^{10})$ on the same data for comparison.

4-algorithms for low or high attenuation

For high attenuation (e.g., Fig. 4 at the original, that is, human spatial scale), we find it necessary to further consider activity information contained in scattered coincidences, as well as attenuation information contained in true coincidences. The former is achieved by the scatter-MLEM algorithm, the latter with the trues-MLTR algorithm with a relaxation factor of η=0.03. Iterations of different algorithms updating the same quantity are considered as one sub-iteration; all updates are applied sequentially (e.g., the trues-MLEM update uses the estimate of λ as updated by the previous scatter-MLEM update; see Algorithm 1).

Not using any subsets, the high-attenuation 4-algorithm is denoted (scatter-MLEM+trues-MLEM)+(scatter-MLGA+trues-MLTR), with two sub-iterations per iteration.
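The order of updates in the 4-algorithm can be summarized in a short Python sketch. The four update functions are passed in as hypothetical callables with the assumed signature `(lam, rho) -> updated quantity`; the function name and this interface are illustrative and are not taken from Algorithm 1.

```python
def joint_4_algorithm(lam, rho, n_iter,
                      scatter_mlem, trues_mlem, scatter_mlga, trues_mltr):
    """Sequential data flow (scatter-MLEM + trues-MLEM) + (scatter-MLGA + trues-MLTR)."""
    for _ in range(n_iter):
        # Sub-iteration 1: both activity updates, each using the latest estimates.
        lam = scatter_mlem(lam, rho)
        lam = trues_mlem(lam, rho)
        # Sub-iteration 2: both attenuation updates, using the updated activity.
        rho = scatter_mlga(lam, rho)
        rho = trues_mltr(lam, rho)
    return lam, rho

# Usage with trivial stand-in updates that only expose the call order.
order = []

def record(name, update):
    def wrapped(lam, rho):
        order.append(name)
        return update(lam, rho)
    return wrapped

lam_out, rho_out = joint_4_algorithm(
    1.0, 1.0, 2,
    record("scatter-MLEM", lambda lam, rho: lam + 1.0),
    record("trues-MLEM", lambda lam, rho: lam * 2.0),
    record("scatter-MLGA", lambda lam, rho: rho + lam),
    record("trues-MLTR", lambda lam, rho: rho),
)
```

Passing the updates as callables keeps the sketch independent of how the individual algorithms are implemented; the stand-ins above are placeholders, not reconstruction steps.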

Evaluation strategy

Evaluation data

We simulate data based on an 18×18-voxel version (high resolution, Fig. 4) of the human-sized chest cross-section phantom used previously [24], as well as the original one (9×9 voxels, low resolution, Fig. 14a). For the former, the voxel size is 25×25 mm² and the radius of the 2-D PET scanner used to simulate a PET acquisition is 40 cm. For a rat-sized field of view (FOV), the phantom (and the scanner geometry) are uniformly scaled down by a factor of 0.2, all size relations remaining identical (5×5 mm² pixel size, 8-cm detector radius)^{Footnote 6}. An intermediate, rabbit-sized FOV is obtained using a linear downscaling factor of 0.35. At all scales, the scanner is equipped with N_d=64 equidistant detectors having N_E=10 energy bins, of which 7 are effectively used: (511 keV, 460 keV] down to (204 keV, 153 keV].

Single-scattered data is simulated by evaluating (2). Nonscattered data is simulated by (4) using a system matrix U, each column j of which is constructed from the result of the MATLAB radon function [37] for a unity point source in j.

For all algorithms, the initial guess of ρ generously bounds the true object and is filled with the equivalent of μ=0.07/cm (Fig. 4c): this value ensures approximately correct attenuation correction factors for the first iteration of trues-to-activity reconstruction. For joint estimation, the initial activity is homogeneous throughout the FOV (Fig. 4e).

S2A reconstruction

The first part of this comparison of MLGA and MLEM-OSL is along the lines of earlier work comparing MLGA with 2BP [24], using additional simulation data with higher numbers of voxels than before. To this end, both algorithms are applied to the (low and high resolution) data described above. Due to the small number of voxels, specific features of reconstructed images are of less interest; to quantify the agreement between reconstructed images $\vec {x}$ and their respective references $\vec {x}^{\,\text {true}}$, we therefore report normalized mean squared errors (NMSE):

$$ \mathcal{S}(\vec{x}, \vec{x}^{\,\text{true}}) = \sum_{{i}} \left(x^{{i}} - x^{\,\text{true},{i}}\right)^{2} \Big/ \sum_{{i}} \left(x^{\,\text{true},{i}}\right)^{2}. $$

(19)
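For reference, (19) translates into a few lines of Python; the function name and the example vectors are illustrative assumptions.

```python
def nmse(x, x_true):
    """Eq. (19): sum of squared errors divided by the sum of squared references."""
    numerator = sum((x_i - t_i) ** 2 for x_i, t_i in zip(x, x_true))
    denominator = sum(t_i ** 2 for t_i in x_true)
    return numerator / denominator

perfect = nmse([1.0, 2.0], [1.0, 2.0])      # identical images
all_zero = nmse([0.0, 0.0], [3.0, 4.0])     # empty reconstruction
```

With this normalization, an all-zero reconstruction yields an NMSE of exactly 1, which gives a convenient reference level when reading convergence curves.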

FOV size variations

Both algorithms are applied to the data simulated at all three spatial scales (human: scale 1; rabbit: scale 0.35; rat: scale 0.2).

Reduced data

For MLEM-OSL, data is reduced by separating SORs into useful and less useful ones based on the aforementioned criterion, useful ones fulfilling:

$$ \max_{{j}} \sum_{s} \rho^{s} k^{{i}}_{s,{j}} \leq {1}. $$

(20)

This criterion is evaluated using the current estimate of $\vec {\rho }$ in every iteration. SORs i which are to be left out are removed both from the data $\vec {y}$ (removing single data points) and the system matrix A_λ (removing whole rows), and all computations are carried out with these reduced variables when working with reduced data.
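The data reduction based on (20) might be sketched in Python as follows. The function names and toy values are assumptions; slicing K alongside the data and the system matrix is likewise an assumption about how the reduced variables are formed.

```python
def useful_sors(k, rho, threshold=1.0):
    """Indices i satisfying criterion (20): max_j sum_s rho[s] * k[i][s][j] <= 1."""
    n_vox = len(rho)
    keep = []
    for i, k_i in enumerate(k):
        worst = max(sum(rho[s] * k_i[s][j] for s in range(n_vox))
                    for j in range(n_vox))
        if worst <= threshold:
            keep.append(i)
    return keep

def reduce_data(y, a, k, keep):
    """Drop excluded SORs from the data and from the rows of A (and, assumed, K)."""
    return [y[i] for i in keep], [a[i] for i in keep], [k[i] for i in keep]

# Hypothetical toy problem: SOR 0 has short attenuating paths, SOR 1 long ones.
k = [[[0.1, 0.2], [0.3, 0.0]], [[2.0, 0.5], [0.1, 3.0]]]
a = [[1.0, 2.0], [0.5, 1.5]]
y = [1.2, 0.3]
rho = [1.0, 0.5]                      # current estimate of the electron density

keep = useful_sors(k, rho)
y_red, a_red, k_red = reduce_data(y, a, k, keep)
```

Because the criterion depends on the current estimate of ρ, the selection would be recomputed in every iteration before the MLEM-OSL update is applied to the reduced variables.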

Computational complexity and sparsity

Computational complexity of both algorithms is assessed by measuring run times on a consumer-grade laptop (Intel Core 2 Duo 2.8 GHz processor, 4 GB memory). To this end, the simulation parameters are varied in two ways. First, with the number of voxels fixed at low resolution, we vary the number of detectors following N_d=2ⁿ with n∈{1,…,7}. Second, with the number of detectors fixed (at N_d=32), we vary the number of voxels following 2ⁿ×2ⁿ with n∈{1,…,5}; in terms of vector lengths, that corresponds to N_e=N_s=N_t=4ⁿ. When varying the number of voxels, the voxel dimensions are adapted to maintain a constant spatial extent of the phantom.

For this part of the study, we choose constant activity and attenuation distributions (λ_j=1,μ_j=0.1/cm), with an initial μ_j=0.05/cm. Since this choice implies a maximum population of the system matrix A_λ, we also determine what we term the geometrical density (fraction of non-null entries with flat activity) of A_λ and $\underline {\boldsymbol {K}}$, respectively, which represent upper bounds for cases with less extended activity distributions.

Joint estimation

In joint estimation, in addition to computing NMSEs, we are interested in the evolution of several likelihood values. Attenuation- and activity-reconstruction algorithms are designed to maximize likelihoods given the true value of all other quantities. For scattered data, these are:

$$\begin{array}{*{20}l} \mathcal{L}^{\text{att}}_{\text{scatt}}\left({\rho^{\,\text{est}}}\right) &= \mathcal{L}_{y}\left(\vec{\bar{y}}\left(\lambda^{\,\text{true}}, \rho^{\,\text{est}}\right)\right) \end{array} $$

(21a)

$$\begin{array}{*{20}l} \mathcal{L}^{\text{act}}_{\text{scatt}}\left({\lambda^{\,\text{est}}}\right) &= \mathcal{L}_{y}\left(\vec{\bar{y}}\left(\lambda^{\,\text{est}}, \rho^{\,\text{true}}\right)\right). \end{array} $$

(21b)

However, in a joint-estimation setting, λ^true and ρ^true are generally not available. For the scattered and the true data, respectively, we therefore also track the apparent likelihoods, which are the quantities as seen by the optimization algorithms:

$$\begin{array}{*{20}l} \mathcal{L}^{\text{app}}_{\text{scatt}} &= \mathcal{L}_{y}\left(\vec{\bar{y}}\left(\lambda^{\,\text{est}}, \rho^{\,\text{est}}\right)\right) \end{array} $$

(22a)

$$\begin{array}{*{20}l} \mathcal{L}^{\text{app}}_{\text{trues}} &= \mathcal{L}_{z}\left(\vec{\bar{z}}\left(\lambda^{\,\text{est}}, \rho^{\,\text{est}}\right)\right). \end{array} $$

(22b)

We then study the following combinations of data and algorithms.