Algorithms for joint activity–attenuation estimation from positron emission tomography scatter

Background Attenuation correction in positron emission tomography remains challenging in the absence of measured transmission data. Scattered emission data may contribute missing information, but quantitative scatter-to-attenuation (S2A) reconstruction needs to input the reconstructed activity image. Here, we study S2A reconstruction as a building block for joint estimation of activity and attenuation. Methods We study two S2A reconstruction algorithms, maximum-likelihood expectation maximization (MLEM) with one-step-late attenuation (MLEM-OSL) and a maximum-likelihood gradient ascent (MLGA). We study theoretical properties of these algorithms with a focus on convergence and convergence speed and compare convergence speeds and the impact of object size in simulations using different spatial scale factors. Then, we propose joint estimation of activity and attenuation from scattered and nonscattered (true) emission data, combining MLEM-OSL or MLGA with scatter-MLEM as well as trues-MLEM and the maximum-likelihood transmission (MLTR) algorithm. Results Shortcomings of MLEM-OSL inhibit convergence to the true solution with high attenuation; these shortcomings are related to the linearization of a nonlinear measurement equation and can be linked to a new numerical criterion allowing geometrical interpretations in terms of low and high attenuation. Comparisons using simulated data confirm that while MLGA converges largely independent of the attenuation scale, MLEM-OSL converges if low-attenuation data dominate, but not with high attenuation. Convergence of MLEM-OSL can be improved by isolating data satisfying the aforementioned low-attenuation criterion. In joint estimation of activity and attenuation, scattered data helps avoid local minima that nonscattered data alone cannot. Combining MLEM-OSL with trues-MLEM may be sufficient for low-attenuation objects, while MLGA, scatter-MLEM, and MLTR may additionally be needed with higher attenuation. Conclusions The performance of S2A algorithms depends on spatial scales. MLGA provides lower computational complexity and convergence in more diverse setups than MLEM-OSL. Finally, scattered data may provide additional information to joint estimation of activity and attenuation through S2A reconstruction.


Introduction
Positron emission tomography (PET) is an important noninvasive medical imaging modality for clinical and research applications [1], with particular strengths in sensitive detection of photon pairs emitted by a radiotracer and quantitative reconstruction of the radiotracer activity image λ. PET image reconstruction is usually based on linear models, involving the Radon transform Rλ in the analytic case or discrete mappings of a vector λ in the numerical case, respectively.
For quantitative reconstruction of the activity image, attenuation correction (AC) is essential, compensating for a lack of detected photon pairs along lines of response (LORs) due to the photoelectric effect and Compton scattering in the patient. A complementary step, scatter correction (SC), computes an estimate of extraneous photon pairs along broken LORs, which are generated through Compton scattering. Both corrections usually input the spatial distribution of the electron density ρ in the form of a map of linear attenuation coefficients μ or, for AC purposes, the so-called attenuation sinogram Rμ. Given μ, both AC and SC are state of the art using well-validated algorithms [2,3], but vast research efforts had to be-and still are-directed to the determination of μ.

Determination of an attenuation map
Depending on the level of integration of PET with other modalities (standalone or multi-modality PET), information from radionuclide transmission sources [4,5], X-ray computed tomography (CT) [6], or magnetic resonance imaging (MRI) [7] can be used. However, radionuclide transmission data suffer from low signal-to-noise ratio, necessitating segmentation to prevent noise in the transmission data from impacting activity images. In PET/CT, 4-D attenuation correction of PET data acquired from a moving subject remains limited due to concerns over radiation doses induced by cine CT imaging. In PET/MRI outside of the head/neck area, MRI is often incapable of distinguishing bone from air in reasonable scan times [8].
More universal approaches to determine μ do not depend on multi-modality information. Popularized through maximum-likelihood reconstruction of attenuation and activity (MLAA, [9][10][11]), these algorithms use only PET emission data, replacing the optimization problem in λ by a joint problem in (λ, μ). A recently proposed group of algorithms jointly estimate the activity image and the attenuation sinogram Rμ, either using alternation [12,13] or simultaneous updates [14].
Time-of-flight (TOF) PET emission data determine the attenuation sinogram Rμ, but only on LORs with activity (Rλ > 0) and only up to an unknown offset [15]. The former limitation is not a severe issue for AC, where other values of Rμ are not needed. However, it complicates reconstruction of μ from Rμ and therefore is a problem for SC, where an image-space μ-map is usually required. The latter limitation translates into an unknown scaling factor in the reconstructed λ. For these reasons, AC and SC using only PET emission data are still impractical [16].
Another type of available data is low-energy, object-scattered PET emission data, which may contain enough additional information to address both aforementioned limitations [17]: particularly, in a joint reconstruction scheme [18]. Similar opportunities arise in single-photon emission computed tomography [19][20][21][22]. Unfortunately, the model of the measured PET scatter data is neither based on the regular Radon transform nor linear in μ. A maximum-likelihood gradient ascent algorithm for scatter-to-attenuation (S2A) reconstruction has therefore been proposed [23,24] but, so far, not been used in joint estimation. Most recently, a Broyden-Fletcher-Goldfarb-Shanno (BFGS)-based algorithm has been proposed for attenuation reconstruction from coincidences in a lower energy window [25,26].
The problem of estimating attenuation from scattered PET photons shares similarities with Compton scatter imaging, in which external Gamma sources are used to probe an object's electron density for medical [27] or industrial [28] applications. While it is known from the latter that the nonlinearity of the problem favors thin, low-density objects, the impact of object size in scatter-based PET attenuation correction remains to be studied.

Objectives
This paper is thus concerned with characterizing S2A reconstruction as a building block in joint estimation of activity and attenuation (joint estimation). We follow three objectives: (1) further understand fundamental properties impacting convergence and convergence speed of S2A algorithms; (2) compare S2A algorithms using simulated data, specifically, in terms of convergence speed, the impact of object size, and improved performance of one algorithm by reducing its input data; and (3) study joint estimation, which implies dropping the assumption of known radiotracer activity images in S2A reconstruction [17,24]. Therefore, we integrate scatter data into joint estimation by interleaving S2A reconstruction with trues-to-activity reconstruction, as proposed before [18,29], as well as with trues-to-attenuation and scatter-to-activity reconstruction.
In this algorithmically oriented proof of concept, studies are carried out using 2-D digital phantoms and simulations restricted to single scattering without TOF information. Furthermore, we assume perfect energy resolution that enables ideal separation of scattered and nonscattered events and noise-free data.
After statement of the problem, introducing required imaging models for use in S2A reconstruction and joint estimation, we summarize and propose algorithms for both and describe the evaluation data used and the experiments carried out, before presenting and discussing our results.

Problem statement
This section summarizes notation and models for scattered and unscattered data.

Scattered data for S2A reconstruction
Scatter-to-attenuation reconstruction requires a model of the low-energy, scattered data. Therefore, we identify the coincident detection of two photons along an LOR by the involved detector pair. If exactly one of two detected photons has been objectscattered exactly once, the coincidence is said to be single-scattered and the energy of that photon is denoted E. We denote the respective detector d s (scattered) and the other d n (nonscattered). Thus, a tuple i = (d s , d n , E) comprises all properties of a single-scattered coincidence used in this work; l = (d s , d n ) denotes a regular LOR.
The trajectory of both photons is a broken LOR as shown in Fig. 1, connecting a scattering location x s with both detector locations. Unfortunately, many different broken LORs, in particular, having different scattering locations, yield the same apparent LOR l, so that the photon trajectory, and in particular, the scattering location, Fig. 1 Front, side, top, and oblique views of SORs for coincidences detected in opposite detector elements on a detector surface (gray) for three different energies of the scattered photon: 460 keV (innermost, darkest), 358 keV (middle), and 307 keV (outermost, lightest). Assuming the scattered photon in the left detector, one potential broken LOR is indicated, with the solid part indicating potential activity source locations. Each broken LOR runs inside its SOR, touching it only at the detectors and one potential scattering location. Adapted, with permission, from ([17], Figure 2). © 2014 American Association of Physicists in Medicine cannot be determined from l. It is known, however, that the true x s lies on an American-football-shaped surface of revolution, with the pointed ends in the detector locations and the radius determined by E. We use i to denote this surface of response (SOR), comprising all possible scattering locations for a single-scattered coincidence 1 . For each scattering location x s in SOR i, (i, s) describes one potential broken LOR.
In list-mode acquisition, the raw scatter data is a sequence (i 1 , i 2 , . . . ); after histogramming, the data is the number of detected single-scattered coincidences for each possible i. Here, y i denotes the simulated or measured data on SOR i, whileȳ i is used for the expected data. The dimension of the data space is N i ≤ N 2 d × N E , with N d detectors and N E energy bins (or equivalently, energy windows) 2 .
Voxels are indexed according to their physical roles using e (emitting), s (scattering), and t (transmitting). A 2-D matrix A λ (with entries a i s ) describes the sensitivity of the PET camera on SOR i for radiation scattered in a voxel s in the absence of attenuation; it integrates both normalized camera sensitivity (scatter geometry, photon detection efficiency) and the object's source density λ, as detailed in the Appendix. A 3-D tensor K (with entries k i s,t ) represents the attenuating path length of that radiation through a voxel t, independent of the object.
The expected number of low-energy scatter coincidences ȳ, which is linear in the activity λ, is modeled according to a discretized variant of the scatter-measurement equation (23), a generalization of the single scatter simulation (SSS) equation [3]. Using the notation in Table 1, we write the discrete measurement equation as: or, in matrix notation, denoting the element-wise operations by and • exp: The above expressions are equivalent to (41) and (42), respectively. Their derivation is subject to the following assumptions: Effective intersection length of photon path along the broken LOR (i, s) with the transmitting (or attenuating) voxel t, taking photon energy into account Poisson log-likelihood (LL) given the measured data y S(•, x true ) Normalized mean squared error (NMSE) with respect to a reference x true a These parameters are assumed to be constant within one iteration, but can be updated between iterations k i,e s,t = k i s,t Effective attenuation lengths of a voxel t seen by photons along a broken LOR (i, s) are independent of the point of emission e along that broken LOR: this is a common assumption in PET that has been fundamental in showing that TOF PET data determine the attenuation sinogram up to a constant [15]. μ ∝ ρ The linear attenuation coefficient μ is proportional to the electron density ρ: approximately, this is true because in biological (low-Z) materials at PET energies, Compton scattering is the dominant interaction preventing gamma photon pairs from being detected. At fixed energy, the ratio μ/ρ can be formulated to depend on the mass attenuation coefficient (μ/ρ m ) and the quotient of mass density and electron density (ρ m /ρ). The former is fairly constant across human tissues at PET energies ( [30], Fig. 3), and the latter is almost perfectly constant for materials less dense than water and deviates a maximum of 10% for materials three times as dense ( [31], Fig. 1). Note that μ/ρ may well depend on the photon energy; no assumption about the energy dependence of μ is implied (see the discussion around (35)).
If we assume that the electron density ρ is known accurately enough to approximate attenuation effects, as we will for one algorithm, we can simplify (2) further. That is, with an estimate ρ est and using the abbreviationÃ λ,ρ := A λ • exp(−K ρ), we find the linear mapping: We denote attenuated system matrices by a tilde (see Table 1;ã for components) and refer to (3) as the linearization of the scatter measurement equation.

Unscattered data for joint estimation
Our joint estimation approach requires, additionally, a model of the nonscattered data and three well-known algorithms. We assume the LOR model with: where z is the expected nonscattered data, U is the LOR system matrix without attenuation, i.e., the usual system matrix applied for the usual PET reconstruction (see Appendix), andŨ ρ the attenuated one; u l j andũ l j represent entries of U andŨ ρ , respectively, for LOR l and voxel j.
These nonscattered (true) data are used by maximum-likelihood expectationmaximization [32], which we refer to as trues-MLEM: and by the relaxed maximum-likelihood transmission algorithm [33] (trues-MLTR): In addition, scatter-MLEM [34] will be used, of which a brief derivation is given in the Appendix. This algorithm's update equation reads:

Methods and materials
In this section, we summarize two recent S2A algorithms and then look at fundamental differences between them. We then propose two novel joint estimation approaches using these algorithms and present our evaluation strategy.

Gradient-based algorithms for S2A reconstruction
In previous work, we introduced the two-branch back-projection (2BP) algorithm [17] which chooses between a positive and a negative update of ρ in a binary random-walk fashion. Since we found this algorithm to be impractical for most applications [24], we focus on two gradient-ascent-based algorithms here. The Poisson log-likelihood (LL) of some expected data ȳ, given the data y and omitting terms that do not depend on ȳ, reads: with its gradient with respect to a vector ρ For the linearization (3), sinceÃ λ,ρ est does not depend on ρ, we find the gradient of the expected data to be: By contrast, observing the double dependence of (2) on ρ, one finds: instead, which simplifies to (10) only under ρ = 0 or K = 0 (nonscattering or nonattenuating object) and ρ est = ρ. This vectorial expression lends itself particularly well to an implementation in MATLAB (The MathWorks, Natick, MA); note that the multiplication with ρ (from the left) denotes a summation over the scattering voxels s, as indicated in the component-wise expression:

Scatter-to-attenuation MLEM with one-step-late attenuation (MLEM-OSL)
This algorithm, called MLEM by its authors [18], is based on subsuming attenuation effects under the system matrix in the linearized measurement equation (3), yielding the MLEM update [32]: However, this update ignores the fact thatÃ λ,ρ depends on ρ, andÃ λ,ρ has to be updated after every iteration: Eq. (13) follows the spirit of the so-called one-step-late (OSL) algorithms [35], and we will refer to it as MLEM-OSL here.

Maximum-likelihood gradient ascent (MLGA)
The MLEM(-OSL) update (13) can be written as a scaled gradient ascent, with the gradient given by Eqs. (9 and 10) and a vector-valued step size [36]: The closed-form expression of MLGA [23,24] is obtained from (14b) by inserting the full log-likelihood gradient (9 and 11) and choosing a step size. In this work, we focus on a step size inspired by the MLEM update equation (14a): In addition to this MLEM-like step size, two additional step sizes have been tested: the constant step size, proposed before [24], and the scaled nonuniform step size: Step-size constants α, β, and γ have been optimized empirically for fastest, yet stable convergence with our data.

Validity of linearizing the scatter measurement equation
The amount of scatter as a function of electron density may not be sufficiently well represented by a linearized measurement equation, and (2) may require more careful treatment. To explore the limits of the linearization, we derive a geometrical interpretation as well as a numerical criterion. This criterion is used to distinguish data that can be used for algorithms based on linearization (here, MLEM-OSL) from data that cannot; further, it is linked to the log-likelihood gradient (11).
The most basic, one-dimensional (1-D) simplification of (2): confirms that no possible linearization in the form of (3): reflects the behavior of (17) with high attenuation (see Fig. 2): in particular, the derivative of (18) misrepresents the sign of the derivative of (17) for kρ > 1. This may have drastic implications for gradient-based algorithms using the linearization to compute the gradient: in particular, if ρ n > ρ true > 1/k, an iteration of MLEM-OSL yields ρ n+1 > ρ n regardless of the value of ρ est .
Comparison of the gradient of the linearization (10) with the full gradient (11) reveals the advantage of MLGA over MLEM-OSL; the difference term − ρ {K [ . . . ] } reverses the direction of the full gradient (only) with high attenuation, all components of ρ and K being nonnegative 3 .
A multi-voxel interpretation of the high-attenuation situation is presented in the Appendix: it is of importance in patients with great attenuation-length-electron-density products ρ K 4 . One conclusion from the arguments in the Appendix is that it is not straightforward to downsample (or downsize) S2A experiments, as that can transform low attenuation into high attenuation (by downsampling), or vice versa (by downsizing) 5 .
For MLEM-OSL, the linearization of the measurement equation may only be appropriate whenever attenuation effects do not reverse the sign of ∂ȳ i /∂ρ j . Since this may be true for some SORs i, but not for others, it may be appropriate to remove the latter from the data and apply MLEM-OSL to the reduced data set: (46) represents an approximate inclusion criterion used later.

n-algorithms for joint estimation
Up to this point, we have focused on dedicated S2A reconstruction algorithms, assuming knowledge of the activity distribution; in this section, we drop this assumption and extend our studies to joint estimation of activity and attenuation using scattered as well as nonscattered data. The added value of combining scattered and nonscattered data is visualized in the Appendix (Fig. 16).
We use five building blocks: the aforementioned MLGA with MLEM-like step size for S2A reconstruction, henceforth referred to as scatter-MLGA; scatter-MLEM-OSL as an alternative; scatter-MLEM for scatter-to-activity reconstruction; trues-MLEM for trues-to-activity reconstruction; and trues-MLTR for trues-to-attenuation reconstruction. Combinations of n individual algorithms form n-algorithms.
As for S2A reconstruction, we distinguish two main cases: low and high attenuation. The general data flow per iteration is similar for both cases and is visualized in Fig. 3; radiotracer activity distribution λ and electron density ρ are repeatedly updated using the current estimate of the respective other quantity. For both cases, we re-optimized the MLGA step sizes to achieve stability.

2-algorithms for low attenuation
The idea of this subsection has been presented before [18,29]. For low attenuation situations (e.g., with a spatial scale factor of 0.2 in Fig. 4), we interleave trues-MLEM with scatter-MLGA. The data flow in this part is similar to that proposed earlier [18]. We start with initial guesses for ρ and λ. In each iteration, plugging the current electrondensity estimate ρ intoŨ ρ , we use trues-MLEM to update the current activity estimate Fig. 3 One iteration of joint estimation: an electron density estimate is used for attenuation correction in MLEM scatter-to-activity (step 1) and subsequent trues-to-activity (step 2) updates. Then, the activity distribution estimate is used as a source term in MLGA scatter-to-attenuation (step 3) and MLTR trues-to-attenuation (step 4) updates. In low-attenuation cases, only steps 2 and 3 are used in each iteration, while all steps are used in high-attenuation cases and c initial μ-maps, respectively, in 1/cm; d true and e initial activity distributions, respectively, in arbitrary units; the initial activity distribution is used only for joint estimation (see the "n-algorithms for joint estimation" section). Human-sized phantom, axes scaled in cm λ using the nonscattered data z; then, we use the updated activity estimate to compute scatter-MLGA updates of ρ.
In this part of the study, we aim to minimize the number of computationally expensive updates of the system matrixÃ λ . Therefore, we run 10 iterations of attenuation-corrected (using the current ρ) trues-OSEM (with 4 data subsets) at a time, followed by 10 iterations of scatter-MLGA with 4 data subsets (the use of subsets in scatter-MLGA being studied in detail elsewhere [24]). This low-attenuation 2-algorithm is summarized as (trues-OSEM 10 4 ) + (scatter-MLGA 10 4 ), with a total of 20 sub-iterations per iteration. Since MLEM-OSL can replace scatter-MLGA for low attenuation, we also run (trues-OSEM 10 4 ) + (MLEM-OSL 10 ) on the same data for comparison.

4-algorithms for low or high attenuation
For high attenuation (e.g., Fig. 4 at the original, that is, human spatial scale), we find it necessary to further consider activity information contained in scattered coincidences, as well as attenuation information contained in true coincidences. The former is achieved by the scatter-MLEM algorithm, the latter with the trues-MLTR algorithm with a relaxation factor of η = 0.03. Iterations of different algorithms updating the same quantity are considered as one sub-iteration; all updates are applied subsequently (e.g., the trues-MLEM update used the estimate of λ as updated by the previous scatter-MLEM update; see Algorithm 1). Not using any subsets, the high-attenuation 4-algorithm is noted (scatter-MLEM + trues-MLEM) + (scatter-MLGA + trues-MLTR), with two sub-iterations per iteration.

Evaluation data
We simulate data based on an 18 × 18-voxel version (high resolution, Fig. 4) of the humansized chest cross-section phantom used previously [24], as well as the original one (9 × 9 voxels, low resolution, Fig. 14a). For the former, the voxel size is 25 × 25 mm 2 and the radius of the 2-D PET scanner used to simulate a PET acquisition is 40 cm. For a rat-sized field of view (FOV), the phantom (and the scanner geometry) are uniformly scaled down Algorithm 1 The joint 4-algorithm with updates in simplified notation 1: Fig. 4(c) and 4(e)) 2: for each iteration do 3: λ ← λ × (A(y/ȳ))/(A1) / / Sub-iteration 1: scatter-MLEM (7) 4: (14) 6: by a factor of 0.2, all size relations remaining identical (5 × 5 mm 2 pixel size, 8-cm detector radius) 6  Single-scattered data is simulated by evaluating (2). Nonscattered data is simulated by (4) using a system matrix U, each column j of which is constructed from the result of the MATLAB radon function [37] for a unity point source in j.
For all algorithms, the initial guess of ρ generously bounds the true object and is filled with the equivalent of μ = 0.07/cm (Fig. 4c): this value ensures approximately correct attenuation correction factors for the first iteration of trues-to-activity reconstruction. For joint estimation, the initial activity is homogeneous throughout the FOV (Fig. 4e).

S2A reconstruction
The first part of this comparison of MLGA and MLEM-OSL is along the lines of earlier work comparing MLGA with 2BP [24], using additional simulation data with higher numbers of voxels than before. Therefore, both algorithms are applied to the (low and high resolution) data described above. Due to the small number of voxels, specific features of reconstructed images are of less interest; for the agreement between reconstructed images x with their respective references x true , we therefore report normalized mean squared errors (NMSE): FOV size variations Both algorithms are applied to the data simulated at all three spatial scales (human: scale 1; rabbit: scale 0.35; rat: scale 0.2).
Reduced data For MLEM-OSL, data is reduced by separating SORs into useful and less useful ones based on the aforementioned criterion, useful ones fulfilling: This criterion is evaluated using the current estimate of ρ in every iteration. SORs i which are to be left out are removed both from the data y (removing single data points) and the system matrix A λ (removing whole rows), and all computations are carried out with these reduced variables when working with reduced data.
Computational complexity and sparsity Computational complexity of both algorithms is assessed by measuring run times on a consumer-grade laptop (Intel Core 2 Duo 2.8 GHz processor, 4 GB memory). Therefore, the simulation parameters are varied in two ways. First, with the number of voxels fixed at low resolution, we vary the number of detectors following N d = 2 n with n ∈ {1, . . . , 7}. Second, with the number of detectors fixed (at N d = 32), we vary the number of voxels following 2 n × 2 n with n ∈ {1, . . . , 5}; in terms of vector lengths, that corresponds to N e = N s = N t = 4 n . When varying the number of voxels, the voxel dimensions are adapted to maintain a constant spatial extent of the phantom.
For this part of the study, we choose constant activity and attenuation distributions (λ j = 1, μ j = 0.1/cm), with an initial μ j = 0.05/cm. Since this choice implies a maximum population of the system matrix A λ , we also determine what we term the geometrical density (fraction of non-null entries with flat activity) of A λ and K , respectively, which represent upper bounds for cases with less extended activity distributions.

Joint estimation
In joint estimation, in addition to computing NMSEs, we are interested in the evolution of several likelihood values. Attenuation-and activity-reconstruction algorithms are designed to maximize likelihoods given the true value of all other quantities. For scattered data, these are: However, in a joint-estimation setting, λ true and ρ true are generally not available. For the scattered and the true data, respectively, we therefore also track the apparent likelihoods, which are the quantities as seen by the optimization algorithms: We then study the following combinations of data and algorithms.

4-algorithms for low and high attenuation
We first compare the 4-algorithm, (scatter-MLEM + trues-MLEM) + (scatter-MLGA + trues-MLTR), to the 2-algorithm in terms of performance on the high-resolution, low-attenuation data; then, we apply only the 4-algorithm to high-resolution, high-attenuation data.

4-algorithm to resolve MLAA crosstalk
During initial studies with a low-resolution object at the human scale (Fig. 14a), traditional MLAA (trues-MLEM 1 1 + trues-MLTR 1 1 ) converged to an apparent local maximum of the likelihood (Fig. 14c). Therefore, we use the values of ρ and λ at this point to initialize the 4-algorithm.

Implementation
All algorithms are implemented in MATLAB (R2019a; The MathWorks, Natick, MA, USA). The Appendix describes the use of sparse matrices in evaluating likelihoods and gradients. In trues-and scatter-to-attenuation algorithms, instead of adding nonnegativity constraints, we set ρ j ← max{0, ρ j } after each update.

S2A reconstruction
In this section, we verify the theoretical findings using NMSEs of reconstructed ρ-maps, of which we present some examples in Fig. 5. Figure 6a shows the NMSEs of ρ for the human-sized phantom and system, for both low and higher resolution. All variants of MLGA converge to the correct solution as all NMSE curves tend to zero, while MLEM-OSL does not; MLGA with the MLEM-like step size is the fastest algorithm in both cases. Figure 6b and c show the data for the same phantom and system at rabbit and rat sizes, respectively. In these cases, all algorithms converge to the correct solution; generally, MLGA with the MLEM-like step size is among the fastest.

FOV size variations
As summarized in Fig. 7a, MLEM-OSL converges well with a rat-sized FOV, less rapidly (and nonmonotonously) so with a rabbit-sized FOV, and not at all with a human-sized FOV despite otherwise identical simulations. This dependence of convergence, and convergence rates, on the spatial scale of the simulated phantom is less pronounced with MLGA which, even with a constant step size as an example, converges faster than MLEM-OSL in most cases. Figure 7b verifies the hypothesis that MLEM-OSL scale dependence (and hence convergence) is improved by ignoring high-attenuation SORs in the data and the system matrix using (20). In fact, decreasing the dimensionality of the problem in this way leads to an increase in convergence speed for MLEM-OSL.    Tables 2 and 3 summarize the density of K and A λ for the experiments shown in Fig. 8. Generally, the fraction of nonzero entries in both quantities decreases with increasing numbers of detectors or voxels.

2-algorithms for low attenuation
The resulting images of the 2-algorithm for the low-attenuation phantom are shown after 100 and 1000 sub-iterations, respectively, in Fig. 9a and b. For the evolution of true and apparent likelihoods, we refer to Fig. 10: this plot shows the NMSE of μ and λ as a function of sub-iterations, indicating the alternating updates of activity and attenuation, and similarly, the ideal log-likelihoods (LL) using the current estimate of one quantity and the true value of the respective other quantity, respectively. Finally, the apparent LLs of scattered and nonscattered data, based on both estimated activity and estimated attenuation, are plotted.
In summary, the 2-algorithm converges towards the true activity and attenuation, even though true NMSE and apparent LL curves are nonmonotonous in parts. Figure 11 confirms that MLEM-OSL can replace scatter-MLGA in joint estimation, at low-attenuation and at the cost of reduced convergence speed. Figure 12 shows the results for the 4-algorithm in the low-attenuation case. Due to the fact that each iteration consists of only 2 sub-iterations (2 updates each), the LL and NMSE curves appear smoother than the same curves for the 2-algorithm.

4-algorithm for low or high attenuation
Applied to the same phantom at high attenuation, the 4-algorithm converges slower, but in a similarly smooth way as for low attenuation to the true activity and attenuation   (Figs. 9c, d and 13): some nonmonotonicity remains, both at the sub-iteration level (for apparent LLs) and at the scale of dozens of sub-iterations (e.g., the true-activity LL early on or the true-attenuation LL, for which we note that it increases for later sub-iterations).

4-algorithm to resolve MLAA crosstalk
Finally, Fig. 14 shows the results of the crosstalk study. While the MLAA 2-algorithm is stuck in an apparent local maximum (Fig. 14c), 7 the proposed 4-algorithm is able to not only avoid, but escape from this local maximum and converges towards the true solution (Fig. 14d).

Discussion
We have studied reconstruction of attenuation information from scattered coincidences in PET. Unlike other problems regarding reconstruction of activity or attenuation from nonscattered or scattered coincidences, the problem at hand is unique in that the reconstructed quantity appears twice in the measurement equation (2). The problem is therefore nonlinear, with the degree of nonlinearity depending on attenuation, which in mostly water-like objects implies spatial scale 6 .

S2A reconstruction
We have interpreted a recent take on the problem [18] as MLEM with a one-steplate update of the attenuated system matrix. MLEM-OSL, ignoring the dependence of A λ,ρ on ρ in computing a new estimate, has been derived by linearizing the nonlinear measurement equation; however, the fact that MLEM-OSL thus relies on a linear relationship between electron density and scattered coincidences impacts performance with high attenuation (Fig. 7a). An important result is therefore the characterization of the spatial-scale problem, which is complementary to the intensity-scale problem described earlier for joint estimation of activity and attenuation from only nonscattered data [15,39]. Another result is a potential nonuniqueness of the isolated S2A problem indicated in Fig. 16: fortunately, the same figure indicates that combined scattered and nonscattered data do not necessarily feature the same nonuniqueness. Also, we hypothesize that additional voxels and detectors help further resolve nonuniqueness. We have studied a maximum-likelihood gradient-ascent method for attenuation-map reconstruction based on the full, nonlinear data log-likelihood. Since the step size ignores the dependence ofÃ λ,ρ on ρ, MLGA does not feature the provable monotonicity of the likelihood that MLEM offers in activity reconstruction from nonscattered coincidences. However, MLGA has advantages over MLEM-OSL, where ignoring said dependence leads to a wrong direction of the update and results in instabilities: the impact on MLGA is noticeably smaller. Application to simulated data confirmed that in larger objects, MLGA outperforms MLEM-OSL. Nonetheless, MLEM-OSL is a simple and fast algorithm for rabbit-or rat-sized objects, while MLGA may require additional speedup [24].
By characterizing high attenuation, we have found a criterion to separate low-from high-attenuation data and improve MLEM-OSL convergence speed (Fig. 7b). Thus, one strategy to decrease the size of system matrices and tensors, and thus computational complexity, lies in choosing the most useful SORs from the full data. Here, we only briefly   [17,24].
While MLGA converges, we find it to be more computationally complex than MLEM-OSL (Fig. 8). Also, MLEM-OSL with reduced data is more complex than using the full data: this could be remedied by stopping to re-evaluate (20) after some iterations.

Joint estimation
MLGA, or MLEM-OSL, are not primarily meant as stand-alone algorithms, as they assume knowledge of the unknown λ, mandating a joint (λ, ρ) estimation scheme. This scheme is similar to traditional MLAA for estimation of (λ, μ) from nonscattered coincidences, where knowledge of λ is assumed by MLTR. Along the same lines, we have focused on using MLGA in joint n-algorithms. Just as MLAA iterates back and forth between MLEM and MLTR, reconstructing one quantity (λ or μ) while keeping the other (μ or λ, respectively) fixed, our proposed 2-algorithm iterates back and forth between trues-MLEM and scatter-MLGA. Another viable scheme encompassing all available data has been presented in the form of the scatter-MLEM/trues-MLEM/MLGA/MLTR Fig. 14 Crosstalk study of a low-resolution, high-attenuation (human-sized) phantom (as in Fig. 4): a true activity and attenuation; b activity and attenuation used to initialize MLAA; c activity and attenuation after apparent MLAA convergence, used to initialize the 4-algorithm; and d activity and attenuation after 1000 sub-iterations (500 iterations) of the 4-algorithm 4-algorithm, using both true and scattered coincidences for estimation of both λ and ρ.
For low-attenuation data, the simple 2-algorithm may be sufficient: in this case, MLEM-OSL may serve as a drop-in replacement for MLGA, however with decreased convergence speed (compare Figs. 10 and 11). This result is compatible with the results for S2A reconstruction (Fig. 6b and c). The more sophisticated 4-algorithm enables joint reconstruction with high-attenuation as well as low-attenuation data. In addition, this 4-algorithm can employ the scatter information to escape from a nonoptimal fix point of MLAA (Fig. 14d).
The plots of apparent (using estimated activity and attenuation) and ideal likelihoods indicate several nonmonotonicities that are overcome by the combination of algorithms. In particular, in the 2-algorithm, increasing the apparent scatter likelihood with MLGA updates of the attenuation often decreases the true-activity scatter likelihood, limiting the number of repeated MLGA updates that can be concatenated. This observation is one reason for choosing only a single update of each algorithm in each iteration of the 4-algorithm (Algorithm 1).

Limitations
This study has several limitations in the simplicity of the simulations: in particular, in neglecting detector scatter, multiple scatter, and energy-measurement uncertainties; in using the same forward model for the simulation as for the reconstruction; and in using low-dimensional objects and scanners.
In reality, the use of scattered photons is complicated by the fact that the detected signal of a nonscattered 511 keV photon, when it deposits only part of its energy in the detector, resembles that of a lower-energy, object-scattered photon [34]. One solution may be the use of an object-scatter energy window above the Compton edge (at 341 keV) and below the photopeak, which is virtually free of detector-scattered photons ( [40], Fig. 1). This highest-possible energy window also has a lower contribution of multiply scattered photons [34].
Furthermore, photon energy measurements suffer from uncertainties in the range of 10% FWHM in state-of-the-art PET scanners. This uncertainty leads to blurred estimation of potential scattering locations in the object. So far, it is unclear exactly which energy resolution is required to successfully use this approach in practice, although some comparisons have been made for S2A reconstruction ( [17], Fig. 11). In joint nalgorithms, separation of scattered and unscattered photons will be of importance in all sub-algorithms. New detector materials, such as LaBr 3 [41] or cadmium zinc telluride [42], might be needed.
Another limitation is that specific findings may not be generalizable to arbitrary scanner geometries; for example, 3-D geometries may exhibit different, presumably much sparser system quantities A λ and K . We expect that this increasing sparsity partially offsets the (otherwise unmanageable) size increase of these quantities with growing number of voxels, detectors, and energy bins.
It should be noted that due to computational complexity, 2-D considerations are not uncommon in recent studies regarding image reconstruction from scattered photons [20,26,43]. Furthermore, a more sophisticated imaging model that offers more realistic system matrix components A λ and K would be subject to the same measurement equations and lead to the same derivation of MLGA. So while specific convergence rates may vary with the density and condition number of those quantities, we expect the overall conclusions to prevail in more realistic settings. Finally, noise will have to be considered in future studies; currently, it is challenging to determine realistic noise levels for these nonrealistic types of objects and scanners.

Outlook
In this paper, we have used MLGA as a S2A building block in the context of joint estimation. It might be possible to find improved algorithms: for example, one might pursue one of the many paths that lead to the MLEM update equation for an algorithm which features more of the well-known properties of MLEM. This may include the minorize/maximize (MM) algorithm [44], of which regular MLEM is one special case. Following earlier incomplete-data formulations [32], one might define complete data that involve not only the emission location, but also the scattering location of every coincidence; this may result in a formulation similar to that for joint estimation from nonscattered data [45]. Algorithms that use the formulation of the Hessian of the log-likelihood may also be of value without requiring inversion of the full Hessian during image reconstruction, as has been shown recently [25,26].
TOF information might improve S2A reconstruction by increasing the sparsity of K (as some scattering locations on the surface of the football may not be compatible with the emission locations indicated by a TOF measurement) and further improving the condition of A λ (by reducing the number of emission voxels over which to compute e b i s,e λ e ). Our study does not simulate, or incorporate, TOF measurements, as the amount of additional information from TOF is nonetheless limited in attenuation reconstruction compared to activity reconstruction: even with perfect TOF information, the surface of potential scattering locations (and hence the density of K ) will hardly be reduced by more than a few times, on average, as most broken LORs compatible with a non-TOF coincidence will also be compatible with the TOF coincidence. Therefore, the primary way for TOF information to find its way into this problem may be through activity-reconstruction building blocks (TOF-trues-MLEM and TOFscatter-MLEM) and the estimate of the activity distribution they provide-similar to how TOF-MLAA benefits from TOF information without the MLTR algorithm using it explicitly.
Regarding the impact of the results outside of PET imaging, we have achieved a definition of data being more or less compatible with a linearization of the measurement equation that may be applied in external Compton scatter imaging, in which a number of ways have been tried to solve a structurally similar measurement equation [28]-compare in particular (2) to ( [46], Eq. 3) or ( [47], Eq. 1). Furthermore, full knowledge of the "source" distribution is given in CT and other transmission imaging modalities, where scattered radiation could be similarly exploited if discerned by energy measurements, such as in multi-energy (spectral) CT [48].

Conclusion
In reconstruction of attenuation information from scattered PET coincidences, maximum-likelihood gradient-ascent algorithms provide faster convergence and convergence in more diverse setups than MLEM-OSL, for which we have presented both analytic and experimental evidence: MLGA converges across all spatial scales, while MLEM-OSL may only converge with smaller objects. Nonetheless, MLEM-OSL can be a lower-complexity alternative to MLGA. We have defined a numerical criterion to determine when the simpler and more efficient MLEM-OSL can be used and described how its performance can be improved by reducing data based on said criterion. Finally, joint estimation of activity and attenuation from scattered and nonscattered coincidences has been presented using either MLGA or MLEM-OSL, in particular, in an example where MLAA fails to converge to the correct solution.

Appendix Derivation of discrete measurement equations
In the previous notation of the measurement equation ( [17], Eq. 5) and its derivation ( [49], Eq. 11), the continuous measurement equation is: We refer to the original publications for more context of this equation:N S (d 1 , E 1 , d 2 ), number of scattered coincidences detected in detectors d 1 (with energy E 1 ) and d 2 ; T, acquisition time; E 1 , energy bin width; b, length of baseline connecting d 1 and d 2 ; θ 1 , scattering angle associated with E 1 ; E 0 , 511 keV; σ KN , Klein-Nishina scattering crosssection; ω S , ϕ S , angles parameterizing the surface of scattering locations x S ; ρ e , electron density; λ, radiotracer density; L, a line segment connecting two points; 1 , 2 , detector solid angles seen from x S ; μ, μ E 1 , linear attenuation coefficient at E 0 and E 1 ; E, photon detection sensitivity as a function of detector element, photon energy, and angle of incidence. While most terms have direct physical interpretations: (cos ω S − cos θ 1 ) 2 sin 4 θ 1 and (cos θ 1 − 2) 2 E 0 · sin θ 1 (24) arise from the Jabobian determinant of the coordinate transformation from Cartesian coordinates to surface parameter angles and account for a change of variables from polar scatter angle to photon energy, respectively [49].
In discretizing and changing notation, we apply the following mapping: Two remarks are in order. First,ĉ i s serves a double purpose of both defining the domain of summation over potential scattering locations x s (28) and including weighting factors (29). Due to the former, it can be stored efficiently as a sparse matrix; this holds true for c i s,e and l i s,t as well. Second, in the mapping (33), l i s,t represents an effective attenuation length by encoding the differences in attenuation coefficients seen by scattered photons of different energies (μ E at lower energy E) compared to nonscattered photons (μ = μ 511keV ). In other words, a voxel can have different lengths along different broken LORs.
This eliminates the need to store different attenuation coefficients for the same voxel, assuming that μ E /μ is known. If one has μ ∝ ρ, that is, μ E (ρ) = α(E) · ρ, one finds which can be determined knowing only E. Since i comprises the scattered photon energy E (25), and each broken LOR (i, s) defines which voxels t are part of the respective lowerenergy section (Fig. 15a), the effective attenuation lengths l i s,t can be computed. While this reasoning confirms that the shape of (2) does not impede taking into account energydependent attenuation, we did not do so here.

Compression and simplification
In the above way, fully accounting for all terms and operators in (23), we obtain: While in terms of computational complexity, it may be beneficial to compute c i ,ĉ i s , and c i s,e separately; with regard to storage and algorithmic simplicity, one may prefer combining all these terms into b i s,e = c i ·ĉ i s ·c i s,e . Continuing to assume that μ ∝ ρ, we set to find This equation forms the basis of both activity and attenuation reconstruction.

Activity reconstruction
For the simpler case of activity reconstruction [34], we consider ρ as a parameter for attenuation correction. Then, changing the order of summation in (38): we recognize that (39) can be written as a matrix-vector product: whereÃ ρ is the matrix with entriesã i e := s ρ s · b i s,e · exp − t k i s,t · ρ t . Knowing (an estimate of ) ρ, the linear nature of (40) allows application of the regular MLEM algorithm [32] to reconstruct or update λ, as has been described by [34].

Attenuation reconstruction
Focusing on attenuation reconstruction, we consider λ as a parameter instead. Then, b i s,e and λ e can be combined into A λ with entries a i s := e b i s,e · λ e , yieldinḡ By using element-wise operations ( and • exp) and by defining 3rd-order tensors B and K and the matrix A λ := B λ, (38) and (41) can also be written as: respectively, where the latter is (2) as was to show.

Unscattered coincidences
For (mostly) unscattered data, where the one value of E represents the photopeak energy window, an SOR reduces to an LOR, andã i e is replaced byũ l j for activity reconstruction; similarly, in attenuation reconstruction, a i s is replaced by u l j .

Implementation using sparse matrices
K and A λ are highly sparse, and so is K ρ. However, since exp 0 = 1, exp(−K ρ) is not sparse, which impedes storage of intermediate results as sparse matrices. Hence, we rewrite: and use the expm1 function [50]

Multi-voxel interpretations of high attenuation
Motivated by the 1-D results around (17), we are interested in the sign of the gradient of the expected data. In component-wise notation, (12) reads: with δ sj the Kronecker delta. Based on the sign of this expression, we can identify distinct regions of the football-shaped SOR in Fig. 1. Figure 15a shows half a cross-sectional plane through the football, 8 determined by d s , s, and d n . On the inside and outside of the football, the sign is determined mainly by the quantities a i j and k i s,j , which represent the scattering contributions of a voxel j along an SOR i and the attenuating contributions of a voxel j along broken LORs (i, s), respectively.
Outside When both a i j = 0 and s k i s,j = 0 (⇔ k i s,j = 0 ∀s), we find ∂ȳ i /∂ρ j = 0. The voxel j does not contribute to a measurement along i, neither through scattering nor through attenuation; in Fig. 15a, this region of voxels j corresponds to the violet area outside the football. Changing ρ j does not influenceȳ i .

Inside
The football's strict inside contributes to a measurement along SOR i only through attenuation. We have a i j = 0 (voxel j not contributing scatter to this SOR), but s k i s,j > 0 (voxel j attenuating on at least one broken LOR of this SOR). These are red and blue areas in Fig. 15a: red highlights where k i s,j > 0 for one particular s; blue areas where k i s,j = 0, but k i s ,j > 0 for at least one other s = s. The result is ∂ȳ i /∂ρ j < 0: increasing ρ in j decreases the expected number of coincidences on i.
Thus, when taking these attenuating effects into account, e.g., in an iterative update, it would be more appropriate to speak of a volume rather than a surface of response; we still hold on to the term SOR here.
Surface Defined by a i j > 0 and marked green in Fig. 15a, the football's hull is the most interesting segment. These voxels are the only ones with scattering contributions, with potentially additional attenuating contributions (overlap between green and red regions). If we had s ρ s k i s,j = 0, then ∂ȳ i /∂ρ j ≥ 0 since j would only contribute through scattering. However, with a i j > 0, we certainly have k i j,j > 0: a scattering voxel is always on the broken LOR through it, and hence attenuating. Thus, if ρ j > 0 (which means j lies within the object), the sign of ∂ȳ i /∂ρ j is more complex to determine. Besides that, we have only little general information about k i s,j for s = j; this greatly depends on the object (low vs. high attenuation), the geometry of i and (i, s), as well as the discretization strategy. We therefore aim for a definition of "high attenuation. " For now, we will therefore ignore differences between voxels that are both inside the object (ρ j > 0) and on the surface of SOR i (a i j > 0), and refer to them by their number n and averagesā,k, andρ. This way, we simplify (44) to: By setting k i s,j = 0 ∀j when a i s = 0 (no attenuation along broken LORs through scattering voxels s geometrically incompatible with an SOR i), we simplify the sum over s: on the hull (a i j > 0), the expression determining the sign of (44) is then: the multidimensional (size of a system matrix) analog of 1 − kρ. Note that K ρ (or t k i j,t ρ t ; used in (2) and the attenuation-factor expressions of (11 and 12)) is a matrix of radiological paths, so in terms of units of measurement, so is ρ K (or s ρ s k i s,j ; used in (11) and, in the following, (20)). However, there are fundamental differences: the former is the weighted sum of the red voxels in Fig. 15a [over all voxels t that attenuate for one specific broken LOR (i, s), weighted by ρ t ], and as such,is the line integral of the (discrete) attenuation coefficient along the broken LOR. By contrast, the latter has a more complex geometrical interpretation (see Fig. 15b, bottom): it is the sum of the attenuating contributions of the same voxel t to all possible broken LORs (i, s), weighted by ρ s , the electron density in each broken LOR's scattering voxel s. Low attenuation, in this case, corresponds to a small number of scattering voxels which are impacted by one attenuating voxel j, and thus a small overlap between scattering (green) and attenuating (red, blue) voxels across all broken LORs of an SOR.
In the minimal overlap case depicted in Fig. 15a, a voxel j on the football's surface can have attenuating contributions to one of only three locations of each broken LOR (i, s), namely at d s , s, and d n (where d s and d n will be outside of the unknown object). However, significant additional overlap may be the result of a number of factors: SORs with low curvature, that is, small scattering angle θ and large SOR radius R = b/(2 sin θ); scattering points s close to d s or d n ; consideration of energy uncertainty in A λ ; large subjects; and nonrectangular grids and large voxel sizes. An extreme example is a barely scattered coincidence (E ≈ 511 keV, θ ≈ 0, R → ∞) on an LOR-like SOR i, along which, when a i j > 0, we find that a i s > 0 implies k i s,j > 0. So each voxel j attenuates (with its full effective length, k i s,j ≈ k i j,j ) contributions from every scattering voxel; in other words, contributions from any scattering voxel s are attenuated by a voxel j. In this case, (46) reads: where n i is the number of object voxels part of SOR i. With l i j,j on the order of the voxel size andμ ≈ 0.1/cm in water, (47) has the same sign as (10cm − L i ), where L i is the path length through the object. So for very tight SORs, the interpretations of ρ K and K ρ are similar, and the SOR through a water-like body of more than 10-cm thickness has ∂ȳ i /∂ρ j < 0; this is where the linear approximation underlying MLEM-OSL fails. The situation is more complex for other SORs.

Added value of combining nonscattered and scattered data
We focus on an artificial single-voxel/single-detector-pair problem, with the expected true and scattered data: z = (uλ) exp(−uρ) andȳ = (bλ) exp(−kρ)ρ, (48) respectively (compare (4) and (2) for multi-voxel variants). For a set of (λ, ρ) candidates, we plot the log-likelihood of trues and scatter as well as the joint log-likelihood, using system values b = k = u = 1 and true values λ true = 1 and ρ true ∈ {0.2, 2} (low and high attenuation, respectively), chosen such that 0.2 exp(− 0.2) ≈ 0.16 is in the range of 2 exp(− 2) ≈ 0.27. Figure 16 illustrates the added value of combining nonscattered (true) and scattered data, as compared to using only trues (MLAA) or only scatter (MLGA and scatter-MLEM). In both the low and the high attenuation example, the likelihood of trues exhibits the well-known scaling issue that can be appreciated in the form of an extended nonunique maximum; a similar effect is seen with the likelihood of scatter 9 . In this single-voxel example, the likelihood of scatter exhibits an additional nonuniqueness: for each value of λ, two values of ρ yield the same scatter likelihood, in line with two solutions of the 1-D scatter measurement equation (see Fig. 2); however, this nonuniqueness is more difficult to characterize with multiple voxels. The intersection of the curves that follow the maxima features a unique intersection, which coincides with the maximum of the joint likelihood of trues and scatter data. The peak around this maximum is broader for high attenuation, in line with the lower angle of intersection between the maxima of the individual likelihoods.