1 Introduction
Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form
(1.1)  y^δ = A x + ξ,
where A : X → Y is a linear operator between Hilbert spaces and ξ is the data distortion. Inverse problems are well analyzed, and several established approaches for their solution exist [1, 2].
Recently, neural networks (NNs) and deep learning have appeared as new paradigms for solving inverse problems and demonstrate impressive performance [3, 4, 5, 6]. In order to enforce data consistency, a deep learning approach named NETT (NETwork Tikhonov regularization) has been proposed and analyzed in [7], based on minimizing ‖A x − y^δ‖²/2 + α R(x), where R is a trained network serving as regularizer. One of the main assumptions in the analysis of [7] is the coercivity of the regularizer, which requires special care in network design and training. In order to overcome this limitation, we introduce the sparse augmented NETT (aNETT), which considers minimizers of
(1.2)  T_{α,β;y}(x) := ½ ‖A x − y^δ‖² + α ( ‖E(x)‖_{1,w} + (β/2) ‖(D∘E)(x) − x‖² ).
Here D∘E is a sparse autoencoder network, where E : X → ℓ²(Λ) and D : ℓ²(Λ) → X are the encoder and decoder networks, Λ is a countable index set, and ℓ²(Λ) is the latent Hilbert space of sparse codes. The weighted ℓ¹-norm ‖E(x)‖_{1,w} = Σ_{λ∈Λ} w_λ |(E(x))_λ| implements learned sparsity, and the augmented term ‖(D∘E)(x) − x‖² forces x to be close to the data manifold M. Both terms together allow us to show coercivity of the regularizer. Based on this, we derive stability, convergence and convergence rates for aNETT. Note that sparse regularization is well investigated for linear representations [8, 9] but so far has not been investigated for nonlinear deep autoencoders.
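To make the role of the two terms concrete, the regularizer can be evaluated with a toy linear autoencoder. This is a minimal sketch: the matrices below are illustrative stand-ins for the trained networks E and D, not the architectures used in the paper.

```python
import numpy as np

def anett_regularizer(x, E, D, w, beta):
    """Evaluate R(x) = ||E(x)||_{1,w} + (beta/2) * ||D(E(x)) - x||^2."""
    xi = E(x)                                           # latent code
    sparsity = np.sum(w * np.abs(xi))                   # weighted l1 norm of the code
    augmented = 0.5 * beta * np.sum((D(xi) - x) ** 2)   # distance to the "manifold"
    return sparsity + augmented

# Toy linear autoencoder: E and D are matrices (stand-ins for trained networks).
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
E = lambda x: B @ x
D = lambda xi: np.linalg.solve(B, xi)  # exact inverse, so D(E(x)) = x
w = np.ones(8)

x = rng.standard_normal(8)
r = anett_regularizer(x, E, D, w, beta=1.0)
# With a perfect autoencoder the augmented term vanishes and only the
# weighted l1 norm of the code remains.
```

For images far from the range of the autoencoder, the augmented term dominates, which is exactly what yields coercivity of R.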
2 Sparse augmented NETT
2.1 Theoretical results
Throughout this section we assume the following.

A : X → Y is linear and bounded.

E : X → ℓ²(Λ) is weakly sequentially continuous.

D : ℓ²(Λ) → X is weakly sequentially continuous.

The weights satisfy w_λ ≥ w_min for some constant w_min > 0 and all λ ∈ Λ.

D is Lipschitz continuous.
Furthermore, we define ‖ξ‖_{1,w} := Σ_{λ∈Λ} w_λ |ξ_λ| for ξ ∈ ℓ²(Λ) and choose the regularizer R(x) := ‖E(x)‖_{1,w} + (β/2) ‖(D∘E)(x) − x‖².
Under these assumptions, (1.2) has a minimizer for all y ∈ Y and all α, β > 0. Moreover, we have the following results.
Theorem 2.1 (Convergence)
Let y ∈ ran(A), let (y_k)_k satisfy ‖y_k − y‖ ≤ δ_k with δ_k → 0, and let the regularization parameters α_k satisfy α_k → 0 and δ_k²/α_k → 0 as k → ∞. Then, with x_k ∈ argmin_x T_{α_k,β;y_k}(x), the following hold:

(x_k)_k has at least one weak accumulation point.

Every weak accumulation point of (x_k)_k is an R-minimizing solution of A x = y.

If A x = y has a unique R-minimizing solution x⁺, then (x_k)_k weakly converges to x⁺.
Theorem 2.2 (Convergence rate)
Let R be Gâteaux differentiable, let A have finite-dimensional range, and consider minimizers x_α^δ ∈ argmin_x T_{α,β;y^δ}(x) with ‖y^δ − y‖ ≤ δ. Then the parameter choice α ≍ δ implies the convergence rate D_R(x_α^δ, x⁺) = O(δ) as δ → 0 in terms of the so-called absolute Bregman distance D_R.
2.2 Trained autoencoder
First, an autoencoder D∘E is trained such that (D∘E)(x) is close to x and such that ‖E(x)‖_{1,w} is small for any x in a class of images of interest. For that purpose, we add the regularizer ‖E(·)‖_{1,w} to the loss function for training D∘E as a denoising network. To be more specific, let (E_θ, D_θ)_{θ∈Θ} be a family of autoencoder networks D_θ∘E_θ, where the E_θ are admissible (in the sense of the above assumptions) encoder networks and the D_θ admissible decoder networks. Moreover, suppose that x₁, …, x_N ∈ X is a training dataset. To select a particular autoencoder based on the training data, we consider the following training strategy for the sparse denoising autoencoder:
(2.1)  θ* ∈ argmin_θ Σ_{i=1}^N ‖(D_θ∘E_θ)(x_i + η_i) − x_i‖² + γ ‖E_θ(x_i + η_i)‖_{1,w},
and set (E, D) := (E_{θ*}, D_{θ*}). Here the η_i are data perturbations and γ > 0 is a regularization parameter.
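The empirical loss of (2.1) can be sketched as follows; `E`, `D`, `w` and `gamma` are stand-ins for the trained networks and parameters, not the paper's actual architectures.

```python
import numpy as np

def denoising_ae_loss(X_clean, X_pert, E, D, w, gamma):
    """Sketch of the loss in (2.1): reconstruction error on perturbed
    inputs plus a weighted l1 penalty on the codes."""
    loss = 0.0
    for x, x_pert in zip(X_clean, X_pert):
        xi = E(x_pert)                           # code of the perturbed image
        loss += np.sum((D(xi) - x) ** 2)         # denoise back to the clean image
        loss += gamma * np.sum(w * np.abs(xi))   # learned-sparsity penalty
    return loss

# Minimal check: a perfect identity autoencoder on unperturbed data leaves
# only the sparsity penalty.
X = [np.array([1.0, -2.0, 0.0]), np.array([0.5, 0.0, 0.0])]
ident = lambda v: v
L = denoising_ae_loss(X, X, ident, ident, w=np.ones(3), gamma=0.1)
```

In practice θ would be the network weights and the sum would be minimized by a stochastic gradient method.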
By training with perturbed data points x_i + η_i, we increase the robustness of the trained autoencoder. Note that the perturbations are chosen independently of the operator A, so that the autoencoder can be used for each forward operator in a universal manner. Clearly, the autoencoder then depends on the specific manifold of images of interest. As we shall see, however, as opposed to typical deep learning based reconstruction methods, which do not account for data consistency outside the training dataset, the sparse aNETT is robust against changes of the specific image manifold. Note that Thms. 2.1 and 2.2 hold true for the trained networks (E_{θ*}, D_{θ*}) in place of (E, D).
2.3 Adaptation to specific forward models
The sparse aNETT (1.2) consists of a data consistency term, a sparsity term, and an augmented term enforcing (D∘E)(x) ≈ x. Ideally, the set of all approximately data-consistent elements that are also approximate fixed points of D∘E is close to the image manifold M. However, without adjusting the autoencoder to specific forward models, this is a challenging and maybe impossible task. Indeed, for the application we consider in this paper, namely sparse view CT, we observed that the autoencoder trained independently of the forward operator was not able to distinguish sufficiently well between data-consistent elements inside and outside the desired image class.
One way to increase the value of ‖(D∘E)(x) − x‖ for undesired but data-consistent elements is to adopt the training strategy developed in [7] and to take the data perturbations in (2.1) as η_i = B(A x_i) − x_i, where B is a reconstruction operator approximating the Moore-Penrose inverse of A; the x_i are then the artifact-free images and the B(A x_i) the images with artefacts. In this case, the training dataset depends on the forward operator, and the autoencoder has to be retrained for every specific forward operator. Therefore, in this paper we follow a different approach. Instead of adjusting the autoencoder training, we compose the operator-independent autoencoder with another network Φ that is trained to distinguish between the desired images and images with operator-dependent artefacts. For that purpose we choose a network architecture (Φ_ϑ)_ϑ and select Φ := Φ_{ϑ*}, where ϑ* is a minimizer of
(2.2)  Σ_{i=1}^N Σ_{j∈{0,1}} ‖(D∘Φ_ϑ∘E)(x_i^{(j)}) − x_i‖² + ρ ‖(Φ_ϑ∘E)(x_i^{(j)})‖_{1,w},
where x_i^{(j)} := x_i for j = 0 and x_i^{(j)} := B(A x_i) for j = 1, and ρ > 0 is a regularization parameter. We see that Thms. 2.1 and 2.2 still hold true for the final autoencoder D∘(Φ∘E) if Φ is weakly sequentially continuous.
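The loss for the problem-adapted network can be sketched in the same toy setting as before; `E`, `Phi`, `D` and `rho` are illustrative stand-ins, and the composed autoencoder maps both the clean image and its artifact version back to the clean image.

```python
import numpy as np

def adapted_loss(X_clean, X_artifact, E, Phi, D, w, rho):
    """Sketch of the loss in (2.2): the composed autoencoder D(Phi(E(.)))
    should reproduce the clean image from clean (j = 0) and
    artifact-corrupted (j = 1) inputs alike."""
    loss = 0.0
    for x, x_art in zip(X_clean, X_artifact):
        for x_in in (x, x_art):                     # j = 0 (clean), j = 1 (artifact)
            code = Phi(E(x_in))                     # artifact-corrected sparse code
            loss += np.sum((D(code) - x) ** 2)      # reconstruction error
            loss += rho * np.sum(w * np.abs(code))  # sparsity penalty
    return loss

# With identity networks and artifact-free inputs the loss vanishes.
X = [np.array([1.0, 0.0, -1.0])]
ident = lambda v: v
L0 = adapted_loss(X, X, ident, ident, ident, w=np.ones(3), rho=0.0)
```

Since only Φ is trained here, the operator-independent autoencoder (E, D) is reused and only the comparatively small correction network must be retrained per forward operator.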
3 Application to sparse view CT
For the numerical simulations we consider the problem of recovering an image x from sparse view parallel-beam CT data with a small number of angles. For this problem, the forward operator is given by the angularly subsampled Radon transform
(A x)(θ_i, s) = ∫_{L(θ_i, s)} x(u) du
for equidistant angles θ_i in [0, π). Here
L(θ, s) = { u ∈ ℝ² : ⟨u, n(θ)⟩ = s }
is the line in the plane with normal vector n(θ) = (cos θ, sin θ) and signed distance s from the origin. Discretization of the Radon transform is done using the ODL library [11]. The data chosen for the numerical simulations are taken from the Low Dose CT Grand Challenge [12]. We consider the images at the slice thickness given in the dataset and take the first seven patients for training (4267 images), the next two patients for validation (1143 images) and the last patient for testing (526 images). Each of these images is rescaled to have pixel values in the interval [0, 1].
3.1 Network training
We first train (E, D) by minimizing (2.1) and subsequently train Φ by minimizing (2.2). The network architecture chosen for the problem-adapted network Φ is the tight frame U-Net [13], and the sparse autoencoder architecture is chosen as in [14]. The perturbations η_i in (2.1) are taken as independent realizations of Gaussian white noise with standard deviation c_i · mean(x_i), where c_i is uniformly sampled from a fixed interval and mean(x_i) is the mean of x_i. The weights w_λ in the ℓ¹ term are chosen depending on the index l of the down-sampling step, see [14]. We train all networks using the Adam optimizer [15] with the recommended parameters for a fixed number of iterations and use only the best parameters of these iterations. Here, the best parameters are those which give the smallest loss on the validation set. The regularization parameters γ and ρ are chosen empirically to give the best results for our approach.
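The noise model for the training perturbations can be sketched as follows. The upper bound `c_max` of the sampling interval is a hypothetical stand-in; the paper's actual interval is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb(x, c_max=0.05):
    """Add Gaussian white noise whose standard deviation is a randomly
    sampled fraction of the image mean (c_max is a hypothetical bound,
    not the value used in the paper)."""
    c = rng.uniform(0.0, c_max)        # per-image noise level
    sigma = c * np.mean(x)             # std relative to the image mean
    return x + sigma * rng.standard_normal(x.shape)

x = np.full((4, 4), 2.0)               # toy "image"
x_pert = perturb(x)
```

Scaling the noise level by the image mean keeps the relative perturbation comparable across differently normalized images.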
3.2 Solution of sparse aNETT
For minimizing the sparse aNETT functional (1.2) we use a splitting approach. For that purpose we introduce the auxiliary variable ξ = E(x) and rewrite (1.2) as the following constrained optimization problem
min_{x,ξ} ½ ‖A x − y^δ‖² + α ‖ξ‖_{1,w} + (αβ/2) ‖(D∘E)(x) − x‖²  subject to  ξ = E(x).
Note that we have replaced E(x) only in the ℓ¹ term but not in the augmented term. To solve the above constrained version of aNETT, we use the ADMM scheme with scaled dual variable u. This results in the update scheme
(3.1)  x^{k+1} ∈ argmin_x ½ ‖A x − y^δ‖² + (αβ/2) ‖(D∘E)(x) − x‖² + (μ/2) ‖E(x) − ξ^k + u^k‖²
(3.2)  ξ^{k+1} = argmin_ξ α ‖ξ‖_{1,w} + (μ/2) ‖E(x^{k+1}) − ξ + u^k‖²
(3.3)  u^{k+1} = u^k + E(x^{k+1}) − ξ^{k+1},
where μ > 0 is a scaling parameter. The strength of the splitting-type iteration (3.1)-(3.3) is that the optimization problems involved in each iterative update are simpler and easier to solve than the original sparse aNETT minimization problem (1.2), which contains the non-differentiable ℓ¹ norm as well as the nonlinear augmented network term. In fact, the ξ-update (3.2) can be solved explicitly by soft-thresholding. Additionally, if we take D∘E to be differentiable, the x-update (3.1) can be solved efficiently using gradient-type iterative schemes.
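The iteration above can be sketched for the special case of a linear autoencoder, where every quantity is explicit. All matrices and parameter values below are toy stand-ins for the operator and the trained networks, not the paper's setup.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal map of t * ||.||_1: componentwise shrinkage toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def anett_admm(A, y, Emat, Dmat, w, alpha, beta, mu,
               n_outer=50, n_inner=100, step=0.05):
    """Toy version of (3.1)-(3.3) for a *linear* autoencoder
    x -> Dmat @ (Emat @ x); A, Emat, Dmat are numpy stand-ins."""
    n = A.shape[1]
    x = np.zeros(n)
    xi = Emat @ x                      # auxiliary variable xi = E(x)
    u = np.zeros_like(xi)              # scaled dual variable
    R = Dmat @ Emat - np.eye(n)        # (D o E)(x) - x = R @ x here
    for _ in range(n_outer):
        # (3.1) x-update: smooth subproblem, solved by gradient descent.
        for _ in range(n_inner):
            grad = (A.T @ (A @ x - y)
                    + alpha * beta * R.T @ (R @ x)
                    + mu * Emat.T @ (Emat @ x - xi + u))
            x = x - step * grad
        # (3.2) xi-update: closed form by weighted soft-thresholding.
        xi = soft_threshold(Emat @ x + u, alpha * w / mu)
        # (3.3) dual update.
        u = u + Emat @ x - xi
    return x

# Sanity check: with A = E = D = identity, aNETT reduces to l1 denoising,
# whose minimizer is soft-thresholding of the data.
I4 = np.eye(4)
y = np.array([3.0, 0.1, -2.0, 0.0])
x_hat = anett_admm(I4, y, I4, I4, w=np.ones(4), alpha=1.0, beta=1.0, mu=1.0)
```

With trained nonlinear networks, only the x-update changes: the gradient of the augmented term is then obtained by backpropagation through D∘E.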
Table 3.1: Parameter specifications for the minimization using (3.1)-(3.3).
scenario     | outer | inner | stepsize
noise-free   |       |       |
noise        |       |       |
adversarial  |       |       |
We minimize (3.1) using gradient descent with momentum. The ADMM is initialized with x⁰ = FBP(y^δ), ξ⁰ = E(x⁰) and u⁰ = 0, where y^δ are the given data. Here and below FBP denotes the filtered backprojection operator. The parameter specifications for the minimization using (3.1)-(3.3) in the various scenarios are shown in Table 3.1. All parameters were chosen empirically to give the best results. Here, 'outer' refers to the total number of ADMM iterations, 'stepsize' is the stepsize of the gradient method, and 'inner' is the maximal number of iterations for the x-update step (3.1).
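The gradient method used for the x-update can be sketched as plain heavy-ball momentum; the step size, momentum value and quadratic test problem below are illustrative, not the paper's settings.

```python
import numpy as np

def momentum_gd(grad, x0, step=0.05, momentum=0.9, n_iter=200):
    """Gradient descent with momentum (heavy ball), a sketch of the
    solver for the smooth x-update (3.1)."""
    x = x0.copy()
    v = np.zeros_like(x0)
    for _ in range(n_iter):
        v = momentum * v - step * grad(x)   # accumulate a velocity
        x = x + v                           # move along the velocity
    return x

# Quadratic sanity check: minimize 0.5 * ||x - b||^2, whose minimizer is b.
b = np.array([1.0, -2.0, 3.0])
x_min = momentum_gd(lambda x: x - b, np.zeros(3))
```

In the actual x-update, `grad` would combine the data term, the augmented network term (via backpropagation) and the ADMM coupling term.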
3.3 Numerical results
The first case we consider is that of noise-free data. Figure 3.1 shows the FBP reconstruction, the reconstruction with the full post-processing network applied to the FBP, where Φ is defined as above, and the aNETT reconstruction. Comparing the results, we see that the output of the problem-adapted network and the aNETT output are visually identical. This is because the test image is close to the training data, and therefore the considered training procedure implies that the network output is close to a minimizer of the sparse aNETT. In comparison to the FBP, we see that the aNETT was able to completely remove all artefacts and yields an almost perfect reconstruction.
To simulate noisy data we add Gaussian noise to the measurement data, i.e. we use y^δ = y + c · mean(y) · η, where mean(y) is the mean of the data y and η is a standard normally distributed noise term. Reconstructions using FBP, post-processing and the sparse aNETT are shown in Figure 3.2. We enhance the contrast in these images by a constant factor using the Python Pillow library [16] to make the differences more clearly visible. The post-processing reconstruction shows some noise-like structure in parts where the image should be mostly constant, e.g. in and around the orange square. We hypothesize that these noise-like structures occur because the problem-adapted network has not been trained with noise in the data domain and hence has difficulties removing it. While we could add such noise to the training, the networks would then likely fail on different noise models, e.g. Poisson noise. Comparing this to the aNETT, we see that this noise-like structure has been greatly reduced, since the aNETT can rely on the sparsifying term of the regularization method to obtain noise-free reconstructions.
3.4 Robustness to adversarial attacks
One particular advantage of aNETT over post-processing is its increased robustness with respect to the type of image to be reconstructed. To highlight this advantage, as illustrated in the top left image of Figure 3.3, we add a high-intensity disc to the CT image shown in Figure 3.1. The disc represents a clear low-complexity structure, and its accurate reconstruction should easily be possible.
Figure 3.3 shows the reconstructions using FBP, the post-processing network and the aNETT. Taking a look at the zoomed-in square in these images, we see that FBP reconstructs the disc well. The post-processing network output, however, has some dark spots close to the disc and generally shows data-inconsistent behaviour around it. Using the aNETT, on the other hand, these problems do not occur. This improved accuracy is because aNETT takes into account the given data even for images different from the training data.
4 Discussion
In this paper we introduced the sparse aNETT, a sparse reconstruction framework using a learned regularization term and founded on a solid mathematical fundament. As shown in our numerical experiments, the aNETT yields results similar to a post-processing network in the case of noise-free data and phantoms close to the training data. However, thanks to the included data consistency, the aNETT approach can deal much better with unseen phantom structures. While the chosen simple example might look artificial, it suggests that similar effects occur for more complex structures in real scenarios. In the case of noisy data, the aNETT is able to leverage the sparsifying term and increase robustness with respect to noise.
While the aNETT gives an overall more robust and stable reconstruction method, there is currently one major downside: our proposed approach relies on an iterative minimization scheme and is therefore substantially slower than reconstruction by a post-processing network. Therefore, the design of efficient numerical schemes for minimizing the sparse aNETT functional is a main direction of future research. Furthermore, comparisons with different reconstruction methods, including network cascades [17, 18], variational and iterative networks [5, 19, 20] and null space networks [21], are left for future work.
References
 [1] HW Engl, M Hanke, and A Neubauer, Regularization of inverse problems, vol. 375, Springer Science & Business Media, 1996.
 [2] O Scherzer, M Grasmair, H Grossauer, M Haltmeier, and F Lenzen, Variational methods in imaging, Springer, 2009.
 [3] D Lee, J Yoo, and JC Ye, “Deep residual learning for compressed sensing MRI,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 2017, pp. 15–18.

 [4] KH Jin, M McCann, E Froustey, and M Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
 [5] J Sun, H Li, Z Xu, et al., “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, pp. 10–18.
 [6] G Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
 [7] H Li, J Schwab, S Antholzer, and M Haltmeier, “NETT: Solving inverse problems with deep neural networks,” Inverse Probl., 2020.
 [8] M Grasmair, M Haltmeier, and O Scherzer, “Sparse regularization with ℓq penalty term,” Inverse Probl., vol. 24, no. 5, pp. 055020, 2008.
 [9] I Daubechies, M Defrise, and C De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pur. Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.
 [10] M Haltmeier, L Nguyen, D Obmann, and J Schwab, “Sparse regularization of inverse problems with deep learning,” arXiv:1908.03006, 2019.
 [11] J Adler, H Kohr, and O Öktem, “Operator discretization library (ODL),” Software available from https://github.com/odlgroup/odl, 2017.
 [12] C McCollough, “TUFG207A04: Overview of the low dose CT grand challenge,” Med. Phys., vol. 43, no. 6Part35, pp. 3759–3760, 2016.
 [13] Y Han and JC Ye, “Framing U-Net via deep convolutional framelets: application to sparse-view CT,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1418–1429, 2018.
 [14] D Obmann, J Schwab, and M Haltmeier, “Deep synthesis regularization of inverse problems,” arXiv:2002.00155, 2020.
 [15] D Kingma and J Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.
 [16] A Clark, “Pillow (pil fork) documentation,” 2015.

 [17] A Kofler, M Haltmeier, C Kolbitsch, M Kachelrieß, and M Dewey, “A U-Nets cascade for sparse view computed tomography,” in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2018, pp. 91–99.
 [18] J Schlemper, J Caballero, J Hajnal, A Price, and D Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 2, pp. 491–503, 2017.
 [19] J Adler and O Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Probl., vol. 33, no. 12, pp. 124007, 2017.

 [20] E Kobler, T Klatzer, K Hammernik, and T Pock, “Variational networks: connecting variational methods and deep learning,” in German Conference on Pattern Recognition. Springer, 2017, pp. 281–293.
 [21] J Schwab, S Antholzer, and M Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,” Inverse Probl., vol. 35, no. 2, pp. 025008, 2019.