Diffusion Approximation to Stochastic Mirror Descent with Statistical Applications

Shih-Kang Chao, University of Missouri-Columbia

Abstract: Stochastic gradient descent (SGD) is a popular algorithm that can handle extremely large data sets due to its low computational cost at each iteration and low memory requirement. Asymptotic distributional results of SGD are very well-known (Kushner and Yin, 2003). However, a major drawback of SGD is that it does not adapt well to the underlying structure of the solution, such as sparsity or constraints. Thus, many variations of SGD have been developed, and a lot of them are based on the concept of stochastic mirror descent (SMD). In this paper, we develop diffusion approximation to SMD with constant step size, using a theoretical tool termed “local Bregman divergence”. In particular, we establish a novel continuous mapping theorem type result for a sequence of conjugates of the local Bregman divergence. The diffusion approximation results shed light on how to fine-tune an l1-norm based SMD algorithm to yield “asymptotically unbiased” estimator which zeros inactive coefficients.

Host: Jose Figueroa-Lopez