Statistics and Data Science Seminar: "Sparse Learning at Scale: Convex, Mixed Integer Programming, and Statistical Perspectives"

Speaker: Rahul Mazumder, Massachusetts Institute of Technology

Abstract: Many fundamental high-dimensional statistics estimators, such as best-subset selection (BSS),  can be naturally expressed as discrete optimization problems. Recently, mixed integer programming (MIP) methods have been shown to be promising candidates for formulating and solving small/ moderate instances of these problems. This sheds interesting insights into some less-understood statistical aspects of BSS, suggesting the need to design new estimators. On the computational front, current high-performance commercial integer programming solvers are somewhat black-box and can be challenging to scale to large instances. I will discuss our recent work on tailored branch-and-bound methods to solve, to optimality, a family of regularized BSS problems with up to a million features. For the first time, we employ first-order convex optimization methods within a branch-and-bound framework to solve instances of regularized BSS that show speedups of over 5000X over commercial solvers. If time permits, I will discuss some ongoing work where statistical modeling considerations can lead to the design of computationally attractive MIP formulations in the context of the well-known sparse PCA problem.

 
[This represents joint work with: Hussein Hazimeh, Ali Saab,  Antoine Dedieu, Peter Radchenko and Kayhan Behdin]
 

Hosts: Likai Chen and Debashis Mondal

Access Zoom Meeting (Passcode: 746908)