Statistics and Data Science Seminar: "Bayesian screening and variable selection in ultra-high dimensional linear regression"

Speaker: Somak Dutta, Iowa State University

Abstract: Over the last couple of decades, substantial research has been devoted to identifying the important covariates in ultra-high dimensional linear regression, where the number of covariates is of lower exponential order in the sample size. Variable screening aims to identify a smaller subset of covariates that includes the important ones with overwhelmingly large probability, whereas variable selection aims to identify only the truly important ones. Because variable selection is computationally costly, a screening step is typically performed first to reduce the number of potential covariates. In this talk, we propose two novel methodologies. First, for variable screening, we introduce a sequential Bayesian rule that incorporates prior information on the true model size and effect sizes. Second, we propose a scalable variable selection method that embeds variable screening in its algorithm, thus providing scalability and removing the need for a two-stage method. Our theoretical investigations relax some conditions for screening consistency and selection consistency in the ultra-high dimensional setting. We illustrate our methods using a dataset with close to half a million covariates. This talk is based on several joint works with Dr. Vivekananda Roy and PhD students Dongjin Li and Run Wang.

Hosts: Likai Chen and Debashis Mondal