"A powerful empirical Bayes approach for high-dimensional replicability analysis"

Speaker: Hongyuan Cao, Florida State University

Abstract: Researchers are interested in combining information across multiple (heterogeneous) studies to determine whether findings replicate across populations or studies. The goal is to increase power while controlling the false discovery rate (FDR), yielding reliable and robust scientific findings. Our focus is on inference for a large number of features, such as gene expression and DNA methylation levels. Because differences in experimental design often make it difficult to combine the underlying raw data, most approaches, including ours, work with p-values derived from the individual studies. The null hypothesis in replicability analysis is composite (with two studies, for example, it holds whenever a feature is null in at least one of them), and as a result the popular approaches in the current literature either fail to control the FDR or have low power. We develop an empirical Bayes approach based on a mixture model that jointly models the hidden states, corresponding to the null and alternative hypotheses, across studies. The model is fitted with a nonparametric EM algorithm combined with the pool-adjacent-violators algorithm (PAVA). In this way, the method borrows information across features and studies while accounting for between-study heterogeneity. We show theoretically that the proposed method controls the FDR. In extensive simulation studies, it has higher power than existing methods while maintaining FDR control. We illustrate the methodology with datasets from spatial transcriptomics studies.
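To make the composite null and the PAVA step concrete, here is a minimal sketch for the two-study case. It is not the speaker's implementation, and every function name is hypothetical. It assumes that p-values are uniform under the null, that the two studies' p-values are independent given the hidden states (0 = null, 1 = non-null), and that PAVA is used to enforce a non-increasing shape (for example, on an alternative p-value density). The rejection rule, which keeps the largest set of features whose average local FDR is at most alpha, is a common choice in the empirical Bayes FDR literature, used here as an assumption. The surrounding nonparametric EM loop, which would alternately update the state weights and the monotone alternative densities, is omitted.

```python
"""Illustrative sketch (hypothetical names): two-study replicability with a
four-state mixture, a PAVA shape constraint, and a local-FDR rejection rule."""

from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[int, int]  # (state in study 1, state in study 2); 1 = non-null


def pava_nonincreasing(y: Sequence[float],
                       w: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares fit of a non-increasing sequence to y using the
    pool-adjacent-violators algorithm. Weights are assumed positive."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # A later block with a larger mean violates monotonicity: pool them.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def posterior_both_nonnull(f1: float, f2: float, pi: Dict[State, float]) -> float:
    """P(non-null in both studies | p1, p2). f1 and f2 are the alternative
    densities evaluated at the observed p-values; the null density is 1."""
    num = pi[(1, 1)] * f1 * f2
    den = pi[(0, 0)] + pi[(0, 1)] * f2 + pi[(1, 0)] * f1 + num
    return num / den


def reject_replicable(post_11: Sequence[float], alpha: float) -> List[int]:
    """Rank features by local FDR = 1 - P(non-null in both) and reject the
    largest top-ranked set whose average local FDR is at most alpha."""
    lfdr = [1.0 - p for p in post_11]
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += lfdr[i]
        if running / rank <= alpha:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    print(pava_nonincreasing([3.0, 1.0, 2.0]))                   # [3.0, 1.5, 1.5]
    print(reject_replicable([0.99, 0.2, 0.95, 0.9], alpha=0.05))  # [0, 2]
```

Under the composite null, a feature that is strongly non-null in only one study still has a large local FDR, because only the (1, 1) state counts as replicable. Ranking on this quantity rather than on either study's p-value alone is what lets an approach of this kind avoid the conservativeness of treating the null as a single point.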

Host: Todd Kuffner