Statistics and Data Science Seminar: "Linear and nonlinear correlations for compositional data with applications to analysis of microbiome data"

Speaker: Shyamal Peddada, Biostatistics and Computational Biology Branch, NIEHS, NIH

Abstract: In many applications, such as biomedical research, the observed multivariate data are compositional and hence are points in a simplex. Although John Aitchison introduced and developed the field of compositional data analysis over 40 years ago, it is only in the past decade or so that statisticians and computational biologists have begun to recognize compositionality in their biomedical data and to develop and use suitable methods. This is particularly true of microbiome count data, which are to be viewed as relative abundances and hence as compositional. There is growing evidence in the literature that the (gut) microbiome is involved in inflammation and immune response, and hence in human health and disease. Thus, there is considerable interest among biomedical researchers in studying the human microbiome. Since microbes form an ecology, and hence are potentially interdependent, there is considerable interest in describing associations among them. The focus of this talk is to describe a novel methodology called SECOM (Lin, Eggesbo and Peddada, Nature Communications, 2022) to estimate linear and nonlinear correlations among microbiota under the simplex constraint, and to illustrate the methodology using an infant gut microbiome dataset. An infant’s gut ecology continuously evolves during the first year after birth due to various factors such as changes in feeding, sleep patterns, exposure to people and so on. Using SECOM, for the first time in the literature, we describe associations among infant gut microbiota at different time points during the first year after birth.


Host: Debashis Mondal