Statistics Seminar: "Two-sample test via classification"

Speaker: Haiyan Cai, University of Missouri-Saint Louis

Abstract: Robust classification algorithms (for example, random forests, support vector machines, and deep neural networks) have been developed in recent years with great success. To take advantage of this development, we recast the classical two-sample testing problem as a classification problem. We propose a two-sample test based on the class probability estimates from a classifier trained on the samples. We will see why such a test can be powerful, and we will compare its power and efficiency against those of some other recently proposed tests using simulated and real-life data. Our method is nonparametric and can be applied to complex and high-dimensional data whenever there is a good classifier that provides a uniformly consistent estimate of the class probabilities for such data. The talk will start with a brief review of the classification problem in machine learning and of the basic concepts of hypothesis testing in statistics.
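
For readers new to the idea, the general recipe of turning a two-sample test into a classification problem can be sketched as follows. This is only an illustration and not the speaker's method: the abstract describes a test built on estimated class probabilities, whereas this sketch swaps in a simpler, generic accuracy-based variant. The function names, the nearest-centroid classifier, the half/half split, and the permutation p-value are all assumptions made for the example.

```python
import random

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def classifier_two_sample_test(xs, ys, n_perm=500, seed=0):
    """Hypothetical accuracy-based two-sample test (illustration only).

    Trains a nearest-centroid classifier on half of each sample, measures
    its accuracy on the held-out halves, and compares that accuracy with
    its distribution under random relabeling of the held-out points.
    Returns (held_out_accuracy, permutation_p_value).
    """
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    rng.shuffle(xs)
    rng.shuffle(ys)
    hx, hy = len(xs) // 2, len(ys) // 2

    # "Training": one centroid per sample, from the first halves.
    cx, cy = centroid(xs[:hx]), centroid(ys[:hy])

    # Held-out points, labeled 0 (from xs) or 1 (from ys).
    test_points = xs[hx:] + ys[hy:]
    labels = [0] * (len(xs) - hx) + [1] * (len(ys) - hy)
    preds = [0 if sq_dist(p, cx) <= sq_dist(p, cy) else 1 for p in test_points]

    def accuracy(lab):
        return sum(a == b for a, b in zip(preds, lab)) / len(lab)

    observed = accuracy(labels)

    # If both samples come from the same distribution, the held-out labels
    # carry no information, so shuffling them mimics the null hypothesis.
    exceed = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        if accuracy(permuted) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `classifier_two_sample_test(sample_a, sample_b)` on two lists of numeric tuples returns the held-out accuracy and a p-value; accuracy well above chance, with a small p-value, suggests the two samples differ in distribution. A stronger classifier could replace the nearest-centroid rule without changing the rest of the sketch.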

Host: Nan Lin