Titles and Abstracts
Titles/abstracts for the Fourth Workshop on Higher-Order
Asymptotics and Post-Selection Inference (WHOA-PSI 4).
See the main conference page for more information. Contact:
Todd Kuffner, email: kuffner@wustl.edu
Talks
Rina Foygel Barber,
University of Chicago
Title: Predictive
inference with the jackknife+
Abstract: We introduce the jackknife+, a
novel method for constructing predictive confidence intervals
that is robust to the distribution of the data. The jackknife+
modifies the well-known jackknife (leave-one-out
cross-validation) to account for the variability in the fitted
regression function when we subsample the training data.
Assuming exchangeable training samples, we prove that the
jackknife+ permits rigorous coverage guarantees regardless of
the distribution of the data points, for any algorithm that
treats the training points symmetrically. Such guarantees are
not possible for the original jackknife and we demonstrate
examples where the coverage rate may actually vanish. Our
theoretical and empirical analysis reveals that the jackknife
and jackknife+ intervals achieve nearly exact coverage and
have similar lengths whenever the fitting algorithm obeys some
form of stability. We also extend to the setting of K-fold
cross-validation. Our methods are related to cross-conformal
prediction proposed by Vovk [2015] and we discuss connections.
This work is joint with Emmanuel Candes, Aaditya Ramdas, and
Ryan Tibshirani.
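(A minimal sketch of the jackknife+ construction described above, assuming a generic scikit-learn-style regressor; the exact finite-sample order statistics of the paper are approximated here by empirical quantiles.)

import numpy as np
from sklearn.linear_model import LinearRegression

def jackknife_plus_interval(X, y, x_test, alpha=0.1, algo=LinearRegression):
    """Jackknife+ prediction interval at a single test point x_test."""
    n = len(y)
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # leave observation i out
        model = algo().fit(X[keep], y[keep])
        resid = abs(y[i] - model.predict(X[i:i + 1])[0])  # leave-one-out residual
        pred = model.predict(x_test.reshape(1, -1))[0]
        lo[i], hi[i] = pred - resid, pred + resid
    # empirical quantiles of the leave-one-out endpoints (approximating the
    # exact order-statistic construction of the paper)
    return np.quantile(lo, alpha), np.quantile(hi, 1 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
print(jackknife_plus_interval(X, y, rng.normal(size=3)))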
Pallavi Basu,
Indian School of Business
Title: TBA
Abstract: TBA
Yuval Benjamini,
Hebrew University of Jerusalem
Title: Extrapolating
the accuracy of multi-class classification
Abstract: The difficulty of multi-class
classification generally increases with the number of classes.
This raises a natural question: Using data from a subset of
the classes, can we predict how well a classifier will scale
as the number of classes increases? In other words, how should
we extrapolate the accuracy from small pilot studies to larger
problems? In this talk, I will present a framework that
allows us to analyze this question. Assuming classes are
sampled from a population (and some assumptions about the
classifiers), we can identify how expected classification
accuracy depends on the number of classes (k) via a specific
cumulative distribution function. I will present a
non-parametric method for estimating this function, which
allows extrapolation to K>k. I will show relations with the
ROC curve. Finally, I hope to discuss why the extrapolation
problem may be important for neuroscientists, who are
increasingly using multiclass classification accuracy as a
proxy for richness of representation. This is joint work with
Charles Zheng and Rakesh Achanta.
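(To make the extrapolation question above concrete: a toy Monte Carlo illustration, assuming a nearest-centroid classifier with class centers drawn from a Gaussian population; it shows expected accuracy decaying as the number of classes k grows, but it is not the nonparametric estimator of the talk.)

import numpy as np

rng = np.random.default_rng(1)

def expected_accuracy(k, dim=20, n_per_class=30, reps=200):
    """Monte Carlo estimate of expected nearest-centroid accuracy when k classes
    are sampled from a Gaussian population of class centers."""
    correct, total = 0, 0
    for _ in range(reps):
        centers = rng.normal(size=(k, dim))                               # sample k classes
        est = centers + rng.normal(size=(k, dim)) / np.sqrt(n_per_class)  # estimated centroids
        test = centers + rng.normal(size=(k, dim))                        # one test point per class
        d = ((test[:, None, :] - est[None, :, :]) ** 2).sum(axis=-1)
        correct += int((d.argmin(axis=1) == np.arange(k)).sum())
        total += k
    return correct / total

for k in (2, 5, 10, 50):
    print(k, round(expected_accuracy(k), 3))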
Florentina Bunea,
Cornell University
Title: Essential
regression
Abstract: see the link on the main conference page.
Brian Caffo,
Johns Hopkins University
Title: Statistical
properties of measurement in resting state functional magnetic
resonance imaging
Abstract: In this talk we discuss the
statistical measurement properties of resting state functional
magnetic resonance imaging data. Recent work has focused on
measures of brain connectivity via resting state fMRI as a
"fingerprint". We discuss the statistical properties of group
fingerprint matching vis-a-vis the matching strategy and
statistical assumptions. We further explore the utility of
matching as a strategy for establishing measurement
quality. Alternate strategies using ranking and measures
of discriminability are also explored. Connections will be
made to the use of higher order asymptotics for estimating
distributional properties of matching statistics. We
further apply matching strategies on a group of subjects from
the Human Connectome Project comparing matching performance of
subjects to themselves, homozygous and heterozygous twins,
non-twin siblings and non-relations. Furthermore, we
investigate which brain connections are most and least
idiosyncratic.
Emmanuel Candes, Stanford University
Title: To be announced
Abstract: TBA
Daniela De Angelis, University of
Cambridge
Title: Value of
information for evidence synthesis
Abstract: In a Bayesian model that
combines evidence from several different sources, it is
important to know which parameters most affect the estimate or
decision from the model; which of the parameter uncertainties
drive the decision uncertainty; and what further data should
be collected to reduce such uncertainty. These questions can
be addressed by Value of Information (VoI) analysis, allowing
estimation of the expected gain from learning specific
parameters or collecting data of a given design. In this talk,
we introduce the concept of VoI for Bayesian evidence
synthesis, using and extending ideas from health economics,
computer modelling and Bayesian design. We then apply it to a
model developed to estimate prevalence of HIV infection, which
combines indirect information from surveys, registers, and
expert beliefs. Results show which parameters contribute most
to the uncertainty about each prevalence estimate, and the
expected improvements in precision from specific amounts of
additional data. These benefits can be traded with the costs
of sampling to determine an optimal sample size. Joint work
with Chris Jackson, Anne Presanis and Stefano Conti.
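(For readers unfamiliar with Value of Information, here is a generic Monte Carlo sketch of the expected value of perfect information for a toy two-action decision problem; the partial VoI computations for evidence synthesis in the talk are substantially more involved, and the utilities below are illustrative assumptions.)

import numpy as np

rng = np.random.default_rng(2)

# Toy decision problem: utility of each decision under an uncertain parameter theta.
theta = rng.normal(loc=0.1, scale=0.3, size=100_000)      # posterior draws of an uncertain effect
utility = np.column_stack([np.zeros_like(theta),          # decision 0: do nothing
                           theta - 0.05])                 # decision 1: intervene at a fixed cost

# EVPI = E[max_d U(d, theta)] - max_d E[U(d, theta)]
evpi = utility.max(axis=1).mean() - utility.mean(axis=0).max()
print(f"expected value of perfect information: {evpi:.4f}")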
Julia Fukuyama,
Indiana University
Title: Phylogenetically-informed
distance methods: their uses, properties, and potential
Abstract: Phylogenetically-informed
distances are widely used in ecology, often in conjunction
with multi-dimensional scaling, to describe the relationships
between communities of organisms and the taxa they comprise. A
large number of such distances have been developed, each
leading to a different representation of the communities. The
ecology literature often tries to interpret the differences
between representations given by different distances, but
without a good understanding of the properties of the
distances it is unclear how useful these interpretations are.
I give an overview of some of these distances, describe the
interpretational challenges they pose, develop some
interesting properties, and comment on opportunities for
post-selection inference in this domain.
Irina Gaynanova,
Texas A&M University
Title: Direct
inference for sparse differential network analysis
Abstract: We consider the problem of
constructing confidence intervals for the differential edges
between the two high-dimensional networks. The problem is
motivated by the comparison of gene interactions between two
molecular subtypes of colorectal cancer with distinct survival
prognosis. Unlike the existing approaches for differential
network inference that require sparsity of individual
precision matrices from both groups, we only require sparsity
of the precision matrix difference. We discuss the method's
theoretical properties, evaluate its performance in numerical
studies, and highlight directions for future research. This is
joint work with Mladen Kolar and Byol Kim.
Ed George,
University of Pennsylvania
Title: Multidimensional
monotonicity discovery with MBART
Abstract: For the discovery of a
regression relationship between y and x, a vector of p
potential predictors, the flexible nonparametric nature of
BART (Bayesian Additive Regression Trees) allows for a much
richer set of possibilities than restrictive parametric
approaches. To exploit the potential monotonicity of the
predictors, we introduce mBART, a constrained version of BART
that incorporates monotonicity with a multivariate basis of
monotone trees, thereby avoiding the further confines of a
full parametric form. Using mBART to estimate such effects
yields (i) function estimates that are smoother and more
interpretable, (ii) better out-of-sample predictive
performance and (iii) less post-data uncertainty. By using
mBART to simultaneously estimate both the increasing and the
decreasing regions of a predictor, mBART opens up a new
approach to the discovery and estimation of the decomposition
of a function into its monotone components. (This is
joint work with H. Chipman, R. McCulloch and T. Shively).
Iain Johnstone, Stanford University
Title: HOA-PSI for
top eigenvalues in spiked PCA models
Abstract:
The setting is principal components analysis
with number of variables proportional to sample size, both
large. The data are Gaussian with known spherical population
covariance except for a fixed number of larger and distinct
population eigenvalues, 'spikes'. If these spikes are large
enough, i.e. 'supercritical', then to leading order the sample
spike eigenvalues are known to be asymptotically independent
Gaussian. We give the first order Edgeworth correction for
this model (which is far from the usual smooth function of
means setting) and note how repulsion of supercritical sample
eigenvalues first becomes visible at this order. If time
allows, we outline implications for improved confidence
intervals for the spike values, using a minimal conditioning
strategy for post-selection inference. This is joint work with
Jeha Yang.
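(A quick simulation of the leading-order behaviour referred to above, using the standard first-order limit for a supercritical sample spike eigenvalue; the Edgeworth correction and the conditioning strategy of the talk are beyond this sketch.)

import numpy as np

rng = np.random.default_rng(3)
n, p, spike = 2000, 400, 5.0            # aspect ratio gamma = p/n = 0.2; spike > 1 + sqrt(gamma)
gamma = p / n

# Gaussian data with identity covariance except one spiked population eigenvalue.
scale = np.ones(p)
scale[0] = np.sqrt(spike)
X = rng.normal(size=(n, p)) * scale

top_sample_eig = np.linalg.eigvalsh(X.T @ X / n)[-1]
first_order_limit = spike * (1 + gamma / (spike - 1))     # classical supercritical limit
print(top_sample_eig, first_order_limit)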
Mladen Kolar,
University of Chicago
Title: TBA
Abstract: TBA
Vladimir Koltchinskii,
Georgia Tech
Title: Bias
reduction and efficiency in estimation of smooth functionals
of high-dimensional parameters
Abstract: A problem of estimation of
smooth functionals of high-dimensional parameters of
statistical models will be discussed. The focus will be on a
method of bias reduction based on approximate solutions of
integral equations on the parameter space with respect to
certain Markov kernels. It will be shown that, in the case of
high-dimensional normal models, this approach yields
estimators with optimal or nearly optimal mean squared error
rates (in particular, asymptotically efficient
estimators) for all sufficiently smooth functionals. The
proofs of these results rely on Gaussian concentration,
representations of Markov chains as superpositions of smooth
random maps and information-theoretic lower bounds. Possible
extensions of this approach beyond normal models will be
briefly discussed. The talk is based on a joint work with
Mayya Zhilova.
Ioannis Kosmidis,
University of Warwick
Title: Improved
estimation of partially specified models
Abstract: This talk focuses on a new
framework for reducing bias in estimation. Many bias reduction
methods rely on an approximation of the bias function of the
estimator under the assumption that the model is correct and
fully-specified. Other bias reduction methods, like the
bootstrap, the jackknife and indirect inference require fewer
assumptions to operate but are typically computer-intensive.
We present current research on a new framework for bias
reduction that:
i) can deliver estimators with smaller bias than reference
estimators even for partially specified models, as long as
estimation is through unbiased estimating functions;
ii) always results in closed-form bias-reducing penalties to
the objective function if estimation is through the
maximisation of one, like maximum likelihood and maximum
composite likelihood; and
iii) relies only on the estimating functions and their
derivatives, greatly facilitating implementation through
numerical or automatic differentiation techniques and standard
numerical optimisation routines.
Joint work with: Nicola Lunardon, University of
Milano-Bicocca, Italy
Arun
Kumar Kuchibhotla, University of Pennsylvania
Title: Post-selection
inference for all
Abstract: Inference after selection is
currently available only in limited settings. PoSI, as meant in
Berk et al. (2013), has been extended to general M-estimators
only when the number of covariates is fixed (not depending on
the sample size). Selective inference, as studied by
Jonathan Taylor & Co., is likewise only rigorously proved for a
fixed number of covariates. In this talk, I will introduce a
randomness-free study of M-estimators which readily yields a
uniform linear representation. This implies simultaneous, and
hence post-selection, inference even with a diverging number of
covariates for a large class of M-estimators, via the
high-dimensional CLT. The talk is based on "Deterministic
Inequalities for Smooth M-estimators", arXiv:1809.05172.
Stephen M.S. Lee,
University of Hong Kong
Title: High-dimensional
local polynomial regression with variable selection and
dimension reduction
Abstract: Variable selection and
dimension reduction have been considered in non-parametric
regression for improving the precision of estimation, via the
formulation of a semiparametric multiple index model. However,
most existing methods are ill-equipped to cope with a
high-dimensional setting where the number of variables may
grow exponentially fast with sample size. We propose a new
procedure for simultaneous variable selection and dimension
reduction in high-dimensional nonparametric regression
problems. It consists essentially of penalised local
polynomial regression, with the bandwidth matrix regularised
to facilitate variable selection, dimension reduction and
optimal estimation at the oracle convergence rate, all in one
go. Unlike most existing methods, the proposed procedure does
not require explicit bandwidth selection or an additional step
of dimension determination using techniques like cross
validation or principal components. Empirical performance of
the procedure is illustrated with both simulated and real data
examples. Joint work with Kin Yap Cheung.
Xihong Lin,
Harvard University
Title: Hypothesis
testing for a large number of composite nulls in genome-wide
causal mediation analysis
Abstract: In genome-wide epigenetic
studies, it is often of scientific interest to assess whether
the effect of an exposure on a clinical outcome is mediated
through DNA methylation. Statistical inference for causal
mediation effects is challenged by the fact that one needs to
test a large number of composite null hypotheses across the
genome. In this paper, we first study the theoretical
properties of the commonly used methods for testing for causal
mediation effects, Sobel's test and the joint significance
test. We show the joint significance test is the likelihood
ratio test for the composite null hypothesis of no mediation
effect. Both Sobel's test and the joint significance test
follow non-standard distributions, and they are overly
conservative for testing mediation effects and yield invalid
inference in genome-wide epigenetic studies. We propose
a novel Divide-Aggregate Composite-null Test (DACT) for the
composite null hypothesis of no mediation effect in
genome-wide analysis. We show that the DACT method
provides valid statistical inference and boosts power for
testing mediation effects across the genome. We
propose a correction procedure to improve the DACT method
using Efron's empirical null method when the exposure-mediator
and/or the mediator-outcome association signals are not
sparse. Our extensive simulation studies show that
the DACT method properly controls type I error rates and
outperforms Sobel's test and the joint significance test for
genome-wide causal mediation analysis. We applied the DACT
method to the Normative Aging Study to identify putative DNA
methylation sites that mediate the effect of smoking on lung
function. We also developed a computationally efficient R
package DACT for public use.
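(The two classical tests analyzed above are simple to state; a sketch follows, with a and b the estimated exposure-mediator and mediator-outcome effects and se_a, se_b their standard errors. The DACT procedure itself is not reproduced here.)

from scipy.stats import norm

def sobel_p(a, se_a, b, se_b):
    """Sobel test for the indirect (mediated) effect a*b."""
    z = (a * b) / ((a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2) ** 0.5)
    return 2 * norm.sf(abs(z))

def joint_significance_p(a, se_a, b, se_b):
    """Joint significance (MaxP) test: both paths must be individually significant."""
    p_a = 2 * norm.sf(abs(a / se_a))
    p_b = 2 * norm.sf(abs(b / se_b))
    return max(p_a, p_b)

print(sobel_p(0.4, 0.1, 0.3, 0.15), joint_significance_p(0.4, 0.1, 0.3, 0.15))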
Kristin Linn,
University of Pennsylvania
Title: Interactive
Q-learning
Abstract: Forming evidence-based rules
for optimal treatment allocation over time is a priority in
personalized medicine research. Such rules must be estimated
from data collected in observational or randomized studies.
Popular methods for estimating optimal sequential decision
rules from data, such as Q-learning, are approximate dynamic
programming algorithms that require modeling non-smooth
transformations of the data. Postulating a simple,
well-fitting model for the transformed data can be difficult,
and under many simple generative models the most commonly
employed working models (namely, linear models) are known to
be misspecified. We propose an alternative strategy for
estimating optimal sequential decision rules wherein all
modeling takes place before applying non-smooth
transformations of the data. This simple re-ordering of the
modeling and transformation steps leads to high-quality
estimated sequential decision rules because the proposed
estimators involve only conditional mean and variance modeling
of smooth functionals of the data. Consequently, standard
statistical procedures can be used for exploratory analysis,
model building, and model validation. We will also discuss
extensions of Interactive Q-learning for optimizing non-mean
summaries of an outcome distribution.
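(For context, a sketch of ordinary two-stage Q-learning with linear working models on simulated data; the non-smooth max in the pseudo-outcome is exactly the transformation that Interactive Q-learning moves before the modeling step. This is illustrative and not the authors' implementation.)

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 500
h1 = rng.normal(size=(n, 1))                      # stage-1 history
a1 = rng.choice([-1, 1], size=n)                  # stage-1 treatment
h2 = rng.normal(size=(n, 1)) + 0.5 * h1           # stage-2 history
a2 = rng.choice([-1, 1], size=n)                  # stage-2 treatment
y = h2[:, 0] * a2 + 0.3 * h1[:, 0] * a1 + rng.normal(size=n)

# Stage 2: model Y given (H2, A2).
X2 = np.column_stack([h2, a2, h2[:, 0] * a2])
q2 = LinearRegression().fit(X2, y)

# Non-smooth step: pseudo-outcome is the stage-2 Q-function maximized over a2.
def q2_pred(a):
    return q2.predict(np.column_stack([h2, np.full(n, a), h2[:, 0] * a]))
pseudo = np.maximum(q2_pred(+1), q2_pred(-1))

# Stage 1: model the (non-smooth) pseudo-outcome given (H1, A1).
X1 = np.column_stack([h1, a1, h1[:, 0] * a1])
q1 = LinearRegression().fit(X1, pseudo)
print(q1.coef_)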
Miles Lopes,
UC Davis
Title: Bootstrap
methods in high dimensions: spectral statistics and max
statistics
Abstract: Although bootstrap methods have
an extensive literature, relatively little is known about
their performance in high-dimensional problems. In this talk,
I will discuss two classes of statistics for which bootstrap
approximations can succeed in high dimensions. The first is
the class of "spectral statistics," which are functions of the
eigenvalues of sample covariance matrices. In this case, I
will describe a new type of bootstrap method with consistency
guarantees. The second class is based on the coordinate-wise
maxima of high-dimensional sample averages, which have
attracted recent interest in connection with the "multiplier
bootstrap". In this case, I will explain how existing
theoretical rates of bootstrap approximation can be improved
to near-parametric rates under certain structural conditions.
(Joint work with subsets of {Alexander Aue, Andrew Blandino,
Zhenhua Lin, Hans Mueller}.)
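(A sketch of the multiplier bootstrap for a coordinate-wise max statistic in the style referred to above; the dimensions and Gaussian multipliers are illustrative choices.)

import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 1000
X = rng.standard_t(df=8, size=(n, p))             # mean-zero data with p much larger than n

xbar = X.mean(axis=0)
stat = np.sqrt(n) * np.abs(xbar).max()            # coordinate-wise max statistic

# Multiplier bootstrap: reweight the centered rows by i.i.d. standard Gaussians.
B = 2000
centered = X - xbar
boot = np.empty(B)
for b in range(B):
    g = rng.normal(size=n)
    boot[b] = np.abs(g @ centered / np.sqrt(n)).max()

print(stat, np.quantile(boot, 0.95))              # statistic vs. bootstrap critical value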
Xiao-Li Meng,
Harvard University
Title: The
Conditionality Principle is (still) safe and sound, but our
large-p-small-n models are ill (defined)
Abstract: In recent years, a number of
authors have questioned the applicability of the
Conditionality Principle to high-dimensional problems,
because they presented examples where certain model
parameters cannot be estimated without using ancillary
information. This talk points out that such questioning is
meaningful only if both model parameters and ancillary
statistics are defined in ways as required by the
Conditionality Principle. The mathematical assumptions
between the number of parameters p and the sample size n,
while very useful for approximation theory for
high-dimensional problems, are typically at odds with
statistical modeling both as a realizable process for
generating data and as a coherent vehicle for inference.
Furthermore, reaching consistency/estimability by
marginalization comes at the necessary expense of solving a
less relevant problem than the actual one we care
about. All these issues reinforce the time-honored
"no-free-lunch" principle, with an additional reminder to
read the menu carefully for dietary restrictions.
Art Owen,
Stanford University
Title: Six percent
power and barely selective inference
Abstract: see the link on the main conference page.
Snigdha Panigrahi,
University of Michigan
Title: Post-selective
estimation of linear mediation effects
Abstract: In an attempt to understand the
effect of exposures on an outcome variable, several mediators
often make key contributions towards an indirect effect.
A priori it is not typically known which mediating pathways
are of potential interest, out of a high dimensional set of
candidates. Previous work in this domain has mainly focused on
methods which identify likely mediators. However, a problem
that has not received much attention to date is consistent
estimation of the mediated associations between exposure and
response, using the available samples and based upon these
data-mined models. Specifically, the post-selective targets in
linear mediation models take the form of adaptive linear
combinations of model parameters. With the usual "Polyhedral"
machinery no longer applicable to construct pivotal inference,
we deploy recently developed maximum likelihood techniques
(Panigrahi and Taylor; 2019) for interval estimation. To
showcase the merits of our approach, we will demonstrate in
simulations an optimal tradeoff in power and inferential
coherence. This is joint work with Yujia Pan.
Annie Qu,
University of Illinois Urbana-Champaign
Title: Community detection with
dependent connectivity
Abstract: In network analysis, within-community members
are more likely to be connected than between-community
members, which is reflected in the fact that edges within a
community are intercorrelated. However, existing probabilistic
models for community detection such as the stochastic block
model (SBM) are not designed to capture the dependence among
edges. In this paper, we propose a new community detection
approach to incorporate intra-community dependence of
connectivities through the Bahadur representation. The
proposed method does not require specifying the likelihood
function, which could be intractable for correlated binary
connectivities. In addition, the proposed method allows for
heterogeneity among edges between different communities. In
theory, we show that incorporating correlation information can
achieve a faster convergence rate compared to the independent
SBM, and the proposed algorithm has a lower estimation bias
and accelerated convergence compared to the variational EM.
Our simulation studies show that the proposed algorithm
outperforms the popular variational EM algorithm assuming
conditional independence among edges. We also demonstrate the
application of the proposed method to agricultural product
trading networks from different countries. This is joint work
with Yubai Yuan.
Aaditya Ramdas,
Carnegie Mellon University
Title: Online
control of the false coverage rate and false sign rate
Abstract: The reproducibility debate has
caused a renewed interest in changing how one reports
uncertainty, from $p$-values for testing a null hypothesis to
confidence intervals (CIs) for the corresponding parameter.
When CIs for multiple selected parameters are being reported,
the natural analog of the false discovery rate (FDR) is the
false coverage rate (FCR), which is the expected ratio of
number of reported CIs that fail to cover their respective
parameters to the total number of reported CIs. Here, we
consider the general problem of FCR control in the online
setting, where there is an infinite sequence of fixed unknown
parameters $\theta_t$ ordered by time. At each step, we see
independent data that is informative about $\theta_t$, and
must immediately make a decision whether to report a CI for
$\theta_t$ or not. If $\theta_t$ is selected for coverage, the
task is to determine how to construct a CI for $\theta_t$ such
that FCR $\leq \alpha$ for any $T \in \mathbb{N}$. While much progress
has been made in online FDR control (test $\theta_t \in
\Theta_{0,t}$) starting from the seminal alpha-investing paper
of Foster and Stine (JRSSB, 2008), the problem of online FCR
control is wide open. In this paper, we devise a novel
solution to the problem which only requires the statistician
to be able to construct a marginal CI at any given level. If
so desired, our framework also yields online FDR control as a
special case, or even online sign-classification procedures
that control the false sign rate (FSR). Last, all of our
methodology applies equally well to prediction intervals,
having particular implications for selective conformal
inference. This is joint work with Asaf Weinstein (preprint at
https://arxiv.org/abs/1905.01059).
Veronika Rockova,
University of Chicago
Title: TBA
Abstract: TBA
Cynthia Rush,
Columbia University
Title: Algorithmic analysis of SLOPE via
approximate message passing
Abstract: SLOPE is a relatively new
convex optimization procedure for high-dimensional linear
regression via the sorted L1 penalty: the larger the rank of
the fitted coefficient, the larger the penalty. This
non-separable penalty renders many existing techniques invalid
or inconclusive in analyzing the SLOPE solution. In this talk,
we propose using approximate message passing or AMP to
provably solve the SLOPE problem in the regime of linear
sparsity under Gaussian random designs. This algorithmic
approach allows one to approximate the SLOPE solution via the
much more amenable AMP iterates, and a consequence of this
analysis is an asymptotically exact characterization of the
SLOPE solution. Explicitly, we demonstrate that one can
characterize the asymptotic dynamics of the AMP iterates by
employing a recently developed state evolution analysis for
non-separable penalties, thereby overcoming the difficulty
caused by the sorted L1 penalty. This is joint work with
Zhiqi Bu, Jason Klusowski, and Weijie Su.
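(The computational core of solving SLOPE, whether inside AMP or a proximal gradient method, is the prox of the sorted L1 penalty; a sketch via pool-adjacent-violators follows, with scikit-learn's isotonic regression used as an illustrative implementation choice.)

import numpy as np
from sklearn.isotonic import IsotonicRegression

def prox_sorted_l1(y, lam):
    """argmin_x 0.5*||x - y||^2 + sum_i lam_i |x|_(i), where lam is non-increasing
    and |x|_(i) denotes the i-th largest entry of |x|."""
    sign = np.sign(y)
    order = np.argsort(np.abs(y))[::-1]                 # sort |y| in decreasing order
    v = np.abs(y)[order] - lam                          # subtract the matched penalty levels
    # project onto the non-increasing cone (pool adjacent violators), then clip at zero
    iso = IsotonicRegression(increasing=False).fit_transform(np.arange(len(y)), v)
    x = np.maximum(iso, 0.0)
    out = np.empty_like(y)
    out[order] = x                                      # undo the sort and restore signs
    return sign * out

lam = np.linspace(1.0, 0.2, 5)                          # non-increasing penalty sequence
print(prox_sorted_l1(np.array([3.0, -0.1, 1.5, -2.0, 0.4]), lam))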
Richard Samworth,
University of Cambridge
Title: High-dimensional
principal component analysis with heterogeneous missingness
Abstract: We study the problem of
high-dimensional Principal Component Analysis (PCA) with
missing observations. In simple, homogeneous missingness
settings with a noise level of constant order, we show that an
existing inverse-probability weighted (IPW) estimator of the
leading principal components can (nearly) attain the minimax
optimal rate of convergence. However, deeper investigation
reveals both that, particularly in more realistic settings
where the missingness mechanism is heterogeneous, the
empirical performance of the IPW estimator can be
unsatisfactory, and moreover that, in the noiseless case, it
fails to provide exact recovery of the principal components.
We therefore introduce a new method for high-dimensional PCA,
called `primePCA', that is designed to cope with situations
where observations may be missing in a heterogeneous manner.
Starting from the IPW estimator, primePCA iteratively projects
the observed entries of the data matrix onto the column space
of our current estimate to impute the missing entries, and
then updates our estimate by computing the leading right
singular space of the imputed data matrix. It turns out that
the interaction between the heterogeneity of missingness and
the low-dimensional structure is crucial in determining the
feasibility of the problem. This leads us to impose an
incoherence condition on the principal components and we prove
that in the noiseless case, the error of primePCA converges to
zero at a geometric rate when the signal strength is not too
small. An important feature of our theoretical guarantees is
that they depend on average, as opposed to worst-case,
properties of the missingness mechanism. Joint work with Ziwei
Zhu and Tengyao Wang.
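(A simplified reading of the iterative scheme described above: impute missing entries by projecting each row's observed entries onto the current loading space, then refit by an SVD. The actual primePCA algorithm, its IPW initialization, and its stopping rules may differ in detail.)

import numpy as np

def prime_pca_sketch(Y, K, n_iter=50):
    """Alternate imputation (projection onto the current K loadings) and SVD refitting."""
    obs = ~np.isnan(Y)
    Y0 = np.where(obs, Y, 0.0)
    V = np.linalg.svd(Y0, full_matrices=False)[2][:K].T          # initial d x K loadings
    for _ in range(n_iter):
        Z = Y0.copy()
        for i in range(Y.shape[0]):
            o = obs[i]
            u = np.linalg.lstsq(V[o], Y[i, o], rcond=None)[0]    # scores from observed coords
            Z[i, ~o] = V[~o] @ u                                 # impute the missing entries
        V = np.linalg.svd(Z, full_matrices=False)[2][:K].T       # leading right singular space
    return V

rng = np.random.default_rng(6)
truth = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 30))
Y = truth + 0.1 * rng.normal(size=truth.shape)
Y[rng.random(Y.shape) < 0.3] = np.nan                            # missing entries
print(prime_pca_sketch(Y, K=1).shape)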
Ulrike Schneider, TU
Wien
Title: Uniformly valid
confidence sets based on the Lasso in low dimensions
Abstract: In a linear regression model
of fixed dimension p ≤ n, we construct confidence regions
for the unknown parameter vector based on the Lasso
estimator that uniformly and exactly hold the prescribed
coverage in finite samples (as well as in an asymptotic
setup). We thereby quantify estimation uncertainty as well
as the "post-model selection error" of this estimator. More
concretely, in finite samples with Gaussian errors (and
asymptotically in the case where the Lasso estimator is
tuned to perform conservative model selection), we derive
exact formulas for the minimal coverage probability over the
entire parameter space and for a large class of shapes for
the confidence sets, thus enabling the construction of valid
confidence regions based on the Lasso estimator in these
settings. Our calculations are carried out without explicit
knowledge of the finite-sample distribution of the
estimator. Furthermore, we discuss the choice of shape for
the confidence sets and the comparison with the confidence
ellipse based on the least-squares estimator, along with
some ideas for extensions. [Reference: K. Ewald and U.
Schneider, Uniformly Valid Confidence Sets Based on the
Lasso, Electronic Journal of Statistics 12 (2018),
1358-1387.]
Peter Song,
University of Michigan
Title: Method of
Contraction-Expansion (MOCE) for simultaneous inference in
linear models
Abstract: Simultaneous inference after
model selection is of critical importance to address
scientific hypotheses involving a set of parameters. We
consider a high-dimensional linear regression model in which a
regularization procedure such as LASSO is applied to yield a
sparse model. To establish a simultaneous post-model selection
inference, we propose a method of contraction and expansion
(MOCE) along the line of debiasing estimation that enables us
to balance the bias-and-variance tradeoff so that the
super-sparsity assumption may be relaxed. We establish key
theoretical results for the proposed MOCE procedure from which
the expanded model can be selected with theoretical guarantees
and simultaneous confidence regions can be constructed by the
joint asymptotic normal distribution. In comparison with
existing methods, our proposed method exhibits stable and
reliable coverage at a nominal significance level with
substantially less computational burden, and thus it is
trustworthy for its application in solving real-world
problems. This is joint work with Wang, Zhou and Tang.
Weijie Su,
University of Pennsylvania
Title: Gaussian
differential privacy
Abstract: Privacy-preserving data
analysis has been put on a firm mathematical foundation
since the introduction of differential privacy (DP) in 2006,
with successful deployment in iOS and Chrome lately. This
privacy definition, however, has some well-known weaknesses:
notably, it does not tightly handle composition. This
weakness has inspired several recent relaxations of
differential privacy based on Renyi divergences. We propose
an alternative relaxation of differential privacy, which we
term "f-DP", which has a number of nice properties and
avoids some of the difficulties associated with divergence
based relaxations. First, it preserves the hypothesis
testing interpretation of differential privacy, which makes
its guarantees easily interpretable. It allows for lossless
reasoning about composition and post-processing, and
notably, a direct way to analyze privacy amplification by
subsampling. We define a canonical single-parameter family
of definitions within our class, termed "Gaussian
Differential Privacy", based on the hypothesis testing
region defined by two Gaussian distributions. We show that
this family is focal by proving a central limit theorem,
which shows that the privacy guarantees of any
hypothesis-testing-based definition of privacy (including
differential privacy) converge to Gaussian differential
privacy in the limit under composition. This central limit
theorem also gives a tractable analysis tool. We demonstrate
the use of the tools we develop by giving an improved
analysis of the privacy guarantees of noisy stochastic
gradient descent. This is joint work with Jinshuo Dong and
Aaron Roth.
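(The Gaussian trade-off function at the heart of Gaussian Differential Privacy has a closed form; a short sketch for evaluating it is below.)

import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu): the type II error of the most
    powerful level-alpha test between N(0, 1) and N(mu, 1), which defines mu-GDP."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

alphas = np.linspace(0.01, 0.99, 5)
for mu in (0.5, 1.0, 2.0):
    print(mu, np.round(gaussian_tradeoff(alphas, mu), 3))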
Jonathan
Taylor, Stanford University
Title: Inference
after selection through a black box
Abstract: We consider the problem of
inference for parameters selected for reporting only after
some algorithm, the canonical example being inference for
model parameters after a model selection procedure. The
conditional correction for selection requires knowledge of how
the selection is affected by changes in the underlying data,
and much current research describes this selection
explicitly. In this work, we assume 1) we have access, in
silico, to the selection algorithm itself and 2) for
parameters of interest, the data input into the algorithm
satisfies (pre-selection) a central limit theorem jointly with
an estimator of our parameter of interest. Under these
assumptions, we recast the problem into a statistical learning
problem which can be fit with off-the-shelf models for binary
regression. We consider two examples previously out of reach
of this conditional approach: stability selection and
inference after multiple runs of Model-X knockoffs.
Rob Tibshirani,
Stanford University
Title: Prediction
and outlier detection: a distribution-free prediction set with
a balanced objective
Abstract: We consider the multi-class
classification problem in the unmatched case where the
training data and the out-of-sample data may have different
distributions, and propose a method called BCOPS
(balanced and conformal optimized prediction set) that
constructs prediction sets $C(x)$ at each $x$ in the
out-of-sample data. The method tries to optimize out-of-sample
performance, aiming to include the correct class as often as
possible, but also detecting outliers $x$, for which the
method returns no prediction (corresponding to $C(x)$ equal to
the empty set).
BCOPS combines supervised-learning algorithms with
conformal prediction to minimize the misclassification loss
over the distribution of the unlabeled out-of-sample data in
the offline setting, and over a proxy of the out-of-sample
distribution in the online setting. The constructed prediction
sets have a finite-sample coverage guarantee without
distributional assumptions. We also describe new methods for
the evaluation of out-of-sample performance in this unmatched
case. We prove asymptotic consistency and efficiency of the
proposed methods under suitable assumptions and illustrate
them in real data examples. Joint work with Leying Guan, Yale
University.
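(To make the object $C(x)$ concrete, here is a plain split-conformal, per-class prediction set on simulated data; BCOPS additionally optimizes the conformity score against the out-of-sample distribution, which this sketch does not attempt.)

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(600, 5))
y = rng.integers(0, 3, size=600)
X[np.arange(600), y] += 2.0                        # shift one coordinate per class

train, cal = np.arange(0, 400), np.arange(400, 600)
clf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
alpha = 0.1

def prediction_set(x_new):
    """C(x): keep class k unless x_new looks extreme relative to the class-k calibration set."""
    C = []
    for k in np.unique(y[train]):
        cal_k = cal[y[cal] == k]
        scores_cal = clf.predict_proba(X[cal_k])[:, k]
        score_new = clf.predict_proba(x_new.reshape(1, -1))[0, k]
        pval = (1 + np.sum(scores_cal <= score_new)) / (len(cal_k) + 1)
        if pval > alpha:
            C.append(int(k))
    return C                                        # an empty set flags a possible outlier

print(prediction_set(X[0]), prediction_set(X[1]))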
Ryan Tibshirani,
Carnegie Mellon University
Title: What deep
learning taught me about linear models
Abstract: Related to the paper at
http://www.stat.cmu.edu/~ryantibs/papers/lsinter.pdf.
Joint work with Trevor Hastie, Andrea Montanari and Saharon
Rosset.
Jingshen Wang, UC Berkeley
Title: TBA
Abstract: TBA
Daniel Yekutieli,
Tel Aviv University
Title: TBA
Abstract: TBA
Alastair Young,
Imperial College London
Title: Challenges
for (Bayesian) selective inference
Abstract: The `condition on selection'
approach to selective inference is compelling, for both
frequentist and Bayesian contexts, and strongly supported by
classical, Fisherian, arguments. Yet, significant practical
and conceptual challenges remain. Our purpose in this talk is
to provide discussion of key issues, with the aim of providing
a clear, pragmatic perspective on the selective inference
problem, principally from a Bayesian angle. Assuming a
framework in which selection is performed on a randomized
version of sample data, several questions demand attention.
How much should we condition? How can the computational
challenge of Bayesian selective inference be met most
effectively? What if the selection condition is imprecise?
Should the selection condition be altered for application to
randomised data? Joint work with Daniel Garcia Rasines.
Linda Zhao,
University of Pennsylvania
Title: Nonparametric
empirical Bayes methods for sparse, noisy signals
Abstract: We consider high-dimensional
signal recovery problems. The goal is to identify the true
signals from the noise with precision. Nonparametric empirical
Bayes schemes are proposed and investigated. The method adapts
well to varying degrees of sparsity: it not only recovers the
signals well, but also provides credible intervals. A false
discovery rate control method is introduced with our flexible
nonparametric empirical Bayes schemes. The setup is built upon
the normal distribution with heteroskedastic variance, but is
well-adapted to exponential family distributions. The EM
algorithm and other first-order optimization methods are used
and studied. Simulations show that our method outperforms
existing ones. Applications to microarray data as well as
sports data, such as predicting batting averages as in L.
Brown (2008), will be discussed. This is
joint work with Junhui Cai.
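(As a flavor of nonparametric empirical Bayes signal recovery, here is a sketch of Tweedie's formula with a kernel-smoothed marginal density; the authors' scheme additionally delivers credible intervals and FDR control, which this sketch does not attempt.)

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(8)
n = 5000
theta = np.where(rng.random(n) < 0.1, rng.normal(0, 3, n), 0.0)    # sparse signals
x = theta + rng.normal(size=n)                                     # noisy observations, sigma = 1

kde = gaussian_kde(x)
eps = 1e-3
dlogf = (np.log(kde(x + eps)) - np.log(kde(x - eps))) / (2 * eps)  # d/dx log f(x), numerically
posterior_mean = x + 1.0 * dlogf                                   # Tweedie's formula, sigma^2 = 1

print(np.mean((posterior_mean - theta) ** 2), np.mean((x - theta) ** 2))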
Posters
Stephen Bates,
Stanford University
Title: TBA
Abstract: TBA
Zhiqi Bu,
University of Pennsylvania
Title: SLOPE is
better than LASSO: estimation and inference of SLOPE via
approximate message passing
Abstract: In the high-dimensional problem of
reconstructing a sparse signal via sorted L1 penalized
estimation, or SLOPE, we apply approximate message passing
(AMP) to the SLOPE minimization problem. We derive the
corresponding AMP algorithm and its state evolution, and
rigorously prove that the AMP iterates converge to the SLOPE
solution as the number of iterations increases. We also use
the state evolution for non-separable functions to
asymptotically characterize the SLOPE solution. As a
consequence, AMP and state evolution allow us to conduct
inference on the SLOPE solution and demonstrate cases where
SLOPE is better than LASSO (which is a special case of SLOPE).
Our first result concerns the trade-off between false and true
positive rates or, equivalently, between measures of type I
and type II errors along the SLOPE path. In particular, LASSO
is known to suffer from the Donoho-Tanner phase transition,
under which the TPP may be bounded away from 1. In contrast,
SLOPE overcomes this phase transition, and part of its
trade-off path can be nicely characterized as a Mobius
transformation. Our second result considers a fixed signal
prior distribution and constructs a SLOPE path that has better
TPP, FDP and MSE at the same time.
Hongyuan Cao, Florida
State University
Title: TBA
Abstract: TBA
Paromita Dubey,
UC Davis
Title: Frechet
analysis of variance and change point detection for random
objects
Abstract: With an increasing abundance of
complex non-Euclidean data, settings where data objects are
assumed to be random variables taking values in a metric space
are more frequently encountered. We propose a k-sample test
for samples of random objects using Frechet mean and variance
as generalizations of the notions of center and spread for
metric space valued random variables. Our method is free of
tuning parameters and is inspired by classical ANOVA, where
traditionally groupwise variances are compared to draw
inference regarding the mean. The proposed test is consistent
and powerful against contiguous alternatives addressing both
location and scale differences, which are captured using
Frechet means and variances. Theoretical challenges are
addressed using very mild assumptions on metric entropy,
making our method applicable to a broad class of metric
spaces, including networks, covariance matrices, probability
distributions etc. Inspired by the test, we develop a method
for estimation and testing of a change point in the
distribution of a sequence of independent data objects. Change
points are viewed as locations in a data sequence where the
distribution changes either in terms of Frechet mean and/or
variance. We obtain the asymptotic distribution of the test
statistic under the null hypothesis of no change point. We
provide theoretical guarantees for consistency of the test
under contiguous alternatives when a change point exists and
for consistency of the estimated location of the change point.
We illustrate the new approach with detecting change points in
sequences of maternal fertility distributions.
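(The Frechet mean and variance used above are defined purely through distances; a generic sketch follows, restricting the minimization to the sample points as a crude approximation.)

import numpy as np

def frechet_mean_var(objects, dist):
    """Frechet mean minimizes the average squared distance; the variance is that minimum.
    Here the minimization is restricted to the sample itself (a medoid-style approximation)."""
    n = len(objects)
    D2 = np.array([[dist(objects[i], objects[j]) ** 2 for j in range(n)] for i in range(n)])
    avg = D2.mean(axis=1)
    k = int(avg.argmin())
    return objects[k], float(avg[k])

# Example: random objects are 2x2 covariance matrices with the Frobenius distance.
rng = np.random.default_rng(9)
mats = [np.cov(rng.normal(size=(2, 50))) for _ in range(20)]
mean, var = frechet_mean_var(mats, lambda a, b: np.linalg.norm(a - b))
print(var)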
Yinqiu He,
University of Michigan
Title: Likelihood
ratio test in multivariate linear regression: from low to high
dimension
Abstract: When testing the structure of
the regression coefficients matrix in multivariate linear
regressions, likelihood ratio test (LRT) is one of the most
popular approaches in practice. Despite its popularity, it is
known that the classical chi-square approximations for LRTs
often fail in high-dimensional settings, where the dimensions
of responses and predictors (m, p) are allowed to grow with
the sample size n. Though various corrected LRTs and other
test statistics have been proposed in the literature, the
fundamental question of when the classic LRT starts to fail is
less studied. We first give the asymptotic boundary where the
classic LRT fails and develop the corrected limiting
distribution of the LRT for a general asymptotic regime. We
then study the test power of the LRT in the high-dimensional
setting, and develop a power-enhanced LRT. Lastly, when
p>n, where the LRT is not well-defined, we propose a
two-step testing procedure by first performing dimension
reduction and then applying the proposed LRT. Theoretical
properties are developed to ensure the validity of the
proposed method. Numerical studies are also presented to
demonstrate its good performance.
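(The classical test in question is Wilks' likelihood ratio test; a sketch of the statistic and its classical chi-square approximation, whose breakdown in high dimensions motivates the work above, follows.)

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(10)
n, p, m, q = 200, 5, 3, 2                 # sample size, predictors, responses, tested predictors
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, m))               # generated under the null: the q tested rows of B are zero

def rss_matrix(X, Y):
    """Residual sum-of-squares (SSCP) matrix from multivariate least squares."""
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return resid.T @ resid

wilks_lambda = np.linalg.det(rss_matrix(X, Y)) / np.linalg.det(rss_matrix(X[:, :p - q], Y))
lrt = -n * np.log(wilks_lambda)           # -2 log likelihood ratio for the Gaussian model
print(lrt, chi2.sf(lrt, df=m * q))        # classical chi-square approximation, df = m*q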
David Hong, University
of Michigan
Title: Asymptotic
eigenstructure of weighted sample covariance matrices for
large dimensional low-rank models with heteroscedastic noise
Abstract: TBA
Byol Kim,
University of Chicago
Title: TBA
Abstract: TBA
John Kolassa,
Rutgers University
Title: TBA
Abstract: TBA
Lihua Lei, UC
Berkeley
Title: The
Bag-Of-Null-Statistics procedure: an adaptive framework for
selecting better test statistics
Abstract: Classical multiple testing
procedures often suffer from the curse of dimensionality. As
the dimension increases, a traditional method that uses an
agnostic p-value transformation is likely to fail to
distinguish between the null and alternative hypotheses. In
this work, we propose the Bag-Of-Null-Statistics (BONuS)
procedure, an adaptive procedure for multiple testing with
multivariate data, which helps improve the testing power while
controlling the false discovery rate (FDR). Contrary to
procedures that start with a set of p-values, our procedure
starts with the original data, and adaptively finds a more
powerful test statistic. It always controls FDR, works for a
fairly general setting, and can gain a higher power compared
to agnostic tests under mild conditions. In addition, with
certain implementation techniques (Double BONuS), we can
guarantee in probability that its performance is at least as
good as the agnostic test.
Cong Ma, Princeton
University
Title: Inference
and uncertainty quantification for noisy matrix completion
Abstract: Noisy matrix completion aims at
estimating a low-rank matrix given only partial and corrupted
entries. Despite substantial progress in designing efficient
estimation algorithms, it remains largely unclear how to
assess the uncertainty of the obtained estimates and how to
perform statistical inference on the unknown matrix (e.g.
constructing a valid and short confidence interval for an
unseen entry). This work takes a step towards inference and
uncertainty quantification for noisy matrix completion. We
develop a simple procedure to compensate for the bias of the
widely used convex and nonconvex estimators. The resulting
de-biased estimators admit nearly precise non-asymptotic
distributional characterizations, which in turn enable optimal
construction of confidence intervals/regions for, say, the
missing entries and the low-rank factors. Our inferential
procedures do not rely on sample splitting, thus avoiding
unnecessary loss of data efficiency. As a byproduct, we obtain
a sharp characterization of the estimation accuracy of our
de-biased estimators, which, to the best of our knowledge, are
the first tractable algorithms that provably achieve full
statistical efficiency (including the pre-constant). The
analysis herein is built upon the intimate link between convex
and nonconvex optimization -- an appealing feature recently
discovered by [CCF+19].
Matteo Sesia,
Stanford University
Title: Multi-resolution
localization of causal variants across the genome
Abstract:
We present KnockoffZoom, a
flexible method for the genetic mapping of complex traits at
multiple resolutions. KnockoffZoom localizes causal variants
precisely and provably controls the false discovery rate using
artificial genotypes as negative controls. Our method is
equally valid for quantitative and binary phenotypes, making
no assumptions about their genetic architectures. Instead, we
rely on well-established genetic models of linkage
disequilibrium. We demonstrate that our method can detect more
associations than mixed effects models and achieve
fine-mapping precision, at comparable computational cost.
Lastly, we apply KnockoffZoom to data from 350k subjects in
the UK Biobank and report many new findings.
Nicholas Syring,
Washington University in St. Louis
Title: TBA
Abstract: TBA
Armeen Taeb,
Caltech
Title:
TBA
Abstract:
TBA
Hua Wang,
University of Pennsylvania
Title: The
simultaneous inference trade-off analysis on Lasso path
Abstract:
In high-dimensional linear
regression settings where the explanatory variables have very
low correlations and the true effective variables are sparse,
each of large magnitude, it is expected that the Lasso can find
those true variables with few mistakes, if any. However, recent
studies suggest this is not the case in a regime of linear
sparsity, where the fraction of true effective variables tends
to a constant, however small, even when the design is
independent Gaussian. We further demonstrate that true
features and null features are always inevitably interspersed
on the Lasso path, and this effect can even get worse when the
effect sizes are uniformly larger. We derive a complete
diagram that reveals all possible trade-offs between false and
true positive rates or, equivalently, between measures of type
I and type II errors along the Lasso path, and this diagram
gives upper and lower bounds that are sharp in a global sense.
We show that even though the trade-off is inevitable, its finer
level is determined not by the absolute magnitudes of the
effect sizes but mainly by the relative closeness between the
effect sizes of the true variables. The best case among those
trade-offs occurs when the effect variables all have very
distinct magnitudes, and there is always a price to pay when
the effect sizes of the true signals are close to each other,
which we interpret as "the price of competition", namely the
cost due to the competition between comparable signals. Our
analysis uses tools from approximate message passing (AMP)
theory as well as novel elements to deal with a possibly
adaptive selection of the Lasso regularizing parameter, as
well as massive conditioning techniques.
Yuling Yan,
Princeton University
Title: Noisy matrix
completion: understanding statistical guarantees for convex
relaxation via nonconvex optimization
Abstract:
This paper studies noisy
low-rank matrix completion: given partial and corrupted
entries of a large low-rank matrix, the goal is to estimate
the underlying matrix faithfully and efficiently. Arguably one
of the most popular paradigms to tackle this problem is convex
relaxation, which achieves remarkable efficacy in practice.
However, the theoretical support of this approach is still far
from optimal in the noisy setting, falling short of explaining
the empirical success. We make progress towards demystifying
the practical efficacy of convex relaxation vis-a-vis random
noise. When the rank of the unknown matrix is a constant, we
demonstrate that the convex programming approach achieves
near-optimal estimation errors --- in terms of the Euclidean
loss, the entrywise loss, and the spectral norm loss --- for a
wide range of noise levels. All of this is enabled by
bridging convex relaxation with the nonconvex Burer--Monteiro
approach, a seemingly distinct algorithmic paradigm that is
provably robust against noise. More specifically, we show that
an approximate critical point of the nonconvex formulation
serves as an extremely tight approximation of the convex
solution, allowing us to transfer the desired statistical
guarantees of the nonconvex approach to its convex
counterpart.
Yubai Yuan,
University of Illinois Urbana-Champaign
Title: High-order
embedding for hyperlink network prediction
Abstract:
In this poster, we are
interested in formulating multi-layer networks arising from
multiple structured relationships among vertices. This type of
network system has a unique feature in that the links
connecting vertices from a subgroup, within or across layers of
the network, might be correlated. We propose a novel hyperlink
embedding to encode the potential subgroup structure of
vertices into latent space to capture the local link
dependency for the purpose of link inference. In addition, we
utilize tensor decomposition to reduce the dimensionality of
the high-order subgroup similarity modeling. Furthermore, to
achieve the hyperlink selection from a set of potential
candidates, we adopt regularizations to reinforce local
concordances among vertices for subgroup structure
identifications. The major advantage is that the proposed
method is able to perform hyperlink predictions through
observed pairwise links and underlying high-order subgroup
structure in latent space. Also this subgroup structure
enables pairwise link inference to borrow information through
the within-subgroup dependency. Numerical studies indicate
that the proposed method improves both hyperlink and pairwise
link prediction accuracy compared to existing popular link
prediction algorithms.
Xiaorui Zhu,
University of Cincinnati
Title: Simultaneous
confidence intervals using entire solution paths
Abstract:
Ideal simultaneous
confidence intervals for model selection should, in practice,
provide important insights into the variable selection
results. In this paper, we propose a general approach to
construct simultaneous confidence intervals based on a
variable selection method and residual bootstraps. Our
simultaneous confidence intervals have two features that are
hardly achievable by other methods: (1) among all available
approaches, they are the tightest ones that achieve the
nominal confidence level simultaneously; (2) they shrink the
intervals of most regression coefficients to zero width.
Because only a small set of coefficients have intervals of
nonzero width, the simultaneous confidence intervals imply an
inference for variable selection. In addition, we introduce a
graphical tool (the Simultaneous Confidence Tube, SCT) to
intuitively display the estimation and variable selection
information. The theoretical properties of the simultaneous
confidence intervals are developed. We then conduct numerical
studies and real data applications to illustrate the
advantages of the simultaneous confidence intervals and the
SCT proposed in this article.