Professor Xiaohui Chen of Statistics has been awarded a prestigious National Science Foundation (NSF) CAREER Award for his project titled "Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations." The NSF CAREER Award is one of the most distinguished awards a junior faculty member can receive, because recipients must demonstrate an exemplary commitment to the role of teacher-scholar by integrating research and education within the context of their organization's mission. NSF grants these highly competitive awards only once a year, to early-career faculty whose activities have built a firm foundation for a lifetime of leadership in combining education and research.
For additional details on the award, please see NSF.gov.
Professor Chen's abstract is as follows:
In the era of Big Data, computer-intensive statistical inference faces unprecedented challenges and opportunities. High-dimensional and massive data are now emerging in scientific areas including biomedical engineering, environmental science, financial econometrics, array signal processing, and social networks, among many others. An important associated research challenge is to develop efficient methods that extract information, and quantify its uncertainty, for a large number of variables and measurements. Data-driven statistical inferential procedures that quantify uncertainty via bootstrap methods are often computationally intensive for high-dimensional, large-scale datasets. On the computational side, this research project will use distributed inference through parallel high-performance computing, an essential ingredient for speeding up bootstrap calculations. On the statistical side, this research will introduce a general framework for studying the performance of various bootstrap methods. The project aims to reach a comprehensive understanding of the fundamental tradeoff between statistical and computational concerns in quantifying uncertainty for a broad class of inferential procedures, thus providing guidance for balancing statistical accuracy against computational cost in real applications. Both undergraduate and graduate students are involved in the project.
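To make the combination of bootstrapping and parallel computation more concrete, here is a minimal Python sketch. It is not Professor Chen's method: it assumes a Gaussian multiplier bootstrap for the max-type statistic T = max_j |sqrt(n) * mean_j|, a standard technique for testing whether a d-dimensional mean is zero when the dimension d may exceed the sample size n, and every function name in it is hypothetical. Because bootstrap replicates are independent of one another, they can be split into chunks, computed by separate workers with separate random seeds, and pooled afterward.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration: Gaussian multiplier bootstrap for a max-type statistic,
# with bootstrap replicates split across workers. Not the method from the abstract.


def column_means(data):
    """Coordinate-wise sample means of an n-by-d dataset (list of rows)."""
    n, d = len(data), len(data[0])
    return [sum(row[j] for row in data) / n for j in range(d)]


def max_statistic(data):
    """T = max_j |sqrt(n) * mean_j|, a max-type statistic for H0: mean vector = 0."""
    n = len(data)
    return max(abs(math.sqrt(n) * m) for m in column_means(data))


def multiplier_replicates(centered, num_reps, seed):
    """Draw num_reps bootstrap copies: max_j |n^{-1/2} * sum_i e_i * centered[i][j]|.

    The multipliers e_i are i.i.d. N(0, 1); a private random.Random(seed) keeps each
    worker's draws independent of the others and reproducible.
    """
    rng = random.Random(seed)
    n, d = len(centered), len(centered[0])
    scale = 1.0 / math.sqrt(n)
    out = []
    for _ in range(num_reps):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append(max(abs(scale * sum(e[i] * centered[i][j] for i in range(n)))
                       for j in range(d)))
    return out


def bootstrap_critical_value(data, alpha=0.05, num_reps=1000, num_workers=4, seed=0):
    """(1 - alpha) quantile of the bootstrap max statistic, computed in parallel chunks."""
    means = column_means(data)
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # Divide the replicates as evenly as possible among the workers.
    chunks = [num_reps // num_workers + (1 if k < num_reps % num_workers else 0)
              for k in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(multiplier_replicates,
                         [centered] * num_workers, chunks,
                         [seed + k for k in range(num_workers)])
        reps = sorted(r for part in parts for r in part)
    # Empirical quantile: smallest replicate with at least (1 - alpha) of the mass at or below it.
    idx = min(len(reps) - 1, math.ceil((1 - alpha) * len(reps)) - 1)
    return reps[idx]


if __name__ == "__main__":
    gen = random.Random(42)
    n, d = 40, 100  # dimension larger than the sample size
    data = [[gen.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    t = max_statistic(data)
    c = bootstrap_critical_value(data, alpha=0.05, num_reps=400)
    print(f"T = {t:.3f}, bootstrap 95% critical value = {c:.3f}, reject H0: {t > c}")
```

Threads keep the sketch self-contained and reproducible, but because of Python's global interpreter lock they do not speed up this CPU-bound loop. A real deployment would hand the chunks to separate processes or cluster nodes; since each chunk needs only the centered data and its own seed, the design carries over unchanged.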
The overarching goal of this research project is to provide new insights into, and deepen the theoretical understanding of, the strengths and fundamental limitations of fully data-dependent inferential procedures (such as the bootstrap) in the high-dimensional and massive data framework for two classical problems: (i) change point detection and identification; and (ii) computationally aware statistical inference for U-statistics. The research aims to develop statistically correct and computationally scalable inferential procedures when the dimension can be larger (or even much larger) than the sample size. In contrast to existing work, the methods under development have strong theoretical guarantees, are robust under mild assumptions, require no tuning, and are easy to parallelize. On the practical side, the research will develop software tools for researchers in disciplines that apply high-dimensional and nonparametric statistics. Theoretical contributions of the proposed research include establishing new approximation and coupling theorems (under weaker regularity conditions than those in the existing literature) in high-dimensional and infinite-dimensional spaces of increasing dimension and complexity, where classical probability tools such as the central limit theorem and extreme value theory are no longer applicable. The mathematical theory is of independent interest and will provide powerful new tools for analyzing other statistical procedures on high-dimensional and nonparametric models.