Statistics Seminar - Fall 2020
Seminars are held from 4:00 - 5:00pm, unless noted otherwise. Location: WebEx (room information will be posted at a later date)
For questions about the seminar schedule, please contact Zuofeng Shang
October 12
Sumit Mukherjee, Columbia University
Joint Estimation of Parameters in Ising Models
Inference in the framework of Ising models has received significant attention in Statistics and Machine Learning in recent years. In this talk we study the joint estimation of the inverse temperature parameter $\beta$ and the magnetization parameter $B$, given one realization from the Ising model, under the assumption that the underlying graph of the Ising model is completely specified. We show that if the graph is either irregular or sparse, then both parameters can be estimated at rate $n^{-1/2}$ using Besag’s pseudo-likelihood. In contrast, if the underlying graph is dense and regular, we show that no consistent estimator of $(\beta,B)$ exists.
This is joint work with Promit Ghosal from MIT.
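The pseudo-likelihood approach mentioned in the abstract can be sketched as follows. This is an illustrative Python sketch, not code from the talk: the $1/n$ normalization of the local field, the logistic form of the conditional law, and the toy graph are assumptions made here for concreteness.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_pseudolikelihood(theta, sigma, A):
    """Besag's negative log pseudo-likelihood for an Ising model.

    theta = (beta, B); sigma in {-1,+1}^n; A is the (fully specified)
    adjacency matrix. Assumed conditional law:
        P(sigma_i = +1 | rest) = logistic(2 * (beta * m_i + B)),
    with local field m_i = (A @ sigma)_i / n (normalization is an assumption).
    """
    beta, B = theta
    n = len(sigma)
    m = A @ sigma / n
    # log P(sigma_i | rest) = -log(1 + exp(-2 * sigma_i * (beta*m_i + B)))
    z = 2.0 * sigma * (beta * m + B)
    return np.sum(np.log1p(np.exp(-z)))

def fit_ising(sigma, A):
    """Jointly estimate (beta, B) from one realization by maximizing
    the pseudo-likelihood."""
    res = minimize(neg_log_pseudolikelihood, x0=np.zeros(2),
                   args=(sigma, A), method="L-BFGS-B")
    return res.x  # (beta_hat, B_hat)

# Toy example on an irregular graph (a star plus a path), with a
# stand-in spin configuration rather than a genuine Ising sample.
rng = np.random.default_rng(0)
n = 200
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1.0            # star centered at node 0
for i in range(1, n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0  # path through the remaining nodes
sigma = rng.choice([-1.0, 1.0], size=n)
beta_hat, B_hat = fit_ising(sigma, A)
```

The irregularity of the toy graph is deliberate: per the abstract, it is on irregular or sparse graphs that this estimator attains the $n^{-1/2}$ rate.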
November 5
Pan Xu, New Jersey Institute of Technology
Matching Algorithms in E-Commerce
Matching is a fundamental model in combinatorial optimization. During the last decade, stochastic versions of matching models have seen broad applications in various matching markets emerging in E-Commerce. In this talk, I will first present several basic matching models and related fundamental algorithmic frameworks. Then I will survey new challenges and our corresponding algorithmic solutions when we apply matching models to different real-world matching markets, including crowdsourcing marketplaces (e.g., Amazon Mechanical Turk), ridesharing platforms (e.g., Uber and Lyft), online food-ordering platforms (e.g., Grubhub), and online recommendation systems (e.g., Amazon recommendations).
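One of the fundamental algorithmic frameworks for online bipartite matching is the classical RANKING algorithm of Karp, Vazirani and Vazirani (1990). The sketch below is an illustration of that general framework, not code from the talk; the input format (offline nodes plus an arrival list) is an assumption.

```python
import random

def ranking_online_matching(offline_nodes, arrivals, seed=0):
    """RANKING algorithm for online bipartite matching: fix a uniformly
    random priority order over the offline side, then match each arriving
    online vertex to its highest-priority unmatched neighbor. In
    expectation this achieves a (1 - 1/e) competitive ratio.

    offline_nodes: list of offline vertex ids (e.g., drivers or workers).
    arrivals: list of (online_id, neighbor_list) pairs in arrival order.
    """
    rng = random.Random(seed)
    order = rng.sample(list(offline_nodes), len(offline_nodes))
    rank = {v: r for r, v in enumerate(order)}  # lower rank = higher priority
    matched = {}   # offline vertex -> online vertex it is matched to
    matching = []
    for online_id, neighbors in arrivals:
        free = [v for v in neighbors if v not in matched]
        if free:
            v = min(free, key=rank.get)  # highest-priority free neighbor
            matched[v] = online_id
            matching.append((online_id, v))
    return matching

# Hypothetical ridesharing-style usage: drivers are offline, riders arrive.
m = ranking_online_matching(["a", "b"], [("r1", ["a", "b"]), ("r2", ["a"])])
```

A greedy rule that ignored the random priorities would only be 1/2-competitive in the worst case; the single random permutation is what lifts the guarantee to 1 - 1/e.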
November 12
Yang Feng, New York University
RaSE: Random Subspace Ensemble Classification
We propose a new model-free ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace optimally selected from a collection of random subspaces. To conduct subspace selection, we propose a new criterion, the ratio information criterion (RIC), based on weighted Kullback-Leibler divergences. The theoretical analysis covers the risk and Monte Carlo variance of the RaSE classifier, establishes the weak consistency of RIC, and provides an upper bound on the misclassification rate of the RaSE classifier. An array of simulations under various models and real-data applications demonstrate the effectiveness of the RaSE classifier in terms of its low misclassification rate and accurate feature ranking. The RaSE algorithm is implemented in the R package RaSEn on CRAN.
This is joint work with Ye Tian.
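The aggregation scheme described in the abstract can be sketched as follows. This is an illustrative Python sketch, not the RaSEn package: the paper selects subspaces with the ratio information criterion (RIC), while here cross-validated accuracy is used as a hypothetical stand-in, and logistic regression stands in for a generic base classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rase_fit_predict(X, y, X_test, B1=20, B2=10, max_dim=3, seed=0):
    """Sketch of the RaSE aggregation scheme: for each of B1 weak
    learners, draw B2 random feature subspaces, keep the one that
    optimizes a selection criterion (CV accuracy here, RIC in the paper),
    train a base classifier on it, and aggregate by majority vote.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    votes = np.zeros(len(X_test))
    for _ in range(B1):
        best_S, best_score = None, -np.inf
        for _ in range(B2):
            d = rng.integers(1, max_dim + 1)          # random subspace size
            S = rng.choice(p, size=d, replace=False)  # random feature subset
            score = cross_val_score(LogisticRegression(), X[:, S], y, cv=3).mean()
            if score > best_score:
                best_S, best_score = S, score
        clf = LogisticRegression().fit(X[:, best_S], y)
        votes += clf.predict(X_test[:, best_S])
    return (votes / B1 > 0.5).astype(int)  # majority vote over weak learners

# Toy sparse-classification example: only feature 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = (X[:, 0] > 0).astype(int)
pred = rase_fit_predict(X, y, X, B1=5, B2=5)
```

Because only subspaces containing the informative feature score well, most weak learners end up trained on it, which is the mechanism behind the accurate feature ranking mentioned in the abstract.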
November 19
Ruiqi Liu, Texas Tech University
A Computationally Efficient Classification Algorithm in Posterior Drift Model: Phase Transition and Minimax Adaptivity
In massive data analysis, training and testing data often come from very different sources, and their probability distributions are not necessarily identical. A representative example is nonparametric classification in the posterior drift model, where the conditional distributions of the label given the covariates may differ between training and testing data. In this paper, we derive the minimax rate of the excess risk for nonparametric classification in the posterior drift model in the setting where both training and testing data have smooth distributions, extending recent work by Cai and Wei (2019), who impose a smoothness condition only on the distribution of the testing data. The minimax rate exhibits a phase transition characterized by the mutual relationship between the smoothness orders of the training and testing data distributions. We also propose a computationally efficient and data-driven nearest neighbor classifier which achieves the minimax excess risk (up to a logarithmic factor). Simulation studies and a real-world application demonstrate our approach.
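The posterior drift setup can be made concrete with a simple nearest-neighbor sketch. This is an illustration of the model, not the talk's estimator: the fixed weight and neighborhood sizes below are hypothetical placeholders, whereas the proposed classifier chooses them in a data-driven, minimax-adaptive way.

```python
import numpy as np

def knn_posterior_drift(x, X_src, y_src, X_tgt, y_tgt,
                        k_src=10, k_tgt=5, w=0.5):
    """Nearest-neighbor classification under posterior drift: the source
    (training) and target (testing) samples share covariates but may have
    different regression functions P(Y=1 | X), so the two local estimates
    are combined with a weight w before thresholding at 1/2.
    """
    def knn_mean(X, y, k):
        # Average the labels of the k nearest neighbors of x.
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
        return y[idx].mean()

    eta = w * knn_mean(X_src, y_src, k_src) + (1 - w) * knn_mean(X_tgt, y_tgt, k_tgt)
    return int(eta > 0.5)

# Toy example: both samples happen to share the same decision boundary.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(100, 2)); y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(size=(50, 2));  y_tgt = (X_tgt[:, 0] > 0).astype(int)
```

When the two regression functions genuinely differ, the optimal weight and neighborhood sizes depend on the smoothness orders of the two distributions, which is exactly the phase transition the abstract describes.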
December 3
Fangfang Wang, Worcester Polytechnic Institute
On Modelling High-dimensional Continuous-Time Vector Time Series via Latent CARMA Processes
In this talk, we will introduce a structural continuous-time factor model for analyzing a large panel of continuous-time processes. The latent factors are parameterized by Continuous-time AutoRegressive Moving Average (CARMA) processes. Statistical properties of this model, including model identification and calibration, are presented. More precisely, we address the identifiability of the common and idiosyncratic components and the identification of the parameters, using both a continuous record of the processes and discrete-time observations. The model calibration involves two steps: we first estimate the latent factors and their loading matrices via principal component analysis on the basis of closely sampled observations, and then calibrate the CARMA processes via the Kalman filter. In addition, we present consistent estimators for the number of common factors and the orders of the latent CARMA processes. For illustration, the proposed methodology is applied to analyze urban particulate matter distributions in the town of Perugia, Italy.
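The first calibration step, extracting factors and loadings by principal component analysis, can be sketched as follows. This is an illustrative sketch assuming a standard method-of-principal-components normalization; the second step (fitting CARMA dynamics to the factors via the Kalman filter) is not shown, and the simulated panel is a stand-in for closely sampled data.

```python
import numpy as np

def estimate_factors(Y, r):
    """Extract r latent factors and their loading matrix from a T x N
    panel by PCA on the sample covariance. Uses the common normalization
    Lambda' Lambda / N = I_r; factors and loadings are identified only
    up to rotation, consistent with the identifiability issues in the talk.
    """
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)
    # Eigen-decomposition of the N x N sample covariance; eigh returns
    # eigenvalues in ascending order, so reverse to take the top r.
    evals, evecs = np.linalg.eigh(Yc.T @ Yc / T)
    loadings = evecs[:, ::-1][:, :r] * np.sqrt(N)
    factors = Yc @ loadings / N
    return factors, loadings

# Simulated panel: rank-2 common component plus small idiosyncratic noise.
rng = np.random.default_rng(0)
T, N, r = 200, 30, 2
F = rng.normal(size=(T, r))
L = rng.normal(size=(N, r))
Y = F @ L.T + 0.1 * rng.normal(size=(T, N))
Fhat, Lhat = estimate_factors(Y, r)
```

In the two-step calibration, `Fhat` would then be treated as a discretely observed path of the latent CARMA processes and fed to a Kalman-filter-based likelihood.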
Updated: December 1, 2020