Statistics Seminar - Fall 2025
Seminars are held on Thursdays from 1:00 - 2:00pm on Zoom unless otherwise noted. For access information, please contact the Math Department.
For questions about the seminar schedule, please contact Chenlu Shi and Yan Sun.
*******************************************************************************************************
Data, Architecture & Algorithms in In‑Context Learning
This talk introduces recent theoretical advancements on the in-context learning (ICL) capability of sequence models, focusing on the intricate interplay of data characteristics, architectural design, and the implicit algorithms models learn. We discuss how diverse architectural designs—ranging from linear attention to state-space models to gating mechanisms—implicitly emulate optimization algorithms that operate on the context and draw connections to variations of gradient descent and expectation maximization. We elucidate the critical influence of data characteristics, such as distributional alignment, task correlation, and the presence of unlabeled examples, on ICL performance, quantifying their benefits and revealing the mechanisms through which models leverage such information. Furthermore, we will explore the optimization landscapes governing ICL, establishing conditions for unique global minima and highlighting the architectural features (e.g., depth and dynamic gating) that enable sophisticated algorithmic emulation. As a central message, we advocate that the power of architectural primitives can be gauged from their capability to handle in-context regression tasks with varying sophistication.
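As a concrete illustration of the kind of algorithmic emulation discussed above, the sketch below shows how a linear-attention readout can reproduce one gradient-descent step on an in-context linear regression task. This is a minimal toy, not the speaker's construction; the noiseless data, the learning rate, and the simplified projections are assumptions made for clarity.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20                      # feature dimension, context length
w_star = rng.normal(size=d)       # task vector generating the context
X = rng.normal(size=(n, d))       # context inputs x_1, ..., x_n
y = X @ w_star                    # context labels (noiseless for clarity)
x_q = rng.normal(size=d)          # query input
eta = 0.1                         # learning rate of the emulated GD step

# One gradient-descent step on the in-context least-squares loss from w = 0
# gives w_1 = eta * sum_i y_i x_i, hence the prediction w_1^T x_q.
pred_gd = eta * (y @ X) @ x_q

# A linear-attention readout (no softmax): the query scores each context
# token by x_q^T x_i and aggregates the values eta * y_i -- the same sum.
scores = X @ x_q
pred_attn = scores @ (eta * y)

print(pred_gd, pred_attn)         # equal up to floating-point error
```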
Homepage: https://yingcong-li.github.io/
*******************************************************************************************************
Distribution-free inference: toward conditional inference
We discuss the problem of distribution-free conditional predictive inference. Prior work has established that achieving exact finite-sample control of conditional coverage without distributional assumptions is impossible, suggesting the need for relaxed settings or targets. In Part 1, we consider data with a hierarchical structure and discuss possible targets of conditional predictive inference under repeated measurements. We show that the $L^k$-norm of the conditional miscoverage rate can generally be controlled and provide procedures that achieve this coverage guarantee. In Part 2, we turn to the standard i.i.d. setting and introduce an inferential target motivated by the multiaccuracy condition, which enables conditional inference with an interpretable guarantee. Our method controls the $L^k$-norm of a relaxed notion of the conditional miscoverage rate, with a finite-sample, distribution-free guarantee.
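For readers new to distribution-free prediction, the split-conformal sketch below provides only the marginal coverage guarantee that this line of work seeks to strengthen toward conditional coverage; the linear model and simulated data are placeholders, not part of the talk.
```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, d, alpha = 1000, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Split into a proper training set and a calibration set.
X_tr, y_tr = X[:500], y[:500]
X_cal, y_cal = X[500:], y[500:]

model = LinearRegression().fit(X_tr, y_tr)
scores = np.abs(y_cal - model.predict(X_cal))      # conformity scores

# Finite-sample calibration quantile giving marginal 1 - alpha coverage.
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]

x_new = rng.normal(size=(1, d))
center = model.predict(x_new)[0]
print(f"prediction interval: [{center - q_hat:.2f}, {center + q_hat:.2f}]")
```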
Homepage: https://yhoon31.github.io/index.html
*******************************************************************************************************
Making neural networks tell you what information they use
While artificial neural networks are undoubtedly excellent information processing systems, their predominant formulation as deterministic point-to-point transformations makes it hard to say anything about what they actually do with information from the perspective of information theory. By inserting a probabilistic representation space into the system -- nothing more complicated than what you'd find in a variational autoencoder (VAE) -- we can quantify and characterize all information passing through the space. In this talk, we'll view such a space as an optimizable communication channel, and then construct communication networks that reveal where information resides in the original data and how different architectures process it. We'll close by exploring ways to characterize how information is organized in these learned spaces.
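As a small illustration of the quantity involved, the sketch below computes the rate of a Gaussian representation space: the KL divergence from the encoder's distribution to a standard normal prior, which is the per-example information cost a VAE penalizes. The made-up means and log-variances stand in for an encoder's outputs; this is not the speaker's code.
```python
import numpy as np

def gaussian_channel_rate(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(2)
mu = rng.normal(scale=0.5, size=(4, 8))          # batch of 4, latent dim 8
log_var = rng.normal(scale=0.1, size=(4, 8)) - 1.0

rate = gaussian_channel_rate(mu, log_var)        # nats transmitted per example
print(rate)
```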
Homepage: https://kieranamurphy.com/
*******************************************************************************************************
Convergence of Markov Chains for Stochastic Gradient Descent
Stochastic gradient descent (SGD) is a popular algorithm for minimizing objective functions that arise in machine learning. For constant step-size SGD, the iterates form a Markov chain on a general state space. Focusing on a class of nonconvex objective functions, we establish a “Doeblin-type decomposition”: the state space decomposes into a uniformly transient set and a disjoint union of absorbing sets. Each absorbing set contains a unique invariant measure, and the set of all invariant measures is the convex hull of these. Moreover, the set of invariant measures is shown to be a global attractor of the Markov chain with a geometric convergence rate. The theory is highlighted with examples that show (1) the failure of the diffusion approximation to characterize the long-time dynamics of SGD; (2) that the global minimum of an objective function may lie outside the support of the invariant measures (i.e., even if initialized at the global minimum, the SGD iterates will leave it); and (3) that bifurcations may enable the SGD iterates to transition between local minima.
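A toy simulation (not one of the talk's examples) of the Markov-chain viewpoint: constant step-size SGD on a one-dimensional double-well objective never settles at a minimizer and, over a long run, hops between the two basins. The objective, step size, and noise model are illustrative assumptions.
```python
import numpy as np

def grad(x):
    # Gradient of the double-well objective f(x) = (x^2 - 1)^2 / 4.
    return x * (x**2 - 1.0)

rng = np.random.default_rng(3)
step, n_iter = 0.2, 20000
x = 1.0                                   # initialize at a local (and global) minimizer
xs = np.empty(n_iter)
for t in range(n_iter):
    noise = rng.normal()                  # stochastic gradient = true gradient + noise
    x = x - step * (grad(x) + noise)
    xs[t] = x

# Fraction of time spent near each well: the chain visits both basins.
print("near +1:", np.mean(xs > 0), " near -1:", np.mean(xs < 0))
```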
Homepage: https://scholar.google.com/citations?hl=en&user=BNXFcFQAAAAJ
*******************************************************************************************************
Two Stories in Quantile Regression: Fast Computation and Imbalanced Classification
Quantile regression is a popular and powerful tool in statistics and econometrics. In this talk, I will tell two complementary stories for advancing high-dimensional learning at the intersection of quantile regression, optimization, and imbalanced classification. First, I will introduce a finite smoothing algorithm (FSA) that converts the non-smooth losses of quantile regression into smooth surrogates admitting fast coordinate-descent updates while still yielding exact solutions despite the smoothing. Simulations and benchmarks show that FSA delivers orders-of-magnitude speedups with equal or better accuracy, and it is available in the open-source R packages hdsvm and hdqr. Second, I will introduce Quantile-based Discriminant Analysis (QuanDA), which builds on a novel yet fundamental connection with quantile regression and naturally accounts for class imbalance through appropriately chosen quantile levels. QuanDA is theoretically validated in ultra-high-dimensional regimes and, in extensive studies, outperforms baselines including cost-sensitive large-margin classifiers, random forests, and SMOTE.
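To convey the smoothing idea only, the sketch below replaces the kink of the check loss with a quadratic on a small window [-h, h]; this generic surrogate is an assumption for illustration and is not necessarily the exact construction used by FSA or the hdsvm/hdqr packages.
```python
import numpy as np

def check_loss(r, tau):
    """Standard quantile (check) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0))

def smoothed_check_loss(r, tau, h=0.1):
    """Quadratic smoothing of rho_tau on [-h, h]; matches it outside."""
    quad = r**2 / (4 * h) + r * (tau - 0.5) + h / 4
    return np.where(np.abs(r) <= h, quad, check_loss(r, tau))

r = np.linspace(-1, 1, 9)
print(check_loss(r, 0.3))
print(smoothed_check_loss(r, 0.3))   # agrees with the check loss for |r| > h
```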
Homepage: https://stat.uiowa.edu/people/boxiang-wang
*******************************************************************************************************
Using Graph Neural Networks (GNNs) and Spatial-Temporal GNNs in Earthquake Source Characterization
In recent years, there has been growing interest in applying machine learning and deep learning methods to analyze seismic signals and improve earthquake monitoring. In this talk, I will present two papers that employ a Graph Neural Network (GNN) and a Spatiotemporal Graph Neural Network (STGNN), respectively, for earthquake source characterization.
Both studies recognize that while traditional seismology heavily relies on the geographic relationships between seismic stations, most machine learning approaches overlook this crucial spatial information. The first paper introduces a GNN framework that explicitly incorporates station locations, demonstrating that even modestly sized networks can outperform location-agnostic methods in predicting earthquake locations and magnitudes. The second paper builds on this foundation with a Spatiotemporal Graph Neural Network that dynamically constructs graphs based on both geographical distances and waveform similarities. When tested on seismic data from Southern California and Oklahoma, both approaches show significant improvements in epicenter estimation compared to conventional methods, while maintaining competitive performance for depth and magnitude prediction.
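As a rough sketch of the shared graph-construction step, the code below builds a k-nearest-neighbor station graph from coordinates and runs one round of mean-aggregation message passing. The coordinates, the choice of k, and the plain Euclidean distance are illustrative assumptions, not the papers' implementations (the STGNN additionally uses waveform similarity to form edges).
```python
import numpy as np

rng = np.random.default_rng(4)
n_stations, k = 12, 3
coords = rng.uniform(size=(n_stations, 2))          # stand-ins for station coordinates

# Pairwise distances and a k-nearest-neighbor adjacency matrix.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff**2).sum(-1))
np.fill_diagonal(dist, np.inf)                      # exclude self-loops
neighbors = np.argsort(dist, axis=1)[:, :k]

adj = np.zeros((n_stations, n_stations))
rows = np.repeat(np.arange(n_stations), k)
adj[rows, neighbors.ravel()] = 1.0
adj = np.maximum(adj, adj.T)                        # symmetrize the graph

# One round of mean-aggregation message passing over station features.
features = rng.normal(size=(n_stations, 8))         # e.g., per-station waveform embeddings
deg = adj.sum(1, keepdims=True)
messages = adj @ features / np.maximum(deg, 1.0)
print(messages.shape)
```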
Homepage: https://www.linkedin.com/in/tian-lu-xue-910b0723/
*******************************************************************************************************
Double Validity in Linear Regression: Rethinking Robust Inference When Error Structures Are Unknown
Linear regression is arguably the most widely used statistical method. Under fixed regressors and errors with particular dependence structures, the conventional wisdom is to adjust the variance-covariance estimator to accommodate the assumed error structure: the Eicker-Huber-White standard error for heteroskedastic errors, the Liang-Zeger standard error for clustered errors, and the heteroskedasticity and autocorrelation consistent (HAC) standard error for serially correlated errors. However, the chosen standard error adjustment does not necessarily reflect the true underlying error dependence.
We depart from the traditional framework by showing that $t$-statistics based on these standard error adjustments remain valid, even under unknown and misspecified error structures, provided the regressors exhibit the structure that the corresponding standard error is designed to accommodate. In contrast to the classical viewpoint, where the validity of $t$-statistics arises from modeling the structure of the errors, our results reveal a theoretical symmetry between the roles of regressors and errors, a property we refer to as "double validity". As long as the correlation structure in either the regressor of interest or the error terms is correctly captured by the standard error adjustment, valid inference can be achieved. Building on this theoretical insight, we provide practical evidence suggesting that inference procedures should favor modeling the structure in the regressor rather than in the error terms, because the error-generating mechanism is unknown and difficult to specify accurately, whereas the structure of the regressor is often more tractable. We refer to this phenomenon as "practical asymmetry".
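For reference, the sketch below computes the Eicker-Huber-White (HC0) sandwich covariance on simulated heteroskedastic data. The design is made up, and the standard computation shown is background only; the talk's contribution concerns when the resulting $t$-statistics remain valid, not this calculation.
```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
beta = np.array([1.0, 2.0])
errors = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ beta + errors

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Sandwich estimator: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
meat = X.T @ (X * resid[:, None]**2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(cov_hc0))

t_stats = beta_hat / se_hc0
print(beta_hat, se_hc0, t_stats)
```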
Homepage: https://jornzhang.github.io/
*******************************************************************************************************
Updated: December 2, 2025