Statistics Seminar - Spring 2017

Seminar Schedule

Seminars are held on Thursdays at 4:00PM. Please note the location for each event in the schedule below, which will either be Cullimore 611 (CULM 611) or the Campus Center in Conference Room 230 (CTR 230). For questions about the seminar schedule, please contact Antai Wang.

Date Location Speaker, Affiliation, and Title Host
February 23 CULM 611 Rong Chen, Dept. of Statistics, Rutgers University
Factor Model for High Dimensional Matrix Valued Time Series

In finance, economics and many other fields, observations in matrix form are often recorded over time. For example, many economic indicators are obtained in different countries over time, as are various financial characteristics of many companies. Although it is natural to turn the matrix observations into a long vector and then use standard vector time series models or factor analysis, it is often the case that the columns and rows of a matrix represent different sets of information that are closely interrelated. We propose a novel factor model that maintains and utilizes the matrix structure to achieve greater dimension reduction as well as a more easily interpretable factor structure. The estimation procedure, its theoretical properties, and model validation procedures are investigated and demonstrated with simulated and real examples.

Joint work with Dong Wang (Rutgers University) and Xialu Liu (San Diego State University)
Antai Wang
March 9 CTR 230 Pierre C. Bellec, Dept. of Statistics, Rutgers University
Optimistic Lower Bounds for Convex Regularized Least-Squares

Minimax lower bounds are pessimistic in nature: for any given estimator, minimax lower bounds yield the existence of a worst-case target vector $\beta^*_{\mathrm{worst}}$ for which the prediction error of the given estimator is bounded from below. However, minimax lower bounds shed no light on the prediction error of the given estimator for target vectors other than $\beta^*_{\mathrm{worst}}$. A characterization of the prediction error of any convex regularized least-squares estimator is given. This characterization provides both a lower bound and an upper bound on the prediction error. This produces lower bounds that are applicable to any target vector and not only to a single, worst-case $\beta^*_{\mathrm{worst}}$.

Finally, these lower and upper bounds on the prediction error are applied to the Lasso in sparse linear regression. We obtain a lower bound involving the compatibility constant for any tuning parameter, matching upper and lower bounds for the universal choice of the tuning parameter, and a lower bound for the Lasso with small tuning parameter.
Antai Wang
March 22 (Wednesday*) CTR 235 Zhezhen Jin, Dept. of Biostatistics, Columbia University
Statistical Issues and Challenges in Biomedical Studies

In this talk, I will present statistical issues and challenges that I have encountered in my collaborative biomedical studies: item selection in disease screening, comparison and identification of biomarkers that are more informative for disease diagnosis, and estimation of weights reflecting the relative importance of exposure variables for health outcomes.

After a discussion on the issues and challenges with real examples, I will review available statistical methods and present our newly developed methods.
Antai Wang
March 30 CULM 611 Wei Biao Wu, Dept. of Statistics, University of Chicago
Asymptotic Theory for Quadratic Forms of High-Dimensional Data

I will present an asymptotic theory for quadratic forms of sample mean vectors of high-dimensional data. An invariance principle for the quadratic forms is derived under conditions that involve a delicate interplay between the dimension $p$, the sample size $n$ and the moment condition. Under proper normalization, central and non-central limit theorems are obtained. To perform the related statistical inference, I will propose a plug-in calibration method and a re-sampling procedure to approximate the distributions of the quadratic forms. The results will be applied to multiple tests and to inference on covariance matrix structures.
Yixin Fang & Antai Wang
April 6 CTR 230 Peter X. K. Song, Dept. of Biostatistics, University of Michigan
Regression Analysis of Networked Data

We develop a new regression analysis approach to evaluating associations of covariates with outcomes measured from networks. This development is motivated by a study of infant growth that collects outcomes of event-related potentials (ERP, a type of neuroimaging) measured over electroencephalogram (EEG) electrodes on the scalp. We propose a new generalized method of moments (GMM) that incorporates both established and data-driven knowledge of network topology among nodes in the estimation and inference to achieve robustness and efficiency. The GMM approach is computationally fast and stable for the regression analysis of network data, and it is conceptually simple, with desirable properties in both estimation and inference. Both simulation studies and real EEG data analysis will be presented for illustration.
Yixin Fang & Antai Wang
April 7 (Friday*) CTR 230 Yichuan Zhao, Dept. of Mathematics and Statistics, Georgia State U.
A Nonparametric Approach for Partial Areas Under ROC Curves

The receiver operating characteristic (ROC) curve is a well-known measure of the performance of a classification method. Interest may only pertain to a specific region of the curve and, in this case, the partial area under the ROC curve (pAUC) provides a useful summary measure. Related measures such as the ordinal dominance curve (ODC) and the partial area under the ODC (pODC) are frequently of interest as well. Based on a novel estimator of pAUC proposed by Wang and Chang (2011), we develop nonparametric approaches to the pAUC and pODC using normal approximation, the jackknife and the jackknife empirical likelihood. A simulation study demonstrates the flaws of the existing method and shows that the proposed methods perform well. Simulations also substantiate the consistency of our jackknife variance estimator. The Pancreatic Cancer Serum Biomarker data set is used to illustrate the proposed methods.
Yixin Fang & Antai Wang
April 13 CULM 611 Haiyan Su, Dept. of Mathematical Sciences, Montclair State University
Generalized P-Values for Testing Zero-Variance Components in Linear Mixed-Effects Models

Linear mixed-effects models are widely used in the analysis of longitudinal data. However, testing for zero-variance components of random effects has not been well resolved in the statistical literature, although some likelihood-based procedures have been proposed and studied. In this article, we propose a generalized p-value-based method coupled with fiducial inference to tackle this problem. The proposed method is also applied to test the linearity of the nonparametric functions in additive models. We provide theoretical justifications and develop an implementation algorithm for the proposed method. We evaluate its finite-sample performance and compare it with that of the restricted likelihood ratio test via simulation experiments. The proposed approach is illustrated with an application from a nutritional study.
Antai Wang
April 20 CTR 230 Hui Zhang, Dept. of Biostatistics, St. Jude Children’s Research Hospital
Major Statistical Challenges in Count Data Analysis

Count data play an important role in biomedical and clinical research, especially now, given rapid progress in biomedical techniques such as next-generation sequencing. However, using routine statistical models, such as the Poisson model, to analyze counts often results in biased or even misleading conclusions. Primary challenges when applying the Poisson model to count data include over-dispersion and zero-inflation, which are commonly encountered in practice. In addition, the repeated measures and incomplete observations in modern clinical trials and survey studies add to the complexity. This talk will review these challenges in count data analysis and recent methodological progress made by the speaker's group.
Yixin Fang & Antai Wang
April 27 CULM 611 Hongyuan Cao, Dept. of Statistics, University of Missouri
Analysis of Asynchronous Longitudinal Data with Partially Linear Models

We study partially linear models for asynchronous longitudinal data to incorporate nonlinear time trend effects. Local and global estimating equations are developed for estimating the parametric and nonparametric effects. We show that with a proper choice of the kernel bandwidth parameter, one can obtain consistent and asymptotically normal parameter estimates for the linear effects. Asymptotic properties of the estimated nonlinear effects are established. Extensive simulation studies provide numerical support for the theoretical findings. Data from an HIV study are used to illustrate our methodology.
Yixin Fang & Antai Wang

Updated: April 25, 2017