Statistics Seminar - Fall 2023
Seminars are held on Thursdays from 4:00 - 5:00pm on Webex unless otherwise noted. For access information, please contact the Math Department.
For questions about the seminar schedule, please contact Chong Jin.
October 26
Zijian Guo, Rutgers University
Robustness Against Weak or Invalid Instruments: Exploring Nonlinear Treatment Models with Machine Learning
We discuss causal inference for observational studies with possibly invalid instrumental variables. We propose a novel methodology called two-stage curvature identification (TSCI), which explores the nonlinear treatment model with machine learning. The first-stage machine learning step boosts the instrumental variable's strength and adjusts for different forms of violation of the instrumental variable assumptions. The success of TSCI requires the instrumental variable's effect on treatment to differ from its violation form. A novel bias correction step is implemented to remove the bias resulting from the potentially high complexity of machine learning. Our proposed TSCI estimator is shown to be asymptotically unbiased and normal even if the machine learning algorithm does not consistently estimate the treatment model. We design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.
This is based on joint work with Mengchu Zheng and Peter Bühlmann.
November 9
Min Zhang, UC Irvine
Integrative Analyses for Genome-wide Gene Regulatory Network Construction
Gene regulatory network construction is crucial to unraveling the genetic architecture of complex diseases. Many methods have been proposed to construct undirected networks by calculating the correlation between genes based on transcriptomic data. However, constructing directed networks with genome-wide genes remains a challenge. Taking advantage of transcriptomic and genomic data, we propose a two-stage penalized least squares method to build large systems of structural equations for directional network construction. A large system of structural equations is constructed via consistent estimation of a set of conditional expectations at the first stage, and a consistent selection of regulatory effects is obtained at the second stage. The proposed method can simultaneously investigate all the genes across the entire genome, and the computation is efficient with a parallel implementation. We demonstrate the utility of the approach using both simulation studies and applications to real data.
November 16
Wodan Ling, Biostatistics Division in the Population Health Sciences Department at Weill Cornell Medicine
Statistical analysis of large-scale microbiome-profiling studies: batch effect and robust testing
Emerging large-scale microbiome-profiling studies introduce new opportunities as well as challenges. One challenge inherent to the large sample sizes is the batch effect, which arises from differential processing of specimens and can lead to spurious findings. Most existing strategies for mitigating batch effects rely on approaches designed for genomic analysis, which fail to address the zero-inflated and over-dispersed nature of microbiome data. Strategies tailored for microbiome data are restricted to association testing and do not support other analytic goals such as visualization. In this talk, we present the Conditional Quantile Regression (ConQuR) approach, the first robust and comprehensive method that accommodates the complex distributions of microbial read counts and generates batch-removed zero-inflated read counts that can benefit all usual subsequent analyses. We demonstrate its state-of-the-art performance in removing the batch effect from microbiome data while preserving the signals of interest.
Another challenge is drawing reliable biological conclusions about individual taxa. Classical tests often do not accommodate the realities of microbiome data, leading to power loss. Approaches tailored for microbiome data often have inflated false positive rates, generally due to unsatisfied distributional assumptions. Most extant approaches also fail in the presence of heterogeneous effects. We also present the zero-inflated quantile (ZINQ) approach, which is robust to complex distributions of microbiome data and improves testing power by summarizing signals over different quantiles of a taxon’s abundance, facilitating detection of heterogeneous effects. We show that ZINQ often has equivalent or higher power compared to existing tests even as it offers better control of false positives.
November 7, 2023