This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
asus - An R package that implements the ASUS (Adaptive SURE thresholding with Side Information) procedure for estimating a high-dimensional sparse parameter when, along with the primary data, side information can also be gathered from secondary data sources.
fusionclust - An R package for clustering and feature screening in large-scale problems. In particular, fusionclust provides the Big Merge Tracker (BMT) and COSCI algorithms for convex clustering and feature screening using an ℓ1 fusion penalty.
truh - An R package that implements the TRUH test statistic for nonparametric two-sample testing under heterogeneity. TRUH incorporates the underlying heterogeneity and imbalance in the samples, and provides a conservative test for the composite null hypothesis that the two samples arise from the same mixture distribution but may differ with respect to the mixing weights. See Trambak Banerjee, Bhaswar B. Bhattacharya, and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020), for more details.
casp - An R package for Coordinate-wise Adaptive Shrinkage Prediction in a high-dimensional non-exchangeable hierarchical Gaussian model with an unknown location as well as an unknown spiked covariance structure.
cezij - A MATLAB toolbox for simultaneous and hierarchical selection of fixed and random effects in high-dimensional penalized generalized linear mixed models.
BSAN 450: Data Mining & Predictive Analytics (undergraduate) - Spring 2021 - 2025
The primary objective of this course is to enable students to explain and perform statistical analysis of data, with a view to critically evaluating statistical reports or findings. The main focus of the course is time series analysis, including an introduction to ARCH and GARCH models. Since Spring 2024, I have added an introduction to Conformal Inference for classification problems to the curriculum. This course relies heavily on computer programming using R, and the emphasis is on applications.
BSAN 730: Large Scale Data Analysis (graduate) - Spring 2021 - 2025
In this course I focus on the statistical analysis of large-scale data. Students learn how some well-known statistical tools can be adapted for the analysis of Big Data and how the limitations of classical tools have driven the development of modern techniques for data analysis. I cover topics such as split-and-conquer techniques for variable selection, scalable Bootstrap, Conformal Inference, and a gentle introduction to large-scale Multiple Testing. This course relies heavily on computer programming using R, and the emphasis is primarily on business applications. Special thanks to the guest speakers (Weinan Wang, Aniruddha Neogi, Joshua Derenski, Bradley Rava, Jacob Dice, Sara Almohtasib) who have given lectures in this class and shared their unique experiences in managing and analyzing large data.