A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

links

publications

Publications and Preprints


On this page I have organized my research into the following three categories: Working papers, Publications and Invited Discussions.

Working papers

  1. Harnessing The Collective Wisdom: Fusion Learning Using Decision Sequences From Diverse Sources.
    Banerjee, T., Gang, B., & He, J.
  2. Risk-Shifting, Regulation, and Government Assistance.
    Sharma, P., & Banerjee, T.
    2023 NBER-NSF Seminar on Bayesian Inference in Econometrics and Statistics.
    2025 Fighting a Financial Crisis Conference at Yale SOM.
  3. Nonparametric Empirical Bayes Estimation on Heterogeneous Data.
    Banerjee, T., Fu, L. J., James, G. M., Mukherjee, G., & Sun, W.

Publications (*corresponding author)

  1. Gang, B., & Banerjee, T*. (2025). Large-Scale Multiple Testing of Composite Null Hypotheses Under Heteroskedasticity.
    Biometrika, Volume 112, Issue 2, asaf007. [PDF, Code]
  2. Sharma, P., & Banerjee, T*. (2025). Do financial regulators act in the public's interest? A Bayesian latent class estimation framework for assessing regulatory responses to banking crises.
    Journal of the Royal Statistical Society: Series A, qnae150. [PDF, Code]
  3. Banerjee, T*., & Sharma, P. (2025). Nonparametric Empirical Bayes Prediction in Mixed Models.
    Statistics and Computing, Volume 35, article number 145. [PDF, Code]
  4. Luo, J., Banerjee, T*., Mukherjee, G., & Sun, W. (2025). Empirical Bayes Estimation with Side Information: A Nonparametric Integrative Tweedie Approach.
    Forthcoming at Statistica Sinica. [PDF, Code]
  5. Banerjee, T., Bhattacharya, B. B., & Mukherjee, G. (2024). Bootstrapped Edge Count Tests for Nonparametric Two-Sample Inference Under Heterogeneity.
    Journal of Computational and Graphical Statistics, 34(1), 306–317. [PDF, Code]
  6. Banerjee, T., Liu, P., Mukherjee, G., Dutta, S., & Che, H. (2023). Joint modeling of playing time and purchase propensity in massively multiplayer online role-playing games using crossed random effects.
    The Annals of Applied Statistics, 17(3), 2533-2554. [PDF, Code]
  7. Sahu, A., Dutta, A., M Abdelmoniem, A., Banerjee, T., Canini, M., & Kalnis, P. (2021). Rethinking gradient sparsification as total error minimization.
    Advances in Neural Information Processing Systems, 34, 8133-8146.

  8. Banerjee, T., Mukherjee, G., & Paul, D. (2021). Improved Shrinkage Prediction under a Spiked Covariance Structure.
    Journal of Machine Learning Research, 22(180), 1−40. [Code]
  9. Banerjee, T., Liu, Q., Mukherjee, G., & Sun, W. (2021). A general framework for empirical Bayes estimation in discrete linear exponential family.
    Journal of Machine Learning Research, 22(67), 1−46. [Code]
  10. Banerjee, T., Bhattacharya, B. B., & Mukherjee, G. (2020). A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data.
    Annals of Applied Statistics, Volume 14, no. 4, Pages 1777-1805. [Code]
  11. Banerjee, T., Mukherjee, G., Dutta, S., & Ghosh, P. (2020). A large-scale constrained joint modeling approach for predicting user activity, engagement, and churn with application to freemium mobile games.
    Journal of the American Statistical Association, Volume 115, no. 530, Pages 538-554. [PDF, Code]
    Best Paper award: 5th International Conference on Business Analytics and Intelligence, 2017 at Indian Institute of Management, Bangalore.
  12. Banerjee, T., Mukherjee, G., & Sun, W. (2020). Adaptive sparse estimation with side information.
    Journal of the American Statistical Association, 115(532), 2053-2067. [PDF, Code]
    Distinguished Student Paper Award: 2019 ENAR Spring meeting.
    Runner-up: 2017 IISA annual conference student poster competition.
  13. Banerjee, T., Mukherjee, G., & Radchenko, P. (2017). Feature screening in large scale cluster analysis.
    Journal of Multivariate Analysis, 161, 191-212. [Code]
  14. Cavrois, M., Banerjee, T., Mukherjee, G., Raman, N., Hussien, R., Rodriguez, B. A., ... & Roan, N. R. (2017). Mass cytometric analysis of HIV entry, replication, and remodeling in tissue CD4+ T cells.
    Cell Reports, 20(4), 984-998.

Invited Discussions

  1. Discussion of CARS: covariate assisted ranking and screening for large-scale two-sample inference by Cai, Sun and Wang.
    Banerjee, T., & Mukherjee, G. Journal of the Royal Statistical Society, Series B (2019), Volume 81, Pages 223-224.

softwares

Software

  1. asus - An R package that implements the ASUS (Adaptive SURE thresholding with Side Information) procedure for estimating a high-dimensional sparse parameter when, along with the primary data, we can also gather side information from secondary data sources.
  2. fusionclust - An R package for clustering and feature screening in large scale problems. In particular, fusionclust provides the Big Merge Tracker (BMT) and COSCI algorithms for convex clustering and feature screening using an ℓ1 fusion penalty.
  3. truh - An R package that implements the TRUH test statistic for nonparametric two-sample testing under heterogeneity. TRUH incorporates the underlying heterogeneity and imbalance in the samples, and provides a conservative test for the composite null hypothesis that the two samples arise from the same mixture distribution but may differ with respect to the mixing weights. See Banerjee, Bhattacharya, and Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020), for more details.
  4. casp - An R package for Coordinate-wise Adaptive Shrinkage Prediction in a high-dimensional non-exchangeable hierarchical Gaussian model with an unknown location as well as an unknown spiked covariance structure.
  5. cezij - A MATLAB toolbox for simultaneous and hierarchical selection of fixed and random effects in high-dimensional penalized generalized linear mixed models.

teaching

Teaching


University of Kansas

  1. BSAN 450: Data Mining & Predictive Analytics (undergraduate) - Spring 2021 - 2025

    The primary objective of this course is to enable students to explain and perform statistical analysis of data, with a view to critically evaluating statistical reports and findings. The main focus of the course is time series analysis, including an introduction to ARCH and GARCH models. Since Spring 2024, I have added an introduction to Conformal Inference for classification problems to the curriculum. This course relies heavily on computer programming using R, and the emphasis is on applications.

  2. BSAN 730: Large Scale Data Analysis (graduate) - Spring 2021 - 2025

    In this course I focus on the statistical analysis of large-scale data. Students learn how some well-known statistical tools can be adapted for the analysis of Big Data and how the limitations of classical tools have driven the development of modern techniques for data analysis. I cover topics such as split-and-conquer techniques for variable selection, scalable Bootstrap, Conformal Inference, and a gentle introduction to large-scale Multiple Testing. This course relies heavily on computer programming using R, and the emphasis is primarily on business applications. Special thanks to the guest speakers (Weinan Wang, Aniruddha Neogi, Joshua Derenski, Bradley Rava, Jacob Dice, Sara Almohtasib) who have lectured in this class and shared their unique experiences in managing and analyzing large data.