About Me
I am a fourth-year Ph.D. student in Statistics with research interests in machine learning. My work focuses on explainable artificial intelligence and developing methods to better understand black-box models. I am particularly interested in bridging the gap between predictive performance and interpretability to ensure machine learning models can be both accurate and transparent.
I work under the supervision of Professor Lucas Mentch.
Currently I spend a lot of time thinking about...
- Model interpretability and transparency
- Why do Random Forests work so well?
- Large language models and their short- and long-term impact on society
Education
Ph.D.
Master's Degree
Bachelor's Degree
Research
My research primarily focuses on explainable machine learning, though I also pursue projects in a variety of other domains. Current projects include:
Random Forests
My research investigates why random forests perform well across different signal-to-noise ratio (SNR) regimes. By studying the mechanisms that drive their success, I aim to uncover how random forests adapt to varying data environments and what this reveals about the role of regularization in machine learning.
Statistics Department Rankings
This project develops advanced statistical methods for ranking academic statistics departments. A key component of the work is creating new theoretical frameworks for understanding the statistics of tiers, which allow for more principled and robust comparisons across institutions.
Semantic similarity within ensemble tree methods
This project explores methods for grouping decision trees within ensemble tree models based on semantic similarity. By identifying and analyzing clusters of trees that capture related predictive structures, the work aims to provide new insights into how ensembles represent information, improve interpretability, and potentially enhance model efficiency.
Other Projects
Bayesian Statistics Project: Predicting OPS
This project implements a Bayesian Hidden Markov Model (HMM) to model the career performance trajectories of MLB players, specifically focusing on On-base Plus Slugging (OPS). Markov Chain Monte Carlo (MCMC) simulations were utilized to estimate the posterior distributions of latent performance states, allowing the identification of elite player cohorts. The methodological framework adapts the modeling approach established by Jensen et al. (2009).
Legislative Transparency and Prediction
Motivated by the opacity of the U.S. legislative process, which stems from the immense volume of bills proposed each year, this research develops a predictive framework that uses Natural Language Processing (NLP) to forecast bill enactment. The methodology incorporates BERT word embeddings and applies advanced machine learning techniques to bag-of-words models. Results indicate that this approach predicts legislative passage with a high degree of accuracy.
NFL Big Data Bowl 2026: Swooping In
This project introduces a machine learning framework to evaluate NFL secondary players by analyzing spatial positioning and tracking data. By employing gradient-boosted models, the researchers quantify specific "Anticipation" (pre-throw) and "Reaction" (post-throw) metrics to calculate a defender's probability of successfully impacting a play. The study further validates player reliability by applying conformal prediction to establish lower bounds on success probabilities, ensuring that high performance metrics reflect consistent skill rather than statistical noise.
Consulting
I have extensive consulting experience, having worked on more than 10 projects spanning a wide range of fields, including:
- Natural Language Processing
- Computer Vision
- Psychiatry
- Nursing Research
- Neuroscience
- Education Research
- Biology
Here are a few consulting projects I’ve worked on, described in greater detail (specific papers and author names are omitted for confidentiality).
Evaluating large language models' capacity to reason
Doctor bias against children with type 2 diabetes
COVID-19 vaccine resistance
Professional Services Offered
- Data analysis and statistical modeling
- Data cleaning
For consulting inquiries, please contact me directly to discuss your project needs and availability.
Teaching
I have over three years of experience teaching at both the undergraduate and graduate levels.
Data Modeling Using R (EE 5373)
Instructor: Intro course in data science for graduate students in the electrical engineering department. I have fully integrated LLM usage into this course and expanded the content to cover a range of machine learning concepts.
Statistical Learning and Data Science (STAT 1361/2360)
TA: Advanced course on data science and machine learning for senior undergraduates and graduate students.
Applied Statistical Methods (STAT 1000)
Instructor/TA: Intro course in statistics for undergraduates.
Curriculum Vitae
Download CV (PDF)
Last updated: 9/2025
Contact
I welcome opportunities for collaboration, consulting projects, and academic discussions. Feel free to reach out through any of the following channels: