Alexander Dukart

PhD student

Statistics | University of Pittsburgh

About Me

I am a fourth-year Ph.D. student in Statistics with research interests in machine learning. My work focuses on explainable artificial intelligence and developing methods to better understand black-box models. I am particularly interested in bridging the gap between predictive performance and interpretability to ensure machine learning models can be both accurate and transparent.


I work under the supervision of Professor Lucas Mentch.


Currently I spend a lot of time thinking about...


Education

Ph.D.

Doctor of Philosophy in Statistics (Expected 2028)
University of Pittsburgh, Pittsburgh, PA

Master's Degree

M.A. in Statistics, 2025 (en route to Ph.D.)
University of Pittsburgh, Pittsburgh, PA

Bachelor's Degree

B.S. in Mathematics and Economics, 2022
University of Minnesota, Minneapolis, MN

Research

My research primarily focuses on explainable machine learning, though I also pursue projects in a variety of other domains. Current projects include:


Random Forests

My research investigates why random forests perform well across different signal-to-noise ratio (SNR) regimes. By studying the mechanisms that drive their success, I aim to uncover how random forests adapt to varying data environments and what this reveals about the role of regularization in machine learning.

Statistics Department Rankings

This project develops statistical methods for ranking academic statistics departments. A key component of the work is creating new theoretical frameworks for the statistics of tiers, which allow for more principled and robust comparisons across institutions.

Semantic Similarity Within Ensemble Tree Methods

This project explores methods for grouping decision trees within ensemble tree models based on semantic similarity. By identifying and analyzing clusters of trees that capture related predictive structures, the work aims to provide new insights into how ensembles represent information, improve interpretability, and potentially enhance model efficiency.

Other Projects

Bayesian Statistics Project: Predicting OPS

This project implements a Bayesian Hidden Markov Model (HMM) to model the career performance trajectories of MLB players, focusing on On-base Plus Slugging (OPS). Markov chain Monte Carlo (MCMC) simulations were used to estimate the posterior distributions of latent performance states, allowing the identification of elite player cohorts. The methodological framework adapts the modeling approach established by Jensen et al. (2009).

Legislative Transparency and Prediction

Motivated by the opacity of the U.S. legislative process, which stems from the immense volume of annual bill proposals, this research develops a predictive framework that uses Natural Language Processing (NLP) to forecast bill enactment. The methodology incorporates BERT word embeddings and applies advanced machine learning techniques to bag-of-words models. Results indicate that this approach predicts legislative passage with a high degree of accuracy.

NFL Big Data Bowl 2026: Swooping In

This project introduces a machine learning framework to evaluate NFL secondary players by analyzing spatial positioning and tracking data. By employing gradient-boosted models, the researchers quantify specific "Anticipation" (pre-throw) and "Reaction" (post-throw) metrics to calculate a defender's probability of successfully impacting a play. The study further validates player reliability by applying conformal prediction to establish lower bounds on success probabilities, ensuring that high performance metrics reflect consistent skill rather than statistical noise.

Consulting

I have extensive consulting experience, having worked on more than 10 projects spanning a wide range of fields, including:


Evaluating Large Language Models' Capacity to Reason

Assisted in evaluating large language models' capacity for reasoning by testing several common LLMs on a structured prediction task. Contributed by verifying the statistical methodology and ensuring the correctness of mathematical computations underlying the project's analysis.

Doctor Bias Against Children with Type 2 Diabetes

Contributed to a research project investigating physician bias toward children with type 2 diabetes, with responsibilities including data cleaning and supporting the statistical analysis.

COVID-19 Vaccine Resistance

Worked on a project evaluating a survey designed to measure resistance to COVID-19 vaccination, with responsibilities including assessing question design, data cleaning, and supporting the statistical analysis.

Professional Services Offered

• Research design and methodology consulting
• Data analysis and statistical modeling
• Data cleaning

For consulting inquiries, please contact me directly to discuss your project needs and availability.

Teaching

I have over three years of experience teaching at both the undergraduate and graduate levels.


Data Modeling Using R (EE 5373)

Instructor: Intro course in data science for graduate students in the electrical engineering department. I have fully integrated LLM usage into this course and expanded the content to include many different machine learning concepts.

Statistical Learning and Data Science (STAT 1361/2360)

TA: Advanced course on data science and machine learning for senior undergraduates and graduate students.

Applied Statistical Methods (STAT 1000)

Instructor/TA: Intro course in statistics for undergraduates.

Curriculum Vitae

Download CV (PDF)

Last updated: 9/2025

Contact

I welcome opportunities for collaboration, consulting projects, and academic discussions. Feel free to reach out through any of the following channels: