Shiwen Yang
Summary
Ph.D. statistician specializing in statistical modeling, experimental design, and causal inference. Experienced in study design evaluation, reproducible analytical workflows, and building statistical tools for evidence-based decision-making.
Experience
- Managed a team of 12 graduate statistical consultants supporting 16 cross-functional research projects.
- Designed and evaluated A/B tests; conducted power and sample-size analyses to ensure statistically valid study designs.
- Built analytical workflows for high-dimensional, noisy, and grouped datasets using GLMs and XGBoost in R and Python.
- Reviewed analyses, identified limitations in data and study design, and contributed to 2 peer-reviewed publications.
- Built reproducible pipelines in R and Python for simulation, estimation, and model evaluation.
- Designed R Shiny dashboards and visualizations to communicate complex statistical results.
- Developed a PostgreSQL database to organize messy research data and wrote SQL queries for data extraction.
- Taught undergraduate statistics courses and mentored graduate students in statistical reasoning and experimental design.
Education
Selected Projects
Causal inference framework for geo-based incrementality experiments: study design, sample size determination, power analysis, and counterfactual estimation across geographic markets. Includes a Python retrieval-augmented generation (RAG) assistant for non-technical users.
Reusable Python ML pipeline for cross-validated model training, evaluation, and ensembling using XGBoost, LightGBM, CatBoost, and regularized logistic regression. Applied to real-world datasets in the NESS Statathon, earning top placements in 2023, 2024, and 2025.
Headless automation system on Linux using Python, OpenCV, OCR, and CNN-based screen parsing. Graph-based state controller for real-time perception and decision-making. Reduced manual supervision from hours per day to about 15 minutes per day; sustained >99% uptime for more than 6 months.
Time-to-event analysis of clinical outcomes using Kaplan–Meier estimation and Cox proportional hazards modeling. Assessed associations between treatment exposure and survival while accounting for censored observations. Produced reproducible statistical reports with hazard ratios and confidence intervals.
Technical Skills
Statistical Methods
Programming
ML / Modeling
Tools & Libraries
Domains
Publications
Attractor-Based Coevolving Dot Product Random Graph Model
Modeled polarization and flocking behavior in dynamic networks using graph embedding methods. Proposed an attractor-based framework for coevolving latent-space network dynamics.
Simplex-Constrained Orthogonal Transformation Estimation
Introduced a penalty function to align point clouds with the simplex under orthogonal constraints. Targets applications in latent-space model identifiability and estimation.