Shiwen Yang

PhD statistician and data scientist focused on causal inference, statistical modeling, and decision systems.

Selected Work

Geo-Based Incrementality Testing Platform

In progress

Causal inference framework for geo-based incrementality experiments, covering study design, sample size determination, power analysis, and counterfactual estimation across geographic markets. Includes a Python retrieval-augmented generation (RAG) assistant for non-technical users.

  • causal inference
  • augmented synthetic control
  • Python
  • R
  • RAG
  • experimentation

Modular Classification Pipeline for Tabular Data

Research

Reusable Python ML pipeline for cross-validated model training, evaluation, and ensembling using XGBoost, LightGBM, CatBoost, and regularized logistic regression. Applied to real-world datasets in the NESS Statathon, earning top placements in 2023, 2024, and 2025.

  • Python
  • XGBoost
  • LightGBM
  • CatBoost
  • machine learning
  • cross-validation

Real-Time Computer Vision Automation System

Prototype

Headless automation system on Linux built with Python, OpenCV, OCR, and CNN-based screen parsing. A graph-based state controller handles real-time perception and decision-making. Reduced manual supervision from hours per day to about 15 minutes per day and sustained >99% uptime over 6+ months.

  • Python
  • OpenCV
  • CNN
  • OCR
  • Linux
  • graph-based control


Skills

Statistical Methods

  • experimental design
  • power & sample-size analysis
  • hypothesis testing
  • causal inference
  • A/B testing
  • GLMs
  • survival analysis
  • simulation

Programming

  • Python
  • R
  • SQL
  • PostgreSQL
  • Bash

ML / Modeling

  • scikit-learn
  • XGBoost
  • LightGBM
  • CatBoost
  • PyTorch
  • transformers
  • NLP

Tools & Libraries

  • pandas
  • NumPy
  • tidyverse
  • R Shiny
  • OpenCV
  • Git
  • Linux

Domains

  • biomedical research
  • marketing analytics
  • network models
  • computer vision