Research

Disease forecasting systems.

I build end-to-end outbreak forecasting systems that run monthly and produce decision-ready outputs: county-level risk scores, calibrated alert tiers, and diagnostics that explain model behavior.

Domain: biosurveillance (HPAI H5N1)
Scale: U.S. county × month
Horizon: one month ahead
At-a-glance

Highly Pathogenic Avian Influenza (HPAI) H5N1 early-warning forecasting system

A leakage-safe monthly pipeline that integrates agricultural, climate/environmental, ecological, and temporal signals to produce county-level risk predictions and alert tiers for operational biosurveillance.

walk-forward validation • rare-event modeling (~1% prevalence) • synthetic-month covariates • cost-sensitive thresholds • calibration diagnostics
Artifact
Monthly forecasting pipeline + dashboard-ready outputs
Outputs
Risk table • alert tiers • diagnostics • map layer
Validation
Walk-forward validation; leakage-checked; imbalance-aware
Core challenges
Rare events, drift, feature uncertainty, interpretability

What the system does

The system runs on a monthly cadence. Its core purpose is to produce outputs that remain actionable and interpretable under rare-event conditions, rather than a bare probability score.


System overview

Forecast orchestration: a monthly job generates forecasts end-to-end (or reuses cached outputs), producing artifacts consumed by a public-facing dashboard and by downstream evaluation.
Future covariate assumptions: forecast-month covariates are generated explicitly via ridge regression rather than assuming access to future ground truth (first sketch below).
Feature engineering: lagged and rolling features, anomalies, cyclical time components, and context priors are constructed with strict temporal ordering to prevent leakage (second sketch below).
Modeling & blending: ensemble models are trained and combined across multiple time windows and geographic regions, with blend weights optimized to account for temporal drift (third sketch below).
Decision policy: raw probability scores are stratified into risk levels using calibrated thresholds optimized under cost-sensitive objectives (fourth sketch below).
Diagnostics & reporting: for each test month, the pipeline generates calibration and error-profile summaries (including prevalence effects and stability checks) designed to make model behavior legible to non-technical audiences.
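
A minimal sketch of the synthetic-month idea, assuming a simple autoregressive setup: each covariate's forecast-month value is predicted from its own recent lags with scikit-learn's Ridge. The lag count, alpha, and toy series are illustrative, not the pipeline's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

def project_covariate(history: pd.Series, n_lags: int = 6, alpha: float = 1.0) -> float:
    """Predict next month's covariate value from its own recent lags."""
    values = history.to_numpy()
    # Build a lag matrix: each row holds n_lags consecutive months,
    # and the target is the month that follows them.
    X = np.stack([values[i : i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    model = Ridge(alpha=alpha).fit(X, y)
    # One-step-ahead prediction from the most recent n_lags months.
    return float(model.predict(values[-n_lags:].reshape(1, -1))[0])

# Toy usage: a seasonal monthly series ending at the last observed month.
rng = np.random.default_rng(0)
history = pd.Series(np.sin(np.arange(48) * 2 * np.pi / 12) + rng.normal(0, 0.1, 48))
print(project_covariate(history))
```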
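A sketch of the leakage-safe feature construction, under assumed column names (county, month, cases) that stand in for the real schema. The pattern to note: every derived feature for month t is computed from series shifted by at least one month.

```python
import numpy as np
import pandas as pd

def add_lag_features(df: pd.DataFrame, col: str = "cases") -> pd.DataFrame:
    """Features for month t use only data through month t-1."""
    df = df.sort_values(["county", "month"]).copy()
    g = df.groupby("county")[col]
    # Lags: shift(1) guarantees month t sees at most month t-1.
    df[f"{col}_lag1"] = g.shift(1)
    df[f"{col}_lag3"] = g.shift(3)
    # Rolling mean over the shifted series, so the current month never
    # contributes to its own window.
    df[f"{col}_roll3"] = g.transform(lambda s: s.shift(1).rolling(3).mean())
    # Anomaly of the most recent observed month vs. the county's trailing mean.
    df[f"{col}_anom"] = df[f"{col}_lag1"] - df[f"{col}_roll3"]
    # Cyclical month-of-year encoding (calendar only, so no leakage risk).
    m = pd.to_datetime(df["month"]).dt.month
    df["month_sin"] = np.sin(2 * np.pi * m / 12)
    df["month_cos"] = np.cos(2 * np.pi * m / 12)
    return df
```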
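One simple way to realize drift-aware blending, shown here for two hypothetical model outputs: grid-search a convex weight that minimizes recency-weighted log loss on held-out months. The real system blends more models across windows and regions; this only illustrates the shape of the optimization.

```python
import numpy as np
from sklearn.metrics import log_loss

def blend_weight(p_a, p_b, y, recency):
    """Grid-search the convex weight w minimizing weighted log loss of w*p_a + (1-w)*p_b."""
    best_w, best_loss = 0.0, np.inf
    for w in np.linspace(0.0, 1.0, 101):
        loss = log_loss(y, w * p_a + (1.0 - w) * p_b, sample_weight=recency)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

# Toy usage: upweight recent validation months so the blend tracks drift.
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.01, 2000)                             # ~1% prevalence
p_a = np.clip(0.01 + rng.normal(0, 0.005, 2000), 1e-4, 0.99)  # two imperfect models
p_b = np.clip(0.01 + rng.normal(0, 0.010, 2000), 1e-4, 0.99)
recency = np.linspace(0.2, 1.0, 2000)                       # later rows weigh more
print(blend_weight(p_a, p_b, y, recency))
```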
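A hedged sketch of the cost-sensitive decision policy: choose the cutoff minimizing total expected cost under asymmetric miss/false-alarm costs, then bucket scores into alert tiers. The cost ratio and tier scheme here are placeholders, not the calibrated production values.

```python
import numpy as np

def pick_threshold(p, y, cost_fn=50.0, cost_fp=1.0):
    """Return the cutoff minimizing cost_fn * misses + cost_fp * false alarms."""
    best_t, best_cost = 0.5, np.inf
    for t in np.unique(np.round(p, 3)):      # candidate cutoffs from the scores
        pred = p >= t
        cost = cost_fn * np.sum(~pred & (y == 1)) + cost_fp * np.sum(pred & (y == 0))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def to_tiers(p, t):
    """Map probabilities to illustrative alert tiers around the chosen cutoff."""
    return np.select([p >= t, p >= 0.5 * t], ["high", "watch"], default="low")
```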

Primary research questions

Future covariates: How sensitive are forecasts to synthetic-month assumptions, and where do those assumptions break down?
Decision thresholds: How do alert tiers change under varying intervention costs with ~1% prevalence?
Generalization: How does performance vary across region/season/prevalence regimes, and what systematic biases emerge?

Manuscripts (in progress)

Methods and decision-support evaluation are in development. Availability depends on internal review and project timelines.

Artifacts produced each month

Risk table: county-level probabilities and alert tiers for the forecast month.
Diagnostics: calibration, error profiles, and prevalence-stratified performance summaries (see the sketch after this list).
Map layer: visualization-ready outputs for a public-facing dashboard.
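
To make the calibration diagnostic concrete, here is a minimal reliability-curve sketch using scikit-learn's calibration_curve. The quantile binning is an assumption suited to heavily skewed rare-event scores, and the toy data merely mimics a ~1% prevalence regime.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
p = rng.beta(1, 60, size=5000)   # skewed scores, roughly rare-event regime
y = rng.binomial(1, p)           # toy outcomes consistent with the scores
# Bin by score quantiles and compare predicted vs. observed event rates.
frac_pos, mean_pred = calibration_curve(y, p, n_bins=10, strategy="quantile")
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted {mp:.4f}  observed {fp:.4f}")
```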

Evaluation constraints

Evaluation uses walk-forward validation only, with explicit synthetic-month assumptions for unknown covariates. Performance is reported using imbalance-aware metrics and cost-sensitive thresholds; there are no random splits, no default 0.5 thresholds, and no accuracy-only summaries. A minimal walk-forward split is sketched below.
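
A minimal sketch of the walk-forward protocol, assuming a single-month test horizon and an expanding training window; the date range and minimum window size are illustrative.

```python
import pandas as pd

months = pd.period_range("2022-01", "2024-12", freq="M")  # illustrative range

def walk_forward(months, min_train=24):
    """Yield (train_months, test_month) pairs with no look-ahead."""
    for i in range(min_train, len(months)):
        yield months[:i], months[i]

for train, test in walk_forward(months):
    # Fit on `train`, forecast `test`; rows are never shuffled across time.
    assert train.max() < test
```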

Future plans

I’m building a reusable evaluation harness: standardized leakage checks, walk-forward validation, and sensitivity analyses for future-feature assumptions. One such leakage check is sketched below.
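
As one example of a standardized leakage check, this sketch rebuilds features on a history truncated at a cutoff and verifies that the pre-cutoff rows are unchanged; any feature that shifts was reading the future. Here build_features is any deterministic feature builder (for instance, the add_lag_features sketch in the system overview above), and the cutoff semantics are an assumption.

```python
import pandas as pd

def check_no_future_dependence(df: pd.DataFrame, build_features, cutoff) -> bool:
    """True if features computed before `cutoff` are unaffected by later rows."""
    full = build_features(df)
    truncated = build_features(df[df["month"] <= cutoff])
    # Compare the overlapping (pre-cutoff) portion row-for-row.
    before = full[full["month"] <= cutoff].reset_index(drop=True)
    return before.equals(truncated.reset_index(drop=True))
```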

Research interests

Themes I’m focused on


Spatiotemporal forecasting

Time-aware validation, drift sensitivity, and spatiotemporal stratification.

Rare-event detection

Prevalence-aware evaluation and decision thresholds aligned to intervention costs.

Interpretability & diagnostics

Calibration, error profiling, and stability checks that translate to decisions.

Climate–disease signals

Environmental proxies, lag structure, and sensitivity to future-feature assumptions.