Probabilistic Machine Learning for Climate-Sensitive Cholera Outbreak Risk Prediction in Nigeria
This study develops a probabilistic early-warning framework for cholera outbreak risk across Nigeria using climate variables, WASH indicators, socioeconomic vulnerability, surface-water features, and outbreak-history signals.
Abstract
Cholera remains a recurring public-health threat in Nigeria, where seasonal climate patterns, structural vulnerability, and persistent WASH deficits contribute to repeated sub-national outbreaks. This research builds a state-month analytical panel to estimate one-month-ahead outbreak risk.
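As a rough illustration of the one-month-ahead framing, the sketch below attaches a next-month outbreak label to each state-month row. The field names, the `case_threshold` outbreak definition, and the toy rows are assumptions made for illustration; they are not the study's actual schema, definition, or data.

```python
from collections import defaultdict


def add_next_month_label(rows, case_threshold=1):
    """Attach a one-month-ahead outbreak label to each state-month row.

    rows: list of dicts with keys 'state', 'year', 'month', 'cases'.
    case_threshold is a hypothetical outbreak definition, not the study's.
    """
    # Index every row by state and (year, month) for direct lookup.
    by_state = defaultdict(dict)
    for r in rows:
        by_state[r["state"]][(r["year"], r["month"])] = r

    labelled = []
    for state, months in by_state.items():
        for (year, month), r in sorted(months.items()):
            nxt = (year + 1, 1) if month == 12 else (year, month + 1)
            if nxt not in months:
                continue  # no next-month observation, so no label
            target = int(months[nxt]["cases"] >= case_threshold)
            labelled.append({**r, "outbreak_next_month": target})
    return labelled


# Made-up toy rows for two placeholder states.
panel = [
    {"state": "A", "year": 2021, "month": 11, "cases": 0},
    {"state": "A", "year": 2021, "month": 12, "cases": 4},
    {"state": "A", "year": 2022, "month": 1, "cases": 0},
    {"state": "B", "year": 2021, "month": 12, "cases": 0},
]
labelled = add_next_month_label(panel)
```

Rows whose following month is not observed are dropped rather than labelled, so the target never relies on information that is missing from the panel.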
The system is best understood as a high-sensitivity preparedness support tool rather than a high-precision classifier. It helps identify where intensified surveillance, WASH response, and pre-positioning of supplies may be needed before outbreaks accelerate.
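One way to make the sensitivity-over-precision trade-off concrete is to choose the alert threshold that reaches a target sensitivity and then report the precision that results. The following is a minimal sketch under that assumption; the function name, the 0.9 target, and the toy labels and probabilities are illustrative and do not come from the study.

```python
def threshold_for_sensitivity(y_true, y_prob, target_sensitivity=0.9):
    """Return (threshold, sensitivity, precision) for the highest threshold
    whose sensitivity meets the target."""
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Walk thresholds from strictest to most permissive.
    for t in sorted(set(y_prob), reverse=True):
        preds = [p >= t for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, preds) if y and p)
        fp = sum(1 for y, p in zip(y_true, preds) if not y and p)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            return t, sensitivity, tp / (tp + fp)
    raise ValueError("target sensitivity cannot be reached")


# Made-up labels (1 = outbreak next month) and predicted risks.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]
threshold, sensitivity, precision = threshold_for_sensitivity(y_true, y_prob, 0.9)
```

On this toy input, catching every outbreak requires lowering the threshold until half of the alerts are false alarms, which is the kind of trade a preparedness tool may accept.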
Dataset and Modeling
- The panel covers 1,591 state-month rows across 37 administrative units (the 36 states and the Federal Capital Territory) from 2018 to early 2025.
- Data sources include NCDC surveillance reports, CHIRPS rainfall, ERA5 temperature and humidity proxies, WHO/UNICEF JMP WASH indicators, WorldPop, poverty estimates, and JRC Global Surface Water.
- Seven supervised models were evaluated using rolling time-aware cross-validation.
- The seven comprised five base learners (Logistic Regression, Random Forest, XGBoost, LightGBM, and CatBoost) and two ensemble methods (weighted probability blending and logistic meta-stacking).
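The two ideas in this list that are easiest to get wrong are time-aware splitting and probability blending. The sketch below shows one common form of each: expanding-window (rolling-origin) splits in which training periods always precede the test period, and a normalised weighted average of per-model probabilities. The function names, the `min_train` and `horizon` parameters, the model keys, and the weights are hypothetical; the study's actual fold design and blend weights are not given here.

```python
def rolling_origin_splits(periods, min_train=3, horizon=1):
    """Yield (train_periods, test_periods) pairs in which every training
    period is strictly earlier than every test period."""
    ordered = sorted(set(periods))
    for end in range(min_train, len(ordered) - horizon + 1):
        yield ordered[:end], ordered[end:end + horizon]


def blend_probabilities(model_probs, weights):
    """Weighted average of per-model probabilities.

    model_probs: dict of model name -> list of probabilities (same length).
    weights: dict of model name -> non-negative weight (normalised here).
    """
    total = sum(weights[m] for m in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(next(iter(model_probs.values())))
    return [
        sum(weights[m] * model_probs[m][i] for m in model_probs) / total
        for i in range(n)
    ]


# ISO-style month strings sort chronologically as plain text.
months = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05"]
splits = list(rolling_origin_splits(months, min_train=3))

# Placeholder model outputs and weights, made up for illustration.
blended = blend_probabilities(
    {"model_a": [0.2, 0.8], "model_b": [0.4, 0.6]},
    {"model_a": 3, "model_b": 1},
)
```

Logistic meta-stacking differs from blending in that the combination weights are learned by fitting a logistic regression on out-of-fold base-model predictions; to avoid leakage, those predictions would also need to come from time-ordered folds like the ones above.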
Key Results
Operational Insight
Outbreak-history features contributed the strongest predictive signal, while climate, WASH, and structural vulnerability indicators added secondary context. Burden was concentrated in the North-West and North-East, with seasonal risk peaking in July, August, and September.
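To make "outbreak-history features" more concrete, the sketch below derives a few simple signals from one state's monthly case series using only strictly earlier months, plus a flag for the July to September peak. The specific features (`cases_lag1`, `cases_sum_prev3`, `peak_season`) and the toy series are assumptions for illustration, not the study's feature set.

```python
PEAK_MONTHS = {7, 8, 9}  # July to September, the seasonal peak noted above


def history_features(series):
    """Build history features for one state.

    series: list of (month_number, cases) tuples in chronological order.
    Only months strictly before the current one are used, to avoid leakage.
    """
    features = []
    for i, (month, _cases) in enumerate(series):
        past = [c for _, c in series[:i]]
        features.append({
            "month": month,
            # Cases in the immediately preceding month, if any.
            "cases_lag1": past[-1] if past else None,
            # Sum over up to three preceding months (partial at the start).
            "cases_sum_prev3": sum(past[-3:]) if past else None,
            "peak_season": int(month in PEAK_MONTHS),
        })
    return features


# Made-up case counts for May through August in a single state.
feats = history_features([(5, 0), (6, 2), (7, 10), (8, 7)])
```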