BACKGROUND & SOLUTION

Cholera in Yemen

Cholera is a waterborne disease caused by the bacterium Vibrio cholerae. It has plagued mankind for centuries and continues to do so despite the advances of modern medicine. The ongoing cholera outbreak in Yemen, which began in October 2016, was deemed “the largest documented cholera outbreak” in a comprehensive analysis of cholera surveillance data by Camacho et al. (2018). Enabled by a devastating civil war, cholera has spread rampantly across the country: the World Health Organization’s weekly bulletins report that, as of April 2017, there have been 1,055,788 suspected and 612,703 confirmed cases of cholera, causing 2,255 confirmed deaths (World Health Organization, 2017). Cholera has several effective treatments and preventive measures, including Oral Cholera Vaccinations (OCVs) with an 80.2% prevention rate (Azman et al., 2016), yet the inefficient and untimely distribution of medicine has been the primary cause of cholera mortality (Camacho et al., 2018). Timely distribution is difficult because the Yemeni outbreak has been largely sporadic, occurring in waves driven by a variety of environmental (rainfall), political (civil war conflicts), and epidemiological (cholera incidence and mortality) factors (Camacho et al., 2018). Studies suggest that a third wave of cholera transmission may emerge during the rainy season of 2018, creating an urgent need for a forecast that details when, where, and how many people will contract the disease (Camacho et al., 2018). With a comprehensive, actionable forecast, health organizations can deploy prevention methods in a highly targeted, efficient fashion and thereby mitigate the outbreak (Camacho et al., 2018).

Map of cholera outbreak in Yemen in 2017 (Al Jazeera, 2017).

Our Solution

Unique to the Yemeni outbreak has been the availability of expansive epidemiological datasets. Unlike outbreaks in Haiti, the Dominican Republic, and various African nations, the Yemeni outbreak has regular and reliable reporting of cholera and various related factors. This wealth of data has opened the possibility of using machine learning to predict cholera outbreaks. We have therefore constructed CALM, the Cholera Artificial Learning Model, a system composed of four extreme-gradient-boosting (XGBoost) machine learning models that, working together, forecast the number of cholera cases any given Yemeni governorate will experience over multiple time intervals ranging from 2 weeks to 2 months. Through extensive engineering of predictive features, the models draw on a wide range of relevant datasets, including multiple mathematical representations of rainfall, past cholera incidence and mortality, and civil war mortalities. By predicting the number of new cases (per 10,000 people) each governorate will experience over the next two months in 2-week intervals, CALM provides a comprehensive forecast of the Yemen cholera outbreak, allowing necessary preventative action to be taken. Furthermore, the geographic divisions (governorates) for which incidence is predicted are specific enough that practical measures can be taken to distribute medicines to those in need. For reference, YE-AM (Amran), the governorate with the greatest cumulative cholera case count (normalized by population), has an area of 9,587 square kilometers (Yemen, 2014).
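To make the multi-horizon setup concrete, the following is a minimal sketch of how training data for four separate models might be assembled for a single governorate. It is an illustration under stated assumptions, not CALM's actual pipeline: the horizon definition (four consecutive 2-week windows), the use of four lagged weekly rates as the only features, and the names `HORIZONS_WEEKS`, `cases_per_10k`, and `build_horizon_datasets` are all hypothetical. Rainfall and conflict features are omitted for brevity.

```python
from typing import Dict, List, Tuple

# Assumption: four consecutive 2-week windows spanning two months,
# one window per model. The source does not spell out this mapping.
HORIZONS_WEEKS = (2, 4, 6, 8)

Dataset = Tuple[List[List[float]], List[float]]  # (feature rows, targets)


def cases_per_10k(cases: float, population: float) -> float:
    """Normalize a raw case count to cases per 10,000 people."""
    return cases * 10_000 / population


def build_horizon_datasets(
    weekly_cases: List[float], population: float, n_lags: int = 4
) -> Dict[int, Dataset]:
    """Build one supervised dataset per forecast horizon.

    Features at week t: the n_lags most recent weekly rates (hypothetical choice).
    Target for horizon h: new cases per 10,000 people summed over the
    2-week window made up of weeks t+h-1 and t+h.
    """
    rates = [cases_per_10k(c, population) for c in weekly_cases]
    datasets: Dict[int, Dataset] = {h: ([], []) for h in HORIZONS_WEEKS}

    for t in range(n_lags - 1, len(rates)):
        features = rates[t - n_lags + 1 : t + 1]
        for h in HORIZONS_WEEKS:
            window = rates[t + h - 1 : t + h + 1]
            if len(window) < 2:
                continue  # not enough future data for this horizon
            rows, targets = datasets[h]
            rows.append(list(features))
            targets.append(sum(window))

    return datasets
```

Each of the four resulting datasets could then be used to fit one gradient-boosted regressor, so that every model specializes in a single forecast window rather than one model predicting all horizons at once.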

Diagram of CALM Conceptual Structure.