CALM
SUPPLEMENTARY
Uniqueness of Approach
The authors have attempted to represent the literature on existing models for the Yemeni outbreak as comprehensively as possible. As of the writing of this page, others have constructed various models, some incorporating machine learning and most not. Many of these models are accurate in their specific use cases, such as the one constructed by Jutla, Akanda, and Islam (2010), but they are applied in areas where cholera is seasonal and non-sporadic, such as Bangladesh (Jutla, Akanda, Unnikrishnan, Huq, & Colwell, 2015), and are thus fairly simple, often using various kinds of regression or logistic models to capture linear relationships. Cholera is generally seasonal, but it is also subject to non-seasonal influences (Emch et al., 2008). The Yemeni outbreak in particular has been subject to many non-seasonal, sporadic influences, chief among them the Yemeni civil war, which necessitates a more complex model that can capture these nonlinear, non-seasonal relationships (Camacho et al., 2018). Our extreme gradient boosting (XGBoost) approach provides this: it is a robust, principled method used widely by data scientists to achieve state-of-the-art results on many machine learning challenges (Chen & Guestrin, 2016). Moving beyond regression is key, because drawing on the full breadth of available data allows CALM to deliver a more useful forecast. Although more complex machine learning algorithms like XGBoost carry a greater risk of overfitting, complex models can remain viable without overfitting, as Pezeshki et al. (2016) demonstrated by predicting cholera in Chabahar City, Iran, using an artificial neural network.
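To illustrate why a boosted tree ensemble can follow a sharp, non-seasonal spike that a single linear trend cannot, here is a minimal sketch of plain gradient boosting with one-split regression trees (stumps) under squared-error loss. It is not XGBoost itself, which adds regularization and second-order gradient information, and it is not CALM's code. The case counts are synthetic values invented for illustration, and every function name is hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on xs that best splits the residuals (least squared error)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_val, right_val = fit_stump(xs, residuals)
        stumps.append((threshold, left_val, right_val))
        preds = [p + learning_rate * (left_val if x <= threshold else right_val)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (left if x <= t else right)
                      for t, left, right in stumps)


# Synthetic example: a sporadic spike in case counts over ten time steps.
time_steps = list(range(10))
cases = [0, 0, 0, 50, 120, 60, 10, 0, 0, 0]
model = fit_boosted_stumps(time_steps, cases)
fitted = [predict(model, t) for t in time_steps]
```

Each round corrects part of what the previous rounds got wrong, so the ensemble can approximate abrupt rises and falls. The same flexibility is what makes overfitting a concern, which is why the number of rounds and the learning rate are normally tuned on held-out data.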
Additionally, forecasts produced by other models are often not comprehensive: they lack details on when an outbreak might strike and how many people will be affected. For example, Jutla et al. developed a model that predicts cholera risk rather than case counts (Cole, 2018). In contrast, CALM forecasts the number of cholera cases any given Yemeni governorate will experience in 2-week intervals, at horizons ranging from 2 weeks to 2 months, giving an aid organization or government official fundamentally different information than a risk indicator or a broad cumulative incidence count.
Finally, existing models often do not use the full breadth of cholera-predictive data available, usually relying on only seasonal environmental factors or only cholera incidence. Because Yemen is currently in a civil war, we propose incorporating civil war fatality data alongside environmental and epidemiological data to span the range of factors that can affect cholera. Paired with extensive feature engineering, CALM’s use of rainfall, past cholera cases and deaths, and civil war fatalities allows it to find key patterns in cholera incidence in Yemen and to model the nonlinear trends of the outbreak.
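One common form of feature engineering for this kind of forecast is to turn each input series into lagged features and pair them with the case count at a future horizon. The sketch below assumes the data have already been binned into 2-week periods for a single governorate, so horizons of 1 to 4 periods would correspond to forecasts 2 weeks to 2 months ahead. The function name, the lag choices, and all numbers are hypothetical; CALM's actual feature set is not described on this page.

```python
def build_dataset(series_by_name, target_name, lags=(0, 1, 2), horizon=1):
    """Build (feature_names, X, y) for one forecast horizon.

    Each row of X holds every series at the listed lags relative to period t;
    the matching entry of y is the target series at period t + horizon.
    """
    n_periods = len(series_by_name[target_name])
    max_lag = max(lags)
    feature_names = [f"{name}_lag{k}" for name in series_by_name for k in lags]
    X, y = [], []
    # Only keep periods with full history behind them and a known target ahead.
    for t in range(max_lag, n_periods - horizon):
        X.append([series_by_name[name][t - k]
                  for name in series_by_name for k in lags])
        y.append(series_by_name[target_name][t + horizon])
    return feature_names, X, y


# Synthetic 2-week series for one governorate (illustrative values only).
governorate = {
    "cases": [10, 12, 30, 80, 150, 90, 40, 20],
    "rainfall_mm": [0, 5, 40, 60, 20, 5, 0, 0],
    "conflict_fatalities": [3, 8, 15, 4, 2, 9, 1, 0],
}

# One dataset per horizon: 1 period (2 weeks) up to 4 periods (about 2 months).
datasets = {h: build_dataset(governorate, "cases", horizon=h) for h in (1, 2, 3, 4)}
```

Training a separate model per horizon is only one possible design. It keeps each model's target unambiguous, at the cost of fitting several models instead of one.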
CALMWatch: An SMS Bot
Lambert iGEM has also developed an SMS bot, built with the Twilio API and the Flask web framework, for gathering health and sanitation data from areas affected by a cholera outbreak. This bot, named CALMWatch, allows a healthcare organization or government agency to distribute an SMS survey to a given population so that affected people can report data related to an ongoing cholera outbreak, such as the cleanliness of water sources, water storage, and waste management. This data can then be fed into the CALM model in real time to improve the model's accuracy and expand its databases. The bot is based on RatWatch, an open-source SMS-based rat reporting service for the Atlanta area developed by M. Koohang (Zegura & DiSalvo, 2018), who graciously allowed Lambert iGEM to modify it for the purposes of the 2018 project.
Example CALMBot Interaction
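The survey flow described above can be sketched as a small state machine that is independent of any web framework. This is not the CALMWatch source: the question wording, the YES/NO answer format, the in-memory storage, and all names are assumptions made for illustration. In a Flask deployment, a webhook route would typically read the `From` and `Body` fields that Twilio posts for each incoming SMS, pass them to `handle_sms`, and return the TwiML string as the response.

```python
from xml.sax.saxutils import escape

# Hypothetical questions covering the topics named in the text.
SURVEY_QUESTIONS = [
    ("water_source", "Is your main water source clean? Reply YES or NO."),
    ("water_storage", "Is your stored water kept covered? Reply YES or NO."),
    ("waste_management", "Is waste disposed of away from water? Reply YES or NO."),
]

sessions = {}           # sender phone number -> answers collected so far
completed_reports = []  # finished surveys, ready to sync with a database


def handle_sms(sender, body):
    """Advance one sender's survey by one step and return the reply text."""
    answers = sessions.get(sender)
    if answers is None:
        # Any first message from a new sender starts the survey.
        sessions[sender] = {}
        return SURVEY_QUESTIONS[0][1]

    key, question = SURVEY_QUESTIONS[len(answers)]
    reply = body.strip().upper()
    if reply not in ("YES", "NO"):
        return "Sorry, please reply YES or NO. " + question

    answers[key] = (reply == "YES")
    if len(answers) < len(SURVEY_QUESTIONS):
        return SURVEY_QUESTIONS[len(answers)][1]

    # Survey finished: store the report and clear the session.
    completed_reports.append({"sender": sender, **sessions.pop(sender)})
    return "Thank you. Your report has been recorded."


def to_twiml(message):
    """Wrap a reply in the TwiML XML that Twilio expects from an SMS webhook."""
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f"<Response><Message>{escape(message)}</Message></Response>")
```

Keeping the conversation logic separate from the webhook makes it testable without sending real messages. A production service would also need persistent storage in place of the in-memory dictionaries used here.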
Further Development
While the efficacy of the model has so far been demonstrated only in Yemen, it is expected that with further development and adaptation CALM can be used to predict disease outbreaks around the world. As development progresses, Lambert iGEM hopes to construct a fully autonomous web-based software platform comprising data collection bots that gather data from major health and sanitation sources, scripts capable of syncing data from the ColorQ app and from CALMWatch surveys with CALM’s databases, and an online portal that coordinates global usage of the model so that users can share and distribute results and model improvements more easily. For CALM itself, Lambert iGEM also hopes to engineer more features for the model by acquiring data on additional factors, possibly including algal blooms, migration, and oral cholera vaccine (OCV) campaigns.