Modeling
Dataset
The data used was obtained from OD600 measurements of cells carrying the biobricks BBa_K2602010, BBa_K2602011, BBa_K2602013, BBa_K2602014, BBa_K2602015, BBa_K2602016 and BBa_R0011. For each strain, one single colony was picked and cultivated with a high head space in a triplicate along with a duplicate negative control containing an “empty” plasmid (BBa_R0011). The cell densities were measured over the course of 26 hours. In total, 20 cultures were grown.
Software
All analysis was performed using python 3.5. The scipy stack was used for matrix manipulations and visualization and includes the modules numpy, scipy, matplotlib, pandas and sympy. To perform the model fitting, scipys implementation of the BFGS algorithm was used using the function minimize.
Data preprocessing
Prior to analysis, the measurements were put in the log domain. The purpose of doing this is two-fold. First, it converts the exponential growth curve in the log-phase into a line from which the growth rate can easily be estimated. Secondly, it normalizes the variance across the measurements, making it approximately stationary across the measurements and thereby justifying the use of bootstrap to estimate the uncertainty in the estimated parameters. This is discussed in the results section.
Model
Typically, the bacterial growth consists of three phases, a lag-, log- and stationary phase. The cell density is approximately constant during the lag phase, reaching exponential growth at the log phase and finally a saturation at stationarity. Together, these phases constitute a sigmoidal curve as illustrated in fig. 1.
With this, we parametrize the growth curve using the Gompertz sigmoid. The purpose of this is two-fold. First, it greatly simplifies the analysis by reducing the entire set of measurements into a smaller set of parameters which can be handled in an automated fashion. Secondly, the parameters may be directly mapped to the growth rate, time to exponential growth and stationary cell density. This allows for a relatively simple uncertainty assessment via the bootstrapping. For our purpose, the Gompertz takes the following form: $$ g(t; A, \mu, \lambda) = A e^{-e^{\frac{\mu}{A}e(\lambda - t) + 1}} $$ where $A$, $\mu$ and $\lambda$ correspond to the stationary cell density, exponential growth rate and time to exponential growth, respectively. Fig. 1 illustrates a Gompertz sigmoidal where the parameters have been explicitly marked. As a side note, we mention that there are many other sigmoids to choose from, such as the logistic, Gompertz, Richards, Schnute and Stannard sigmoids. However, the Gompertz has been shown to be robust and sufficient, which motivates its use [1].
Model fitting
All analysis was performed using python 3.5. The scipy stack was used for matrix manipulation and visualization and includes the modules numpy, scipy, matplotlib, pandas and sympy. To estimate the parameters in Gompertz from a sequence of measurements, a quadratic cost function was set up. This was solved using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm which is a quasi-Newton method which is commonly used to solve unconstrained and nonlinear optimization problems. The initial guess used was estimated by plotting all curves and manually estimating the stationary cell density, growth rate and time to exponential growth.
Uncertainty assessment
Bootstrapping was used to assess the uncertainty in the estimates. This was achieved by putting the Gompertz function under a statistical framework where the growth curves are considered as deterministic sigmoids, with some true set of parameters, corrupted by additive noise. This gives rise to the following model $$ f(t; A, \mu, \lambda) = g(t; A, \mu, \lambda) + e(t) $$ where $f$ corresponds to the measurements at time $t$ and $e$ corresponds to the measurement noise. Accordingly, since the only stochastic part of the model is the measurement noise, the confidence intervals of $A$, $mu$ and $lambda$ are estimated by resampling from the residuals.
The procedure is illustrated in fig. 2. First, the data from an entire set of replicates was used to fit the sigmoid. The corresponding fit was then considered as the true sigmoid, and in order to assess the uncertainty, the residuals were first calculated. A couple of measurements contained some strongly deviating points and the residuals were therefore subjected to filtering, where all residuals deviating by more than 2.5 standard deviations from the mean were removed. Then, the residuals were sampled and used to corrupt the estimated sigmoid, of which a new set of estimated parameters were obtained. The procedure of sampling, corrupting and re-estimating the sigmoid was performed 500 times and two-sided confidence intervals were obtained by the 2.5th and 97.5th percentiles of the corresponding histogram.
Now, while the bootstrap method is general and applicable to a wide range of problem, it does proceed under the assumption that the empirical distribution generated from the data is representative. In our case, this means that the residuals need to have the same statistical properties throughout the growth curve. It will be shown in the results section that this is approximately accomplished when performing all statistical analysis in the log-domain.
References
- [1] M. H. Zwietering, I. Jongenburger, F. M. Rombouts, K. Van 'T Riet. Modeling of the Bacterial Growth Curve. Applied and Environmental Microbiology, June 1990, p. 1875-1881