Team:DTU-Denmark/DesignOfExperiments

Design of Experiments

During practical experiments in the laboratory, one is often left with a large number of factors, which need to be tested in order to create a meaningful model. A model which is considered meaningful should be capable of relating experimental factors (explanatory variables) to a response variable (experimental outcome) via a process/system as shown in Figure 1.

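Fig. 1: Experimental factors (explanatory variables) are related to a response variable (experimental outcome) via a process/system.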

The simplest linear response model is presented below: \begin{equation} y_{i}=\beta_{0}+\beta_{1}x_{1}+...+\beta_{n}x_{n}+\varepsilon_{i} , \varepsilon_{i} \sim N(0,\sigma^{2}) \end{equation} where $y_{i}$ is the response variable of the i’th observation, $x_{1},...,x_{n}$ are the explanatory variables of the i’th observation, $\beta_{0},...,\beta_{n}$ are the regression coefficients, and the residuals $\varepsilon_{i}$ are assumed to be independent and normally distributed with a mean of 0 and a constant variance $\sigma^{2}$.
This model will, for convenience, be written in matrix notation: \begin{equation} Y=X\beta+\varepsilon, \varepsilon \sim N(0,\sigma^{2}I) \end{equation} where X is the design matrix of the model, of size N x k, with N the number of observations and k the number of model terms (the intercept plus one column per factor or interaction term). Y is a vector of responses of size N x 1, $\beta$ is a vector of regression coefficients of size k x 1, and $\varepsilon$ is a vector of size N x 1 containing the residual of each observation; I is the N x N identity matrix. The model is very important, as it is the foundation of the experimental design: it determines which factors should be tested, and this selection is directly reflected in the design matrix (see later).
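As a brief sketch of why the design matrix matters (the estimator $\hat{\beta}$ is standard least-squares notation and is not introduced in the original text; it assumes that $X'X$ is invertible, with $X'$ denoting the transpose of X), the estimated coefficients and their covariance are: \begin{equation} \hat{\beta}=(X'X)^{-1}X'Y, \quad \mathrm{Var}(\hat{\beta})=\sigma^{2}(X'X)^{-1} \end{equation} The covariance depends on the experiment only through $X'X$ and $\sigma^{2}$, so the choice of experimental runs directly affects how precisely the coefficients can be estimated.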

To understand the importance of an experimental design, one can look at previously employed methods. In traditional design of experiments (DOE), one would change “one factor at a time until no improvements can be achieved” (2). However, this technique does not take possible interactions between factors into account. Why this is a problem can be illustrated by our experiment on the compressive strength of the fungal bricks (link to the experiment to be inserted; not yet written). Had we only varied one factor at a time while keeping everything else constant, a possible interaction between, for instance, burning temperature and burning time would not have been identified. A significant part of the explainable variance would then be left out, resulting in a worse fit of the compressive strength, our response variable.
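The effect of ignoring an interaction can be sketched with a small numerical example. The response function and its coefficients below are purely hypothetical (they are not measurements from the brick experiment), and the factors are given in coded units of -1 (low) and +1 (high).

```python
from itertools import product

# Hypothetical response with an interaction term (illustrative coefficients only).
def strength(temp, time):
    return 10.0 + 2.0 * temp + 3.0 * time + 4.0 * temp * time

# One factor at a time: start at (low, low) and change each factor alone.
base = strength(-1, -1)
ofat_temp_effect = strength(+1, -1) - base  # time held at its low level
ofat_time_effect = strength(-1, +1) - base  # temperature held at its low level
# Without an interaction term, the prediction for (high, high) is purely additive.
ofat_prediction = base + ofat_temp_effect + ofat_time_effect
actual_high_high = strength(+1, +1)

# Full 2x2 factorial: all four combinations are run.
runs = list(product([-1, 1], repeat=2))
y = [strength(t, s) for t, s in runs]
n = len(runs)
# The columns are orthogonal, so each coefficient is a simple contrast average.
b_temp = sum(t * yi for (t, s), yi in zip(runs, y)) / n
b_time = sum(s * yi for (t, s), yi in zip(runs, y)) / n
b_inter = sum(t * s * yi for (t, s), yi in zip(runs, y)) / n

print("OFAT prediction at (high, high):", ofat_prediction)
print("Actual response at (high, high):", actual_high_high)
print("Factorial estimates (temp, time, interaction):", b_temp, b_time, b_inter)
```

In this made-up case both single-factor changes lower the response, so a one-factor-at-a-time search would stay at the low levels, while the factorial design recovers the interaction coefficient and reveals the much higher response when both factors are high.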

A general DOE that can provide the foundation of an experimental model (such as the linear model presented earlier) is the factorial design (4). With a factorial design, one can with relative ease model the entire design space of an experiment. The design works by setting each of multiple factors to either a high level or a low level, thus spanning the largest design space possible. A simple example of such a design can be written as an ANOVA model: \begin{equation} y_{ijk}=\mu+\alpha_{i}+\beta_{j}+(\alpha\beta)_{ij}+\varepsilon_{ijk}, \varepsilon_{ijk} \sim N(0,\sigma^{2}) \end{equation} where $\mu$ is the overall mean, $\alpha_{i}$ is the effect of factor A at the i’th level, $\beta_{j}$ is the effect of factor B at the j’th level, and $(\alpha\beta)_{ij}$ is the interaction of factors A and B at levels i and j. The subscript k in the response $y_{ijk}$ and the residual $\varepsilon_{ijk}$ indexes the replicates, $k=1,2,3,...,m$.
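The terms of the ANOVA model can be estimated from cell means in a balanced design. The following is a minimal sketch, assuming a replicated 2x2 design, the usual sum-to-zero constraints on the effects, and made-up data values; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical data: data[(i, j)] holds the m = 2 replicates observed with
# factor A at level i and factor B at level j.
data = {
    (0, 0): [9.0, 11.0],
    (0, 1): [14.0, 16.0],
    (1, 0): [12.0, 14.0],
    (1, 1): [27.0, 29.0],
}
levels_a = sorted({i for i, _ in data})
levels_b = sorted({j for _, j in data})

# Cell means and overall mean (balanced design, so a mean of cell means suffices).
cell = {key: mean(values) for key, values in data.items()}
mu = mean(cell.values())

# Main effects: deviation of each level mean from the overall mean.
alpha = {i: mean(cell[i, j] for j in levels_b) - mu for i in levels_a}
beta = {j: mean(cell[i, j] for i in levels_a) - mu for j in levels_b}

# Interaction: what remains of each cell mean after removing the main effects.
inter = {(i, j): cell[i, j] - mu - alpha[i] - beta[j] for i, j in cell}

# Residuals: deviation of each replicate from its cell mean.
resid = {key: [v - cell[key] for v in values] for key, values in data.items()}

print("mu =", mu)
print("alpha =", alpha)
print("beta =", beta)
print("interaction =", inter)
```

With only one replicate per cell (m = 1) every residual is zero, so the interaction cannot be separated from noise; this is why the interaction is estimated from replicated designs.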

The theoretical flow of a solid DOE has 3 phases: (1) screening, (2) optimization and (3) robustness check. For the initial screening, the number of replicates m is set to 1, creating an unreplicated factorial design that tests multiple factors as fast as possible using only main effects. When the significant factors have been found, the factor levels can be optimized using a replicated factorial design. Finally, a test of robustness must be carried out to show that small fluctuations in the factors do not significantly influence the experiment. The ANOVA-type model is very useful for both the screening and the optimization phase.
The robustness check requires other kinds of designs and will not be discussed further here (see post hoc analysis in the statistical model section; link to be added).

The need for an optimal design

When creating a DOE, the full factorial design gives the most complete results, since every combination of factor levels is tested. However, this is often very time consuming to carry out in practice. As an example, consider 4 factors, each at 3 levels. This results in $3^{4}=81$ combinations, and if these are coupled to, say, 2 non-factorial variables, it would require $81 \times 2=162$ samples to test every possible combination. To solve this issue one can use an optimal design. The most common designs are the A-optimal, G-optimal, V-optimal and D-optimal designs (1). Each design has a different focus, but a common feature is that they all work with the information matrix, which is given as: \begin{equation} I(\theta)=X'X \end{equation} where X is the design matrix, X’ is its transpose and $I(\theta)$ is the information matrix, which depends on $\theta$, the parameters of the model. The information matrix is particularly relevant because the variances of the parameter estimates are $\sigma^{2}$ times the diagonal elements of its inverse, $(X'X)^{-1}$; the square roots of these are the standard errors of the parameters, which the design seeks to minimize.
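The run count in the example can be checked by enumerating the full factorial directly. In this minimal sketch the coded levels -1, 0, 1 and the variable names are illustrative assumptions, and the 2 non-factorial variables are interpreted as a multiplier of 2 on the number of samples, following the arithmetic in the text.

```python
from itertools import product

levels = [-1, 0, 1]   # three coded levels per factor (assumed coding)
n_factors = 4

# Every combination of levels across the four factors.
full_factorial = list(product(levels, repeat=n_factors))

n_non_factorial = 2   # multiplier taken from the example in the text
total_samples = len(full_factorial) * n_non_factorial

print(len(full_factorial), "factor combinations")
print(total_samples, "samples in total")
```

Adding a fifth factor at 3 levels would triple the count again, which illustrates how quickly a full factorial grows.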

In addition to the information matrix, each design type also needs a list of candidate experimental runs. This list is referred to as the candidate list and is given as a matrix. The connection between candidate list, design and model is illustrated with a simple example.

Say that you have the factors burning temperature (x1) and burning time (x2), each at a high and a low level. You want to list every possible experimental combination before carrying out the experiment, so that an optimal design can be selected from them. The resulting candidate list would be:

Burning Temperature (x1) Burning Time (x2)
1 1
1 -1
-1 1
-1 -1

In the table, high and low are denoted as 1 and -1 respectively, so-called coded units (4), which make the linear algebra of the design work. From this candidate matrix, the design matrix can be created according to the model shown below: \begin{equation} y_{i}=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{1}x_{2}+\varepsilon_{i} , \varepsilon_{i} \sim N(0,\sigma^{2}) \end{equation} This translates to the following design matrix:

Intercept Burning Temperature (x1) Burning Time (x2) Interaction (x1x2)
1 1 1 1
1 1 -1 -1
1 -1 1 -1
1 -1 -1 1
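The step from candidate list to design matrix, and from there to the information matrix, can be sketched as follows. This is a minimal illustration in plain Python for the model with an intercept, two main effects and their interaction; the helper names `model_row` and `information_matrix` are hypothetical.

```python
from itertools import product

# Candidate list in coded units: each row is a run, the columns are (x1, x2).
candidates = list(product([1, -1], repeat=2))

def model_row(x1, x2):
    """Expand one candidate run into the model terms 1, x1, x2, x1*x2."""
    return [1, x1, x2, x1 * x2]

# Design matrix: one row per run, one column per model term.
X = [model_row(x1, x2) for x1, x2 in candidates]

def information_matrix(X):
    """Return X'X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

info = information_matrix(X)

# Here X'X is diagonal (4 on the diagonal), so the diagonal of its inverse is
# simply 1 / info[j][j]. The standard error of each coefficient, in units of
# sigma, is the square root of that value. For a non-diagonal X'X a full
# matrix inverse would be needed instead.
se_over_sigma = [(1 / info[j][j]) ** 0.5 for j in range(len(info))]

for row in X:
    print(row)
print("X'X =", info)
print("standard errors / sigma =", se_over_sigma)
```

Because all columns of this design matrix are orthogonal, every coefficient is estimated independently of the others and with the same standard error of sigma/2.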

Each row in the candidate matrix is an experiment and each column a factor. In the example, the experiment is set up as a full factorial design, that is, every combination of all levels of all factors is taken into account. Note how the design matrix is based on the model and can change drastically when more terms or higher-order terms are added. This often leads to a large number of possible base models. However, when dealing with biological data, interactions between 3 factors are rarely considered meaningful, which limits the number of model options (4).
Often more factors are in play than the 2 of the small example, and so the designs can get very large very quickly. A reduction is therefore needed to carry out the experiments within a reasonable time frame, while also lowering the cost of each design, be it in resources or manpower.
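One way such a reduction can work is sketched below for the D-optimal case, using the standard D-criterion of maximizing the determinant of X'X. The setup is an assumption made for illustration: three hypothetical two-level factors, a main-effects model, and a budget of 4 runs out of 8 candidates. The exhaustive search is only feasible for very small candidate lists; practical software typically relies on exchange-type search algorithms instead.

```python
from itertools import combinations, product

# Candidate list: full two-level factorial for three hypothetical factors.
candidates = list(product([-1, 1], repeat=3))

def model_row(run):
    """Main-effects model: intercept followed by the coded factor levels."""
    return [1, *run]

def information_matrix(X):
    """Return X'X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j, a in enumerate(m[0])
    )

def d_criterion(runs):
    """det(X'X) for the design made up of the given runs."""
    return det(information_matrix([model_row(r) for r in runs]))

n_runs = 4  # assumed experimental budget
# Try every subset of n_runs candidates and keep the one with the largest det(X'X).
best = max(combinations(candidates, n_runs), key=d_criterion)
best_det = d_criterion(best)
full_det = d_criterion(candidates)

print("selected runs:", best)
print("det(X'X) for selected runs:", best_det)
print("det(X'X) for all 8 candidates:", full_det)
```

The selected runs form a half fraction of the full factorial in which the main-effect columns remain orthogonal, so all four coefficients can still be estimated with half the number of experiments, at the cost of no longer being able to estimate interactions separately.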

(1) Lejeune, R., Nielsen, J. and Baron, G. V. (1995) “Morphology of Trichoderma reesei QM 9414 in submerged cultures”, Biotechnology and Bioengineering, 47(5), pp. 609–615. doi: 10.1002/bit.260470513.

(2) Spohr, A., Dam-Mikkelsen, C., Carlsen, M., Nielsen, J. and Villadsen, J. (1998) “On-line study of fungal morphology during submerged growth in a small flow-through cell”, Biotechnology and Bioengineering, 58(5), pp. 541–553. doi: 10.1002/(SICI)1097-0290(19980605)58:5<541::AID-BIT11>3.0.CO;2-E.

(3) Lejeune, R. and Baron, G. V. (1996) “Simulation of growth of a filamentous fungus in 3 dimensions”, Biotechnology and Bioengineering, 53(2), pp. 139–150. doi: 10.1002/(SICI)1097-0290(19970120)53:2<139::AID-BIT3>3.0.CO;2-P.

(4) Monod, J. (1949) “The Growth of Bacterial Cultures”, Annual Review of Microbiology, 3(1), pp. 371–394. doi: 10.1146/annurev.mi.03.100149.002103.