
Design of Experiments

During practical experiments in the laboratory, one is often left with a large number of factors, which need to be tested in order to create a meaningful model. A model which is considered meaningful should be capable of relating experimental factors (explanatory variables) to a response variable (experimental outcome) via a process/system as shown in fig. 1.

Fig. 1: Experimental factors (explanatory variables) act through a process/system to produce the response variable (experimental outcome).

The simplest linear response base model is presented below: \begin{equation} y_{i}=\beta_{0}+\beta_{1}x_{1}+...+\beta_{n}x_{n}+\varepsilon_{i} , \quad \varepsilon_{i} \sim N(0,\sigma^{2}I) \end{equation} where $y_{i}$ is the response variable of the i’th observation, $x_{1..n}$ are the explanatory variables of the i’th observation, $\beta_{0..n}$ are the regression coefficients, and the residuals of the model $\varepsilon_{i}$ are assumed to be normally distributed with constant variance ($\sigma^{2}I$, with I being the identity matrix) and a mean of 0.
This model will, for convenience, be written in matrix notation: \begin{equation} Y=X\beta+\varepsilon, \quad \varepsilon \sim N(0,\sigma^{2}I) \end{equation} Here X is the design matrix of the model. It has one row per experimental run (N rows in total) and one column per model term (intercept, factors and any interactions), and it contains all the tested combinations of factor levels in coded form. Y is the N x 1 vector of response variables, $\beta$ is the vector of regression coefficients (one per column of X), and $\varepsilon$ is the N x 1 vector of residuals, one for each response. The choice of base model is very important, as it is the foundation of the experimental design and determines which factors are tested; the choice is thus directly reflected in the design matrix (see later). The linear model is the simplest, so we start with it and only shift to a more complex model if the data show a need for one.
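As a concrete illustration of this notation, the sketch below builds a small design matrix by hand and computes the least-squares estimate $\hat{\beta}=(X'X)^{-1}X'Y$ in R; the coded factor values and responses are hypothetical and only serve to show the algebra.

# Minimal sketch (hypothetical data): least-squares estimation in matrix form
set.seed(1)
X <- cbind(intercept = 1,
           x1 = c(-1, -1,  1,  1),    # e.g. burning temperature in coded units
           x2 = c(-1,  1, -1,  1))    # e.g. burning time in coded units
beta_true <- c(10, 2, -1)                        # illustrative coefficients
Y <- X %*% beta_true + rnorm(nrow(X), sd = 0.5)  # simulated responses

beta_hat <- solve(t(X) %*% X, t(X) %*% Y)        # (X'X)^-1 X'Y
beta_hat

# The same fit via lm(), which adds the intercept automatically
coef(lm(Y ~ x1 + x2, data = data.frame(x1 = X[, "x1"], x2 = X[, "x2"])))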

To understand the importance of an experimental design, one can look at previously employed methods. In traditional design of experiments (DoE), one would use the “change one factor at a time until no improvements can be achieved” principle (2). However, this technique does not take into account the possibility of interactions between factors. Why this is a problem can be illustrated by our experiment testing the compressive strength of the fungal bricks. Had we varied only one factor at a time and kept everything else constant, a possible interaction between, for instance, burning temperature and burning time would not have been identified, leaving out a significant part of the explained variance. This would result in a worse fit of our response variable, the compressive strength.

The factorial design is a type of DoE which is often chosen to provide the foundation of an experimental model (such as the linear model presented earlier) (4). With a factorial design, one can with relative ease obtain a solid model of the entire design space of an experiment. The design works by varying multiple factors, each spanning from a low level to a high level; combined, this yields a model covering the largest possible design space. A simple example of such a design can be written as an ANOVA model (a small fitting sketch is given after the term definitions below): \begin{equation} y_{ijk}=\mu+\alpha_{i}+\beta_{j}+(\alpha\beta)_{ij}+\varepsilon_{ijk}, \quad \varepsilon_{ijk} \sim N(0,\sigma^{2}I) \end{equation}

  • $\mu$ is the overall mean
  • $\alpha_{i}$ is the effect of factor A at the i’th level
  • $\beta_{j}$ is the effect of factor B at the j’th level
  • $(\alpha\beta)_{ij}$ is the interaction of factor A and B at different levels
  • The k subscript in the response variable $y_{ijk}$ and residuals $\varepsilon_{ijk}$ denotes the replicate number, $k=(1,2,3...m)$
  • A and B are separate experimental factors, and the subscripts i and j index their levels
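To illustrate how such a two-factor model with interaction is fitted in practice, the sketch below uses base R on simulated, hypothetical data; the factor names and values are placeholders.

# Minimal sketch (hypothetical data): two-factor ANOVA with interaction
set.seed(2)
d <- expand.grid(A = factor(c("low", "high")),   # e.g. burning temperature
                 B = factor(c("low", "high")),   # e.g. burning time
                 rep = 1:3)                      # m = 3 replicates per cell
d$y <- rnorm(nrow(d), mean = 20)                 # simulated response (compressive strength)

fit <- aov(y ~ A * B, data = d)   # mu + alpha_i + beta_j + (alpha beta)_ij
summary(fit)                      # ANOVA table: main effects and interaction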


The theoretical flow of a solid DoE has 3 phases:

  1. Screening
  2. Optimization and
  3. Robustness check

For the initial screening process, the number of replicates is set to one, creating an unreplicated factorial design so that multiple factors can be tested as fast as possible using only main effects. Once the significant factors have been identified, the optimization of factor levels can be carried out using a replicated factorial design. Finally, a test of robustness must be carried out to show that small factor fluctuations do not significantly influence the experiment. The ANOVA-type model is very useful for both the screening and the optimization phases of experimentation.

The last phase requires other types of designs and will not be discussed further here.

The need for an optimal design

When creating a DoE, the full factorial design gives the best results, since every combination of factors is checked. However, this is often very time-consuming to carry out in practice. As an example, consider 4 factors each at 3 levels. This results in $3^{4}=81$ factor combinations, and if these are coupled to, say, 2 non-factorial variables, it would require $81*2=162$ samples to test every possible combination. To solve this issue one can use an optimal design. The most common designs are the A-optimal design, G-optimal design, V-optimal design and D-optimal design (1). Each design has a different focus area when designing experiments, but a common feature is that they all work around the information matrix, which is given as: \begin{equation} I=X'X \end{equation} Where X is the design matrix, X’ is its transpose and I is the information matrix. The information matrix is particularly relevant, since the standard errors of the parameter estimates are proportional to the square roots of the diagonal elements of its inverse, $(X'X)^{-1}$, and these are sought to be minimized.
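To make this concrete, the sketch below computes the information matrix X'X for the small coded design used later in this section and derives the parameter standard errors from the diagonal of its inverse; the residual standard deviation is an assumed value.

# Minimal sketch: information matrix and parameter standard errors
X <- cbind(intercept = 1,
           x1   = c( 1,  1, -1, -1),
           x2   = c( 1, -1,  1, -1),
           x1x2 = c( 1, -1, -1,  1))

info  <- t(X) %*% X     # information matrix I = X'X
covar <- solve(info)    # (X'X)^-1, proportional to the covariance of beta_hat

sigma   <- 0.5                          # assumed residual standard deviation
std_err <- sigma * sqrt(diag(covar))    # standard errors of the parameter estimates
std_err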

Other than the information matrix, each design type also needs a list of candidate experiments to run. These scenarios are written in the candidate list, which, despite its name, is a matrix. The connection between the candidate list, the design and the model is illustrated via a simple example.

Say that you have the factors burning temperature (x1) and burning time (x2), each at two levels, high and low. You want to enumerate every possible experimental combination before carrying out the experiment, in order to obtain an optimal design. The resulting candidate list would be:

Burning Temperature (x1)   Burning Time (x2)
 1                          1
 1                         -1
-1                          1
-1                         -1

In the table, high and low are denoted as 1 and -1, respectively, so-called coded units (4). Coded units are employed to make the linear algebra of the design work. From this candidate matrix, the design matrix can be created according to the model as shown below: \begin{equation} y_{i}=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{1}x_{2}+\varepsilon_{i} , \quad \varepsilon_{i} \sim N(0,\sigma^{2}I) \end{equation} This translates to the following design matrix:

Intercept   Burning temperature (x1)   Burning time (x2)   Interaction (x1x2)
 1           1                          1                    1
 1           1                         -1                   -1
 1          -1                          1                   -1
 1          -1                         -1                    1

Each row in the candidate matrix is an experiment, and each column a factor. In this example, the experiment is made as a full factorial design - that is, every combination of factors at all levels is taken into account. Note how the design matrix is based on the model and can change drastically when more terms or higher-order terms are added. This often leads to a large number of possible base model choices, since we must decide which relations between the factors to include in the design matrix. However, when dealing with biological data it is rare that three-factor interactions are considered meaningful, which limits the combination options that are put into the base model (4).
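This candidate-list-to-design-matrix step is exactly what R's model.matrix() performs for a given model formula; the sketch below reproduces the 2 x 2 example above, and adding higher-order terms to the formula would change the design matrix accordingly.

# Candidate list: every combination of the two coded factors
candidates <- expand.grid(x1 = c(1, -1),   # burning temperature in coded units
                          x2 = c(1, -1))   # burning time in coded units

# Design matrix for y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e
X <- model.matrix(~ x1 + x2 + x1:x2, data = candidates)
X
# e.g. adding a quadratic term, ~ x1 + x2 + x1:x2 + I(x1^2), would add a column to X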
Unlike in this small example, often more than 2 factors are in play, so the designs can get very large, very quickly. As such, some reduction is required to be able to carry out the experiments within a reasonable timeframe, while also lowering the cost of each design, be it in resources or manpower.

From the candidate matrix and the chosen model, the design matrix is found, and from this the optimal design is created. How each candidate list (as a design matrix) is tested is described in this appendix.

Applying the optimal design to our experiments - base model, method and results

The base model of the design

The design of the experiment is often made to provide adequate support for a specific model. In our design, we chose to focus on a linear model including the fixed effects (controlled factors) of substrate, mixing ratio, burning time, burning temperature and incubation time. This led to the base models for our design shown in this section.

The first base model is the purely fixed-effect model with no interactions, where it is assumed that the only variation present in the experiments is the residual variance and that the response variable follows a normal distribution. This choice is used first for the unreplicated experiments and then for the replicated experiments, with the latter including more effects.

  1. The unreplicated main effect design model:

\begin{equation} y_{i}= \beta _{0} +\beta _{1} x_{1} +\beta _{2} x_{2}+\beta _{3} x_{3}+\beta _{4}x_{4}+ \varepsilon_{i} , \quad \varepsilon_{i} \sim N(0, \sigma^{2}) \end{equation}

  • $\beta _{0}$ is the intercept
  • $\beta _{1}$ is the effect of the substrate fed to the fungi
  • $\beta _{2}$ is the effect of the burning time
  • $\beta _{3}$ is the effect of the burning temperature
  • $\beta _{4}$ is the effect of the incubation time

The random component $\varepsilon_{i}$ is the residual of the i’th observation $y_{i}$. It is considered the only source of variation and is presumed to have constant variance and be normally distributed with a mean of 0.

  2. The replicated main effect design model:

\begin{equation} y_{i}= \beta _{0} +\beta _{1} x_{1} +\beta _{2} x_{2}+\beta _{3} x_{3}+\beta _{4}x_{4}+\beta _{5} x_{5}+ \varepsilon_{i} , \quad \varepsilon_{i} \sim N(0, \sigma^{2}) \end{equation}

  • $\beta _{0}$ is the intercept
  • $\beta _{1}$ is the effect of the selected substrate for the fungi
  • $\beta _{2}$ is the effect of the burning time
  • $\beta _{3}$ is the effect of the burning temperature
  • $\beta _{4}$ is the effect of the incubation time
  • $\beta _{5}$ is the effect of the mixing ratio of substrates

As before, the random component $\varepsilon_{i}$ is the residual of the i’th observation $y_{i}$, and it is considered the only source of variation, presumed to have constant variance and be normally distributed with a mean of 0.

These models directly construct our DoE, since the main effects define the factors of the experiments. The levels of each factor in the models are shown in the tables in the next section.

The unreplicated pilot design

The purpose of this design was to identify whether a specific substrate was better than the other substrates at increasing compressive strength, while also being affected by different burning times, burning temperatures and incubation times.

Factor        Substrate type   Burning time   Burning temperature   Incubation time
Levels        8                3              3                     3
Factor type   mixture          factorial      factorial             factorial

The terminology of mixture factor and factorial factor is to be understood as follows: a mixture factor is a factor that does not change while every combination of the factorial factors is tested. That is, if the mixture factor is set to substrate 1, it does not change level while every combination of burning time, burning temperature and incubation time is checked. Therefore the design space of the substrate factor needs to be considered in full. A factorial factor, on the other hand, does change throughout the testing of a mixture factor; its value can be considered free within the span of the design space and can in principle be freely altered.
The full factorial design of the above table would lead to $8*3*3*3=216$ trials to span all of the design space. Via the optimal design this number is reduced to 64 (link to DoE results).
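The exact code behind this reduction is not reproduced here, but it can be sketched with the AlgDesign package (Wheeler, ref. 3), which implements the Fedorov exchange algorithm mentioned in the Method section; the package choice and variable names are assumptions for illustration.

library(AlgDesign)   # assumed available; provides gen.factorial() and optFederov()

# Full candidate set: 8 x 3 x 3 x 3 = 216 possible runs, all variables as factors
cand <- gen.factorial(levels = c(8, 3, 3, 3), factors = "all",
                      varNames = c("substrate", "burn_time", "burn_temp", "incub_time"))

# D-optimal main-effects design using 64 of the 216 candidate runs
pilot <- optFederov(~ substrate + burn_time + burn_temp + incub_time,
                    data = cand, nTrials = 64, criterion = "D")
pilot$design   # the 64 selected runs
pilot$Ge       # G-efficiency of the selected design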

The specific levels for each factor are listed in the corresponding columns (no relation row-wise):

Factor    Substrate type               Burning time   Burning temperature   Incubation time
Level 1   1 (unboiled rice)            low            low                   low
Level 2   2 (boiled rice)              medium         medium                medium
Level 3   3 (coffee)                   high           high                  high
Level 4   4 (blended rice)             -              -                     -
Level 5   5 (sawdust + rice)           -              -                     -
Level 6   6 (sawdust + blended rice)   -              -                     -
Level 7   7 (sawdust + coffee)         -              -                     -
Level 8   8 (sawdust)                  -              -                     -

From the candidate list and design matrix generated by the factors in the above table, every main effect was considered in the DoE, as shown in the following model: \begin{equation} y_{ijklr} =\mu+\alpha_{i}+\beta_{j}+\gamma_{k}+\delta_{l}+ \varepsilon_{ijklr} , \quad \varepsilon_{ijklr} \sim N(0, I\sigma^{2}) \end{equation}

  • $\mu$ is the overall mean
  • $\alpha_{i}$ is the effect of the mixture factor of substrate at the level of i=1..8
  • $\beta_{j}$ is the effect of the factorial factor of burning time at the level of j=1..3
  • $\gamma_{k}$ is the effect of the factorial factor of burning temperature at the level of k=1..3
  • $\delta_{l}$ is the effect of the factorial factor of incubation time at the level of l=1..3

The random component $\varepsilon_{ijklr}$ contains the residuals of the observations $y_{ijklr}$. This is the only source of variation, presumed to have constant variance and be normally distributed with a mean of 0.
The r subscript in the residuals $\varepsilon_{ijklr}$ and the response variable $y_{ijklr}$ denotes the replicate number; in this case r=1, since the design is unreplicated.

The replicated compression experimental design

These experiments were conducted to identify the influence of different substrate mixtures. The factors, their levels and their types are shown in the table below:

Factor        Substrate type   Burning time   Burning temperature   Incubation time
Levels        2                3              3                     3
Factor type   Mixture          Factorial      Factorial             Factorial

The full factorial design of the above table would lead to 162 trials to span all of the design space (even without replication). Via the optimal design this number is reduced, as shown in the results. The specific levels of each factor were (same layout as before):

Factor    Substrate type   Burning time   Burning temperature   Incubation time
Level 1   1                low            low                   low
Level 2   2                medium         medium                medium
Level 3   -                high           high                  high

The candidate list and design matrix created from the above factors led to the following model, including every two-factor interaction: \begin{equation} y_{ijklmr} =\mu+\alpha_{i}+\beta_{j}+\gamma_{k}+\delta_{l}+\zeta_{m}+(\alpha\beta)_{ij} +(\alpha\gamma)_{ik}+(\alpha\delta)_{il}\\+(\alpha\zeta)_{im}+(\beta\gamma)_{jk}+(\beta\delta)_{jl}\\ +(\beta\zeta)_{jm}+(\gamma\delta)_{kl}+(\gamma\zeta)_{km}+(\delta\zeta)_{lm}+ \varepsilon_{ijklmr} , \quad \varepsilon_{ijklmr} \sim N(0, I\sigma^{2}) \end{equation}
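In R formula notation, "every two-factor interaction" can be written compactly by squaring the sum of the main effects; the sketch below shows the corresponding formula with placeholder variable names, assuming the fifth term $\zeta_{m}$ corresponds to the mixing ratio from the replicated base model.

# All main effects plus every two-factor interaction, in R formula notation
frml <- ~ (substrate + burn_time + burn_temp + incub_time + mix_ratio)^2
attr(terms(frml), "term.labels")   # 5 main effects and the 10 pairwise interactions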

Method

To obtain the most efficient experimental design, spanning as much of the design space as possible, a combination of the D-optimal and G-optimal design criteria was used, starting from a random design of the same length as the initial design matrix from the model, thereby including all two-factor interactions.

From here, a selection of different numbers of trials and replicates was screened via the design plot function created in R. The first output of the function gave the determinant, while the second part showed the degree of orthogonality, confounding, G-efficiency and D-efficiency, which provided a way of selecting between the multiple designs created by the Fedorov exchange algorithm. The determinant of the information matrix is inversely related to the generalized variance of the parameter estimates; the more of the design space that is searched through, the larger the determinant and the more likely we are to find the correct values. The orthogonality of the matrix and the confounding of the explanatory variables are directly related and tell how much covariance is present between factors. In a perfect full factorial case every effect is separated and the orthogonality is 1; reducing the number of experiments means we can no longer change all variables separately. The G- and D-efficiency indicate how well the design matrix utilizes the experiments to gain predictive power, meaning that a high value implies maximum explanatory power for every experiment done.

From the plot function, the number of trials as well as the D- and G-efficiency were shown, and the maximal sum of the two efficiencies was chosen as an initial point of interest. In general, if both are above 0.7-0.8 the design is worth looking into (6). To ultimately decide whether the design is solid, the diagonality is taken from the evaluation of the design. If the diagonality is larger than 0.8, the design is considered to have little or almost no confounding (6).
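The design plot function itself is not shown on this page, but its core loop can be sketched with the AlgDesign package: run the Fedorov exchange for a range of trial counts, record the D- and G-efficiencies, and check the diagonality (degree of orthogonality/confounding) of a chosen design with eval.design(). The candidate set, factor coding and trial range below are illustrative assumptions, not the exact setup used.

library(AlgDesign)   # assumed available

# Candidate runs: substrate as a 2-level factor, the rest as coded -1/0/1 variables
cand <- gen.factorial(levels = c(2, 3, 3, 3), factors = 1,
                      varNames = c("substrate", "burn_time", "burn_temp", "incub_time"))
frml <- ~ (substrate + burn_time + burn_temp + incub_time)^2   # all two-factor interactions

# Scan a range of trial numbers and record the efficiency measures of each design
n_trials <- 16:40
eff <- t(sapply(n_trials, function(n) {
  des <- optFederov(frml, data = cand, nTrials = n, criterion = "D")
  c(nTrials = n, D = des$D, A = des$A, Ge = des$Ge, Dea = des$Dea)
}))
eff   # look for designs where the D- and G-efficiency are both above ~0.75

# Diagonality (and confounding pattern) of one candidate design, here 32 trials
chosen <- optFederov(frml, data = cand, nTrials = 32, criterion = "D")
eval.design(frml, chosen$design, confounding = TRUE)$diagonality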

Results

The design of the first screening experiment, involving the factors stated in the base model design section, led to the following results:

Fig. 1: From the plot, it is seen that the highest G-efficiency and D-efficiency are obtained at exactly 64 trials. This makes sense, as this is a fraction of the full factorial design of 216 trials. The exact theoretical design fraction is unknown, since this is a mixed 3-level design and no standard table seems to exist for it.

In fig. 1 it is shown that, for our chosen optimal number of experiments and levels of explanatory variables, the diagonality is 0.832, with a D-efficiency and G-efficiency both at 1. Lower numbers of trials show a D-efficiency and G-efficiency below the accepted limit of 0.7. The determinant obtained at 64 trials was 0.149, with an average coefficient variance of 13.8. This was shown to be the minimum average variance and the highest determinant in the plots below.

Fig. 2: As seen, the determinant was maximized by the design and the average variance of the coefficients also hit a minimum (for further explanation, see the appendix). From this, the design was shown to be optimal.

The 64 trials with no replicates resulted in the final pilot experimental design, which is shown in the link.

The experimental design of the first replicated experiment was created in the same manner as the initial unreplicated experiment. The results of the design were:

Fig. 3: As seen from the above plot, the optimum is at 32 trials, where the average coefficient variance reaches its minimum of 2.250 (the A-criterion) and the determinant its maximum of 0.648 (the D-criterion). This is shown below.

Fig. 4: As seen, the determinant was maximized by the design and the average variance of the coefficients also hit a minimum. From this, the design was shown to be optimal.

As seen from the plots, the maximum of the determinant is shown in the left plot and the A-criterion in the right plot, which tells us whether the number of trials gives a good average variance for the predictions of the model, where low is good (for further explanation, see the appendix). For these plots, as for the 3D plot, not only the optimum is of interest: studies have shown (6) that when dealing with DoE, an acceptable design size is reached when the D- and G-efficiency are above 0.75. Going through the designs below 32 trials, it is seen that 17 trials had sufficient D-efficiency and G-efficiency values of 0.939 and 0.941. Evaluation of the 17-trial design resulted in only a slight lowering of the determinant to 0.637 and a slight increase in the A-criterion to 2.29, while the diagonality went from 0.831 to 0.804, a loss of only around 3 % in orthogonality, meaning that very little increase in covariance between the parameters is expected when going from 32 to 17 trials. With 4 replicates this corresponds to 128 experiments versus 68 experiments, which saves a lot of time. The 17 trials with 4 replicates resulted in the final experimental design shown in the link.

Conclusion

From the process of designing the compression strength experiments, it was shown that the designs greatly improved the work in the lab. This was evident since the designs minimized the number of experiments needed to gain a sufficient amount of data for meaningful statistical models with little theoretical confounding of factors. It was shown that the full factorial design of 216 trials for the unreplicated pilot experiment could be reduced to 64 trials. Likewise, the full factorial design of 162 trials for the 4-times-replicated experiment could be reduced to 17 trials with 4 replicates. Both of these designs showed sufficient diagonality, above 0.8, resulting in only little theoretical confounding (covariance) between factors.

(1) de Aguiar PF, Bourguignon B, Khots MS, Massart DL, Phan-Than-Luu R. 1995. D-optimal designs. Chemometrics and Intelligent Laboratory Systems 30:199–210. Retrieved from here.

(2) Triefenbach F. 2008. Design of Experiments: The D-Optimal Approach and Its Implementation as a Computer Algorithm. Retrieved from here.

(3) Wheeler RE. 2004. Algorithmic design. Retrieved from here.

(4) Montgomery DC. n.d. Design and Analysis of Experiments. Retrieved from here.

(5) Kuhfeld WF. n.d. Experimental Design: Efficiency, Coding, and Choice Designs. Retrieved from here.

(6) Kuhfeld WF, Tobias RD, Garratt M. n.d. Efficient Experimental Design with Marketing Research Applications. Retrieved from here.