Difference between revisions of "Team:DTU-Denmark/ModellingTheDesign"

Line 242: Line 242:
 
<div style="width:100%;">
 
<div style="width:100%;">
 
<div class="col-xs-6">
 
<div class="col-xs-6">
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/4/4f/T--DTU-Denmark--modellingexpThies1.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption>
+
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/4/4f/T--DTU-Denmark--modellingexpThies1.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 
</div>
 
</div>
 
<div class="col-xs-6">
 
<div class="col-xs-6">
Line 286: Line 286:
 
<div class="verticalLine textbreather interlabspace verticalleftrealyellow">
 
<div class="verticalLine textbreather interlabspace verticalleftrealyellow">
  
<h2 class="media-heading" style="text-align: left;margin-bottom: 35px; color:#F8A05B;">Results</h2>
+
<h2 class="media-heading" style="text-align: left;margin-bottom: 35px; color:#F8A05B;">Modelling formulation, fitting of model and reduction of model</h2>
  
 
<p style="text-align:justify">
 
<p style="text-align:justify">
 +
During this section multiple models  are considered. The models included fixed effect, general and generalized random effect models. These are listed in the table below:
 +
<style type="text/css">
 +
.tg  {border-collapse:collapse;border-spacing:0;}
 +
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg .tg-n9g5{background-color:#ffffff;color:#000000;border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-3bed{background-color:#555555;color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
 +
</style>
 +
<table class="tg">
 +
  <tr>
 +
    <th class="tg-0pky">Model number</th>
 +
    <th class="tg-0pky">Fixed effect models</th>
 +
    <th class="tg-0pky">General random effect models</th>
 +
    <th class="tg-0pky">Generalized random effect models</th>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">1</td>
 +
    <td class="tg-n9g5">General gaussian linear fixed effect</td>
 +
    <td class="tg-0pky">Compound symmetry random effect model</td>
 +
    <td class="tg-0pky">Gamma model with gaussian second stage model</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">2</td>
 +
    <td class="tg-n9g5">Generalized gamma fixed effect model</td>
 +
    <td class="tg-0pky">-</td>
 +
    <td class="tg-0pky">-</td>
 +
  </tr>
 +
</table>
 +
 +
<p style="text-align:justify">
 +
 +
The best model would end up being chosen based on the BIC and AIC likelihood criterion where the lowest numbers would decide which model is the best. This is then further compared to the goodness of fit. Note that all the less successful models along with their diagnostics can be seen in the modelling appendix (link).
 +
<br><br>
 +
First the general model reduction and diagnostic workflow is presented.
 +
The method of forward selection was utilized. This is started by creating a simple main effect model and gradually building up with higher order terms. The reduction of the model is done one step at a time utilizing the partial partition ANOVA (type II ANOVA test) for the main effects, and afterwards utilizing a likelihood ratio test along with AIC selection for the higher order terms. These was chosen since neither of the methods relies on the design being completely balanced (1). The methods are shown in detail in the likelihood appendix (link)
 +
<br><br>
 +
 +
<h3>Fixed gaussian effect model formulation, reduction and diagnostic</h3>
 +
 +
<p style="text-align:justify">
 +
The fixed gaussian model can be formulated as previously stated in the DOE section, in the following general matrix nation:
 +
\begin{equation}
 +
\textbf{Y}=X\boldsymbol{\beta}+\boldsymbol{\varepsilon}, \boldsymbol{\varepsilon} \sim N(0,\sigma^{2}\textbf{I})
 +
\end{equation}
 +
 +
For the response variable $\textbf{Y}$, it is assumed that the $\textbf{Y} \sim N(\mu,\sigma^{2})$.
 +
<br>
 +
The design matrix of the experiments is denoted by $\textbf{X}$ and $\beta$ is a vector of regression coefficients spanning between $\beta_{0}...\beta_{n}$.
 +
The $\varepsilon$ is the random residual component which is assumed to be i.i.d and follow a normal distribution with an equally weighted $\sigma^{2}$ around a mean of 0.
 +
<br><br>
 +
The full main effect models of the replicated and unreplicated fixed gaussian linear models can be seen below:
 +
<br><br>
 +
  
 
</p>
 
</p>
 +
 +
 +
 +
<h3>Pilot model</h3>
 +
 +
 +
<p style="text-align:justify">
 +
 +
\begin{equation}
 +
Y_{i}=\mu+\alpha(burn.time_{i})+\beta(burn.temp_{i})+\gamma(mix_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+eta(W_{i})+ \varepsilon_{i} \ ,where \ \varepsilon_{i}  \sim N(0,\sigma^{2})
 +
\end{equation}
 +
The model is reduced via the type 2 ANOVA as shown below:
 +
<style type="text/css">
 +
.tg  {border-collapse:collapse;border-spacing:0;}
 +
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg .tg-uys7{border-color:inherit;text-align:center}
 +
.tg .tg-xldj{border-color:inherit;text-align:left}
 +
</style>
 +
<table class="tg">
 +
  <tr>
 +
    <th class="tg-uys7" colspan="4">Type 2 ANOVA table</th>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-xldj">Factor</td>
 +
    <td class="tg-xldj">$\chi^{2}$</td>
 +
    <td class="tg-xldj">Degrees of freedom</td>
 +
    <td class="tg-xldj">P(&gt;$\chi^{2}$)</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-xldj">$\zeta(sub_{i})$</td>
 +
    <td class="tg-xldj">9.48</td>
 +
    <td class="tg-xldj">5</td>
 +
    <td class="tg-xldj">0.06</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-xldj">$\delta(incu.time_{i})$</td>
 +
    <td class="tg-xldj">7.96</td>
 +
    <td class="tg-xldj">1</td>
 +
    <td class="tg-xldj">&lt;0.01</td>
 +
  </tr>
 +
</table>
 +
 +
 +
<p style="text-align:justify">
 +
The pilot model is now fully reduced and has no need of higher order terms since this only had the purpose of screening for significant levels of main effects (2). The reason as to why the substrate is kept in the model is due to the fact that this was the main factor of interest (see more in post hoc section). The replicated model of this type can be seen in the appendix (link til model appendix)
 +
<br><br>
 +
  
 
</div>
 
</div>
  
  
 +
<h3>Gamma model with gaussian second stage model
 +
</h3>
  
  
 +
<p style="text-align:justify">
 +
 +
The next model is more complicated than the previous models (lnk til model appendix) as no explicit likelihood solution exists for it (2). In general the models of this type can be formulated as:
 +
\begin{equation}
 +
\boldsymbol{y} = X \boldsymbol{\beta} + Z \boldsymbol{u} + \boldsymbol{\epsilon}
 +
\end{equation}
 +
where X is the design matrix of the fixed effects, and the Z is the design matrix of the random effects. Where \boldsymbol{y} follows a Gamma distribution and the \boldsymbol{u} follows a gaussian distribution.
 +
<br>
 +
 +
The coefficients are stored in the $\boldsymbol{\beta}$ for the fixed effect and the \boldsymbol{u} contains the random effect coefficient. The explicit likelihood formulation and Laplace approximation solution to this type of the model is seen in the (link til likelihood appendix).
 +
<br><br>
 +
Using the same link function (log link) as presented in the gamma model (link til model appendix), the main effect model of this type can be written as:
 +
\begin{equation}
 +
\eta_{i}=log(\mu_[i})
 +
where \mu_{i}=\betax_{i}+\Psi_{i}
 +
\end{equation}
 +
This translates into:
 +
\begin{equation}
 +
log(Y_{i})=\mu+\alpha(burn.time_{i})+\beta(burn.temp_{i})+\gamma(mix_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+eta(W_{i})+ \A(id_{i})+ \varepsilon_{i} \\
 +
where \ \varepsilon_{i}  \sim G(\alpha,\beta) \ and \ id_{i} \sim N(0,\Psi^{2})
 +
\end{equation}
 +
The response variable of the model is:
 +
<ul><li>
 +
$Y_{i}$ refers to the load measured in pascal for the i’th fungal cube
 +
</li></ul>
 +
<p>
 +
The fixed factor effects of the model are:
 +
 +
<ul>
 +
<li>$\mu$ is the overall mean</li>
 +
<li>$\alpha$ is the term describing the different burning times at the level of i=1..2
 +
</li>
 +
<li>$\beta$ is the term describing the different burning temperatures at the level of i=1..2</li>
 +
<li>$\gamma$ is the term describing the different mixing ratios at the level of i=1..3</li>
 +
 +
<li>$\delta$ is the term describing the different incubation times at the level of i=1..2</li>
 +
<li>$\zeta$ is the term describing the different substrates at the level of i=1..2</li>
 +
</ul>
 +
<p style="text-align:justify">
 +
Each term is to be understood as deviations from the overall mean <br><br>
 +
 +
The numerical effect of the model is<br>
 +
<ul><li>
 +
$\eta$ which is the weight of each cube $W_{i}$ corresponding to the $Y_{i}$ measurement</li></ul>
 +
 +
The random effects of the model are:
 +
<ul><li>$\A(id_{i})$ which is the individual cubes tested on each tray. </li></ul>
 +
 +
From this the following reduced model is found via LRT testing, resulting in the following:
 +
 +
<style type="text/css">
 +
.tg  {border-collapse:collapse;border-spacing:0;}
 +
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg .tg-3bed{background-color:#555555;color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
 +
.tg .tg-uogg{background-color:#555555;color:#ffffff;text-align:left;vertical-align:top}
 +
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-0lax{text-align:left;vertical-align:top}
 +
</style>
 +
<table class="tg">
 +
  <tr>
 +
    <th class="tg-c3ow" colspan="5">LRT testing of fixed effects</th>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">Factor</td>
 +
    <td class="tg-0pky">Degrees of freedom</td>
 +
    <td class="tg-0lax">AIC</td>
 +
    <td class="tg-0pky">LRT</td>
 +
    <td class="tg-0lax">p(&gt;$\chi^{2}$)</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-uogg">$\delta(incu.time_{i})$</td>
 +
    <td class="tg-0lax">1</td>
 +
    <td class="tg-0lax">459.24</td>
 +
    <td class="tg-0lax">15.93</td>
 +
    <td class="tg-0lax">&lt;0.01</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">$\beta(burn.temp_{i})$</td>
 +
    <td class="tg-0pky">1</td>
 +
    <td class="tg-0lax">447.79</td>
 +
    <td class="tg-0pky">4.49</td>
 +
    <td class="tg-0lax">0.03</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-uogg">$\zeta(sub_{i})$</td>
 +
    <td class="tg-0lax">1</td>
 +
    <td class="tg-0lax">449.82</td>
 +
    <td class="tg-0lax">6.52</td>
 +
    <td class="tg-0lax">0.01</td>
 +
  </tr>
 +
</table>
 +
 +
<style type="text/css">
 +
.tg  {border-collapse:collapse;border-spacing:0;}
 +
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg .tg-3bed{background-color:#555555;color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
 +
.tg .tg-uogg{background-color:#555555;color:#ffffff;text-align:left;vertical-align:top}
 +
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-0lax{text-align:left;vertical-align:top}
 +
</style>
 +
<table class="tg">
 +
  <tr>
 +
    <th class="tg-c3ow" colspan="5">LRT testing of random effects</th>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">Factor</td>
 +
    <td class="tg-0pky">Degrees of freedom</td>
 +
    <td class="tg-0lax">AIC</td>
 +
    <td class="tg-0pky">LRT</td>
 +
    <td class="tg-0lax">p(&gt;$\chi^{2}$)</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-uogg">Random effect removed</td>
 +
    <td class="tg-0lax">5</td>
 +
    <td class="tg-0lax">459.85</td>
 +
    <td class="tg-0lax">-</td>
 +
    <td class="tg-0lax">-</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">$\A(id_{i})$</td>
 +
    <td class="tg-0pky">6</td>
 +
    <td class="tg-0lax">456.01</td>
 +
    <td class="tg-0pky">7.62</td>
 +
    <td class="tg-0lax">&lt;0.01</td>
 +
  </tr>
 +
</table>
 +
 +
 +
<p style="text-align:justify">
 +
 +
(table above text): It seems the random cube to cube variation does contribute to the model as the LRT is significant and the AIC goes up if the term is removed. Thus the following model is identified:
 +
\begin{equation}
 +
Y_{i}=\mu+\beta(burn.temp_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+ \A(id_{i})+ \varepsilon_{i} \\
 +
where \ \varepsilon_{i}  \sim G(\alpha,\beta) \ and \ id_{i} \sim N(0,\Psi^{2})
 +
\end{equation}
 +
<p style="text-align:justify">
 +
Inserting the 2 factor interactions resulted in a final model where no interaction terms were significant both for random and fixed terms. So, the final model of the gamma mixed model is the one presented above.
 +
<br><br>The residual analysis of the model is carried out the same way as the gaussian mixed model.<br><br>
 +
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/d/da/T--DTU-Denmark--modellingexpThies16.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 +
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/d/d0/T--DTU-Denmark--modellingexpThies17.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 +
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/9/9f/T--DTU-Denmark--modellingexpThies18.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 +
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/2/21/T--DTU-Denmark--modellingexpThies19.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 +
<p style="text-align:center"><img src="https://static.igem.org/mediawiki/2018/3/3c/T--DTU-Denmark--ranefStat.png" style="width: 90%;"  ><figcaption><p style="text-align:center; font-size:14px;"><b>Fig. 1: </b> - Thies</p></figcaption></p>
 +
 +
<p style="text-align:justify">
 +
 +
From the model it does not seem as if no autocorrelation is present in residuals or random effects, thus the assumption of independence is fulfilled. The normality assumption seems to hold for the random effect, but could be better for the residuals. The residual vs fitted shows that the model reasonably models out most variation in the data, but the variance is not constant as it is seen for the higher predicted values and the higher variance.<br><br>
 +
</p>
 +
 +
<h3>Model comparison/selection</h3>
 +
 +
 +
<p style="text-align:justify">
 +
As mentioned in the beginning, to choose the best models several parameters comes into play. The direct comparison via the AIC, BIC and goodness of fit the table is shown here:
 +
<style type="text/css">
 +
.tg  {border-collapse:collapse;border-spacing:0;}
 +
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
 +
.tg .tg-3bed{background-color:#555555;color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
 +
.tg .tg-uogg{background-color:#555555;color:#ffffff;text-align:left;vertical-align:top}
 +
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
 +
.tg .tg-0lax{text-align:left;vertical-align:top}
 +
</style>
 +
<table class="tg">
 +
  <tr>
 +
    <th class="tg-c3ow">Model</th>
 +
    <th class="tg-0pky">AIC</th>
 +
    <th class="tg-0lax">BIC</th>
 +
    <th class="tg-0pky">Goodness of fit</th>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">gaussian fixed effect</td>
 +
    <td class="tg-0pky">487.2746</td>
 +
    <td class="tg-0lax">496.1956</td>
 +
    <td class="tg-0pky">0 %</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-uogg">gamma fixed effect</td>
 +
    <td class="tg-0lax">446.2626</td>
 +
    <td class="tg-0lax">456.9677</td>
 +
    <td class="tg-0lax">100 %</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-3bed">gaussian mixed effect</td>
 +
    <td class="tg-0pky">479.5087</td>
 +
    <td class="tg-0lax">490.2138</td>
 +
    <td class="tg-0pky">0 %</td>
 +
  </tr>
 +
  <tr>
 +
    <td class="tg-uogg">gamma mixed effect</td>
 +
    <td class="tg-0lax">441.3038</td>
 +
    <td class="tg-0lax">452.009</td>
 +
    <td class="tg-0lax">100 %</td>
 +
  </tr>
 +
</table>
 +
<p style="text-align:justify">
 +
 +
From the tabel is it seen that the gamma mixed effect model is the better choice due to goodness of fit of the data and the lower BIC and AIC. With the model choice in place, it is now time to do the post hoc analysis of the gamma mixed model.
 +
</p>
  
  

Revision as of 22:08, 17 October 2018

Modelling the Design

In order to create a successful brick, we needed to try out different combinations of growth medium and preparation of the bricks to gain the best compressive strength. An important factor when designing experiments is to decide what to test and which combinations are needed. Figuring this out without any help is pretty much impossible, which is why we approach it by modelling the experimental design. By doing this we can reduce the number of experiments significantly.

Two experimental designs were successfully created and carried out utilizing the optimal design approach for the compressive strength experiments. The designs were a replicated design with 17 trials, with four replicates, and an unreplicated full factorial design with 64 trials.

The data obtained in these experiments, lead to the creation of multiple statistical models. The aim of the models are to link the factors of burning time, burning temperature, substrate, incubation time, mixing ratio and density to compressive strength. The models enable us to predict the optimal processing of the fungal bricks, ensuring that the best possible compressive strength can be achieved.

To identify the best models, the following steps are carried out:

  1. Explorative and a descriptive analysis of the data. This was done at first to identify if any immediate trends can be discovered, which have to be taken into account in the modelling process.
  2. The actual modelling process, where different types of models are fitted and considered based on the explorative analysis.
  3. Modelling selection process, where the models are compared to determine which model describes the data best, with regards to parameter choice, likelihood and so on.
  4. Post hoc analysis to quantify uncertainties of the parameters, prediction uncertainty and contrast comparison.

With these four steps the optimal statistical model can be identified by linking compression strength to the previously mentioned parameters.

Explorative and descriptive analysis of the data

Summary stats

The unreplicated pilot experiment consisted of the load response variable and X factors. These can be seen from the table below:

Level Substrate Burning time Burning temperature Incubation time
1 s1: (jasmin rice) -1 (12 hours) -1 ($75^{\circ}$ ) -1 (1 week)
2 s2: (jasmin rice + saw dust) 1 (16 hours) 1,($100 ^{\circ}$ ) 1 (1.5 week)
3 s3: (coffee) - - -
5 s5 (blended rice + saw dust) - - -
6 s6 (coffe + saw dust) - - -
7 s7 blended rice - - -

Table 1: - Something

The replicated experiment consisted of 5 factors, 1 random effect and 1 response variable. These are listed in the table below.

level substrate burning time burning temperature incubation time mixing ratio
1 s1 (jasmin rice) -1 (12 hours) -1 (75) -1 (1 week) mix1 1:0 (rice:saw dust)
2 s2 (blended rice) 1 (16 hours) 1 (100) 1 (1.5 week) mix2 50:50(rice:saw dust)
3
coffee
- - - mix3 75:25(rice:saw dust)

Table 1: - Something.

Correlation matrix

To check the actual orthogonality of our data, provided by our design, a factor correlation matrix was calculated.

s1 s2 burn time burn temp. incubation time
s1 1.00 -1.00 0.22 0.22 0.07
s2 -1.00 1.00 0.22 0.22 -0.07
burn time -0.22 0.22 1.00 0.27 -0.03
burn temp. -0.22 0.22 0.27 1.00 -0.03
incubation time 0.07 -0.07 -0.03 0.03 1.00

Table 1: - From the correlation matrix it is seen that outside each factor level, very little confounding is present. Thus the actual design fulfills the purpose of little confounding between each factor.

Boxplots

To get an idea of which different levels of the fixed effects might have an influence on the material compressive strength, a set of box plots for the main effects was created to identify if any immediate visual difference would occur. This is shown in fig below (lav link til fig under) These are shown below.

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

From the boxplots it is seen that there is seemingly little difference between the burning times nor is there any difference between any of the mixing ratios. The factors which may differ are the burning temperatures, incubation times and substrates. To see if there actually is a significant difference see the post hoc analysis (link til post hoc).

Modelling formulation, fitting of model and reduction of model

During this section multiple models are considered. The models included fixed effect, general and generalized random effect models. These are listed in the table below:

Model number Fixed effect models General random effect models Generalized random effect models
1 General gaussian linear fixed effect Compound symmetry random effect model Gamma model with gaussian second stage model
2 Generalized gamma fixed effect model - -

The best model would end up being chosen based on the BIC and AIC likelihood criterion where the lowest numbers would decide which model is the best. This is then further compared to the goodness of fit. Note that all the less successful models along with their diagnostics can be seen in the modelling appendix (link).

First the general model reduction and diagnostic workflow is presented. The method of forward selection was utilized. This is started by creating a simple main effect model and gradually building up with higher order terms. The reduction of the model is done one step at a time utilizing the partial partition ANOVA (type II ANOVA test) for the main effects, and afterwards utilizing a likelihood ratio test along with AIC selection for the higher order terms. These was chosen since neither of the methods relies on the design being completely balanced (1). The methods are shown in detail in the likelihood appendix (link)

Fixed gaussian effect model formulation, reduction and diagnostic

The fixed gaussian model can be formulated as previously stated in the DOE section, in the following general matrix nation: \begin{equation} \textbf{Y}=X\boldsymbol{\beta}+\boldsymbol{\varepsilon}, \boldsymbol{\varepsilon} \sim N(0,\sigma^{2}\textbf{I}) \end{equation} For the response variable $\textbf{Y}$, it is assumed that the $\textbf{Y} \sim N(\mu,\sigma^{2})$.
The design matrix of the experiments is denoted by $\textbf{X}$ and $\beta$ is a vector of regression coefficients spanning between $\beta_{0}...\beta_{n}$. The $\varepsilon$ is the random residual component which is assumed to be i.i.d and follow a normal distribution with an equally weighted $\sigma^{2}$ around a mean of 0.

The full main effect models of the replicated and unreplicated fixed gaussian linear models can be seen below:

Pilot model

\begin{equation} Y_{i}=\mu+\alpha(burn.time_{i})+\beta(burn.temp_{i})+\gamma(mix_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+eta(W_{i})+ \varepsilon_{i} \ ,where \ \varepsilon_{i} \sim N(0,\sigma^{2}) \end{equation} The model is reduced via the type 2 ANOVA as shown below:

Type 2 ANOVA table
Factor $\chi^{2}$ Degrees of freedom P(>$\chi^{2}$)
$\zeta(sub_{i})$ 9.48 5 0.06
$\delta(incu.time_{i})$ 7.96 1 <0.01

The pilot model is now fully reduced and has no need of higher order terms since this only had the purpose of screening for significant levels of main effects (2). The reason as to why the substrate is kept in the model is due to the fact that this was the main factor of interest (see more in post hoc section). The replicated model of this type can be seen in the appendix (link til model appendix)

Gamma model with gaussian second stage model

The next model is more complicated than the previous models (lnk til model appendix) as no explicit likelihood solution exists for it (2). In general the models of this type can be formulated as: \begin{equation} \boldsymbol{y} = X \boldsymbol{\beta} + Z \boldsymbol{u} + \boldsymbol{\epsilon} \end{equation} where X is the design matrix of the fixed effects, and the Z is the design matrix of the random effects. Where \boldsymbol{y} follows a Gamma distribution and the \boldsymbol{u} follows a gaussian distribution.
The coefficients are stored in the $\boldsymbol{\beta}$ for the fixed effect and the \boldsymbol{u} contains the random effect coefficient. The explicit likelihood formulation and Laplace approximation solution to this type of the model is seen in the (link til likelihood appendix).

Using the same link function (log link) as presented in the gamma model (link til model appendix), the main effect model of this type can be written as: \begin{equation} \eta_{i}=log(\mu_[i}) where \mu_{i}=\betax_{i}+\Psi_{i} \end{equation} This translates into: \begin{equation} log(Y_{i})=\mu+\alpha(burn.time_{i})+\beta(burn.temp_{i})+\gamma(mix_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+eta(W_{i})+ \A(id_{i})+ \varepsilon_{i} \\ where \ \varepsilon_{i} \sim G(\alpha,\beta) \ and \ id_{i} \sim N(0,\Psi^{2}) \end{equation} The response variable of the model is:

  • $Y_{i}$ refers to the load measured in pascal for the i’th fungal cube

The fixed factor effects of the model are:

  • $\mu$ is the overall mean
  • $\alpha$ is the term describing the different burning times at the level of i=1..2
  • $\beta$ is the term describing the different burning temperatures at the level of i=1..2
  • $\gamma$ is the term describing the different mixing ratios at the level of i=1..3
  • $\delta$ is the term describing the different incubation times at the level of i=1..2
  • $\zeta$ is the term describing the different substrates at the level of i=1..2

Each term is to be understood as deviations from the overall mean

The numerical effect of the model is

  • $\eta$ which is the weight of each cube $W_{i}$ corresponding to the $Y_{i}$ measurement
The random effects of the model are:
  • $\A(id_{i})$ which is the individual cubes tested on each tray.
From this the following reduced model is found via LRT testing, resulting in the following:
LRT testing of fixed effects
Factor Degrees of freedom AIC LRT p(>$\chi^{2}$)
$\delta(incu.time_{i})$ 1 459.24 15.93 <0.01
$\beta(burn.temp_{i})$ 1 447.79 4.49 0.03
$\zeta(sub_{i})$ 1 449.82 6.52 0.01
LRT testing of random effects
Factor Degrees of freedom AIC LRT p(>$\chi^{2}$)
Random effect removed 5 459.85 - -
$\A(id_{i})$ 6 456.01 7.62 <0.01

(table above text): It seems the random cube to cube variation does contribute to the model as the LRT is significant and the AIC goes up if the term is removed. Thus the following model is identified: \begin{equation} Y_{i}=\mu+\beta(burn.temp_{i})+\delta(incu.time_{i})+\zeta(sub_{i})+ \A(id_{i})+ \varepsilon_{i} \\ where \ \varepsilon_{i} \sim G(\alpha,\beta) \ and \ id_{i} \sim N(0,\Psi^{2}) \end{equation}

Inserting the 2 factor interactions resulted in a final model where no interaction terms were significant both for random and fixed terms. So, the final model of the gamma mixed model is the one presented above.

The residual analysis of the model is carried out the same way as the gaussian mixed model.

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

Fig. 1: - Thies

From the model it does not seem as if no autocorrelation is present in residuals or random effects, thus the assumption of independence is fulfilled. The normality assumption seems to hold for the random effect, but could be better for the residuals. The residual vs fitted shows that the model reasonably models out most variation in the data, but the variance is not constant as it is seen for the higher predicted values and the higher variance.

Model comparison/selection

As mentioned in the beginning, to choose the best models several parameters comes into play. The direct comparison via the AIC, BIC and goodness of fit the table is shown here:

Model AIC BIC Goodness of fit
gaussian fixed effect 487.2746 496.1956 0 %
gamma fixed effect 446.2626 456.9677 100 %
gaussian mixed effect 479.5087 490.2138 0 %
gamma mixed effect 441.3038 452.009 100 %

From the tabel is it seen that the gamma mixed effect model is the better choice due to goodness of fit of the data and the lower BIC and AIC. With the model choice in place, it is now time to do the post hoc analysis of the gamma mixed model.