This year our team created a mathematical model to optimize the arrangement of the nif gene cluster. This model helped us optimize our design and provided new perspectives on our nitrogen-fixation system at the transcriptional level.
We developed this model with two goals in mind:
1. We want to achieve the best stoichiometric proportion of the nif genes, which is nifB:nifH:nifD:nifK:nifE:nifN:nifX:nifV = 1:3:4:4:1:1:1:1.
2. We want our system to be as simple as possible, which means minimizing the number of promoters and the copy number of each nif gene.
We made the following assumptions:
1. There are two kinds of promoters, both of which can drive the expression of every nitrogen fixation gene involved in our system.
2. One promoter is stronger (called H) while the other is relatively weak (called L). Under promoter H, each gene's transcription level is double that under promoter L.
3. The order of the genes has little influence on their transcription levels.
We conducted real-time quantitative PCR (qPCR) to measure the transcription levels of the nif gene cluster, and the experimental data we obtained became an important reference for our modeling.
gene | Average value of Cq | Relative expression level
16S DNA | 6.33 |
nifB | 19.97 | 7.80E-05
nifH | 17.37 | 4.74E-04
nifD | 18.34 | 2.42E-04
nifK | 20.77 | 4.48E-05
nifE | 22.20 | 1.66E-05
nifN | 22.24 | 1.62E-05
nifX | 22.92 | 1.01E-05
nifV | 21.25 | 3.22E-05
Table 1. The result of qPCR
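The relative expression levels in Table 1 appear to be consistent with the common 2^(-ΔCq) calculation against the 16S reference, although the text does not state the formula. The following is a minimal sketch under that assumption (100% amplification efficiency, 16S as the only reference); the function name `relative_expression` is hypothetical.

```python
# Cq values copied from Table 1; the 16S row is used as the reference.
CQ_REFERENCE = 6.33
cq = {
    "nifB": 19.97, "nifH": 17.37, "nifD": 18.34, "nifK": 20.77,
    "nifE": 22.20, "nifN": 22.24, "nifX": 22.92, "nifV": 21.25,
}


def relative_expression(cq_gene, cq_reference=CQ_REFERENCE):
    """Assumed 2^(-delta Cq) quantification relative to the reference gene."""
    return 2.0 ** -(cq_gene - cq_reference)


relative = {gene: relative_expression(value) for gene, value in cq.items()}

for gene, level in relative.items():
    print(f"{gene}: {level:.2e}")
```

The printed values can be compared with the third column of Table 1; small differences in the last digit are expected because the Cq values in the table are rounded averages.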
Method:
To start with, we put all genes into two groups: one group is under the strong promoter and the other is under the weak one. We introduced the parameters shown in Table 2.
Parameters/data | Meanings
weak[] | the expression level of each nif gene under the weak promoter
strong[] | the expression level of each nif gene under the strong promoter
expected[] | the ideal stoichiometric proportion
d | deviation between the expected expression level and the actual expression level
Table 2
Then we did some necessary preprocessing. First, we set the smallest element in each array to 1 and normalized all the other data accordingly. In addition, to ensure that at least one solution exists, we adjusted expected[] so that each element is greater than or equal to the smallest expression level of the corresponding gene.
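The preprocessing step might look like the sketch below. Several points are assumptions rather than details given in the text: the relative expression levels of Table 1 are used as the weak-promoter levels, strong[] is derived from weak[] using the factor of two from assumption 2, and expected[] is "adjusted" by multiplying the ideal proportion by one common factor so that the ratio itself is preserved. The helper names are hypothetical.

```python
# Relative expression levels from Table 1, in the order of GENES.
# Treating them as weak-promoter levels is an assumption of this sketch.
GENES = ["nifB", "nifH", "nifD", "nifK", "nifE", "nifN", "nifX", "nifV"]
measured = [7.80e-05, 4.74e-04, 2.42e-04, 4.48e-05,
            1.66e-05, 1.62e-05, 1.01e-05, 3.22e-05]
ideal = [1, 3, 4, 4, 1, 1, 1, 1]


def normalize_to_smallest(values):
    """Rescale so that the smallest element becomes 1."""
    smallest = min(values)
    return [v / smallest for v in values]


def scale_expected(ideal_ratio, smallest_levels):
    """Scale the ideal ratio by one common factor (an assumed reading of
    'adjusted expected[]') so that every target is at least the smallest
    level that one copy of the gene can produce."""
    factor = max(s / i for s, i in zip(smallest_levels, ideal_ratio))
    return [i * factor for i in ideal_ratio]


weak = normalize_to_smallest(measured)
strong = [2.0 * w for w in weak]      # assumption 2: H doubles the level of L
expected = scale_expected(ideal, weak)

for gene, w, s, e in zip(GENES, weak, strong, expected):
    print(f"{gene}: weak={w:.2f} strong={s:.2f} expected={e:.2f}")
```

Scaling by a single factor keeps the 1:3:4:4:1:1:1:1 ratio intact; adjusting each element independently would also satisfy the stated condition but would change the target proportion.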
After that, we began the arrangement. To minimize the total number of gene copies, we arranged the strong promoter group first and considered the weak group afterwards. For each gene, we repeatedly added one copy to the strong promoter group, calculated the deviation after each addition, and compared it with the previous one. If the deviation was decreasing, we added one more copy and repeated the operation until the previous deviation was smaller than the current one. In this way, we determined the copy number of each gene for which the deviation was smallest and completed the arrangement of the strong group. We then arranged the weak group in the same way and obtained the final result.
Fig 1. A flow diagram describing the idea of our modeling process
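The greedy procedure described above can be sketched as follows. This is not the team's original program: the deviation d is taken to be the absolute difference between the target and the summed expression level, expression is assumed to add up linearly with copy number, and the input preparation (Table 1 values as weak-promoter levels, a common scaling factor for expected[]) is likewise an assumption. All function names and the tolerance value are hypothetical.

```python
TOLERANCE = 1e-9  # hypothetical guard against floating-point ties

GENES = ["nifB", "nifH", "nifD", "nifK", "nifE", "nifN", "nifX", "nifV"]
MEASURED = [7.80e-05, 4.74e-04, 2.42e-04, 4.48e-05,
            1.66e-05, 1.62e-05, 1.01e-05, 3.22e-05]  # Table 1
IDEAL = [1, 3, 4, 4, 1, 1, 1, 1]


def preprocess(measured, ideal):
    """Build weak[], strong[] and expected[] (assumed preprocessing)."""
    smallest = min(measured)
    weak = [m / smallest for m in measured]
    strong = [2.0 * w for w in weak]  # promoter H doubles the level of L
    factor = max(w / i for w, i in zip(weak, ideal))
    expected = [i * factor for i in ideal]
    return weak, strong, expected


def copies_to_add(level_per_copy, current_total, target):
    """Add copies one at a time while the deviation keeps decreasing."""
    copies = 0
    deviation = abs(target - current_total)
    while True:
        candidate = current_total + (copies + 1) * level_per_copy
        new_deviation = abs(target - candidate)
        if new_deviation >= deviation - TOLERANCE:
            return copies  # the extra copy did not help, so discard it
        copies += 1
        deviation = new_deviation


def arrange(weak, strong, expected):
    """Fill the strong group first, then the weak group, gene by gene."""
    plan = []
    for w, s, target in zip(weak, strong, expected):
        n_strong = copies_to_add(s, 0.0, target)
        n_weak = copies_to_add(w, n_strong * s, target)
        plan.append((n_strong, n_weak))
    return plan


weak, strong, expected = preprocess(MEASURED, IDEAL)
plan = arrange(weak, strong, expected)
totals = [ns * s + nw * w for (ns, nw), s, w in zip(plan, strong, weak)]

for gene, (ns, nw), total in zip(GENES, plan, totals):
    print(f"{gene}: {ns} under H, {nw} under L, total level {total:.2f}")
```

Filling the strong group first means each strong copy covers twice as much of the target as a weak copy, so fewer copies are needed overall; the weak group then only fine-tunes the remainder.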
According to this flow diagram, we implemented the procedure in Python and obtained the following results:
Fig 2. The best arrangement of nif genes according to our calculation
With this arrangement, the proportion is nifB:nifH:nifD:nifK:nifE:nifN:nifX:nifV = 15.44:46.93:71.88:62.10:16.44:16.04:16.0:15.94, which is the closest to the ideal proportion among all the solutions.
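To make the comparison with the ideal proportion easier to read, the reported values can be rescaled so that nifB equals 1. The short sketch below does this and reports the relative deviation of each gene from 1:3:4:4:1:1:1:1; using nifB as the reference is an arbitrary choice made here for illustration.

```python
# Proportions reported for the final arrangement and the ideal ratio.
reported = {"nifB": 15.44, "nifH": 46.93, "nifD": 71.88, "nifK": 62.10,
            "nifE": 16.44, "nifN": 16.04, "nifX": 16.0, "nifV": 15.94}
ideal = {"nifB": 1, "nifH": 3, "nifD": 4, "nifK": 4,
         "nifE": 1, "nifN": 1, "nifX": 1, "nifV": 1}

# Rescale so that nifB = 1 (hypothetical choice of reference gene).
scaled = {gene: value / reported["nifB"] for gene, value in reported.items()}

# Relative deviation of each rescaled value from the ideal value.
relative_error = {gene: abs(scaled[gene] - ideal[gene]) / ideal[gene]
                  for gene in reported}

for gene in reported:
    print(f"{gene}: scaled={scaled[gene]:.2f} ideal={ideal[gene]} "
          f"relative error={relative_error[gene]:.1%}")
```

In this view nifD shows the largest departure from the target, which follows from the coarse steps in which its expression level changes with each added copy.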