Team:NPU-China/Model

This year, we designed the MitoCRAFT genome based on bioinformatics analysis and model guidance.
Mitochondria are the main sites of cellular aerobic respiration. The Saccharomyces cerevisiae mitochondrial genome is 85,779 bp in length and contains 35 genes: those encoding cytochrome c oxidase subunits (cox1, cox2 and cox3), cytochrome b (cob), ATP synthase subunits (atp6, atp8 and atp9), ribosomal RNAs (rnl and rns), a ribosomal protein (var1) and the RNA subunit of RNase P (rpm1), plus 24 tRNAs. Of these, cox1, cob and rnl contain introns.
Figure 1. Mitochondrial gene annotation map
Note: This part of the bioinformatics analysis was mainly assisted by Dr. Cheng Jian (TIB-CAS).

1. Analysis of the relationship between mitochondrial genome size and intergenic region size in S. cerevisiae
While constructing the phylogenetic tree and summarizing mitochondrial genome sizes for S. cerevisiae and its relatives, we found that mitochondrial genome size varies greatly among these yeasts. The phylogenetic tree shows that the mitochondrial genome of C. castellii is only about 20 kbp long, whereas that of S. cerevisiae is about 86 kbp.
Figure 2. The phylogenetic tree of the Saccharomyces sensu stricto group and its relatives. The tree was constructed from the concatenation of eight protein-coding genes present in all mtDNAs. The histogram in the right panel shows the size of each MT genome.
Freel et al. pointed out that changes in the size of the mitochondrial genome are mainly determined by differences in intergenic regions. [1] We therefore first analyzed the relationship between mitochondrial genome size and intergenic region size in 15 yeast species. The results are as follows:
Figure 3. (A) Scatter plot and linear regression of the relationship between the size of the MT genome and the size of the intergenic regions. (B) The expansion of the intergenic regions of mtDNA. The intergenic sequences were divided into four types: ORF sequences, ori sequences, AT spacers and GC clusters. The Y axis represents the size of the four types of intergenic regions; for detailed data, see the S3 Table. (C) The (AT+TA)/(AA+TT) ratio in the AT spacers of the fifteen yeasts. (D) Scatter plot and linear regression of the relationship between MT genome size and the relative number of GC clusters. The points inside the red ellipse represent the five SSS yeasts.

The figure above shows that the size of the intergenic regions is significantly and linearly correlated with the size of the entire mitochondrial genome. These intergenic regions consist primarily of AT spacers and GC clusters with no known specific function, which provides a basis for simplifying the wild-type S. cerevisiae mitochondrial genome.
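The kind of fit shown in Figure 3A can be sketched as an ordinary least-squares regression of intergenic size on genome size. The function name `fit_line` and the numbers below are illustrative placeholders only; they are not the measured values of the 15 yeasts.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Placeholder sizes in kbp, NOT data from this project.
genome_kbp = [20.0, 30.0, 50.0, 70.0, 86.0]
intergenic_kbp = [5.0, 14.0, 33.0, 52.0, 68.0]

slope, intercept, r = fit_line(genome_kbp, intergenic_kbp)
print(f"slope={slope:.3f} intercept={intercept:.2f} r={r:.4f}")
```

A slope close to 1 in such a fit would mean that nearly every additional base pair of genome size is accounted for by intergenic sequence.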
Before the simplification of the wild-type S. cerevisiae mitochondrial genome, we analyzed the sequence functionality and conservation of each part of the genome to determine whether each sequence is important.

2. Deletion of relatively non-conserved intergenic region sequences by Syntenic Orthologous Blocks
We initially considered deleting the intergenic regions of the S. cerevisiae mitochondrial genome outright. However, some intergenic sequences directly affect regulation in S. cerevisiae, so we first needed to identify which intergenic regions have remained stable during evolution. In the course of evolution, genes in a genome undergo rearrangement and inversion. The relative positions of some genes remain stable, and these genes form syntenic orthologous blocks, while the positions of the rest change dramatically. In theory, the intergenic sequences between these blocks accumulate more mutations than the intergenic regions inside them. In other words, they are less conserved and can be deleted without greatly affecting mitochondrial function.
We analyzed the evolution of gene order in the Saccharomyces sensu stricto group and C. glabrata, then divided the genome into blocks according to that order.
Gene order conservation (GOC) was defined as the number of contiguous orthologous pairs in two genomes (N_orthologues, contiguous) divided by the total number of orthologues (N_orthologues). [2] All the GOC values between the 15 yeasts in the phylogenetic tree (Fig. 1-2) were estimated as:

$$\mathrm{GOC} = \frac{N_{\text{orthologues, contiguous}}}{N_{\text{orthologues}}}$$
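The GOC definition can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: each genome is given as a list of gene names in map order, the genome is treated as circular, strand is ignored, and genes absent from either genome are dropped before adjacency is assessed. The function names and the two toy gene orders are hypothetical.

```python
def adjacencies(order, shared, circular=True):
    """Unordered neighbouring gene pairs, restricted to the shared genes."""
    genes = [g for g in order if g in shared]
    n = len(genes)
    if n < 2:
        return set()
    last = n if circular else n - 1
    return {frozenset((genes[i], genes[(i + 1) % n])) for i in range(last)}


def goc(order_a, order_b, circular=True):
    """GOC = contiguous orthologous pairs / total number of orthologues."""
    shared = set(order_a) & set(order_b)
    if not shared:
        raise ValueError("the two genomes share no orthologues")
    common = adjacencies(order_a, shared, circular) & adjacencies(order_b, shared, circular)
    return len(common) / len(shared)


# Toy gene orders for illustration only, not real mtDNA maps.
genome_a = ["cox1", "atp8", "atp6", "cob", "cox2", "cox3"]
genome_b = ["cox1", "atp8", "atp6", "cox3", "cob", "cox2"]

goc_ab = goc(genome_a, genome_b)
gol_ab = 1 - goc_ab
print(goc_ab, gol_ab)
```

In this toy case three of the six neighbouring pairs (cox1-atp8, atp8-atp6 and cob-cox2) are preserved, so GOC is 0.5.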

The gene order loss (GOL) was defined as 1 - GOC. Branch-specific GOL (bsGOL) was obtained by minimizing, over the 105 pairwise comparisons, the sum of the squared differences between the observed GOL and the sum of the predicted bsGOL values. The function to be minimized is:

$$L = \sum_{i=1}^{105} \Bigl( \mathrm{GOL}_i - \sum_{j} b_{i,j}\, x_j \Bigr)^2$$

where b_{i,j} is a Boolean variable indicating whether branch j contributes to pairwise comparison i, GOL_i is obtained from the pairwise comparisons (GOL_i = 1 - GOC_i), and x_j is the bsGOL value of branch j that minimizes L. [2]
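The minimization of L can be sketched as a linear least-squares problem. The example below assumes a toy star tree with three leaves, so that each pairwise GOL is the sum of two branch values; it solves the normal equations directly and does not enforce non-negative bsGOL values. The function names and the input numbers are hypothetical, and the method actually used for the 15-yeast tree may differ.

```python
def objective(b, gol, x):
    """L = sum_i (GOL_i - sum_j b[i][j] * x[j]) ** 2."""
    return sum(
        (g - sum(bij * xj for bij, xj in zip(row, x))) ** 2
        for row, g in zip(b, gol)
    )


def solve_linear(a, y):
    """Solve a x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [val] for row, val in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some branches are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_bsgol(b, gol):
    """Least-squares branch values from the normal equations (b^T b) x = b^T GOL."""
    k = len(b[0])
    btb = [[sum(row[p] * row[q] for row in b) for q in range(k)] for p in range(k)]
    btg = [sum(row[p] * g for row, g in zip(b, gol)) for p in range(k)]
    return solve_linear(btb, btg)


# Rows: comparisons (A,B), (A,C), (B,C); columns: branches leading to A, B, C.
b = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
gol = [0.3, 0.5, 0.6]  # made-up pairwise GOL values

x = fit_bsgol(b, gol)
print([round(v, 6) for v in x], objective(b, gol, x))
```

For the real tree, b would have 105 rows (one per species pair) and one column per branch, with b[i][j] set to 1 when branch j lies on the path between the two species of comparison i.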
Table 1. Branch-Specific Rates of GOL in fifteen yeasts
Note: The ‘Branch’ column lists all evolutionary branches in the phylogenetic tree of the fifteen yeasts. The ‘Leaf_Node’ column marks the branches that end in leaf nodes (i.e., the fifteen yeasts). The ‘Branch Length’ column gives the length of each branch in the phylogenetic tree. ‘MT bsGOL’ is the branch-specific GOL of the mitochondrial genome, based on the pairwise comparisons of GOL. ‘MT GOL Rate’ is the rearrangement rate of gene order in the mitochondrial genome, calculated as the ratio of bsGOL to branch length. ‘Nuclear GOL Rate’ gives the GOL rates in the nuclear genomes of S. cerevisiae and S. paradoxus, taken from Fischer et al. 2006. [3]
Genes whose relative order remained unchanged during evolution were grouped into a block. In total, the entire genome was divided into 7 blocks, each of which starts with a promoter and ends with a terminator, an ori or an ORF. The detailed results are as follows:
Figure 4. Evolution of gene order within the Saccharomyces sensu stricto group. Block1 includes rnl, tRNAs (T2,C,H,L,Q,K,R1,G,D,S1,R2,A,I,Y,N,M1) and cox2. Block2 includes tRNAs (F,T1,V), and cox3. Block3 includes tRNA (M2), rpm1 and tRNA (P). Block4 includes cox1, atp8 and atp6. Block5 includes tRNA (E) and cob. Block6 includes rns and tRNA (W). Block7 includes atp9, tRNA (S2) and var1. The downward and upward black arrows indicate the ori sequences in the positive and negative strands, respectively. The dashed arrows indicate that the ori sequences contain intervening GC clusters.

Based on the division into syntenic orthologous blocks, we retained 100 bp upstream and downstream of each block as a sequence buffer and then deleted all the intergenic regions between the blocks except the necessary ori sequences. In total, 29,301 bp were deleted.
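The deletion rule can be sketched as interval arithmetic. The sketch assumes 0-based, half-open coordinates on a linearized map, ignores the wrap-around gap of the circular genome, and uses hypothetical block and ori coordinates; `deletable_intervals` is an illustrative name, not a function from this project.

```python
def deletable_intervals(blocks, keep, buffer_bp=100):
    """Intervals between consecutive blocks that can be removed.

    blocks: (start, end) of each syntenic block.
    keep:   (start, end) of sequences that must be retained, e.g. ori.
    A buffer of buffer_bp is preserved on both sides of every block.
    """
    blocks = sorted(blocks)
    removable = []
    for (_, end_a), (start_b, _) in zip(blocks, blocks[1:]):
        lo, hi = end_a + buffer_bp, start_b - buffer_bp
        if lo >= hi:
            continue  # gap too short to delete anything once buffers are kept
        cursor = lo
        for k_start, k_end in sorted(keep):
            if k_end <= cursor or k_start >= hi:
                continue  # retained element lies outside what is left of this gap
            if k_start > cursor:
                removable.append((cursor, k_start))
            cursor = max(cursor, k_end)
        if cursor < hi:
            removable.append((cursor, hi))
    return removable


# Hypothetical coordinates for illustration only.
blocks = [(0, 1000), (3000, 4000), (4100, 5000)]
oris = [(1500, 1800)]

removed = deletable_intervals(blocks, oris)
total_removed = sum(end - start for start, end in removed)
print(removed, total_removed)
```

Here the first gap is split around the retained ori, and the second gap is left untouched because it is shorter than the two 100 bp buffers.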
3. Conservation analysis of functional genes and introns
A conserved sequence is one that has remained identical or similar across different species during evolution. In general, the more conserved a sequence is, the more likely it is to be important for the organism's vital functions.
To assess the conservation of the functional genes in the mitochondrial genome of S. cerevisiae, we measured sequence identity using multiple sequence alignments built with ClustalW from the genomic sequences and annotation files of the five yeasts of the Saccharomyces sensu stricto (SSS) group. We found that most functional genes in the mitochondrial genomes of the five yeasts are highly similar, which indicates that these genes have been strongly conserved during evolution and are functionally critical.

Figure 5. The nucleotide identities of all mitochondrial genes in the Saccharomyces sensu stricto group. The nucleotide identity was calculated based on the proportion of completely conserved nucleotides in multiple sequence alignments of five SSS yeasts conducted using ClustalOmega. The red bars represent protein-coding genes; the green bar is rRNA; the gray bar is tRNA and the yellow bar is rpm1.
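The identity measure described in the caption, the proportion of completely conserved nucleotides in a multiple sequence alignment, can be sketched as follows. The sketch assumes that the aligned sequences are already available as equal-length strings with "-" for gaps, that gapped columns count as non-conserved, and that the denominator is the full alignment length. The toy alignment is made up.

```python
def nucleotide_identity(aligned):
    """Fraction of alignment columns in which every sequence has the same base."""
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_cols = lengths.pop()
    if n_cols == 0:
        raise ValueError("alignment is empty")
    conserved = sum(
        1
        for column in zip(*aligned)
        if "-" not in column and len({base.upper() for base in column}) == 1
    )
    return conserved / n_cols


# Toy alignment of three sequences, for illustration only.
alignment = [
    "ATGCA-",
    "ATGCAT",
    "ATGTAT",
]

identity = nucleotide_identity(alignment)
print(f"{identity:.3f}")
```

Four of the six columns are identical across all three toy sequences, so the identity is 4/6.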

Conservation analysis of the introns showed the opposite pattern: unlike the functional genes, the introns are poorly conserved.
Figure 6. The distribution of introns in mtDNAs. (A) cox1 gene; (B) cob gene (C) rnl gene. The X axis represents the gene length and the vertical lines indicate the position of introns. The numbers on top represent the relative location of each intron in different yeasts. The rectangular frames indicate Group II introns, which include introns 1, 2 and 10 in cox1, and intron 3 in cob. The triangular frames indicate the Group I introns. The filled frames indicate the introns with embedded ORFs, and the empty frames indicate introns without ORFs.
The diagram shows that only the 10th intron of cox1 is relatively conserved. To make MitoCRAFT easier to manipulate, we decided to delete the non-conserved introns. For the sake of functional integrity, we also deleted the one remaining conserved intron, so that a total of 17,512 bp of intron sequence was deleted.
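At the sequence level, deleting introns amounts to joining the flanking exons. A minimal sketch, assuming 0-based half-open intron coordinates that do not overlap (the helper name `remove_introns` and the toy sequence are hypothetical):

```python
def remove_introns(seq, introns):
    """Return seq with the given intron intervals excised and the exons joined."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        if start < cursor or start >= end or end > len(seq):
            raise ValueError(f"invalid or overlapping intron interval: {(start, end)}")
        exons.append(seq[cursor:start])  # exon preceding this intron
        cursor = end
    exons.append(seq[cursor:])  # exon after the last intron
    return "".join(exons)


# Toy gene: exons are AAA and GGG, introns are CCC and TTT.
gene = "AAACCCGGGTTT"
intron_coords = [(3, 6), (9, 12)]

spliced = remove_introns(gene, intron_coords)
removed_bp = len(gene) - len(spliced)
print(spliced, removed_bp)
```

Summing `removed_bp` over cox1, cob and rnl in the real annotation would give the total intron length removed.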

Figure 7. DNA Map of Mito Zero.

The simplified mitochondrial genome (Mito Zero) encodes 8 proteins, 2 rRNAs, 1 ncRNA and 24 tRNAs, and contains 3 ORFs and 3 oris, with a total length of 38,966 bp.
References
[1]. Freel KC, Friedrich A, Schacherer J: Mitochondrial genome evolution in yeasts: an all-encompassing view. FEMS yeast research 2015, 15(4): fov023.
[2]. Rocha EP: Inference and analysis of the relative stability of bacterial chromosomes. Molecular Biology & Evolution 2006, 23(3):513–522.
[3]. Fischer G, Rocha EP, Brunet F, Vergassola M, Dujon B: Highly variable rates of genome rearrangements between hemiascomycetous yeast lineages. PLoS Genetics 2006, 2(3):e32.