In our project we introduce RNA interference (RNAi) and silencing with small interfering (si)RNAs as an alternative to CRISPR/Cas. To make use of our siRNA vector system Tace, functional siRNA for prokaryotic organisms must be determined. Thus, we developed a siRNA construction tool, which can find possible siRNAs for a given gene sequence and calculate their gene silencing probability. It consists of the three modules siRNAs for RNAi, siRNA, and check siRNA. Obtained siRNAs are perfectly compatible with our siRNA vector system. To the best of our knowledge, this is the first tool dedicated to predicting customized siRNA for the application in prokaryotes. This Python tool comes in two versions: a command line application and an easy-to-use graphical interface.
siRNAs: a short introduction
siRNAs are small single- or double-stranded RNAs with an average length of 21-25 nucleotides. They are non-coding RNAs that can bind a specific complementary coding mRNA and silence its function. During RNAi, siRNAs are loaded onto Argonaute proteins, which carry out the repression (Siomi and Siomi, 2009).
Choosing appropriate design methods
In 2012 the SYSU-Software Team integrated an siRNA cDNA designer as a small part in their project. siRNAs designed with this tool were applicable in eukaryotic organisms. They included two different design methods: Tom Tuschl’s method and Rational siRNA design.
Tom Tuschl’s method essentially relies on the presence of 5’ and 3’ ‘TT’ overhangs (Figure X) (Elbashir et al., 2001). These are not compatible with the overhangs and scaffold sequences required for the prokaryotic mechanisms. Therefore, we decided to use the Ui-Tei rules as an alternative design method (Naito and Ui-Tei, 2012). We adopted the rational siRNA design, since it was better suited to our application (Reynolds et al., 2004). Both sets of design rules apply only to the 19 nt long target binding sequence.
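Because both rule sets are evaluated on 19 nt stretches, one simple way to obtain candidates is to slide a window over the gene sequence. The following is a minimal sketch under that assumption. The function name `candidate_windows` and the 1-based start positions are illustrative and not taken from the tool itself.

```python
def candidate_windows(gene, length=19):
    """Return (start, subsequence) pairs for every window of the given length.

    Start positions are 1-based. A gene shorter than the window yields
    an empty list.
    """
    gene = gene.upper()
    return [
        (start + 1, gene[start:start + length])
        for start in range(len(gene) - length + 1)
    ]
```

Each returned subsequence could then be passed on to the design rules described in the following sections.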
Rational siRNA design
By a systematic analysis of 180 eukaryotic siRNAs, Reynolds et al. identified eight criteria that are important for their functionality (Reynolds et al., 2004). Each criterion is assigned a score that can be either positive or negative, corresponding to its effect on the siRNA. All siRNA candidates with a total score above six are potentially highly functional siRNAs.
Table 1: Rational siRNA design criteria with corresponding scores

| Rule | Score |
| --- | --- |
| 30%-52% G/C content | +1 |
| At least 3 'A/U' bases at positions 15-19 | +1 (for each 'A/U' base) |
| Absence of internal repeats (\(T_m \lt 20\,^{\circ}\mathrm{C}\)) | +1 |
| An 'A' base at position 3 | +1 |
| An 'A' base at position 19 | +1 |
| A 'U' base at position 10 | +1 |
| A 'G' or 'C' base at position 19 | -1 |
| A 'G' base at position 13 | -1 |
The melting temperature \(T_m\) is calculated as follows, with the G/C content given as a fraction of the siRNA length and \([Na^+]\) in mol/L (Kibbe, 2007):
$$ T_m = 79.8 + 18.5 \cdot \log_{10}([Na^+]) + 58.4 \cdot [\text{G/C content}] + 11.8 \cdot [\text{G/C content}]^2 - \frac{820}{\text{length siRNA}} $$
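The formula can be evaluated directly. The sketch below assumes the G/C content is a fraction between 0 and 1 and the sodium concentration is given in mol/L. The function name and the default of 0.05 mol/L are illustrative assumptions, not values taken from the tool.

```python
import math


def melting_temperature(seq, na_molar=0.05):
    """Salt-adjusted melting temperature in degrees Celsius.

    na_molar is the sodium concentration in mol/L (default is an assumption).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return (
        79.8
        + 18.5 * math.log10(na_molar)
        + 58.4 * gc_fraction
        + 11.8 * gc_fraction ** 2
        - 820 / len(seq)
    )
```

For a 20 nt sequence with 50% G/C content at 1 mol/L sodium, the salt term vanishes and the terms add up to 79.8 + 29.2 + 2.95 - 41 = 70.95.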
The tool checks each criterion and only considers siRNAs with a score higher than six for further steps.
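A minimal sketch of such a scoring step is shown below. It follows the criteria as published by Reynolds et al. (2004): the 'U' bonus is evaluated at position 10, and the penalties apply to a 'G' or 'C' at position 19 and a 'G' at position 13. The internal-repeat check is not implemented here and is passed in as a flag. The function name and signature are illustrative, not the tool's actual interface.

```python
def rational_score(sense, no_internal_repeats=True):
    """Score a 19 nt sense-strand sequence against the eight criteria.

    no_internal_repeats is assumed to be determined elsewhere
    (criterion: T_m of internal repeats below 20 degrees Celsius).
    """
    s = sense.upper().replace("T", "U")
    if len(s) != 19:
        raise ValueError("expected a 19 nt target binding sequence")

    score = 0
    gc_fraction = sum(base in "GC" for base in s) / len(s)
    if 0.30 <= gc_fraction <= 0.52:
        score += 1
    au_in_tail = sum(base in "AU" for base in s[14:19])  # positions 15-19
    if au_in_tail >= 3:
        score += au_in_tail  # one point per A/U base
    if no_internal_repeats:
        score += 1
    if s[2] == "A":  # position 3
        score += 1
    if s[18] == "A":  # position 19
        score += 1
    if s[9] == "U":  # position 10
        score += 1
    if s[18] in "GC":  # penalty at position 19
        score -= 1
    if s[12] == "G":  # penalty at position 13
        score -= 1
    return score
```

A candidate would then be kept only if `rational_score(candidate) > 6`.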
Ui-Tei rules
Ui-Tei et al. analyzed 62 eukaryotic siRNAs and identified four design rules for effective siRNAs (Ui-Tei et al., 2004). Only siRNAs fulfilling all four criteria are considered functional siRNAs.
An ‘A’ or ‘U’ at position 19
A ‘G’ or ‘C’ at position 1
At least five ‘U’ or ‘A’ residues from positions 13 to 19
No G/C stretch longer than 9 nt
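The four rules above can be checked with a few lines of code. This is a sketch only: it assumes the input is the 19 nt sense-strand sequence, treats 'T' and 'U' as equivalent, and interprets a G/C stretch as a run of consecutive 'G' or 'C' bases. The function name is illustrative.

```python
import re


def passes_ui_tei(sense):
    """Return True if a 19 nt sequence fulfils all four rules listed above."""
    s = sense.upper().replace("T", "U")
    if len(s) != 19:
        raise ValueError("expected a 19 nt target binding sequence")

    au_at_19 = s[18] in "AU"
    gc_at_1 = s[0] in "GC"
    au_rich_tail = sum(base in "AU" for base in s[12:19]) >= 5  # positions 13-19
    no_long_gc_run = re.search(r"[GC]{10,}", s) is None  # no run longer than 9 nt
    return au_at_19 and gc_at_1 and au_rich_tail and no_long_gc_run
```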
Calculating silencing probability
The tool returns not only the sequences of possibly effective siRNAs, but also the probability that they are effective. This probability can be calculated with Bayes’ theorem from the probabilities of dependent events. The following calculations and formulas are based on Takasaki (2009).
The initial hypothesis is that the given siRNA effectively silences an mRNA. To perform the calculations a prior probability is necessary. The prior probability for effective gene silencing of mammalian genes can be obtained from former siRNA experiments and is approximately 0.1. Since we have no data on prokaryotic siRNAs, we use the same prior probability for our prediction.
The gene silencing probability can be described as:
$$ P(eff|X) = \frac{P^{eff} P(X|eff)}{P(X)} \qquad (1)$$
\(P^{eff}\) is the prior probability of 0.1 mentioned above. The siRNA sequence is represented by \(X = X_1, X_2, \dots, X_{19}\), where each \(X_i\) is one of the nucleotides adenine, guanine, cytosine or thymine. \(P(X|eff)\) is the probability of observing the sequence \(X\) given that the siRNA is effective. It is computed as the product of the probabilities that a particular nucleotide is located at a particular position in effective siRNAs:
$$ P(X|eff) = \prod_{i=1}^{19} q_{x_i^n}^{eff} \qquad (2)$$
\(q_{x_i^n}^{eff}\) indicates how likely the occurrence of base \(n\) is at position \(i\) based on known effective siRNAs. It can also be called the frequency ratio of \(n\) at position \(i\). The last element of formula \((1)\), \(P(X)\), is the overall probability of observing the sequence \(X\). It is the sum of the probability of observing \(X\) among effective siRNAs and the probability of observing \(X\) among ineffective siRNAs, weighted with the prior probabilities \(P^{eff}\) and \(P^{inf} = 1-P^{eff}\), respectively.
$$ P(X) = P^{eff} P(X|eff)+P^{inf} P(X|inf) \qquad (3)$$
\(P(X|inf)\) is calculated analogously to \(P(X|eff)\) and is the probability of observing the sequence \(X\) given that the siRNA is ineffective.
$$ P(X|inf) = \prod_{i=1}^{19} q_{x_i^n}^{inf} \qquad (4)$$
In this case, \(q_{x_i^n}^{inf}\) indicates how likely the occurrence of base \(n\) is at position \(i\) based on known ineffective siRNAs.
With formulas \((2)\), \((3)\) and \((4)\) defined, formula \((1)\) can now be calculated as follows:
$$P(eff|X) = \frac{P^{eff} P(X|eff)}{P^{eff} P(X|eff)+P^{inf} P(X|inf)} = \frac{P^{eff} \prod_{i=1}^{19} q_{x_i^n}^{eff}}{P^{eff} \prod_{i=1}^{19} q_{x_i^n}^{eff}+P^{inf} \prod_{i=1}^{19} q_{x_i^n}^{inf}} $$
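The combined formula translates into a short function. The sketch below treats all positions independently for both the effective and the ineffective case, which is a simplification. The frequency tables `q_eff` and `q_inf` are hypothetical inputs (one dictionary per position, mapping a base to its frequency ratio) and the function name is illustrative.

```python
import math


def silencing_probability(seq, q_eff, q_inf, prior_eff=0.1):
    """Posterior probability P(eff|X) for a sequence under the formula above.

    q_eff and q_inf are lists with one dict per position (base -> frequency
    ratio); they are assumed to cover every position of seq.
    """
    seq = seq.upper()
    likelihood_eff = math.prod(q_eff[i][base] for i, base in enumerate(seq))
    likelihood_inf = math.prod(q_inf[i][base] for i, base in enumerate(seq))
    numerator = prior_eff * likelihood_eff
    denominator = numerator + (1 - prior_eff) * likelihood_inf
    return numerator / denominator
```

If both tables are identical, the sequence carries no information and the result equals the prior of 0.1.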
To actually calculate the silencing probability, only the frequency ratios \(q_{x_i^n}^{eff}\) and \(q_{x_i^n}^{inf}\) of the individual nucleotides at positions 1 to 19 are missing. These were taken from the same publication by Takasaki (2009) as the calculations.
For the frequency ratios, 833 effective and 847 ineffective siRNAs from previous publications were analyzed. For each nucleotide, the probability of occurrence was determined at each position of the siRNA. Two models were considered in the calculation. In the independent model, the occurrence of a nucleotide at each of the positions 1 to 19 is treated independently of all other positions. In the dependent model, the occurrence of a nucleotide depends on the nucleotide at the preceding position; these dependent probabilities were calculated with a simple Markov model. The resulting silencing probability was found to be most accurate when the frequency ratios of the effective siRNAs are calculated dependently and those of the ineffective siRNAs independently. All frequency ratios are summarized in Table X and Table Y.
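To illustrate the dependent case, the sketch below computes a sequence likelihood in which each nucleotide depends on its predecessor. The layout of the tables is an assumption made for this example: `first` holds the probabilities for position 1, and `transitions` holds one dictionary per following position, keyed by the pair (previous base, current base). Neither the names nor the layout are taken from the tool or from Takasaki (2009).

```python
def markov_likelihood(seq, first, transitions):
    """Likelihood of seq when each base depends on the base before it.

    first: dict base -> probability at position 1.
    transitions: list of dicts for positions 2..n, keyed by (previous, current).
    """
    seq = seq.upper()
    likelihood = first[seq[0]]
    for i in range(1, len(seq)):
        likelihood *= transitions[i - 1][(seq[i - 1], seq[i])]
    return likelihood
```

Such a value could replace the independent product for the effective siRNAs, while the ineffective siRNAs keep the independent product.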
Together with the frequency ratios it is now possible to calculate the silencing probability for the 19 bp long binding site of siRNAs.
siRNA overhangs and scaffolds
In order to achieve effective gene silencing or knockdown, the 19 bp binding sequence must be supplemented with overhangs. There are different sequences that can be added to the binding sequence for different functionalities.
Figure N shows the scheme of siRNAs for RNAi. For the siRNA to be recognized by RNase E, the 5’ end of the siRNA has to start with the nucleotides adenine and guanine (Foley et al., 2015). Furthermore, the nucleotides at positions three and four must not match the target mRNA. At the 3’ end of the siRNA the MicC scaffold is added, which recruits RNase E and facilitates the hybridization of siRNA and target mRNA (Na et al., 2013).
Figure M shows the scheme of a siRNA that should only silence the mRNA target. To achieve a higher stability of the siRNA, the OmpA scaffold is added at the 5’ end. In addition, the hybridization of the siRNA and the target mRNA should be facilitated by MicC again.
These overhang and scaffold sequences are also part of our vector system. If the vector system is selected when using our tool, the fitting overlaps to the vectors are added automatically. More theoretical information about the overhangs and scaffolds can be found here.
Check siRNA (Alpha)
Besides the construction of siRNAs, we also implemented a check siRNA functionality, which is still under construction and only supports a few features right now. For a given target sequence and a corresponding siRNA, it checks whether the siRNA might bind to its target and how well it fulfills the described criteria. Furthermore, its silencing efficiency is calculated.
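One simple way to test whether an siRNA might bind its target is to search the target for the reverse complement of the siRNA. The sketch below assumes perfect complementarity over the full siRNA length and treats 'T' and 'U' as equivalent. Both function names are illustrative, and the actual check in the tool may work differently.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA (or DNA) sequence, returned as RNA."""
    return rna.upper().replace("T", "U").translate(_COMPLEMENT)[::-1]


def find_binding_site(target, sirna):
    """Return the 0-based index of the binding site in target, or -1 if none."""
    target = target.upper().replace("T", "U")
    return target.find(reverse_complement(sirna))
```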