Team:Cardiff Wales/Bioinformatics

Bioinformatics




Deciding on an RNA


When the project was first discussed, we considered targeting the aphid microbiome with sgRNAs, hoping that the bacteria aphids carry have a CRISPR system that we could essentially hijack. We abandoned this idea after running a short bioinformatic test to check whether the bacteria had a CRISPR system. The results, shown below, reveal no identifiable CRISPR system in Buchnera aphidicola.





The above screenshot shows "3 Analysed Sequences" because the bacterial genome FASTA sequence was assembled into 3 contigs. This genome can be found on NCBI, or here.


Integrating Human Practices


For our human practices, we spoke with several stakeholders about our project and about their general knowledge and opinions of GM. One concern we found, especially when talking to retirees, was for the well-being of the local fauna. In particular, we first spoke to an amateur beekeeper, who raised an issue regarding honeybee welfare that we had not previously considered: they suggested that wasps feed on aphid honeydew and have been known to attack bees when food is scarce. Any risk to bee populations would make the project completely unfavourable, so we took up the issue with the Welsh Beekeepers’ Association (WBKA). These professional beekeepers quickly informed us that honeybees themselves feed on aphid honeydew. The reduction of a food source for these bees is of minor concern; the siRNAs secreted in the aphid honeydew raise more immediate and interesting concerns.

To address these concerns and show that our project is safe for the field, we analysed the binding sites of every potential siRNA produced from our RNAi gene constructs, running a BLASTn of each siRNA against the transcriptomes of aphid predators, bees, and humans. Where transcriptome sequences were not available, we used genomic sequences. Because siRNAs only bind to RNA, they can only affect transcribed sequences, so many of the matches to genomic sequences will not be in coding regions. This is why we used transcriptomic sequences where available. In addition, genomes were not available for many direct aphid predators. Rather than ruling these predators out, we took the genome sequence of the closest available relative(s), moving up taxa, on the assumption that the genomic regions shared between these relatives and the actual predators are at least somewhat conserved and therefore still serve as good indicators.

Finally, we did not attempt to analyse the genomes of organisms that may eat the honeydew but not the aphid directly, as information on these organisms is sparse and uncertain. However, anyone interested can download the program from the link at the bottom of this page and run the analysis themselves.
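For readers who want to repeat this comparison by hand, one BLASTn run per target organism can be sketched roughly as below. This is a minimal sketch and not the team's script: the file names and the `build_blastn_command` helper are hypothetical, and the `blastn-short` task and the E-value cutoff are assumptions rather than the settings used for the results on this page. It assumes NCBI BLAST+ is installed, and it only builds the commands, so nothing is executed here.

```python
def build_blastn_command(probe_fasta, target_fasta, evalue, out_path):
    """Return the argument list for one blastn run (hypothetical helper)."""
    return [
        "blastn",
        "-task", "blastn-short",   # assumption: BLAST+ preset for short queries
        "-query", probe_fasta,     # FASTA file of siRNA-length probes
        "-subject", target_fasta,  # transcriptome, or genome where none exists
        "-evalue", str(evalue),    # hits above this E-value are not reported
        "-outfmt", "6",            # tab-separated table of hits
        "-out", out_path,
    ]


# Hypothetical file names; transcriptomes are preferred where available.
targets = {
    "Apis mellifera": "apis_mellifera_transcriptome.fasta",
    "Harmonia axyridis": "harmonia_axyridis_genome.fasta",
}

# One command per target organism; 0.05 is an illustrative cutoff only.
commands = {
    species: build_blastn_command(
        "probes.fasta",
        path,
        evalue=0.05,
        out_path=species.replace(" ", "_") + "_hits.tsv",
    )
    for species, path in targets.items()
}
```

Each command could then be passed to `subprocess.run(command, check=True)`, and the resulting tables filtered per organism.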




'Testing' toxicity


To analyse the toxicity of each siRNA, we entered the full FASTA sequence of each construct (BCR3, SP3, and C002) into the program, which splits it into 21 nucleotide probes, with the window sliding by 1 nucleotide each time. We chose 21 nucleotides to give a worst-case estimate of toxicity, assuming that this is the minimum length for an siRNA. Each of these output siRNAs was then run against each of the genomes or transcriptomes we were interested in with a BLASTn. Here it is important to note that, because we filter on the E-value, the output only shows regions with a 100% match, anywhere from about 16 nucleotides to 21 nucleotides long. These regions have no mismatches, which lowers the E-value. In reality, many of these sequences will be near-perfect (but not quite perfect) matches across the entire 21 nucleotide region, but the display only shows the 100% matching portion because it has a low E-value. Adding mismatches would rapidly increase the E-values to non-significant levels. However, anyone who runs the script themselves can change these values; raising the E-value cutoff shows a greater number of results. Below you can see the output windows with some annotation for each of the three siRNAs.
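The two steps described above, tiling a sequence into overlapping probes and filtering on the E-value, can be sketched as follows. This is an illustration under stated assumptions, not the team's code: `sliding_probes` and `expected_evalue` are hypothetical names, the E-value uses the standard Karlin-Altschul form E = K·m·n·exp(−λS) without BLAST's length corrections, and the scoring values (+1/−3, λ = 1.374, K = 0.711) are illustrative rather than taken from the team's run.

```python
import math


def sliding_probes(sequence, length=21, step=1):
    """Return every window of `length` nt, moving `step` nt at a time."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def expected_evalue(matches, mismatches, query_len, target_len,
                    reward=1, penalty=-3, lam=1.374, k=0.711):
    """Karlin-Altschul estimate for an ungapped hit (illustrative parameters)."""
    raw_score = matches * reward + mismatches * penalty
    return k * query_len * target_len * math.exp(-lam * raw_score)


# A toy 25 nt sequence (not one of the real constructs) gives 25 - 21 + 1 probes.
toy_probes = sliding_probes("ACGTACGTACGTACGTACGTACGTA")

# Compare a perfect 21 nt hit with a 21 nt hit containing one mismatch,
# against a hypothetical 100 Mb target.
perfect = expected_evalue(21, 0, query_len=21, target_len=100_000_000)
one_mismatch = expected_evalue(20, 1, query_len=21, target_len=100_000_000)
```

Under this scoring, one mismatch costs four score points (one lost match plus the penalty), so `one_mismatch` is larger than `perfect` by a factor of exp(4λ). This is consistent with only perfect stretches surviving a strict E-value cutoff.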


BCR3





The full unedited output for BCR3 can be downloaded here. This shows where each siRNA has matches in the host genome, allowing for more detailed analysis.

For example, the above PDF shows a single match against Apis mellifera, the Western Honey Bee. Harming this species could have huge ecological consequences, so before our insecticides could be applied we would need to ensure that they cause no harm to essential species. When we look at where the siRNA binds in the transcriptome of the bee, we find the following result:

INSERT SCREENSHOT OF GENOMIC HIT LOCATION

SP3





Once again, the full xlsx file showing where each siRNA hits within each genome can be downloaded here.

This siRNA has no hits against the Honey Bee genome. The outputs at the bottom of the PDF were once hits against transcripts, but NCBI has annotated these and considers them non-functional, and thereby safe. Instead, let's look at the first Harmonia axyridis hit:

INSERT SCREENSHOT OF GENOMIC HIT LOCATION

C002 (positive control)





Again, the full file can be downloaded here. This is the positive control, an siRNA that has been used in research before and targets transcripts in the salivary glands of aphids. It is a much shorter sequence than BCR3 and SP3, so fewer siRNAs are produced, which reduces the potential for off-target effects. In future, we would use this bioinformatic analysis to optimise our BCR3 and SP3 sequences, creating a shorter pre-siRNA from each gene and thus reducing the chance of off-target effects, ideally by including only sequences that occur solely in aphid genomes. Of course, as more complete sequencing data becomes available, these results will likely change.

For continuity, let's look at the first genomic hit against Lasius niger, the Black Garden Ant.

INSERT SCREENSHOT OF GENOMIC HIT LOCATION


A link to the bioinformatics script can be found on Dr. Daniel Pass's GitHub. INSERT LINK. This script could prove to be a very useful tool for anyone who wishes to analyse potential siRNA toxicity. Both the input sequence to be split into siRNAs and the siRNA length can be changed. The default length is 21 because siRNAs are usually 21-25 nucleotides long, and 21 nucleotides gives the most conservative toxicity estimate: shorter probes are likely to produce more matches, and these include every match that a longer siRNA would produce. In addition, the genomic or transcriptomic sequences that the BLASTn is run against can be changed. The tool is therefore very flexible and potentially useful to other projects that use siRNAs, whether for iGEM or not. Finally, the script comes with annotated help for all the variables and how to change them in the command. To see this, enter the script name in the command window with the "-h" or "-help" switch (note, you don't need the quotation marks!).
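The reasoning behind the default length can be checked on toy data. The sketch below swaps BLASTn for plain exact substring matching, which is a simplification, and uses made-up sequences and hypothetical helper names. It shows that wherever a 25 nucleotide probe matches perfectly, the 21 nucleotide probe starting at the same position also matches, so the shorter length never misses a perfect hit that the longer one would find.

```python
def kmers(seq, k):
    """Set of all k-nt windows in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_hit_starts(source, target, k):
    """Start positions in `source` whose k-nt window occurs verbatim in `target`."""
    target_kmers = kmers(target, k)
    return {i for i in range(len(source) - k + 1)
            if source[i:i + k] in target_kmers}


# Made-up 36 nt "construct" and a toy target sharing 27 nt with it.
source = "ACGTTGCAAGGCTTAACCGGTTAACGATCGATCGGA"
target = "TTTT" + source[3:30] + "CCCC"

hits_21 = exact_hit_starts(source, target, 21)
hits_25 = exact_hit_starts(source, target, 25)

# Every perfect 25 nt hit is also a perfect 21 nt hit at the same start.
contained = hits_25 <= hits_21
```

The reverse does not hold: some 21 nucleotide probes match where the 25 nucleotide probe at the same start does not, which is why the shorter length reports more hits.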


  • Why we performed this analysis (link to HP)
  • What the bioinformatics results show
  • What the code does
  • Where to find the code (Dan's GitHub).