
Library Initiation Details

To determine the ideal Shine-Dalgarno (SD) / anti-Shine-Dalgarno (ASD) sequence pair for an orthogonal ribosome in a particular organism, we start by generating a dictionary of all possible SD sequences. Since the SD sequence is 6 bases long and each position has 4 possible bases (A, U, C, G), there are 4^6 = 4096 possible sequences. This starting dictionary is narrowed down in later steps to determine which complementary ASD candidates are best suited for use in an orthogonal ribosome.

Function: libbuild


Step 1 uses six nested for loops, each of which takes a value from 1 to 4. Each pass through the loops produces a 6-digit number, and each digit is translated into a specific base (1 = A, 2 = U, 3 = C, 4 = G). Running the loops over all 6-digit combinations of the numbers 1 to 4 gives all 4096 possible pre-ASD sequences. Each ASD is generated by wrapping a pre-ASD sequence in a 4-base prefix and a 2-base suffix. For example, the ASD for E. coli was generated by wrapping each 6-base pre-ASD sequence in AUCA (prefix) and UA (suffix), giving an ASD of the form “AUCA------UA”, where the dashes represent the 6 randomized bases (see the description of the getASD function for how we found the correct prefix and suffix). We then take the reverse complement of each 6-base pre-ASD sequence to get the corresponding SD sequence. Finally, we format the dictionary so that the first SD sequence can be called by the key “SD1”, the first ASD sequence by “ASD1”, and so on.

Figure S1. Randomization of six bases in the anti-Shine-Dalgarno sequence.
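
The library construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual libbuild code: it swaps the six nested for loops for itertools.product, which enumerates the same 4096 digit combinations, and the names libbuild_sketch, reverse_complement, DIGIT_TO_BASE and COMPLEMENT are hypothetical. The default prefix and suffix are the E. coli values quoted in the text.

```python
from itertools import product

# Digit-to-base mapping quoted in the text (1 = A, 2 = U, 3 = C, 4 = G).
DIGIT_TO_BASE = {1: "A", 2: "U", 3: "C", 4: "G"}
# Watson-Crick pairing for RNA.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def libbuild_sketch(prefix="AUCA", suffix="UA"):
    """Build a dictionary with keys 'ASD1'..'ASD4096' and 'SD1'..'SD4096'.

    Hypothetical stand-in for libbuild: itertools.product replaces the
    six nested loops but visits the same combinations of digits 1 to 4.
    """
    library = {}
    for n, digits in enumerate(product((1, 2, 3, 4), repeat=6), start=1):
        core = "".join(DIGIT_TO_BASE[d] for d in digits)  # 6 randomized bases
        library[f"ASD{n}"] = prefix + core + suffix       # wrapped ASD
        library[f"SD{n}"] = reverse_complement(core)      # matching SD
    return library


library = libbuild_sketch()
print(len(library))      # 8192 entries: 4096 ASD keys plus 4096 SD keys
print(library["ASD1"])   # AUCAAAAAAAUA
print(library["SD1"])    # UUUUUU
```

Keeping the SD and ASD entries under the same index means that a candidate pair can be looked up together in later steps.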


Function: getASD


“ACCUCC” is known to be a highly prevalent sequence in rRNA that is complementary to the SD sequence on the mRNA. To determine the ASD prefix and suffix for a particular organism, we took the highly conserved 16S rRNA sequence and located the “ACCUCC” motif within it (using the finditer method, which locates matches of a pattern within a string). We then took the 3 bases upstream of this motif as the prefix and the 3 bases downstream as the suffix of the ASD. The function also returns the index of the starting base of the ASD sequence in the chromosome.
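
The motif search can be illustrated with Python's re.finditer. The sketch below is an assumption-laden example rather than the team's getASD function: the name get_asd_sketch and its parameters are hypothetical, the choice of the last match (the one nearest the 3' end) is our assumption because the text does not say how multiple matches are handled, and the returned index is relative to the sequence passed in rather than to the chromosome.

```python
import re


def get_asd_sketch(rrna_16s, motif="ACCUCC", upstream=3, downstream=3):
    """Return (prefix, suffix, start_index) around the motif.

    Hypothetical stand-in for getASD. start_index is the position of the
    first prefix base within rrna_16s (0-based), not a chromosome coordinate.
    """
    # Accept DNA-style input by converting T to U (an assumption).
    seq = rrna_16s.upper().replace("T", "U")
    matches = list(re.finditer(re.escape(motif), seq))
    if not matches:
        raise ValueError(f"motif {motif!r} not found in sequence")
    # Assumption: use the last occurrence, closest to the 3' end.
    match = matches[-1]
    start = max(match.start() - upstream, 0)
    prefix = seq[start:match.start()]
    suffix = seq[match.end():match.end() + downstream]
    return prefix, suffix, start


# Toy 3'-terminal fragment used only for illustration.
prefix, suffix, start = get_asd_sketch("GGAUCACCUCCUUA")
print(prefix, suffix, start)   # AUC UUA 2
```

For this toy fragment, prefix + motif + suffix spells AUCACCUCCUUA, the same 12 bases as the “AUCA------UA” form with the six dashes filled in as CCUCCU.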