Background
In 2017, an article calling DNA “an excellent medium for archiving data” was published in Nature. The research, conducted by Dr. Shipman and Dr. Church, tested the ability of the CRISPR-associated proteins Cas1 and Cas2 to act as integrases, adding nucleotides to a bacterial genome in a deliberate, designed fashion and thereby enabling arbitrary information to be written into the genome. In this way, any particular DNA sequence can be inserted into a bacterium’s genome and then interpreted as information of any kind, whether it is code represented by the four bases of DNA or a specific sequence representing a particular message. This work demonstrated DNA’s capacity to capture and store real data.
The E. coding system builds on this and on the work of other synthetic biologists, such as Dr. Núñez and Dr. Farzadfard, to create a new possibility for DNA as an information technology.
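As a minimal sketch of how arbitrary data could be represented with the four bases of DNA, the Python snippet below packs two bits into each nucleotide. The mapping and the function names `encode` and `decode` are illustrative assumptions; the text does not specify an encoding scheme, and a practical scheme would likely add sequence constraints that are not modeled here.

```python
# Hypothetical 2-bits-per-base mapping; the source does not specify an encoding.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(); expects a length that is a multiple of four bases."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in dna.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    sequence = encode(b"Hi")
    print(sequence)          # CAGACGGC
    print(decode(sequence))  # b'Hi'
```

A fixed two-bit mapping keeps the round trip trivially reversible, which is why it is used here; it is not presented as the scheme used by any of the cited works.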
E. coding aims for a system that receives information from its surroundings, in the form of any stimulus that can be registered in the genome, and then stores in the bacterial genome a DNA sequence specific to that stimulus. The system is essentially a biological memory, which operates first as a biosensor and then as an information storage unit.
The system consists of two main constructs, each carrying out a particular function. In the first, two cassettes regulated by the same inducer produce a retron transcript containing a designed DNA sequence to be integrated into the genome, together with a reverse transcriptase that reverse transcribes the retron. The DNA produced by reverse transcription contains a PAM sequence recognizable by the Cas1 and Cas2 proteins, so that it is integrated into the bacterium’s CRISPR locus. The second construct expresses the Cas1 and Cas2 proteins.
Module 1
Reverse Transcriptase and Target DNA
Many approaches to storing information in the DNA of living cells have been proposed. SCRIBE (Synthetic Cellular Recorders Integrating Biological Events) is a programmable architecture that generates ssDNA inside living cells in response to gene regulatory signals. When coexpressed with a recombinase, these ssDNAs address specific target loci on the basis of sequence homology and introduce specific mutations into genomic DNA [1,3]. Instead of a recombinase, E. coding exploits the integrase activity of the Cas1 and Cas2 proteins to insert the ssDNA into the bacterial genome.
SCRIBE uses 3 main components:
- A reverse transcriptase (RT) protein.
- An msr RNA moiety, which acts as a primer for the RT.
- A second RNA moiety, called msd, which serves as a template for the RT.
The msr-msd sequence in the retron cassette is flanked by two inverted repeats. Once transcribed, the msr-msd RNA folds into a secondary structure guided by the base pairing of the inverted repeats and the msr-msd sequence. The RT recognizes this secondary structure and uses a conserved guanosine residue in the msr as a priming site to reverse transcribe the msd sequence and produce a hybrid RNA-ssDNA molecule called msDNA (i.e., multicopy single-stranded DNA).
The msd region in the cassette is designed to include a specific sequence, referred to as the PAM sequence, as well as the particular DNA sequence that is intended to be inserted. The PAM sequence is the site recognized by the Cas1-Cas2 protein complex, which then integrates the DNA into the genome. The DNA sequence chosen for the proof of concept was found to have a particular affinity for integration into E. coli’s CRISPR locus. Once integrated into the genome, the sequence serves to indicate the presence of the inducer that triggers expression of the system.
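The design step described above can be sketched in Python under stated assumptions. The helper names `build_insert` and `reverse_complement` are hypothetical, the default PAM `AAG` is an assumption (the text does not name the PAM used), and the example message is a placeholder rather than the proof-of-concept sequence. Because a reverse transcriptase synthesizes DNA complementary to its RNA template, the msd template would need to carry the reverse complement of the ssDNA one wants to obtain.

```python
VALID_BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_insert(message: str, pam: str = "AAG") -> str:
    """Prefix the message with a PAM. The default PAM is an assumption."""
    message = message.upper()
    if not message or set(message) - VALID_BASES:
        raise ValueError("message must be a non-empty string of A, C, G, T")
    return pam.upper() + message


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    desired_ssdna = build_insert("ACGTAC")        # placeholder message
    msd_template = reverse_complement(desired_ssdna)
    print(desired_ssdna)  # AAGACGTAC
    print(msd_template)   # GTACGTCTT
```

Keeping the PAM as a parameter rather than a constant makes the assumption explicit and easy to change once the actual sequence is known.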
Figure: Single-stranded DNA generation in response to gene regulatory signals. Components shown: a reverse transcriptase (RT) protein; an msr RNA moiety, which acts as a primer for the RT; and an msd RNA moiety, which serves as a template for the RT and contains the message of interest.
Module 2
Cas1-Cas2 Complex
Module 3
Genomic Spacer Acquisition
With each of the constructs from the first two modules tested and demonstrated, the constructs were then tested together using two approaches: inducing bacteria to produce only the Cas proteins and electroporating the target DNA as oligonucleotides ready for integration, and co-transforming E. coli with both modules and inducing them.
Induction & DNA Electroporation
As an initial test, bacteria containing the Cas1-Cas2 device were induced with IPTG under set parameters of concentration and time. After induction, they were electroporated to introduce oligonucleotides of the target ssDNA to be inserted in the genome, following the procedure of *******.[***]
In this procedure, the Cas1-Cas2 protein complex is expected to form after induction and, with the internalized oligonucleotides available, to begin integrating the target DNA sequence into the bacteria’s CRISPR locus. This assay assesses the integrase activity of the Cas1-Cas2 proteins in the presence of an excess of the DNA to be inserted.
Results for this assay can be seen ************
Co-transforming & Induction
The purpose of the E. coding system is to detect a stimulus external to the bacteria and to store information about that stimulus. To do so, the bacteria must generate both the ssDNA and the Cas integrase proteins whenever the stimulus is present. This is only possible if the bacteria contain both of the system's devices.
Co-transforming bacteria with the designed devices should enable them to store a message corresponding to the input that induces the promoters. The proof of concept is tested with IPTG-inducible promoters, with one device generating a single type of message and the other device expressing the Cas proteins.
System Adaptation
The E. coding system could potentially monitor and store any kind of input that has an associated promoter. In that sense, a DNA-producing device, like the one used with IPTG, could be designed with a stimulus-specific promoter and a distinctive DNA sequence that represents the stimulus once integrated into the genome.
It may also be possible to record the presence of more than one input in the same bacteria. If the bacteria were transformed with a DNA-producing device specifically induced by each stimulus, they would produce each message upon exposure to the corresponding stimulus. There are two possibilities regarding the production of the Cas proteins. A single device constantly expressing the integrase proteins could be included; however, it may impose a considerable metabolic burden on the bacteria, which may hinder the system's functionality. Alternatively, a device expressing the proteins when induced by its associated input could be used for each of the inputs to be tracked. This would in turn mean that, for each stimulus of interest, two different devices would have to be transformed into the bacteria, which may prove impractical for a more complex system.
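The multi-input idea can be illustrated with a short Python sketch of the readout side: given spacer sequences obtained from a CRISPR array, report which stimuli left a record. Everything named here is hypothetical, including the `STIMULUS_BARCODES` registry, the placeholder sequences, the label `input_2`, and the function `recorded_stimuli`; the sketch also assumes the spacers have already been extracted from sequencing data.

```python
# Hypothetical registry: one distinctive sequence per stimulus-specific device.
# The sequences are placeholders, not the ones used in the project.
STIMULUS_BARCODES = {
    "IPTG": "ACGTACGTAC",
    "input_2": "TTGACCGGTA",
}


def recorded_stimuli(spacers: list[str]) -> list[str]:
    """List stimuli whose sequence occurs in any spacer, in order of first match."""
    found: list[str] = []
    for spacer in spacers:
        for stimulus, barcode in STIMULUS_BARCODES.items():
            if barcode in spacer.upper() and stimulus not in found:
                found.append(stimulus)
    return found


if __name__ == "__main__":
    spacers = ["AAGTTGACCGGTA", "GGGGGGGG", "aagACGTACGTAC"]
    print(recorded_stimuli(spacers))  # ['input_2', 'IPTG']
```

Exact substring matching is the simplest possible choice; a real readout would probably need to tolerate sequencing errors, which this sketch does not attempt.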
References
- Sheth, R. U., Yim, S. S., Wu, F. L. & Wang, H. H. Science. 358, 1457–1461 (2017).
- Levy, A. et al. Nature. 520, 505–510 (2015).
- Shipman, S. L. et al. Science. 353, aaf1175 (2016).
- Lander, E. S. Cell. 164, 18–28 (2016).
- Nuñez, J. K., Kranzusch, P. J., Noeske, J., Wright, A., Davies, C. & Doudna, J. Nat. Struct. Mol. Biol. 21, 528–534 (2014).
- Díez-Villaseñor, C., Guzmán, N., Almendros, C., García-Martínez, J. & Mojica, F. J. RNA Biology. 10, 792–802 (2013).
- Yosef, I., Goren, M. & Qimron, U. Nucleic Acids Research. 40, 5569–5576 (2012).
- Yosef, I., Shitrit, D., Goren, M., Burstein, D., Pupko, T. & Qimron, U. Proceedings of the National Academy of Sciences. 110, 14396–14401 (2013).
- Nuñez's thesis
- Xue, C., Whitis, N., Sashital, D. Molecular Cell. 64, 826–834 (2016).
- Church, G., Gao, Y., Kosuri, S. Science. 337, 1628 (2012).
- Panda, D. et al. 3 Biotech. 8, 239 (2018).
- Farzadfard, F., Lu, T. Science. 346 (2014).