Background
In 2017, an article calling DNA “an excellent medium for archiving data” was published in Nature. The research, conducted by Dr. Shipman and Dr. Church, tested the ability of the CRISPR-associated proteins Cas1 and Cas2 to act as integrases that add nucleotides to a bacterial genome in a deliberate, designed fashion, enabling arbitrary information to be written into the genome. In this way, any particular DNA sequence can be inserted into a bacterial genome and then interpreted as information, whether as code represented by the four bases of DNA or as a specific sequence representing a particular message. Their work demonstrated DNA’s capacity to capture and store real data.
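To make the idea of "code represented by the four bases" concrete, here is a minimal sketch of one way to map bytes onto DNA at two bits per base. The mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; this is not necessarily the scheme used in the cited work.

```python
# Hypothetical 2-bit mapping: A=00, C=01, G=10, T=11 (an illustrative assumption).
BASES = "ACGT"


def encode(data):
    """Encode bytes as a DNA string, four bases per byte, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(seq):
    """Invert encode(): read four bases at a time back into one byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

With this mapping every byte costs four bases, so a 33-nucleotide spacer could carry at most eight whole bytes.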
CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) is a prokaryotic immune system that protects bacteria and archaea from phages and plasmids. It comprises multiple Cas genes and an array (the CRISPR-spacer array) of short repeated sequences, called repeats, separated by distinct short sequences of the same length, called spacers, most of which are derived from exogenous DNA.[2,3]
This immunological memory represents an efficient and robust form of recording events in living cells.[3] It consists of three main stages: adaptation, or spacer acquisition, where new spacers are acquired from foreign DNA and integrated into the array; expression, where the repeat-spacer array is transcribed and further processed into short CRISPR RNAs (crRNAs); and interference, where the crRNAs bound to Cas proteins (forming the effector complex) identify foreign sequences via base pairing with the crRNA and target them for degradation.[2] The genes encoding the Cas proteins are usually arranged in operons located in close proximity to the arrays.[6] E. coding focuses on the adaptation stage, where exogenous DNA is integrated into the genome.
The E. coding system builds on the work of synthetic biologists such as Nuñez, Farzadfard, and Shipman to open a new possibility for DNA as an information technology.
E. coding aims for a system that can receive information from its surroundings, in the form of any stimulus that can be recorded in the genome, and then store in the bacterial genome a DNA sequence that is specific to that stimulus. This system essentially constitutes a biological memory, which operates first as a biosensor and then as an information storage unit.
CRISPR-spacer array, repeats represented as black rhomboids and each spacer represented in color.
The system consists of two main constructs, each carrying out a particular function. First, two cassettes regulated by the same inducer produce a retron transcript containing the designed DNA sequence to be integrated into the genome, together with a reverse transcriptase that reverse transcribes the retron. The DNA produced by reverse transcription contains a PAM sequence recognized by the Cas1 and Cas2 proteins, so that it is integrated into the CRISPR locus of the bacterium.
Module 1
Retrotranscriptase and Target DNA
Several approaches to storing information in the DNA of living cells have been reported. SCRIBE (Synthetic Cellular Recorders Integrating Biological Events) is a programmable architecture that generates ssDNA inside living cells in response to gene regulatory signals. When coexpressed with a recombinase, these ssDNAs address specific target loci on the basis of sequence homology and introduce specific mutations into genomic DNA.[1,3] Instead of a recombinase, E. coding exploits the integrase activity of the Cas1 and Cas2 proteins to insert the ssDNA into the bacterial genome.
SCRIBE uses 3 main components:
- A reverse transcriptase (RT) protein.
- One msr RNA moiety which acts as a primer for the RT.
- A second RNA moiety, called msd, which represents a template for the reverse transcriptase.
Single-stranded DNA generation in response to gene regulatory signals. The components are a reverse transcriptase (RT) protein; an msr RNA moiety, which acts as a primer for the RT; and an msd RNA moiety, which serves as a template for the RT and contains the message of interest.
The msr-msd sequence in the retron cassette is flanked by two inverted repeats. Once transcribed, the msr-msd RNA folds into a secondary structure guided by the base pairing of the inverted repeats and the msr-msd sequence. The RT recognizes this secondary structure and uses a conserved guanosine residue in the msr as a priming site to reverse transcribe the msd sequence and produce a hybrid RNA-ssDNA molecule called msDNA (i.e., multicopy single-stranded DNA).
(Left) Msr-msd sequence before being retrotranscribed. (Right) Structure after retrotranscription.
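The template relationship between the msd RNA and the resulting ssDNA can be sketched in a few lines. This is a simplified model under stated assumptions: it treats reverse transcription as plain reverse complementation and ignores the msr priming site, the inverted repeats, and the RNA-DNA hybrid structure of msDNA. The function names are hypothetical.

```python
# Complement tables: RNA template -> DNA product, and DNA message -> RNA template.
_RNA_TO_DNA = str.maketrans("ACGU", "TGCA")
_DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def reverse_transcribe(msd_rna):
    """Return the ssDNA (5'->3') complementary to an RNA template (5'->3')."""
    return msd_rna.translate(_RNA_TO_DNA)[::-1]


def design_template(message_dna):
    """Return the RNA template whose reverse transcript is message_dna."""
    return message_dna.translate(_DNA_TO_RNA)[::-1]
```

Under this model, a desired ssDNA message is obtained by placing its reverse complement, written as RNA, in the msd region.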
The msd region in the cassette is designed to include a specific sequence, referred to as the PAM sequence, as well as the particular DNA sequence that is intended to be inserted. The PAM sequence is the site recognized by the Cas1-Cas2 protein complex, which then integrates the DNA into the genome. The DNA sequence chosen for the proof-of-concept was found to have a particular affinity for integration into E. coli's CRISPR locus. Once integrated into the genome, the sequence serves as a record of the presence of the inducer that triggers expression of the system.
Location of the PAM sequence in the msd region.
Module 2
Genomic Spacer Acquisition
Cas1 and Cas2 are the proteins responsible for acquiring new sequences and are highly conserved among CRISPR systems from different species, which suggests a common integration mechanism. It has been demonstrated that these proteins form a complex and are the only ones needed for new spacer acquisition.[5] However, overexpression of Cas1-Cas2 in the absence of other Cas proteins is needed to achieve high integration rates in vivo.[2,9] In BL21 (DE3), our model bacterium, there are two CRISPR-spacer arrays. Next to each array there is usually an AT-rich sequence known as the leader, which contains the promoter that directs transcription of the adjacent array. It has been demonstrated that the leader sequence is essential for spacer acquisition. Repeats consist of 29 nucleotides, and the sequence is virtually the same in every CRISPR-Cas array, varying by only 1 nucleotide. These repeats are almost always interspaced by 33-nucleotide spacers.[6]
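The repeat-spacer layout described above can be sketched as a string model. The repeat used here is a toy placeholder of the right length, not the real E. coli repeat, and the sketch assumes all repeats are identical (the text notes they can differ by one nucleotide). The helper names are hypothetical.

```python
REPEAT_LEN = 29  # repeat length given in the text
SPACER_LEN = 33  # spacer length given in the text

# Toy placeholder of 29 nt; NOT the real E. coli repeat sequence.
TOY_REPEAT = "GT" * 14 + "C"


def build_array(spacers, repeat=TOY_REPEAT):
    """Join spacers into repeat-spacer-repeat-...-repeat form."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def extract_spacers(array, repeat=TOY_REPEAT):
    """Return the sequences lying between consecutive copies of the repeat."""
    pieces = array.split(repeat)
    # pieces[0] precedes the first repeat (leader side) and pieces[-1]
    # follows the last repeat; neither is a spacer.
    return pieces[1:-1]
```

Reading an array back is then a matter of splitting on the repeat, which is how a stored message would be recovered from sequencing data in this simplified picture.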
The integration of new spacer sequences is nonrandom[9] and is influenced mainly by the protospacer adjacent motif (PAM, 5’-AAG-3’).[8] This trinucleotide is critical for correct target DNA binding and cleavage.[5] Through its endonuclease activity, the Cas1-Cas2 complex excises a 35-base-pair fragment that starts with the PAM sequence and ends with an AAM trinucleotide sequence. This 35-base-pair oligonucleotide, also known as the protospacer, is carried to the genome by the Cas1-Cas2 complex, which recognizes the leader sequence, generates a double-strand cut, and inserts the new sequence at the leader-proximal end of the array, cutting off the first two nucleotides of the protospacer and leaving a 33-base-pair spacer. After the insertion, a new repeat is generated. The exact mechanism is unknown.
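The arithmetic of this step (a 35-nucleotide protospacer starting at the PAM, trimmed by two nucleotides to a 33-nucleotide spacer, added at the leader-proximal end) can be sketched as follows. This is a toy model under stated assumptions: it works on a single strand, picks the first PAM it finds, and does not check the AAM end condition mentioned above. The array is represented as a list of spacers with the leader-proximal one first; all names are hypothetical.

```python
PAM = "AAG"            # protospacer adjacent motif from the text
PROTOSPACER_LEN = 35   # excised fragment length, starting at the PAM
TRIM = 2               # nucleotides removed from the protospacer on insertion


def excise_protospacer(donor):
    """Return the 35-nt fragment starting at the first PAM, or None."""
    start = donor.find(PAM)
    if start == -1 or start + PROTOSPACER_LEN > len(donor):
        return None
    return donor[start:start + PROTOSPACER_LEN]


def acquire_spacer(spacers, donor):
    """Return a new spacer list with the acquired spacer at the leader-proximal end."""
    protospacer = excise_protospacer(donor)
    if protospacer is None:
        return list(spacers)  # nothing acquired
    return [protospacer[TRIM:]] + list(spacers)
```

Because new spacers always enter at the leader-proximal end in this model, the order of the list reflects the order of acquisition, newest first.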
The second module of E. coding relies on the integrase activity of the Cas1-Cas2 proteins, which acts upon recognition of a PAM sequence in the target ssDNA.
Cas complex recognizing PAM sequence and excising protospacer.
Spacer acquisition in CRISPR-spacer array.
Induction & DNA Electroporation
In testing this module, bacteria containing the Cas1-Cas2 device were induced with IPTG under set parameters of concentration and time. After induction, they were electroporated to introduce oligonucleotides of the target ssDNA to be inserted in the genome, following the procedure of *******.[***]
In this procedure, it is expected that the Cas1-Cas2 protein complex forms after induction and, acting on the internalized oligonucleotides, begins integrating the target DNA sequence into the bacterial CRISPR locus. This assay tests the integrase activity of the Cas1-Cas2 proteins in the presence of an excess of the DNA to be inserted.
Results for this assay can be seen ************
System Functionality
Message Generation and Integration
With each of the constructs from the first two modules tested and demonstrated, they were then tested together by co-transforming E. coli with both modules and inducing them.
Co-transforming & Induction
The purpose of the E. coding system is to detect a stimulus external to the bacteria and store information about that stimulus. For this purpose, the bacteria must generate both the ssDNA and the Cas integrase proteins whenever the stimulus is present. This is only possible if the bacteria contain both of the system's devices.
Co-transforming bacteria with the designed devices should enable them to store a message corresponding to the input that induces the promoters. The proof-of-concept is tested under promoters induced by IPTG, with one device generating a single type of message and the other device expressing the Cas proteins.
System Adaptation
The E. coding system could potentially monitor and store any kind of input that has an associated promoter. In that sense, a DNA producing device, just like the one used with IPTG, could be designed with its stimulus-specific promoter and a distinctive DNA sequence that represents the stimulus once integrated in the genome.
There is also the possibility of storing the presence of more than one input in the same bacteria. By transforming them with a DNA-producing device specifically induced by each stimulus, the bacteria would produce each message when exposed to the corresponding stimulus. There are two possibilities regarding the production of the Cas proteins. A single device constantly expressing the integrase proteins could be included; however, it may place a heavy metabolic burden on the bacteria, which may hinder the system's functionality. Alternatively, a device expressing the proteins when induced by its associated input could be used for each of the inputs to be tracked. This would in turn mean that two different devices would have to be transformed into the bacteria for each stimulus of interest, which may prove impractical for a more complex system.
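Reading back a multi-input record could look like the sketch below: each stimulus is assigned a distinctive spacer, and the spacers recovered from the array are matched against that table. The stimulus names other than IPTG, the toy sequences, and the function name are hypothetical, and the chronological ordering assumes that new spacers are always added at the leader-proximal end of the array.

```python
# Hypothetical lookup table: one distinctive 33-nt message per stimulus.
MESSAGES = {
    "IPTG": "ACG" * 11,
    "stimulus_B": "TTG" * 11,  # placeholder second input, not from the source
}


def recorded_stimuli(spacers, messages=MESSAGES):
    """List recognized stimuli from oldest to newest.

    `spacers` is ordered leader-proximal first (newest first), so it is
    reversed before matching. Spacers not in the table are skipped.
    """
    lookup = {sequence: name for name, sequence in messages.items()}
    return [lookup[s] for s in reversed(spacers) if s in lookup]
```

For example, an array whose newest spacer is the `stimulus_B` message and whose oldest is the IPTG message would be read as IPTG exposure followed by `stimulus_B` exposure, with any unrelated native spacers ignored.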
References
- Sheth, R. U., Yim, S. S., Wu, F. L. & Wang, H. H. Science. 358, 1457–1461 (2017).
- A. Levy et al. Nature. 520, 505–510 (2015).
- S. L. Shipman et al. Science. 353, aaf1175 (2016).
- E. S. Lander. Cell. 164, 18–28 (2016).
- Nuñez, J. K., Kranzusch, P. J., Noeske, J., Wright, A., Davies, C. & Doudna, J. Nat. Struct. Mol. Biol. 21, 528–534 (2014).
- Díez-Villaseñor, C., Guzmán, N., Almendros, C., García-Martínez, J. & Mojica, F. J. RNA Biology. 10, 792–802 (2013).
- Yosef, I., Goren, M. & Qimron, U. Nucleic Acids Research. 40, 5569–5576 (2012).
- Yosef, I., Shitrit, D., Goren, M., Burstein, D., Pupko, T. & Qimron, U. Proceedings of the National Academy of Sciences. 110, 14396–14401 (2013).
- Nuñez, thesis.
- Xue, C., Whitis, N., Sashital, D. Molecular Cell. 64, 826–834 (2016).
- Church, G., Gao, Y., Kosuri, S. Science. 337, 1628 (2012).
- Panda, D. et al. 3 Biotech. 8, 239 (2018).
- Farzadfard, F., Lu, T. Science. 346 (2014).