Background
In 2017, an article calling DNA “an excellent medium for archiving data” was published in Nature. The research, conducted by Dr. Shipman and Dr. Church, tested the ability of the CRISPR-associated proteins Cas1 and Cas2 to act as integrases that add nucleotides to a bacterial genome in a deliberate, designed fashion, which makes it possible to write arbitrary information into the genome. In this way, any particular DNA sequence can be inserted into a bacterial genome and then interpreted as information, whether as code represented by the four bases of DNA or as a specific sequence representing a particular message. Their work demonstrated DNA’s capacity to capture and store real data. The E. coding system builds on this and on the work of other synthetic biologists, such as Dr. Núñez and Dr. Farzadfard, to create a new possibility for DNA as an information technology.
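To illustrate how information can be "represented by the four base pairs of DNA", the sketch below maps every two bits of a byte to one base. The mapping (A=00, C=01, G=10, T=11) and the function names `encode` and `decode` are assumptions made for illustration; the source does not specify an encoding scheme, and published schemes are often more elaborate.

```python
# Hypothetical 2-bits-per-base mapping: A=00, C=01, G=10, T=11.
# This is an illustrative assumption, not the scheme used by the cited work.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(seq: str) -> bytes:
    """Invert encode(): read four bases at a time back into one byte."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    message = b"E. coli"
    dna = encode(message)
    print(dna)
    print(decode(dna) == message)
```

With this mapping each byte costs four bases, so the length of the sequence to be inserted grows linearly with the size of the message.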
E. coding aims for a system that receives information from its surroundings, in the form of any stimulus that can be recorded in the genome, and then stores in the bacterial genome a DNA sequence that is specific to that stimulus. The system is essentially a biological memory, which operates first as a biosensor and then as an information storage unit.
The system consists of two main constructs, each carrying out a particular function. First, two cassettes regulated by the same inducer produce, respectively, a retron transcript containing a designed DNA sequence to be integrated into the genome and a reverse transcriptase that reverse transcribes the retron. The DNA produced by reverse transcription contains a PAM sequence recognizable by the Cas1 and Cas2 proteins, so that it is integrated into the CRISPR locus of the bacterium.
Module 1
Reverse Transcriptase and Target DNA
Many approaches to storing information as DNA in living cells have been proposed in the past. SCRIBE (Synthetic Cellular Recorders Integrating Biological Events) is a programmable architecture that generates ssDNA inside living cells in response to gene regulatory signals. When coexpressed with a recombinase, these ssDNAs address specific target loci on the basis of sequence homology and introduce specific mutations into genomic DNA [1, 3]. Instead of a recombinase, E. coding exploits the integrase activity of the Cas1 and Cas2 proteins to insert the ssDNA into the bacterial genome. SCRIBE uses three main components:
- A reverse transcriptase (RT) protein
- One msr RNA moiety which acts as a primer for the RT.
- A second RNA moiety, called msd, which represents a template for the reverse transcriptase.
The msr-msd sequence in the retron cassette is flanked by two inverted repeats. Once transcribed, the msr-msd RNA folds into a secondary structure guided by the base pairing of the inverted repeats and the msr-msd sequence. The RT recognizes this secondary structure and uses a conserved guanosine residue in the msr as a priming site to reverse transcribe the msd sequence and produce a hybrid RNA-ssDNA molecule called msDNA (i.e., multicopy single-stranded DNA).
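The requirement that the msr-msd sequence be flanked by two inverted repeats can be checked in silico when designing a cassette. The following is a minimal sketch that assumes perfectly complementary repeats of a known length; the function names and the toy sequence are hypothetical, and real retron repeats need not pair perfectly.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flanked_by_inverted_repeats(cassette: str, repeat_len: int) -> bool:
    """Check whether the first and last `repeat_len` bases can base-pair.

    Assumption: the repeats are perfect, i.e. the left flank equals the
    reverse complement of the right flank.
    """
    if repeat_len <= 0 or 2 * repeat_len > len(cassette):
        raise ValueError("repeat_len must be positive and fit twice in the cassette")
    left = cassette[:repeat_len].upper()
    right = cassette[-repeat_len:].upper()
    return left == reverse_complement(right)


if __name__ == "__main__":
    # Toy sequence for illustration only; not a real retron cassette.
    toy_cassette = "GATTACC" + "AAAAAA" + "GGTAATC"
    print(flanked_by_inverted_repeats(toy_cassette, 7))
```

A check like this only confirms that the two flanks could pair. It does not predict the full secondary structure that the RT recognizes.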
The msd region in the cassette is designed to include a specific sequence, the protospacer adjacent motif (PAM), as well as the particular DNA sequence that is intended to be inserted. The PAM is the site recognized by the Cas1-Cas2 protein complex, which then integrates the DNA into the genome. The DNA sequence chosen for the proof of concept was found to have particular affinity for integration into the E. coli CRISPR locus. Once integrated into the genome, the sequence serves to identify the presence of the inducer that triggers the expression of the system.
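The design step described above, a payload sequence accompanied by a PAM, can be sketched as follows. The motif `AAG` and its placement at the 5' end of the payload are assumptions for illustration, as are the helper names `build_insert` and `find_pam_sites`; the source does not state which PAM or layout the construct uses.

```python
def find_pam_sites(seq: str, pam: str = "AAG") -> list[int]:
    """Return the 0-based start index of every occurrence of `pam` in `seq`.

    Overlapping occurrences are reported as well.
    """
    seq, pam = seq.upper(), pam.upper()
    return [
        i for i in range(len(seq) - len(pam) + 1)
        if seq[i:i + len(pam)] == pam
    ]


def build_insert(payload: str, pam: str = "AAG") -> str:
    """Place the PAM directly 5' of the payload (an assumed layout)."""
    payload, pam = payload.upper(), pam.upper()
    if not payload or set(payload + pam) - set("ACGT"):
        raise ValueError("payload must be non-empty; only A, C, G, T are allowed")
    return pam + payload


if __name__ == "__main__":
    # Toy payload for illustration only; not the project's actual sequence.
    insert = build_insert("ctgaccgt")
    print(insert)
    print(find_pam_sites(insert))
```

Scanning the payload with `find_pam_sites` before assembly is one way to notice additional copies of the motif inside the sequence to be stored.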
Module 2
Cas1-Cas2 Complex
Module 3
Genomic Spacer Acquisition
References