Barcoding and Library Preparation
Each sequencing technology relies on its own mechanism. Nanopore sequencing, a third-generation technology, uses pores through which a nucleic acid strand is pulled so that its sequence can be read. To ensure that nucleic acids pass through the pore at the correct speed and in the correct orientation, adaptors carrying motor proteins must be attached to the ends of the cDNA molecules. The motor protein then binds to the pore and feeds the molecule through it [1].
Besides the adaptors, barcodes are also attached to the cDNA. In our application, two different samples are sequenced simultaneously on one flow cell. To distinguish which molecule belongs to which sample, barcodes (short DNA fragments of known sequence) are ligated to the cDNA. Subsequent bioinformatic analysis sorts the reads according to their barcodes and assigns them to the two samples.
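The barcode-based sorting described above can be sketched in a few lines of Python. This is only an illustration and not the demultiplexing software used in this project: the barcode sequences, the function names, the 40-base search window and the mismatch tolerance are all hypothetical, and only the forward strand at the start of the read is searched.

```python
from collections import defaultdict

# Hypothetical barcode sequences, for illustration only;
# a real barcoding kit defines its own sequences.
BARCODES = {
    "sample_1": "ACGTACGTACGTAAAA",
    "sample_2": "TTGGCCAATTGGCCAA",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, barcodes=BARCODES, window=40, max_mismatches=2):
    """Return the sample whose barcode best matches the start of the read.

    The barcode is searched at every offset within the first `window`
    bases. Reads without a match within `max_mismatches` mismatches
    are labelled 'unclassified'.
    """
    head = read[:window]
    best_sample, best_dist = "unclassified", max_mismatches + 1
    for sample, barcode in barcodes.items():
        for start in range(len(head) - len(barcode) + 1):
            dist = hamming(head[start:start + len(barcode)], barcode)
            if dist < best_dist:
                best_sample, best_dist = sample, dist
    return best_sample


def demultiplex(reads):
    """Group reads into bins keyed by the assigned sample name."""
    bins = defaultdict(list)
    for read in reads:
        bins[assign_read(read)].append(read)
    return dict(bins)
```

Allowing a small number of mismatches is a deliberate choice in this sketch, since a single sequencing error inside the barcode would otherwise send the read to the unclassified bin.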
Experiment
Initially, the cDNA is treated with Ultra II End Prep (NEB), which performs end repair and dA-tailing. End repair ensures that all fragments have blunt ends with no overhangs; tailing then adds a non-templated dAMP to each 3´ end. This dA is complementary to the dT on the barcodes, which are ligated in the subsequent step using Blunt/TA ligase. After barcode ligation, the adaptors are ligated using the Quick T4 Ligation Kit. Once it has passed the checkpoint, the library is ready to be loaded into the flow cell.
Results
At this stage, the small amount of material limits the ways in which successful library preparation can be verified. Under normal circumstances, the quality of the cDNA library could be checked with Nanodrop. Because of the rather small volume and low concentration, however, it was decided to measure only the quantity, using Qubit (Nanodrop has been shown to be inaccurate at concentrations below 30 ng/µl). Table 1 shows the usual yields at the various steps.
Table 1. Approximate yields of material at various steps of library prep. Values show the most common yields obtained across the different library preparations, measured by Qubit.
Step | Amount [ng] |
---|---|
Input mRNA (per sample) | 250 |
Output cDNA (per sample) | 550 |
After End-Prep (per sample) | 400 |
After barcoding (per sample) | 350 |
Pooled (both samples) | 700 |
Library (both samples) | 350 |
Troubleshooting: Are adaptors/barcodes attached properly?
During library prep, about 20% of the material was usually lost in the bead purification step. Notably, 50% of the material was lost in the final purification of the library. This step should be optimized, as it could be one of the reasons for the low throughput.
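The losses quoted above follow directly from the values in Table 1. A minimal sketch of the arithmetic is given here; the function name is our own, and per-sample and pooled values are kept in separate lists so that only comparable amounts are compared.

```python
# Approximate Qubit yields from Table 1, in ng.
per_sample = [
    ("Output cDNA", 550),
    ("After End-Prep", 400),
    ("After barcoding", 350),
]
pooled = [
    ("Pooled", 700),
    ("Library", 350),
]


def step_losses(steps):
    """Return (step name, percent lost relative to the previous step)."""
    losses = []
    for (_, before), (name, after) in zip(steps, steps[1:]):
        losses.append((name, round(100 * (before - after) / before, 1)))
    return losses


for name, loss in step_losses(per_sample) + step_losses(pooled):
    print(f"{name}: {loss}% lost")
```

For Table 1 this gives losses of 27.3% and 12.5% for the two per-sample steps, which average to roughly 20%, and 50.0% for the final library purification.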
Hypothesis
It was not certain whether the barcodes and adaptors were attached properly. If any of these steps fails, all subsequent steps would most likely fail as well, leading to low sequencing throughput. To test whether the library preparation had been done correctly, we prepared a library from standard lambda phage DNA (provided with the kit for troubleshooting). Table 2 shows the yields at the various steps. Notably, a large amount of material is lost at the adaptor ligation step.
Table 2. Yields at the various steps.
Step | Yield [ng] |
---|---|
Input gDNA | 870 |
After End-prep | 750 |
Adaptor ligation | 450 |
Figure 1. Sequencing throughput. Actively sequencing pores are shown in light green, currently empty pores waiting for a molecule in dark green, and inactive pores in blue and other colours.
The results in Figure 1 suggest that relatively little DNA was available for sequencing (low sequencing throughput), which may be caused by loss of material during the bead purification step.
Figure 2. Quality score of obtained reads.
The graph in Figure 2 shows that the reads are of high quality. In our actual sequencing runs, by contrast, reads were always of very low quality. This result suggests that the low quality and low number of passed reads are most likely due to the input material (cDNA library) rather than to the library preparation itself.
We have seen that when a library is prepared from genomic DNA, sequencing is of decent quality. The throughput is still rather low, but the quality of the reads is high, something that was never achieved with our own libraries. We can therefore assume that library preparation has one issue common to all experiments: most likely the loss of material during bead purification, which lowers throughput because not all pores are occupied at all times.
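For orientation, read quality scores are conventionally Phred-scaled, so a score Q corresponds to a per-base error probability of 10^(-Q/10). The sketch below shows how a mean read quality can be computed from a FASTQ quality string. It assumes the common Phred+33 encoding and averages error probabilities rather than scores; whether the software that produced Figure 2 used exactly this calculation is an assumption, and the function names are our own.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mean_read_quality(quality_string, offset=33):
    """Mean quality of one read from its FASTQ quality string.

    Assumes Phred+33 encoding. Error probabilities are averaged and
    converted back to a score, so a few poor bases lower the result
    more than a plain average of the scores would.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    probs = [phred_to_error(ord(c) - offset) for c in quality_string]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)
```

A read in which every base has Q10 (character `+` in Phred+33) returns a mean quality of 10, which corresponds to one expected error in ten bases.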
Discussion
Library preparation is a complex procedure involving multiple enzymes and purification steps. Decreased efficiency of library preparation can result from the failure of any of these steps. The major issues with the prepared libraries have been low sequencing throughput and low read quality. We therefore tested whether the issue was connected to our samples or to the library prep itself. Since preparing a library from the supplied phage DNA was successful (high-quality reads, decent throughput), we concluded that the issue was in fact in our input material. This was later confirmed: the libraries were contaminated with RNA, as described in cDNA synthesis.
Even with RNA contamination as the potential explanation for the low-quality reads (RNA is being sequenced using an algorithm for DNA, so the bases are not recognized), the problem of low throughput persisted. Major losses occur during library prep (up to 75%). According to Oxford Nanopore, this loss is expected. The question is whether it would be worthwhile to increase the input material above the manufacturer's recommendation to achieve higher throughput.
Sequencing with Oxford Nanopore has been used mainly for long fragments of genomic DNA. In our application we aimed to sequence very short cDNA fragments (about 1 kb on average). As this application is relatively new, we assume the process might not be fully optimized for it (e.g. retention of small fragments by beads, amount of input library).
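The question of increased input can be framed as simple arithmetic. The sketch below estimates how much input would be needed to reach a given library amount, under the untested assumption that the recovery fraction stays constant when the input is scaled up. The function names and the target amount in the example are hypothetical.

```python
def overall_recovery(input_ng, output_ng):
    """Fraction of the input mass that is recovered in the final library."""
    return output_ng / input_ng


def required_input(target_ng, recovery):
    """Input mass needed to obtain `target_ng`, assuming constant recovery."""
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in the interval (0, 1]")
    return target_ng / recovery


# Table 1: two samples of about 550 ng cDNA give about 350 ng of library.
print(f"Recovery: {overall_recovery(2 * 550, 350):.0%}")

# With a 75% loss (recovery of 0.25), a hypothetical target of 700 ng
# of library would require 2800 ng of input.
print(f"Required input: {required_input(700, 0.25):.0f} ng")
```

Whether recovery actually scales linearly with input, for example because of the binding capacity of the beads, would have to be checked experimentally.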
Additional troubleshooting would need to be performed to adjust the protocols provided by Oxford Nanopore to our application, which was unfortunately not possible in the course of this project due to budgetary and time restrictions.
Conclusion
Most issues connected with low sequencing throughput link back to contamination of the library with RNA. If this issue were resolved, sequencing with sufficient throughput and quality would be possible, as shown by the example of lambda phage gDNA.
References
[1] Oxford Nanopore, DNA: nanopore sequencing, [online], 2018, https://nanoporetech.com/applications/dna-nanopore-sequencing