Team:AFCM-Egypt/Machine Learning

 

Deep Learning Model to classify DNA oligonucleotides according to TLR binding

Code Availability: Click Here



The aim of this deep learning is to provide an efficient classifier for predicting the likelihood of TLR binding of various DNA oligonucleotides through sequence features of binding motifs and CpG content.
Analysis Summary

Dataset
For the purpose of classification a dataset of 1,000,000 random ODNs varying from 6 to 20 nucleotides to VaccineDA -based on literature mining of TLR binding ODNs- which assigned immune-modulatory or non immune-modulatory in length and then submitting this dataset to R tensorflow to build the model. After down sampling a dataset of total 25,000 data points assigned randomly into test and training data. Each ODN will be assigned an encoded vector of equal length to the sequence of that ODN.


ODN classification model
We have categorized ODN into binders or non-binders according to Vaccine SVM algorithm. Hence, the classification predicts which ODN will bind to TLR or not based on testing through A deep feedforward fully connected ANN.


R packages for deep learning

We have installed essential R packages for deep learning including, TensorFlow, K Keras, tidyverse and DNAshapeR.


Network model definitions
We have implemented the sequential model through Keras which includes a dense layer which defines a standard neural network with proper connections between input (ODN vector) and output (binding classification) followed by a dropout layer which helps to prevent overfitting the model.



Figure shows performance plot of the FFN classifier


Conclusions

In this model we have provided a straightforward example of deep learning usage in classifying DNA oligos on the basis of TLR binding which could facilitate studies of metagenomic DNA effects on innate immunity and inflammatory pathways of various human diseases.
Although the accuracy of prediction have gone up for approximately 95% of accuracy, we still need to develop better deep learning models to avoid the over fluctuations of complex deep learning models during the process of parameter adjustment, we also need to train our models on curated datasets of proved significant as this model could be considered a model training on a predictive model despite the fact of including curated ODNs that have been well proven to be TLR-binding ligands.