Team:Tongji China/Modeling


Acknowledgement: CPU_China. This part was built together with Team CPU_China, and we thank them for their collaboration!


Phase 1. Bayesian statistics

We use Bayesian statistics to predict which type of mutation is most likely to produce peptides that bind MHC strongly, based on the summed affinity for each mutation site and each allele type.

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation that views probability as the limit of the relative frequency of an event after a large number of trials.

Bayes' theorem is a fundamental theorem in Bayesian statistics, as it is used by Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events A and B, the conditional probability of A given that B is true is expressed as follows:

\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) \neq 0
\]

In the above equation, A usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and B represents the evidence, or new data that is to be considered (such as the result of a series of coin flips). P(A) is the prior probability of A, which expresses one's beliefs about A before the evidence is considered. The prior probability may also quantify prior knowledge or information about A. P(B|A) is the likelihood function, which can be interpreted as the probability of the evidence B given that A is true. The likelihood quantifies the extent to which the evidence B supports the proposition A. P(A|B) is the posterior probability, the probability of the proposition A after taking the evidence B into account. Essentially, Bayes' theorem updates one's prior beliefs P(A) after considering the new evidence B.
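As a small worked illustration of the update, suppose (purely as an assumption for this example) that A is "the coin is fair" with prior P(A) = 0.5, that the only alternative is a two-headed coin, and that the evidence B is a single flip landing on heads. Then P(B|A) = 0.5 and the overall probability of heads is P(B) = 0.75, so:

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
            = \frac{0.5 \times 0.5}{0.75}
            = \frac{1}{3}
\]
```

Seeing heads therefore lowers the belief that the coin is fair from 1/2 to 1/3, because heads is more likely under the alternative.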

The probability of the evidence P(B) can be calculated using the law of total probability. If {A1, A2, …, An} is a partition of the sample space, which is the set of all outcomes of an experiment, then,

\[
P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)
\]
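The following minimal Python sketch shows how the law of total probability normalizes a Bayesian update over a partition of hypotheses. The function name `posterior`, the three coin hypotheses, and the observed data (8 heads in 10 flips) are assumptions chosen for illustration; they are not part of our model.

```python
from math import comb


def posterior(priors, likelihoods):
    """Return P(A_i | B) for every hypothesis A_i in a partition.

    priors:      dict mapping hypothesis -> P(A_i), must sum to 1
    likelihoods: dict mapping hypothesis -> P(B | A_i)
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    # Bayes' theorem applied to each hypothesis
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical partition: three possible values for the coin's heads probability.
heads_prob = {"fair": 0.5, "biased_heads": 0.8, "biased_tails": 0.2}
priors = {h: 1 / 3 for h in heads_prob}  # equal belief before any data

# Evidence B (assumed): 8 heads observed in 10 flips, binomial likelihood.
n_flips, n_heads = 10, 8
likelihoods = {
    h: comb(n_flips, n_heads) * p**n_heads * (1 - p) ** (n_flips - n_heads)
    for h, p in heads_prob.items()
}

post = posterior(priors, likelihoods)
for hypothesis, prob in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis}: {prob:.4f}")
```

Because the denominator is the same for every hypothesis, the posteriors always sum to 1, and the hypothesis that best explains the data gains belief at the expense of the others.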

The formulation of statistical models using Bayesian statistics has the identifying feature of requiring the specification of prior distributions for any unknown parameters. Indeed, parameters of prior distributions may themselves have prior distributions, leading to Bayesian hierarchical modeling, or may be interrelated, leading to Bayesian networks.

Phase 2. Prediction of mutations most likely to bind MHC I

We use Bayesian statistics to predict which type of mutation is most likely to produce peptides that bind MHC strongly, based on the summed affinity for each mutation site and each allele type.
The heat map below shows the summed affinity for each combination of allele type and mutation.

Figure.Model.1 Heatmap of the affinity of each allele type and each mutation.


The heat map above shows that the mutation sites at the bottom have large summed affinity values, while some mutation sites in the middle have small ones. Because the "affinity" value represents the amount of peptide binding to a given amount of MHC-I molecule, a lower "affinity" value means stronger binding to MHC-I. So in the heat map, if a cell's color is close to yellow, the mutation site combined with the allele of that cell may produce peptides that bind MHC strongly. Conversely, if a cell's color is close to blue, the mutation site combined with the allele of that cell probably cannot produce MHC-I binding peptides.
If a colorectal cancer patient is found to have an immune response to our medicine, this model can predict which mutation plays the strongest role in the cancer. If a patient's mutation sites are already known, the model can also help predict which site is the best one for peptide design for that patient, which supports individualized therapy.
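The aggregation behind the heat map, and the per-patient choice of site, can be sketched roughly as follows. This is only an illustration under assumptions: the records, the mutation labels (`MUT_1` and so on), the allele names, and the helper names `sum_affinity` and `best_mutation` are hypothetical and do not reflect the actual layout of our results file.

```python
from collections import defaultdict

# Illustrative records only: (mutation, allele, predicted affinity value).
# Lower values mean stronger binding to MHC-I.
records = [
    ("MUT_1", "HLA-A*02:01", 120.0),
    ("MUT_1", "HLA-A*02:01", 80.0),
    ("MUT_1", "HLA-B*07:02", 900.0),
    ("MUT_2", "HLA-A*02:01", 1500.0),
    ("MUT_2", "HLA-B*07:02", 300.0),
    ("MUT_3", "HLA-A*02:01", 50.0),
    ("MUT_3", "HLA-B*07:02", 2500.0),
]


def sum_affinity(rows):
    """Sum the affinity values for each (mutation, allele) pair.

    Each entry of the result corresponds to one cell of the heat map.
    """
    totals = defaultdict(float)
    for mutation, allele, affinity in rows:
        totals[(mutation, allele)] += affinity
    return dict(totals)


def best_mutation(totals, known_mutations, allele):
    """Pick the known mutation with the lowest summed affinity for an allele.

    Returns None when none of the known mutations has data for that allele.
    """
    candidates = {
        m: totals[(m, allele)] for m in known_mutations if (m, allele) in totals
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


totals = sum_affinity(records)
choice = best_mutation(totals, ["MUT_1", "MUT_2"], "HLA-A*02:01")
print(choice)
```

Choosing the minimum follows the convention above that a lower summed "affinity" value indicates stronger binding; a patient carrying several alleles would need the selection repeated for each allele.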


Reference:
https://en.wikipedia.org/wiki/Bayesian_statistics


affinity_conseqeunce.csv Click to download the consequence file