A novel algorithm predicts new tumor antigens

Neoantigens are novel peptide sequences produced by sources such as somatic mutations in tumors. They can trigger recognition by T cells when loaded onto major histocompatibility complex (MHC) molecules. Once T cells recognize a neoantigen, they can signal cell death and mount an immune response against the tumor. Several studies have shown the potential of neoantigen-based immunotherapy for cancer treatment, and numerous clinical trials have been initiated.

Accurate neoantigen prediction and prioritization are critical to understanding tumor immunology, immune checkpoint blockade therapies, and the design of personalized vaccines and T-cell therapies. The effectiveness of vaccines against neoantigens depends in part on whether the sequences presented to T cells have been previously exposed to the immune system and are subject to central tolerance.

Although various mutation types are being explored as sources of neoantigens, the vast majority of somatic mutations currently identified as neoantigen sources are single nucleotide variants (SNVs). Because an SNV typically changes only a single amino acid, the differences between wild-type and mutant peptides are subtle and must be handled carefully. A variable that is currently overlooked in neoantigen prediction is the position of the mutation relative to the anchor positions of the patient-specific MHC molecules. Within the novel peptide, some positions are presented to the T cell receptor for recognition, while others anchor the peptide to the MHC. Predicting T cell responses depends heavily on taking these regions into account.

On April 7, 2023, the research group of Malachi Griffith and Obi Griffith at Washington University in St. Louis published a paper in Science Immunology titled "Computational prediction of MHC anchor locations guides neoantigen identification and prioritization."

To predict the anchor positions for different HLA alleles, the authors assembled an HLA-peptide dataset containing strongly binding peptides with a median IC50 of less than 500 nM. The authors identified 609,807 peptides (corresponding to 328 HLA alleles in 1,443 tumor samples) from TCGA and from a dataset of mutations in patients with different cancers, including lymphoma, glioblastoma, breast cancer, and melanoma.

First, for each HLA allele for which data were obtained, the authors grouped the peptides by length. Each peptide was then mutated in silico at every position to every possible amino acid, and the binding affinity of each resulting peptide was predicted.

The authors compared these predictions with the binding affinity of the unmutated peptide sequences, and thus assessed how the interaction between a strongly binding peptide and the MHC molecule changed when each individual position was mutated. A large change at a given position indicated that the amino acid at that position was more likely to act as an anchor. Conversely, little or no change in binding affinity indicated that the position was less likely to act as an anchor. The anchor score for each position was obtained by summing these changes across all the peptides analyzed for each HLA allele. The results showed that anchor positions differed significantly between HLA alleles.
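The mutation scan described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_predict_ic50` is a hypothetical stand-in for a real binding predictor, the preference for L at position 2 and V at the last position is invented for illustration, and scoring each position by the mean absolute change in predicted IC50 is an assumption about how "change in binding" might be measured.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_predict_ic50(peptide):
    """Hypothetical stand-in for a real binding predictor (IC50 in nM).

    Pretends the allele prefers L at position 2 and V at the last position.
    """
    ic50 = 50.0
    if peptide[1] != "L":
        ic50 *= 20
    if peptide[-1] != "V":
        ic50 *= 10
    return ic50


def single_substitutions(peptide, position):
    """Yield every peptide differing from `peptide` only at `position` (0-based)."""
    for aa in AMINO_ACIDS:
        if aa != peptide[position]:
            yield peptide[:position] + aa + peptide[position + 1:]


def anchor_scores(peptides, predict=toy_predict_ic50):
    """Per-position scores for same-length peptides of one allele.

    For each peptide and position, take the mean absolute change in predicted
    IC50 over all substitutions (an assumed metric), then sum across peptides.
    """
    if not peptides:
        return []
    length = len(peptides[0])
    scores = [0.0] * length
    for peptide in peptides:
        if len(peptide) != length:
            raise ValueError("group peptides by length before scoring")
        baseline = predict(peptide)
        for pos in range(length):
            changes = [abs(predict(mutant) - baseline)
                       for mutant in single_substitutions(peptide, pos)]
            scores[pos] += mean(changes)
    return scores


def normalize(scores):
    """Rescale scores so they sum to 1, which eases comparison across alleles."""
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


if __name__ == "__main__":
    strong_binders = ["ALAAAAAAV", "GLGGGGGGV"]  # toy 9-mers
    for pos, score in enumerate(normalize(anchor_scores(strong_binders)), start=1):
        print(f"position {pos}: {score:.2f}")
```

With the toy predictor, only position 2 and the final position receive non-zero scores, which mirrors the idea that positions where substitutions disrupt binding are the likely anchors. A real analysis would replace `toy_predict_ic50` with an actual prediction tool and run the scan separately for each allele and peptide length.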

To verify the anchor point predictions, the authors collected X-ray crystal structures of MHC molecules with bound peptides and used two computational analysis methods:

- Measurement of the physical distance between the peptide and the MHC binding groove

- Calculation of the solvent accessible surface area (SASA) of peptide residues
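The distance measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates are invented rather than taken from a crystal structure, the function names are hypothetical, and taking the minimum atom-to-atom distance per peptide position is one plausible way to define "distance to the binding groove", not necessarily the authors' definition.

```python
import math


def min_distance_per_residue(peptide_atoms, mhc_atoms):
    """Closest approach of each peptide position to any MHC atom.

    peptide_atoms: dict mapping peptide position -> list of (x, y, z) tuples
    mhc_atoms:     list of (x, y, z) tuples for the binding-groove atoms
    Distances are in the same unit as the coordinates (angstroms in PDB files).
    """
    return {
        position: min(math.dist(atom, mhc_atom)
                      for atom in atoms
                      for mhc_atom in mhc_atoms)
        for position, atoms in peptide_atoms.items()
    }


def closest_first(distances):
    """Peptide positions ordered from nearest to farthest from the MHC."""
    return sorted(distances, key=distances.get)


if __name__ == "__main__":
    # Invented coordinates: position 2 sits deep in the groove, position 5 bulges out.
    peptide = {2: [(0.0, 0.0, 1.0)], 5: [(0.0, 0.0, 5.0)]}
    groove = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(closest_first(min_distance_per_residue(peptide, groove)))
```

Positions that sit close to the groove, and that also show low solvent accessibility, would be expected to coincide with the predicted anchors.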

The authors further validated the predictions experimentally for 8 representative HLA alleles, using a cell-based stability assay and an IC50 competition binding assay.

To further analyze the impact of anchor positions, the authors predicted neoantigens for 923 selected TCGA patient-HLA allele pairs. The results showed that 5.7%–38.3% of neoantigens could be misclassified when comparing datasets screened under different criteria. The misclassified peptides include some that may be subject to central tolerance as well as some that may in fact be strong candidates.
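To make the role of anchor positions in prioritization concrete, here is a hedged Python sketch of an anchor-aware filter. The decision rule is an assumption and is not spelled out in the text above: it supposes that a mutation at an anchor position leaves the T cell-facing residues identical to the wild-type peptide, so the candidate is kept only if the wild-type peptide is a poor binder and was therefore unlikely to have induced central tolerance. The 500 nM cutoff reuses the strong-binder threshold mentioned earlier, and all names are hypothetical.

```python
BINDING_THRESHOLD_NM = 500.0  # assumed cutoff: IC50 below this counts as binding


def prioritize(mutation_position, anchor_positions, wt_ic50, mt_ic50,
               threshold=BINDING_THRESHOLD_NM):
    """Return (keep, reason) for one candidate neoantigen.

    mutation_position: 1-based position of the mutated residue in the peptide
    anchor_positions:  set of 1-based anchor positions for the patient's allele
    wt_ic50, mt_ic50:  predicted IC50 (nM) of the wild-type and mutant peptides
    """
    if mt_ic50 >= threshold:
        return False, "mutant peptide is not predicted to bind the MHC"
    if mutation_position not in anchor_positions:
        return True, "mutation faces the T cell receptor"
    if wt_ic50 >= threshold:
        return True, "mutation at anchor creates a binder; wild type was not presented"
    return False, "mutation at anchor, wild type also binds; possible central tolerance"


if __name__ == "__main__":
    anchors = {2, 9}  # toy anchor positions for a 9-mer
    candidates = [(5, 120.0, 80.0), (2, 100.0, 60.0), (2, 4000.0, 60.0)]
    for position, wt, mt in candidates:
        print(position, wt, mt, prioritize(position, anchors, wt, mt))
```

A filter that ignores anchor positions and looks only at mutant binding affinity would keep all three toy candidates, which illustrates how the two kinds of screening criteria can disagree on the same peptide.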