S. Kim1,†, H. S. Kim2,3,†, E. Kim1, M. G. Lee2, E. Shin4, S. Paik1,3, S. Kim1,*
1 Severance Biomedical Science Institute, Brain Korea 21 PLUS Project for Medical Sciences, Yonsei
University College of Medicine, Seoul 03722, Korea
2 Department of Pharmacology, Pharmacogenomic Research Center for Membrane Transporters, Brain Korea
21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, Seoul 03722, Korea
3 Yonsei Cancer Center, Division of Medical Oncology, Department of Internal Medicine, Yonsei University
College of Medicine, Seoul 03722, Korea
4 Graduate School of Medical Science and Engineering, KAIST, Daejeon 34141, Korea
* Correspondence to: Prof. Sangwoo Kim, Severance Biomedical Science Institute, Yonsei University College of
Medicine, Seoul 120-752, Korea.
† These authors contributed equally to this work.
Abstract
Background
Tumor-specific mutations form novel immunogenic peptides called neoantigens. Neoantigens can be used as a biomarker for predicting patient response to cancer immunotherapy. Although the predicted binding affinity (IC50) between a peptide and major histocompatibility complex class I (MHC-I) is currently used for neoantigen prediction, this approach yields a large number of false positives.
Materials and methods
We developed Neopepsee, a machine learning-based neoantigen prediction program for next-generation sequencing data. Given raw RNA-seq data and a list of somatic mutations, Neopepsee automatically extracts mutated peptide sequences and gene expression levels. We tested 14 immunogenicity features to construct a machine-learning classifier and compared its sensitivity and specificity with those of the conventional IC50-based methods. We then tested Neopepsee on independent data sets from melanoma, leukemia, and stomach cancer.
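The peptide extraction step can be illustrated with a minimal sketch. It assumes a single missense substitution, a 0-based residue position, and a fixed peptide length of 9. The function name `mutant_peptides` and its parameters are hypothetical and are not part of Neopepsee; the program's actual extraction logic is not described here.

```python
def mutant_peptides(protein: str, pos: int, alt: str, k: int = 9) -> list[str]:
    """Return every k-mer of the mutated protein that contains the altered residue.

    protein: wild-type amino acid sequence (one-letter codes)
    pos:     0-based index of the substituted residue (assumption of this sketch)
    alt:     mutant amino acid
    k:       peptide length; 9 is a common length for MHC-I ligands
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")
    if len(alt) != 1:
        raise ValueError("this sketch handles single-residue substitutions only")

    mutated = protein[:pos] + alt + protein[pos + 1:]

    # A window starting at s covers residues s .. s+k-1, so it contains pos
    # exactly when pos-k+1 <= s <= pos, clipped to the sequence boundaries.
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(mutated) - k)
    return [mutated[s:s + k] for s in range(first_start, last_start + 1)]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    for peptide in mutant_peptides("ACDEFGHIKLMNPQRSTVWY", pos=10, alt="A"):
        print(peptide)
```

A mutation far from either end of the protein yields k candidate peptides; mutations near a terminus yield fewer because windows cannot extend past the sequence.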
Results
Nine of the 14 immunogenicity features, those that were informative and mutually independent, were used to construct the machine-learning classifiers. Neopepsee provides a rich annotation of candidate peptides with 87 immunogenicity-related values, including IC50, expression levels of neopeptides and immune regulatory genes (e.g., PD1, PD-L1), matched epitope sequences, and a three-level (high, medium, and low) call for neoantigen probability. Compared with the conventional methods, Neopepsee improved sensitivity and, most notably, improved specificity by 2- to 3-fold. Tests with validated datasets and independently proven neoantigens confirmed the improved performance in melanoma and chronic lymphocytic leukemia. Additionally, we found that sequence similarity of proteins to known pathogenic epitopes is a novel feature for classification. Application of Neopepsee to 224 public stomach adenocarcinoma datasets predicted ∼7 neoantigens per patient, and this neoantigen burden was correlated with patient prognosis.
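For readers unfamiliar with the two metrics, the sketch below shows how sensitivity and specificity would be computed for an IC50-only baseline. The 500 nM cutoff is a commonly used convention for calling MHC-I binders and is an assumption here, not a value taken from this abstract. The labels and IC50 values are synthetic toy numbers for illustration only, and the helper names are hypothetical.

```python
import math


def sensitivity_specificity(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for paired true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)

    # Sensitivity: fraction of true neoantigens that are called positive.
    sens = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of non-immunogenic peptides that are called negative.
    spec = tn / (tn + fp) if (tn + fp) else math.nan
    return sens, spec


def ic50_baseline(ic50_nm: list[float], cutoff_nm: float = 500.0) -> list[bool]:
    """Call a peptide positive when its predicted IC50 is below the cutoff.

    cutoff_nm defaults to 500 nM, a widely used convention (assumption).
    """
    return [value < cutoff_nm for value in ic50_nm]


if __name__ == "__main__":
    # Synthetic toy data: True marks a validated immunogenic peptide.
    truth = [True, True, False, False, False]
    ic50 = [50.0, 800.0, 100.0, 300.0, 2000.0]
    sens, spec = sensitivity_specificity(truth, ic50_baseline(ic50))
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Strong predicted binders that are not immunogenic count as false positives and lower specificity, which is the weakness of IC50-only filtering that the abstract describes.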
Conclusions
Neopepsee can detect neoantigen candidates with fewer false positives and can be used to help determine patient prognosis. We expect that retrieval of neoantigen sequences with Neopepsee will help advance research on next-generation cancer immunotherapies, predictive biomarkers, and personalized cancer vaccines.
Keywords: Cancer, Classification, Immunoinformatics, Neoantigen, Next-generation sequencing