Donghyo Kim 1†, Doyeon Ha 1†, Kwanghwan Lee 1, Heetak Lee 1, Inhae Kim 2, Sanguk Kim 1,3,4
1Department of Life Sciences.
2ImmunoBiome Inc., Pohang, South Korea.
3Artificial Intelligence Graduate Program, Pohang University of Science and Technology, Pohang 790-784, South Korea.
4Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 120-149, South Korea.
†These authors contributed equally to this work.
Corresponding author: Sanguk Kim
Identifying cancer type-specific driver mutations is crucial for illuminating the distinct pathologic mechanisms of different tumors and for providing opportunities for patient-specific treatment. Although many computational methods have been developed to predict driver mutations in a cancer type-specific manner, there is still room for improvement. Here, we devise a novel feature based on sequence co-evolution analysis to identify cancer type-specific driver mutations, and we use it to construct a machine learning (ML) model with state-of-the-art performance. Specifically, on 28,000 tumor samples across 66 cancer types, our ML framework outperformed current leading methods for detecting cancer driver mutations. Interestingly, the cancer mutations identified by the sequence co-evolution feature frequently occur at interfaces that mediate tissue-specific protein-protein interactions, which are known to contribute to tissue-specific oncogenesis. Moreover, through a user-friendly website (http://sbi.postech.ac.kr/w/cancerCE), we provide precalculated oncogenic potential for available human proteins, with prediction scores for all possible residue alterations. This work will facilitate the identification of cancer type-specific driver mutations in newly sequenced tumor samples.