Hanbitsa Paper
Yonsei University, University of California, Berkeley
Minseung Kim (a,b,1) and Sung-Hou Kim (a,b,c,2)
(a) Institute of Life Science and Technology and (b) Department of Integrative OMICS for Biomedical Sciences, Graduate School, Yonsei University, Seoul 120-749, Republic of Korea; and (c) Department of Chemistry, University of California, Berkeley, CA 94720
Abstract
An empirical approach is presented for predicting the genomic susceptibility of an individual to the most likely one among nine traits, consisting of eight major cancer classes plus a healthy trait. We use four prediction methods by applying two supervised learning algorithms to two different descriptors of common genomic variations (the profiles of genotypes of SNPs and SNP syntaxes with low P values or low frequencies) of each individual genome from normal cells. All four methods made correct predictions substantially better than random predictions for most cancer classes, but not for some others. Combining the four results using Bayesian inference gave better overall predictions than any individual method. The multiclass accuracy of the combined prediction ranges from 33% to 56% depending on the cancer classes of the testing sets, compared with 11% for a random prediction among nine traits. Despite the limited SNP data currently available and the absence of rare SNPs in public databases, the results suggest that the framework of this approach or its improvement can predict cancer susceptibility with probability estimates useful for making health decisions for individuals or for a population.
Keywords: genetic risk prediction, genomic risk prediction, cancer risk, multiclass prediction, cancer probability
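The abstract states that the four prediction results are combined using Bayesian inference, but it does not give the formulation. As an illustration only, the sketch below shows one common way such a combination could work: treating the four methods as conditionally independent given the true trait and multiplying their per-trait probabilities (a naive Bayes style fusion). The function name `combine_posteriors`, the uniform prior, and the example probabilities are assumptions made for this sketch, not details taken from the paper.

```python
import math

TRAITS = 9  # eight cancer classes plus a healthy trait, as in the abstract


def combine_posteriors(method_probs, prior=None):
    """Fuse per-method trait probabilities into one distribution.

    Assumption (not from the paper): the methods are conditionally
    independent given the true trait, so the combined posterior is
    proportional to prior(c) * product_k [P_k(c) / prior(c)].
    """
    n = len(method_probs[0])
    if prior is None:
        prior = [1.0 / n] * n  # hypothetical uniform prior over traits

    # Work in log space to avoid underflow when multiplying probabilities.
    log_scores = []
    for c in range(n):
        score = math.log(prior[c])
        for probs in method_probs:
            score += math.log(max(probs[c], 1e-12)) - math.log(prior[c])
        log_scores.append(score)

    # Normalize with the max-subtraction trick for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: each of four methods weakly favors trait index 3.
one_method = [0.1] * TRAITS
one_method[3] = 0.2
combined = combine_posteriors([one_method] * 4)
best_trait = max(range(TRAITS), key=lambda c: combined[c])

# Chance level for picking one of nine traits at random (about 11%).
random_baseline = 1.0 / TRAITS
```

In this toy case, four weak but agreeing votes reinforce each other, so the combined probability for the favored trait is higher than any single method assigned to it. Real methods are rarely fully independent, so a fusion like this can be overconfident; that caveat applies to the sketch, not necessarily to the authors' procedure.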
(1) Present address: Department of Computer Science, Genome Center, University of California, Davis, CA 95616.
(2) To whom correspondence should be addressed.
Author contributions: M.K. and S.-H.K. designed research; M.K. and S.-H.K. performed research; M.K. contributed new reagents/analytic tools; M.K. and S.-H.K. analyzed data; and M.K. and S.-H.K. wrote the paper.
Paper information