Seok-Ju Hahn (a,e), Suhyeon Kim (a,e), Young Sik Choi (c), Junghye Lee (a,b,*), and Jihun Kang (d,**)
(a) Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
(b) Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
(c) Division of Endocrinology, Department of Internal Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, Busan 49267, Republic of Korea
(d) Department of Family Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, Busan 49267, Republic of Korea
(e) Seok-Ju Hahn and Suhyeon Kim contributed equally to this work.
*Corresponding author: Junghye Lee
**Corresponding author: Jihun Kang
Abstract
Background: Previous work on predicting type 2 diabetes by integrating clinical and genetic factors has mostly focused on Western populations. In this study, we used a genome-wide polygenic risk score (gPRS) and serum metabolite data to predict type 2 diabetes risk in an Asian population.
Methods: We used data from 1425 participants in the Korean Genome and Epidemiology Study (KoGES) Ansan-Ansung cohort. For the gPRS analysis, genotypic and clinical information from the KoGES health examinee (n = 58,701) and KoGES cardiovascular disease association (n = 8105) sub-cohorts was included. Linkage disequilibrium analysis identified 239,062 genetic variants that were used to determine the gPRS, while the metabolites were selected using the Boruta algorithm. We used bootstrapped cross-validation to evaluate logistic regression and random forest (RF)-based machine learning models. Finally, we estimated the associations of the gPRS and the selected metabolites with homeostatic model assessment values for beta-cell function (HOMA-B) and insulin resistance (HOMA-IR).
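For readers unfamiliar with the two HOMA indices named in the Methods, the sketch below computes them with the widely used HOMA1 approximations. The abstract does not state which HOMA variant or units the authors used, so the formulas, the units (fasting glucose in mg/dL, fasting insulin in µU/mL), and the function names `homa_ir` and `homa_b` are assumptions made for illustration only.

```python
def homa_ir(glucose_mg_dl: float, insulin_uu_ml: float) -> float:
    """Insulin resistance index (HOMA1 approximation, assumed here).

    glucose_mg_dl: fasting plasma glucose in mg/dL
    insulin_uu_ml: fasting insulin in microunits per mL
    """
    return glucose_mg_dl * insulin_uu_ml / 405.0


def homa_b(glucose_mg_dl: float, insulin_uu_ml: float) -> float:
    """Beta-cell function index in percent (HOMA1 approximation, assumed here)."""
    if glucose_mg_dl <= 63.0:
        # The denominator is zero or negative at or below 63 mg/dL,
        # so the approximation is not defined there.
        raise ValueError("HOMA-B is undefined for glucose <= 63 mg/dL")
    return 360.0 * insulin_uu_ml / (glucose_mg_dl - 63.0)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(round(homa_ir(90.0, 10.0), 2))
    print(round(homa_b(90.0, 10.0), 1))
```

Higher HOMA-IR indicates greater insulin resistance, while lower HOMA-B indicates reduced beta-cell function, which is why the two indices can separate the contributions of different predictors.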
Findings: During the follow-up period (8.3 ± 2.8 years), 331 participants (23.2%) were diagnosed with type 2 diabetes. The areas under the curve of the RF-based models were 0.844, 0.876, and 0.883 for the model using only demographic and clinical factors, the model including the gPRS, and the model with both gPRS and metabolites, respectively. Incorporating the additional parameters in the latter two models improved classification by 11.7% and 4.2%, respectively. While the gPRS was significantly associated with the HOMA-B value, most metabolites were significantly associated with the HOMA-IR value.
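The area under the curve reported above can be read as the probability that a randomly chosen participant who developed diabetes receives a higher predicted risk than a randomly chosen participant who did not. The sketch below computes that quantity directly and adds a simple percentile bootstrap over the predictions. It is not the authors' bootstrapped cross-validation procedure, which the abstract does not detail; the function names `roc_auc` and `bootstrap_auc`, the number of resamples, and the toy inputs are all hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Mean AUC and a 95% percentile interval from resampled predictions."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    aucs.sort()
    lower = aucs[int(0.025 * (n_boot - 1))]
    upper = aucs[int(0.975 * (n_boot - 1))]
    return sum(aucs) / n_boot, lower, upper


if __name__ == "__main__":
    # Toy labels and predicted risks, not data from the study.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60]
    print(roc_auc(y, risk))
    print(bootstrap_auc(y, risk))
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a cohort of this size but would normally be replaced by a rank-based computation for larger data.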
Interpretation: Incorporating both gPRS and metabolite data led to enhanced type 2 diabetes risk prediction by capturing distinct etiologies of type 2 diabetes development. An RF-based model using clinical factors, gPRS, and metabolites predicted type 2 diabetes risk more accurately than the logistic regression-based model.