Shaherin Basith, Gwang Lee and Balachandran Manavalan
Corresponding authors: Gwang Lee, Balachandran Manavalan
Shaherin Basith is a research assistant professor in the Department of Physiology, Ajou University School of Medicine, Republic of Korea.
Gwang Lee is a professor in the Department of Physiology, Ajou University School of Medicine, Republic of Korea.
Balachandran Manavalan is an assistant professor in the Department of Physiology, Ajou University School of Medicine, Republic of Korea.
Abstract
Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation of lysine residues is one of the most important PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods have been developed for the in silico identification of Kace sites, a few of which are specific to prokaryotic species. Despite their advantages and performance, these methods have certain limitations. This study therefore proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), which contains six prokaryotic species-specific models for accurately identifying Kace sites. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. A systematic and rigorous feature selection approach was then used to identify the optimal feature set independently for each of five tree-based ensemble algorithms, and a baseline model was built with each algorithm for each species. Finally, the predicted values from the baseline models were used as input to train an appropriate classifier under the stacking strategy, yielding STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictors on independent tests. To make the STALLION models directly accessible, a user-friendly online predictor was implemented and is available at: http://thegleelab.org/STALLION.
Keywords: lysine acetylation sites, bioinformatics, stacking strategy, machine learning, feature optimization, performance assessment
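The abstract describes encoding the sequence patterns around candidate lysines before any model is trained. The following is a minimal sketch of that first step only, under stated assumptions: the flank length, the `X` padding character, the function names, and the two encodings shown (amino acid composition and a binary one-hot encoding) are illustrative choices that the abstract does not specify, and they are not claimed to be among the 11 encodings used in STALLION.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed placeholder for positions beyond the sequence ends


def lysine_windows(sequence, flank=10):
    """Yield (1-based position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    The default flank of 10 is an assumption for illustration.
    """
    sequence = sequence.upper()
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in the sequence is index i + flank in the padded string,
            # so the window starts at i and ends at i + 2 * flank (inclusive).
            yield i + 1, padded[i : i + 2 * flank + 1]


def composition(window):
    """Return the 20-dimensional amino acid composition of a window.

    Padding characters are ignored when computing the frequencies.
    """
    counts = Counter(window)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def one_hot(window):
    """Return a flat binary encoding with 20 values per window position.

    A padding character is encoded as 20 zeros.
    """
    return [1 if residue == a else 0 for residue in window for a in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any dataset.
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, composition(window))
```

Feature vectors of this kind would then be passed to feature selection and to the baseline classifiers; the stacking step itself is not shown here.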