Hunyong Cho1†, Yixiang Qu2†, Chuwen Liu3, Boyang Tang4, Ruiqi Lyu5, Bridget M. Lin6, Jeffrey Roach7, M. Andrea Azcarate-Peril8, Apoena Aguiar Ribeiro9, Michael I. Love10, Kimon Divaris11 and Di Wu12
1Hunyong Cho was a PhD student at the Department of Biostatistics, Gillings School of Global Public Health at University of North Carolina at Chapel Hill, and is currently at Amazon.
2Yixiang Qu is a PhD student at the Department of Biostatistics, Gillings School of Global Public Health at University of North Carolina at Chapel Hill.
3Chuwen Liu is a PhD student at the Department of Biostatistics, Gillings School of Global Public Health at University of North Carolina at Chapel Hill.
4Boyang Tang is a PhD student at the Department of Statistics, University of Connecticut.
5Ruiqi Lyu is a PhD student at the School of Computer Science, Carnegie Mellon University.
6Bridget Lin is a PhD student at the Department of Biostatistics, Gillings School of Global Public Health at University of North Carolina at Chapel Hill.
7Jeffrey Roach is a Senior Scientific Research Associate in Research Computing at University of North Carolina at Chapel Hill.
8M. Andrea Azcarate-Peril is an Associate Professor in the School of Medicine and the Director of the Microbiome Core at University of North Carolina at Chapel Hill.
9Apoena de Aguiar Ribeiro is an Associate Professor of Oral Microbiology in the Adam School of Dentistry at University of North Carolina at Chapel Hill.
10Michael I. Love is an Associate Professor of Genetics, Biostatistics, and Computer Science at the University of North Carolina at Chapel Hill.
11Kimon Divaris is a Professor in the Division of Pediatric and Public Health at the Adam School of Dentistry and in the Department of Epidemiology at the University of North Carolina at Chapel Hill.
12Di Wu is an Associate Professor in the Department of Biostatistics and Division of Oral Craniofacial Health Science at the University of North Carolina at Chapel Hill.
†Hunyong Cho and Yixiang Qu are co-first authors.
*Corresponding author: Di Wu
Abstract
Understanding the function of the human microbiome is important, but the development of statistical methods specifically for microbial gene expression data (i.e. metatranscriptomics) is still in its infancy. Many differential expression analysis methods currently in use were designed for other data types and have not been evaluated in metatranscriptomic settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate the performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal–Wallis and two-part Kruskal–Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), and validation was sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples while reasonably controlling type I error. By contrast, MAST was hampered by inflated type I error. Applying the LN and LB tests to the ECC study data, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive method evaluation offers practical guidance on selecting appropriate methods for rigorous differential expression analysis in metatranscriptomics. Selecting an optimal method increases the chance of detecting true signals while minimizing the chance of claiming false ones.
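The abstract scores each method by type I error, false discovery rate and sensitivity on simulated data. The snippet below is a minimal sketch of how these three metrics can be computed for one simulated dataset. It assumes that each method returns one raw p-value per gene, that the simulation records which genes are truly differentially expressed (DE), that type I error is the fraction of non-DE genes with raw p < alpha, and that discoveries are called using Benjamini–Hochberg adjusted p-values below alpha. The function names `bh_adjust` and `evaluate` are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence


def bh_adjust(pvalues: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


def evaluate(pvalues: Sequence[float], is_de: Sequence[bool], alpha: float = 0.05) -> dict:
    """Empirical type I error, FDR and sensitivity for one simulated dataset.

    Hypothetical helper. pvalues: raw p-values, one per gene.
    is_de: True where the gene was simulated as differentially expressed.
    """
    # Type I error (assumption): share of truly null genes with raw p < alpha.
    null_p = [p for p, de in zip(pvalues, is_de) if not de]
    type1 = sum(p < alpha for p in null_p) / len(null_p) if null_p else float("nan")

    # Discoveries are genes whose BH-adjusted p-value falls below alpha.
    called = [q < alpha for q in bh_adjust(pvalues)]
    n_called = sum(called)
    false_calls = sum(c and not de for c, de in zip(called, is_de))
    true_calls = sum(c and de for c, de in zip(called, is_de))
    n_de = sum(is_de)

    # FDR is taken as 0 when nothing is called; sensitivity is undefined without DE genes.
    fdr = false_calls / n_called if n_called else 0.0
    sensitivity = true_calls / n_de if n_de else float("nan")
    return {"type1_error": type1, "fdr": fdr, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Toy example: six genes, three of them simulated as DE.
    p = [0.001, 0.01, 0.03, 0.2, 0.5, 0.04]
    truth = [True, True, False, False, False, True]
    print(evaluate(p, truth))
```

In a full benchmark these per-dataset values would be averaged over many simulation replicates per method and sample size. Whether type I error is instead measured on fully null simulations is a design choice this sketch does not settle.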