Hanbitsa Paper
Chanwoo Kim1,2,†, Hanbin Lee3,†, Juhee Jeong4, Keehoon Jung4,5,6,* and Buhm Han4,7,*
1 Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
2 Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
3 Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
4 Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
5 Department of Anatomy and Cell Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
6 Institute of Allergy and Clinical Immunology, Seoul National University Medical Research Center, Seoul, Republic of Korea
7 Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
*To whom correspondence should be addressed.
†These authors contributed equally
Abstract
The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps that begin with clustering the cells. An inherent limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, some cell types are not well distinguished by clustering because they largely share the same global structure, such as anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) based solely on clustering, marker genes that distinguish these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo sorts genes by evaluating whether their expression distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that distinguish cell types that standard clustering cannot separate. MarcoPolo is provided as a convenient software package that reports analysis results in an HTML file.
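The first criterion in the abstract, whether a gene's expression distribution is bimodal, can be illustrated with a small sketch. This is not MarcoPolo's implementation: the choice of a Poisson mixture fitted by EM, the BIC-style score, and the function names `poisson_loglik_terms` and `bimodality_score` are all assumptions made for illustration. The idea is to compare a one-component fit against a two-component fit for each gene, without using any cluster labels.

```python
import numpy as np


def poisson_loglik_terms(counts, rate):
    # Poisson log-pmf without the -log(k!) term; that term is identical
    # for every model compared below, so it cancels in the score.
    return counts * np.log(rate) - rate


def bimodality_score(counts, n_iter=200, eps=1e-8):
    """BIC-style score (hypothetical): > 0 favors two Poisson components over one."""
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    if counts.max() == counts.min():
        return 0.0  # constant expression carries no evidence of two modes

    # One-component model: the maximum-likelihood rate is the sample mean.
    mean = counts.mean()
    ll_one = poisson_loglik_terms(counts, max(mean, eps)).sum()

    # Two-component model: initialize rates from cells below/above the mean.
    rates = np.array([
        max(counts[counts <= mean].mean(), eps),
        counts[counts > mean].mean(),
    ])
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell.
        log_joint = np.log(weights) + counts[:, None] * np.log(rates) - rates
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: update mixing weights and component rates.
        weights = np.clip(resp.mean(axis=0), eps, None)
        weights = weights / weights.sum()
        weighted_sum = (resp * counts[:, None]).sum(axis=0)
        rates = np.maximum(weighted_sum / np.maximum(resp.sum(axis=0), eps), eps)

    # Log-likelihood of the two-component model at the final parameters.
    log_joint = np.log(weights) + counts[:, None] * np.log(rates) - rates
    ll_two = np.logaddexp(log_joint[:, 0], log_joint[:, 1]).sum()

    # The two-component model has two extra free parameters.
    return 2.0 * (ll_two - ll_one) - 2.0 * np.log(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated counts for one gene: half the cells barely express it.
    on_off_gene = np.concatenate([rng.poisson(0.2, 300), rng.poisson(20, 300)])
    flat_gene = rng.poisson(5, 600)
    print(bimodality_score(on_off_gene), bimodality_score(flat_gene))
```

Genes could then be sorted by this score. The other two criteria in the abstract (shared patterns across genes and proximity of expressing cells in a low-dimensional space) are not covered by this sketch.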