Eunjung Lee1†, Han-Yu Chuang2,3†, Jong-Won Kim4, Trey Ideker2,3*, Doheon Lee1*
1 Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea, 2 Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America, 3 Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America, 4 Department of Laboratory Medicine and Genetics, Sungkyunkwan University, School of Medicine, Samsung Medical Center, Seoul, South Korea
†These authors contributed equally to this work.
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
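As a rough illustration of how a pathway activity could be summarized from a discriminative subset of genes, the sketch below runs a greedy forward search over the genes of one pathway. The abstract does not specify the search strategy, the normalization, or the scoring statistic, so everything here is an assumption made for illustration: per-gene z-scoring, an activity defined as the sum of the selected genes' z-scores divided by the square root of their number, a Welch t-statistic as the measure of discriminative power, and the hypothetical function names `t_score`, `pathway_activity`, and `find_corgs`. The input data are synthetic.

```python
import numpy as np

def t_score(activity, labels):
    """Welch t-statistic of the activity between cases (1) and controls (0)."""
    case, ctrl = activity[labels == 1], activity[labels == 0]
    se = np.sqrt(case.var(ddof=1) / case.size + ctrl.var(ddof=1) / ctrl.size)
    return (case.mean() - ctrl.mean()) / se

def pathway_activity(z, genes):
    """Assumed summary: sum of z-scores of the chosen genes, scaled by sqrt(k)."""
    return z[genes].sum(axis=0) / np.sqrt(len(genes))

def find_corgs(expr, labels):
    """Greedy forward selection over one pathway (an assumed search strategy).

    expr   : genes x samples expression matrix for the pathway's member genes
    labels : 0/1 phenotype label per sample
    Returns the selected gene indices, the per-sample activity, and the score.
    """
    # z-normalize each gene across samples (assumes no gene is constant)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - mean) / std

    selected, best = [], 0.0
    remaining = list(range(z.shape[0]))
    while remaining:
        # score the activity obtained by adding each remaining gene in turn
        scores = [abs(t_score(pathway_activity(z, selected + [g]), labels))
                  for g in remaining]
        i = int(np.argmax(scores))
        if scores[i] <= best:
            break  # no candidate improves discriminative power; stop
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, pathway_activity(z, selected), best

# Synthetic example: 5 genes, 20 samples, the first two genes shifted in cases.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
expr = rng.normal(size=(5, 20))
expr[:2, labels == 1] += 2.0
genes, activity, score = find_corgs(expr, labels)
```

Under these assumptions, `activity` holds one value per sample and could be used as a single feature for that pathway in a downstream classifier. Because the search starts from the best single gene and only accepts additions that raise the score, the selected subset is never less discriminative than any individual gene in the pathway by this statistic, although a greedy search is not guaranteed to find the globally optimal subset.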