Jae Hoon Sul1, 2, Towfique Raj2, 3, 4, Simone de Jong5, Paul I.W. de Bakker6, Soumya Raychaudhuri1, 2, 7, 8, Roel A. Ophoff5, 9, 10, Barbara E. Stranger11, 12, Eleazar Eskin10, 13,*, Buhm Han14, 15,*
1 Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Harvard University, Boston, MA 02115, USA
2 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3 Harvard Medical School, Harvard University, Boston, MA 02115, USA
4 Program in Translational NeuroPsychiatric Genomics, Institute for the Neurosciences, Department of Neurology, Brigham and Women’s Hospital, Boston, MA 02115, USA
5 Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Behavior, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
6 Departments of Epidemiology and Medical Genetics, University Medical Center Utrecht, Utrecht 3584 CG, the Netherlands
7 Arthritis Research UK Epidemiology Unit, Musculoskeletal Research Group, University of Manchester, Manchester Academic Health Sciences Centre, Manchester M13 9PT, UK
8 Division of Rheumatology, Brigham and Women’s Hospital, Harvard Medical School, Harvard University, Boston, MA 02115, USA
9 Brain Center Rudolf Magnus, Department of Psychiatry, University Medical Center Utrecht, Utrecht 3584 CG, the Netherlands
10 Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
11 Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
12 Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
13 Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095, USA
14 Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Republic of Korea
15 Department of Medicine, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
*Corresponding authors: Eleazar Eskin, Buhm Han
In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. A permutation test is widely used for this multiple-testing correction. As the sample sizes of eQTL studies grow, however, the permutation test has become a computational bottleneck. In this paper, we propose an efficient approach for correcting for multiple testing and assessing eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We show that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
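The general idea of replacing permutations with a multivariate normal (MVN) null can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses naive Monte Carlo sampling, omits the small-sample correction mentioned above, and the names `cholesky` and `egene_p_value`, the parameter `ld` (a correlation matrix among the cis variants), and the default number of draws are all assumptions made for illustration. It estimates the probability that the minimum p value over correlated variants is at least as small as the observed one.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def egene_p_value(min_p, ld, n_samples=20000, seed=0):
    """Estimate P(min p over cis variants <= min_p) under the null.

    Null z-scores are drawn from N(0, ld), where `ld` is the (assumed)
    correlation matrix among the variants tested for one gene.
    """
    rng = random.Random(seed)
    n = len(ld)
    lower = cholesky(ld)
    # Two-sided |z| threshold that corresponds to the observed minimum p value.
    threshold = NormalDist().inv_cdf(1.0 - min_p / 2.0)
    hits = 0
    for _ in range(n_samples):
        # Correlated null z-scores: z = L e, with e standard normal.
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(lower[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if max(abs(v) for v in z) >= threshold:
            hits += 1
    # Add-one estimator keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


# Hypothetical example: three variants, independent vs. in strong LD.
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
strong_ld = [[1.0, 0.99, 0.99], [0.99, 1.0, 0.99], [0.99, 0.99, 1.0]]
p_independent = egene_p_value(0.05, independent)
p_strong_ld = egene_p_value(0.05, strong_ld)
```

In this sketch the cost depends on the number of variants and the number of MVN draws, not on the number of individuals, which is the property that makes an MVN-based null attractive as sample sizes grow. With independent variants the estimate should approach 1 - 0.95^3, while strong LD pulls it back toward the single-test value of 0.05.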