Namjin Cho1,*, Byungjin Hwang1,*, Jung-ki Yoon2,*, Sangun Park3,*, Joongoo Lee4,*, Han Na Seo1, Jeewon Lee1, Sunghoon Huh1, Jinsoo Chung5 & Duhee Bang1
1 Department of Chemistry, Yonsei University, Seoul 120-749, Republic of Korea. 2 College of Medicine, Seoul National University, Seoul 110-744, Republic of Korea. 3 Department of Chemistry, Korea Military Academy, Seoul 139-799, Republic of Korea. 4 Department of Chemistry, University of Oxford, Oxford OX1 3TA, UK. 5 Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720-1462, USA.
* These authors contributed equally to this work.
Correspondence to: Duhee Bang
Abstract
Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and unveiling the structure and function of genetic pathways. Although high-resolution en masse mapping of variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall under various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.