Hansol Choi1,8, Yeongjae Choi2,7,8, Jaewon Choi3,4, Amos Chungwon Lee5, Huiran Yeom5, Jinwoo Hyun1, Taehoon Ryu6,* and Sunghoon Kwon1,2,3,5,*
1Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea. 2Nano Systems Institute, Seoul National University, Seoul, Republic of Korea. 3Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea. 4Integrated Major in Innovative Medical Science, Seoul National University, Seoul, Republic of Korea. 5Bio-MAX Institute, Seoul National University, Seoul, Republic of Korea. 6ATG Lifetech Inc., Seoul, Republic of Korea. 7Present address: Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA. 8These authors contributed equally: Hansol Choi, Yeongjae Choi.
*Correspondence to Taehoon Ryu or Sunghoon Kwon.
Abstract
Complex oligonucleotide (oligo) libraries are essential materials for diverse applications in synthetic biology, pharmaceutical production, nanotechnology and DNA-based data storage. However, the error rates in synthesizing complex oligo libraries can be substantial, which increases the cost and labor of these applications. Because most synthesis errors arise from faulty insertions and deletions, we developed a length-based method with single-base resolution for purifying complex libraries that contain oligos of identical or different lengths. Our method, purification of multiplex oligonucleotide libraries by synthesis and selection, can be performed either manually, step by step, or using a next-generation sequencer. When applied to a digital data-encoded library containing oligos of identical length, the method increased the purity of full-length oligos from 83% to 97%. We also show that libraries encoding the complementarity-determining region H3 with three different lengths (with an empirically achieved diversity >10^6) can be simultaneously purified in one pot, increasing the in-frame oligo fraction from 49.6% to 83.5%.
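The two metrics quoted above, full-length purity and in-frame fraction, can be illustrated with a minimal Python sketch. It is an idealized in-silico model, not the experimental procedure: the function names (`full_length_fraction`, `select_by_length`, `in_frame_fraction`) and the toy pool are hypothetical, and the selection step is assumed to be perfect, whereas the reported purities remain below 100%. Because selection is by length only, substitution errors would pass through such a filter unchanged.

```python
def full_length_fraction(oligos, designed_lengths):
    """Fraction of oligos whose length equals one of the designed lengths."""
    if not oligos:
        return 0.0
    targets = set(designed_lengths)
    return sum(len(o) in targets for o in oligos) / len(oligos)


def select_by_length(oligos, designed_lengths):
    """Idealized length-based selection with single-base resolution:
    keep only oligos whose length exactly matches a designed length."""
    targets = set(designed_lengths)
    return [o for o in oligos if len(o) in targets]


def in_frame_fraction(oligos, designed_length):
    """Fraction of oligos whose length differs from the designed length
    by a multiple of 3, so the reading frame is preserved."""
    if not oligos:
        return 0.0
    return sum((len(o) - designed_length) % 3 == 0 for o in oligos) / len(oligos)


# Hypothetical toy pool: designed lengths are 6 and 9 nt; the 5-nt and
# 7-nt sequences model a single-base deletion and insertion.
pool = ["ACGTAC", "ACGTA", "ACGTACG", "ACGTAC", "ACGTACGTA"]
designed = [6, 9]

before = full_length_fraction(pool, designed)
purified = select_by_length(pool, designed)
after = full_length_fraction(purified, designed)
print(f"full-length fraction: {before:.2f} -> {after:.2f}")
print(f"in-frame fraction before selection: {in_frame_fraction(pool, 6):.2f}")
```

Passing several designed lengths at once mirrors the one-pot case described above, where oligos of different intended lengths are purified together.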