Myungjae Song1,2,9, Hui Kwon Kim1,2,3,4,9, Sungtae Lee1,9, Younggwang Kim1,2, Sang-Yeon Seo1,2, Jinman Park1,2, Jae Woo Choi1, Hyewon Jang1,2, Jeong Hong Shin1,2, Seonwoo Min5, Zhejiu Quan6, Ji Hun Kim6, Hoon Chul Kang6, Sungroh Yoon5,7 and Hyongbum Henry Kim1,2,3,4,8,*
1Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea. 2Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea. 3Center for Nanomedicine, Institute for Basic Science, Seoul, Republic of Korea. 4Graduate Program of Nano Biomedical Engineering, Advanced Science Institute, Yonsei University, Seoul, Republic of Korea. 5Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea. 6Division of Pediatric Neurology, Department of Pediatrics, Severance Children’s Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea. 7Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea. 8Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea. 9These authors contributed equally: Myungjae Song, Hui Kwon Kim, Sungtae Lee.
Base editors, including adenine base editors (ABEs)1 and cytosine base editors (CBEs)2,3, are widely used to induce point mutations. However, determining whether a specific nucleotide in its genomic context can be edited requires time-consuming experiments. Furthermore, when the editable window contains multiple target nucleotides, various genotypic products can be generated. To develop computational tools to predict base-editing efficiency and outcome product frequencies, we first evaluated the efficiencies of an ABE and a CBE and the outcome product frequencies at 13,504 and 14,157 target sequences, respectively, in human cells. We found that there were only modest asymmetric correlations between the activities of the base editors and Cas9 at the same targets. Using deep-learning-based computational modeling, we built tools to predict the efficiencies and outcome frequencies of ABE- and CBE-directed editing at any target sequence, with Pearson correlations ranging from 0.50 to 0.95. These tools and results will facilitate modeling and therapeutic correction of genetic diseases by base editing.