Gahee Park1,2†, Joo Kyung Park3†, Seung-Ho Shin1,4†, Hyo-Jeong Jeon1, Nayoung K. D. Kim1, Yeon Jeong Kim1, Hyun-Tae Shin1, Eunjin Lee1, Kwang Hyuck Lee3,4, Dae-Soon Son1, Woong-Yang Park1,4,5* and Donghyun Park1*
1Samsung Genome Institute, Samsung Medical Center, Seoul 06351, Korea. 2Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Korea. 3Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 4Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul 06351, Korea. 5Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon 16419, Korea.
* Correspondence: Donghyun Park, Woong-Yang Park
†Equal contributors
Abstract
Background
Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is therefore essential to characterize the errors that constitute baseline noise and impose a practical limit on detection. In the present study, we systematically evaluate the extent to which errors are incurred during specific steps of the capture-based targeted sequencing process.
Results
We removed most sequencing artifacts by filtering out low-quality bases and then analyzed the remaining background noise. By recognizing that plasma DNA is naturally fragmented to a size comparable to that of mono-nucleosomal DNA, we were able to identify and characterize errors that are specifically associated with acoustic shearing. Two-thirds of C:G > A:T errors and one quarter of C:G > G:C errors were attributed to the oxidation of guanine during acoustic shearing, and this was further validated by comparative experiments conducted under different shearing conditions. The acoustic shearing step also causes A > G and A > T substitutions localized to the end bases of sheared DNA fragments, indicating a probable association of these errors with DNA breakage. Finally, the hybrid selection step accounts for one-third of the remaining C:G > A:T errors and one-fifth of the C > T errors.
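The quantities reported above (per-class substitution rates measured after low-quality bases are discarded) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(ref, base, quality)` input format, and the quality cutoff of 30 are all assumptions made here for illustration only.

```python
from collections import Counter

# Complement lookup used to fold a substitution onto its opposite-strand equivalent.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return a strand-symmetric label such as 'C:G>A:T' (hypothetical helper)."""
    # Express every substitution with a pyrimidine (C or T) as the reference base.
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def background_rates(observations, min_quality=30):
    """Estimate per-class substitution rates from (ref, base, quality) tuples.

    min_quality=30 is an assumed cutoff, not a value taken from the study.
    """
    errors = Counter()  # mismatches per substitution class
    depth = Counter()   # quality-passing bases per reference base pair
    for ref, base, quality in observations:
        if quality < min_quality:
            continue  # drop low-quality bases before counting anything
        pair = "C:G" if ref in ("C", "G") else "T:A"
        depth[pair] += 1
        if base != ref:
            errors[substitution_class(ref, base)] += 1
    # Rate = mismatches in a class / passing bases at the matching reference pair.
    return {cls: n / depth[cls.split(">")[0]] for cls, n in errors.items()}


# Toy input: one G>T mismatch among three passing C or G bases; one C>A is filtered out.
toy = [("G", "T", 35), ("C", "C", 35), ("C", "A", 10), ("G", "G", 40), ("A", "A", 40)]
rates = background_rates(toy)
```

Folding G > T onto C > A is what allows both to be reported under the single label C:G > A:T. Strand-specific labels such as A > G would need the fold step skipped.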
Conclusions
The results of this study provide a comprehensive summary of the various errors incurred during targeted deep sequencing and their underlying causes. This information will be invaluable for driving technical improvements in this sequencing method, and may increase the future use of targeted deep sequencing for low-allelic fraction variant detection.
Keywords: Next-generation sequencing; Targeted deep sequencing; Substitution rate; Background error; DNA fragmentation; Plasma DNA