Ensel Oh1,2,3, Yoon-La Choi1,2,4,5, Mi Jeong Kwon6,7, Ryong Nam Kim3, Yu Jin Kim1,2, Ji-Young Song1,2, Kyung Soo Jung1,5, Young Kee Shin3*
1 Laboratory of Cancer Genomics and Molecular Pathology, Samsung Biomedical Research Institute, Samsung Medical Center, Seoul, Korea, 2 Institute for Refractory Cancer Research, Samsung Medical Center, Seoul, Korea, 3 Department of Pharmacy, College of Pharmacy, Seoul National University, Seoul, Korea, 4 Department of Pathology, Samsung Medical Center, Sungkyunkwan University College of Medicine, Seoul, Korea, 5 Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University School of Medicine, Seoul, Korea, 6 College of Pharmacy, Kyungpook National University, Daegu, Korea, 7 Research Institute of Pharmaceutical Sciences, College of Pharmacy, Kyungpook National University, Daegu, Korea
Formalin fixation and paraffin embedding (FFPE) has been a standard sample preparation method for decades, and archival FFPE samples remain a valuable resource. Nonetheless, using FFPE samples for cancer genome analysis by next-generation sequencing, a powerful technique for identifying genomic alterations at the nucleotide level, has been challenging because of poor DNA quality and artificial sequence alterations. In this study, we performed whole-exome sequencing of matched frozen and FFPE tissue samples from 4 cancer patients and compared the next-generation sequencing data obtained from them. The major differences between the data from the 2 sample types were the shorter insert size and the artificial base alterations in the FFPE samples. A high proportion of short inserts in the FFPE samples resulted in overlapping paired reads, which could lead to overestimation of certain variants; >20% of the inserts in the FFPE samples were double sequenced. A large number of soft-clipped reads were found in the sequencing data of the FFPE samples, and about 30% of total bases were soft clipped. The artificial base alterations, C>T and G>A, were observed in the FFPE samples only, and the alteration rate ranged from 200 to 1,200 per 1M bases after sequencing errors were removed. Although high-confidence mutation calls in the FFPE samples were comparable to those in the frozen samples, caution is warranted regarding these artifacts, especially for low-confidence calls. Despite the clearly observed artifacts, archival FFPE samples can be a good resource for the discovery or validation of biomarkers in cancer research based on whole-exome sequencing.
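The two quantities emphasized above, double-sequenced bases from short inserts and the artificial alteration rate per 1M bases, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the 100-bp read length, and all input numbers are hypothetical and chosen only to show the arithmetic.

```python
def overlap_bases(insert_size: int, read_length: int) -> int:
    """Bases of an insert that are covered by both reads of a pair.

    Each read covers at most min(read_length, insert_size) bases of the
    insert; anything the two reads cover beyond the insert size is
    sequenced twice.
    """
    covered_per_read = min(read_length, insert_size)
    return max(0, 2 * covered_per_read - insert_size)


def fraction_double_sequenced(insert_sizes, read_length: int) -> float:
    """Fraction of inserts in which the paired reads overlap at all."""
    overlapping = sum(1 for s in insert_sizes if overlap_bases(s, read_length) > 0)
    return overlapping / len(insert_sizes)


def rate_per_million(alteration_count: int, total_bases: int) -> float:
    """Alterations normalized to 1M sequenced bases."""
    return alteration_count * 1_000_000 / total_bases


if __name__ == "__main__":
    # Hypothetical insert sizes (bp) and read length, for illustration only.
    inserts = [80, 150, 250, 300, 400]
    print(overlap_bases(150, 100))                  # 150-bp insert, 100-bp reads
    print(fraction_double_sequenced(inserts, 100))
    print(rate_per_million(12, 20_000))             # hypothetical C>T/G>A count
```

Under these assumptions, any insert shorter than twice the read length yields overlapping mates, so a variant falling in the overlap is counted twice unless the overlapping reads are merged or one copy is discounted before calling.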