Sangtae Kim1,6, Konrad Scheffler1,6, Aaron L. Halpern1, Mitchell A. Bekritsky2, Eunho Noh1, Morten Källberg2,4, Xiaoyu Chen1, Yeonbin Kim1, Doruk Beyter3,5, Peter Krusche2 and Christopher T. Saunders1,*
1Illumina, Inc., San Diego, CA, USA. 2Illumina Cambridge Ltd, Essex, UK. 3Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA. 4Present address: Seven Bridges Genomics, London, UK. 5Present address: deCODE Genetics/Amgen, Inc., Reykjavik, Iceland.
6These authors contributed equally: Sangtae Kim and Konrad Scheffler.
*To whom correspondence may be addressed.
We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small-variant-calling method for germline and somatic sequencing applications in both research and clinical settings. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion (indel) error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal-sample contamination model to improve liquid-tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in both variant-calling accuracy and computing cost.