Minkyung Baek 1, Ryan McHugh 2,3, Ivan Anishchenko 2,3, Hanlun Jiang 4, David Baker 2,3,5 & Frank DiMaio 2,3,*
1 School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
2 Department of Biochemistry, University of Washington, Seattle, WA, USA.
3 Institute for Protein Design, University of Washington, Seattle, WA, USA.
4 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
5 Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
*Corresponding author: Frank DiMaio
Protein–RNA and protein–DNA complexes play critical roles in biology. Despite considerable recent advances in protein structure prediction, predicting the structures of protein–nucleic acid complexes that lack homology to known complexes remains a largely unsolved problem. Here we extend RoseTTAFold, a machine learning approach to protein structure prediction, so that it also predicts the structures of nucleic acids and protein–nucleic acid complexes. We develop a single trained network, RoseTTAFoldNA, that rapidly produces three-dimensional structure models with confidence estimates for protein–DNA and protein–RNA complexes. We show that confident predictions have considerably higher accuracy than those of current state-of-the-art methods. RoseTTAFoldNA should be broadly useful for modeling the structures of naturally occurring protein–nucleic acid complexes and for designing sequence-specific RNA- and DNA-binding proteins.