Chang Keun Kang¹, Jihoon Shin¹, YoonKyung Cha¹, Min Sun Kim, Min Sun Choi, TaeHo Kim², Young-Kwon Park, Yong Jun Choi
School of Environmental Engineering, University of Seoul, Seoul 02504, Republic of Korea
¹These authors contributed equally to this work.
²Present address: Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA.
Corresponding author: Yong Jun Choi
Classical methods for designing streamlined strain-development workflows from vast amounts of biological big data have reached their limits. As the volume of biological big data continues to grow, data-driven machine learning approaches are increasingly used to overcome these limits in strain development. Here, machine learning-guided engineering of Deinococcus radiodurans R1 for high-yield lycopene production was demonstrated. Multilayer perceptron (MLP) models were first trained on the mRNA expression levels of key genes, together with the lycopene titers and yields, obtained from 17 strains. The MLP was then combined with a genetic algorithm to predict potential overexpression targets among 2,047 possible combinations. Through machine learning-aided fine-tuning of the predicted genes, the final engineered strain, LY04, achieved an 8-fold increase in lycopene production, reaching 1.25 g/L from glycerol, and a 6-fold increase in lycopene yield.
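The surrogate-model search described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The number 2,047 equals 2^11 − 1, which is consistent with every non-empty subset of 11 candidate genes; treating the search space that way, and encoding each candidate as a 0/1 overexpression vector, are assumptions. The MLP here has fixed random weights and is only a placeholder for a model trained on expression levels and lycopene titers. The genetic-algorithm operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are generic choices, and all function names are hypothetical.

```python
import itertools
import math
import random

# Assumption: 2,047 = 2**11 - 1 non-empty overexpression patterns of 11 genes.
N_GENES = 11


def all_combinations(n):
    """Enumerate every non-empty overexpression pattern as a 0/1 tuple."""
    return [bits for bits in itertools.product((0, 1), repeat=n) if any(bits)]


def make_mlp(n_in, n_hidden, seed=0):
    """Hypothetical one-hidden-layer MLP with fixed random weights.

    Placeholder for a model trained on real expression/titer data;
    these weights are NOT fitted to anything.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)

    def predict(x):
        # tanh hidden layer followed by a linear output (predicted titer score)
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden)) + b2

    return predict


def genetic_algorithm(fitness, n, pop_size=30, generations=40,
                      mutation_rate=0.05, seed=1):
    """Maximize `fitness` over non-empty 0/1 vectors of length n."""
    rng = random.Random(seed)

    def random_individual():
        while True:
            ind = [rng.randint(0, 1) for _ in range(n)]
            if any(ind):
                return ind

    def repair(ind):
        # Keep at least one gene overexpressed.
        if not any(ind):
            ind[rng.randrange(n)] = 1
        return ind

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(repair(child))
        population = children
        best = max(population, key=fitness)
    return tuple(best)


model = make_mlp(N_GENES, n_hidden=8)
candidates = all_combinations(N_GENES)
print(len(candidates))  # 2047
ga_best = genetic_algorithm(model, N_GENES)
exhaustive_best = max(candidates, key=model)
print(ga_best, model(ga_best))
print(exhaustive_best, model(exhaustive_best))
```

In this toy setting the exhaustive maximum over all 2,047 candidates is computed only as a check on the GA result; the GA is not guaranteed to reach it, but because of elitism its best score never decreases across generations.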