Euijin Seo 1, Yun-Nam Choi 1, Ye Rim Shin 1, Donghyuk Kim 2,3, Jeong Wook Lee 1,4,5*
1Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea.
2School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-Gil, Eonyang-Eup, Ulsan 44919, Korea.
3Department of Energy Engineering, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-Gil, Eonyang-Eup, Ulsan 44919, Korea.
4School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea.
5Graduate School of Artificial Intelligence, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
*To whom correspondence should be addressed: Jeong Wook Lee
Abstract
Deep generative models, which can approximate complex data distributions from large datasets, are widely used in the analysis of biological datasets. In particular, they can identify and unravel hidden traits encoded within a complicated nucleotide sequence, allowing us to design genetic parts with accuracy. Here, we provide a generic deep-learning-based framework to design and evaluate synthetic promoters for cyanobacteria using generative models, which was in turn validated with a cell-free transcription assay. We developed a deep generative model and a predictive model using a variational autoencoder and a convolutional neural network, respectively. Using native promoter sequences of the model unicellular cyanobacterium Synechocystis sp. PCC 6803 as a training dataset, we generated 10 000 synthetic promoter sequences and predicted their strengths. By position weight matrix and k-mer analyses, we confirmed that our model captured valid features of cyanobacterial promoters from the dataset. Furthermore, critical subregion identification analysis consistently revealed the importance of the -10 box sequence motif in cyanobacterial promoters. Moreover, we validated in a cell-free transcription assay that the generated promoter sequences can efficiently drive transcription. This approach, combining in silico and in vitro studies, will provide a foundation for the rapid design and validation of synthetic promoters, especially for non-model organisms.
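The position weight matrix and k-mer analyses mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `kmer_counts` and `position_weight_matrix`, the pseudocount, the uniform background frequency of 0.25, and the toy sequences are all assumptions chosen for illustration only.

```python
from collections import Counter
import math

ALPHABET = "ACGT"


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def position_weight_matrix(seqs, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length, pre-aligned sequences.

    Assumes a uniform background frequency and an additive pseudocount;
    both are illustrative defaults, not values from the paper.
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")

    total = len(seqs) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        column = Counter(seq[pos] for seq in seqs)
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


# Toy hexamers loosely resembling a -10 box; made up, not real promoter data.
toy_seqs = ["TATAAT", "TATACT", "TAAAAT", "TATAAT"]

dimer_counts = kmer_counts(toy_seqs, 2)
pwm = position_weight_matrix(toy_seqs)

for pos, scores in enumerate(pwm):
    best = max(scores, key=scores.get)
    print(pos, best, round(scores[best], 3))
```

Comparing k-mer counts or PWM columns computed this way for native and generated sequences is one simple way to check whether a generative model reproduces motif-level statistics of its training set.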