1. Can you please briefly summarize the paper?
Hello, I’m Nhat Truong Pham, a first-year Ph.D. student working under Prof. Balachandran Manavalan at the Computational Biology and Bioinformatics Laboratory (CBBL-SKKU: https://balalab-skku.org/), Department of Integrative Biotechnology at Sungkyunkwan University. First of all, I’m grateful to the Biological Research Information Center (BRIC) for providing me with this exceptional opportunity to introduce our research and laboratory to other scientists, students, and all potential readers of this interview. In most sections, I will use “we” and “our” to acknowledge the overall team effort behind the publication in this high-impact journal.
In recent years, the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated widespread concern and posed a major challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Hence, precise identification of phosphorylation sites could provide more insight into the processes underlying SARS-CoV-2 infection and help alleviate the ongoing coronavirus disease 2019 (COVID-19) crisis. At present, there are no accurate and effective computational tools available for predicting these sites.
(Overview of MeL-STPhos framework proposed in the paper. Created with BioRender.com)
In this study, we first demonstrated that SARS-CoV-2 infection induces alterations in phosphorylation, as evidenced by bioinformatics analysis of A549 cells infected with the virus. We then designed an innovative meta-learning model, Meta-Learning for Serine (S)/Threonine (T) Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We constructed three S/T datasets: A549 cells, Vero E6 cells, and a combination of A549 and Vero E6 cells. For each dataset, we comprehensively evaluated 29 unique sequence-derived features and built prediction models using 14 well-known machine learning (ML) algorithms, ranging from traditional classifiers to advanced deep learning (DL). We then identified the most effective model for each feature and integrated their predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifiers for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models (A549 and Vero E6 cells) and a generic model for phosphorylation site prediction by utilizing an extensive range of sequence-derived features and ML algorithms. A case study was also conducted to identify S/T phosphorylation sites in lung fibroblasts (IMR-90 infected with type 2 adenovirus). Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. Currently, there are no accurate models for predicting SARS-CoV-2 phosphorylation modification sites, but our study provides an extensive approach that may serve as a starting point for future research.
MeL-STPhos could prove useful in complementing wet laboratory experiments for identifying SARS-CoV-2 phosphorylation modification sites, uncovering associated biological functions, and facilitating a variety of sequence-focused analyses.
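For readers who want a concrete picture of the stacking idea described above (several feature encodings, several learners per encoding, the best learner per encoding kept, and its predicted values passed to a meta-model), here is a minimal sketch in plain Python. It is not the MeL-STPhos implementation: the two encodings, the two learners, the fold scheme, and the twelve peptide windows are all made up for illustration, whereas the paper uses 29 encodings, 14 algorithms, and additional feature selection.

```python
import math

# Two illustrative sequence-derived encodings (assumptions, not the paper's 29).
def aac(window, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Amino acid composition: fraction of each residue in the window."""
    return [window.count(a) / len(window) for a in alphabet]

def charge_profile(window):
    """Toy encoding: fractions of acidic, basic, and proline residues."""
    n = len(window)
    return [sum(window.count(a) for a in "DE") / n,
            sum(window.count(a) for a in "KR") / n,
            window.count("P") / n]

# Two simple base learners (assumptions, not the paper's 14 algorithms).
def fit_centroid(X, y):
    """Nearest-centroid learner; returns a scorer with output in [0, 1]."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def score(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return score

def fit_knn(X, y, k=3):
    """k-nearest-neighbour learner; score is the positive fraction among k."""
    def score(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return score

def out_of_fold(fit, X, y, n_folds=3):
    """Score each sample with a model trained on the other folds."""
    scores = [0.0] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), n_folds):
            scores[i] = model(X[i])
    return scores

def accuracy(scores, y):
    return sum((s >= 0.5) == bool(t) for s, t in zip(scores, y)) / len(y)

def build_meta_features(windows, y, encodings, learners):
    """Keep the best learner per encoding; its scores become one meta-feature."""
    selected, columns = {}, []
    for enc_name, encode in encodings.items():
        X = [encode(w) for w in windows]
        oof = {name: out_of_fold(fit, X, y) for name, fit in learners.items()}
        best = max(oof, key=lambda name: accuracy(oof[name], y))
        selected[enc_name] = best
        columns.append(oof[best])
    return selected, [list(row) for row in zip(*columns)]

ENCODINGS = {"aac": aac, "charge_profile": charge_profile}
LEARNERS = {"centroid": fit_centroid, "knn": fit_knn}

# Made-up 11-residue windows centred on S or T; labels are illustrative only.
windows = ["AKRRLSPEKDA", "DELVGSLEDAF", "GRKRSTPVKEL", "LEDFATVDEGL",
           "LKRPASPGRKA", "GDEAVSFLEDI", "QRRKGTPPKRL", "FEDLGTAEDVL",
           "VKKRLSPSRKA", "IDEVASGFEDL", "ARKKPTPAKRG", "VEDGLTIDEAF"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

selected, meta_X = build_meta_features(windows, labels, ENCODINGS, LEARNERS)
meta_model = fit_centroid(meta_X, labels)  # meta-model on the stacked scores
predictions = [int(meta_model(x) >= 0.5) for x in meta_X]
print(selected, predictions)
```

Using out-of-fold scores as meta-features keeps the meta-model from simply memorising the base learners' training-set outputs, which is the usual reason for building a stacked model this way.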
2. Can you please tell us the main difficulties you had in the laboratory work and how you overcame them?
The main challenges we initially encountered were the lack of diverse datasets across various cell lines and limited computational resources. Developing a robust ML model applicable to real-world scenarios required integrating data from different cell lines affected by SARS-CoV-2. In addition, substantial computational power was indispensable for running a comprehensive optimization procedure in pursuit of the most effective predictive model.
To overcome these challenges, we conducted an extensive literature review to identify existing and potential datasets. Fortunately, we discovered a dataset of Vero E6 cells published in the Lancet journal, which provided experimentally validated phosphorylation sites. Furthermore, the SKKU Supercomputing Center provided invaluable support for our training and optimization processes through its high-performance computing system, which was pivotal in the development of MeL-STPhos.
3. Please introduce your laboratory, university or organization to bio-researchers in Korea.
The Department of Integrative Biotechnology, part of the distinguished College of Biotechnology and Bioengineering at Sungkyunkwan University, is renowned for its academic excellence. This department embodies a multidisciplinary approach, encompassing a diverse array of fields including biology, biotechnology, bioengineering, bioinformatics, environmental microbiology, and biomaterials. All faculty members maintain dynamic research programs and actively engage in teaching graduate students in classrooms and laboratories. Moreover, the robustness and standing of these research programs significantly contribute to the quality of the graduate program. By integrating advanced research methodologies and contemporary concepts into the curriculum, the program gives graduate students not only a thorough understanding of these methods but also the art of scholarly writing. Researchers are encouraged to publish papers in leading international conferences and high-impact journals.
The Computational Biology and Bioinformatics Laboratory, supervised by Prof. Balachandran Manavalan at the Department of Integrative Biotechnology, Sungkyunkwan University, was established in 2022. Our laboratory is a pioneering research hub specializing in applying cutting-edge artificial intelligence, ML, and DL to emerging challenges in computational biology and bioinformatics. Our research areas include, but are not limited to, the following:
- Prediction of DNA regulatory elements, such as nucleosomes, origins of replication, transcription start sites, and promoters.
- Designing computational methods to predict post-transcriptional (RNA) modification sites, RNA subcellular localization, and RNA splicing.
- Designing computational methods to predict post-replication (DNA) modification sites.
- Developing ML and DL methods to identify protein function and type.
- Developing ML and DL methods to identify peptide therapeutic function.
- Developing ML and DL models aimed at predicting the zoonotic potential of the Influenza A virus and other related viruses.
- Pre-training foundation language models (DNA/RNA/peptide/protein) and fine-tuning them for several downstream tasks, such as post-transcriptional (RNA) modification sites, RNA subcellular localization and RNA splicing, post-replication (DNA) modification sites, protein function and type, and peptide therapeutic function.
- Simulation of biomolecular systems.
4. Please tell us your experiences and your thoughts related to research activities abroad.
Pursuing my Ph.D. at Sungkyunkwan University has been a highly rewarding experience. Joining the CBBL-SKKU has provided me with a unique opportunity to collaborate with top research scientists in computational biology and bioinformatics. Engaging in collaborative projects with respected professors and scientists has not only expanded my knowledge but also offered valuable insights. Moreover, being part of the international environment at CBBL-SKKU has given me a unique perspective on various cultures. The experiences and opportunities I have gained thus far will definitely help me build a solid career path in the future.
5. Can you provide some advice for younger scientists who have plans to study abroad?
Although I am in the first year (second semester) of the Ph.D. program at Sungkyunkwan University, I have been actively engaged in research and collaborative projects for over three years in Vietnam. Based on my experiences, I would like to share some suggestions with aspiring young scientists and students planning to study abroad, whether in Korea or elsewhere in the world.
First and foremost, an individual needs to have an open mind and cultivate a strong research attitude. This means approaching research with a positive mindset and being open to new ideas, methodologies, and perspectives. As part of the research process, an individual must be curious, willing to explore different options, and able to adapt to changing circumstances or findings as they arise. It is about maintaining a proactive and inquisitive stance towards problem-solving.
Secondly, passion and patience are essential virtues in the realm of research. Passion is the driving force behind sustained efforts in research. It is the enthusiasm and deep interest in the subject that fuels motivation. However, research often requires patience as well. Results might not come quickly, experiments may fail, and the process can be slow. Forging ahead despite these challenges takes patience: iterating on ideas and persevering through setbacks.
I trust that these suggestions will aid aspiring young scientists and students in preparing thoroughly for their journey to study abroad. If you embody all the qualities mentioned above, I am confident that you will evolve into successful research scientists, achieving numerous milestones throughout your academic journey.
6. Future plan?
In future studies, we plan to expand the exploration of phosphorylation sites across different SARS-CoV-2 variants, including Alpha, Beta, Delta, Gamma, and Omicron. We then aim to integrate the expanded data into our current computational framework or potentially develop cutting-edge DL-based methods, such as Siamese network-based contrastive learning, transformers, and large language models. Furthermore, we anticipate the construction of a cell-specific model based on future data. This strategy will pave the way for improving predictive accuracy and model application, ultimately leading to a more comprehensive and reliable understanding of phosphorylation sites.
7. Do you have anything else that you would like to tell Korean scientists and students?
Once again, I want to express my gratitude to BRIC for providing me with the valuable opportunity to showcase my research achievements and introduce our laboratory to Korean scientists and students. While I have offered some advice for aspiring young scientists and students considering studying abroad, I would like to share some special messages, particularly with Korean scientists and students. To Korean scientists, let’s actively share our knowledge and experiences to inspire and mentor younger scientists and students. As Vaughn K. Lauer said, “The best part of learning is sharing what you know.” To students, embrace an active approach to learning and be open to exploring new knowledge. Never hesitate to ask questions; remember, the more you inquire, the more you expand your understanding. Ultimately, I hope that we can collectively advance life sciences research beyond BRIC to a global audience.
8. Other things you would like to say.
I extend my heartfelt gratitude to all the individuals who contributed to this research endeavor. First and foremost, I am deeply indebted to Prof. Balachandran Manavalan, whose guidance and support were invaluable throughout this project. Our collaboration felt more like a partnership, where discussions flowed freely and our joint effort led to this significant achievement. I owe a special debt of thanks to Le Thi Phan, whose equal dedication and contributions were pivotal in shaping this paper. Her seamless collaboration during the revisions greatly enhanced the quality of our work. Without the mentorship of Prof. Manavalan and the equal partnership of Le, achieving publication in this high-impact journal would not have been possible.
I also wish to express my appreciation to Prof. Young-Jun Jeon and his students, Jimin Seo and Yeonwoo Kim, for their involvement in the bioinformatics analysis, which enriched our study. A special mention goes to Nattanong Bupi for collaborating with Le to create an exceptionally insightful and visually captivating figure depicting the proposed MeL-STPhos framework.
Last but not least, I extend my gratitude to my family, labmates, and friends for their unwavering support. Their assistance was instrumental in ensuring my mental well-being throughout this challenging and pioneering research journey in the field of computational biology and bioinformatics.
# Machine learning
# Integrative biotechnology