Chemoinformatics
Chemoinformatics refers to the use of the physicochemical properties of molecules of interest, together with computational ("in silico") techniques, to identify druggable hit compounds for a target disease. Such in silico techniques are used to aid and inform the drug discovery process, to design well-defined combinatorial libraries of synthetic compounds, and to assist in structure-based drug design.
Encoding chemical structures
Molecules cannot be fed into machine learning tools without first being encoded. To develop mathematical models that relate chemical structures to biological activities, the chemical structure must be transformed into a numerical description of the molecule. The mathematical disciplines of graph theory and geometry, among others, provide techniques for encoding molecules. The resulting numerical representation is called a molecular descriptor. Molecular descriptors can be used for a number of predictive modeling tasks, such as virtual high-throughput screening, visualizing chemical libraries, analyzing quantitative structure-activity relationships, and predicting a molecule’s target structure.
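As a minimal sketch of graph-theoretical encoding, the example below turns a hydrogen-suppressed molecular graph into a three-number descriptor: atom count, bond count, and Wiener index (the sum of shortest-path bond distances over all atom pairs). The function names and the adjacency-list input format are assumptions made for illustration; practical work would normally rely on a cheminformatics toolkit and far richer descriptors.

```python
from collections import deque

def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

def graph_descriptor(adjacency):
    """Return [atom count, bond count, Wiener index] for a connected graph."""
    n_atoms = len(adjacency)
    # Every bond appears twice in an adjacency list, once per end atom.
    n_bonds = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    # Summing over all ordered pairs counts each distance twice.
    wiener = sum(
        d
        for atom in adjacency
        for d in shortest_path_lengths(adjacency, atom).values()
    ) // 2
    return [n_atoms, n_bonds, wiener]

# Hydrogen-suppressed carbon skeletons (atom index -> bonded atom indices).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(graph_descriptor(n_butane))   # [4, 3, 10]
print(graph_descriptor(isobutane))  # [4, 3, 9]
```

The two isomers share the same atom and bond counts but differ in their Wiener index, which shows how a topological descriptor can separate structures that simpler counts cannot.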
Development and validation of chemoinformatic models
Most machine learning techniques can be used to develop chemoinformatic models. The choice of molecular descriptor is of utmost importance for successful predictive modeling: if the numerical description of the molecule is unsuitable for the purpose, good results are unlikely. Because molecular descriptors are mostly complex, high-dimensional descriptions of molecules, data analysis is prone to chance correlation and overfitting. Rigorous validation and assessment of the resulting models is therefore essential to rule out models that appear good during development but would perform badly once put to productive use.
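The risk of overfitting can be illustrated with a small, purely synthetic sketch. It assumes 30 hypothetical molecules described by 200 random descriptor values and random activities, so there is nothing real to learn. A 1-nearest-neighbor model (chosen here only for brevity) reproduces the training data perfectly, while leave-one-out validation exposes the lack of predictive power. All names and data below are illustrative assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_1nn(train_X, train_y, query):
    """Predict the activity of `query` as that of its nearest training molecule."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: squared_distance(train_X[i], query),
    )
    return train_y[nearest]

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def training_error(X, y):
    """Error when the model is scored on the data it was built from."""
    preds = [predict_1nn(X, y, x) for x in X]
    return mean_squared_error(y, preds)

def leave_one_out_error(X, y):
    """Error when each molecule is predicted from all the others."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict_1nn(train_X, train_y, X[i]))
    return mean_squared_error(y, preds)

# Synthetic data: random descriptors and random activities (no true signal).
rng = random.Random(0)
X = [[rng.random() for _ in range(200)] for _ in range(30)]
y = [rng.random() for _ in range(30)]

print("training error:     ", training_error(X, y))
print("leave-one-out error:", leave_one_out_error(X, y))
```

The training error is zero because every molecule is its own nearest neighbor, yet the model has learned nothing. Only the held-out estimate reflects how it would behave on new compounds.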
Molecular Modeling
Molecular modeling encompasses all methods, theoretical and computational, used to model or mimic the behavior of molecules. These methods are used in computational chemistry, drug design, computational biology, and materials science to study molecular systems ranging from small chemical systems to large biological molecules and material assemblies. Various strategies exist in molecular modeling, for example structure-based drug design (SBDD), which uses the three-dimensional structure of proteins; ligand-based drug design (LBDD), which uses the pharmacophoric features of ligands; fragment-based drug design (FBDD); and de novo design. An appropriate strategy must be selected according to the target protein. We mainly develop inhibitors of protein kinases, which mediate signaling by phosphorylation in living organisms.
When a ligand binds to a disease-related target protein, entropy increases as the water molecules in the binding site are displaced from it. This is called the hydrophobic effect, and protein-ligand binding is thus driven by the thermodynamics of these water molecules. In the past, computer-aided drug design (CADD) methods considered only the protein or the ligand. More recently, studies have begun to take the physicochemical properties of these water molecules into account more actively.
However, most known algorithms require calculating the total free energy of the system, and research has concentrated only on the inner surface of the protein binding site (the first layer). The topological water network (TWN) approach instead quantifies water molecules without a free energy calculation, and it can consider not only the protein surface (first layer) but also the second layer. A TWN consists of water molecules connected by hydrogen bonds into ring-shaped polygons, such as 3-, 4-, 5-, and 6-membered rings.
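The ring-counting idea can be sketched as a cycle search on a hydrogen-bond graph. The example below is an illustration under stated assumptions, not the published TWN procedure: it takes a precomputed list of water-water hydrogen bonds (the geometric criteria for detecting them are not covered), uses integer water indices, and reports every simple cycle of 3 to 6 waters, including rings that contain a chord. The function name `find_rings` and the example network are hypothetical.

```python
from collections import Counter

def find_rings(hbonds, min_size=3, max_size=6):
    """Enumerate simple cycles of min_size..max_size waters in an
    undirected hydrogen-bond graph given as (water_a, water_b) pairs."""
    neighbors = {}
    for a, b in hbonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    rings = []

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in neighbors[last]:
            if nxt == start and len(path) >= min_size:
                # Each ring is traversed in both directions; keep one.
                if path[1] < path[-1]:
                    rings.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_size:
                # Only visit waters with a larger index than the start,
                # so each ring is reported from its smallest member.
                extend(path + [nxt])

    for start in sorted(neighbors):
        extend([start])
    return rings

# Hypothetical network: a 3-ring (1, 2, 3) and a 5-ring (3, 4, 5, 6, 7)
# sharing water 3, plus one water (8) that is not part of any ring.
hbonds = [(1, 2), (2, 3), (1, 3),
          (3, 4), (4, 5), (5, 6), (6, 7), (7, 3),
          (7, 8)]

rings = find_rings(hbonds)
ring_sizes = Counter(len(ring) for ring in rings)
print(rings)       # [(1, 2, 3), (3, 4, 5, 6, 7)]
print(ring_sizes)  # one 3-ring and one 5-ring
```

Counting rings by size in this way yields a purely topological summary of the water network that needs no energy evaluation, which is the property the text attributes to TWN.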
Employing the concept of TWN, we have optimized compounds for drug repurposing and selectivity. We are also developing methods that use TWN for predicting binding site similarity (TWN-RENCOD) and performing fragment screening (TWN-FS). We aim to further develop this into a methodology applicable to compound optimization and screening.
TWN-RENCOD
TWN-FS method