Artificial intelligence - machine learning vs deep learning
GPCRs have long been recognized as important drug targets, and numerous drugs that modulate GPCRs are already in clinical use for various diseases. The integration of AI in GPCR drug discovery has the potential to accelerate the identification and development of novel drugs, optimize drug design processes, and enhance the understanding of GPCR biology.
High-throughput approaches used in drug discovery generate large datasets from ligand synthesis and screening, ligand binding assays, signaling assays, cell imaging, protein structure determination, and omics applications. These datasets can be analyzed with an established AI framework that involves three stages: 1) feature extraction or pattern identification; 2) vector space construction and metric definition, where data are classified and compared; and 3) detection, prediction, or generation (e.g., prediction of a protein structure, or ligand design).
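The three stages above can be sketched on a toy example. This is a minimal illustration and not the method of the original article: the sequences, labels, and function names (kmer_features, cosine_similarity, predict_label) are invented for illustration, and k-mer counting with a nearest-neighbour rule stands in for whatever feature extractor and model a real pipeline would use.

```python
from collections import Counter
from math import sqrt


def kmer_features(sequence, k=2):
    """Stage 1, feature extraction: count overlapping k-mers in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a, b):
    """Stage 2, metric definition: cosine similarity in the k-mer vector space."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_label(query, labelled_examples, k=2):
    """Stage 3, prediction: return the label of the most similar example."""
    query_vec = kmer_features(query, k)
    best = max(
        labelled_examples,
        key=lambda ex: cosine_similarity(query_vec, kmer_features(ex[0], k)),
    )
    return best[1]


# Made-up toy sequences and labels (assumptions, not real receptors).
examples = [("MKTLLVAAWL", "class_A_like"), ("GGSGGSPEPE", "other")]
print(predict_label("MKTLLVSAWL", examples))
```

In practice each stage would be far richer (learned embeddings instead of k-mer counts, for instance), but the division of labour between features, metric, and prediction stays the same.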
Machine learning (ML) and deep learning (DL) are subfields of artificial intelligence (AI) that involve training algorithms to learn from data. While they share similarities, they differ notably in model architecture, complexity, data requirements, training time, computational resources and interpretability. Classical ML typically uses models such as decision trees and support vector machines, which require handcrafted feature engineering: domain experts manually select and engineer relevant features from the input data. Classical ML models are effective when the relevant features of a dataset are well understood, but their use is limited when those features are unknown, as is often the case in drug discovery. DL addresses this limitation. DL algorithms employ artificial neural networks with multiple layers of interconnected nodes which learn hierarchical representations of the data, eliminating the need for explicit feature engineering and allowing the exploitation of features that would not typically be identified using classical ML algorithms. However, DL algorithms typically require large amounts of labelled training data, and the resulting models often comprise millions of parameters, which makes them infeasible for a human to interpret, in contrast with classical ML models built on handcrafted features and simpler architectures. DL models also generally require more computational resources (such as powerful GPUs) and longer training times.
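To give a sense of scale for the parameter counts mentioned above, the short sketch below counts the weights and biases of a small fully connected network. The layer sizes are an assumption chosen for illustration (a 2048-bit input, such as a molecular fingerprint, followed by two hidden layers and one output); they are not taken from the original article.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases of a fully connected network.

    Each layer with n_in inputs and n_out outputs has n_in * n_out weights
    and n_out biases, i.e. (n_in + 1) * n_out parameters.
    """
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical architecture: 2048 inputs -> 1024 -> 512 -> 1 output.
layers = [2048, 1024, 512, 1]
print(count_parameters(layers))  # 2623489
```

Even this modest network has more than 2.6 million parameters, none of which corresponds to a human-chosen feature, which is one reason such models are hard to interpret.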
Artificial intelligence in GPCR drug discovery
The use of AI in GPCR drug discovery has increased over the last decade and is revolutionising the way new GPCR-targeted drugs are developed. AI is providing a dramatic acceleration of the drug discovery process at multiple stages:
1. Classification: AI models can be used to distinguish GPCRs from non-GPCRs, and to classify GPCRs into families, subfamilies, sub-subfamilies, and subtypes. The models use a variety of data for the input, including amino acid sequences or structural data (from X-ray crystallography, cryoEM or molecular dynamics simulation experiments).
2. Mutations: ML methods can determine stabilising mutations that enable structure determination and can also predict the effect of mutagenesis on GPCR function.
3. Structure: the development of algorithms such as DeepMind’s AlphaFold2 (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021) allows the prediction of the 3D structure of a protein from the amino acid sequence even where no similar structure is known.
4. GPCR-ligand interactions: ML can predict GPCR-ligand interactions from protein sequences and molecular fingerprints drawn from databases such as the GPCR-Ligand Association (GLASS) database, BindingDB and DrugBank. One major challenge is the identification of receptor subtype-selective ligands. In this context, BRS-3D was used to predict subtype-selective ligands for dopamine receptors and adenosine receptors (He, Ben, Kuang, Wang & Kong, 2016; Kuang, Feng, Hu, Wang, He & Kong, 2016).
5. Virtual screening: molecular docking and virtual screening can efficiently analyze large databases of compounds and predict their binding affinity to GPCRs, aiding the identification of lead compounds for further development. Parameters such as ligand affinity for the receptor (pKi), the potency with which a ligand induces or inhibits a cellular response (pEC50 or pIC50, respectively), and the rate at which the ligand dissociates from the receptor (koff, which determines how long it remains bound), are used to short-list candidate drugs for further assessment.
6. De novo drug design: AI algorithms can generate new molecules with desired properties, such as binding affinity and selectivity to specific GPCRs.
7. Predicting GPCR properties: AI models can predict various properties of GPCRs, such as ligand binding sites (orthosteric, allosteric), activation mechanisms, and conformational changes.
8. Multi-target drug design: GPCRs are often involved in complex signaling networks. AI algorithms can integrate data from multiple sources, including genomics, proteomics, and pharmacological data, to identify potential drug targets within the GPCR signaling pathways.
9. Side effect prediction and clinical responses: AI can predict potential off-target effects and adverse drug reactions associated with GPCR-targeted drugs. The discovery of biased signalling at GPCRs raises the possibility of designing drugs that selectively activate therapeutically important pathways over those that lead to undesired side effects. However, determination of ligand bias remains a major bottleneck: it requires extensive experimental datasets and in vivo validation, and in vivo results do not always align with in vitro evidence (Kenakin, 2019).
10. Drug repurposing: AI algorithms can screen existing drugs and repurpose them for GPCR-related diseases. By analyzing drug-target interactions and disease pathways, AI can identify potential candidates that may have therapeutic effects on GPCRs.
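The short-listing parameters named in the virtual screening item above follow simple definitions: pKi, pEC50 and pIC50 are the negative base-10 logarithms of the corresponding concentrations in molar units, and residence time is the reciprocal of koff. The sketch below applies them to made-up candidates. The ligand names, values, and thresholds are all hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def p_value(concentration_molar):
    """Return -log10 of a concentration in mol/L (e.g. Ki -> pKi)."""
    return -math.log10(concentration_molar)


def residence_time(koff_per_second):
    """Return the mean residence time in seconds, defined as 1 / koff."""
    return 1.0 / koff_per_second


def shortlist(candidates, min_pki=8.0, min_residence_s=600.0):
    """Keep candidates meeting both (arbitrary, illustrative) thresholds."""
    selected = []
    for c in candidates:
        pki = p_value(c["ki_nM"] * 1e-9)  # convert nM to M before taking the log
        if pki >= min_pki and residence_time(c["koff_per_s"]) >= min_residence_s:
            selected.append(c["name"])
    return selected


# Hypothetical candidates; none of these values come from real assays.
candidates = [
    {"name": "ligand_A", "ki_nM": 5.0, "koff_per_s": 0.001},
    {"name": "ligand_B", "ki_nM": 200.0, "koff_per_s": 0.001},
    {"name": "ligand_C", "ki_nM": 1.0, "koff_per_s": 0.05},
]
print(shortlist(candidates))
```

Here ligand_B is dropped for low affinity (a Ki of 200 nM gives a pKi below 7) and ligand_C for a short residence time (20 s), which shows why affinity alone is not enough to rank candidates.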
Pros and cons of current technology
AI has the potential to revolutionize drug discovery by accelerating the process, optimizing drug design, and improving success rates. However, the use of AI in this field also has limitations: 1) the quality and availability of data can be a challenge in AI-driven drug discovery; 2) the need for pattern recognition, often presented in the form of assumptions or hypotheses, places similarity-based scoring functions at the core of any AI approach (Sanavia, Birolo, Montanucci, Turina, Capriotti & Fariselli, 2020), and these have intrinsic limitations (e.g., using sequence information alone to determine similarity cannot adequately predict protein structure (Jumper et al., 2021)); 3) the same limitation holds when searching for novel receptor ligands, as “ligand-based approaches often bias molecule generation towards previously established chemical space, limiting their ability to identify truly novel chemotypes” (Thomas, Smith, O’Boyle, de Graaf & Bender, 2021); and 4) limited input data can lead to a phenomenon called “the burden of sequence identity” (Sanavia, Birolo, Montanucci, Turina, Capriotti & Fariselli, 2020).
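The bias of similarity-based scoring towards established chemical space can be seen in a small example. The sketch below uses the Tanimoto coefficient, a common similarity measure for molecular fingerprints, on made-up fingerprints represented as sets of "on" bit positions. It is an illustrative assumption, not the scoring function used in the cited studies, and the bit sets do not correspond to real molecules.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints (sets of 'on' bit indices), invented for illustration.
known_active = {1, 4, 7, 9, 12}
candidate_close_analogue = {1, 4, 7, 9, 15}
candidate_novel_chemotype = {2, 3, 20, 21, 30}

print(tanimoto(known_active, candidate_close_analogue))
print(tanimoto(known_active, candidate_novel_chemotype))
```

Ranking by similarity to the known active places the close analogue far ahead of the novel chemotype, whether or not the novel compound binds the receptor. Any model driven by such a score inherits that preference.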
According to the authors, the future of AI in GPCR drug discovery lies in six main areas:
1. Open-source data: accessibility of databases to a wider audience will encourage the development of new ideas and methods.
2. Greater application of unsupervised machine learning: most applications of ML in GPCR drug discovery have relied on traditional, hand-crafted feature construction, which limits the recognition of unknown patterns. The authors see the future in unsupervised deep learning that takes advantage of large unlabelled datasets (e.g., millions of compounds and assays in ChEMBL) to understand new proteins, ligands and their relationships.
3. Interpretable machine learning: some AI models, such as deep learning neural networks, are often referred to as "black boxes" because they lack transparency and interpretability. Efforts will likely be made to develop AI models that can provide transparent explanations for their predictions and decisions.
4. Towards a comprehensive understanding of GPCRs, ligands, diseases, and their associations: open-access databases from experimental labs and new AI techniques will allow more associations and will promote multi-task learning.
5. Precision medicines for GPCRs: AI can facilitate the development of personalized treatments by analyzing patient data, including genomic information, clinical records, and treatment outcomes. By integrating this data with GPCR-related knowledge, AI can help identify patient-specific GPCR targets and optimize drug selection and dosing.
6. Automated tools for researchers: AI solutions which bring together theoretical, computational, and experimental labs will allow faster discovery and invention.
While AI holds great potential, it is important to note that it is not a replacement for experimental validation and human expertise. Instead, it complements and assists researchers in the GPCR drug discovery process. Regulatory challenges, ethical considerations, and the need for interpretability of AI-driven models will need to be addressed to fully realize the potential of AI in GPCR drug discovery. The sharing and collaboration of data across academia, industry, and regulatory bodies will likely increase in the coming years and will facilitate the creation of larger and more diverse datasets for AI model training, improving their accuracy and generalizability.
Check the original article at https://pubmed.ncbi.nlm.nih.gov/37161878/