GEFA Early Fusion Approach in Drug-Target Affinity Prediction Academic Project

Posted by Admin: System Admin

Beginner

Abstract

Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been applied successfully to the drug-target affinity (DTA) problem. However, previous deep learning-based methods do not model the direct interactions between the drug and the protein residues. This leads to inaccurate learning of the target representation, which may change because of drug binding effects. In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets and neglect proteins outside those datasets.

We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with an attention mechanism, to address the changes in target representation caused by binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex. The resulting model is an expressive deep nested graph neural network. We also use pre-trained protein representations, building on recent work on learning contextualized protein representations. The experiments are conducted under different settings to evaluate scenarios such as novel drugs or novel targets. The results demonstrate the effectiveness of the pre-trained protein embedding and the advantages of GEFA in modeling the nested graph for drug-target interaction.

Machine learning is an important component of the growing field of data science. In this project, algorithms are trained with statistical methods to make classifications or predictions and to uncover key insights. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model from the project data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are used on a wide variety of datasets where it is difficult or infeasible to develop conventional algorithms for the task.
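The "novel drugs or targets" evaluation settings mentioned in the abstract are usually realized by holding out whole entities, so that no test drug (or test target) is ever seen during training. The abstract does not give the exact splitting protocol, so the following is only a minimal sketch under that assumption; the function name cold_split, the 20% test fraction, and the toy records are hypothetical.

```python
import random

def cold_split(pairs, key_index, test_fraction=0.2, seed=0):
    """Split (drug, target, affinity) records so that held-out entities
    (drugs if key_index=0, targets if key_index=1) never appear in training."""
    entities = sorted({p[key_index] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    # Hold out at least one entity, even for very small datasets.
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [p for p in pairs if p[key_index] not in held_out]
    test = [p for p in pairs if p[key_index] in held_out]
    return train, test

# Toy records (drug_id, target_id, affinity); the affinity values are made up.
pairs = []
for d in "ABCDE":
    for t in "XYZ":
        pairs.append((d, t, 5.0 + 0.1 * len(pairs)))

novel_drug_train, novel_drug_test = cold_split(pairs, key_index=0)
novel_target_train, novel_target_test = cold_split(pairs, key_index=1)
```

Splitting by entity rather than by record is what makes the setting "novel": a random split over records would leak every drug and target into both sets.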

Existing System & Flaws

Drug repurposing [18] is the process of identifying well-established medications for a novel target disease. Its advantages over developing a completely new drug are lower risk and faster development [19]. The process consists of three key steps: identifying candidate molecules for the target disease, assessing the drug effect in preclinical trials, and assessing effectiveness in clinical trials [20]. The first step, hypothesis generation, is critical because it determines the success of the whole process. Advanced computational approaches are used for hypothesis generation. These approaches can be categorized into six groups [20]: genetic association [21], [22], pathway mapping, retrospective clinical analysis, novel data sources, signature matching [29]–[31], and molecular docking [32]–[34].

Drug-target binding affinity indicates the strength of the binding force between the target protein and its ligand (drug or inhibitor) [35]. Drug-target binding affinity prediction is a regression task that predicts the value of this binding force. The binding strength is measured by the equilibrium dissociation constant (KD). A smaller KD value indicates a stronger binding affinity between protein and ligand [35].

There are two main approaches: structural and non-structural [1]. Structural methods use the 3D structures of the protein and the ligand to simulate their interaction. The non-structural approach instead relies on ligand and protein features such as sequence, hydrophobicity, similarity, or other alternative structural information. The structure-based approach involves molecular docking, which predicts the three-dimensional structure of the target-ligand complex. Molecular docking produces a large number of target-ligand complex conformations, which are evaluated by a scoring function.

Based on the type of scoring function, the structural approach can be categorized into three groups [1]: classical scoring function methods [36]–[39], machine learning scoring function methods [40], and deep learning scoring function methods [41], [42]. Among the classical scoring approaches, Elanie et al. [36] use the DelPhi-calculated potential at each ligand atom for the contact scoring function. Among the machine learning approaches, Kundu et al. [40] extract ligand features (e.g. atom count, physicochemical properties) and protein features (e.g. accessible surface, number of chains) from 3D structure data, then apply machine learning to learn the scoring function. Among the deep learning approaches, Marta et al. [41] apply 3D convolution to the protein-ligand 3D structure to predict the binding affinity.

Disadvantages

• The existing system does not compare the drug representation extracted from the drug-protein fusion graph with the drug representation extracted from the drug graph alone.
• The existing system does not implement Graph Early Fusion for binding Affinity prediction (GEFA).
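The relationship between KD and binding strength described in this section can be made concrete with a small calculation. DTA regression targets are often expressed on a negative log scale (pKd), so that a larger number means stronger binding. This page does not state which units or transform the project uses, so the sketch below assumes KD is given in nanomolar; the function name kd_to_pkd is hypothetical.

```python
import math

def kd_to_pkd(kd_nanomolar):
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L).

    Assumption: the input is in nanomolar (1 nM = 1e-9 mol/L)."""
    if kd_nanomolar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nanomolar * 1e-9)

# A smaller Kd (stronger binding) gives a larger pKd.
weak_binder = kd_to_pkd(10000.0)   # Kd = 10 micromolar
strong_binder = kd_to_pkd(1.0)     # Kd = 1 nanomolar
```

The log scale also compresses the several orders of magnitude that KD values span, which tends to make the regression target easier to fit.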

Proposed System & Advantages

In summary, the contribution of our work is two-fold. First, we combine the protein sequence embedding feature and the protein contact map to build the graph representation of a target protein. Second, to reflect the change in target representation during the binding process, we propose Graph Early Fusion for binding Affinity prediction (GEFA) for more accurate biological modeling. We demonstrate the effects of GEFA on the Davis dataset [17], where it outperforms previous studies under different settings.

To address the change in target protein representation, the system uses an early-fusion approach. First, we extract a representation of the given drug molecule from its drug graph structure. Then, the drug representation is integrated into the protein graph structure before the protein representation learning phase. In effect, one graph structure is nested inside another. This graph-in-graph neural network design allows the model to learn the changes in protein representation caused by binding with the drug molecule.

Advantages

• The proposed system refines graph-graph integration with early fusion through Graph Early Fusion for binding Affinity prediction (GEFA).
• The proposed system uses an attention mask to define the graph edges: instead of using attention values as drug-residue edge weights, drug-residue edges are weighted the same as the residue-residue edges in the target graph.
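The early-fusion idea described above can be sketched in a few lines: residue-residue edges come from the contact map, the drug graph is collapsed into a single vector, and that vector is appended as an extra node of the protein graph. This is only an illustration of the data structure, not the project's model. In particular, mean pooling stands in for a learned drug graph network, the drug node is linked to every residue with equal weight, and the 0.5 contact threshold and all function names are assumptions.

```python
def contact_map_to_edges(contact_map, threshold=0.5):
    """Return undirected residue-residue edges (i, j) with i < j for
    contact map entries at or above the threshold."""
    n = len(contact_map)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if contact_map[i][j] >= threshold]

def mean_pool(node_features):
    """Average per-node feature vectors into one graph-level vector.
    Stand-in for a learned drug graph encoder (assumption)."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]

def early_fusion_graph(residue_features, contact_map, atom_features,
                       threshold=0.5):
    """Build the nested graph: all residue nodes plus one drug node."""
    n_res = len(residue_features)
    drug_vector = mean_pool(atom_features)
    # The drug node is appended after the residues, at index n_res.
    nodes = [list(f) for f in residue_features] + [drug_vector]
    edges = contact_map_to_edges(contact_map, threshold)
    # Simplification: connect the drug node to every residue.
    edges += [(r, n_res) for r in range(n_res)]
    return nodes, edges

# Toy inputs: 3 residues and 2 atoms, all with 2-dimensional features.
contact_map = [[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]]
residue_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
atom_features = [[1.0, 0.0], [0.0, 1.0]]
nodes, edges = early_fusion_graph(residue_features, contact_map, atom_features)
```

Because the drug node is part of the protein graph before any protein-level message passing, residue representations computed on this graph can depend on the drug, which is the point of fusing early rather than concatenating two independently learned vectors at the end.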

Software Requirements

• Operating System: Windows 7 Ultimate
• Coding Language: Python
• Front-End: Python
• Back-End: Django-ORM
• Designing: HTML, CSS, JavaScript
• Database: MySQL (WAMP Server)

Hardware Requirements

H/W System Configuration:
• Processor: Pentium IV
• RAM: 4 GB (min)
• Hard Disk: 20 GB
• Keyboard: Standard Windows Keyboard
• Mouse: Two or Three Button Mouse
• Monitor: SVGA
