guides
The submodule that contains the guides, i.e., the weak learners in DebiasedDTA that learn a weighting of the training set to improve generalizability. The implemented guides are IDDTA and BoWDTA, and an abstract class is also available to quickly implement custom guides.
Guide
Bases: ABC
An abstract class that implements the interface of a guide in pydebiaseddta. The guides are characterized by a train function and a predict function, whose signatures are implemented by this class. Any instance of the Guide class can be trained in the DebiasedDTA training framework, and therefore, Guide can be inherited to design custom guide models.
Source code in pydebiaseddta/guides/abstract_guide.py
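As a minimal sketch of the interface described above, the following defines a hypothetical custom guide that always predicts the mean training affinity. To keep the example self-contained, it declares a stand-in `Guide` abstract class that mirrors the documented `train` and `predict` signatures; in practice one would presumably inherit from the `Guide` class in pydebiaseddta/guides/abstract_guide.py instead. The `MeanGuide` name and its behavior are illustrative assumptions, not part of the library.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Any, List


class Guide(ABC):
    """Stand-in for the documented Guide interface (assumption: same signatures)."""

    @abstractmethod
    def train(
        self,
        train_ligands: List[Any],
        train_proteins: List[Any],
        train_labels: List[float],
    ) -> None:
        ...

    @abstractmethod
    def predict(self, ligands: List[Any], proteins: List[Any]) -> List[float]:
        ...


class MeanGuide(Guide):
    """Hypothetical guide that ignores its inputs and predicts the mean training label."""

    def __init__(self) -> None:
        self.mean_affinity = 0.0

    def train(self, train_ligands, train_proteins, train_labels):
        # Store the average affinity of the training interactions.
        self.mean_affinity = mean(train_labels)

    def predict(self, ligands, proteins):
        # Return one prediction per protein-ligand pair.
        return [self.mean_affinity for _ in ligands]


guide = MeanGuide()
guide.train(["CCO", "CCN"], ["MKT", "MLA"], [5.0, 7.0])
predictions = guide.predict(["CCC"], ["MKV"])
```

Because both methods are abstract, instantiating the stand-in `Guide` directly raises a `TypeError`; only subclasses that define `train` and `predict` can be created.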
predict(ligands, proteins)
abstractmethod
An abstract method to define the prediction interface of the guides.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligands | List[Any] | Ligands in any representation. | required |
proteins | List[Any] | Proteins in any representation. | required |
Returns:
Type | Description |
---|---|
List[float] | The predicted affinities. |
Source code in pydebiaseddta/guides/abstract_guide.py
train(train_ligands, train_proteins, train_labels)
abstractmethod
An abstract method to define the training interface of the guides.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_ligands | List[Any] | Training ligands in any representation. | required |
train_proteins | List[Any] | Training proteins in any representation. | required |
train_labels | List[float] | Affinity scores of the training protein-ligand pairs. | required |
Source code in pydebiaseddta/guides/abstract_guide.py
BoWDTA
Bases: Guide
Source code in pydebiaseddta/guides/bowdta.py
__init__()
Constructor to create a BoWDTA model. BoWDTA represents the proteins and ligands as "bag-of-words" and uses a decision tree for prediction. BoWDTA uses the same biomolecule vocabulary as BPEDTA.
Source code in pydebiaseddta/guides/bowdta.py
predict(ligands, proteins)
Predicts the affinities of a list of protein-ligand pairs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligands | List[str] | SMILES strings of the ligands. | required |
proteins | List[str] | Amino-acid sequences of the proteins. | required |
Returns:
Type | Description |
---|---|
List[float] | Predicted affinities. |
Source code in pydebiaseddta/guides/bowdta.py
tokenize_ligands(smiles)
Segments SMILES strings of the ligands into their ligand words and applies label encoding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
smiles | List[str] | The SMILES strings of the ligands. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | Label encoded sequences of ligand words. |
Source code in pydebiaseddta/guides/bowdta.py
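The label-encoding step can be illustrated with a small sketch. It assumes the SMILES strings have already been segmented into words (the actual segmentation relies on the library's biomolecule vocabulary, which is not reproduced here) and maps each word to an integer. The `vocabulary` dictionary, the example words, the `label_encode` helper, and the handling of unseen words are all hypothetical.

```python
from typing import Dict, List


def label_encode(
    segmented: List[List[str]],
    vocabulary: Dict[str, int],
    unknown_id: int = 0,
) -> List[List[int]]:
    """Map each word to its integer label; unseen words get `unknown_id` (an assumption)."""
    return [[vocabulary.get(word, unknown_id) for word in words] for words in segmented]


# Hypothetical vocabulary and pre-segmented ligands, for illustration only.
vocabulary = {"CC": 1, "O": 2, "c1ccccc1": 3}
segmented_ligands = [["CC", "O"], ["c1ccccc1", "O", "N"]]

encoded = label_encode(segmented_ligands, vocabulary)
```

Here `encoded` holds one list of integer labels per ligand, matching the documented `List[List[int]]` return type.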
tokenize_proteins(aa_sequences)
Segments amino-acid sequences of the proteins into their protein words and applies label encoding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
aa_sequences | List[str] | The amino-acid sequences of the proteins. | required |
Returns:
Type | Description |
---|---|
List[List[int]] | Label encoded sequences of protein words. |
Source code in pydebiaseddta/guides/bowdta.py
train(train_ligands, train_proteins, train_labels)
Trains a BoWDTA model on the provided protein-ligand interactions. The biomolecules are represented as bags of their biomolecule words, and a decision tree is used for affinity prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_ligands | List[str] | SMILES strings of the training ligands. | required |
train_proteins | List[str] | Amino-acid sequences of the training proteins. | required |
train_labels | List[float] | Affinity scores of the training interactions. | required |
Source code in pydebiaseddta/guides/bowdta.py
vectorize_ligands(smiles_words)
Computes bag-of-words vectors of the ligands based on the frequencies of their ligand words.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
smiles_words | List[List[int]] | Ligand words of each ligand as a sequence of sequences. | required |
Returns:
Type | Description |
---|---|
np.array | Bag-of-words vectors stacked in a matrix. |
Source code in pydebiaseddta/guides/bowdta.py
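A frequency-based bag-of-words vector records how often each word label occurs in a molecule. The sketch below shows one way to compute such vectors in plain Python. It uses raw counts and returns nested lists rather than an `np.array`; the `bag_of_words` name and the `vocabulary_size` parameter are assumptions made for illustration, and the library may weight or normalize the counts differently.

```python
from collections import Counter
from typing import List


def bag_of_words(encoded_words: List[List[int]], vocabulary_size: int) -> List[List[int]]:
    """Return one count vector per molecule; column j holds the count of word label j."""
    vectors = []
    for words in encoded_words:
        counts = Counter(words)
        vectors.append([counts.get(label, 0) for label in range(vocabulary_size)])
    return vectors


# Two hypothetical ligands, already label encoded.
smiles_words = [[1, 2, 1], [3, 2]]
bow_matrix = bag_of_words(smiles_words, vocabulary_size=4)
```

Each row of `bow_matrix` has one entry per vocabulary label, so all molecules share the same vector length regardless of how many words they contain.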
vectorize_proteins(protein_words)
Computes bag-of-words vectors of the proteins based on the frequencies of their protein words.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protein_words | List[List[int]] | Protein words of each protein as a sequence of sequences. | required |
Returns:
Type | Description |
---|---|
np.array | Bag-of-words vectors stacked in a matrix. |
Source code in pydebiaseddta/guides/bowdta.py
IDDTA
Bases: Guide
Source code in pydebiaseddta/guides/iddta.py
__init__()
Constructor to create an IDDTA model. IDDTA represents the proteins and ligands with one-hot vectors of their identities and uses a decision tree for prediction.
Source code in pydebiaseddta/guides/iddta.py
predict(ligands, proteins)
Predicts the affinities of a list of protein-ligand pairs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligands | List[str] | SMILES strings of the ligands. | required |
proteins | List[str] | Amino-acid sequences of the proteins. | required |
Returns:
Type | Description |
---|---|
List[float] | Predicted affinities. |
Source code in pydebiaseddta/guides/iddta.py
train(train_ligands, train_proteins, train_labels)
Trains the IDDTA model. IDDTA represents the biomolecules with one-hot encodings of their identities and applies a decision tree for affinity prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_ligands | List[str] | SMILES strings of the training ligands. | required |
train_proteins | List[str] | Amino-acid sequences of the training proteins. | required |
train_labels | List[float] | Affinity scores of the interactions. | required |
Source code in pydebiaseddta/guides/iddta.py
vectorize_ligands(ligands)
Creates one-hot vectors of the ligands.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligands | List[str] | SMILES strings of the input ligands (other representations are also possible, but SMILES is used in this study). | required |
Returns:
Type | Description |
---|---|
np.array | One-hot encoded vectors of the ligands. |
Source code in pydebiaseddta/guides/iddta.py
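One-hot encoding by identity assigns each distinct molecule its own column, so the resulting vector carries no structural information about the molecule. The sketch below is a plain-Python illustration of that idea. The `fit_identities` and `one_hot_by_identity` helpers are hypothetical, the output is nested lists rather than an `np.array`, and mapping unseen molecules to an all-zero vector is an assumption rather than documented behavior.

```python
from typing import Dict, List


def fit_identities(molecules: List[str]) -> Dict[str, int]:
    """Assign a column index to each distinct molecule, in order of first appearance."""
    index: Dict[str, int] = {}
    for molecule in molecules:
        index.setdefault(molecule, len(index))
    return index


def one_hot_by_identity(molecules: List[str], index: Dict[str, int]) -> List[List[int]]:
    """Encode each molecule as a one-hot row; unseen molecules become all zeros (assumption)."""
    rows = []
    for molecule in molecules:
        row = [0] * len(index)
        if molecule in index:
            row[index[molecule]] = 1
        rows.append(row)
    return rows


# Hypothetical training ligands; the duplicate "CCO" shares a single column.
train_ligands = ["CCO", "CCN", "CCO"]
ligand_index = fit_identities(train_ligands)
one_hot = one_hot_by_identity(["CCN", "CCO", "CCC"], ligand_index)
```

The same two helpers would apply unchanged to amino-acid sequences, since only string identity is used.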
vectorize_proteins(proteins)
Creates one-hot vectors of the proteins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
proteins | List[str] | Amino-acid sequences of the input proteins. | required |
Returns:
Type | Description |
---|---|
np.array | One-hot encoded vectors of the proteins. |
Source code in pydebiaseddta/guides/iddta.py