Welcome to DISTIL’s documentation!¶
DISTIL: Deep dIverSified inTeractIve Learning is an efficient and scalable active learning library built on top of PyTorch.
What is DISTIL?

DISTIL is an active learning toolkit that implements a number of state-of-the-art active learning strategies with a particular focus on active learning in the deep learning setting. DISTIL is built on PyTorch and decouples the training loop from the active learning algorithm, thereby providing flexibility to the user by allowing them to control the training procedure and model. It allows users to incorporate new active learning algorithms easily with minimal changes to their existing code. DISTIL also provides support for incorporating active learning with your custom dataset and allows you to experiment on well-known datasets. We are continuously incorporating newer and better active learning selection strategies into DISTIL.
Principles of DISTIL:
Minimal changes to add it to the existing training structure.
Independent of the training strategy used.
Achieving similar test accuracy with less training data.
Huge reduction in labeling cost and time.
Access to various active learning strategies with just one line of code.
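For instance, a single selection round might look like the minimal sketch below. This is illustrative only: it assumes pre-existing numpy arrays X, Y (labeled pool) and X_unlabeled (unlabeled pool), a PyTorch model net that follows DISTIL's model conventions, and a user-supplied query_oracle that returns labels for the queried points.

```python
import numpy as np
from distil.active_learning_strategies.badge import BADGE
from distil.utils.data_handler import DataHandler_Points

# X, Y: labeled pool; X_unlabeled: unlabeled pool; net: a PyTorch model (all assumed to exist)
strategy = BADGE(X, Y, X_unlabeled, net, DataHandler_Points, nclasses=10, args={'batch_size': 64})

budget = 100
selected_idxs = strategy.select(budget)              # indices into X_unlabeled
new_X = X_unlabeled[selected_idxs]
new_Y = query_oracle(selected_idxs)                  # hypothetical labeling step supplied by the user
X, Y = np.concatenate([X, new_X]), np.concatenate([Y, new_Y])
X_unlabeled = np.delete(X_unlabeled, selected_idxs, axis=0)
# ...retrain net on the enlarged labeled pool with your own training loop, then repeat...
```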
Preliminary results are reported for CIFAR-10, MNIST, Fashion MNIST, and SVHN (plots not reproduced here).
DISTIL¶
Active Learning Strategies¶
BADGE¶
class distil.active_learning_strategies.badge.BADGE(X, Y, unlabeled_x, net, handler, nclasses, args)
Bases: distil.active_learning_strategies.strategy.Strategy
This method is based on the paper Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds 1. According to the paper, this strategy, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and of high magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between uncertainty and diversity without requiring any hand-tuned hyperparameters. At each round of selection, loss gradients are computed using the hypothesized labels; the points to be labeled are then selected by applying k-means++ to these loss gradients.
- Parameters
X (numpy array) – Present training/labeled data
Y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) – Specify optional parameters. batch_size Batch size to be used inside strategy class (int, optional)
select(budget)
Select the next set of points.
- Parameters
budget (int) – Number of indexes to be returned for next set
- Returns
chosen – List of selected data point indexes with respect to unlabeled_x
- Return type
list
select_per_batch(budget, batch_size)
Select points to label by using the per-batch BADGE strategy.
- Parameters
budget (int) – Number of indices to be selected from unlabeled set
batch_size (int) – Size of batches to form
- Returns
chosen – List of selected data point indices with respect to unlabeled_x
- Return type
list
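To make the selection rule concrete, the following self-contained numpy sketch shows the k-means++ style seeding that BADGE applies to the gradient embeddings. The construction of the hallucinated gradient embeddings themselves is omitted, and this is not DISTIL's internal implementation.

```python
import numpy as np

def kmeans_pp_indices(grad_embeddings, k, seed=0):
    """k-means++ seeding on (hallucinated) gradient embeddings: picks points that are
    both diverse and of high magnitude. Illustrative sketch, not DISTIL's internal code."""
    rng = np.random.default_rng(seed)
    n = grad_embeddings.shape[0]
    chosen = [int(rng.integers(n))]                               # first center picked uniformly
    d2 = np.sum((grad_embeddings - grad_embeddings[chosen[0]]) ** 2, axis=1)
    for _ in range(k - 1):
        probs = d2 / d2.sum()                                     # sample proportional to squared distance
        nxt = int(rng.choice(n, p=probs))
        chosen.append(nxt)
        d2 = np.minimum(d2, np.sum((grad_embeddings - grad_embeddings[nxt]) ** 2, axis=1))
    return chosen

# Toy example: 200 points with 16-dimensional "gradient embeddings", pick 10.
emb = np.random.default_rng(1).normal(size=(200, 16))
print(kmeans_pp_indices(emb, 10))
```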
Core-Set Approach¶
class distil.active_learning_strategies.core_set.CoreSet(X, Y, unlabeled_x, net, handler, nclasses, args={})
Bases: distil.active_learning_strategies.strategy.Strategy
Implementation of CoreSet 2 Strategy. A diversity-based approach using coreset selection. The embedding of each example is computed by the network’s penultimate layer and the samples at each round are selected using a greedy furthest-first traversal conditioned on all labeled examples.
- Parameters
X (numpy array) – Present training/labeled data
Y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
furthest_first(X, X_set, n)
Selects points with maximum distance.
- Parameters
X (numpy array) – Embeddings of unlabeled set
X_set (numpy array) – Embeddings of labeled set
n (int) – Number of points to return
- Returns
idxs – List of selected data point indexes with respect to unlabeled_x
- Return type
list
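The greedy furthest-first traversal can be sketched in a few lines of numpy. This is an illustrative sketch of the selection rule, not DISTIL's internal implementation.

```python
import numpy as np

def furthest_first_sketch(unlabeled_emb, labeled_emb, n):
    """Greedy furthest-first traversal: repeatedly pick the unlabeled point whose
    distance to the closest already-covered point (labeled or selected) is largest."""
    # distance from every unlabeled point to its nearest labeled point
    dists = np.linalg.norm(unlabeled_emb[:, None, :] - labeled_emb[None, :, :], axis=2)
    min_dist = dists.min(axis=1)
    idxs = []
    for _ in range(n):
        i = int(min_dist.argmax())                       # furthest point from the current cover
        idxs.append(i)
        new_d = np.linalg.norm(unlabeled_emb - unlabeled_emb[i], axis=1)
        min_dist = np.minimum(min_dist, new_d)           # update distances to the cover
    return idxs

# Toy example with random embeddings
rng = np.random.default_rng(0)
print(furthest_first_sketch(rng.normal(size=(100, 8)), rng.normal(size=(20, 8)), 5))
```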
CRAIG-ACTIVE¶
class distil.active_learning_strategies.craig_active.CRAIGActive(X, Y, unlabeled_x, net, criterion, handler, nclasses, lrn_rate, selection_type, linear_layer, args={})
Bases: distil.active_learning_strategies.strategy.Strategy
This is an implementation of an active learning variant of CRAIG from the paper Coresets for Data-efficient Training of Machine Learning Models. This algorithm calculates hypothesized labels for each of the unlabeled points and feeds this hypothesized set to the original CRAIG algorithm. The points selected by CRAIG are used as the queried points for this algorithm.
- Parameters
X (Numpy array) – Features of the labeled set of points
Y (Numpy array) – Labels of the labeled set of points
unlabeled_x (Numpy array) – Features of the unlabeled set of points
net (class object) – Model architecture used for training. Could be an instance of the models defined in distil.utils.models or something similar.
criterion (class object) – The loss type used in training. Could be an instance of torch.nn.* losses or torch functionals.
handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e., have __getitem__ and __len__ methods implemented so that it can be passed to a PyTorch DataLoader. Could be an instance of the handlers defined in distil.utils.DataHandler or something similar.
nclasses (int) – Number of classes in the dataset
lrn_rate (float) – The learning rate used in training. Used by the CRAIG algorithm.
selection_type (string) – Should be one of “PerClass”, “Supervised”, or “PerBatch”. Selects which approximation method is used.
linear_layer (bool) – Sets whether to include the last linear layer parameters as part of the gradient computation.
args (dictionary) – This dictionary should have the keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training. ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
Entropy Sampling¶
class distil.active_learning_strategies.entropy_sampling.EntropySampling(X, Y, unlabeled_x, net, handler, nclasses, args={})
Bases: distil.active_learning_strategies.strategy.Strategy
Implements the Entropy Sampling Strategy, one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy, and therefore select points which have maximum entropy.
Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Then entropy can be calculated as,
\[ENTROPY = -\sum_j \sigma(z_j)\log(\sigma(z_j))\]The algorithm then selects the budget number of elements with the highest ENTROPY.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
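The entropy criterion itself is a one-liner on top of the model's logits. The following self-contained PyTorch sketch (not DISTIL's internal code) shows how the budget most-uncertain points would be picked.

```python
import torch
import torch.nn.functional as F

def entropy_scores(logits):
    """Entropy of the softmax distribution for each sample; higher = more uncertain."""
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)   # numerically stable log(softmax)
    return -(probs * log_probs).sum(dim=1)

# Toy example: 5 samples, 3 classes; pick the budget=2 most uncertain samples.
logits = torch.randn(5, 3)
budget = 2
print(entropy_scores(logits).topk(budget).indices)
```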
Entropy Sampling with Dropout¶
class distil.active_learning_strategies.entropy_sampling_dropout.EntropySamplingDropout(X, Y, unlabeled_x, net, handler, nclasses, args={})
Bases: distil.active_learning_strategies.strategy.Strategy
Implements the Entropy Sampling Strategy with dropout. The Entropy Sampling Strategy is one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy, and therefore select points which have maximum entropy.
Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Then entropy can be calculated as,
\[ENTROPY = -\sum_j \sigma(z_j)\log(\sigma(z_j))\]The algorithm then selects the budget number of elements with the highest ENTROPY.
The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesized labels. The user can pass an n_drop argument, which denotes the number of times the probabilities will be calculated. The final probability is calculated by averaging the probabilities obtained in all iterations.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
n_drop Dropout value to be used (int, optional)
FASS¶
-
class
distil.active_learning_strategies.fass.
FASS
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements FASS 3, which combines an uncertainty sampling method with a submodular data subset selection framework to label a subset of data points to train a classifier. Here, based on the ‘top_n’ parameter, the ‘top_n*budget’ most uncertain points are filtered. On these filtered points, one of the submodular functions viz. ‘facility_location’, ‘graph_cut’, ‘saturated_coverage’, ‘sum_redundancy’, ‘feature_based’ is applied to get the final set of points.
We select a subset \(F\) of size \(\beta\) based on uncertainty sampling, such that \(\beta \ge k\).
Then select a subset \(S\) by solving
\[\max \{f(S) \text{ such that } |S| \leq k, S \subseteq F\}\]where \(k\) is the budget and \(f\) can be one of these functions - ‘facility location’, ‘graph cut’, ‘saturated coverage’, ‘sum redundancy’, ‘feature based’.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) – Specify optional parameters - batch_size Batch size to be used inside strategy class (int, optional)
submod (str) – Choice of submodular function - 'facility_location' | 'graph_cut' | 'saturated_coverage' | 'sum_redundancy' | 'feature_based'
selection_type (str) – Choice of selection strategy - 'PerClass' | 'Supervised'
select(budget, top_n=5)
Select the next set of points.
- Parameters
budget (int) – Number of indexes to be returned for next set
top_n (float) – Multiplier to the budget that decides the number of data points on which the submodular functions will be applied. For example, if top_n = 5, then 5*budget points will be passed to the submodular functions.
- Returns
return_indices – List of selected data point indexes with respect to unlabeled_x
- Return type
list
GLISTER¶
-
class
distil.active_learning_strategies.glister.
GLISTER
(X, Y, unlabeled_x, net, handler, nclasses, args, valid, X_val=None, Y_val=None, loss_criterion=CrossEntropyLoss(), typeOf='none', lam=None, kernel_batch_size=200)[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
This is an implementation of GLISTER-ACTIVE from the paper GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning 4. The GLISTER method tries to solve a bi-level optimization problem.
\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\underset{\theta}{\operatorname{argmin\hspace{0.7mm}}} L_T( \theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]In the above equation, \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, \(\mathcal{V}\) denotes the validation set that guides the subset selection process, \(L_T\) denotes the training loss, \(L_V\) denotes the validation loss, \(S\) denotes the data subset selected at each round, and \(k\) is the budget. Since solving the complete inner optimization is expensive, GLISTER-ONLINE adopts an online one-step meta approximation, where we approximate the solution to the inner problem by taking a single gradient step. The optimization problem after the approximation is as follows:
\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\theta - \eta \nabla_{\theta}L_T(\theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]In the above equation, \(\eta\) denotes the step-size used for one-step gradient update.
- Parameters
X (Numpy array) – Features of the labeled set of points
Y (Numpy array) – Labels of the labeled set of points
unlabeled_x (Numpy array) – Features of the unlabeled set of points
net (class object) – Model architecture used for training. Could be an instance of the models defined in distil.utils.models or something similar.
handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e., have __getitem__ and __len__ methods implemented so that it can be passed to a PyTorch DataLoader. Could be an instance of the handlers defined in distil.utils.DataHandler or something similar.
nclasses (int) – Number of classes in the dataset
args (dictionary) – This dictionary should have the keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training. ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
valid (boolean) – Whether validation set is passed or not
X_val (Numpy array, optional) – Features of the points in the validation set. Mandatory if valid=True.
Y_val (Numpy array, optional) – Lables of the points in the validation set. Mandatory if valid=True.
loss_criterion (class object, optional) – The type of loss criterion. Default is torch.nn.CrossEntropyLoss()
typeOf (str, optional) – Determines the type of regularizer to be used. Default is ‘none’. For the random regularizer use ‘Rand’. To use the Facility Location set function as a regularizer use ‘FacLoc’. To use the Diversity set function as a regularizer use ‘Diversity’.
lam (float, optional) – Determines the amount of regularization to be applied. Mandatory if typeOf is not ‘none’; by default set to None. For the random regularizer the value should be between 0 and 1, as it determines the fraction of points replaced by random points. For both ‘Diversity’ and ‘FacLoc’, lam determines the weightage given to them while computing the gain.
kernel_batch_size (int, optional) – For the ‘Diversity’ and ‘FacLoc’ regularizer versions, a similarity kernel has to be computed, which entails creating a 3D torch tensor of dimensions kernel_batch_size * kernel_batch_size * feature dimension. Again, kernel_batch_size should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
GRADMATCH¶
-
class
distil.active_learning_strategies.gradmatch_active.
GradMatchActive
(X, Y, unlabeled_x, net, criterion, handler, nclasses, lrn_rate, selection_type, linear_layer, args={}, valid=False, X_val=None, Y_val=None)[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
This is an implementation of an active learning variant of GradMatch from the paper GRAD-MATCH: A Gradient Matching Based Data Subset Selection for Efficient Learning. This algorithm solves a fixed-weight version of the error term present in the paper by a greedy selection algorithm akin to the original GradMatch’s Orthogonal Matching Pursuit. The gradients are computed on the hypothesized labels of the loss function and are matched to either the full gradient of these hypothesized examples or a supplied validation gradient. The indices returned are the ones selected by this algorithm.
\[Err(X_t, L, L_T, \theta_t) = \left |\left| \sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t) - \frac{k}{N} \nabla_\theta L(\theta_t) \right | \right|\]where,
Each gradient is computed with respect to the last layer’s parameters
\(\theta_t\) are the model parameters at selection round \(t\)
\(X_t\) is the queried set of points to label at selection round \(t\)
\(k\) is the budget
\(N\) is the number of points contributing to the full gradient \(\nabla_\theta L(\theta_t)\)
\(\nabla_\theta L(\theta_t)\) is either the complete hypothesized gradient or a validation gradient
\(\sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t)\) is the subset’s hypothesized gradient with \(|X_t| = k\)
- Parameters
X (Numpy array) – Features of the labeled set of points
Y (Numpy array) – Labels of the labeled set of points
unlabeled_x (Numpy array) – Features of the unlabeled set of points
net (class object) – Model architecture used for training. Could be an instance of the models defined in distil.utils.models or something similar.
criterion (class object) – The loss type used in training. Could be an instance of torch.nn.* losses or torch functionals.
handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e., have __getitem__ and __len__ methods implemented so that it can be passed to a PyTorch DataLoader. Could be an instance of the handlers defined in distil.utils.DataHandler or something similar.
nclasses (int) – Number of classes in the dataset
lrn_rate (float) – The learning rate used in training. Used by the original GradMatch algorithm.
selection_type (string) – Should be one of “PerClass” or “PerBatch”. Selects which approximation method is used.
linear_layer (bool) – Sets whether to include the last linear layer parameters as part of the gradient computation.
args (dictionary) – This dictionary should have the keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training. ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
valid (boolean) – Whether validation set is passed or not
X_val (Numpy array, optional) – Features of the points in the validation set. Mandatory if valid=True.
Y_val (Numpy array, optional) – Lables of the points in the validation set. Mandatory if valid=True.
select(budget, use_weights)
Select the next set of points.
- Parameters
budget (int) – Number of indexes to be returned for next set
use_weights (bool) – Whether to use fixed-weight version (false) or OMP version (true)
- Returns
subset_idxs – List of selected data point indexes with respect to unlabeled_x and, if use_weights is true, the weights associated with each point
- Return type
list
Least Confidence¶
-
class
distil.active_learning_strategies.least_confidence.
LeastConfidence
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Least Confidence Sampling Strategy, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.
Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Then the softmax can be used to pick the budget number of elements for which the model has the lowest confidence as follows,
\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j{(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
Least Confidence with Dropout¶
-
class
distil.active_learning_strategies.least_confidence_dropout.
LeastConfidenceDropout
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Least Confidence Sampling Strategy with dropout, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.
Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Then the softmax can be used to pick the budget number of elements for which the model has the lowest confidence as follows,
\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j{(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesized labels. The user can pass an n_drop argument, which denotes the number of times the probabilities will be calculated. The final probability is calculated by averaging the probabilities obtained in all iterations.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
n_drop Dropout value to be used (int, optional)
Margin Sampling¶
-
class
distil.active_learning_strategies.margin_sampling.
MarginSampling
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Margin Sampling Strategy, an active learning strategy similar to the Least Confidence Sampling Strategy. While least confidence only takes into consideration the maximum probability, margin sampling considers the difference between the confidences of the first and the second most probable labels.
Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Let,
\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]Then using the softmax, the Margin Sampling Strategy would pick the budget number of elements as follows,
\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j {(\sigma(\overrightarrow{\boldsymbol{z}}))} - \max_{j \ne m} {(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
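The margin criterion can be sketched directly on top of the softmax outputs. The following self-contained PyTorch snippet (not DISTIL's internal code) picks the budget points with the smallest margin between the top two class probabilities.

```python
import torch
import torch.nn.functional as F

def margin_scores(logits):
    """Margin = difference between the two largest softmax probabilities; smaller = more uncertain."""
    probs = F.softmax(logits, dim=1)
    top2 = probs.topk(2, dim=1).values
    return top2[:, 0] - top2[:, 1]

# Toy example: select the budget=2 samples with the smallest margin.
logits = torch.randn(6, 4)
budget = 2
print(margin_scores(logits).topk(budget, largest=False).indices)
```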
Margin sampling with Dropout¶
-
class
distil.active_learning_strategies.margin_sampling_dropout.
MarginSamplingDropout
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Margin Sampling Strategy with dropout, an active learning strategy similar to the Least Confidence Sampling Strategy with dropout. While least confidence only takes into consideration the maximum probability, margin sampling considers the difference between the confidences of the first and the second most probable labels.
Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be
\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]Let,
\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]Then using the softmax, the Margin Sampling Strategy would pick the budget number of elements as follows,
\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j {(\sigma(\overrightarrow{\boldsymbol{z}}))} - \max_{j \ne m} {(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesized labels. The user can pass an n_drop argument, which denotes the number of times the probabilities will be calculated. The final probability is calculated by averaging the probabilities obtained in all iterations.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
n_drop Dropout value to be used (int, optional)
Random Sampling¶
-
class
distil.active_learning_strategies.random_sampling.
RandomSampling
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implementation of Random Sampling Strategy. This strategy is often used as a baseline, where we pick a set of unlabeled points randomly.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
Submodular Sampling¶
-
class
distil.active_learning_strategies.submod_sampling.
SubmodSampling
(X, Y, unlabeled_x, net, handler, nclasses, typeOf, selection_type, if_grad=False, args={}, kernel_batch_size=200)[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
This strategy uses one of the submodular functions viz. ‘facility_location’, ‘graph_cut’, ‘saturated_coverage’, ‘sum_redundancy’, ‘feature_based’ 5, Disparity-sum or Disparity-min 6, or DPP 7 to select the points to be labeled. These techniques can be applied directly to the features/embeddings or to the gradients of the loss function.
- Parameters
X (Numpy array) – Features of the labeled set of points
Y (Numpy array) – Labels of the labeled set of points
unlabeled_x (Numpy array) – Features of the unlabeled set of points
net (class object) – Model architecture used for training. Could be an instance of the models defined in distil.utils.models or something similar.
handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e., have __getitem__ and __len__ methods implemented so that it can be passed to a PyTorch DataLoader. Could be an instance of the handlers defined in distil.utils.DataHandler or something similar.
nclasses (int) – Number of classes in the dataset
typeOf (str) – Choice of submodular function - ‘facility_location’ | ‘graph_cut’ | ‘saturated_coverage’ | ‘sum_redundancy’ | ‘feature_based’ | ‘Disparity-min’ | ‘Disparity-sum’ | ‘DPP’
selection_type (str) – selection strategy - ‘Full’ |’PerClass’ | ‘Supervised’
if_grad (boolean, optional) – Determines if gradients to be used for subset selection. Default is False.
args (dictionary) – This dictionary should have the keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training. ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
kernel_batch_size (int, optional) – For the ‘Diversity’ and ‘FacLoc’ regularizer versions, a similarity kernel has to be computed, which entails creating a 3D torch tensor of dimensions kernel_batch_size * kernel_batch_size * feature dimension. Again, kernel_batch_size should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.
Adversarial BIM¶
-
class
distil.active_learning_strategies.adversarial_bim.
AdversarialBIM
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Adversarial BIM Strategy, which is motivated by the fact that the distance computation from the decision boundary is often difficult and intractable for margin-based methods. This technique avoids estimating the distance by using BIM (Basic Iterative Method) 8 to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.
Basic Iterative Method (BIM): Given a base input, the approach is to perturb each feature in the direction of the gradient by magnitude \(\epsilon\), where \(\epsilon\) is a parameter that determines the perturbation size. For a model with loss \(J(\theta, x, y)\), where \(\theta\) represents the model parameters, \(x\) is the model input, and \(y\) is the label of \(x\), the adversarial sample is generated iteratively as,
\[\begin{eqnarray} x^*_0 & = & x,\\ x^*_i & = & clip_{x,\epsilon} (x^*_{i-1} + sign(\nabla_{x^*_{i-1}} J(\theta, x^*_{i-1}, y))) \end{eqnarray}\]
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size – Batch size to be used inside the strategy class (int, optional)
eps – Epsilon value for the gradient-based perturbation (optional)
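The following self-contained PyTorch sketch illustrates the idea of using the number of BIM steps as a proxy for distance to the decision boundary. It uses the model's current prediction as the hypothesized label, omits clipping to the epsilon-ball for brevity, and is not DISTIL's internal implementation.

```python
import torch
import torch.nn.functional as F

def bim_perturbation_steps(model, x, eps=0.05, max_steps=50):
    """Count how many gradient-sign steps are needed before the prediction flips;
    fewer steps suggests the point lies closer to the decision boundary."""
    x_adv = x.clone().detach().requires_grad_(True)
    y_pred = model(x_adv).argmax(dim=1)                  # hypothesized label
    for step in range(max_steps):
        loss = F.cross_entropy(model(x_adv), y_pred)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + eps * grad.sign()            # one BIM step in the gradient-sign direction
        x_adv.requires_grad_(True)
        if model(x_adv).argmax(dim=1).item() != y_pred.item():
            return step + 1                              # prediction flipped: boundary crossed
    return max_steps

# Toy example with a small linear classifier and a single input point
model = torch.nn.Linear(10, 3)
print(bim_perturbation_steps(model, torch.randn(1, 10)))
```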
Adversarial DeepFool¶
-
class
distil.active_learning_strategies.adversarial_deepfool.
AdversarialDeepFool
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Adversarial DeepFool Strategy 9, a DeepFool-based active learning strategy that selects unlabeled samples with the smallest adversarial perturbation. This technique is motivated by the fact that the distance computation from the decision boundary is often difficult and intractable for margin-based methods. It avoids estimating the distance by using DeepFool 10 like techniques to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
max_iter Maximum Number of Iterations (int, optional)
Bayesian Active Learning Disagreement Dropout¶
-
class
distil.active_learning_strategies.bayesian_active_learning_disagreement_dropout.
BALDDropout
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the Bayesian Active Learning by Disagreement (BALD) Strategy 11, which assumes a Bayesian setting and selects points which maximise the mutual information between the predicted labels and the model parameters. This implementation is an adaptation for a non-Bayesian setting, with the assumption that there is a dropout layer in the model.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
n_drop Dropout value to be used (int, optional)
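The BALD score can be approximated with Monte Carlo dropout, as in the following self-contained sketch (not DISTIL's internal code): the mutual information is the entropy of the averaged prediction minus the average per-pass entropy.

```python
import torch

def bald_scores(model, x, n_drop=10):
    """Approximate BALD scores via MC dropout; assumes `model` contains dropout layers."""
    model.train()                                  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_drop)])  # (n_drop, N, C)
    mean_probs = probs.mean(dim=0)
    entropy_of_mean = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=1)
    mean_entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=2).mean(dim=0)
    return entropy_of_mean - mean_entropy          # higher = more informative to label

# Toy example with a small dropout network
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Dropout(0.5), torch.nn.Linear(16, 3))
print(bald_scores(model, torch.randn(5, 8)).topk(2).indices)
```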
KMeans Sampling¶
-
class
distil.active_learning_strategies.kmeans_sampling.
KMeansSampling
(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]¶ Bases:
distil.active_learning_strategies.strategy.Strategy
Implements the KMeans Sampling selection strategy. The last-layer embeddings are calculated for all the unlabeled data points. Then the KMeans clustering algorithm is run over these embeddings, with the number of clusters equal to the budget. The distance is then calculated for all the points from their respective cluster centers. From each cluster, the point closest to the center is selected to be labeled for the next iteration. Since the number of centers is equal to the budget, selecting one point from each cluster satisfies the total number of data points to be selected in one iteration.
- Parameters
X (numpy array) – Present training/labeled data
y (numpy array) – Labels of present training data
unlabeled_x (numpy array) – Data without labels
net (class) – Pytorch Model class
handler (class) – Data Handler, which can load data even without labels.
nclasses (int) – Number of unique target variables
args (dict) –
Specify optional parameters
batch_size Batch size to be used inside strategy class (int, optional)
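The selection rule can be sketched with scikit-learn's KMeans on the embeddings. This is an illustrative sketch, not DISTIL's internal implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_select_sketch(embeddings, budget, seed=0):
    """Cluster embeddings into `budget` clusters and pick the point closest to each center."""
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(embeddings)
    chosen = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(members[dists.argmin()]))   # point closest to this cluster's center
    return chosen

# Toy example: 200 points with 16-dimensional embeddings, budget of 5
emb = np.random.default_rng(0).normal(size=(200, 16))
print(kmeans_select_sketch(emb, 5))
```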
REFERENCES¶
- 1
Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. CoRR, 2019. URL: http://arxiv.org/abs/1906.03671, arXiv:1906.03671.
- 2
Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: a core-set approach. 2018. arXiv:1708.00489.
- 3
Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, 1954–1963. Lille, France, 07–09 Jul 2015. PMLR. URL: http://proceedings.mlr.press/v37/wei15.html.
- 4
Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. Glister: generalization based data subset selection for efficient and robust learning. 2020. arXiv:2012.10630.
- 5
Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani. Submodular combinatorial information measures with applications in machine learning. 2021. arXiv:2006.15412.
- 6
Anirban Dasgupta, Ravi Kumar, and Sujith Ravi. Summarization through submodularity and dispersion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1014–1022. Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/P13-1100.
- 7
Laming Chen, Guoxin Zhang, and Eric Zhou. Fast greedy map inference for determinantal point process to improve recommendation diversity. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL: https://proceedings.neurips.cc/paper/2018/file/dbbf603ff0e99629dda5d75b6f75f966-Paper.pdf.
- 8
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
- 9
Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach. 2018. arXiv:1802.09841.
- 10
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016.
- 11
Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. 2011. arXiv:1112.5745.
utils¶
DataHandler¶
class distil.utils.data_handler.DataHandler_CIFAR10(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the CIFAR10 dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_CIFAR100(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the CIFAR100 dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_FASHION_MNIST(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the FASHION_MNIST dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_KMNIST(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the KMNIST dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_MNIST(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the MNIST dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_Points(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load data points. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
class distil.utils.data_handler.DataHandler_STL10(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the STL10 dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
class distil.utils.data_handler.DataHandler_SVHN(X, Y=None, select=True, use_test_transform=False)
Bases: Generic[torch.utils.data.dataset.T_co]
Data Handler to load the SVHN dataset. This class extends torch.utils.data.Dataset to handle loading data even without labels.
- Parameters
X (numpy array) – Data to be loaded
y (numpy array, optional) – Labels to be loaded (default: None)
select (bool) – True if loading data without labels, False otherwise
use_test_transform (bool) – True if the data handler should apply the test transform. Otherwise, the data handler will use the training transform (default: False)
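If your data does not fit one of the shipped handlers, a custom handler only needs to subclass torch.utils.data.Dataset. The following is a minimal sketch; the exact tuple returned by DISTIL's own handlers may differ, so treat the return format here as an assumption to adapt.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class MyDataHandler(Dataset):
    """Minimal custom data handler sketch: serves data with or without labels."""
    def __init__(self, X, Y=None, select=True):
        self.X, self.Y, self.select = X, Y, select

    def __getitem__(self, index):
        x = torch.as_tensor(self.X[index], dtype=torch.float32)
        if self.select:                        # unlabeled pool: no labels available
            return x, index
        return x, torch.as_tensor(self.Y[index]), index

    def __len__(self):
        return len(self.X)

# Toy usage
handler = MyDataHandler(np.random.rand(10, 4), select=True)
print(len(handler), handler[0][1])
```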
Dataset¶
distil.utils.dataset.add_label_noise(y_trn, num_cls, noise_ratio=0.8)
Adds noise to the specified list of labels. This functionality is taken from CORDS and applied here.
- Parameters
y_trn (list) – The list of labels to add noise.
num_cls (int) – The number of classes possible in the list.
noise_ratio (float, optional) – The percentage of labels to modify. The default is 0.8.
- Returns
y_trn – The list of now-noisy labels
- Return type
list
distil.utils.dataset.get_CIFAR10(path, tr_load_args=None, te_load_args=None)
Downloads the CIFAR10 dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_CIFAR100(path, tr_load_args=None, te_load_args=None)
Downloads the CIFAR100 dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_FASHION_MNIST(path, tr_load_args=None, te_load_args=None)
Downloads the FASHION_MNIST dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_KMNIST(path, tr_load_args=None, te_load_args=None)
Downloads the KMNIST dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_MNIST(path, tr_load_args=None, te_load_args=None)
Downloads the MNIST dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_STL10(path, tr_load_args=None, te_load_args=None)
Downloads the STL10 dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_SVHN(path, tr_load_args=None, te_load_args=None)
Downloads the SVHN dataset.
- Parameters
path (str) – Path to save the downloaded dataset
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_dataset(name, path, tr_load_args=None, te_load_args=None)
Loads a dataset.
- Parameters
name (str) – Name of the dataset to be loaded. Supports MNIST and CIFAR10
path (str) – Path to save the downloaded dataset
tr_load_args (dict) – String dictionary for train distribution shift loading
te_load_args (dict) – String dictionary for test distribution shift loading
- Returns
X_tr (numpy array) – Train set
Y_tr (torch tensor) – Training Labels
X_te (numpy array) – Test Set
Y_te (torch tensor) – Test labels
distil.utils.dataset.get_imbalanced_idx(y_trn, num_cls, class_ratio=0.6)
Returns a list of indices of the supplied dataset that constitute a class-imbalanced subset of the supplied dataset. This functionality is taken from CORDS and applied here.
- Parameters
y_trn (numpy ndarray) – The label set to choose imbalance.
num_cls (int) – The number of classes possible in the list.
class_ratio (float, optional) – The percentage of classes to affect. The default is 0.6.
- Returns
subset_idxs – The list of indices of the supplied dataset that constitute a class-imbalanced subset
- Return type
list
distil.utils.dataset.make_data_redundant(X, Y, intial_bud, unique_points=5000, amtRed=2)
Modifies the input dataset in such a way that only X.shape[0]/amtRed points are original and the rest are repeated or redundant.
- Parameters
X (numpy ndarray) – The feature set to be made redundant.
Y (numpy ndarray) – The label set corresponding to the X.
intial_bud (int) – Number of initial points that are assumed to be labeled.
unique_points (int) – Number of points to be kept unique in the unlabeled pool.
amtRed (float, optional) – Factor that determines redundancy. The default is 2.
- Returns
X – Modified feature set.
- Return type
numpy ndarray
Submodular Functions¶
-
class
distil.utils.submodular.
SubmodularFunction
(device, x_trn, y_trn, N_trn, batch_size, submod, selection_type)[source]¶ Bases:
distil.utils.similarity_mat.SimilarityComputation
Implementation of submodular functions. This class allows you to use different submodular functions.
- Parameters
device (str) – Device to be used, cpu|gpu
x_trn (torch tensor) – Data on which submodular optimization should be applied
y_trn (torch tensor) – Labels of the data
model (class) – Model architecture used for training
N_trn (int) – Number of samples in dataset
batch_size (int) – Batch size to be used for optimization
if_convex (bool) – If convex or not
submod (str) – Choice of submodular function - ‘facility_location’ | ‘graph_cut’ | ‘saturated_coverage’ | ‘sum_redundancy’ | ‘feature_based’
selection_type (str) – Type of selection - ‘PerClass’ | ‘Supervised’ | ‘Full’
Similarity Matrix¶
-
class
distil.utils.similarity_mat.
SimilarityComputation
(device, x_trn, y_trn, N_trn, batch_size)[source]¶ Bases:
object
Implements the similarity computations used by the submodular functions.
- Parameters
device (str) – Device to be used, cpu|gpu
x_trn (torch tensor) – Data on which submodular optimization should be applied
y_trn (torch tensor) – Labels of the data
model (class) – Model architecture used for training
N_trn (int) – Number of samples in dataset
batch_size (int) – Batch size to be used for optimization
if_convex (bool) – If convex or not
submod (str) – Choice of submodular function - ‘facility_location’ | ‘graph_cut’ | ‘saturated_coverage’ | ‘sum_redundancy’ | ‘feature_based’
selection_type (str) – Type of selection - ‘PerClass’ | ‘Supervised’ | ‘Full’
compute_score(idxs)
Compute the score of the indices.
- Parameters
model_params (OrderedDict) – Python dictionary object containing the model's parameters
idxs (list) – The indices
distil.utils.models package¶
- We have incorporated several neural network architectures in the DISTIL repository. Below is a list of the neural network architectures:
densenet
dla
dla_simple
dpn
efficientnet
googlenet
lenet
mobilenet
mobilenetv2
pnasnet
preact_resnet
regnet
resnet
resnext
senet
shufflenet
shufflenetv2
vgg
To use a custom model architecture, modify the model architecture in the following way (a minimal sketch follows this list):
The forward method should have two more variables:
A boolean variable last:
If true: returns the model output and the output of the second-to-last layer
If false: returns only the model output
A boolean variable freeze:
If true: disables the tracking of any calculations required to later calculate a gradient, i.e. skips gradient calculation over the weights
If false: tracks gradients as usual
A get_embedding_dim() method which returns the number of hidden units in the last layer.
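A minimal sketch of a model that follows these conventions is given below; the class name and layer sizes are illustrative and not part of DISTIL.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    """Minimal custom model sketch following the conventions above."""
    def __init__(self, input_dim, num_classes, hidden_units):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_units)
        self.fc2 = nn.Linear(hidden_units, num_classes)
        self.hidden_units = hidden_units

    def forward(self, x, last=False, freeze=False):
        if freeze:
            with torch.no_grad():              # skip gradient tracking over the feature weights
                emb = torch.relu(self.fc1(x))
        else:
            emb = torch.relu(self.fc1(x))
        out = self.fc2(emb)
        if last:
            return out, emb                    # also return the second-to-last layer output
        return out

    def get_embedding_dim(self):
        return self.hidden_units               # number of hidden units in the last layer

# Toy usage
net = TwoLayerNet(input_dim=8, num_classes=3, hidden_units=16)
out, emb = net(torch.randn(5, 8), last=True)
print(out.shape, emb.shape)
```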
Configuration Files for Training¶
This page gives a tutorial on how to generate your custom training configuration files.
These configuration files can be used to select datasets, training configurations, and active learning settings. The files are in JSON format.
{
"model": {
"architecture": "resnet18",
"target_classes": 10
},
"train_parameters": {
"lr": 0.001,
"batch_size": 1000,
"n_epoch": 50,
"max_accuracy": 0.95,
"isreset": true,
"islogs": true,
"logs_location": "./logs.txt"
},
"active_learning":{
"strategy": "badge",
"budget": 1000,
"rounds": 15,
"initial_points":1000,
"strategy_args":{
"batch_size" : 1000,
"lr":0.001
}
},
"dataset":{
"name":"cifar10"
}
}
The configuration file consists of the following sections:
Model
Training Parameters
Active Learning Configuration
Dataset
Symbol (%) represents mandatory arguments
model
- architecture %
- Model architecture to be used. Presently, it supports the architectures mentioned below.
resnet18
two_layer_net
- target_classes %
Number of output classes for prediction.
- input_dim
Input dimension of the dataset. To be mentioned while using two layer net.
- hidden_units_1
Number of hidden units to be used in the first layer. To be mentioned while using two layer net.
train_parameters
- lr %
Learning rate to be used for training.
- batch_size %
Batch size to be used for training.
- n_epoch %
Maximum number of epochs for the model to train.
- max_accuracy
Maximum training accuracy after which training should be stopped.
- isreset
- Reset weight whenever the model training starts.
True
False
- islogs
- Log training output.
True
False
- logs_location %
Location where logs should be saved.
active_learning
- strategy %
- Active learning strategy to be used.
badge
glister
entropy_sampling
margin_sampling
least_confidence
core_set
random_sampling
fass
bald_dropout
adversarial_bim
kmeans_sampling
baseline_sampling
adversarial_deepfool
- budget %
Number of points to be selected by the active learning strategy.
- rounds %
Total number of rounds to run active learning for.
- initial_points
Initial number of points to start training with.
- strategy_args
Arguments to pass to the strategy. It varies from strategy to strategy. Please refer to the documentation of the strategy that is being used.
dataset
- name
- Name of the dataset to be used. It presently supports the following datasets.
cifar10
mnist
fmnist
svhn
cifar100
satimage
ijcnn1
You can refer to various configuration examples in the configs/ folders of the DISTIL repository.
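Such a configuration file can be read with the standard json module, as in the hedged sketch below. The file name is hypothetical, and the driver code that consumes the resulting dictionary is up to you.

```python
import json

# Illustrative sketch: load a configuration file like the one shown above.
with open("configs/config_cifar10.json") as f:    # hypothetical path
    config = json.load(f)

print(config["model"]["architecture"])            # e.g. "resnet18"
print(config["active_learning"]["strategy"],      # e.g. "badge"
      config["active_learning"]["budget"])
```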