Active Learning Strategies

BADGE

class distil.active_learning_strategies.badge.BADGE(X, Y, unlabeled_x, net, handler, nclasses, args)[source]

Bases: distil.active_learning_strategies.strategy.Strategy

This method is based on the paper Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds 1. According to the paper, Batch Active learning by Diverse Gradient Embeddings (BADGE) samples groups of points that are disparate and of high magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between uncertainty and diversity without requiring any hand-tuned hyperparameters. At each selection round, loss gradients are computed using the hypothesised labels, and the points to be labeled are then chosen by applying k-means++ to these loss gradients.
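
A minimal sketch of this selection step is shown below. It assumes `penultimate` is an (N x d) numpy array of penultimate-layer features and `probs` is an (N x C) array of predicted class probabilities on the unlabeled pool; the function and array names are illustrative and not part of the DISTIL API.

```python
import numpy as np

def badge_select(penultimate, probs, budget, seed=0):
    rng = np.random.default_rng(seed)
    yhat = probs.argmax(axis=1)                        # hypothesised labels
    grad_out = probs.copy()
    grad_out[np.arange(len(yhat)), yhat] -= 1.0        # d(cross-entropy)/d(logits) under yhat
    # hallucinated last-layer gradient embedding: outer product, flattened to (N, C*d)
    emb = (grad_out[:, :, None] * penultimate[:, None, :]).reshape(len(yhat), -1)

    # k-means++ style seeding: repeatedly sample points far from the current centers
    chosen = [int(np.argmax((emb ** 2).sum(axis=1)))]  # start from the largest-magnitude gradient
    dist = ((emb - emb[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < budget:
        idx = int(rng.choice(len(emb), p=dist / dist.sum()))
        chosen.append(idx)
        dist = np.minimum(dist, ((emb - emb[idx]) ** 2).sum(axis=1))
    return chosen
```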

Parameters
  • X (numpy array) – Present training/labeled data

  • Y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) – Specify optional parameters. batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

chosen – List of selected data point indexes with respect to unlabeled_x

Return type

list

select_per_batch(budget, batch_size)[source]

Select points to label by using per-batch BADGE strategy

Parameters
  • budget (int) – Number of indices to be selected from unlabeled set

  • batch_size (int) – Size of batches to form

Returns

chosen – List of selected data point indices with respect to unlabeled_x

Return type

list

Core-Set Approach

class distil.active_learning_strategies.core_set.CoreSet(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implementation of CoreSet 2 Strategy. A diversity-based approach using coreset selection. The embedding of each example is computed by the network’s penultimate layer and the samples at each round are selected using a greedy furthest-first traversal conditioned on all labeled examples.
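
A minimal sketch of this greedy furthest-first (k-center) traversal is given below, assuming `unlab_emb` and `lab_emb` are numpy arrays of penultimate-layer embeddings for the unlabeled and labeled sets; the names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def furthest_first(unlab_emb, lab_emb, n):
    # distance of every unlabeled point to its nearest labeled point (the current "centers")
    min_dist = cdist(unlab_emb, lab_emb).min(axis=1)
    chosen = []
    for _ in range(n):
        idx = int(np.argmax(min_dist))                 # farthest point from all current centers
        chosen.append(idx)
        # adding idx as a center can only shrink the nearest-center distances
        min_dist = np.minimum(min_dist, cdist(unlab_emb, unlab_emb[idx:idx + 1]).ravel())
    return chosen
```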

Parameters
  • X (numpy array) – Present training/labeled data

  • Y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

furthest_first(X, X_set, n)[source]

Selects points with maximum distance

Parameters
  • X (numpy array) – Embeddings of unlabeled set

  • X_set (numpy array) – Embeddings of labeled set

  • n (int) – Number of points to return

Returns

idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

chosen – List of selected data point indexes with respect to unlabeled_x

Return type

list

CRAIG-ACTIVE

class distil.active_learning_strategies.craig_active.CRAIGActive(X, Y, unlabeled_x, net, criterion, handler, nclasses, lrn_rate, selection_type, linear_layer, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

This is an implementation of an active learning variant of CRAIG from the paper Coresets for Data-efficient Training of Machine Learning Models. This algorithm calculates hypothesized labels for each of the unlabeled points and feeds this hypothesized set to the original CRAIG algorithm. The selected points from CRAIG are used as the queried points for this algorithm.

Parameters
  • X (Numpy array) – Features of the labeled set of points

  • Y (Numpy array) – Labels of the labeled set of points

  • unlabeled_x (Numpy array) – Features of the unlabeled set of points

  • net (class object) – Model architecture used for training. Could be an instance of models defined in distil.utils.models or something similar.

  • criterion (class object) – The loss type used in training. Could be an instance of torch.nn.* losses or torch functionals.

  • handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e. have __getitem__ and __len__ methods implemented, so that it can be passed to a PyTorch DataLoader. Could be an instance of handlers defined in distil.utils.DataHandler or something similar.

  • nclasses (int) – Number of classes in the dataset

  • lrn_rate (float) – The learning rate used in training. Used by the CRAIG algorithm.

  • selection_type (string) – Should be one of “PerClass”, “Supervised”, or “PerBatch”. Selects which approximation method is used.

  • linear_layer (bool) – Sets whether to include the last linear layer parameters as part of the gradient computation.

  • args (dictionary) – This dictionary should have keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training, and ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

subset_idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

Entropy Sampling

class distil.active_learning_strategies.entropy_sampling.EntropySampling(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Entropy Sampling Strategy, one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy and therefore select points which have maximum entropy.

Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\). Thus, \(j \in [1,nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then entropy can be calculated as,

\[ENTROPY = -\sum_j \sigma(z_j)\log(\sigma(z_j))\]

The algorithm then selects the budget number of elements with the highest entropy.
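
A minimal sketch of the scoring and selection is given below, assuming `logits` is an (N x nclasses) torch tensor of model outputs on the unlabeled pool; the names are illustrative.

```python
import torch
import torch.nn.functional as F

def entropy_select(logits, budget):
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)           # numerically stable log of the softmax
    entropy = -(probs * log_probs).sum(dim=1)          # H = -sum_j p_j log p_j
    return torch.topk(entropy, budget).indices.tolist()
```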

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Entropy Sampling with Dropout

class distil.active_learning_strategies.entropy_sampling_dropout.EntropySamplingDropout(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Entropy Sampling Strategy with dropout. The Entropy Sampling Strategy is one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy and therefore select points which have maximum entropy.

Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\). Thus, \(j \in [1,nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then entropy can be calculated as,

\[ENTROPY = -\sum_j \sigma(z_j)\log(\sigma(z_j))\]

The algorithm then selects the budget number of elements with the highest entropy.

The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesised labels. The user can pass an n_drop argument, which denotes the number of times the probabilities are calculated. The final probability is obtained by averaging the probabilities from all iterations.
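
A minimal sketch of the probability-averaging step, assuming `model` contains dropout layers and `loader` yields batches of inputs from the unlabeled pool; the names are illustrative and not the base strategy's actual method.

```python
import torch
import torch.nn.functional as F

def predict_prob_dropout(model, loader, n_drop, device="cpu"):
    model.train()                                      # keep dropout active at inference time
    probs = None
    with torch.no_grad():
        for _ in range(n_drop):                        # n_drop stochastic forward passes
            run = torch.cat([F.softmax(model(x.to(device)), dim=1) for x in loader], dim=0)
            probs = run if probs is None else probs + run
    return probs / n_drop                              # average the probabilities over the passes
```

The entropy scores from the previous sketch can then be computed on this averaged probability matrix.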

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    n_drop Number of dropout forward passes over which probabilities are averaged (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

FASS

class distil.active_learning_strategies.fass.FASS(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements FASS 3, which combines uncertainty sampling with a submodular data subset selection framework to label a subset of data points for training a classifier. Based on the ‘top_n’ parameter, the ‘top_n*budget’ most uncertain points are filtered. On these filtered points, one of the submodular functions viz. ‘facility_location’, ‘graph_cut’, ‘saturated_coverage’, ‘sum_redundancy’, ‘feature_based’ is applied to get the final set of points.

We select a subset \(F\) of size \(\beta\) based on uncertainty sampling, such that \(\beta \ge k\).

Then select a subset \(S\) by solving

\[\max \{f(S) \text{ such that } |S| \leq k, S \subseteq F\}\]

where \(k\) is the budget and \(f\) can be one of these functions - ‘facility location’, ‘graph cut’, ‘saturated coverage’, ‘sum redundancy’, ‘feature based’.
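
A minimal sketch of the FASS pipeline with a naive facility-location greedy step is given below, assuming `probs` (N x C) holds predicted probabilities and `feats` (N x d) holds embeddings of the unlabeled pool; the names are illustrative, and a practical implementation would use an optimised submodular solver.

```python
import numpy as np

def fass_select(probs, feats, budget, top_n=5):
    uncertainty = 1.0 - probs.max(axis=1)              # least-confidence style uncertainty
    pool = np.argsort(-uncertainty)[: top_n * budget]  # candidate set F with beta = top_n * budget

    sub = feats[pool]
    sim = sub @ sub.T                                  # similarity kernel on the candidates
    covered = np.zeros(len(pool))                      # max_{j in S} sim(i, j) for the current S
    chosen = []
    for _ in range(budget):
        gains = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        if chosen:
            gains[chosen] = -np.inf                    # never re-select a candidate
        j = int(np.argmax(gains))
        chosen.append(j)
        covered = np.maximum(covered, sim[:, j])
    return [int(pool[j]) for j in chosen]              # map back to indices in unlabeled_x
```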

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) – Specify optional parameters - batch_size Batch size to be used inside strategy class (int, optional)

  • submod (str) – Choice of submodular function - 'facility_location' | 'graph_cut' | 'saturated_coverage' | 'sum_redundancy' | 'feature_based'

  • selection_type (str) – Choice of selection strategy - 'PerClass' | 'Supervised'

select(budget, top_n=5)[source]

Select next set of points

Parameters
  • budget (int) – Number of indexes to be returned for next set

  • top_n (float) – The multiplier to the budget that decides how many data points the submodular functions are applied to. For example, if top_n = 5, then 5*budget points are passed to the submodular functions.

Returns

return_indices – List of selected data point indexes with respect to unlabeled_x

Return type

list

GLISTER

class distil.active_learning_strategies.glister.GLISTER(X, Y, unlabeled_x, net, handler, nclasses, args, valid, X_val=None, Y_val=None, loss_criterion=CrossEntropyLoss(), typeOf='none', lam=None, kernel_batch_size=200)[source]

Bases: distil.active_learning_strategies.strategy.Strategy

This is an implementation of GLISTER-ACTIVE from the paper GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning 4. GLISTER tries to solve the following bi-level optimisation problem.

\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\underset{\theta}{\operatorname{argmin\hspace{0.7mm}}} L_T( \theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]

In the above equation, \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, \(\mathcal{V}\) denotes the validation set that guides the subset selection process, \(L_T\) denotes the training loss, \(L_V\) denotes the validation loss, \(S\) denotes the data subset selected at each round, and \(k\) is the budget. Since solving the complete inner optimisation is expensive, GLISTER-ONLINE adopts an online one-step meta approximation, where the solution to the inner problem is approximated by taking a single gradient step. The optimization problem after the approximation is as follows:

\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\theta - \eta \nabla_{\theta}L_T(\theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]

In the above equation, \(\eta\) denotes the step size used for the one-step gradient update.
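
A minimal sketch of the one-step meta approximation used to score a candidate subset: take one gradient step on the hypothesised-label loss of the subset and evaluate the validation loss at the updated parameters. `model`, `criterion`, `x_sub`, `y_hyp` (hypothesised labels), `x_val`, `y_val`, and the step size `eta` are assumed inputs; the names are illustrative.

```python
import torch

def one_step_val_loss(model, criterion, x_sub, y_hyp, x_val, y_val, eta):
    params = [p for p in model.parameters() if p.requires_grad]
    inner_loss = criterion(model(x_sub), y_hyp)                 # L_T(theta, S) with hypothesised labels
    grads = torch.autograd.grad(inner_loss, params)

    with torch.no_grad():
        for p, g in zip(params, grads):                         # theta - eta * grad L_T(theta, S)
            p.sub_(eta * g)
        outer_loss = criterion(model(x_val), y_val).item()      # L_V at the updated parameters
        for p, g in zip(params, grads):                         # restore theta
            p.add_(eta * g)
    return outer_loss
```

The greedy selection then repeatedly adds the unlabeled point whose inclusion gives the largest reduction in this validation loss.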

Parameters
  • X (Numpy array) – Features of the labeled set of points

  • Y (Numpy array) – Labels of the labeled set of points

  • unlabeled_x (Numpy array) – Features of the unlabeled set of points

  • net (class object) – Model architecture used for training. Could be an instance of models defined in distil.utils.models or something similar.

  • handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e. have __getitem__ and __len__ methods implemented, so that it can be passed to a PyTorch DataLoader. Could be an instance of handlers defined in distil.utils.DataHandler or something similar.

  • nclasses (int) – Number of classes in the dataset

  • args (dictionary) – This dictionary should have keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training, and ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

  • valid (boolean) – Whether a validation set is passed or not

  • X_val (Numpy array, optional) – Features of the points in the validation set. Mandatory if valid=True.

  • Y_val (Numpy array, optional) – Labels of the points in the validation set. Mandatory if valid=True.

  • loss_criterion (class object, optional) – The type of loss criterion. Default is torch.nn.CrossEntropyLoss()

  • typeOf (str, optional) – Determines the type of regulariser to be used. Default is ‘none’. For the random regulariser use ‘Rand’. To use the Facility Location set function as a regulariser use ‘FacLoc’. To use the Diversity set function as a regulariser use ‘Diversity’.

  • lam (float, optional) – Determines the amount of regularisation to be applied. Mandatory if typeOf is not ‘none’; set to None by default. For the random regulariser, the value should be between 0 and 1, as it determines the fraction of points replaced by random points. For both ‘Diversity’ and ‘FacLoc’, lam determines the weight given to them while computing the gain.

  • kernel_batch_size (int, optional) – For the ‘Diversity’ and ‘FacLoc’ regulariser versions, a similarity kernel is computed, which entails creating a 3-D torch tensor of dimensions kernel_batch_size x kernel_batch_size x feature dimension. Again, kernel_batch_size should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

chosen – List of selected data point indexes with respect to unlabeled_x

Return type

list

GRADMATCH

class distil.active_learning_strategies.gradmatch_active.GradMatchActive(X, Y, unlabeled_x, net, criterion, handler, nclasses, lrn_rate, selection_type, linear_layer, args={}, valid=False, X_val=None, Y_val=None)[source]

Bases: distil.active_learning_strategies.strategy.Strategy

This is an implementation of an active learning variant of GradMatch from the paper GRAD-MATCH: A Gradient Matching Based Data Subset Selection for Efficient Learning. This algorithm solves a fixed-weight version of the error term present in the paper via a greedy selection algorithm akin to the original GradMatch’s Orthogonal Matching Pursuit. The gradients are computed using the hypothesized labels in the loss function and are matched to either the full gradient over these hypothesized examples or a supplied validation gradient. The indices returned are the ones selected by this algorithm.

\[Err(X_t, L, L_T, \theta_t) = \left |\left| \sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t) - \frac{k}{N} \nabla_\theta L(\theta_t) \right | \right|\]

where,

  • Each gradient is computed with respect to the last layer’s parameters

  • \(\theta_t\) are the model parameters at selection round \(t\)

  • \(X_t\) is the queried set of points to label at selection round \(t\)

  • \(k\) is the budget

  • \(N\) is the number of points contributing to the full gradient \(\nabla_\theta L(\theta_t)\)

  • \(\nabla_\theta L(\theta_t)\) is either the complete hypothesized gradient or a validation gradient

  • \(\sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t)\) is the subset’s hypothesized gradient with \(|X_t| = k\)
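
A minimal sketch of a fixed-weight greedy matching step in the spirit of this error term: at each round, add the point whose hypothesised-label gradient brings the running sum closest to the scaled target gradient. Here `grads` is an (N x d) numpy array of per-point last-layer gradients and `target` is the full or validation gradient; the names are illustrative, and the actual OMP-based selection differs in detail.

```python
import numpy as np

def greedy_gradient_match(grads, target, budget):
    target = (budget / len(grads)) * target            # k/N scaling from the error term
    running = np.zeros_like(target)
    chosen = []
    for _ in range(budget):
        residuals = np.linalg.norm((running + grads) - target, axis=1)
        if chosen:
            residuals[chosen] = np.inf                 # never re-select a point
        j = int(np.argmin(residuals))
        chosen.append(j)
        running += grads[j]
    return chosen
```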

Parameters
  • X (Numpy array) – Features of the labeled set of points

  • Y (Numpy array) – Labels of the labeled set of points

  • unlabeled_x (Numpy array) – Features of the unlabeled set of points

  • net (class object) – Model architecture used for training. Could be an instance of models defined in distil.utils.models or something similar.

  • criterion (class object) – The loss type used in training. Could be an instance of torch.nn.* losses or torch functionals.

  • handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e. have __getitem__ and __len__ methods implemented, so that it can be passed to a PyTorch DataLoader. Could be an instance of handlers defined in distil.utils.DataHandler or something similar.

  • nclasses (int) – Number of classes in the dataset

  • lrn_rate (float) – The learning rate used in training. Used by the original GradMatch algorithm.

  • selection_type (string) – Should be one of “PerClass” or “PerBatch”. Selects which approximation method is used.

  • linear_layer (bool) – Sets whether to include the last linear layer parameters as part of the gradient computation.

  • args (dictionary) – This dictionary should have keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training, and ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

  • valid (boolean) – Whether a validation set is passed or not

  • X_val (Numpy array, optional) – Features of the points in the validation set. Mandatory if valid=True.

  • Y_val (Numpy array, optional) – Labels of the points in the validation set. Mandatory if valid=True.

select(budget, use_weights)[source]

Select next set of points

Parameters
  • budget (int) – Number of indexes to be returned for next set

  • use_weights (bool) – Whether to use fixed-weight version (false) or OMP version (true)

Returns

subset_idxs – List of selected data point indexes with respect to unlabeled_x and, if use_weights is true, the weights associated with each point

Return type

list

Least Confidence

class distil.active_learning_strategies.least_confidence.LeastConfidence(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Least Confidence Sampling Strategy, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

The softmax can then be used to pick the budget number of elements for which the model has the lowest confidence as follows,

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j{(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
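
A minimal sketch of the scoring and selection, assuming `logits` is an (N x nclasses) torch tensor of model outputs on the unlabeled pool; the names are illustrative.

```python
import torch
import torch.nn.functional as F

def least_confidence_select(logits, budget):
    probs = F.softmax(logits, dim=1)
    confidence = probs.max(dim=1).values               # softmax probability of the predicted class
    return torch.topk(confidence, budget, largest=False).indices.tolist()  # least confident first
```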

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Least Confidence with Dropout

class distil.active_learning_strategies.least_confidence_dropout.LeastConfidenceDropout(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Least Confidence Sampling Strategy with dropout, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

The softmax can then be used to pick the budget number of elements for which the model has the lowest confidence as follows,

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S(\max_j{(\sigma(\overrightarrow{\boldsymbol{z}}))})}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.

The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesised labels. The user can pass an n_drop argument, which denotes the number of times the probabilities are calculated. The final probability is obtained by averaging the probabilities from all iterations.

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    n_drop Number of dropout forward passes over which probabilities are averaged (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Margin Sampling

class distil.active_learning_strategies.margin_sampling.MarginSampling(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Margin Sampling Strategy, an active learning strategy similar to the Least Confidence Sampling Strategy. While least confidence only takes the maximum probability into consideration, margin sampling considers the difference between the confidences of the first and the second most probable labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Let,

\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]

Then, using the softmax, the Margin Sampling Strategy picks the budget number of elements as follows,

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S\left(\max_j {(\sigma(\overrightarrow{\boldsymbol{z}}))} - \max_{j \ne m} {(\sigma(\overrightarrow{\boldsymbol{z}}))}\right)}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
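
A minimal sketch of the scoring and selection, assuming `logits` is an (N x nclasses) torch tensor of model outputs on the unlabeled pool; the names are illustrative.

```python
import torch
import torch.nn.functional as F

def margin_select(logits, budget):
    probs = F.softmax(logits, dim=1)
    top2 = torch.topk(probs, 2, dim=1).values          # two largest class probabilities per point
    margin = top2[:, 0] - top2[:, 1]                   # gap between best and second-best class
    return torch.topk(margin, budget, largest=False).indices.tolist()  # smallest margins first
```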

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Margin sampling with Dropout

class distil.active_learning_strategies.margin_sampling_dropout.MarginSamplingDropout(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Margin Sampling Strategy with dropout, an active learning strategy similar to the Least Confidence Sampling Strategy with dropout. While least confidence only takes the maximum probability into consideration, margin sampling considers the difference between the confidences of the first and the second most probable labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Let,

\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]

Then, using the softmax, the Margin Sampling Strategy picks the budget number of elements as follows,

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S\left(\max_j {(\sigma(\overrightarrow{\boldsymbol{z}}))} - \max_{j \ne m} {(\sigma(\overrightarrow{\boldsymbol{z}}))}\right)}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.

The dropout version uses the predict-probability-dropout function from the base strategy class to find the hypothesised labels. The user can pass an n_drop argument, which denotes the number of times the probabilities are calculated. The final probability is obtained by averaging the probabilities from all iterations.

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    n_drop Number of dropout forward passes over which probabilities are averaged (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

U_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Random Sampling

class distil.active_learning_strategies.random_sampling.RandomSampling(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implementation of Random Sampling Strategy. This strategy is often used as a baseline, where we pick a set of unlabeled points randomly.
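
A minimal sketch of the baseline, assuming `n_unlabeled` is the size of the unlabeled pool; the names are illustrative.

```python
import numpy as np

def random_select(n_unlabeled, budget, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice(n_unlabeled, size=budget, replace=False).tolist()
```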

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

rand_idx – List of selected data point indexes with respect to unlabeled_x

Return type

list

Submodular Sampling

class distil.active_learning_strategies.submod_sampling.SubmodSampling(X, Y, unlabeled_x, net, handler, nclasses, typeOf, selection_type, if_grad=False, args={}, kernel_batch_size=200)[source]

Bases: distil.active_learning_strategies.strategy.Strategy

This strategy uses one of the submodular functions, viz. ‘facility_location’, ‘graph_cut’, ‘saturated_coverage’, ‘sum_redundancy’, ‘feature_based’ 5, or Disparity-sum, Disparity-min 6, or DPP 7, to select the points to be labeled. These techniques can be applied directly to the features/embeddings or to the gradients of the loss functions.
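
A minimal sketch of one of these choices, a greedy disparity-sum maximisation over embeddings; the other set functions follow the same greedy pattern with a different marginal gain. Here `feats` is assumed to be an (N x d) numpy array of features or gradients; the names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def disparity_sum_select(feats, budget):
    dist = cdist(feats, feats)                         # pairwise distance matrix
    chosen = [int(np.argmax(dist.sum(axis=1)))]        # start from the most spread-out point
    total = dist[:, chosen[0]].copy()                  # sum of distances to the chosen set
    while len(chosen) < budget:
        total[chosen] = -np.inf                        # never re-select a point
        j = int(np.argmax(total))
        chosen.append(j)
        total = total + dist[:, j]
    return chosen
```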

Parameters
  • X (Numpy array) – Features of the labeled set of points

  • Y (Numpy array) – Labels of the labeled set of points

  • unlabeled_x (Numpy array) – Features of the unlabeled set of points

  • net (class object) – Model architecture used for training. Could be an instance of models defined in distil.utils.models or something similar.

  • handler (class object) – It should be a subclass of torch.utils.data.Dataset, i.e. have __getitem__ and __len__ methods implemented, so that it can be passed to a PyTorch DataLoader. Could be an instance of handlers defined in distil.utils.DataHandler or something similar.

  • nclasses (int) – Number of classes in the dataset

  • typeOf (str) – Choice of submodular function - ‘facility_location’ | ‘graph_cut’ | ‘saturated_coverage’ | ‘sum_redundancy’ | ‘feature_based’ | ‘Disparity-min’ | ‘Disparity-sum’ | ‘DPP’

  • selection_type (str) – Choice of selection strategy - ‘Full’ | ‘PerClass’ | ‘Supervised’

  • if_grad (boolean, optional) – Determines whether gradients are to be used for subset selection. Default is False.

  • args (dictionary) – This dictionary should have keys ‘batch_size’ and ‘lr’. ‘lr’ should be the learning rate used for training, and ‘batch_size’ should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

  • kernel_batch_size (int, optional) – For the ‘Diversity’ and ‘FacLoc’ regulariser versions, a similarity kernel is computed, which entails creating a 3-D torch tensor of dimensions kernel_batch_size x kernel_batch_size x feature dimension. Again, kernel_batch_size should be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

chosen – List of selected data point indexes with respect to unlabeled_x

Return type

list

Adversarial BIM

class distil.active_learning_strategies.adversarial_bim.AdversarialBIM(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Adversarial BIM Strategy, which is motivated by the fact that the distance from the decision boundary is often difficult and intractable to compute for margin-based methods. This technique avoids estimating the distance by using BIM (Basic Iterative Method) 8 to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.

Basic Iterative Method (BIM): Given a base input, the approach is to perturb each feature in the direction of the gradient by magnitude \(\epsilon\), where \(\epsilon\) is a parameter that determines the perturbation size. For a model with loss \(J(\theta, x, y)\), where \(\theta\) represents the model parameters, \(x\) is the model input, and \(y\) is the label of \(x\), the adversarial sample is generated iteratively as,

\[\begin{aligned} x^*_0 &= x,\\ x^*_i &= clip_{x,\epsilon}\left(x^*_{i-1} + \epsilon \cdot sign\left(\nabla_{x^*_{i-1}} J(\theta, x^*_{i-1}, y)\right)\right)\end{aligned}\]
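
A minimal sketch of how BIM can be used to estimate the perturbation needed to flip the prediction for a single input; the strategy then queries the points with the smallest such perturbations. `model` and a single-example input tensor `x` are assumed, the clip step is omitted for brevity, and the names are illustrative.

```python
import torch
import torch.nn.functional as F

def bim_perturbation_norm(model, x, eps=0.05, max_iter=50):
    y0 = model(x).argmax(dim=1)                        # hypothesised label of the clean input
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(max_iter):
        loss = F.cross_entropy(model(x_adv), y0)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + eps * grad.sign()).detach().requires_grad_(True)
        if model(x_adv).argmax(dim=1).item() != y0.item():   # crossed the decision boundary
            break
    return (x_adv - x).norm().item()                   # smaller norm means closer to the boundary
```
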
Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    eps Epsilon value used for the gradient-sign perturbation step (float, optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

Adversarial DeepFool

class distil.active_learning_strategies.adversarial_deepfool.AdversarialDeepFool(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Adversarial DeepFool Strategy 9, a DeepFool-based active learning strategy that selects unlabeled samples with the smallest adversarial perturbation. This technique is motivated by the fact that the distance from the decision boundary is often difficult and intractable to compute for margin-based methods. It avoids estimating the distance by using techniques like DeepFool 10 to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    max_iter Maximum Number of Iterations (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

Bayesian Active Learning Disagreement Dropout

class distil.active_learning_strategies.bayesian_active_learning_disagreement_dropout.BALDDropout(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the Bayesian Active Learning by Disagreement (BALD) Strategy 11, which assumes a Bayesian setting and selects points which maximise the mutual information between the predicted labels and the model parameters. This implementation is an adaptation for a non-Bayesian setting, with the assumption that there is a dropout layer in the model.
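
A minimal sketch of the BALD acquisition score, assuming `probs` is an (n_drop x N x nclasses) torch tensor of predicted probabilities collected from n_drop forward passes with dropout kept active; the names are illustrative.

```python
import torch

def bald_select(probs, budget, eps=1e-12):
    mean_p = probs.mean(dim=0)                                            # consensus prediction
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum(dim=1)         # H[E[p]]
    mean_entropy = -(probs * (probs + eps).log()).sum(dim=2).mean(dim=0)  # E[H[p]]
    mutual_info = entropy_of_mean - mean_entropy                          # disagreement across passes
    return torch.topk(mutual_info, budget).indices.tolist()
```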

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

    n_drop Number of dropout forward passes over which probabilities are averaged (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

KMeans Sampling

class distil.active_learning_strategies.kmeans_sampling.KMeansSampling(X, Y, unlabeled_x, net, handler, nclasses, args={})[source]

Bases: distil.active_learning_strategies.strategy.Strategy

Implements the KMeans Sampling selection strategy. The last-layer embeddings are calculated for all the unlabeled data points, and the KMeans clustering algorithm is run over these embeddings with the number of clusters equal to the budget. The distance of every point from its respective cluster center is then calculated, and from each cluster the point closest to the center is selected to be labeled for the next iteration. Since the number of centers is equal to the budget, selecting one point from each cluster yields the total number of data points to be selected in one iteration.
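
A minimal sketch of the clustering step, assuming `embeddings` is an (N x d) numpy array of last-layer embeddings of the unlabeled pool; the names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_select(embeddings, budget, seed=0):
    km = KMeans(n_clusters=budget, random_state=seed).fit(embeddings)
    chosen = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(members[np.argmin(dists)]))  # the point closest to this cluster's center
    return chosen
```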

Parameters
  • X (numpy array) – Present training/labeled data

  • y (numpy array) – Labels of present training data

  • unlabeled_x (numpy array) – Data without labels

  • net (class) – Pytorch Model class

  • handler (class) – Data Handler, which can load data even without labels.

  • nclasses (int) – Number of unique target variables

  • args (dict) –

    Specify optional parameters

    batch_size Batch size to be used inside strategy class (int, optional)

select(budget)[source]

Select next set of points

Parameters

budget (int) – Number of indexes to be returned for next set

Returns

q_idxs – List of selected data point indexes with respect to unlabeled_x

Return type

list

REFERENCES

1

Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. CoRR, 2019. URL: http://arxiv.org/abs/1906.03671, arXiv:1906.03671.

2

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: a core-set approach. 2018. arXiv:1708.00489.

3

Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, 1954–1963. Lille, France, 07–09 Jul 2015. PMLR. URL: http://proceedings.mlr.press/v37/wei15.html.

4

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. Glister: generalization based data subset selection for efficient and robust learning. 2020. arXiv:2012.10630.

5

Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani. Submodular combinatorial information measures with applications in machine learning. 2021. arXiv:2006.15412.

6

Anirban Dasgupta, Ravi Kumar, and Sujith Ravi. Summarization through submodularity and dispersion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1014–1022. Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/P13-1100.

7

Laming Chen, Guoxin Zhang, and Eric Zhou. Fast greedy map inference for determinantal point process to improve recommendation diversity. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL: https://proceedings.neurips.cc/paper/2018/file/dbbf603ff0e99629dda5d75b6f75f966-Paper.pdf.

8

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

9

Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach. 2018. arXiv:1802.09841.

10

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016.

11

Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. 2011. arXiv:1112.5745.