Active Learning Strategies

Adversarial BIM

class distil.active_learning_strategies.adversarial_bim.AdversarialBIM(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implements the Adversarial BIM strategy, which is motivated by the fact that computing the distance from the decision boundary is often difficult or intractable for margin-based methods. This technique avoids estimating that distance directly by using BIM (Basic Iterative Method) [1] to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.

Basic Iterative Method (BIM): Given a base input, the approach is to perturb each feature in the direction of the gradient by magnitude \(\epsilon\), where \(\epsilon\) is a parameter that determines the perturbation size. For a model with loss \(J(\theta, x, y)\), where \(\theta\) represents the model parameters, \(x\) is the model input, and \(y\) is the label of \(x\), the adversarial sample is generated iteratively as

\[\begin{eqnarray} x^*_0 & = & x, \\ x^*_i & = & clip_{x,\epsilon} (x^*_{i-1} + \epsilon \, sign(\nabla_{x^*_{i-1}} J(\theta, x^*_{i-1} , y))) \end{eqnarray}\]
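The iteration can be illustrated with a minimal sketch, assuming a hypothetical two-class logistic model with weights w and bias b (these names and the function below are illustrative, not part of DISTIL). The current prediction is used as the label, and the function counts how many steps of size eps are needed before the prediction flips; fewer steps suggest a point closer to the boundary. The clip operation is left out because the toy features have no valid range.

```python
import math

def bim_steps_to_flip(x, w, b, eps=0.05, max_iter=50):
    """Count BIM steps needed to flip a toy logistic classifier's prediction.

    Hypothetical model: p(y=1|x) = sigmoid(w.x + b). The current prediction
    is used as the label y, so each step increases the loss of that label.
    """
    def logit(v):
        return sum(wi * vi for wi, vi in zip(w, v)) + b

    y = 1.0 if logit(x) >= 0 else 0.0
    adv = list(x)
    for step in range(1, max_iter + 1):
        p = 1.0 / (1.0 + math.exp(-logit(adv)))
        # Gradient of binary cross-entropy w.r.t. the input is (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        sign = [(g > 0) - (g < 0) for g in grad]
        adv = [a + eps * s for a, s in zip(adv, sign)]
        if (1.0 if logit(adv) >= 0 else 0.0) != y:
            return step
    return max_iter + 1  # boundary not reached within max_iter steps


near = bim_steps_to_flip([0.1, 0.0], w=[1.0, -1.0], b=0.0)
far = bim_steps_to_flip([2.0, 0.0], w=[1.0, -1.0], b=0.0)
```

Under these assumptions, a selection strategy would rank the unlabeled points by step count and pick those with the smallest counts.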
Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: Batch size to be used inside strategy class (int, optional)

    • device: The device that this strategy class should use for computation (string, optional)

    • loss: The loss that should be used for relevant computations (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • eps: Epsilon value for gradients (float, optional)

    • verbose: Whether to print more output (bool, optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Adversarial DeepFool

class distil.active_learning_strategies.adversarial_deepfool.AdversarialDeepFool(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implements the Adversarial DeepFool strategy [2], a DeepFool-based active learning strategy that selects unlabeled samples with the smallest adversarial perturbation. This technique is motivated by the fact that computing the distance from the decision boundary is often difficult or intractable for margin-based methods. It avoids estimating that distance directly by using DeepFool-like [3] techniques to estimate how much adversarial perturbation is required to cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.
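For an affine classifier the smallest perturbation that changes the prediction has a closed form, which gives a compact illustration of the ranking idea. The sketch below assumes a hypothetical linear model with weight rows W and biases b; for a deep network, DeepFool applies this linearisation iteratively, which is not shown here. The function names are illustrative and not part of DISTIL.

```python
import math

def linear_deepfool_distance(x, W, b):
    """Smallest L2 perturbation that changes an affine classifier's prediction.

    Hypothetical model: logits f_k(x) = W[k].x + b[k]. The distance to the
    boundary with class k is |f_k - f_pred| / ||W[k] - W[pred]||.
    """
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    pred = max(range(len(logits)), key=logits.__getitem__)
    best = math.inf
    for k in range(len(W)):
        if k == pred:
            continue
        diff_w = [wk - wp for wk, wp in zip(W[k], W[pred])]
        norm = math.sqrt(sum(d * d for d in diff_w))
        if norm == 0.0:
            continue  # identical weight rows never produce a boundary
        best = min(best, abs(logits[k] - logits[pred]) / norm)
    return best

def select_smallest_perturbation(pool, W, b, budget):
    """Indices of the budget points that need the least perturbation."""
    dists = [linear_deepfool_distance(x, W, b) for x in pool]
    return sorted(range(len(pool)), key=dists.__getitem__)[:budget]


W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
d = linear_deepfool_distance([0.5, 3.0], W, b)
idxs = select_smallest_perturbation([[2.0, 0.0], [0.1, 5.0], [-1.0, 1.0]], W, b, budget=2)
```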

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • max_iter: Maximum Number of Iterations (int, optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

BADGE

class distil.active_learning_strategies.badge.BADGE(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

This method is based on the paper Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds [4]. According to the paper, this strategy, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between uncertainty and diversity without requiring any hand-tuned hyperparameters. At each round of selection, loss gradients are computed using the hypothesised labels. The points to be labeled are then selected by applying k-means++ to these loss gradients.
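The two steps described above can be sketched as follows, assuming cross-entropy loss and a linear last layer, so that the gradient with respect to the last-layer weights is the outer product of (probabilities minus one-hot hypothesised label) with the penultimate features. The function names are hypothetical, and starting the seeding from the largest-norm embedding is an assumption of this sketch rather than a statement about DISTIL's code.

```python
import random

def gradient_embedding(probs, features):
    """Last-layer loss gradient under the hypothesised label argmax(probs)."""
    y_hat = max(range(len(probs)), key=probs.__getitem__)
    # d(cross-entropy)/d(logit_c) = p_c - 1[c == y_hat]; outer product with features.
    return [(p - (c == y_hat)) * h for c, p in enumerate(probs) for h in features]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeanspp_select(embeddings, budget, seed=0):
    """k-means++ seeding over gradient embeddings; returns the chosen indices."""
    rng = random.Random(seed)
    norms = [sq_dist(e, [0.0] * len(e)) for e in embeddings]
    chosen = [max(range(len(embeddings)), key=norms.__getitem__)]
    min_d = [sq_dist(e, embeddings[chosen[0]]) for e in embeddings]
    while len(chosen) < budget:
        total = sum(min_d)
        if total == 0.0:
            break  # every remaining point duplicates a chosen one
        # Sample the next index with probability proportional to min_d.
        r, acc, pick = rng.random() * total, 0.0, None
        for i, d in enumerate(min_d):
            acc += d
            if d > 0.0 and acc >= r:
                pick = i
                break
        if pick is None:  # guard against floating-point round-off
            pick = max(range(len(min_d)), key=min_d.__getitem__)
        chosen.append(pick)
        min_d = [min(d, sq_dist(e, embeddings[pick])) for d, e in zip(min_d, embeddings)]
    return chosen


probs = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5], [0.2, 0.8], [0.55, 0.45]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
emb = [gradient_embedding(p, h) for p, h in zip(probs, feats)]
chosen = kmeanspp_select(emb, budget=3)
```

Confident predictions yield small-norm embeddings, so sampling proportional to squared distance tends to favour points that are both uncertain and far from those already chosen.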

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

BatchBALD

class distil.active_learning_strategies.batch_bald.BatchBALDDropout(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implementation of the BatchBALD strategy [5], which refines the original BALD acquisition to the batch setting using a new acquisition function. This class extends active_learning_strategies.strategy.Strategy to include an MC sampling technique based on the sampling techniques used in the BatchBALD paper.

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • n_drop: Number of dropout runs to use to generate MC samples (int, optional)

    • n_samples: Number of samples to use in computing joint entropy (int, optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Bayesian Active Learning Disagreement Dropout

class distil.active_learning_strategies.bayesian_active_learning_disagreement_dropout.BALDDropout(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Bayesian Active Learning by Disagreement (BALD) strategy [6], which assumes a Bayesian setting and selects points that maximise the mutual information between the predicted labels and the model parameters. This implementation is an adaptation for a non-Bayesian setting, with the assumption that there is a dropout layer in the model.
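With MC dropout, the mutual information is commonly estimated as the entropy of the averaged prediction minus the average entropy of the individual dropout passes. A minimal sketch of that estimator, assuming mc_probs holds one class-probability vector per dropout pass (the function names are illustrative, not DISTIL's):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def bald_score(mc_probs):
    """mc_probs: list of class-probability vectors, one per dropout pass."""
    n = len(mc_probs)
    mean_p = [sum(col) / n for col in zip(*mc_probs)]
    # Mutual information = entropy of the mean minus the mean of the entropies.
    return entropy(mean_p) - sum(entropy(p) for p in mc_probs) / n


agree = bald_score([[0.5, 0.5], [0.5, 0.5]])      # passes agree: no disagreement
disagree = bald_score([[1.0, 0.0], [0.0, 1.0]])   # passes contradict each other
```

Both toy inputs have the same averaged prediction, yet only the second has a high score, which is what separates BALD from plain entropy sampling.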

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • n_drop: Number of dropout runs to use to generate MC samples (int, optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Core-Set Approach

class distil.active_learning_strategies.core_set.CoreSet(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implementation of the CoreSet strategy [7], a diversity-based approach using coreset selection. The embedding of each example is computed by the network’s penultimate layer, and the samples at each round are selected using a greedy furthest-first traversal conditioned on all labeled examples.
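A greedy furthest-first traversal can be sketched as below, assuming the embeddings have already been extracted as plain lists of floats and that at least one labeled embedding exists. The function name is hypothetical and math.dist requires Python 3.8 or later.

```python
import math

def k_center_greedy(labeled, unlabeled, budget):
    """Furthest-first traversal over (hypothetical) penultimate-layer embeddings."""
    # Distance from every unlabeled point to its nearest labeled point.
    min_d = [min(math.dist(u, l) for l in labeled) for u in unlabeled]
    selected = []
    for _ in range(min(budget, len(unlabeled))):
        idx = max(range(len(unlabeled)), key=min_d.__getitem__)
        selected.append(idx)
        # The new pick acts as an additional centre for all later rounds.
        min_d = [min(d, math.dist(u, unlabeled[idx])) for d, u in zip(min_d, unlabeled)]
    return selected


picked = k_center_greedy(
    labeled=[[0.0, 0.0]],
    unlabeled=[[1.0, 0.0], [10.0, 0.0], [5.0, 0.0], [0.5, 0.0]],
    budget=2,
)
```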

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Entropy Sampling

class distil.active_learning_strategies.entropy_sampling.EntropySampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Entropy Sampling Strategy, one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy and therefore select the points with maximum entropy. Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\), with \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then entropy can be calculated as,

\[ENTROPY = -\sum_j \sigma(z_j)*\log(\sigma(z_j))\]

The algorithm then selects the budget number of elements with the highest ENTROPY.
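The two formulas translate directly into code. The sketch below assumes the model outputs are available as lists of raw logits; the function names are illustrative only.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def entropy_select(logits, budget):
    scores = [entropy(softmax(z)) for z in logits]
    # Highest entropy first.
    return sorted(range(len(logits)), key=lambda i: -scores[i])[:budget]


logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
idxs = entropy_select(logits, budget=2)
```

The uniform prediction (index 1) has the maximum possible entropy, log(nclasses), and is therefore selected first.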

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Entropy Sampling with Dropout

class distil.active_learning_strategies.entropy_sampling_dropout.EntropySamplingDropout(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Entropy Sampling Strategy with dropout. The Entropy Sampling Strategy is one of the most basic active learning strategies, where we select samples about which the model is most uncertain. To quantify the uncertainty we use entropy and therefore select the points with maximum entropy. Suppose the model has nclasses output nodes and each output node is denoted by \(z_j\), with \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then entropy can be calculated as,

\[ENTROPY = -\sum_j \sigma(z_j)*\log(\sigma(z_j))\]

The algorithm then selects the budget number of elements with the highest ENTROPY.

The dropout version uses the predict-probability dropout function from the base strategy class to obtain the predicted probabilities. The user can pass the n_drop argument, which denotes the number of times the probabilities will be calculated. The final probability is calculated by averaging the probabilities obtained in all iterations.
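The averaging step can be sketched with a toy stand-in for the network: a hypothetical linear model with inverted dropout applied to its input. The names dropout_forward and mc_dropout_probs are made up for this illustration and are not the base-class functions.

```python
import math
import random

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def dropout_forward(x, W, rng, p_drop=0.5):
    """One stochastic pass of a toy linear model with inverted dropout on the input."""
    kept = [xi / (1.0 - p_drop) if rng.random() >= p_drop else 0.0 for xi in x]
    return softmax([sum(w * k for w, k in zip(row, kept)) for row in W])

def mc_dropout_probs(x, W, n_drop=10, seed=0):
    """Average the class probabilities over n_drop stochastic passes."""
    rng = random.Random(seed)
    passes = [dropout_forward(x, W, rng) for _ in range(n_drop)]
    return [sum(col) / n_drop for col in zip(*passes)]


W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
p = mc_dropout_probs([1.0, 2.0], W, n_drop=20)
score = entropy(p)  # the entropy is taken on the averaged probabilities
```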

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • n_drop: Number of dropout runs (int, optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

FASS

class distil.active_learning_strategies.fass.FASS(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implements FASS [8], which combines the uncertainty sampling method with a submodular data subset selection framework to label a subset of data points to train a classifier. Based on the ‘top_n’ parameter, the ‘top_n*budget’ most uncertain points are filtered. On these filtered points, one of the submodular functions, viz. ‘facility_location’, ‘feature_based’, ‘graph_cut’, ‘log_determinant’, ‘disparity_min’, or ‘disparity_sum’, is applied to get the final set of points. We select a subset \(F\) of size \(\beta\) based on uncertainty sampling, such that \(\beta \ge k\).

Then select a subset \(S\) by solving

\[\max \{f(S) \text{ such that } |S| \leq k, S \subseteq F\}\]

where \(k\) is the budget and \(f\) can be one of these functions: ‘facility_location’, ‘feature_based’, ‘graph_cut’, ‘log_determinant’, ‘disparity_min’, ‘disparity_sum’.
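The filter-then-select procedure can be sketched as follows, assuming entropy as the uncertainty measure and a facility-location objective with an RBF similarity maximised by a plain greedy loop. These are choices made for the illustration; the helper names are hypothetical and do not reflect the library's submodular optimiser.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def fass_select(probs, features, budget, top_n=5):
    # Step 1: keep the beta = top_n * budget most uncertain points (the set F).
    beta = min(top_n * budget, len(probs))
    F = sorted(range(len(probs)), key=lambda i: -entropy(probs[i]))[:beta]
    # Step 2: greedy facility location on F, f(S) = sum_i max_{j in S} sim(i, j).
    sim = {(i, j): math.exp(-math.dist(features[i], features[j]) ** 2)
           for i in F for j in F}
    best = {i: 0.0 for i in F}   # best similarity of i to the current S
    S = []
    while len(S) < min(budget, len(F)):
        gains = {j: sum(max(sim[i, j] - best[i], 0.0) for i in F)
                 for j in F if j not in S}
        j_star = max(gains, key=gains.get)
        S.append(j_star)
        for i in F:
            best[i] = max(best[i], sim[i, j_star])
    return S


probs = [[0.5, 0.5]] * 4 + [[0.99, 0.01]] * 2
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0], [-10.0, -10.0]]
S = fass_select(probs, features, budget=2, top_n=2)
```

In this toy pool the two confident points are filtered out in step 1, and step 2 picks one representative from each of the two remaining clusters.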

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • submod_args: Parameters for the submodular selection as described in SubmodularSampling (dict, optional)

    • uncertainty_measure: Describes which measure of uncertainty should be used. This should be one of ‘entropy’, ‘least_confidence’, or ‘margin’ (string, optional)

select(budget, top_n=5)[source]

Selects next set of points

Parameters
  • budget (int) – Number of data points to select for labeling

  • top_n (int, optional) – Number of slices of size budget to include in filtered subset

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

GLISTER

class distil.active_learning_strategies.glister.GLISTER(labeled_dataset, unlabeled_dataset, net, nclasses, args={}, validation_dataset=None, typeOf='none', lam=None, kernel_batch_size=200)[source]

Bases: Strategy

This is an implementation of GLISTER-ACTIVE from the paper GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning [9]. GLISTER methods try to solve a bi-level optimisation problem:

\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\underset{\theta}{\operatorname{argmin\hspace{0.7mm}}} L_T( \theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]

In the above equation, \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, \(\mathcal{V}\) denotes the validation set that guides the subset selection process, \(L_T\) denotes the training loss, \(L_V\) denotes the validation loss, \(S\) denotes the data subset selected at each round, and \(k\) is the budget. Since solving the complete inner optimization is expensive, GLISTER-ONLINE adopts an online one-step meta approximation, where we approximate the solution to the inner problem by taking a single gradient step. The optimization problem after the approximation is as follows:

\[\overbrace{\underset{{S \subseteq {\mathcal U}, |S| \leq k}}{\operatorname{argmin\hspace{0.7mm}}} L_V(\underbrace{\theta - \eta \nabla_{\theta}L_T(\theta, S)}_{inner-level}, {\mathcal V})}^{outer-level}\]

In the above equation, \(\eta\) denotes the step size used for the one-step gradient update.
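The one-step objective can be evaluated directly on a toy problem. The sketch below assumes a scalar linear model with squared-error loss and a pool of (x, y_hat) pairs, where y_hat is a hypothesised label, and grows \(S\) with a naive greedy loop. It is meant only to make the objective concrete; the function names are hypothetical and this is not the selection routine used by the library.

```python
def mean_grad(theta, data):
    """Gradient of the mean loss 0.5 * (theta * x - y)^2 with respect to theta."""
    return sum((theta * x - y) * x for x, y in data) / len(data)

def mean_loss(theta, data):
    return sum(0.5 * (theta * x - y) ** 2 for x, y in data) / len(data)

def glister_greedy(theta, pool, validation, budget, eta=0.1):
    """Greedily grow S to minimise L_V(theta - eta * grad L_T(theta, S))."""
    selected = []
    for _ in range(min(budget, len(pool))):
        def val_loss_if_added(i):
            subset = [pool[j] for j in selected + [i]]
            one_step = theta - eta * mean_grad(theta, subset)
            return mean_loss(one_step, validation)
        remaining = [i for i in range(len(pool)) if i not in selected]
        selected.append(min(remaining, key=val_loss_if_added))
    return selected


validation = [(1.0, 2.0), (2.0, 4.0)]            # generated by y = 2x
pool = [(1.0, 2.0), (1.0, -2.0), (2.0, 4.0)]     # (x, hypothesised label)
S = glister_greedy(theta=0.0, pool=pool, validation=validation, budget=2)
```

The point whose hypothesised label contradicts the validation data (index 1) is never preferred, because a gradient step on it increases the validation loss.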

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • lr: The learning rate used for training (float)

  • validation_dataset (torch.utils.data.Dataset) – The validation dataset to be used in GLISTER objective

  • typeOf (str, optional) – Determines the type of regulariser to be used. Default is ‘none’. For a random regulariser use ‘Rand’. To use the Facility Location set function as a regulariser use ‘FacLoc’. To use the Diversity set function as a regulariser use ‘Diversity’.

  • lam (float, optional) – Determines the amount of regularisation to be applied. Mandatory if typeOf is not ‘none’; set to None by default. For the random regulariser, values should be between 0 and 1, as lam determines the fraction of points replaced by random points. For both ‘Diversity’ and ‘FacLoc’, lam determines the weight given to them while computing the gain.

  • kernel_batch_size (int, optional) – For the ‘Diversity’ and ‘FacLoc’ regulariser versions, a similarity kernel has to be computed, which entails creating a 3D torch tensor of dimensions kernel_batch_size * kernel_batch_size * feature dimension. kernel_batch_size should therefore be chosen so that one can exploit the benefits of tensorization while honouring the resource constraints.

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

GRADMATCH

class distil.active_learning_strategies.gradmatch_active.GradMatchActive(labeled_dataset, unlabeled_dataset, net, nclasses, args={}, validation_dataset=None)[source]

Bases: Strategy

This is an implementation of an active learning variant of GradMatch from the paper GRAD-MATCH: A Gradient Matching Based Data Subset Selection for Efficient Learning [10]. This algorithm solves a fixed-weight version of the error term present in the paper by a greedy selection algorithm akin to the original GradMatch’s Orthogonal Matching Pursuit. The gradients are computed from the loss on the hypothesized labels and are matched to either the full gradient of these hypothesized examples or a supplied validation gradient. The indices returned are the ones selected by this algorithm.

\[Err(X_t, L, L_T, \theta_t) = \left |\left| \sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t) - \frac{k}{N} \nabla_\theta L(\theta_t) \right | \right|\]

where,

  • Each gradient is computed with respect to the last layer’s parameters

  • \(\theta_t\) are the model parameters at selection round \(t\)

  • \(X_t\) is the queried set of points to label at selection round \(t\)

  • \(k\) is the budget

  • \(N\) is the number of points contributing to the full gradient \(\nabla_\theta L(\theta_t)\)

  • \(\nabla_\theta L(\theta_t)\) is either the complete hypothesized gradient or a validation gradient

  • \(\sum_{i \in X_t} \nabla_\theta L_T^i (\theta_t)\) is the subset’s hypothesized gradient with \(|X_t| = k\)
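A naive greedy minimiser of the fixed-weight error term can be sketched as below, assuming the per-example last-layer gradients are given as plain lists and the full gradient is their sum. It illustrates the objective only and is not the OMP-style routine the library uses; the function name is hypothetical.

```python
import math

def gradmatch_greedy(grads, budget):
    """Greedily match a subset gradient sum to (k / N) * full gradient."""
    n, dim = len(grads), len(grads[0])
    full = [sum(g[d] for g in grads) for d in range(dim)]
    # residual = target minus the running subset sum; starts at the target.
    residual = [budget / n * f for f in full]
    selected = []
    for _ in range(min(budget, n)):
        def err_if_added(i):
            return math.sqrt(sum((r - g) ** 2 for r, g in zip(residual, grads[i])))
        remaining = [i for i in range(n) if i not in selected]
        best = min(remaining, key=err_if_added)
        selected.append(best)
        residual = [r - g for r, g in zip(residual, grads[best])]
    error = math.sqrt(sum(r * r for r in residual))
    return selected, error


grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
selected, error = gradmatch_greedy(grads, budget=2)
```

Here the target is half of the full gradient, [1, 1], and one gradient of each direction reproduces it exactly, so the error is zero.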

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • grad_embedding: The type of gradient embedding that should be used (string, optional)

    • omp_reg: The regularization constant to use in GradMatch objective

  • validation_dataset (torch.utils.data.Dataset, optional) – The validation dataset to use in GradMatch objective

select(budget, use_weights=False)[source]

Selects next set of points

Parameters
  • budget (int) – Number of data points to select for labeling

  • use_weights (bool) – Whether to use fixed-weight version (false) or OMP version (true)

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

KMeans Sampling

class distil.active_learning_strategies.kmeans_sampling.KMeansSampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implements the KMeans Sampling selection strategy. The last-layer embeddings are calculated for all the unlabeled data points. The KMeans clustering algorithm is then run over these embeddings with the number of clusters equal to the budget, and the distance of every point from its respective center is calculated. From each cluster, the point closest to the center is selected to be labeled for the next iteration. Since the number of centers is equal to the budget, selecting one point from each cluster yields exactly the number of data points to be selected in one iteration.
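The procedure can be sketched with plain Lloyd iterations, assuming the embeddings are lists of floats. The init_indices parameter is a hypothetical convenience for reproducible initial centres, and a cluster that empties out simply contributes no pick in this simplified version.

```python
import math
import random

def assign(points, centers):
    """Group point indices by their nearest centre."""
    clusters = [[] for _ in centers]
    for i, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        clusters[nearest].append(i)
    return clusters

def kmeans_sampling(points, budget, n_iter=20, seed=0, init_indices=None):
    if init_indices is None:
        init_indices = random.Random(seed).sample(range(len(points)), budget)
    centers = [list(points[i]) for i in init_indices]
    for _ in range(n_iter):
        for k, members in enumerate(assign(points, centers)):
            if members:  # keep the old centre if a cluster empties out
                centers[k] = [sum(points[i][d] for i in members) / len(members)
                              for d in range(len(points[0]))]
    picks = []
    for k, members in enumerate(assign(points, centers)):
        if members:
            # One pick per cluster: the member closest to the centre.
            picks.append(min(members, key=lambda i: math.dist(points[i], centers[k])))
    return picks


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
picks = kmeans_sampling(points, budget=3, init_indices=[0, 3, 6])
random_init_picks = kmeans_sampling(points, budget=3)
```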

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: Batch size to be used inside strategy class (int, optional)

    • device: The device that this strategy class should use for computation (string, optional)

    • loss: The loss that should be used for relevant computations (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • rand_seed: Specifies a seed for the random seed generator used in initialization (int, optional)

    • representation: Specifies whether to use the last linear layer embeddings or the raw data. Must be one of ‘linear’ or ‘raw’ (string, optional)

    • kmeans_args: Specifies additional kmeans-related parameters

      • tol: Specifies the value of the Frobenius norm of the inertia tensor by which kmeans should cease (float, optional)

      • max_iter: Specifies the maximum number of iterations that kmeans should use before terminating (int, optional)

      • n_init: Specifies the number of kmeans run-throughs to use, wherein the one with the smallest inertia is selected for the selection phase (int, optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Least Confidence

class distil.active_learning_strategies.least_confidence_sampling.LeastConfidenceSampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Least Confidence Sampling Strategy, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\), with \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then the softmax can be used to pick the budget number of elements for which the model has the lowest confidence as follows:

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S \max_j \sigma(z_j)}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
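Because the objective is a sum of per-point scores, it is minimised by taking the budget points with the smallest maximum softmax probability. A minimal sketch, assuming raw logits as plain lists (the function names are illustrative):

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def least_confidence_select(logits, budget):
    # Confidence = probability of the most likely class.
    confidence = [max(softmax(z)) for z in logits]
    return sorted(range(len(logits)), key=confidence.__getitem__)[:budget]


idxs = least_confidence_select([[5.0, 0.0], [0.1, 0.0], [1.0, 0.0]], budget=2)
```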

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Least Confidence with Dropout

class distil.active_learning_strategies.least_confidence_sampling_dropout.LeastConfidenceSamplingDropout(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Least Confidence Sampling Strategy with dropout, an active learning strategy where the algorithm selects the data points for which the model has the lowest confidence while predicting their labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\), with \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Then the softmax can be used to pick the budget number of elements for which the model has the lowest confidence as follows:

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S \max_j \sigma(z_j)}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget. The dropout version uses the predict-probability dropout function from the base strategy class to obtain the predicted probabilities. The user can pass the n_drop argument, which denotes the number of times the probabilities will be calculated. The final probability is calculated by averaging the probabilities obtained in all iterations.

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • n_drop: Number of dropout runs (int, optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Margin Sampling

class distil.active_learning_strategies.margin_sampling.MarginSampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Margin Sampling Strategy, an active learning strategy similar to the Least Confidence Sampling Strategy. While least confidence only takes into consideration the maximum probability, margin sampling considers the difference between the confidences of the first and second most probable labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\), with \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Let,

\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]

Then, using the softmax, the Margin Sampling Strategy picks the budget number of elements as follows:

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S \left( \sigma(z_m) - \max_{j \ne m} \sigma(z_j) \right)}\]

where \(\mathcal{U}\) denotes the data without labels, i.e. unlabeled_x, and \(k\) is the budget.
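In code, the margin is the gap between the two largest softmax probabilities, and the budget points with the smallest gap are selected. A minimal sketch, assuming raw logits as plain lists with at least two classes (the function names are illustrative):

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def margin_select(logits, budget):
    margins = []
    for z in logits:
        # Gap between the most probable and second most probable class.
        top1, top2 = sorted(softmax(z), reverse=True)[:2]
        margins.append(top1 - top2)
    return sorted(range(len(logits)), key=margins.__getitem__)[:budget]


idxs = margin_select([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 1.0, 0.0]], budget=2)
```

The second point has two equally likely classes, so its margin is zero and it is selected first.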

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Margin Sampling with Dropout

class distil.active_learning_strategies.margin_sampling_dropout.MarginSamplingDropout(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: ScoreStreamingStrategy

Implements the Margin Sampling Strategy with dropout, an active learning strategy similar to the Least Confidence Sampling Strategy with dropout. While least confidence considers only the maximum probability, margin sampling considers the difference between the confidences of the most probable and the second most probable labels.

Suppose the model has nclasses output nodes denoted by \(\overrightarrow{\boldsymbol{z}}\) and each output node is denoted by \(z_j\). Thus, \(j \in [1, nclasses]\). Then for an output node \(z_i\) from the model, the corresponding softmax would be

\[\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}\]

Let,

\[m = \mbox{argmax}_j{(\sigma(\overrightarrow{\boldsymbol{z}}))}\]

Then, using the softmax, the Margin Sampling Strategy picks budget elements as follows:

\[\mbox{argmin}_{{S \subseteq {\mathcal U}, |S| \leq k}}{\sum_S \left( \max_j {\sigma(z_j)} - \max_{j \ne m} {\sigma(z_j)} \right)}\]

where \(\mathcal{U}\) denotes the unlabeled data, i.e. unlabeled_dataset, and \(k\) is the budget.

The dropout version uses the dropout-based probability prediction function of the base strategy class to find the hypothesized labels. The user can pass an n_drop argument, which sets the number of times the probabilities are computed. The final probability is the average of the probabilities obtained over all iterations.
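
The averaging step can be illustrated with a small sketch. It assumes the n_drop stochastic forward passes have already produced one probability matrix each; it does not call the base strategy's own function, and the name average_dropout_probs and the toy values are hypothetical.

```python
def average_dropout_probs(runs):
    """Element-wise mean of n_drop probability matrices.

    `runs` holds one entry per dropout pass; each entry is a list of
    per-point class-probability lists, all with the same shape.
    """
    n_drop = len(runs)
    n_points = len(runs[0])
    n_classes = len(runs[0][0])
    return [
        [sum(run[i][c] for run in runs) / n_drop for c in range(n_classes)]
        for i in range(n_points)
    ]

# Two made-up dropout passes over two points and two classes.
runs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.6, 0.4]],
]
mean_probs = average_dropout_probs(runs)

# Margin of the averaged probabilities, which is what the ranking uses.
margins = []
for probs in mean_probs:
    first, second = sorted(probs, reverse=True)[:2]
    margins.append(first - second)
```

In this toy case the second point flips its predicted class between passes, so its averaged probabilities are tied and its margin is the smaller of the two.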

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • n_drop: Number of dropout runs (int, optional)

select(budget)

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Partitioning

class distil.active_learning_strategies.partition_strategy.PartitionStrategy(labeled_dataset, unlabeled_dataset, net, nclasses, args={}, query_dataset=None, private_dataset=None)[source]

Bases: Strategy

Provides a wrapper around most of the strategies implemented in DISTIL that allows one to select portions of the budget from specific partitions of the unlabeled dataset. This allows the use of some strategies that would otherwise fail due to time or memory constraints. For example, if one specifies a number of partitions to be 5 and wants to select 50 new points, 10 points would be selected from the first fifth of the dataset, 10 points would be selected from the second fifth of the dataset, and so on.
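
The budget split in the example above can be sketched as follows. This is an illustration under assumptions (contiguous partitions, ceiling-based rounding, and the two sanity checks), not the wrapper's actual code; partition_plan is a hypothetical helper.

```python
def partition_plan(pool_size, budget, num_partitions):
    """Return (start, end, partition_budget) for each contiguous partition."""
    # Assumed sanity checks: every partition needs at least one point and
    # at least one unit of budget.
    if num_partitions > budget:
        raise ValueError("budget must be at least the number of partitions")
    if num_partitions > pool_size:
        raise ValueError("more partitions than data points")
    plan = []
    start = 0
    spent = 0
    for i in range(1, num_partitions + 1):
        # Integer ceiling of pool_size * i / num_partitions.
        end = -(-pool_size * i // num_partitions)
        # Cumulative budget owed once this partition has been processed.
        owed = -(-budget * end // pool_size)
        plan.append((start, end, owed - spent))
        start, spent = end, owed
    return plan

# 5 partitions, 50 new points, 100 unlabeled points in total.
plan = partition_plan(pool_size=100, budget=50, num_partitions=5)
```

Each tuple gives the half-open index range of one partition and the number of points to select from it; indices chosen inside a partition would then need to be shifted by start to refer to the full unlabeled dataset.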

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • num_partitions: Number of partitions to use (int, optional)

    • wrapped_strategy_class: The class of the strategy to use (class, optional)

  • query_dataset (torch.utils.data.Dataset) – The query dataset to use if the wrapped_strategy_class argument points to SMI or SCMI.

  • private_dataset (torch.utils.data.Dataset) – The private dataset to use if the wrapped_strategy_class argument points to SCG or SCMI.

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Random Sampling

class distil.active_learning_strategies.random_sampling.RandomSampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

Implementation of Random Sampling Strategy. This strategy is often used as a baseline, where we pick a set of unlabeled points randomly.
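
A minimal sketch of such a baseline, assuming selection is uniform and without replacement over the indices of the unlabeled pool. The function random_select and its seed parameter are hypothetical and only illustrate the idea.

```python
import random

def random_select(pool_size, budget, seed=None):
    """Pick `budget` distinct indices uniformly at random from range(pool_size)."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), budget)

# Select 10 of 1000 unlabeled points; a fixed seed makes the draw repeatable.
idxs = random_select(pool_size=1000, budget=10, seed=0)
```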

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Submodular Conditional Gain (SCG)

class distil.active_learning_strategies.scg.SCG(labeled_dataset, unlabeled_dataset, private_dataset, net, nclasses, args={})[source]

Bases: Strategy

This strategy implements the Submodular Conditional Gain (SCG) selection paradigm discussed in the paper SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios 11. In this selection paradigm, points from the unlabeled dataset are chosen in such a way that the submodular conditional gain between this set of points and a provided private set is maximized. Doing so allows a practitioner to select points from an unlabeled set that are dissimilar to points provided in the private set.

These submodular conditional gain functions rely on formulating embeddings for the points in the unlabeled set and the private set. Once these embeddings are formed, similarity kernels are formed from these embeddings based on a similarity metric. Once these similarity kernels are formed, they are used in computing the value of each submodular conditional gain function. Hence, common techniques for submodular maximization subject to a cardinality constraint can be used, such as the naive greedy algorithm, the lazy greedy algorithm, and so forth.

In this framework, we set the cardinality constraint to be the active learning selection budget; hence, a list of indices with a total length less than or equal to this cardinality constraint will be returned. Depending on the maximization configuration, one can ensure that the length of this list will be equal to the cardinality constraint.
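
To illustrate why the returned list can be shorter than the budget, here is a sketch of naive greedy maximization under a cardinality constraint. For brevity it uses a plain facility location function on a made-up similarity kernel rather than a conditional gain function, and the names facility_location, naive_greedy, and stop_if_zero_gain are hypothetical stand-ins for the library's optimizer options.

```python
def facility_location(sim, selected):
    """f(A) = sum_i max_{j in A} sim[i][j]; zero for the empty set."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)

def naive_greedy(sim, budget, stop_if_zero_gain=True):
    """Repeatedly add the candidate with the largest marginal gain."""
    selected = []
    current = 0.0
    candidates = set(range(len(sim[0])))
    while len(selected) < budget and candidates:
        gains = {
            j: facility_location(sim, selected + [j]) - current
            for j in candidates
        }
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(gains), key=gains.get)
        if stop_if_zero_gain and gains[best] <= 0.0:
            break
        selected.append(best)
        candidates.remove(best)
        current += gains[best]
    return selected

# Made-up kernel: points 0 and 1 are duplicates, point 2 is unrelated.
sim = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
picked_early_stop = naive_greedy(sim, budget=3, stop_if_zero_gain=True)
picked_full = naive_greedy(sim, budget=3, stop_if_zero_gain=False)
```

With early stopping the duplicate point adds no gain, so only two indices are returned; with early stopping disabled the list is filled up to the budget.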

Currently, three submodular conditional gain functions are implemented: ‘flcg’, ‘gccg’, and ‘logdetcg’. Each function is obtained by applying the definition of a submodular conditional gain function using common submodular functions. For more information-theoretic discussion, consider referring to the paper Submodular Combinatorial Information Measures with Applications in Machine Learning 12.

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled dataset to be used in this strategy. For the purposes of selection, the labeled dataset is not used, but it is provided to fit the common framework of the Strategy superclass.

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled dataset to be used in this strategy. It is used in the selection process as described above. Importantly, the unlabeled dataset must return only a data Tensor; if indexing the unlabeled dataset returns a tuple of more than one component, unexpected behavior will most likely occur.

  • private_dataset (torch.utils.data.Dataset) – The private dataset to be used in this strategy. It is used in the selection process as described above. Notably, the private dataset should be labeled; hence, indexing the private dataset should return a data/label pair. This is done in this fashion to allow for gradient embeddings.

  • net (torch.nn.Module) – The neural network model to use for embeddings and predictions. Notably, all embeddings typically come from extracted features from this network or from gradient embeddings based on the loss, which can be based on hypothesized gradients or on true gradients (depending on the availability of the label).

  • nclasses (int) – The number of classes being predicted by the neural network.

  • args (dict) –

    A dictionary containing many configurable settings for this strategy. Each key-value pair is described below:

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • scg_function: The submodular conditional gain function to use in optimization. Must be one of ‘flcg’, ‘gccg’, or ‘logdetcg’. (string)

    • optimizer: The optimizer to use for submodular maximization. Can be one of ‘NaiveGreedy’, ‘StochasticGreedy’, ‘LazyGreedy’ and ‘LazierThanLazyGreedy’. (string, optional)

    • metric: The similarity metric to use for similarity kernel computation. This can be either ‘cosine’ or ‘euclidean’. (string)

    • nu: A parameter that governs the hardness of the privacy constraint. (float)

    • embedding_type: The type of embedding to compute for similarity kernel computation. This can be either ‘gradients’ or ‘features’. (string)

    • gradType: When ‘embedding_type’ is ‘gradients’, this defines the type of gradient to use. ‘bias’ creates gradients from the loss function with respect to the biases outputted by the model. ‘linear’ creates gradients from the loss function with respect to the last linear layer features. ‘bias_linear’ creates gradients from the loss function using both. (string)

    • layer_name: When ‘embedding_type’ is ‘features’, this defines the layer within the neural network that is used to extract feature embeddings. Namely, this argument must be the name of a module used in the forward() computation of the model. (string)

    • stopIfZeroGain: Controls if the optimizer should cease maximization if there is zero gain in the submodular objective. (bool)

    • stopIfNegativeGain: Controls if the optimizer should cease maximization if there is negative gain in the submodular objective. (bool)

    • verbose: Gives a more verbose output when calling select() when True. (bool)
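
As an illustration of how the settings above fit together, a possible args dictionary is sketched below. The keys and option strings come from the list above; the concrete values (batch size, nu, and so on) are arbitrary placeholders, not recommended defaults.

```python
# Hypothetical settings for illustration only.
scg_args = {
    "batch_size": 64,
    "device": "cpu",
    "scg_function": "flcg",         # one of 'flcg', 'gccg', 'logdetcg'
    "optimizer": "LazyGreedy",      # one of the four greedy variants
    "metric": "cosine",             # 'cosine' or 'euclidean'
    "nu": 1.0,                      # hardness of the privacy constraint
    "embedding_type": "gradients",  # 'gradients' or 'features'
    "gradType": "bias_linear",      # read because embedding_type is 'gradients'
    "stopIfZeroGain": False,
    "stopIfNegativeGain": False,
    "verbose": False,
}
```

A dictionary like this would be passed as the args argument of the SCG constructor shown at the top of this section, alongside the labeled, unlabeled, and private datasets, the network, and nclasses.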

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Submodular Conditional Mutual Information (SCMI)

class distil.active_learning_strategies.scmi.SCMI(labeled_dataset, unlabeled_dataset, query_dataset, private_dataset, net, nclasses, args={})[source]

Bases: Strategy

This strategy implements the Submodular Conditional Mutual Information (SCMI) selection paradigm discussed in the paper SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios 11. In this selection paradigm, points from the unlabeled dataset are chosen in such a way that the submodular conditional mutual information between this set of points and a provided query set is maximized, conditioned on a private dataset. Doing so allows a practitioner to select points from an unlabeled set that are SIMILAR to points that they have provided in the query set while being dissimilar to points provided in the private set.

These submodular conditional mutual information functions rely on formulating embeddings for the points in the query set, the unlabeled set, and the private set. Once these embeddings are formed, similarity kernels are formed from these embeddings based on a similarity metric. Once these similarity kernels are formed, they are used in computing the value of each submodular conditional mutual information function. Hence, common techniques for submodular maximization subject to a cardinality constraint can be used, such as the naive greedy algorithm, the lazy greedy algorithm, and so forth.

In this framework, we set the cardinality constraint to be the active learning selection budget; hence, a list of indices with a total length less than or equal to this cardinality constraint will be returned. Depending on the maximization configuration, one can ensure that the length of this list will be equal to the cardinality constraint.

Currently, two submodular conditional mutual information functions are implemented: ‘flcmi’ and ‘logdetcmi’. Each function is obtained by applying the definition of a submodular conditional mutual information function using common submodular functions. For more information-theoretic discussion, consider referring to the paper Submodular Combinatorial Information Measures with Applications in Machine Learning 12.

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled dataset to be used in this strategy. For the purposes of selection, the labeled dataset is not used, but it is provided to fit the common framework of the Strategy superclass.

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled dataset to be used in this strategy. It is used in the selection process as described above. Importantly, the unlabeled dataset must return only a data Tensor; if indexing the unlabeled dataset returns a tuple of more than one component, unexpected behavior will most likely occur.

  • query_dataset (torch.utils.data.Dataset) – The query dataset to be used in this strategy. It is used in the selection process as described above. Notably, the query dataset should be labeled; hence, indexing the query dataset should return a data/label pair. This is done in this fashion to allow for gradient embeddings.

  • private_dataset (torch.utils.data.Dataset) – The private dataset to be used in this strategy. It is used in the selection process as described above. Notably, the private dataset should be labeled; hence, indexing the private dataset should return a data/label pair. This is done in this fashion to allow for gradient embeddings.

  • net (torch.nn.Module) – The neural network model to use for embeddings and predictions. Notably, all embeddings typically come from extracted features from this network or from gradient embeddings based on the loss, which can be based on hypothesized gradients or on true gradients (depending on the availability of the label).

  • nclasses (int) – The number of classes being predicted by the neural network.

  • args (dict) –

    A dictionary containing many configurable settings for this strategy. Each key-value pair is described below:

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • scmi_function: The submodular conditional mutual information function to use in optimization. Must be one of ‘flcmi’ or ‘logdetcmi’. (string)

    • optimizer: The optimizer to use for submodular maximization. Can be one of ‘NaiveGreedy’, ‘StochasticGreedy’, ‘LazyGreedy’ and ‘LazierThanLazyGreedy’. (string, optional)

    • metric: The similarity metric to use for similarity kernel computation. This can be either ‘cosine’ or ‘euclidean’. (string)

    • eta: A magnification constant that is used in all but gcmi. It is used as a value of query-relevance vs diversity trade-off. Increasing eta tends to increase query-relevance while reducing query-coverage and diversity. (float)

    • nu: A parameter that governs the hardness of the privacy constraint. (float)

    • embedding_type: The type of embedding to compute for similarity kernel computation. This can be either ‘gradients’ or ‘features’. (string)

    • gradType: When ‘embedding_type’ is ‘gradients’, this defines the type of gradient to use. ‘bias’ creates gradients from the loss function with respect to the biases outputted by the model. ‘linear’ creates gradients from the loss function with respect to the last linear layer features. ‘bias_linear’ creates gradients from the loss function using both. (string)

    • layer_name: When ‘embedding_type’ is ‘features’, this defines the layer within the neural network that is used to extract feature embeddings. Namely, this argument must be the name of a module used in the forward() computation of the model. (string)

    • stopIfZeroGain: Controls if the optimizer should cease maximization if there is zero gain in the submodular objective. (bool)

    • stopIfNegativeGain: Controls if the optimizer should cease maximization if there is negative gain in the submodular objective. (bool)

    • verbose: Gives a more verbose output when calling select() when True. (bool)
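
To make the 'bias' gradient embedding more concrete, the sketch below assumes a cross-entropy loss, for which the gradient with respect to the last-layer bias is softmax(z) minus the one-hot label. When no label is given, the predicted class is used as a hypothesized label. This illustrates the idea only; the library's own embedding routine may differ in details, and bias_gradient_embedding is a hypothetical name.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bias_gradient_embedding(logits, label=None):
    """Cross-entropy gradient w.r.t. the last-layer bias: softmax(z) - onehot(y).

    With label=None, the predicted class acts as a hypothesized label.
    """
    probs = softmax(logits)
    if label is None:
        label = max(range(len(probs)), key=probs.__getitem__)
    return [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]

# Unlabeled point: hypothesized label is the predicted class (index 0).
hyp = bias_gradient_embedding([2.0, 1.0, 0.0])
# Labeled query or private point: the true label (index 2) is used.
true = bias_gradient_embedding([2.0, 1.0, 0.0], label=2)
```

The two embeddings differ only in the entries for the hypothesized and true classes, which is why labeled query and private sets allow true rather than hypothesized gradients.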

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Submodular Mutual Information (SMI)

class distil.active_learning_strategies.smi.SMI(labeled_dataset, unlabeled_dataset, query_dataset, net, nclasses, args={})[source]

Bases: Strategy

This strategy implements the Submodular Mutual Information (SMI) selection paradigm discussed in the paper SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios 11. In this selection paradigm, points from the unlabeled dataset are chosen in such a way that the submodular mutual information between this set of points and a provided query set is maximized. Doing so allows a practitioner to select points from an unlabeled set that are SIMILAR to points that they have provided in an active learning query.

These submodular mutual information functions rely on formulating embeddings for the points in the query set and the unlabeled set. Once these embeddings are formed, one or more similarity kernels (depending on the SMI function used) are formed from these embeddings based on a similarity metric. Once these similarity kernels are formed, they are used in computing the value of each submodular mutual information function. Hence, common techniques for submodular maximization subject to a cardinality constraint can be used, such as the naive greedy algorithm, the lazy greedy algorithm, and so forth.

In this framework, we set the cardinality constraint to be the active learning selection budget; hence, a list of indices with a total length less than or equal to this cardinality constraint will be returned. Depending on the maximization configuration, one can ensure that the length of this list will be equal to the cardinality constraint.

Currently, five submodular mutual information functions are implemented: fl1mi, fl2mi, gcmi, logdetmi, and com. Each function is obtained by applying the definition of a submodular mutual information function using common submodular functions. Facility Location Mutual Information (fl1mi) models pairwise similarities of points in the query set to points in the unlabeled dataset AND pairwise similarities of points within the unlabeled datasets. Another variant of Facility Location Mutual Information (fl2mi) models pairwise similarities of points in the query set to points in the unlabeled dataset ONLY. Graph Cut Mutual Information (gcmi), Log-Determinant Mutual Information (logdetmi), and Concave-Over-Modular Mutual Information (com) are all obtained by applying the usual submodular function under this definition. For more information-theoretic discussion, consider referring to the paper Submodular Combinatorial Information Measures with Applications in Machine Learning 12.
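
The similarity kernels mentioned above can be sketched with plain cosine similarity. The snippet assumes nonzero embedding vectors and uses made-up 2-D embeddings; cosine_kernel is a hypothetical helper, not a library function.

```python
import math

def cosine_kernel(rows, cols):
    """Kernel K with K[i][j] = cosine similarity of rows[i] and cols[j]."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    kernel = []
    for r in rows:
        kernel.append([
            sum(a * b for a, b in zip(r, c)) / (norm(r) * norm(c))
            for c in cols
        ])
    return kernel

# Made-up embeddings for one query point and three unlabeled points.
query_emb = [[1.0, 0.0]]
unlabeled_emb = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

# Query-to-unlabeled similarities (1 x 3).
query_kernel = cosine_kernel(query_emb, unlabeled_emb)
# Similarities within the unlabeled set (3 x 3).
pool_kernel = cosine_kernel(unlabeled_emb, unlabeled_emb)
```

Following the description above, a function such as fl1mi would draw on both kernels, whereas fl2mi would need only the query-to-unlabeled one.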

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled dataset to be used in this strategy. For the purposes of selection, the labeled dataset is not used, but it is provided to fit the common framework of the Strategy superclass.

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled dataset to be used in this strategy. It is used in the selection process as described above. Importantly, the unlabeled dataset must return only a data Tensor; if indexing the unlabeled dataset returns a tuple of more than one component, unexpected behavior will most likely occur.

  • query_dataset (torch.utils.data.Dataset) – The query dataset to be used in this strategy. It is used in the selection process as described above. Notably, the query dataset should be labeled; hence, indexing the query dataset should return a data/label pair. This is done in this fashion to allow for gradient embeddings.

  • net (torch.nn.Module) – The neural network model to use for embeddings and predictions. Notably, all embeddings typically come from extracted features from this network or from gradient embeddings based on the loss, which can be based on hypothesized gradients or on true gradients (depending on the availability of the label).

  • nclasses (int) – The number of classes being predicted by the neural network.

  • args (dict) –

    A dictionary containing many configurable settings for this strategy. Each key-value pair is described below:

    • batch_size: The batch size used internally for torch.utils.data.DataLoader objects. (int, optional)

    • device: The device to be used for computation. PyTorch constructs are transferred to this device. Usually is one of ‘cuda’ or ‘cpu’. (string, optional)

    • loss: The loss function to be used in computations. (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • smi_function: The submodular mutual information function to use in optimization. Must be one of ‘fl1mi’, ‘fl2mi’, ‘gcmi’, ‘logdetmi’, ‘com’. (string)

    • optimizer: The optimizer to use for submodular maximization. Can be one of ‘NaiveGreedy’, ‘StochasticGreedy’, ‘LazyGreedy’ and ‘LazierThanLazyGreedy’. (string, optional)

    • metric: The similarity metric to use for similarity kernel computation. This can be either ‘cosine’ or ‘euclidean’. (string)

    • eta: A magnification constant that is used in all but gcmi. It is used as a value of query-relevance vs diversity trade-off. Increasing eta tends to increase query-relevance while reducing query-coverage and diversity. (float)

    • embedding_type: The type of embedding to compute for similarity kernel computation. This can be either ‘gradients’ or ‘features’. (string)

    • gradType: When ‘embedding_type’ is ‘gradients’, this defines the type of gradient to use. ‘bias’ creates gradients from the loss function with respect to the biases outputted by the model. ‘linear’ creates gradients from the loss function with respect to the last linear layer features. ‘bias_linear’ creates gradients from the loss function using both. (string)

    • layer_name: When ‘embedding_type’ is ‘features’, this defines the layer within the neural network that is used to extract feature embeddings. Namely, this argument must be the name of a module used in the forward() computation of the model. (string)

    • stopIfZeroGain: Controls if the optimizer should cease maximization if there is zero gain in the submodular objective. (bool)

    • stopIfNegativeGain: Controls if the optimizer should cease maximization if there is negative gain in the submodular objective. (bool)

    • verbose: Gives a more verbose output when calling select() when True. (bool)

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

Submodular Sampling

class distil.active_learning_strategies.submod_sampling.SubmodularSampling(labeled_dataset, unlabeled_dataset, net, nclasses, args={})[source]

Bases: Strategy

This strategy uses one of the submodular functions viz. ‘facility_location’, ‘feature_based’, ‘graph_cut’, ‘log_determinant’, ‘disparity_min’, or ‘disparity_sum’ 12, 13 to select new points via submodular maximization. These techniques can be applied directly to the features/embeddings or on the gradients of the loss functions.
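
As a rough illustration of two of the listed objectives, the sketch below evaluates dispersion-style scores on a made-up distance matrix: the sum of pairwise distances and the minimum pairwise distance within a candidate subset. These follow the usual dispersion definitions and are assumptions about the general idea; the strategy itself evaluates its objectives through its own backend.

```python
from itertools import combinations

def disparity_sum(dist, subset):
    """Sum of pairwise distances between the points in `subset`."""
    return sum(dist[i][j] for i, j in combinations(subset, 2))

def disparity_min(dist, subset):
    """Smallest pairwise distance in `subset` (0.0 if fewer than two points)."""
    pairs = list(combinations(subset, 2))
    return min(dist[i][j] for i, j in pairs) if pairs else 0.0

# Made-up symmetric distance matrix for three points.
dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
total_spread = disparity_sum(dist, [0, 1, 2])
closest_pair = disparity_min(dist, [0, 1, 2])
```

Both scores reward subsets whose points are far apart, which is what makes them useful for encouraging diversity in the selected batch.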

Parameters
  • labeled_dataset (torch.utils.data.Dataset) – The labeled training dataset

  • unlabeled_dataset (torch.utils.data.Dataset) – The unlabeled pool dataset

  • net (torch.nn.Module) – The deep model to use

  • nclasses (int) – Number of unique values for the target

  • args (dict) –

    Specify additional parameters

    • batch_size: Batch size to be used inside strategy class (int, optional)

    • device: The device that this strategy class should use for computation (string, optional)

    • loss: The loss that should be used for relevant computations (typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional)

    • submod_args: Additional parameters for submodular selection (dict, optional)

      • submod: The choice of submodular function to use. Must be one of ‘facility_location’, ‘feature_based’, ‘graph_cut’, ‘log_determinant’, ‘disparity_min’, ‘disparity_sum’ (string)

      • metric: The similarity metric to use in relevant functions. Must be one of ‘cosine’ or ‘euclidean’ (string)

      • representation: The representation of each data point to be used in submodular selection. Must be one of ‘linear’, ‘grad_bias’, ‘grad_linear’, ‘grad_bias_linear’ (string)

      • feature_weights: If using ‘feature_based’, then this specifies the weights for each feature (list)

      • concave_function: If using ‘feature_based’, then this specifies the concave function to apply in the feature-based objective (typing.Callable)

      • lambda_val: If using ‘graph_cut’ or ‘log_determinant’, then this specifies the lambda constant to be used in both functions (float)

      • optimizer: The choice of submodular optimization technique to use. Must be one of ‘NaiveGreedy’, ‘StochasticGreedy’, ‘LazyGreedy’, or ‘LazierThanLazyGreedy’ (string)

      • stopIfZeroGain: Whether to stop if adding a point results in zero gain in the submodular objective function (bool)

      • stopIfNegativeGain: Whether to stop if adding a point results in negative gain in the submodular objective function (bool)

      • verbose: Whether to print more verbose output (bool)
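
The constraints listed for submod_args can be checked with a small helper before constructing the strategy. This is a sketch: check_submod_args is hypothetical, and treating feature_weights, concave_function, and lambda_val as required for their respective functions is an assumption made here, since the list above does not say whether defaults exist.

```python
ALLOWED = {
    "submod": {"facility_location", "feature_based", "graph_cut",
               "log_determinant", "disparity_min", "disparity_sum"},
    "metric": {"cosine", "euclidean"},
    "representation": {"linear", "grad_bias", "grad_linear", "grad_bias_linear"},
    "optimizer": {"NaiveGreedy", "StochasticGreedy", "LazyGreedy",
                  "LazierThanLazyGreedy"},
}

def check_submod_args(submod_args):
    """Return a list of problems; an empty list means the dict looks consistent."""
    problems = []
    for key, allowed in ALLOWED.items():
        if submod_args.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    submod = submod_args.get("submod")
    if submod == "feature_based":
        # Assumption: both keys must be supplied for 'feature_based'.
        for key in ("feature_weights", "concave_function"):
            if key not in submod_args:
                problems.append(f"{key} is missing for 'feature_based'")
    if submod in {"graph_cut", "log_determinant"} and "lambda_val" not in submod_args:
        # Assumption: lambda_val must be supplied for these two functions.
        problems.append(f"lambda_val is missing for '{submod}'")
    return problems

example_args = {
    "submod": "facility_location",
    "metric": "cosine",
    "representation": "linear",
    "optimizer": "LazyGreedy",
    "stopIfZeroGain": False,
    "stopIfNegativeGain": False,
    "verbose": False,
}
problems = check_submod_args(example_args)
```

The checked dictionary would then be nested under the submod_args key of the args dictionary passed to SubmodularSampling.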

select(budget)[source]

Selects next set of points

Parameters

budget (int) – Number of data points to select for labeling

Returns

idxs – List of selected data point indices with respect to unlabeled_dataset

Return type

list

REFERENCES

1

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

2

Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach. 2018. arXiv:1802.09841.

3

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016.

4

Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. CoRR, 2019. URL: http://arxiv.org/abs/1906.03671, arXiv:1906.03671.

5

Andreas Kirsch, Joost van Amersfoort, and Yarin Gal. BatchBALD: efficient and diverse batch acquisition for deep Bayesian active learning. 2019. arXiv:1906.08158.

6

Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. 2011. arXiv:1112.5745.

7

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: a core-set approach. 2018. arXiv:1708.00489.

8

Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, 1954–1963. Lille, France, 07–09 Jul 2015. PMLR. URL: http://proceedings.mlr.press/v37/wei15.html.

9

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. GLISTER: generalization based data subset selection for efficient and robust learning. 2020. arXiv:2012.10630.

10

Krishnateja Killamsetty, Durga Sivasubramanian, Baharan Mirzasoleiman, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. GRAD-MATCH: a gradient matching based data subset selection for efficient learning. arXiv preprint arXiv:2103.00123, 2021.

11

Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, and Rishabh Iyer. SIMILAR: submodular information measures based active learning in realistic scenarios. arXiv preprint arXiv:2107.00717, 2021.

12

Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani. Submodular combinatorial information measures with applications in machine learning. 2021. arXiv:2006.15412.

13

Anirban Dasgupta, Ravi Kumar, and Sujith Ravi. Summarization through submodularity and dispersion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1014–1022. Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/P13-1100.