Introspectors API¶
Data Introspectors¶
Familiarity¶
- class deepview.introspectors.Familiarity(meta_key, _distributions)[source]¶
An algorithm that fits a density model to model responses and produces a PipelineStage that can score responses.
Like other introspectors, use Familiarity.introspect to instantiate.
- Parameters:
meta_key (DictMetaKey[FamiliarityResult]) – do not instantiate Familiarity directly, use Familiarity.introspect
_distributions (Mapping[str, FamiliarityDistribution]) – do not instantiate Familiarity directly, use Familiarity.introspect
- class Strategy[source]¶
Bundled Familiarity computation strategies. See FamiliarityStrategyType.
- class GMM(*, gaussian_count=5, convergence_threshold=0.001, max_iterations=200, covariance_type=GMMCovarianceType.DIAG, _random_state=None)¶
A FamiliarityStrategyType that fits a mixture of multivariate gaussian distributions on the introspected responses using sklearn.mixture.GaussianMixture.
- Parameters:
gaussian_count (int) – [keyword arg, optional] Number of gaussian distributions to be fitted in the mixture model.
convergence_threshold (float) – [keyword arg, optional] Convergence threshold used when fitting the mixture model.
max_iterations (int) – [keyword arg, optional] Maximum number of iterations to use when fitting the mixture model.
covariance_type (GMMCovarianceType) – [keyword arg, optional] Covariance type, usually GMMCovarianceType.DIAG or GMMCovarianceType.FULL. See sklearn’s GaussianMixture docs for extra information.
_random_state (RandomState | None)
- covariance_type: GMMCovarianceType = 'diag'¶
Covariance type, usually GMMCovarianceType.DIAG or GMMCovarianceType.FULL. See sklearn’s GaussianMixture docs for extra information.
- static introspect(producer, *, strategy=None, batch_size=1024)[source]¶
Examines the producer to fit a model for classifying familiarity of another set of responses.
- Parameters:
producer (Producer) – the Producer of model responses to fit the familiarity model to
strategy (FamiliarityStrategyType | None) – [keyword arg, optional] familiarity strategy for producing the model. Default is Familiarity.Strategy.GMM().
batch_size (int) – [keyword arg, optional] batch size to use when reading data from the producer
- Returns:
a Familiarity PipelineStage that, when added into a pipeline, will score responses against the familiarity model fit to the input producer and attach the score as metadata using its meta_key.
- Return type:
- meta_key: DictMetaKey[FamiliarityResult]¶
Metadata key used to access the familiarity result (FamiliarityResult). This is accessible via:
Example
results = batch.metadata[familiarity_processor.meta_key]['response_a'] # type of results: t.Sequence[FamiliarityResult]
- class deepview.introspectors.FamiliarityStrategyType(*args, **kwargs)[source]¶
Protocol for a class/function that takes a Producer and produces a per-layer mapping of FamiliarityDistribution.
- metadata_key: ClassVar[DictMetaKey[FamiliarityResult]]¶
Key that will be used to view the metadata for a particular strategy.
- class deepview.introspectors.FamiliarityResult(*args, **kwargs)[source]¶
Protocol for the result of applying a FamiliarityDistribution to a response.
- class deepview.introspectors.GMMCovarianceType(value)[source]¶
Covariance type to be learnt from data. Typically, use FULL for low dimensional data and DIAG for high dimensional data.
The main problem with FULL in high dimensions is that the algorithm learns dim x dim parameters for each gaussian, and so overfitting or degenerate solutions may be a problem.
The boundary between low and high dimensional data is fuzzy, and the choice of covariance type also depends on the application, data distribution or amount of data available.
A general rule is:
- If there are concerns about overfitting due to a lack of data, or the dimensions are high relative to the data available, use DIAG. This is typically the case when working with DNN embeddings.
- Otherwise, use FULL, for example when fitting 2D data.
For more information about covariance types, refer to the sklearn GMM covariances page.
- DIAG = 'diag'¶
Diagonal covariance type, only the diagonal parameters will be learnt from data.
- FULL = 'full'¶
Full covariance type, all dim x dim parameters will be learnt from data.
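To make the trade-off between the two covariance types concrete, the sketch below counts the covariance entries learnt per gaussian, following the dim x dim figure quoted above. The helper name covariance_parameter_count is hypothetical and is not part of DeepView.

```python
def covariance_parameter_count(dim: int, covariance_type: str) -> int:
    """Covariance entries learnt per gaussian (hypothetical helper, not a DeepView API)."""
    if covariance_type == "diag":
        return dim          # only the diagonal entries
    if covariance_type == "full":
        return dim * dim    # every entry of the dim x dim matrix
    raise ValueError(f"unknown covariance type: {covariance_type!r}")


# A 2048-dimensional embedding versus 2D data.
diag_high = covariance_parameter_count(2048, "diag")
full_high = covariance_parameter_count(2048, "full")
full_low = covariance_parameter_count(2, "full")
```

For an embedding-sized input the FULL count grows quadratically, which is why DIAG is the usual choice there, while for 2D data the difference is negligible.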
- class deepview.introspectors.FamiliarityDistribution(*args, **kwargs)[source]¶
The per-response result of FamiliarityStrategyType. An instance of this represents the distribution for a single layer and can evaluate the contents of a response.
- compute_familiarity_score(x)[source]¶
Compute and return the Familiarity score for each data point in x.
- Parameters:
x (ndarray) – input data samples to score according to the built distribution
- Returns:
Familiarity score for each data sample
- Return type:
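As a rough illustration of what scoring samples against a fitted density can look like, the sketch below fits a single diagonal gaussian with NumPy and uses the log-likelihood as the score. This is an assumption made for illustration: it stands in for the gaussian mixture described above, and the function names are hypothetical rather than DeepView APIs.

```python
import numpy as np


def fit_diagonal_gaussian(responses):
    """Estimate a per-dimension mean and variance (hypothetical helper)."""
    mean = responses.mean(axis=0)
    var = responses.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def log_likelihood(x, mean, var):
    """Log-density of each row of x under the diagonal gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
mean, var = fit_diagonal_gaussian(train)

typical = np.zeros((1, 8))        # close to the training data
unusual = np.full((1, 8), 10.0)   # far from the training data
scores = log_likelihood(np.vstack([typical, unusual]), mean, var)
```

Samples that resemble the data used for fitting receive a higher score than samples far from it, which is the property a familiarity score relies on.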
Dimensionality Reduction¶
- class deepview.introspectors.DimensionReduction(_reducers)[source]¶
Introspector to reduce dimensionality of Batch fields (usually model responses).
Like other introspectors, use DimensionReduction.introspect to instantiate.
- Parameters:
_reducers (Mapping[str, DimensionReductionStrategyType])
- class Strategy[source]¶
Bundled dimension reduction strategies. See DimensionReductionStrategyType.
The available options are:
- PCA – an Incremental PCA algorithm from sklearn that can process data incrementally without accumulating the dataset
- StandardPCA – PCA algorithm from sklearn that requires accumulating the full dataset in memory
- TSNE – t-SNE algorithm from sklearn that requires accumulating the full dataset in memory
- UMAP – the UMAP algorithm from umap-learn that requires accumulating the full dataset in memory
- PaCMAP – the PaCMAP algorithm that requires accumulating the full dataset in memory
- class PCA(target_dimensions=2)¶
Principal Component Analysis based dimension reduction using SKLearn IncrementalPCA.
Note
This does not require reading all of the responses into memory to compute the model. A larger batch size will improve the quality of the fit at the cost of additional memory. The incremental approach produces an approximation of PCA, but is documented to be very close and testing backs this up. DimensionReduction.Strategy.StandardPCA can be used if exact computation of PCA is necessary.
- Parameters:
target_dimensions (int) – [optional] Target dimensionality of the data.
- class PaCMAP(target_dimensions=2, *, _parameters=None, **kwargs)¶
PaCMAP (Pairwise Controlled Manifold Approximation) is a dimensionality reduction method built with PaCMAP. PaCMAP can be used for visualization, preserving both local and global structure of the data in original space.
This dimension reduction strategy requires reading all of the data into memory before producing the projection. Typically the input data should be reduced from high dimension to low, e.g. 1024 -> 40, before applying PaCMAP.
- Parameters:
- class StandardPCA(target_dimensions=2)¶
Principal Component Analysis based dimension reduction using SKLearn PCA.
This dimension reduction strategy requires reading all of the data into memory before producing the projection. DimensionReduction.Strategy.PCA is preferred for its lower memory use.
- Parameters:
target_dimensions (int) – [optional] Target dimensionality of the data.
- class TSNE(target_dimensions=2, *, _parameters=None, **kwargs)¶
t-distributed Stochastic Neighbor Embedding (t-SNE) using SKLearn t-SNE.
This dimension reduction strategy requires reading all of the data into memory before producing the projection. Typically the input data should be reduced from high dimension to low, e.g. 1024 -> 40, before applying t-SNE.
- Parameters:
- class UMAP(target_dimensions=2, *, _parameters=None, **kwargs)¶
UMAP based dimension reduction using umap-learn (https://umap-learn.readthedocs.io).
This dimension reduction strategy requires reading all of the data into memory before producing the projection. Typically the input data should be reduced from high dimension to low, e.g. 1024 -> 40, before applying UMAP.
- Parameters:
- Raises:
DeepViewException – if a layer’s response shape does not have exactly 2 dimensions.
- target_dimensions: int = 2¶
The dimension of the space to embed into. This defaults to 2 to provide straightforward visualization, but can reasonably be set to any integer value in the range 2 to 100. (from https://umap-learn.readthedocs.io)
- static introspect(producer, *, strategies, batch_size=None)[source]¶
Perform dimension reduction using training data generated by producer, and return a DimensionReduction PipelineStage that can perform dimensionality reduction in a pipeline.
The producer must produce 1d vectors, e.g. the Batch will be of dimension BxN. See Flattener or Pooler if multi-dimensional data is used.
Note: some strategies will need to read all of the response data into memory to fit their model. Currently only the PCA algorithm runs in a streaming fashion.
- Parameters:
producer (Producer) – the source of data to train the strategies on
strategies (DimensionReductionStrategyType | Mapping[str, DimensionReductionStrategyType]) – [keyword arg] which dimension reduction strategy to use, or a mapping from field name to strategy (for running a different dimension reduction per layer).
batch_size (int | None) – [keyword arg, optional] size of batch to read out – this must be >= the target dimension. For some strategies like PCA, a larger batch size will improve the quality of the dimension reduction. The default value will select the batch_size automatically.
- Raises:
DeepViewException – if a layer’s response shape does not have exactly 2 dimensions.
DeepViewException – if the batch_size is smaller than the target dimensions.
- Return type:
- OneOrManyDimStrategies¶
alias of Union[DimensionReductionStrategyType, Mapping[str, DimensionReductionStrategyType]]
- class deepview.introspectors.DimensionReductionStrategyType(*args, **kwargs)[source]¶
Strategy for performing dimension reduction on a single layer. This is initialized with the target dimensions.
The fit_incremental() method is called repeatedly for each batch that is processed. When all data has been visited, the fit_complete() method is called. Algorithms that require the full data set in memory may collect values with the first call and then combine and process in fit_complete(). transform() is used to transform high dimensional data into the target dimensions.
- check_batch_size(batch_size)[source]¶
Validate the batch_size and throw an error if there is an issue.
- Parameters:
batch_size (int) – batch size to validate
- Return type:
None
- fit_incremental(data)[source]¶
Fit the reducer to the incremental data
- Parameters:
data (ndarray) – data to fit the reducer to
- Return type:
None
- property is_one_shot: bool¶
Returns True if this can transform input data via transform(), or if the entire input data set is transformed at once via transform_one_shot().
- transform(data)[source]¶
Transform the given high dimensional data into the target dimensions. See is_one_shot().
- transform_one_shot()[source]¶
Returns the input data transformed per the reducer. See is_one_shot().
- Return type:
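The fit and transform lifecycle described for DimensionReductionStrategyType can be sketched with a small stand-alone class. The class below is hypothetical: it collects batches in fit_incremental(), computes an exact PCA through a NumPy SVD in fit_complete(), and projects in transform(). It does not implement check_batch_size, is_one_shot or transform_one_shot, and it is not DeepView's own PCA strategy.

```python
import numpy as np


class AccumulatingPCA:
    """Hypothetical reducer that mirrors the fit/transform lifecycle described above."""

    def __init__(self, target_dimensions=2):
        self.target_dimensions = target_dimensions
        self._batches = []
        self._mean = None
        self._components = None

    def fit_incremental(self, data):
        # This algorithm needs the full data set, so each batch is only collected.
        self._batches.append(np.asarray(data, dtype=float))

    def fit_complete(self):
        full = np.concatenate(self._batches, axis=0)
        self._mean = full.mean(axis=0)
        # Rows of vt are the principal directions, ordered by singular value.
        _, _, vt = np.linalg.svd(full - self._mean, full_matrices=False)
        self._components = vt[: self.target_dimensions]
        self._batches = []  # release the accumulated data

    def transform(self, data):
        return (np.asarray(data, dtype=float) - self._mean) @ self._components.T


rng = np.random.default_rng(0)
reducer = AccumulatingPCA(target_dimensions=2)
for _ in range(4):                       # four batches of 16 samples, 10 dimensions each
    reducer.fit_incremental(rng.normal(size=(16, 10)))
reducer.fit_complete()
projected = reducer.transform(rng.normal(size=(5, 10)))
```

A streaming algorithm would instead update its model inside fit_incremental() and leave fit_complete() nearly empty, which is the distinction drawn between PCA and StandardPCA above.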
Duplicates¶
- class deepview.introspectors.Duplicates(results, count)[source]¶
Introspector for finding duplicate data in a Producer. This uses an approximate nearest neighbor algorithm to build clusters of nearby samples, Duplicates.DuplicateSetCandidate. Specifically, it uses the ANNOY (Approximate Nearest Neighbors Oh Yeah) algorithm.
Like other introspectors, use Duplicates.introspect to instantiate.
- Parameters:
results (Mapping[str, Sequence[DuplicateSetCandidate]]) – do not instantiate Duplicates directly, use Duplicates.introspect
count (int) – do not instantiate Duplicates directly, use Duplicates.introspect
- class DuplicateSetCandidate(std, mean, projection, indices, batch)[source]¶
- Parameters:
- class KNNStrategy[source]¶
Bundled K Nearest Neighbours computation strategies. See DuplicatesStrategyType.
- class KNNAnnoy¶
Strategy for computing duplicates using the Annoy library.
- class KNNFaiss¶
Strategy for computing duplicates using the FAISS library.
- class ThresholdStrategy[source]¶
- class Percentile(percentile)¶
Strategy that determines the closeness threshold by taking the nth percentile of the sorted distances. For example, a value of 98.5 would use a threshold such that 98.5% of the points were not considered close.
- Parameters:
percentile (float) – n_th percentile to use for “closeness” in the sorted distances
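One way to read the 98.5 example is sketched below with NumPy. The helper percentile_threshold is hypothetical, and the choice to place the threshold so that the given percentage of distances lies above it is an assumption drawn from the sentence above, not a statement about DeepView's implementation.

```python
import numpy as np


def percentile_threshold(distances, percentile):
    """Hypothetical helper: distance below which points count as close."""
    ordered = np.sort(np.asarray(distances, dtype=float))
    # Assumption: `percentile` percent of the distances lie above the threshold.
    return float(np.percentile(ordered, 100.0 - percentile))


distances = np.arange(1000, dtype=float)   # stand-in for nearest neighbor distances
threshold = percentile_threshold(distances, 98.5)
not_close_fraction = float(np.mean(distances > threshold))
```

With this reading, raising the percentile makes the strategy stricter, since fewer points fall under the threshold.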
- class Slope(sensitivity=5)¶
Given an array of distances, find the “close” threshold – the distance where points are close to each other.
This strategy determines the closeness threshold dynamically using a sensitivity value. A lower sensitivity (down to 2) will consider more items to be close (less sensitive to the curve of distances). A value of 5 will use a sliding window 1/5 the size of the distance array (related to the size of the dataset) and is a good default. A sensitivity of 20 will use a window 1/20 the size of the distance array and is a reasonable large value.
The distances are likely a sharp up-slope followed by an elbow and finally a long, possibly rising, tail. The target delta will be computed from the difference between the 25th and 75th percentile values. A sliding window will be run over the data with a size of len(distances) // sensitivity to find when the delta in the window exceeds the middle delta. This will approximate the tail end of the elbow.
This returns the threshold value and the index into the distances array where it was found.
- Parameters:
sensitivity (int) – [optional] a lower value considers more items to be close, a larger value considers fewer items to be close.
- Raises:
ValueError – if sensitivity <= 2
- static introspect(producer, *, batch_size=32, strategy=None, threshold=None)[source]¶
Uses an approximate nearest neighbor to build a distance matrix for all samples and build clusters from the closest samples.
Although this works on data of any dimension, the performance is linear in the number of samples in the producer AND the number of dimensions. Consider using DimensionReduction to reduce the number of dimensions before detecting duplicates – if the dimensions are already being reduced for Familiarity, the same can be used here, otherwise a reduction to 40 still gives good results.
The data from the producer is L2 normalized per-column – this will help keep one column from dominating the distance metric. See also this explanation about how and why this is done.
producer = Producer...
duplicates = Duplicates.introspect(producer)
for response_name, clusters in duplicates.results.items():
    # sort by the mean distance to the centroid
    clusters = sorted(clusters, key=lambda x: x.mean)
    ...
- Parameters:
producer (Producer) – producer of data
batch_size (int) – [optional] size of batch to read while collecting data from the producer
strategy (DuplicatesStrategyType | None) – [optional] strategy to use for finding the nearest neighbors. Default is KNNAnnoy
threshold (DuplicatesThresholdStrategyType | None) – [optional] strategy to use for finding the distance between points that are considered duplicates. Default is Slope threshold.
- Returns:
Duplicates, which contains candidate duplicates for each response name
- Return type:
- results: Mapping[str, Sequence[DuplicateSetCandidate]]¶
Mapping from response name to a list of candidate duplicates.
- class deepview.introspectors.DuplicatesStrategyType(*args, **kwargs)[source]¶
Protocol for code that takes an array of vectors (embeddings) and computes a list of duplicates for each point.
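The per-column L2 normalization mentioned for Duplicates.introspect, followed by a nearest neighbor search, can be sketched in NumPy as follows. An exact brute-force search stands in here for the approximate ANNOY or FAISS search, and both helper functions are hypothetical names rather than DeepView APIs.

```python
import numpy as np


def l2_normalize_columns(data):
    """Scale every column to unit L2 norm so no single column dominates distances."""
    norms = np.linalg.norm(data, axis=0, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero columns unchanged
    return data / norms


def nearest_neighbor_distances(data):
    """Exact nearest neighbor per row (brute force, fine for small arrays only)."""
    diffs = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.min(axis=1), dist.argmin(axis=1)


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 4))
embeddings[:, 0] *= 1000.0       # one column on a much larger scale
embeddings[7] = embeddings[3]    # plant an exact duplicate
normalized = l2_normalize_columns(embeddings)
nearest_distance, nearest_index = nearest_neighbor_distances(normalized)
```

The planted duplicate pair ends up with a nearest neighbor distance of zero, and a threshold strategy would then decide how small a distance has to be for two samples to count as duplicates.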
Dataset Report¶
- class deepview.introspectors.DatasetReport(data, _report_save_data_path=PosixPath('report_save_data.pkl'))[source]¶
A report built to inspect a dataset for a given model from the perspective of fairness.
Like other introspectors, use DatasetReport.introspect to instantiate, or load a saved report using DatasetReport.from_disk.
This report is particularly useful for introspecting datasets that have various class labels attached. See overall DatasetReport page in docs to learn more.
The following components can be run (default to all), configured using a ReportConfig.
- Summarize overall dataset, including by metadata labels, if they exist
- Find near duplicate data samples, see Duplicates
- Find most / least representative data overall and per metadata label, see Familiarity
- Project the data down to visualize overall in a 2D scatterplot
The input Producer to this class’s instantiation is expected to have fields of model responses (likely a layer towards the end of the model but not the last response). These responses can come either from loading data and running it through a DeepView Model, or by loading the responses directly from file into a Producer. In each Batch's metadata, this report looks for identifiers and optional labels attached as metadata using Batch.StdKeys.IDENTIFIER and Batch.StdKeys.LABELS metadata keys.
Note
For the moment, the Batch.StdKeys.IDENTIFIER should be a path to the image data.
This class creates a pandas.DataFrame full of the data needed to build the UI for the DatasetReport, which can then be exported into a standalone static site to explore. The different components built in the UI interact with each other.
# Build all components of the dataset report using default configuration.
# This output can then be used to visualize the results with Canvas:
# (1) as a standalone web dashboard to explore interactively
# (2) inline in a Jupyter notebook to explore interactively
# Please see the Canvas documentation for an example:
# https://satishlokkoju.github.io/deepview/
report = DatasetReport.introspect(producer)
- Parameters:
data (DataFrame) – do not instantiate DatasetReport directly, use DatasetReport.introspect
_report_save_data_path (Path)
- data: DataFrame¶
pandas.DataFrame of introspection results for responses and report components
- static from_disk(directory)[source]¶
Create DatasetReport object from a report save directory
- static introspect(producer, *, config=None, batch_size=1024)[source]¶
Build relevant DatasetReport components from input Producer.
- Parameters:
producer (Producer) – response producer (separate caching not needed as responses are cached in this function)
config (ReportConfig | None) – [keyword arg, optional] ReportConfig. Set components to None to omit them from report.
batch_size (int) – [keyword arg, optional] number of samples to batch at once
- Returns:
a DatasetReport whose results can be exported into different formats
- Return type:
- class deepview.introspectors.ReportConfig(projection=<factory>, duplicates=<factory>, familiarity=<factory>, dim_reduction=None, split_familiarity_min=50)[source]¶
Configuration for which components to build into the DatasetReport, and what strategies to use to build those components. Default config corresponds to running all components with default strategies (projection, duplicates, and familiarity).
When running familiarity, “split” familiarity is also run, which means that a familiarity model is built for each label, for each label category, and then that subgroup of data is evaluated according to the model.
- Parameters:
projection (DimensionReductionStrategyType | Mapping[str, DimensionReductionStrategyType] | None) – [optional] see projection
duplicates (DuplicatesConfig | None) – [optional] see duplicates
familiarity (FamiliarityStrategyType | None) – [optional] see familiarity
dim_reduction (DimensionReductionStrategyType | Mapping[str, DimensionReductionStrategyType] | None) – [optional] see dim_reduction
split_familiarity_min (int) – [optional] see split_familiarity_min
- dim_reduction: DimensionReductionStrategyType | Mapping[str, DimensionReductionStrategyType] | None = None¶
If None, default to DimensionReduction.Strategy.PCA before running familiarity, duplicates, and/or projection. Else provide DimensionReduction.Strategy.
- duplicates: DuplicatesConfig | None¶
Skip Duplicates if None, else provide a DuplicatesConfig that combines both the threshold strategy and the algorithm strategy for finding duplicates. Default is Slope threshold with KNNAnnoy algorithm.
- familiarity: FamiliarityStrategyType | None¶
Skip Familiarity if None, else provide Familiarity.Strategy to apply to overall and split familiarity.
- property n_stages: int¶
How many stages of multi introspect need to be run (not counting stub introspectors)
- projection: DimensionReductionStrategyType | Mapping[str, DimensionReductionStrategyType] | None¶
Skip projection if None, else provide a DimensionReduction.Strategy that projects down to 2 dimensions, for visualization (default is DimensionReduction.Strategy.UMAP).
- split_familiarity_min: int = 50¶
If running Familiarity, the minimum number of data samples that must exist per label for fitting individual models to subgroups of data determined by label (“split” familiarity).
Model Introspectors¶
Principal Filter Analysis¶
- class deepview.introspectors.PFA(failed_responses, _covariance_result_by_response)[source]¶
Like other introspectors, use PFA.introspect to instantiate.
Use PFA to discover highly correlated filter, or more generically unit, responses within layers of a neural network. Exploit data to guide network compression in order to decrease inference time and memory footprint while improving generalization. See the DeepView docs for more information.
- Parameters:
failed_responses (Sequence[str]) – do not instantiate PFA directly, use PFA.introspect
_covariance_result_by_response (Mapping[str, PFACovariancesResult])
- class Strategy[source]¶
Bundled PFA strategies. To implement a custom strategy, see PFAStrategyType.
- class Energy(energy_threshold, min_kept_count=0)¶
Energy strategy for generating PFA recipes – this targets a given energy_threshold to keep.
- Parameters:
- class KL(interpolation_function=None)¶
KL strategy for generating PFA recipes.
- Parameters:
interpolation_function (KLInterpolationFunction | None) – [optional] the interpolation function to use, see KLInterpolationFunction.
- class KLInterpolationFunction(*args, **kwargs)¶
A protocol to map a KL divergence to the ratio of the number of units in the layer. The KL divergence is that between the distribution of eigenvalues of the covariance matrix of model responses and the uniform distribution.
- class LinearInterpolation(*args, **kwargs)¶
A concrete KLInterpolationFunction that performs its intended mapping by linearly interpolating [kl_divergence, max_kl_divergence] to [0, 1]
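The KL divergence between an eigenvalue distribution and the uniform distribution, as described for KLInterpolationFunction, can be sketched in NumPy. The function name is hypothetical, and the direction of the divergence (normalized eigenvalues relative to uniform) is an assumption; the interpolation to a units ratio is deliberately not reproduced here.

```python
import numpy as np


def eigenvalue_kl_to_uniform(eigenvalues):
    """KL(p || uniform), where p is the eigenvalue spectrum normalized to sum to 1."""
    values = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
    p = values / values.sum()
    n = p.size
    nonzero = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[nonzero] * np.log(p[nonzero] * n)))


# Equal eigenvalues: responses are uncorrelated, the spectrum is uniform.
uncorrelated = eigenvalue_kl_to_uniform([1.0, 1.0, 1.0, 1.0])
# A single non-zero eigenvalue: all units carry the same information.
fully_correlated = eigenvalue_kl_to_uniform([4.0, 0.0, 0.0, 0.0])
max_kl = float(np.log(4))  # largest possible value for 4 units
```

Under this assumption the divergence ranges from 0 for a flat spectrum to log(N) for a fully concentrated one, which gives the bounded interval that an interpolation function can map onto a ratio of units.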
- class Size(relative_size, min_kept_count=0, epsilon_energy=1e-08)¶
Size strategy for generating PFA recipes – this targets a given relative_size to produce a cross-layer energy threshold that will produce that result.
- Parameters:
- class UnitSelectionStrategy[source]¶
Strategy for selecting the maximally correlated units. To implement a custom strategy, see PFAUnitSelectionStrategyType.
- class AbsMax¶
Given a correlation matrix, choose units based on the one with the greatest coefficient
- class AbsMin¶
Given a correlation matrix, choose units based on the one with the lowest coefficient
- class L1Max¶
Given a correlation matrix, choose units based on the one with the greatest L1 norm
- class L1Min¶
Given a correlation matrix, choose units based on the one with the lowest L1 norm
- class VisType[source]¶
Type of visualization modality for PFA, available to visualize via PFA.show()
- failed_responses: Sequence[str]¶
The names of any responses that failed to generate output. This is caused by layers with insufficient data to support the analysis.
- get_recipe(*, strategy=None, unit_strategy=None)[source]¶
Generate a recipe using the given algorithm and unit strategy. For more information refer to the PFA documentation page.
- Parameters:
strategy (PFAStrategyType | None) – [keyword arg, optional] The algorithm to use, PFAStrategyType. The default value is PFA.Strategy.KL
unit_strategy (PFAUnitSelectionStrategyType | None) – [keyword arg, optional] the PFAUnitSelectionStrategyType to use, default is PFA.UnitSelectionStrategy.L1Max
- Returns:
a mapping from response name to PFARecipe for the given algorithm and unit strategy.
- Return type:
- static introspect(producer, *, batch_size=32, epsilon_inactive=1e-08)[source]¶
Perform Principal Filter Analysis on the responses (fields) generated by the producer.
Caution
The responses generated by producer are assumed to be 2D (Batch x C). Thus it might be necessary to pipeline together the Producer with a Processor (e.g., Pooler) that transforms each individual response from multi-dimensional to one-dimensional.
- Parameters:
producer (Producer) – The producer of the responses (in fields) to be analyzed
batch_size (int) – [keyword arg, optional] the batch size to use when consuming the responses (via batch.fields)
epsilon_inactive (float) – [keyword arg, optional] factor used to identify inactive units (whose var < epsilon_inactive * np.max(var))
- Returns:
an instance of PFA that can generate PFARecipes using a PFAStrategyType (e.g., PFA.Strategy.KL).
- Return type:
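The inactive unit criterion quoted for epsilon_inactive can be tried out directly in NumPy. The function name find_inactive_units is hypothetical, and computing the variance over the batch axis of a (Batch x C) array is an assumption based on the shape described in the caution above.

```python
import numpy as np


def find_inactive_units(responses, epsilon_inactive=1e-08):
    """Indices of units whose variance is negligible relative to the largest variance."""
    var = responses.var(axis=0)  # one variance per unit (column)
    return np.flatnonzero(var < epsilon_inactive * np.max(var))


rng = np.random.default_rng(0)
responses = rng.normal(size=(64, 6))   # 64 samples, 6 units
responses[:, 2] = 0.5                  # unit 2 always produces the same value
inactive_units = find_inactive_units(responses)
```

Because the threshold is relative to the largest variance in the layer, a unit is flagged only when it is effectively constant compared with its most active neighbor.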
- static show(recipe_result, *, vis_type='table', include_columns=None, exclude_columns=None)[source]¶
Create table or chart to visualize PFA results in iPython / Jupyter notebook.
Note: Requires pandas (vis_type is PFA.VisType.TABLE) or matplotlib (vis_type is PFA.VisType.CHART), which can be installed with pip install "deepview[notebook]"
- Parameters:
recipe_result (Mapping[str, PFARecipe] | Collection[Mapping[str, PFARecipe]]) – result of pfa.get_recipe, mapping of layer to PFARecipe. When plotting for vis_type PFA.VisType.TABLE, a sequence of t.Mapping[str, PFARecipe] can be passed in to compare multiple results.
vis_type (str) – [keyword arg, optional] determines visualization type. PFA.VisType.TABLE for pandas dataframe result or PFA.VisType.CHART for matplotlib pyplot of recommended vs. original unit counts
include_columns (Sequence[str] | None) – [keyword arg, optional] For vis_type as PFA.VisType.TABLE only. If included, only return pandas.DataFrame with these columns. Defaults to include all columns (value None). Options are: [layer name, original count, recommended count, units to keep, KL divergence, PFA strategy, units ratio, kept energy].
exclude_columns (Sequence[str] | None) – [keyword arg, optional] For vis_type as PFA.VisType.TABLE only. If included, return pandas.DataFrame without these columns (irrelevant if include_columns is specified). Defaults to None. Options are: [layer name, original count, recommended count, units to keep, KL divergence, PFA strategy, units ratio, kept energy].
- Returns:
pandas.DataFrame or matplotlib.axes.Axes of PFA results from input recipe_result
- Return type:
- class deepview.introspectors.PFAKLDiagnostics(kl_divergence, units_ratio)[source]¶
Diagnostic information for PFA.Strategy.KL
- Parameters:
kl_divergence (float) – see kl_divergence
units_ratio (float) – see units_ratio
- class deepview.introspectors.PFAEnergyDiagnostics(total_kept_energy)[source]¶
Diagnostic information for PFA.Strategy.Energy
- Parameters:
total_kept_energy (float) – see total_kept_energy
- class deepview.introspectors.PFARecipe(original_output_count, recommended_output_count, maximally_correlated_units, number_inactive_units, diagnostics)[source]¶
Recommendation about a specific model response. This will likely never be instantiated directly, and instead an instance will be returned from pfa.get_recipe.
- Parameters:
original_output_count (int)
recommended_output_count (int)
number_inactive_units (int)
diagnostics (PFAKLDiagnostics | PFAEnergyDiagnostics | None)
- diagnostics: PFAKLDiagnostics | PFAEnergyDiagnostics | None¶
Per algorithm diagnostic information
Maximally correlated units found with this recommendation.
- class deepview.introspectors.PFAUnitSelectionStrategyType(*args, **kwargs)[source]¶
Given a correlation matrix and a number of units to keep, choose which units are maximally correlated.
- __call__(covariances, *, num_units_to_keep)[source]¶
- Parameters:
covariances (PFACovariancesResult) – the covariance data for the layer
num_units_to_keep (int) – [keyword arg, optional] number of recommended units to be kept
- Returns:
numpy.ndarray with the list of indices that correspond to the units that are maximally correlated (the first part of the list contains the indices of the inactive units). The number of inactive units can be found in covariances.inactive_units.shape[0]
- Return type:
- class deepview.introspectors.PFAStrategyType(*args, **kwargs)[source]¶
Protocol for PFA strategies (PFA.Strategy). These examine per-layer PFACovariancesResult and produce per-layer PFARecipe.
Note
This takes all layers and produces a result for each of the layers, but the algorithm operates on each layer independently.
- __call__(covariances)[source]¶
- Parameters:
covariances (Mapping[str, PFACovariancesResult]) – mapping from layer name (field name) to PFACovariancesResult for that layer
- Return type:
- class deepview.introspectors.PFACovariancesResult(covariances, eigenvalues, eigenvectors, original_output_count, inactive_units)[source]¶
Encapsulates the results of the covariance calculation
- Parameters:
covariances (ndarray) – see covariances
eigenvalues (ndarray) – see eigenvalues
eigenvectors (ndarray) – see eigenvectors
original_output_count (int) – see original_output_count
inactive_units (ndarray) – see inactive_units
- covariances: ndarray¶
The covariance matrix. This is a two dimensional square array of size original_output_count.
- eigenvalues: ndarray¶
The eigenvalues of the covariances. This is a one dimensional array of size original_output_count.
- eigenvectors: ndarray¶
The eigenvectors of the covariances. This is a two dimensional square array of size original_output_count.
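The three arrays held by PFACovariancesResult, and the idea of keeping enough components to reach an energy threshold, can be sketched with NumPy. This is an illustration on synthetic responses under stated assumptions: the variable names are chosen for readability, and the cumulative energy rule is a common reading of an energy threshold rather than a confirmed description of PFA.Strategy.Energy.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Four units: units 2 and 3 are near copies of units 0 and 1.
responses = np.hstack([base, base + 0.01 * rng.normal(size=(200, 2))])

# Square covariance matrix, one row and column per unit.
covariances = np.cov(responses, rowvar=False)

# eigh suits symmetric matrices; reorder so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(covariances)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Fraction of total energy captured by the first k eigenvalues.
energy = np.cumsum(eigenvalues) / eigenvalues.sum()
units_for_99_percent = int(np.searchsorted(energy, 0.99) + 1)
```

Since two of the four units duplicate the other two, nearly all of the energy sits in two eigenvalues, which is the kind of redundancy PFA is designed to expose.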
Inactive Unit Analysis¶
- class deepview.introspectors.IUA(_layer_counts, _unit_counts, _total_probe_counts)[source]¶
An introspector that evaluates responses to compute inactive unit statistics.
Like other introspectors, use IUA.introspect to instantiate.
- Parameters:
- class Result(mean_inactive, std_inactive, inactive, unit_inactive_count, unit_inactive_proportion)[source]¶
Per-response IUA result
- Parameters:
mean_inactive (float) – see mean_inactive
std_inactive (float) – see std_inactive
unit_inactive_count (Sequence[float]) – see unit_inactive_count
unit_inactive_proportion (Sequence[float]) – see unit_inactive_proportion
- inactive: Sequence[float]¶
sequence tracking the number of inactive units in the layer per batch input, used to compute mean_inactive and std_inactive
- class VisType[source]¶
Type of visualization modality for IUA, available to visualize via IUA.show()
- static introspect(producer, *, batch_size=32, rtol=1e-05, atol=1e-08)[source]¶
Compute inactive unit statistics (mean, standard deviation, counts, and unit frequency) for each layer (field) in the input producer of model responses.
- Parameters:
producer (Producer) – The producer of the model responses to be introspected
batch_size (int) – [keyword arg, optional] number of inputs to pull from producer at a time
rtol (float) – [keyword arg, optional] float relative tolerance parameter (see doc for numpy.isclose()).
atol (float) – [keyword arg, optional] float absolute tolerance parameter (see doc for numpy.isclose()).
- Returns:
an IUA instance that can provide information about inactive units in the model
- Return type:
- property results: Mapping[str, Result]¶
A per-layer IUA.Result encapsulating Inactive Unit Analysis results.
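The statistics listed for IUA.Result can be reproduced on a small array to see how they relate to one another. The sketch below assumes that a unit counts as inactive when its response is close to zero according to numpy.isclose; the function name and the returned dictionary are hypothetical and only mirror the field names above.

```python
import numpy as np


def inactive_unit_statistics(responses, rtol=1e-05, atol=1e-08):
    """Inactive unit statistics for one layer, given a (samples x units) array."""
    # Assumption: inactive means the response is numerically close to zero.
    mask = np.isclose(responses, 0.0, rtol=rtol, atol=atol)
    inactive = mask.sum(axis=1)             # inactive units per input
    unit_inactive_count = mask.sum(axis=0)  # inputs for which each unit was inactive
    return {
        "mean_inactive": float(inactive.mean()),
        "std_inactive": float(inactive.std()),
        "inactive": inactive.tolist(),
        "unit_inactive_count": unit_inactive_count.tolist(),
        "unit_inactive_proportion": (unit_inactive_count / responses.shape[0]).tolist(),
    }


responses = np.array([
    [0.0, 1.2, 0.0],
    [0.0, 0.0, 3.1],
    [0.0, 0.7, 2.4],
    [0.0, 0.9, 0.0],
])
stats = inactive_unit_statistics(responses)
```

In this toy layer the first unit never fires, so its inactive proportion is 1.0, which is the sort of unit the analysis is meant to surface.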
- static show(iua, *, vis_type='table', response_names=None)[source]¶
Create table or chart to visualize IUA results in iPython / Jupyter notebook.
Note: Requires pandas (vis_type is IUA.VisType.TABLE) or matplotlib (vis_type is IUA.VisType.CHART), which can be installed with pip install "deepview[notebook]"
- Parameters:
iua (IUA) – result of IUA.introspect(), instance of IUA
vis_type (str) – [keyword arg, optional] determines visualization type. IUA.VisType.TABLE for pandas dataframe result or IUA.VisType.CHART for matplotlib pyplot of inactive units
response_names (Sequence[str] | None) – [keyword arg, optional] For IUA.VisType.CHART vis. Sequence of responses (field names) to visualize (defaults to None for showing all responses)
- Returns:
pandas.DataFrame or matplotlib.axes.Axes of IUA results
- Return type: