.. _dimension_reduction:

Dimension Reduction
===================

DeepView provides a :class:`DimensionReduction ` introspector with a variety of strategies (algorithms). `DimensionReduction` has two primary uses:

- reduce high-dimensional data to fewer dimensions for consumption by a different :class:`Introspector `
- reduce data to 2D or 3D for visualization (e.g. :ref:`Dataset Report `).

Model responses often have a very large number of dimensions, but some algorithms work better on lower-dimensional data. For example, :class:`Familiarity ` and even the :class:`DimensionReduction Strategies ` other than `PCA` work better on, e.g., 40-dimensional data. Some of these algorithms also note that such a reduction helps reduce the noise in very high-dimensional data. `PCA` (Principal Component Analysis) is a great strategy to perform this reduction. The notebook :ref:`Dimension Reduction Example Notebook ` below gives an example of reducing high-dimensional data for use with various `DimensionReduction` strategies.

`DimensionReduction` to 2D is also a nice way to visualize the clusters and relationships in the data. `UMAP`, `PaCMAP` and `t-SNE` are all algorithms that are well suited to this task. The notebook :ref:`below ` also shows examples of doing this.

General Usage
-------------

For getting started with DeepView code, please see the :ref:`how-to pages `.

.. code-block:: python

    # A source of embeddings (typically high-dimensional data)
    response_producer = pipeline(...)

    # First, create a dimension reduction `PipelineStage` object (`reducer`, here) that is fit
    # to the input data and will be able to project any data to a lower number of dimensions
    reducer = DimensionReduction.introspect(
        response_producer, strategies=DimensionReduction.Strategy.PCA(40)
    )

    # Next, chain the reducer PipelineStage into a new pipeline that will reduce all output data
    # from `response_producer` into 40 dimensions
    reduced_producer = pipeline(response_producer, reducer)

See the :ref:`example notebook ` below for more detailed usage.

Config Options
--------------

DeepView comes with four :class:`Strategies ` for performing dimension reduction, each with its own advantages and disadvantages:

- :class:`PCA `

  - very fast and good for reducing e.g. 1024 -> 40 dimensions
  - memory efficient
  - not suitable for 2D projection

- :class:`UMAP `

  - excellent 2D projections
  - preserves local but not global structure

- :class:`PaCMAP `

  - excellent 2D projections
  - preserves local and global structure

- :class:`TSNE (t-SNE) `

  - largely replaced by newer strategies

For a more in-depth comparison, please see :ref:`the example notebook ` below.

Relevant API
------------

- :class:`DimensionReduction `: introspector for Dimension Reduction

.. _dimensionreduction_example:

Example
-------

.. toctree::
   :maxdepth: 1

   Jupyter Notebook: Dimension Reduction Strategies <../../notebooks/data_introspection/dimension_reduction.ipynb>

References
----------

- `UMAP documentation `_
- `UMAP paper `_
- `Understanding UMAP `_
- `PaCMAP `_
- `PaCMAP paper `_
- `t-SNE documentation `_
- `Understanding t-SNE `_