deepview_data

Support for custom image datasets in DeepView

class deepview_data.CustomDatasets[source]

Bases: object

Custom Datasets, each bundled as a DeepView Producer.

class ImageFolderDataset(root_folder, image_size=(64, 64), train_split=0.8, valid_extensions=None, max_samples=-1, write_to_folder=False)

Bases: Producer, _Logged

A dataset that loads images from a directory structure where each subdirectory represents a class.

Example directory structure:

root_folder/
    class1/
        image1.jpg
        image2.jpg
    class2/
        image3.jpg
        image4.jpg

Parameters:
  • root_folder (str) – Path to the root directory containing class subdirectories

  • image_size (Tuple[int, int]) – Tuple of (height, width) to resize images to

  • train_split (float) – Fraction of data to use for training (default: 0.8)

  • valid_extensions (List[str]) – List of valid file extensions to include (default: ['.jpg', '.jpeg', '.png'])

  • max_samples (int) – Maximum number of samples to load (-1 for all, default: -1)

  • write_to_folder (bool) – Whether to write loaded data to a folder for visualization (default: False)
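
For orientation, a minimal construction sketch. The import path follows the class documentation above and the keyword names mirror the parameters; the root_folder path is a placeholder:

# Sketch: build a producer from a class-per-subdirectory image folder.
# The path below is a placeholder for your own data directory.
from deepview_data import CustomDatasets

dataset = CustomDatasets.ImageFolderDataset(
    root_folder="path/to/root_folder",           # contains class1/, class2/, ...
    image_size=(64, 64),                         # (height, width) resize target
    train_split=0.8,                             # 80% train / 20% test
    valid_extensions=[".jpg", ".jpeg", ".png"],
    max_samples=-1,                              # -1 loads all samples
)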

__call__(batch_size)

Produce generic Batches from the loaded data, running through the training and test sets.

Parameters:

batch_size (int) – the length of batches to produce

Returns:

yields Batches of the split_dataset of size batch_size. If self.attach_metadata is True, attaches metadata in the format:

  • Batch.StdKeys.IDENTIFIER: the pathname of each data sample (excluding the base data directory), used as the identifier

  • Batch.StdKeys.LABELS: A dict with:
    • "label": a NumPy array of label features (format specific to each dataset)

    • "dataset": a NumPy array of ints, either 0 (for "train") or 1 (for "test")

Return type:

Iterable[Batch]
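
For illustration, a sketch of consuming the producer. Batch.StdKeys.IDENTIFIER and Batch.StdKeys.LABELS are the keys documented above; the Batch import path and the batch.metadata access pattern are assumptions about DeepView's Batch API, not confirmed by this reference:

# Sketch: iterate Batches of 32 samples and read the attached metadata.
from deepview.base import Batch  # assumed import path

for batch in dataset(batch_size=32):
    ids = batch.metadata[Batch.StdKeys.IDENTIFIER]            # per-sample pathnames
    labels = batch.metadata[Batch.StdKeys.LABELS]["label"]    # label features
    splits = batch.metadata[Batch.StdKeys.LABELS]["dataset"]  # 0 = train, 1 = test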

attach_metadata: bool = True

Whether to attach metadata to batches (e.g., labels) or not.
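
If metadata is not needed, the flag can be cleared before producing batches; a one-line sketch:

dataset.attach_metadata = False  # produced Batches carry no identifier/label metadata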

cleanup()

Explicitly clean up the dataset folder created by this instance.

This method attempts to delete the dataset folder if it exists and was created by this instance (write_to_folder=True). It uses a robust approach to handle potential file system locks.

Returns:

True if cleanup was successful or not needed, False if cleanup failed

Return type:

bool
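
A sketch of explicit cleanup for an instance created with write_to_folder=True; the try/finally pattern is illustrative, not required by the API:

# Sketch: ensure the on-disk folder is removed even if processing fails.
dataset = CustomDatasets.ImageFolderDataset(
    root_folder="path/to/root_folder",
    write_to_folder=True,  # writes loaded data to a folder for visualization
)
try:
    for batch in dataset(batch_size=32):
        ...  # consume batches here
finally:
    if not dataset.cleanup():
        print("Cleanup failed; the dataset folder may still exist.")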

image_size: Tuple[int, int]
max_samples: int = -1

Max data samples to pull from. Set to -1 to pull all samples.

raw_dataset: Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]
root_folder: str
train_split: float
valid_extensions: List[str]
write_to_folder: bool = False

Whether to write data to a folder for visualization. If False, nothing is written.