deepview_data

Support for custom image datasets in DeepView

class deepview_data.CustomDatasets

Bases: object

Custom Datasets, each bundled as a DeepView Producer.

class ImageFolderDataset(root_folder, image_size=(64, 64), train_split=0.8, valid_extensions=None, max_samples=-1, write_to_folder=False)

Bases: Producer, _Logged

A dataset that loads images from a directory structure where each subdirectory represents a class.

Example directory structure:

root_folder/
    class1/
        image1.jpg
        image2.jpg
    class2/
        image3.jpg
        image4.jpg
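The class-per-subdirectory layout above can be walked with the standard library alone. The following is a minimal sketch of that idea, not DeepView's actual implementation: the helper name scan_image_folder, the sorted class ordering, and the integer label scheme are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def scan_image_folder(root_folder, valid_extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: collect (paths, labels, class_names) from a
    directory in which each subdirectory holds the images of one class."""
    root = Path(root_folder)
    # Assumption: classes are indexed in sorted name order so that
    # label integers are deterministic across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        for file in sorted((root / name).iterdir()):
            # Skip anything whose extension is not in the allowed list.
            if file.is_file() and file.suffix.lower() in valid_extensions:
                # Keep the path relative to the root folder.
                paths.append(file.relative_to(root).as_posix())
                labels.append(index)
    return paths, labels, class_names


# Build a throwaway tree that mirrors the documented example layout.
with tempfile.TemporaryDirectory() as tmp:
    layout = {
        "class1": ["image1.jpg", "image2.jpg"],
        "class2": ["image3.jpg", "image4.jpg", "notes.txt"],
    }
    for class_name, files in layout.items():
        (Path(tmp) / class_name).mkdir()
        for file_name in files:
            (Path(tmp) / class_name / file_name).touch()
    paths, labels, class_names = scan_image_folder(tmp)
```

In this sketch notes.txt is ignored because its extension is not in valid_extensions, and each remaining file receives the index of its parent directory as its label.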

Parameters:
  • root_folder – Path to the root directory containing class subdirectories

  • image_size – Tuple of (height, width) that images are resized to (default: (64, 64))

  • train_split – Fraction of data to use for training (default: 0.8)

  • valid_extensions – List of valid file extensions to include (default: ['.jpg', '.jpeg', '.png'])

  • max_samples – Maximum number of samples to load (-1 for all, default: -1)

  • write_to_folder – Whether to write the data to a folder for visualization (default: False)
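To make the interaction between train_split and max_samples concrete, here is a small sketch of one plausible reading of those parameters. It is an assumption-laden illustration rather than the library's code: the function name split_samples is hypothetical, the cap is applied before the split, and no shuffling is performed, none of which the documentation above confirms.

```python
def split_samples(samples, train_split=0.8, max_samples=-1):
    """Hypothetical helper: cap the sample count, then split into
    (train, test) lists by a fraction."""
    if not 0.0 <= train_split <= 1.0:
        raise ValueError("train_split must lie in [0, 1]")
    # Assumption: -1 means "use every sample"; any other non-negative
    # value keeps only the first max_samples entries.
    if max_samples >= 0:
        samples = samples[:max_samples]
    # Assumption: the training count is rounded down.
    n_train = int(len(samples) * train_split)
    return samples[:n_train], samples[n_train:]


all_samples = list(range(10))
train, test = split_samples(all_samples, train_split=0.8)
capped_train, capped_test = split_samples(all_samples, train_split=0.8, max_samples=5)
```

With ten samples and the default split, eight go to training and two to test; capping at five samples first leaves four for training and one for test.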

__call__(batch_size)

Produce generic Batches from the loaded data, running through the training and test sets.

Parameters:

batch_size (int) – the number of samples in each batch to produce

Returns:

Yields Batches of size batch_size from the split_dataset. If self.attach_metadata is True, each batch carries metadata in the following format:

  • Batch.StdKeys.IDENTIFIER: the pathname of each data sample, excluding the base data directory, used as that sample's identifier

  • Batch.StdKeys.LABELS: A dict with:
    • "label": a NumPy array of label features (format specific to each dataset)

    • "dataset": a NumPy array of ints, either 0 (for "train") or 1 (for "test")

Return type:

Iterable[Batch]
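The batching behavior described for __call__ can be pictured with a plain generator. This sketch does not use DeepView's Batch class: plain dicts stand in for batches, the keys "data", "identifier", and "labels" are hypothetical stand-ins for the real fields and Batch.StdKeys entries, Python lists stand in for NumPy arrays, and allowing a smaller final batch is an assumption.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample record: (identifier, pixels, label, split),
# where split is 0 for "train" and 1 for "test".
Sample = Tuple[str, List[float], int, int]


def iter_batches(
    samples: List[Sample], batch_size: int, attach_metadata: bool = True
) -> Iterator[Dict]:
    """Yield dict-based stand-ins for batches, batch_size samples at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        batch: Dict = {"data": [pixels for _, pixels, _, _ in chunk]}
        if attach_metadata:
            # Identifier: path relative to the base data directory.
            batch["identifier"] = [ident for ident, _, _, _ in chunk]
            # Labels: class label plus the train/test indicator.
            batch["labels"] = {
                "label": [label for _, _, label, _ in chunk],
                "dataset": [split for _, _, _, split in chunk],
            }
        yield batch


samples = [
    ("class1/image1.jpg", [0.0], 0, 0),
    ("class1/image2.jpg", [0.1], 0, 0),
    ("class2/image3.jpg", [0.2], 1, 0),
    ("class2/image4.jpg", [0.3], 1, 1),
    ("class2/image5.jpg", [0.4], 1, 1),
]
batches = list(iter_batches(samples, batch_size=2))
```

Because training samples are listed before test samples, a single batch can straddle both sets, which is why the "dataset" indicator is attached per sample rather than per batch.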

attach_metadata: bool = True

Whether to attach metadata to batches (e.g., labels) or not.

image_size: Tuple[int, int]

max_samples: int = -1

Max data samples to pull from. Set to -1 to pull all samples.

raw_dataset: Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]

Not an __init__ argument (dataclass field with init=False).

root_folder: str

train_split: float

valid_extensions: List[str]

write_to_folder: bool = False

Whether to write the data to a folder for visualization. If False, nothing is written.