Creating a Dataset

Creating a new dataset is relatively easy: the Dataset class defines only three abstract members that you must implement:

  • name: a property that returns the folder name of the dataset e.g. image_mnist.
  • download: a method that downloads the data to disk and possibly performs other tasks such as file extraction, organization, and cleanup.
  • load: a method that loads the data into memory and structures it in the most convenient format for the user.

Path

The self.path field is a pathlib.Path that tells the dataset where the data should be stored. The get method ensures this path exists before calling download or load; use this field when implementing these methods.
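
As a rough illustration of how download and load might use self.path, here is a minimal sketch. It does not import dataget: SketchDataset is a hypothetical stand-in that only imitates the behaviour described above (a path field, and a get method that creates the directory before calling download and load). The text_greeting dataset, the greeting.txt file, and the way the default path is built are assumptions made for this example.

```python
import tempfile
from pathlib import Path


class SketchDataset:
    """Hypothetical stand-in for dataget's Dataset; NOT the real class."""

    def __init__(self, path=None):
        # Assumption: use the given path, else fall back to data/<name>.
        self.path = Path(path) if path is not None else Path("data") / self.name

    def get(self, **kwargs):
        # Make sure the directory exists before download and load run.
        self.path.mkdir(parents=True, exist_ok=True)
        self.download()
        return self.load(**kwargs)


class text_greeting(SketchDataset):
    @property
    def name(self):
        return "text_greeting"

    def download(self):
        # A real dataset would fetch and extract files here.
        (self.path / "greeting.txt").write_text("hello\nworld\n")

    def load(self):
        # Read back what download stored under self.path.
        return (self.path / "greeting.txt").read_text().splitlines()


# Use a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    dataset = text_greeting(path=root)
    lines = dataset.get()
    print(lines)
```

Both methods build file locations from self.path rather than from hard-coded paths, so the same dataset works wherever the user decides to store it.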

get kwargs

The get method accepts **kwargs, which it forwards to load. For example:

def load(self, dtype=np.float32):
    ...  # code

With this implementation the get method can be called like this:

.get(dtype=np.uint8)
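
The forwarding described above can be sketched without the library. SketchDataset below is a hypothetical stand-in, not dataget's real Dataset, and the numbers dataset is invented for the example. The built-in float and int types play the role of np.float32 and np.uint8 so that the sketch needs no third-party packages.

```python
class SketchDataset:
    """Hypothetical stand-in for dataget's Dataset; NOT the real class."""

    def get(self, **kwargs):
        self.download()
        # Everything passed to get() is handed to load() unchanged.
        return self.load(**kwargs)


class numbers(SketchDataset):
    def download(self):
        pass  # nothing to fetch in this sketch

    def load(self, dtype=float):
        # dtype stands in for np.float32 / np.uint8 from the text above.
        return [dtype(x) for x in ("1", "2", "3")]


default_values = numbers().get()       # load() uses its default dtype
int_values = numbers().get(dtype=int)  # dtype travels through get()
print(default_values, int_values)
```

In this sketch a keyword that load does not declare would raise a TypeError, because get passes its arguments along as-is.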

Template

You can use this template to get started.

from dataget.dataset import Dataset

class some_dataset(Dataset):

    # OPTIONAL
    def __init__(self, init_arg, **kwargs):
        # code
        super().__init__(**kwargs) # !!IMPORTANT

    @property
    def name(self):
        return "{dataset_type}_{dataset_name}"

    def download(self):
        ...  # code

    def load(self, some_arg):
        # code
        return a, b, c, ...

Warning

If you define your own __init__, remember to always forward **kwargs to super().__init__: all datasets must support the path and global_cache keyword arguments defined in the Dataset class. If super().__init__ is not called at all, the path field will not be set and errors will occur.
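
The difference can be seen in a small sketch. SketchDataset is again a hypothetical stand-in for the real Dataset class; how it turns path and global_cache into self.path is an assumption, and good_dataset and broken_dataset are invented names.

```python
from pathlib import Path


class SketchDataset:
    """Hypothetical stand-in for dataget's Dataset; NOT the real class."""

    def __init__(self, path=None, global_cache=False):
        # Assumption: the real class derives self.path from these arguments.
        self.path = Path(path) if path is not None else Path("data") / self.name
        self.global_cache = global_cache


class good_dataset(SketchDataset):
    def __init__(self, init_arg, **kwargs):
        self.init_arg = init_arg
        super().__init__(**kwargs)  # forwards path and global_cache

    @property
    def name(self):
        return "example_good"


class broken_dataset(SketchDataset):
    def __init__(self, init_arg):
        self.init_arg = init_arg  # super().__init__ is never called

    @property
    def name(self):
        return "example_broken"


good = good_dataset(init_arg=1, path="my_data")
print(good.path)

try:
    broken_dataset(init_arg=1).path
except AttributeError as error:
    # The path field was never set because the base __init__ did not run.
    print("missing path:", error)
```

The broken variant also rejects path and global_cache outright, since its __init__ has no **kwargs to receive them.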