Creating a Dataset
Creating a new dataset is relatively easy: the Dataset class defines only three abstract members, which you must implement:
name: a property that returns the folder name of the dataset, e.g. image_mnist.
download: a method that downloads the data to disk and possibly performs other tasks such as file extraction, organization, and cleanup.
load: the method that loads the data into memory and structures it in the most convenient format for the user.
Path
The self.path field is a pathlib.Path that tells the dataset where the data should be stored. The get method ensures this path exists before calling download or load; use this field when implementing these methods.
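As a rough illustration of how self.path might be used, the sketch below writes a file in download and reads it back in load. The StandInDataset base class is a hypothetical, simplified substitute written only so the example runs on its own; it is not the library's Dataset class, and the dataset name text_demo and the file data.txt are made up for the example.

```python
import tempfile
from pathlib import Path


class StandInDataset:
    """Hypothetical stand-in for the real Dataset base class (assumption)."""

    def __init__(self, path):
        # Assumption: the data lives in a folder named after the dataset.
        self.path = Path(path) / self.name

    def get(self, **kwargs):
        # Ensure the folder exists before download/load touch it.
        self.path.mkdir(parents=True, exist_ok=True)
        self.download()
        return self.load(**kwargs)


class text_demo(StandInDataset):
    @property
    def name(self):
        return "text_demo"

    def download(self):
        # A real dataset would fetch files here; this just writes one locally.
        (self.path / "data.txt").write_text("1\n2\n3\n")

    def load(self):
        # Read the file back relative to self.path.
        lines = (self.path / "data.txt").read_text().splitlines()
        return [int(line) for line in lines]


with tempfile.TemporaryDirectory() as root:
    dataset = text_demo(path=root)
    values = dataset.get()
    data_dir_name = dataset.path.name
```

Building every file location from self.path, rather than from a hard-coded directory, keeps download and load working wherever the user chooses to store the data.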
get kwargs
The get method accepts **kwargs, which it forwards to load. For example:
def load(self, dtype=np.float32):
    # code
With this implementation the get method can be called like this:
.get(dtype=np.uint8)
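The forwarding described here can be sketched with a minimal stand-in. StandInDataset is a hypothetical class that only mimics get passing its keyword arguments to load; it is not the library's implementation. To stay dependency-free, the built-in float and int types take the place of np.float32 and np.uint8.

```python
class StandInDataset:
    """Hypothetical stand-in; only mimics get forwarding **kwargs to load."""

    def get(self, **kwargs):
        # Whatever keyword arguments get receives are handed to load.
        return self.load(**kwargs)


class numbers_demo(StandInDataset):
    def load(self, dtype=float):
        # Made-up data; a real dataset would read files from self.path.
        return [dtype(x) for x in range(3)]


default_values = numbers_demo().get()       # uses the default dtype
int_values = numbers_demo().get(dtype=int)  # dtype is forwarded to load
```

Because the arguments are simply passed through, any keyword that load does not declare would raise a TypeError in this sketch.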
Template
You can use this template to get started.
from dataget.dataset import Dataset


class some_dataset(Dataset):

    # OPTIONAL
    def __init__(self, init_arg, **kwargs):
        # code
        super().__init__(**kwargs)  # !!IMPORTANT

    @property
    def name(self):
        return "{dataset_type}_{dataset_name}"

    def download(self):
        # code

    def load(self, some_arg):
        # code
        return a, b, c, ...
Warning
If you define your own __init__, remember to always forward **kwargs to super().__init__, since it is very important that all datasets support the path and global_cache keyword arguments defined in the Dataset class. If super().__init__ is not called at all, the path field will not be instantiated and errors will occur.
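The difference can be seen in a small sketch. StandInDataset is again a hypothetical substitute for the real base class, and the default values given to path and global_cache are assumptions chosen only for illustration; the exact error raised by the library may differ from what this stand-in produces.

```python
from pathlib import Path


class StandInDataset:
    """Hypothetical stand-in for the real base class (assumption)."""

    def __init__(self, path="data", global_cache=False):
        # Defaults are made up for this sketch.
        self.path = Path(path)
        self.global_cache = global_cache


class good_dataset(StandInDataset):
    def __init__(self, init_arg, **kwargs):
        self.init_arg = init_arg
        super().__init__(**kwargs)  # forwards path and global_cache


class broken_dataset(StandInDataset):
    def __init__(self, init_arg):
        self.init_arg = init_arg  # super().__init__ is never called


good = good_dataset(3, path="my_data")
broken = broken_dataset(3)

# The broken class never ran the base initializer, so path does not exist.
broken_has_path = hasattr(broken, "path")
```

In this sketch, any later access to broken.path would raise an AttributeError, and broken_dataset also cannot accept a path keyword at all.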