API Reference
Dataset
__init__(self, path=None, global_cache=False)
Source code in dataget/dataset.py:

````python
def __init__(self, path: Path = None, global_cache: bool = False):
    """
    By default every dataset is downloaded inside `./data/{dataset_name}` in the current directory,
    however, you can use the parameters from the base `dataget.Dataset` class constructor
    to control where the data is stored.

    Parameters:
        path: if set defines the exact location where the dataset will be stored. Takes precedence over `global_cache`.
        global_cache: if `True` the data is downloaded to `~/.dataget/{dataset_name}` instead. Use this to reuse datasets
            across projects.

    ### Examples

    Setting `global_cache=True` on any dataset constructor downloads the data to a global folder:

    ```python
    dataget.image.mnist(global_cache=True).get()
    ```

    By setting the `path` argument you can specify the exact location for the dataset:

    ```python
    dataget.image.mnist(path="/my/dataset/path").get()
    ```
    """
    if path and not isinstance(path, Path):
        path = Path(path)

    if path:
        pass
    elif global_cache:
        path = Path("~").expanduser() / ".dataget" / self.name
    else:
        path = Path("data") / self.name

    self.path = path
````
By default, every dataset is downloaded to `./data/{dataset_name}` in the current directory.
You can use the parameters of the base `dataget.Dataset` class constructor
to control where the data is stored.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | Path | If set, defines the exact location where the dataset will be stored. Takes precedence over `global_cache`. | None |
global_cache | bool | If `True`, the data is downloaded to `~/.dataget/{dataset_name}` instead. Use this to reuse datasets across projects. | False |
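The precedence between the two parameters can be illustrated with a small standalone sketch. The helper `resolve_dataset_path` is hypothetical and not part of dataget; it only mirrors the selection logic shown in the constructor source, with the dataset name passed in explicitly.

```python
from pathlib import Path


def resolve_dataset_path(name, path=None, global_cache=False):
    """Hypothetical helper mirroring the constructor's path selection."""
    if path:
        # An explicit path takes precedence over global_cache.
        return Path(path)
    if global_cache:
        # Shared per-user cache: ~/.dataget/{dataset_name}
        return Path("~").expanduser() / ".dataget" / name
    # Default: ./data/{dataset_name}, relative to the current directory.
    return Path("data") / name


print(resolve_dataset_path("mnist"))
print(resolve_dataset_path("mnist", global_cache=True))
print(resolve_dataset_path("mnist", path="/my/dataset/path", global_cache=True))
```

The last call shows that `global_cache=True` has no effect once `path` is given.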
Examples
Setting `global_cache=True` on any dataset constructor downloads the data to a global folder:

```python
import dataget

dataget.image.mnist(global_cache=True).get()
```

Setting the `path` argument specifies the exact location for the dataset:

```python
import dataget

dataget.image.mnist(path="/my/dataset/path").get()
```
get(self, clean=False, _debug=False, **kwargs)
Source code in dataget/dataset.py:

```python
def get(self, clean: bool = False, _debug: bool = False, **kwargs):
    """
    Downloads and loads the dataset into memory.

    Parameters:
        clean: deletes the dataset folder and forces a new download of the data.
        kwargs: all keyword arguments are forwarded to the `load` method. Consult the documentation on a specific dataset
            to see which options are available.
    """
    if clean or not self.is_valid():
        if not _debug:
            shutil.rmtree(self.path, ignore_errors=True)
            self.path.mkdir(parents=True)

            # get data
            coro = self.download()

            if coro is not None:
                asyncio.run(coro)

        # mark as valid
        (self.path / ".valid").touch()

    return self.load(**kwargs)
```
Downloads the dataset if it is not already present and valid, then loads it into memory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
clean | bool | Deletes the dataset folder and forces a new download of the data. | False |
**kwargs | _empty | All keyword arguments are forwarded to the `load` method. Consult the documentation on a specific dataset to see which options are available. | {} |
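The download-once behaviour of `get` can be sketched with a toy class that runs without dataget or network access. Everything here is hypothetical: `ToyDataset`, its `data.txt` file and the `as_ints` option are made up for illustration, and `is_valid` is assumed to check only for the `.valid` marker file that `get` creates. The `_debug` flag is left out.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical stand-in for a dataset; not part of dataget."""

    def __init__(self, path):
        self.path = Path(path)
        self.downloads = 0  # counts how often download() actually ran

    def is_valid(self):
        # Assumption: the dataset is valid once the marker file exists.
        return (self.path / ".valid").exists()

    async def download(self):
        # Pretend to fetch data; a real dataset would use the network here.
        self.downloads += 1
        (self.path / "data.txt").write_text("1,2,3")

    def load(self, as_ints=False):
        values = (self.path / "data.txt").read_text().split(",")
        return [int(v) for v in values] if as_ints else values

    def get(self, clean=False, **kwargs):
        if clean or not self.is_valid():
            # Start from an empty folder, download, then mark as valid.
            shutil.rmtree(self.path, ignore_errors=True)
            self.path.mkdir(parents=True)
            asyncio.run(self.download())
            (self.path / ".valid").touch()
        # Keyword arguments are forwarded to load().
        return self.load(**kwargs)


root = Path(tempfile.mkdtemp())
dataset = ToyDataset(root / "toy")
first = dataset.get()               # downloads, then loads
second = dataset.get(as_ints=True)  # reuses the files already on disk
third = dataset.get(clean=True)     # wipes the folder and downloads again
print(first, second, dataset.downloads)
shutil.rmtree(root)
```

The second call skips the download because the marker file is present, while `clean=True` forces a fresh one, so the counter ends at 2.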