API Reference

Dataset

__init__(self, path=None, global_cache=False)

Source code in dataget/dataset.py:
    def __init__(self, path: Path = None, global_cache: bool = False):
        """
        By default every dataset is downloaded to `./data/{dataset_name}`, relative to the current working directory. You can use the parameters of the base `dataget.Dataset` class constructor to control where the data is stored.

        Parameters:
            path: if set, defines the exact location where the dataset will be stored (a `str` is converted to `Path`). Takes precedence over `global_cache`.
            global_cache: if `True` the data is downloaded to `~/.dataget/{dataset_name}` instead. Use this to reuse datasets across projects.

        ### Examples

        Setting `global_cache=True` on any dataset constructor downloads the data to the global folder `~/.dataget/{dataset_name}`:

        ```python
        dataget.image.mnist(global_cache=True).get()
        ```

        By setting the `path` argument you can specify the exact location for the dataset:

        ```python
        dataget.image.mnist(path="/my/dataset/path").get()
        ```
        """

        if path and not isinstance(path, Path):
            path = Path(path)

        if path:
            pass
        elif global_cache:
            path = Path("~").expanduser() / ".dataget" / self.name
        else:
            path = Path("data") / self.name

        self.path = path

By default every dataset is downloaded to `./data/{dataset_name}`, relative to the current working directory. You can use the parameters of the base `dataget.Dataset` class constructor to control where the data is stored.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `path` | `Path` | If set, defines the exact location where the dataset will be stored (a `str` is converted to `Path`). Takes precedence over `global_cache`. | `None` |
| `global_cache` | `bool` | If `True`, the data is downloaded to `~/.dataget/{dataset_name}` instead. Use this to reuse datasets across projects. | `False` |

Examples

Setting `global_cache=True` on any dataset constructor downloads the data to the global folder `~/.dataget/{dataset_name}`:

```python
dataget.image.mnist(global_cache=True).get()
```

By setting the path argument you can specify the exact location for the dataset:

```python
dataget.image.mnist(path="/my/dataset/path").get()
```
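The precedence rules in the constructor can be summarised as a standalone function. This is a minimal sketch that mirrors the source shown above; `resolve_dataset_path` is a hypothetical helper, not part of dataget's API, and its `name` argument stands in for the dataset's `name` attribute.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_dataset_path(
    name: str,
    path: Optional[Union[str, Path]] = None,
    global_cache: bool = False,
) -> Path:
    """Hypothetical helper mirroring the storage-location rules of Dataset.__init__."""
    if path:
        # An explicit path always wins, even when global_cache=True.
        return Path(path)
    if global_cache:
        # Shared cache in the user's home directory, reusable across projects.
        return Path("~").expanduser() / ".dataget" / name
    # Default: a folder relative to the current working directory.
    return Path("data") / name


default_location = resolve_dataset_path("mnist")
explicit_location = resolve_dataset_path("mnist", path="/my/dataset/path", global_cache=True)
print(default_location, explicit_location)
```

Note that the explicit `path` is returned even though `global_cache=True` was also passed, matching the "takes precedence" rule in the parameter table.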

get(self, clean=False, _debug=False, **kwargs)

Source code in dataget/dataset.py:
    def get(self, clean: bool = False, _debug: bool = False, **kwargs):
        """
        Downloads the dataset (unless a valid copy already exists) and loads it into memory.

        Parameters:
            clean: if `True`, deletes the dataset folder and forces a new download of the data.
            kwargs: all keyword arguments are forwarded to the `load` method. Consult the documentation on a specific dataset to see which options are available.

        """

        if clean or not self.is_valid():

            if not _debug:
                shutil.rmtree(self.path, ignore_errors=True)
                self.path.mkdir(parents=True)

            # get data
            coro = self.download()

            if coro is not None:
                asyncio.run(coro)

            # mark as valid
            (self.path / ".valid").touch()

        return self.load(**kwargs)
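`get` calls `self.download()` and passes the result to `asyncio.run` only when it is not `None`. This appears to let a dataset implement `download` either as a plain method (which returns `None`) or as an `async def` method (which returns a coroutine). The sketch below isolates that dispatch; `run_download`, `sync_download` and `async_download` are hypothetical stand-ins, not dataget code.

```python
import asyncio
from typing import Any, Callable, List

log: List[str] = []


def run_download(download: Callable[[], Any]) -> None:
    """Mirror of the dispatch in get(): only drive the result if one was returned."""
    coro = download()
    if coro is not None:
        # Calling an `async def` function returns a coroutine; run it to completion.
        asyncio.run(coro)


def sync_download() -> None:
    # Hypothetical synchronous downloader: does its work immediately, returns None.
    log.append("sync done")


async def async_download() -> None:
    async def fetch(part: str) -> str:
        # Hypothetical file fetch; a real one would await network I/O.
        await asyncio.sleep(0)
        return part

    # Fetch several files concurrently; gather preserves argument order.
    parts = await asyncio.gather(fetch("train"), fetch("test"))
    log.extend(parts)


run_download(sync_download)
run_download(async_download)
print(log)  # ['sync done', 'train', 'test']
```

Because `asyncio.run` raises `RuntimeError` when another event loop is already running in the same thread, an async `download` driven this way may not work from code that is itself running inside an event loop.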

Downloads the dataset (unless a valid copy already exists) and loads it into memory.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `clean` | `bool` | If `True`, deletes the dataset folder and forces a new download of the data. | `False` |
| `**kwargs` | | All keyword arguments are forwarded to the `load` method. Consult the documentation of a specific dataset to see which options are available. | `{}` |
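The caching in `get` rests on a marker file: after a download finishes it touches `.valid` inside the dataset folder, and `clean=True` (or a failed validity check) wipes the folder and downloads again. The toy class below sketches that flow under the assumption that `is_valid` simply checks for the marker; dataget's real `is_valid`, `download` and `load` are not documented here, so `ToyDataset`, its `downloads` counter and its `split` keyword are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical stand-in reproducing the get() flow of dataget.Dataset."""

    name = "toy"

    def __init__(self, path: Path):
        self.path = path
        self.downloads = 0  # counts how often download() actually ran

    def is_valid(self) -> bool:
        # Assumption: a dataset is valid once the ".valid" marker exists.
        return (self.path / ".valid").exists()

    def download(self) -> None:
        self.downloads += 1
        (self.path / "data.txt").write_text("a\nb\nc\n")

    def load(self, split: str = "all"):
        # `split` is a hypothetical keyword showing how get(**kwargs) reaches load().
        rows = (self.path / "data.txt").read_text().splitlines()
        return rows[:2] if split == "train" else rows

    def get(self, clean: bool = False, **kwargs):
        if clean or not self.is_valid():
            # Start from an empty folder so stale files never survive a re-download.
            shutil.rmtree(self.path, ignore_errors=True)
            self.path.mkdir(parents=True)
            self.download()
            # Written last, so an interrupted download leaves no marker behind.
            (self.path / ".valid").touch()
        return self.load(**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    ds = ToyDataset(Path(tmp) / "data" / "toy")
    first = ds.get()                # downloads, then loads everything
    second = ds.get(split="train")  # cached: only load() runs, with split="train"
    third = ds.get(clean=True)      # forced re-download
    print(ds.downloads)  # 2
```

Touching the marker only after `download` returns means a crash mid-download leaves the folder without `.valid`, so the next `get` call starts over instead of loading partial data.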