dataget.kaggle

Downloads any dataset from the Kaggle platform and immediately loads it into memory:

import dataget

df_train, df_test = dataget.kaggle(dataset="cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)

In this example we download the Point Cloud MNIST 2D dataset from Kaggle and load the train.csv and test.csv files as pandas dataframes.

Config

To start using this dataset, make sure you have properly installed and configured the Kaggle API.
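A quick way to check the configuration before the first download is sketched below. It assumes the Kaggle API's usual credential sources: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` token file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). The helper name `kaggle_credentials_found` is hypothetical and is not part of dataget or the Kaggle API.

```python
import os
from pathlib import Path


def kaggle_credentials_found(env=None, home=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper, not part of dataget or the Kaggle API.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file in the config directory
    # (KAGGLE_CONFIG_DIR if set, otherwise ~/.kaggle).
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    return (config_dir / "kaggle.json").is_file()
```

Calling `kaggle_credentials_found()` with no arguments inspects the current process environment and home directory. It only checks that credentials are present, not that they are valid.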

Supported Formats

Right now we only support the csv format. In the future we want to be able to load any file that numpy or pandas can read.
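Since only csv files can be loaded for now, it may help to check the requested filenames before starting a download. The sketch below is one way to do that. `split_by_support` is a hypothetical helper, not part of dataget, and it judges files purely by extension.

```python
from pathlib import PurePosixPath


def split_by_support(files, supported=(".csv",)):
    """Partition filenames into (loadable, unsupported) by extension.

    Hypothetical helper; `supported` mirrors the csv-only limitation.
    """
    loadable, unsupported = [], []
    for name in files:
        # Compare extensions case-insensitively, so "TEST.CSV" counts as csv.
        suffix = PurePosixPath(name).suffix.lower()
        (loadable if suffix in supported else unsupported).append(name)
    return loadable, unsupported
```

For example, `split_by_support(["train.csv", "labels.npy"])` puts `train.csv` in the first list and `labels.npy` in the second.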

API Reference

kaggle

__init__(self, dataset=None, competition=None, **kwargs)

Source code in kaggle.py:
    def __init__(self, dataset: str = None, competition: str = None, **kwargs):
        """
        Create a Kaggle dataset. You have to specify either `dataset` or `competition`.

        Arguments:
            dataset: the id of the kaggle dataset in the format `username/dataset_name`.
            competition: the name of the kaggle competition.
            kwargs: common init kwargs.
        """
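        # Parenthesize each test: without parentheses Python chains the
        # comparisons, and the check passes only when both arguments are set.
        assert (
            (dataset is not None) != (competition is not None)
        ), "Set either dataset or competition"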

        self.kaggle_dataset = dataset
        self.kaggle_competition = competition

        super().__init__(**kwargs)

Create a Kaggle dataset. You have to specify either dataset or competition.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `str` | the id of the kaggle dataset in the format `username/dataset_name`. | `None` |
| `competition` | `str` | the name of the kaggle competition. | `None` |
| `**kwargs` | `_empty` | common init kwargs. | `{}` |
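The "either `dataset` or `competition`" rule is an exclusive-or: exactly one of the two must be given. The sketch below shows one way to express that check on its own. `check_exactly_one` is a hypothetical function, not part of dataget. Each `is not None` test is stored in its own variable because Python chains comparison operators, so writing them back to back without parentheses would not compare the two results.

```python
def check_exactly_one(dataset=None, competition=None):
    """Return which source was given; raise unless exactly one is set.

    Hypothetical helper illustrating the exclusive-or rule.
    """
    has_dataset = dataset is not None
    has_competition = competition is not None

    # Two booleans are equal when both or neither argument was given.
    if has_dataset == has_competition:
        raise ValueError("Set either dataset or competition")

    return "dataset" if has_dataset else "competition"
```

With this check, passing only `dataset` or only `competition` succeeds, while passing both or neither raises `ValueError`.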

load(self, files)

Source code in kaggle.py:
    def load(self, files: list):
        """
        Arguments:
            files: the list of files that will be loaded into memory
        """

        return [self._load_file(filename) for filename in files]

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `files` | `list` | the list of files that will be loaded into memory | required |
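Because `load` builds its result with a list comprehension over `files`, the returned list has one entry per filename, in the same order, which is what makes tuple unpacking of the result safe. The sketch below illustrates this with a hypothetical stand-in class, `CsvFolder`, whose `_load_file` parses csv files with the standard library. It is an assumption for illustration only and is not how dataget loads files.

```python
import csv
import os
import tempfile


class CsvFolder:
    """Hypothetical stand-in for a dataset whose files are already on disk."""

    def __init__(self, path):
        self.path = path

    def _load_file(self, filename):
        # Stand-in loader: parse one csv file into a list of row dicts.
        with open(os.path.join(self.path, filename), newline="") as f:
            return list(csv.DictReader(f))

    def load(self, files):
        # One result per filename, in the order the names were given.
        return [self._load_file(filename) for filename in files]


# Write two tiny csv files, then load them in a chosen order.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("train.csv", "x,y\n1,2\n"), ("test.csv", "x,y\n3,4\n")]:
        with open(os.path.join(folder, name), "w", newline="") as f:
            f.write(text)

    train, test = CsvFolder(folder).load(["train.csv", "test.csv"])
```

Swapping the order of the names in `files` swaps the order of the results accordingly.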