dataget.kaggle
Downloads any dataset from the Kaggle platform and immediately loads it into memory:
import dataget

df_train, df_test = dataget.kaggle(dataset="cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)
In this example we download the Point Cloud Mnist 2D dataset from Kaggle and load its train.csv and test.csv files as pandas DataFrames.
Config
Before using this dataset, make sure the Kaggle API is properly installed and configured with your Kaggle credentials.
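The Kaggle API client reads credentials either from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or from a `kaggle.json` file, by default in `~/.kaggle` (the directory can be changed with `KAGGLE_CONFIG_DIR`). The sketch below is a hypothetical helper, not part of `dataget`, that checks whether one of these two sources is present before you call `dataget.kaggle(...)`. It only checks presence; it does not validate the credentials, and it ignores any other authentication mechanisms newer client versions may offer.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_source(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: report where Kaggle credentials would be found.

    Returns "environment" if KAGGLE_USERNAME and KAGGLE_KEY are both set,
    the path of kaggle.json if that file exists, or None otherwise.
    Only presence is checked; the credentials themselves are not validated.
    """
    env = os.environ if env is None else env

    # Option 1: credentials exported as environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "environment"

    # Option 2: kaggle.json in KAGGLE_CONFIG_DIR, falling back to ~/.kaggle.
    config_dir = env.get("KAGGLE_CONFIG_DIR") or str(Path.home() / ".kaggle")
    config_file = Path(config_dir) / "kaggle.json"
    if config_file.is_file():
        return str(config_file)

    return None


if __name__ == "__main__":
    source = kaggle_credentials_source()
    if source is None:
        print("No Kaggle credentials found; configure the Kaggle API first.")
    else:
        print(f"Kaggle credentials found via: {source}")
```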
Supported Formats
Right now we only support the csv format. In the future we want to be able to load any file that numpy or pandas can read.
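One way to keep loading extensible is a table that maps file extensions to reader functions. The sketch below is hypothetical (`LOADERS`, `is_supported` and `load_file` are illustrative names, not `dataget` functions): only `.csv` is registered, mirroring the current support, and any other extension raises an error. `pandas` is imported lazily inside the CSV reader so the dispatch logic has no hard dependency on it.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def _read_csv(path: str) -> Any:
    # Imported lazily so the dispatch logic works even without pandas installed.
    import pandas as pd

    return pd.read_csv(path)


# Hypothetical registry: file extension -> function that loads the file.
# Only CSV is registered, matching the formats currently supported.
LOADERS: Dict[str, Callable[[str], Any]] = {
    ".csv": _read_csv,
}


def is_supported(filename: str) -> bool:
    """Return True if a loader is registered for the file's extension."""
    return Path(filename).suffix.lower() in LOADERS


def load_file(path: str) -> Any:
    """Load a single file with the loader registered for its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        supported = ", ".join(sorted(LOADERS))
        raise ValueError(f"Unsupported format {suffix!r} for {path!r}; supported: {supported}")
    return LOADERS[suffix](path)
```

With this layout, supporting another format would amount to registering one more entry, for example mapping `.npy` to `numpy.load`.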
API Reference
kaggle
__init__(self, dataset=None, competition=None, **kwargs)
Source code in kaggle.py:

def __init__(self, dataset: str = None, competition: str = None, **kwargs):
    """
    Create a Kaggle dataset. You have to specify exactly one of `dataset` or `competition`.

    Arguments:
        dataset: the id of the kaggle dataset in the format `username/dataset_name`.
        competition: the name of the kaggle competition.
        kwargs: common init kwargs.
    """
    # Parenthesize each test: without the parentheses Python chains the
    # comparisons, and the assert would only pass when *both* arguments are set.
    assert (dataset is not None) != (competition is not None), "Set either dataset or competition"

    self.kaggle_dataset = dataset
    self.kaggle_competition = competition

    super().__init__(**kwargs)
Create a Kaggle dataset. You have to specify exactly one of the dataset or competition arguments.
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset | str | the id of the Kaggle dataset in the format username/dataset_name. | None |
competition | str | the name of the Kaggle competition. | None |
**kwargs | | common init kwargs. | {} |
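The check in `__init__` is meant to accept exactly one of `dataset` or `competition`. Written without parentheses, `dataset is not None != competition is not None` is a Python chained comparison, equivalent to `dataset is not None and None != competition and competition is not None`, so it only passes when both arguments are set. Comparing the two parenthesized booleans gives the intended exclusive-or. The two functions below are a standalone illustration, not `dataget` code; the argument values are placeholders.

```python
from typing import Optional


def buggy_check(dataset: Optional[str], competition: Optional[str]) -> bool:
    # Chained comparison: evaluates as
    #   dataset is not None and None != competition and competition is not None
    return dataset is not None != competition is not None


def exactly_one(dataset: Optional[str], competition: Optional[str]) -> bool:
    # Compare the two booleans: True only when exactly one argument is set.
    return (dataset is not None) != (competition is not None)


if __name__ == "__main__":
    cases = [
        ("user/data", None),
        (None, "some-competition"),
        ("user/data", "some-competition"),
        (None, None),
    ]
    for dataset, competition in cases:
        print(dataset, competition, buggy_check(dataset, competition), exactly_one(dataset, competition))
```

Running it shows the chained form rejecting both valid single-argument calls while accepting the invalid call where both arguments are set; the parenthesized form accepts only the first two cases.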
load(self, files)
Source code in kaggle.py:

def load(self, files: list):
    """
    Arguments:
        files: the list of files that will be loaded into memory

    Returns:
        a list with one loaded object per file, in the same order as `files`.
    """
    return [self._load_file(filename) for filename in files]
Parameters
Name | Type | Description | Default |
---|---|---|---|
files | list | the list of files that will be loaded into memory | required |
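Because `load` builds its result with a list comprehension over `files`, the returned list follows the order of `files`, so it can be unpacked positionally into one variable per file. The sketch below reproduces that pattern with a hypothetical `MiniLoader` class. Its `_load_file` reads CSV rows with the standard-library `csv` module instead of pandas so that it runs without extra dependencies; `dataget`'s own `_load_file` is not shown on this page and may differ.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


class MiniLoader:
    """Hypothetical stand-in that mirrors the shape of `load`; not dataget's code."""

    def __init__(self, root: Path):
        self.root = root  # directory where the downloaded files are assumed to live

    def _load_file(self, filename: str) -> List[Dict[str, str]]:
        # Read one CSV file into a list of row dictionaries.
        with open(self.root / filename, newline="") as f:
            return list(csv.DictReader(f))

    def load(self, files: List[str]) -> list:
        # Same pattern as `load` above: one result per file, in order.
        return [self._load_file(filename) for filename in files]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_text("x,y\n1,2\n3,4\n")
        (root / "test.csv").write_text("x,y\n5,6\n")

        train, test = MiniLoader(root).load(["train.csv", "test.csv"])
        print(len(train), len(test))  # 2 rows and 1 row
```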