dataget.kaggle
Downloads any dataset from the Kaggle platform and immediately loads it into memory:
    import dataget

    df_train, df_test = dataget.kaggle(dataset="cristiangarcia/pointcloudmnist2d").get(
        files=["train.csv", "test.csv"]
    )
In this example we download the Point Cloud Mnist 2D dataset from Kaggle and load the train.csv and test.csv files as pandas dataframes.
Config
To start using this dataset, make sure you have installed and configured the Kaggle API.
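As a quick sanity check before calling dataget.kaggle, the sketch below tests whether Kaggle credentials look discoverable. It assumes the usual Kaggle API conventions: either the KAGGLE_USERNAME and KAGGLE_KEY environment variables are both set, or a token file exists at ~/.kaggle/kaggle.json. The helper name `kaggle_credentials_found` is hypothetical and is not part of dataget or the Kaggle API.

```python
import os
from pathlib import Path


def kaggle_credentials_found(env=None, home=None):
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper (not part of dataget or the Kaggle API). It assumes
    the usual Kaggle conventions: either both KAGGLE_USERNAME and KAGGLE_KEY
    are set, or a token file exists at ~/.kaggle/kaggle.json.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: credentials stored in the API token file.
    return (home / ".kaggle" / "kaggle.json").is_file()


if __name__ == "__main__":
    print("Kaggle credentials found:", kaggle_credentials_found())
```

The `env` and `home` arguments exist only so the check can be exercised against a temporary directory. The check confirms that credentials are present, not that they are valid.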
Supported Formats
Right now we only support the CSV format. In the future we want to be able to load any file that NumPy or pandas can read.
API Reference
kaggle
__init__(self, dataset=None, competition=None, **kwargs)
Source code in kaggle.py:

    def __init__(self, dataset: str = None, competition: str = None, **kwargs):
        """
        Create a Kaggle dataset. You have to specify either `dataset` or `competition`.

        Arguments:
            dataset: the id of the kaggle dataset in the format `username/dataset_name`.
            competition: the name of the kaggle competition.
            kwargs: common init kwargs.
        """
        # Parenthesize each test: unparenthesized, Python chains the operators as
        # (dataset is not None) and (None != competition) and (competition is not None).
        assert (dataset is not None) != (
            competition is not None
        ), "Set either dataset or competition"

        self.kaggle_dataset = dataset
        self.kaggle_competition = competition

        super().__init__(**kwargs)
Create a Kaggle dataset. You must specify exactly one of dataset or competition.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | str | the id of the kaggle dataset in the format username/dataset_name. | None |
| competition | str | the name of the kaggle competition. | None |
| **kwargs | _empty | common init kwargs. | {} |
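The "either dataset or competition" rule is an exclusive-or over two optional arguments. The standalone sketch below shows one way to express that check. The names `exactly_one_source` and `source_kind` are hypothetical and not part of dataget, and "titanic" in the usage note is only an example value. Each `is not None` test is parenthesized because Python would otherwise chain `is not` and `!=` into a single comparison.

```python
def exactly_one_source(dataset=None, competition=None):
    """Return True when exactly one of the two arguments is set.

    Hypothetical helper illustrating the rule; not part of dataget.
    """
    # Parentheses matter: without them Python chains the operators as
    # (dataset is not None) and (None != competition) and (competition is not None).
    return (dataset is not None) != (competition is not None)


def source_kind(dataset=None, competition=None):
    """Return "dataset" or "competition"; raise ValueError if neither or both are set."""
    if not exactly_one_source(dataset, competition):
        raise ValueError("Set either dataset or competition")
    return "dataset" if dataset is not None else "competition"
```

Under these assumptions, `source_kind(competition="titanic")` selects the competition branch, while calling `source_kind()` with no arguments, or with both, raises `ValueError`.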
load(self, files)
Source code in kaggle.py:

    def load(self, files: list):
        """
        Arguments:
            files: the list of files that will be loaded into memory
        """

        return [self._load_file(filename) for filename in files]
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| files | list | the list of files that will be loaded into memory | required |
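Because load builds its result with a list comprehension over files, the returned list has one entry per filename, in the same order as the input. The sketch below reproduces that behaviour without dataget. `load_csv` and `load_files` are hypothetical stand-ins for the private `_load_file` and for `load`, and the standard `csv` module replaces pandas only to keep the example dependency-free. Rejecting non-csv files is an assumption based on the Supported Formats section, not on the library's actual error handling.

```python
import csv
from pathlib import Path


def load_csv(path):
    """Stand-in for `_load_file`: read one csv file into a list of row dicts."""
    path = Path(path)
    if path.suffix.lower() != ".csv":
        # Assumption: only csv is supported, as stated under Supported Formats.
        raise ValueError(f"Unsupported format: {path.name}")
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def load_files(root, files):
    """Stand-in for `load`: one loaded object per filename, in input order."""
    return [load_csv(Path(root) / filename) for filename in files]
```

With this ordering guarantee, a call such as `train, test = load_files(data_dir, ["train.csv", "test.csv"])` can unpack the result directly, where `data_dir` is whatever directory holds the downloaded files.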