dataget.text.imdb_reviews
Downloads the IMDB Reviews dataset and loads it as pandas DataFrames.
import dataget
df_train, df_test = dataget.text.imdb_reviews().get()
This dataset also contains unlabeled (unsupervised) samples. To load them, set the include_unlabeled argument:
import dataget
df_train, df_test = dataget.text.imdb_reviews().get(include_unlabeled=True)
All unlabeled samples will have a label of -1.
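Since samples labeled -1 share the training frame with the labeled reviews, they usually need to be split off before supervised training. A minimal sketch of that split, using a small hand-built stand-in frame (the rows are invented for illustration and are not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for df_train; the real frame comes from dataget.
df_train = pd.DataFrame(
    {
        "text": ["great film", "terrible plot", "no rating given"],
        "label": [1, 0, -1],
    }
)

# Rows with label -1 are the samples without a sentiment label.
is_unlabeled = df_train["label"] == -1
df_labeled = df_train[~is_unlabeled].reset_index(drop=True)
df_unlabeled = df_train[is_unlabeled].reset_index(drop=True)
```

The same boolean mask should apply unchanged to the real `df_train`, assuming it uses the -1 convention described above.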
| | type | shape |
| --- | --- | --- |
| df_train | pd.DataFrame | (75_000, 3) |
| df_test | pd.DataFrame | (25_000, 3) |
Features
| column | type |
| --- | --- |
| text | str |
| label | int64 |
| text_path | str |
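As a rough illustration of working with these columns, the sketch below builds a toy frame with the same schema (the texts and paths are invented) and computes the class balance and a simple per-review token count. The 1 = positive, 0 = negative convention follows the loader's source code.

```python
import pandas as pd

# Toy frame mirroring the documented columns; texts and paths are made up.
df = pd.DataFrame(
    {
        "text": ["loved it", "hated it", "fine, I guess"],
        "label": pd.Series([1, 0, 1], dtype="int64"),
        "text_path": ["pos/0_9.txt", "neg/1_2.txt", "pos/2_7.txt"],
    }
)

# Count reviews per class (1 = positive, 0 = negative).
class_counts = df["label"].value_counts().sort_index()

# Number of whitespace-separated tokens in each review.
df["n_tokens"] = df["text"].str.split().str.len()
```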
Info
- Folder name: text_imdb_reviews
- Size on disk: 490 MB
API Reference
imdb_reviews
load(self, include_unlabeled=False)
Source code in imdb_reviews.py:
def load(self, include_unlabeled=False):
    """
    Arguments:
        include_unlabeled: whether or not to include the unlabeled samples.
    """
    train_path = self.path / "aclImdb" / "train"
    test_path = self.path / "aclImdb" / "test"

    # train
    df_train = [
        self.load_df(train_path / "pos", label=1),
        self.load_df(train_path / "neg", label=0),
    ]

    if include_unlabeled:
        df_train.append(self.load_df(train_path / "unsup", label=-1))

    df_train = pd.concat(df_train, axis=0)

    # test
    df_test = pd.concat(
        [
            self.load_df(test_path / "pos", label=1),
            self.load_df(test_path / "neg", label=0),
        ],
        axis=0,
    )

    return df_train, df_test
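The `load_df` helper called above is not shown on this page. The following is a hypothetical sketch of what such a helper might do, assuming each review is stored as one `.txt` file per folder. The name `load_df_sketch`, the file layout, and the demo reviews are all assumptions for illustration, not the library's implementation.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_df_sketch(folder: Path, label: int) -> pd.DataFrame:
    """Hypothetical stand-in for load_df: one row per .txt file in folder."""
    rows = [
        {
            "text": path.read_text(encoding="utf-8"),
            "label": label,
            "text_path": str(path),
        }
        for path in sorted(folder.glob("*.txt"))
    ]
    return pd.DataFrame(rows, columns=["text", "label", "text_path"])


# Build a tiny fake pos/neg layout in a temporary directory to exercise it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, review in [("pos", "a joy to watch"), ("neg", "a waste of time")]:
        (root / name).mkdir()
        (root / name / "0.txt").write_text(review, encoding="utf-8")

    # Same concatenation pattern as load(): positives first, then negatives.
    df_demo = pd.concat(
        [
            load_df_sketch(root / "pos", label=1),
            load_df_sketch(root / "neg", label=0),
        ],
        axis=0,
    )
```

In this sketch, `pd.concat(..., axis=0)` keeps each frame's own index, so `df_demo` contains repeated index values. Calling `reset_index(drop=True)` afterwards, or passing `ignore_index=True`, gives a unique index.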
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| include_unlabeled | bool | Whether or not to include the unlabeled samples. | False |