dataget.text.imdb_reviews¶
Downloads the IMDB Reviews dataset and loads it as pandas
dataframes.
    import dataget

    df_train, df_test = dataget.text.imdb_reviews().get()
To include the unlabeled samples in the training set, pass the include_unlabeled argument:

    import dataget

    df_train, df_test = dataget.text.imdb_reviews().get(include_unlabeled=True)

Unlabeled samples are assigned the label -1.
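Because unlabeled rows share the training frame with the labeled ones, it is often useful to separate them before fitting a classifier. The sketch below assumes only the column layout documented on this page; the sample rows and the helper name `split_labeled` are made up for illustration and are not part of dataget.

```python
import pandas as pd

# Stand-in for a df_train that includes unlabeled samples.
# The rows and paths are invented for this example.
df_train = pd.DataFrame(
    {
        "text": ["great film", "dull plot", "no rating given"],
        "label": [1, 0, -1],
        "text_path": ["train/pos/a.txt", "train/neg/b.txt", "train/unsup/c.txt"],
    }
)


def split_labeled(df):
    """Return (labeled, unlabeled) subsets, treating label -1 as unlabeled."""
    is_unlabeled = df["label"] == -1
    return df[~is_unlabeled], df[is_unlabeled]


df_labeled, df_unlabeled = split_labeled(df_train)
print(len(df_labeled), len(df_unlabeled))
```

The labeled subset can then be used for supervised training, while the unlabeled subset remains available for tasks such as language-model pretraining.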
Format¶
name | type | shape |
---|---|---|
df_train | pd.DataFrame | (25_000, 3) by default; (75_000, 3) with unlabeled samples |
df_test | pd.DataFrame | (25_000, 3) |
Features¶
column | type |
---|---|
text | str |
label | int64 |
text_path | str |
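As a quick sanity check, the column layout above can be verified before training. This is a minimal sketch, not part of dataget: `check_schema` and `EXPECTED_COLUMNS` are hypothetical names, and the sample frame is invented.

```python
import pandas as pd

# Column names taken from the Features table.
EXPECTED_COLUMNS = ["text", "label", "text_path"]


def check_schema(df):
    """Validate the expected columns and return the number of rows per label."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not pd.api.types.is_integer_dtype(df["label"]):
        raise TypeError("label column should have an integer dtype")
    return df["label"].value_counts().to_dict()


# Invented sample rows standing in for a loaded frame.
sample = pd.DataFrame(
    {
        "text": ["loved it", "hated it", "a classic"],
        "label": [1, 0, 1],
        "text_path": ["pos/a.txt", "neg/b.txt", "pos/c.txt"],
    }
)
label_counts = check_schema(sample)
print(label_counts)
```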
Info¶
- Folder name: text_imdb_reviews
- Size on disk: 490MB
API Reference¶
imdb_reviews¶
load(self, include_unlabeled=False)¶
Source code in imdb_reviews.py:

    def load(self, include_unlabeled=False):
        """
        Arguments:
            include_unlabeled: whether or not to include the unlabeled samples.
        """
        train_path = self.path / "aclImdb" / "train"
        test_path = self.path / "aclImdb" / "test"

        # train
        df_train = [
            self.load_df(train_path / "pos", label=1),
            self.load_df(train_path / "neg", label=0),
        ]

        if include_unlabeled:
            df_train.append(self.load_df(train_path / "unsup", label=-1))

        df_train = pd.concat(df_train, axis=0)

        # test
        df_test = pd.concat(
            [
                self.load_df(test_path / "pos", label=1),
                self.load_df(test_path / "neg", label=0),
            ],
            axis=0,
        )

        return df_train, df_test
Parameters
Name | Type | Description | Default |
---|---|---|---|
include_unlabeled | bool | Whether or not to include the unlabeled samples. | False |
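The `load` method relies on a `load_df` helper whose source is not shown on this page. The sketch below is a guess at what such a helper might do, assuming each review is stored as one `.txt` file per sample inside the given folder; it is a hypothetical stand-in, not dataget's actual implementation. It demonstrates the same "one frame per class folder, then `pd.concat`" pattern that `load` uses.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_df(folder, label):
    """Hypothetical stand-in for load_df: one row per .txt file in `folder`."""
    paths = sorted(Path(folder).glob("*.txt"))
    return pd.DataFrame(
        {
            "text": [p.read_text(encoding="utf-8") for p in paths],
            "label": [label] * len(paths),
            "text_path": [str(p) for p in paths],
        }
    )


# Build a tiny fake dataset on disk so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    pos, neg = Path(tmp) / "pos", Path(tmp) / "neg"
    pos.mkdir()
    neg.mkdir()
    (pos / "a.txt").write_text("wonderful", encoding="utf-8")
    (pos / "b.txt").write_text("charming", encoding="utf-8")
    (neg / "c.txt").write_text("tedious", encoding="utf-8")

    # Same pattern as load(): one frame per class, concatenated row-wise.
    df = pd.concat([load_df(pos, label=1), load_df(neg, label=0)], axis=0)

print(df[["text", "label"]])
```

Note that `pd.concat(..., axis=0)` keeps each input frame's original index, so the combined frame has repeated index values; passing `ignore_index=True` or calling `reset_index(drop=True)` gives a clean range index if one is needed.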