dataget.audio.free_spoken_digit
Downloads the Free Spoken Digits dataset and loads its metadata as a pandas dataframe. The audio samples are stored as .wav files.

    import dataget

    df = dataget.audio.free_spoken_digit().get()

The df dataframe has an audio_path column that contains the relative path of each sample. You can load the samples with scipy.io.wavfile.read.
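As a minimal sketch of loading one sample, the helper below wraps scipy.io.wavfile.read, which returns the sample rate and a NumPy array of samples. The name `load_sample` and its `root` parameter are hypothetical: this page does not say which directory the relative paths are resolved against, so the sketch assumes you pass it explicitly.

```python
from pathlib import Path

from scipy.io import wavfile


def load_sample(audio_path, root="."):
    """Read one .wav file and return (sample_rate, samples).

    `root` is a hypothetical parameter: the directory that the relative
    `audio_path` values are assumed to be resolved against.
    """
    full_path = Path(root) / audio_path
    # wavfile.read returns the sample rate in Hz and the samples as a NumPy array.
    sample_rate, samples = wavfile.read(str(full_path))
    return sample_rate, samples


def duration_seconds(sample_rate, samples):
    """Length of a clip in seconds, from the number of frames and the rate."""
    return len(samples) / sample_rate
```

With the dataframe above, a call might look like `load_sample(df["audio_path"].iloc[0])`, adjusting `root` if the paths are not relative to the working directory.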
Tip
It's recommended that you split train / test by user rather than randomly, so that the test set does not contain samples closely resembling those seen in training.
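One way to follow this tip is to partition the distinct speakers first and then assign every sample to the side its speaker landed on. The sketch below is an illustration, not part of dataget: `split_users`, `test_fraction`, and `seed` are hypothetical names, and the function only needs the values of the user column.

```python
import random


def split_users(users, test_fraction=0.2, seed=0):
    """Partition the distinct speakers into disjoint train and test sets.

    `users` is any iterable of speaker names (for example the `user` column).
    At least one speaker is always placed in the test set.
    """
    unique_users = sorted(set(users))  # sort so the result depends only on the seed
    rng = random.Random(seed)
    rng.shuffle(unique_users)
    n_test = max(1, round(len(unique_users) * test_fraction))
    test_users = set(unique_users[:n_test])
    train_users = set(unique_users[n_test:])
    return train_users, test_users
```

With the dataframe above, `train_users, test_users = split_users(df["user"])` followed by `df[df["user"].isin(train_users)]` and `df[df["user"].isin(test_users)]` would give two frames that share no speaker.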
Format
| | type | shape |
|---|---|---|
| df | pd.DataFrame | (2_000, 4) |
Features
| column | type | description |
|---|---|---|
| audio_path | str | Relative path of the audio file |
| label | int64 | Target label in the range [0, 9] |
| user | str | Name of the speaker |
| repetition | int64 | Repetition number for each (user, label) pair, i.e. each user repeats each digit multiple times |
Info
- Folder name: audio_free_spoken_digit
- Size on disk: 26MB