thunder出版
ada-dataset vol.002
291 in stock
Timestamp-aligned English speech-to-text translation data, delivered on a standard CD-ROM. Each disc contains 5–6 long-form English audio recordings (WAV, 16kHz mono) with word-level and segment-level alignment annotations — ready for real-time subtitle training, simultaneous translation research, and ASR fine-tuning.
What's on the disc
| Audio | 5–6 WAV files, ~10 min each, totaling ~50–60 min per disc |
| Annotations | JSONL — segment-level timestamps, source transcript, aligned translation, word-level timing with confidence scores |
| Subtitles | SRT + VTT — bilingual subtitle files for every audio track |
| Capacity | ~580–700 MB per disc |
Data structure per audio file
{
  "audio_path": "wavs/013429.wav",
  "source_language": "english",
  "duration_seconds": 142.5,
  "segments": [
    {
      "start": 0.000,
      "end": 4.230,
      "source_text": "We won't feel compelled...",
      "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95}
      ]
    }
  ]
}
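As a rough illustration of how a record like the one above might be consumed, here is a minimal Python sketch. It assumes each JSONL line holds one record with the fields shown in the sample; the helper names `iter_records` and `low_confidence_words` and the 0.96 threshold are hypothetical choices for this example, not part of the dataset.

```python
import json
from typing import Iterable, Iterator

# Sample record copied from the product description (not read from a disc).
SAMPLE_RECORD = {
    "audio_path": "wavs/013429.wav",
    "source_language": "english",
    "duration_seconds": 142.5,
    "segments": [
        {
            "start": 0.000,
            "end": 4.230,
            "source_text": "We won't feel compelled...",
            "words": [
                {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
                {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95},
            ],
        }
    ],
}
SAMPLE_LINE = json.dumps(SAMPLE_RECORD)


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def low_confidence_words(record: dict, threshold: float = 0.96) -> list:
    """Return (word, start, end, score) tuples whose score is below threshold.

    The default threshold is arbitrary and only serves the example.
    """
    flagged = []
    for segment in record.get("segments", []):
        for w in segment.get("words", []):
            if w["score"] < threshold:
                flagged.append((w["word"], w["start"], w["end"], w["score"]))
    return flagged


if __name__ == "__main__":
    for record in iter_records([SAMPLE_LINE]):
        print(record["audio_path"], low_confidence_words(record))
```

With a real disc, the list passed to `iter_records` would be replaced by an open file handle on the annotation file, since a text file iterates line by line.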
Series & catalog
This disc is part of an ongoing CD series. Each volume is a self-contained dataset — no other volumes are required. New volumes ship regularly, covering different speech domains: business meetings, news broadcasts, lectures, and casual conversation. Collect them individually or build a comprehensive corpus over time.
License
Once you purchase this CD, the data is yours to use — freely and permanently.
- Commercial use — products, SaaS, internal tools, client work
- Model training — fine-tune, distill, or train from scratch. No royalties
- Modify & derive — transform, augment, merge with your own datasets
- No expiration — perpetual license, no recurring fees, no strings attached
The source dataset's license is documented in the LICENSE file included on each disc.
Specs
| Media | CD-ROM (700 MB) |
| Audio format | WAV 16kHz 16-bit mono |
| Audio length | ~50–60 min per disc |
| Annotation | JSONL + SRT + VTT |
| Alignment | Word-level + segment-level timestamps |
| Shipping | Physical CD |
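To confirm that a file matches the audio format listed in the specs, something like the following sketch could be used. It relies only on Python's standard `wave` module; the function names `describe_wav` and `matches_spec` are made up for this example.

```python
import wave

# Expected values taken from the spec table: 16 kHz, 16-bit (2 bytes), mono.
EXPECTED = {"rate": 16000, "sample_width": 2, "channels": 1}


def describe_wav(path: str) -> dict:
    """Read the header of a WAV file and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "rate": rate,
            "sample_width": wav.getsampwidth(),  # bytes per sample
            "channels": wav.getnchannels(),
            "duration_seconds": wav.getnframes() / rate,
        }


def matches_spec(info: dict) -> bool:
    """True if sample rate, sample width, and channel count match EXPECTED."""
    return all(info[key] == value for key, value in EXPECTED.items())
```

Calling `matches_spec(describe_wav(path))` on each file under `wavs/` would flag any track that deviates from the listed format before it reaches a training pipeline.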
Who this is for
- ML engineers building real-time subtitle or live translation systems
- Researchers benchmarking long-form speech translation models
- Teams training or evaluating ASR with forced-alignment ground truth
- Anyone who wants clean, timestamp-aligned English speech data they actually own
Data structure per audio file (with Japanese translation)
{
  "audio_path": "wavs/013429.wav",
  "source_language": "english",
  "target_language": "japanese",
  "duration_seconds": 142.5,
  "segments": [
    {
      "start": 0.000,
      "end": 4.230,
      "source_text": "We won't feel compelled...",
      "target_text": "私たちは強制されることはないだろう…",
      "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95}
      ]
    }
  ]
}
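The disc already ships SRT and VTT files, but a translated record can also be turned into bilingual SRT cues directly. The sketch below is one possible approach under stated assumptions: the key holding the translation is assumed to be `target_text` (a hypothetical name; check the files on your disc and pass the real key via `translation_key`), and the function names are invented for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list, translation_key: str = "target_text") -> str:
    """Build SRT text with the source line first and the translation below it.

    translation_key is an assumed field name; segments without that key
    produce a source-only cue.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        lines = [seg["source_text"]]
        translation = seg.get(translation_key)
        if translation:
            lines.append(translation)
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n" + "\n".join(lines))
    return "\n\n".join(cues) + "\n"
```

Passing `record["segments"]` from a parsed record yields one numbered cue per segment, in the order the segments appear.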