ENG Timestamp-Aligned Speech-to-Text Dataset
Word-level and segment-level timestamp-aligned English speech data with paired translations, built for real-time subtitles, simultaneous translation, and streaming ASR systems.
Segment-Level Aligned JSONL
Each audio file is split into subtitle-sized segments. Every segment carries precise start/end timestamps, the source transcription, the translated text, and word-level timings with confidence scores.
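A minimal sketch of reading one segment record, assuming one JSON object per line. The field names (`start`, `end`, `source_text`, `translation`, `words`, and per-word `word`/`start`/`end`/`confidence`) and the sample values are hypothetical placeholders, not the dataset's documented schema; adjust them to the actual Data Fields. The parser also checks that word timings fall inside their segment.

```python
import json
from dataclasses import dataclass
from typing import Iterator, List


# NOTE: all field names below are hypothetical; map them to the real schema.
@dataclass
class Word:
    word: str
    start: float  # seconds from the start of the audio file
    end: float
    confidence: float


@dataclass
class Segment:
    start: float
    end: float
    source_text: str
    translation: str
    words: List[Word]


def parse_segment(line: str) -> Segment:
    """Parse one JSONL line into a Segment and sanity-check its timestamps."""
    rec = json.loads(line)
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]), float(w["confidence"]))
        for w in rec.get("words", [])
    ]
    seg = Segment(float(rec["start"]), float(rec["end"]),
                  rec["source_text"], rec["translation"], words)
    if seg.end < seg.start:
        raise ValueError(f"segment ends before it starts: {seg.start}-{seg.end}")
    for w in words:
        # Each word must lie inside its segment and have a non-negative duration.
        if not (seg.start <= w.start <= w.end <= seg.end):
            raise ValueError(f"word {w.word!r} falls outside its segment")
    return seg


def load_jsonl(path: str) -> Iterator[Segment]:
    """Yield segments from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_segment(line)


SAMPLE = ('{"start": 0.0, "end": 2.4, "source_text": "Hello there.", '
          '"translation": "Hola.", "words": ['
          '{"word": "Hello", "start": 0.0, "end": 0.6, "confidence": 0.98}, '
          '{"word": "there.", "start": 0.7, "end": 1.3, "confidence": 0.95}]}')

if __name__ == "__main__":
    seg = parse_segment(SAMPLE)
    print(seg.source_text, len(seg.words))
```

Validating at load time catches misaligned records before they reach a training loop.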
Multiple Formats Included
Every purchase includes the dataset in three ready-to-use formats.
Data Fields
Use Cases
Real-Time Subtitles
Train streaming subtitle models with segment-level supervision and precise timing boundaries.
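Segment timings map directly onto subtitle cues. As a minimal sketch, assuming segments are dicts with hypothetical `start`, `end`, and text keys (seconds as floats), they can be rendered as SRT to check timing boundaries against the audio. This is an illustration only; it does not imply SRT is one of the bundled formats.

```python
from typing import Iterable, Mapping


def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping], text_key: str = "source_text") -> str:
    """Render segments as numbered SRT cues separated by blank lines.

    text_key is a hypothetical field name; pass e.g. "translation"
    to produce translated subtitles instead of source-language ones.
    """
    cues = []
    for idx, seg in enumerate(segments, start=1):
        cues.append(
            f"{idx}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg[text_key]}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([{"start": 0.0, "end": 2.4, "source_text": "Hello there."}]))
```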
Simultaneous Translation
Segment boundaries mark natural "when to translate" points for live interpretation systems.
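One simple way to use those boundaries is a wait-until-segment-end policy: emit a segment's translation as soon as the incoming audio has passed its end timestamp. The sketch below simulates that policy over a stream of audio-chunk end times and reports the delay for each emission. The policy and the `end`/`translation` field names are assumptions for illustration, not a method prescribed by the dataset.

```python
from typing import Iterable, Iterator, Mapping, Tuple


def boundary_triggers(
    segments: Iterable[Mapping], chunk_ends: Iterable[float]
) -> Iterator[Tuple[float, str, float]]:
    """Simulate a wait-until-boundary translation policy.

    For each audio chunk end time t (seconds, increasing), yield
    (t, translation, delay) for every segment whose end <= t that has
    not been emitted yet. delay = t - segment end, i.e. how long the
    output lagged the segment boundary.
    """
    pending = sorted(segments, key=lambda s: s["end"])
    i = 0
    for t in chunk_ends:
        while i < len(pending) and pending[i]["end"] <= t:
            seg = pending[i]
            yield t, seg["translation"], t - seg["end"]
            i += 1


if __name__ == "__main__":
    segs = [{"start": 0.0, "end": 2.0, "translation": "A"},
            {"start": 2.0, "end": 4.5, "translation": "B"}]
    for event in boundary_triggers(segs, [1.0, 2.5, 5.0]):
        print(event)
```

Averaging the reported delays gives a rough latency figure for a given chunk size.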
ASR Fine-Tuning
Use word-level timestamps and confidence scores to train forced-alignment and recognition models.
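Confidence scores can be used to keep only reliably aligned audio for fine-tuning. A minimal sketch, assuming each word is a dict with hypothetical `word`, `start`, `end`, and `confidence` keys: it returns maximal runs of consecutive words at or above a threshold as (start, end, text) spans that could be cut into clean training clips. The threshold of 0.9 is an arbitrary example value.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def high_confidence_spans(
    words: Sequence[Mapping], threshold: float = 0.9, min_words: int = 1
) -> List[Tuple[float, float, str]]:
    """Return maximal runs of consecutive words with confidence >= threshold.

    Each span is (start of first word, end of last word, joined text).
    Runs shorter than min_words are dropped.
    """
    spans: List[Tuple[float, float, str]] = []
    run: List[Mapping] = []
    # A trailing None acts as a sentinel so the final run gets flushed.
    items: List[Optional[Mapping]] = list(words) + [None]
    for w in items:
        if w is not None and w["confidence"] >= threshold:
            run.append(w)
            continue
        if run and len(run) >= min_words:
            spans.append((run[0]["start"], run[-1]["end"],
                          " ".join(x["word"] for x in run)))
        run = []
    return spans


if __name__ == "__main__":
    words = [
        {"word": "the", "start": 0.0, "end": 0.2, "confidence": 0.99},
        {"word": "cat", "start": 0.25, "end": 0.5, "confidence": 0.97},
        {"word": "sat", "start": 0.55, "end": 0.8, "confidence": 0.40},
        {"word": "down", "start": 0.85, "end": 1.1, "confidence": 0.95},
    ]
    print(high_confidence_spans(words))
```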
Translation Evaluation
Score translations per segment with BLEU or COMET to measure how quality degrades across long audio inputs.
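Once each segment has a score (from any sentence-level metric, for example sacrebleu's `sentence_bleu` or a COMET model), degradation over long inputs can be read off by averaging scores per time window. A minimal sketch, assuming segments carry a hypothetical `start` field in seconds and scores are supplied in the same order; the 300-second window is an arbitrary example.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def degradation_profile(
    segments: Sequence[Mapping], scores: Sequence[float], window_s: float = 300.0
) -> List[Tuple[float, float]]:
    """Average per-segment scores over fixed time windows.

    Returns (window start in seconds, mean score) pairs in time order.
    A downward trend suggests quality drops later in long audio.
    """
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for seg, score in zip(segments, scores):
        bucket = int(seg["start"] // window_s)  # window index by segment start
        sums[bucket] += score
        counts[bucket] += 1
    return [(b * window_s, sums[b] / counts[b]) for b in sorted(sums)]


if __name__ == "__main__":
    segs = [{"start": 0.0}, {"start": 100.0}, {"start": 350.0}, {"start": 700.0}]
    print(degradation_profile(segs, [40.0, 30.0, 20.0, 10.0]))
```

Bucketing by segment start keeps the computation independent of which metric produced the scores.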