Now Available

ENG Timestamp-Aligned Speech-to-Text Dataset

Word-level & segment-level timestamp-aligned English speech data with paired translations. Built for real-time subtitles, simultaneous translation, and streaming ASR systems.

Data Format

Segment-Level Aligned JSONL

Each audio file is broken into subtitle-sized segments with precise start/end timestamps, source transcription, translated text, and word-level timing with confidence scores.

sample.jsonl
{
  "audio_path": "wavs/013429.wav",
  "source_language": "english",
  "duration_seconds": 142.5,
  "segments": [
    {
      "start": 0.000,
      "end": 4.230,
      "source_text": "We won't feel compelled...",
      "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95}
      ]
    },
    // ... more segments
  ]
}
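A minimal sketch of reading records in this layout with Python's standard library. It assumes each line of the delivered file holds one complete JSON object (the usual JSONL convention) and that the `// ... more segments` comment in the sample is display shorthand, not part of the real file. The function name `iter_records` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# One record shaped like the sample, limited to the fields the sample shows.
example_line = json.dumps({
    "audio_path": "wavs/013429.wav",
    "source_language": "english",
    "duration_seconds": 142.5,
    "segments": [{
        "start": 0.0,
        "end": 4.23,
        "source_text": "We won't feel compelled...",
        "words": [
            {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
            {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95},
        ],
    }],
})

records = list(iter_records([example_line, ""]))
first_words = [w["word"] for w in records[0]["segments"][0]["words"]]
print(first_words)
```

With a file on disk, the same function can be fed an open file handle, for example `iter_records(open("sample.jsonl", encoding="utf-8"))`.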
Delivered Formats

Multiple Formats Included

Every purchase includes the dataset in three ready-to-use formats.

.jsonl: Primary format for ML training & data pipelines
.srt: SubRip, universal subtitle player support
.vtt: WebVTT, for HTML5 <video> & streaming
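The three formats carry the same segment timing, so a JSONL segment can be turned into a subtitle cue directly. The sketch below shows one way this might look. The helper names are hypothetical, and it assumes `start` and `end` are seconds from the beginning of the audio file. SRT uses a comma before the milliseconds and WebVTT uses a period.

```python
def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index: int, seg: dict) -> str:
    """Build one numbered SubRip cue from a segment dict."""
    start = format_timestamp(seg["start"], ",")
    end = format_timestamp(seg["end"], ",")
    return f"{index}\n{start} --> {end}\n{seg['source_text']}\n"


def vtt_cue(seg: dict) -> str:
    """Build one WebVTT cue from a segment dict (file header not included)."""
    start = format_timestamp(seg["start"], ".")
    end = format_timestamp(seg["end"], ".")
    return f"{start} --> {end}\n{seg['source_text']}\n"


segment = {"start": 0.0, "end": 4.23, "source_text": "We won't feel compelled..."}
print(srt_cue(1, segment))
print(vtt_cue(segment))
```

A complete WebVTT file also needs a `WEBVTT` line at the top, followed by a blank line, before the first cue.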
Dataset Scale
42K+ Samples
100K+ Audio Files
Word-Level Timestamps
Schema Reference

Data Fields

schema
── Per-file (top level) ──
audio_path        string  Relative path to WAV file
source_language   string  "english"
duration_seconds  float   Total audio length (sec)
num_segments      int     Number of aligned segments

── Per-segment ──
start             float   Segment start (sec)
end               float   Segment end (sec)
source_text       string  English transcription
translated_text   string  Aligned translation

── Per-word ──
word              string  Individual word
start             float   Word start (sec)
end               float   Word end (sec)
score             float   Alignment confidence (0–1)
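The field list lends itself to a quick sanity check before training. The following is a sketch, not a validator shipped with the dataset. `validate_record` is a hypothetical name, the `segments` and `words` keys are taken from the sample record, and the check does not assume whether word times are relative to the file or to the segment.

```python
from typing import List


def validate_record(record: dict) -> List[str]:
    """Return human-readable problems found in one record (empty if none)."""
    problems: List[str] = []
    segments = record.get("segments", [])
    # num_segments is listed in the schema; compare it only when present.
    if "num_segments" in record and record["num_segments"] != len(segments):
        problems.append("num_segments does not match the segments list")
    for i, seg in enumerate(segments):
        if seg["start"] > seg["end"]:
            problems.append(f"segment {i}: start is after end")
        for j, word in enumerate(seg.get("words", [])):
            if word["start"] > word["end"]:
                problems.append(f"segment {i} word {j}: start is after end")
            if not 0.0 <= word["score"] <= 1.0:
                problems.append(f"segment {i} word {j}: score outside 0-1")
    return problems


good = {
    "num_segments": 1,
    "segments": [{
        "start": 0.0, "end": 4.23,
        "words": [{"word": "We", "start": 0.0, "end": 0.15, "score": 0.98}],
    }],
}
bad = {
    "num_segments": 2,
    "segments": [{
        "start": 0.0, "end": 1.0,
        "words": [{"word": "We", "start": 0.0, "end": 0.15, "score": 1.5}],
    }],
}
print(validate_record(good))
print(validate_record(bad))
```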
License — What You Can Do
Use freely in any project — commercial products, internal tools, SaaS, research papers
Train & fine-tune any model — no royalties, no attribution required in production
Modify & derive — transform, augment, merge with your own data
Perpetual license — no expiration, no recurring fees, yours forever

Buy once. Use everywhere. No strings.

Full license terms included with delivery. Per-product source license documented.

What's Included
Audio format: WAV, 16 kHz
Annotation: JSONL + SRT + VTT
Alignment level: Word + Segment
Delivery: Digital download
Source license: Documented per product
Built For

Use Cases

Real-Time Subtitles

Train streaming subtitle models with segment-level supervision and precise timing boundaries.

Simultaneous Translation

Segment boundaries define optimal "when to translate" points for live interpretation systems.
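One possible reading of this idea, sketched under assumptions: treat every word as a READ event at its end time and every segment end as a WRITE event that emits the segment's translation. The function name `read_write_schedule` and the READ/WRITE labels are illustrative and not part of the dataset. The sketch also assumes word and segment times share the same clock.

```python
from typing import List, Tuple

Event = Tuple[float, str, str]  # (time in seconds, action, payload)


def read_write_schedule(segments: List[dict]) -> List[Event]:
    """READ each word as it ends, then WRITE the translation at the segment end."""
    events: List[Event] = []
    for seg in segments:
        for word in seg.get("words", []):
            events.append((word["end"], "READ", word["word"]))
        # translated_text is listed in the schema; fall back to "" if absent.
        events.append((seg["end"], "WRITE", seg.get("translated_text", "")))
    return events


segments = [{
    "start": 0.0,
    "end": 4.23,
    "translated_text": "<translation placeholder>",
    "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95},
    ],
}]
schedule = read_write_schedule(segments)
for event in schedule:
    print(event)
```

A schedule like this could serve as a supervision target for a policy that decides when to stop reading and start translating.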

ASR Fine-Tuning

Word-level timestamps and confidence scores for training forced-alignment and recognition models.
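As an illustration of how the per-word `score` field might be used when preparing fine-tuning data, the sketch below flags low-confidence words and filters segments by mean score. The function names and the 0.9 threshold are assumptions chosen for the example, not recommendations from the dataset.

```python
from typing import List


def low_confidence_words(segment: dict, threshold: float = 0.9) -> List[str]:
    """Return the words whose alignment score falls below the threshold."""
    return [w["word"] for w in segment.get("words", []) if w["score"] < threshold]


def mean_score(segment: dict) -> float:
    """Average alignment score over a segment's words (0.0 if it has none)."""
    words = segment.get("words", [])
    return sum(w["score"] for w in words) / len(words) if words else 0.0


def keep_segment(segment: dict, min_mean: float = 0.9) -> bool:
    """Keep a segment for training only if its mean score is high enough."""
    return mean_score(segment) >= min_mean


segment = {
    "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95},
    ],
}
print(low_confidence_words(segment, threshold=0.96))
print(keep_segment(segment))
```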

Translation Evaluation

Per-segment BLEU/COMET scoring — measure quality degradation across long audio inputs.
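To look for degradation over a long recording, per-segment scores can be averaged by position in the audio. The sketch below assumes the scores have already been computed with a metric tool of your choice and only does the bucketing. `mean_score_by_time_bucket` and the 60-second bucket width are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List


def mean_score_by_time_bucket(
    segments: List[dict],
    scores: List[float],
    bucket_seconds: float = 60.0,
) -> Dict[int, float]:
    """Average per-segment scores by the time bucket of each segment's start."""
    buckets = defaultdict(list)
    for seg, score in zip(segments, scores):
        buckets[int(seg["start"] // bucket_seconds)].append(score)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}


# Three segments with made-up scores, purely to show the shape of the output.
segments = [{"start": 0.0}, {"start": 30.0}, {"start": 70.0}]
scores = [1.0, 0.5, 0.25]
by_bucket = mean_score_by_time_bucket(segments, scores)
print(by_bucket)
```

A downward trend across bucket indices would suggest that quality drops later in the audio.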