Now Available

ENG Timestamp-Aligned
Speech-to-Text Dataset

Word-level & segment-level timestamp-aligned English speech data with paired translations. Built for real-time subtitles, simultaneous translation, and streaming ASR systems.

Data Format

Segment-Level Aligned JSONL

Each audio file is broken into subtitle-sized segments with precise start/end timestamps, source transcription, translated text, and word-level timing with confidence scores.

sample.jsonl

{
  "audio_path": "wavs/013429.wav",
  "source_language": "english",
  "duration_seconds": 142.5,
  "segments": [
    {
      "start": 0.000,
      "end": 4.230,
      "source_text": "We won't feel compelled...",
      "words": [
        {"word": "We", "start": 0.00, "end": 0.15, "score": 0.98},
        {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95}
      ]
    },
    // ... more segments
  ]
}
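As a rough illustration of how a record like the sample above might be read, here is a minimal Python sketch. It assumes the delivered file stores one complete JSON object per line with no `// ...` comment lines (the comment in the sample is display shorthand and would not parse as JSON). The helper names `iter_records` and `iter_words` are hypothetical, not part of the product.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_words(record: dict) -> Iterator[Tuple[int, dict]]:
    """Yield (segment_index, word_dict) for every timed word in a record."""
    for seg_index, segment in enumerate(record.get("segments", [])):
        for word in segment.get("words", []):
            yield seg_index, word


# One record shaped like the sample (values copied from it, truncated text kept as-is).
sample_line = json.dumps({
    "audio_path": "wavs/013429.wav",
    "source_language": "english",
    "duration_seconds": 142.5,
    "segments": [
        {
            "start": 0.0,
            "end": 4.23,
            "source_text": "We won't feel compelled...",
            "words": [
                {"word": "We", "start": 0.0, "end": 0.15, "score": 0.98},
                {"word": "won't", "start": 0.18, "end": 0.42, "score": 0.95},
            ],
        }
    ],
})

records = list(iter_records([sample_line, ""]))  # blank lines are skipped
first_words = [word["word"] for _, word in iter_words(records[0])]
```

With a real file, the same generator can be fed an open file handle, for example `with open("sample.jsonl", encoding="utf-8") as f: ...`, so records are parsed one at a time rather than loaded all at once.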

Delivered Formats

Multiple Formats Included

Every purchase includes the dataset in three ready-to-use formats.

.jsonl

Primary format — ML training & data pipelines

.srt

SubRip — universal subtitle player support

.vtt

WebVTT — HTML5 <video> & streaming
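The .srt and .vtt files ship ready-made, but it can help to see how the three formats relate. The sketch below renders segment dictionaries (shaped like the JSONL sample) as SubRip and WebVTT text. SubRip separates milliseconds with a comma and numbers each cue; WebVTT uses a dot and starts with a `WEBVTT` header. The function names are hypothetical, and the output is not claimed to match the shipped files byte for byte.

```python
def format_timestamp(seconds: float, ms_separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_separator}{ms:03d}"


def to_srt(segments, text_key="source_text") -> str:
    """Render segments as SubRip: numbered cues, comma before milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg[text_key]}\n")
    return "\n".join(blocks)


def to_vtt(segments, text_key="source_text") -> str:
    """Render segments as WebVTT: header line, dot before milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg[text_key]}\n")
    return "\n".join(cues)


segments = [{"start": 0.0, "end": 4.23, "source_text": "We won't feel compelled..."}]
srt_text = to_srt(segments)
vtt_text = to_vtt(segments)
```

Passing `text_key="translated_text"` would produce subtitles in the target language instead, assuming that field is present on each segment.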

Dataset Scale

42K+

Samples

100K+

Audio Files

Word-Level Timestamps

Schema Reference

Data Fields

schema

── Per-file (top level) ──
audio_path        string  Relative path to WAV file
source_language   string  "english"
duration_seconds  float   Total audio length (sec)
num_segments      int     Number of aligned segments

── Per-segment ──
start             float   Segment start (sec)
end               float   Segment end (sec)
source_text       string  English transcription
translated_text   string  Aligned translation

── Per-word ──
word              string  Individual word
start             float   Word start (sec)
end               float   Word end (sec)
score             float   Alignment confidence (0–1)
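A quick sanity check against the schema can catch malformed records before training. The sketch below is one possible validator. The checks that words fall inside their segment and that segments end within `duration_seconds` are assumptions about the data, not guarantees stated on this page, so treat any reported problems as things to inspect rather than hard errors. `validate_record` is a hypothetical helper.

```python
from typing import List


def validate_record(record: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    segments = record.get("segments", [])
    duration = record.get("duration_seconds", float("inf"))

    # num_segments is optional here because the sample record omits it.
    if "num_segments" in record and record["num_segments"] != len(segments):
        problems.append("num_segments does not match the number of segments")

    for i, seg in enumerate(segments):
        if seg["start"] > seg["end"]:
            problems.append(f"segment {i}: start is after end")
        if seg["end"] > duration:
            problems.append(f"segment {i}: ends after duration_seconds")
        for j, word in enumerate(seg.get("words", [])):
            if not 0.0 <= word["score"] <= 1.0:
                problems.append(f"segment {i} word {j}: score outside 0-1")
            if word["start"] > word["end"]:
                problems.append(f"segment {i} word {j}: start is after end")
            # Assumption: word timings are expected to sit inside the segment.
            if word["start"] < seg["start"] or word["end"] > seg["end"]:
                problems.append(f"segment {i} word {j}: outside segment bounds")
    return problems


good = {
    "duration_seconds": 142.5,
    "num_segments": 1,
    "segments": [{
        "start": 0.0, "end": 4.23,
        "words": [{"word": "We", "start": 0.0, "end": 0.15, "score": 0.98}],
    }],
}
bad = {
    "duration_seconds": 3.0,
    "num_segments": 2,
    "segments": [{
        "start": 0.0, "end": 4.23,
        "words": [{"word": "We", "start": 0.0, "end": 0.15, "score": 1.2}],
    }],
}
good_problems = validate_record(good)
bad_problems = validate_record(bad)
```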

License — What You Can Do

Use freely in any project — commercial products, internal tools, SaaS, research papers

Train & fine-tune any model — no royalties, no attribution required in production

Modify & derive — transform, augment, merge with your own data

Perpetual license — no expiration, no recurring fees, yours forever

What's Included

Audio format: WAV, 16 kHz

Annotation: JSONL + SRT + VTT

Alignment level: Word + Segment

Delivery: Digital download

Source license: Documented per product

Built For

Use Cases

⚡

Real-Time Subtitles

Train streaming subtitle models with segment-level supervision and precise timing boundaries.

🌐

Simultaneous Translation

Segment boundaries define optimal "when to translate" points for live interpretation systems.
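One way to read segment boundaries as "when to translate" points is as a commit schedule: a system listens until a segment ends, then emits that segment's translation. The sketch below derives such a schedule from the timestamps. The wait-until-segment-end policy and the name `translation_schedule` are illustrative assumptions, not a description of how any particular interpretation system works.

```python
def translation_schedule(segments):
    """Return one (commit_time, source_text, wait_seconds) tuple per segment.

    commit_time is the segment end, i.e. the earliest moment the full
    segment has been heard; wait_seconds is how long the listener waited
    since the segment started.
    """
    schedule = []
    for seg in segments:
        wait = round(seg["end"] - seg["start"], 3)
        schedule.append((seg["end"], seg["source_text"], wait))
    return schedule


# Illustrative segments; the second one is made up for the example.
segments = [
    {"start": 0.0, "end": 4.23, "source_text": "We won't feel compelled..."},
    {"start": 4.5, "end": 7.0, "source_text": "(hypothetical next segment)"},
]
schedule = translation_schedule(segments)
max_wait = max(wait for _, _, wait in schedule)
```

The longest wait gives a rough upper bound on the latency this policy adds, which is one reason subtitle-sized segments are useful supervision for streaming systems.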

🎯

ASR Fine-Tuning

Word-level timestamps and confidence scores for training forced-alignment and recognition models.
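The per-word `score` field makes it possible to screen out weakly aligned material before fine-tuning. A minimal sketch follows, assuming a mean-score threshold per segment is an acceptable filter. The threshold of 0.9 and the helper names are arbitrary choices for illustration.

```python
def mean_word_score(segment: dict) -> float:
    """Average alignment confidence over a segment's words (0.0 if none)."""
    words = segment.get("words", [])
    if not words:
        return 0.0
    return sum(word["score"] for word in words) / len(words)


def keep_confident_segments(record: dict, min_mean_score: float = 0.9) -> list:
    """Return only the segments whose mean word score meets the threshold."""
    return [
        seg for seg in record.get("segments", [])
        if mean_word_score(seg) >= min_mean_score
    ]


record = {
    "segments": [
        {   # word scores taken from the sample record
            "source_text": "We won't feel compelled...",
            "words": [
                {"word": "We", "score": 0.98},
                {"word": "won't", "score": 0.95},
            ],
        },
        {   # hypothetical low-confidence segment for contrast
            "source_text": "(hypothetical noisy segment)",
            "words": [{"word": "uh", "score": 0.40}],
        },
    ]
}
kept = keep_confident_segments(record)
```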

📊

Translation Evaluation

Per-segment BLEU/COMET scoring — measure quality degradation across long audio inputs.
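To see how quality changes over the length of a recording, per-segment scores can be grouped by where each segment starts. The sketch below averages a metric per time bucket. `token_overlap` is a toy stand-in used only so the example runs on its own; it is not BLEU or COMET, and in practice you would pass in a real metric function. The hypotheses are assumed to come from your own system, one per segment.

```python
from collections import defaultdict


def token_overlap(hypothesis: str, reference: str) -> float:
    """Toy metric: fraction of reference tokens that appear in the hypothesis."""
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    hyp_tokens = set(hypothesis.split())
    return sum(token in hyp_tokens for token in ref_tokens) / len(ref_tokens)


def score_by_time_bucket(segments, hypotheses, metric, bucket_seconds=60.0):
    """Average metric(hypothesis, translated_text) per bucket of segment start time."""
    scores = defaultdict(list)
    for seg, hyp in zip(segments, hypotheses):
        bucket = int(seg["start"] // bucket_seconds)
        scores[bucket].append(metric(hyp, seg["translated_text"]))
    return {bucket: sum(vals) / len(vals) for bucket, vals in sorted(scores.items())}


# Made-up segments and system outputs, purely for illustration.
segments = [
    {"start": 0.0, "translated_text": "a b"},
    {"start": 70.0, "translated_text": "a b"},
]
hypotheses = ["a b", "a"]
bucket_scores = score_by_time_bucket(segments, hypotheses, token_overlap)
```

A falling average in later buckets would suggest degradation on long inputs.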

