{"title":"Ada","description":"","products":[{"product_id":"ada-dataset-part1","title":"ada-dataset vol.001","description":"\u003cdiv class=\"ds-product-desc\"\u003e\n\u003cp\u003eTimestamp-aligned English speech-to-text translation data, delivered on a standard CD-ROM. Each disc contains 5–6 long-form English audio recordings (WAV, 16kHz mono) with word-level and segment-level alignment annotations — ready for real-time subtitle training, simultaneous translation research, and ASR fine-tuning.\u003c\/p\u003e\n\u003ch3\u003eWhat's on the disc\u003c\/h3\u003e\n\u003ctable style=\"width: 100.087%; height: 97.9689px;\"\u003e\n\u003ctbody\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eAudio\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003e5–6 WAV files, ~10 min each, totaling ~50–60 min per disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 39.1875px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 39.1875px;\"\u003e\u003cstrong\u003eAnnotations\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 39.1875px;\"\u003eJSONL — segment-level timestamps, source transcript, aligned translation, word-level timing with confidence scores\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eSubtitles\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003eSRT + VTT — bilingual subtitle files for every audio track\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eCapacity\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003e~580–700 MB per 
disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003ch3\u003eData structure per audio file\u003c\/h3\u003e\n\u003cpre\u003e\u003ccode\u003e{\n  \"audio_path\": \"wavs\/013429.wav\",\n  \"source_language\": \"english\",\n  \"duration_seconds\": 142.5,\n  \"segments\": [\n    {\n      \"start\": 0.000,\n      \"end\": 4.230,\n      \"source_text\": \"We won't feel compelled...\",\n      \"words\": [\n        {\"word\": \"We\", \"start\": 0.00, \"end\": 0.15, \"score\": 0.98},\n        {\"word\": \"won't\", \"start\": 0.18, \"end\": 0.42, \"score\": 0.95}\n      ]\n    }\n  ]\n}\n\u003c\/code\u003e\u003c\/pre\u003e\n\u003ch3\u003eSeries \u0026amp; catalog\u003c\/h3\u003e\n\u003cp\u003eThis disc is part of an ongoing CD series. Each volume is a self-contained dataset — no other volumes are required. New volumes ship regularly, covering different speech domains: business meetings, news broadcasts, lectures, and casual conversation. Collect them individually or build a comprehensive corpus over time.\u003c\/p\u003e\n\u003ch3\u003eLicense\u003c\/h3\u003e\n\u003cp\u003eOnce you purchase this CD, the data is yours to use — freely and permanently.\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eCommercial use\u003c\/strong\u003e — products, SaaS, internal tools, client work\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eModel training\u003c\/strong\u003e — fine-tune, distill, or train from scratch. 
No royalties\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eModify \u0026amp; derive\u003c\/strong\u003e — transform, augment, merge with your own datasets\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNo expiration\u003c\/strong\u003e — perpetual license, no recurring fees, no strings attached\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eSource dataset license is documented in the included LICENSE file on each disc.\u003c\/p\u003e\n\u003ch3\u003eSpecs\u003c\/h3\u003e\n\u003ctable style=\"width: 100.087%; height: 137.157px;\"\u003e\n\u003ctbody\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eMedia\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eCD-ROM (700 MB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAudio format\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eWAV 16kHz 16-bit mono\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAudio length\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003e~50–60 min per disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAnnotation\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eJSONL + SRT + VTT\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAlignment\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eWord-level + segment-level timestamps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eShipping\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 
19.5938px;\"\u003ePhysical CD\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003ch3\u003eWho this is for\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eML engineers building real-time subtitle or live translation systems\u003c\/li\u003e\n\u003cli\u003eResearchers benchmarking long-form speech translation models\u003c\/li\u003e\n\u003cli\u003eTeams training or evaluating ASR with forced-alignment ground truth\u003c\/li\u003e\n\u003cli\u003eAnyone who wants clean, timestamp-aligned English speech data they actually own\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3\u003eData structure per audio file (with Japanese translation)\u003c\/h3\u003e\n\u003cpre\u003e\u003ccode\u003e{\n  \"audio_path\": \"wavs\/013429.wav\",\n  \"source_language\": \"english\",\n  \"target_language\": \"japanese\",\n  \"duration_seconds\": 142.5,\n  \"segments\": [\n    {\n      \"start\": 0.000,\n      \"end\": 4.230,\n      \"source_text\": \"We won't feel compelled...\",\n      \"target_text\": \"私たちは強制されることはないだろう…\",\n      \"words\": [\n        {\"word\": \"We\", \"start\": 0.00, \"end\": 0.15, \"score\": 0.98},\n        {\"word\": \"won't\", \"start\": 0.18, \"end\": 0.42, \"score\": 0.95}\n      ]\n    }\n  
]\n}\u003c\/code\u003e\u003c\/pre\u003e\n\u003c\/div\u003e","brand":"thunder出版","offers":[{"title":"JA","offer_id":53163373035819,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true},{"title":"CN","offer_id":53163373068587,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true},{"title":"TW","offer_id":53176208064811,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true}]},{"product_id":"ada-dataset-vol-002","title":"ada-dataset vol.002","description":"\u003cdiv class=\"ds-product-desc\"\u003e\n\u003cp\u003eTimestamp-aligned English speech-to-text translation data, delivered on a standard CD-ROM. Each disc contains 5–6 long-form English audio recordings (WAV, 16kHz mono) with word-level and segment-level alignment annotations — ready for real-time subtitle training, simultaneous translation research, and ASR fine-tuning.\u003c\/p\u003e\n\u003ch3\u003eWhat's on the disc\u003c\/h3\u003e\n\u003ctable style=\"width: 100.087%; height: 97.9689px;\"\u003e\n\u003ctbody\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eAudio\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003e5–6 WAV files, ~10 min each, totaling ~50–60 min per disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 39.1875px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 39.1875px;\"\u003e\u003cstrong\u003eAnnotations\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 39.1875px;\"\u003eJSONL — segment-level timestamps, source transcript, aligned translation, word-level timing with confidence scores\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eSubtitles\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003eSRT + VTT — bilingual subtitle files for every 
audio track\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 14.5689%; height: 19.5938px;\"\u003e\u003cstrong\u003eCapacity\u003c\/strong\u003e\u003c\/td\u003e\n\u003ctd style=\"width: 82.5716%; height: 19.5938px;\"\u003e~580–700 MB per disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003ch3\u003eData structure per audio file\u003c\/h3\u003e\n\u003cpre\u003e\u003ccode\u003e{\n  \"audio_path\": \"wavs\/013429.wav\",\n  \"source_language\": \"english\",\n  \"duration_seconds\": 142.5,\n  \"segments\": [\n    {\n      \"start\": 0.000,\n      \"end\": 4.230,\n      \"source_text\": \"We won't feel compelled...\",\n      \"words\": [\n        {\"word\": \"We\", \"start\": 0.00, \"end\": 0.15, \"score\": 0.98},\n        {\"word\": \"won't\", \"start\": 0.18, \"end\": 0.42, \"score\": 0.95}\n      ]\n    }\n  ]\n}\n\u003c\/code\u003e\u003c\/pre\u003e\n\u003ch3\u003eSeries \u0026amp; catalog\u003c\/h3\u003e\n\u003cp\u003eThis disc is part of an ongoing CD series. Each volume is a self-contained dataset — no other volumes are required. New volumes ship regularly, covering different speech domains: business meetings, news broadcasts, lectures, and casual conversation. Collect them individually or build a comprehensive corpus over time.\u003c\/p\u003e\n\u003ch3\u003eLicense\u003c\/h3\u003e\n\u003cp\u003eOnce you purchase this CD, the data is yours to use — freely and permanently.\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eCommercial use\u003c\/strong\u003e — products, SaaS, internal tools, client work\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eModel training\u003c\/strong\u003e — fine-tune, distill, or train from scratch. 
No royalties\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eModify \u0026amp; derive\u003c\/strong\u003e — transform, augment, merge with your own datasets\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNo expiration\u003c\/strong\u003e — perpetual license, no recurring fees, no strings attached\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eSource dataset license is documented in the included LICENSE file on each disc.\u003c\/p\u003e\n\u003ch3\u003eSpecs\u003c\/h3\u003e\n\u003ctable style=\"width: 100.087%; height: 137.157px;\"\u003e\n\u003ctbody\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eMedia\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eCD-ROM (700 MB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAudio format\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eWAV 16kHz 16-bit mono\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAudio length\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003e~50–60 min per disc\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAnnotation\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eJSONL + SRT + VTT\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eAlignment\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 19.5938px;\"\u003eWord-level + segment-level timestamps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"height: 19.5938px;\"\u003e\n\u003ctd style=\"width: 23.9466%; height: 19.5938px;\"\u003eShipping\u003c\/td\u003e\n\u003ctd style=\"width: 73.1938%; height: 
19.5938px;\"\u003ePhysical CD\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003ch3\u003eWho this is for\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eML engineers building real-time subtitle or live translation systems\u003c\/li\u003e\n\u003cli\u003eResearchers benchmarking long-form speech translation models\u003c\/li\u003e\n\u003cli\u003eTeams training or evaluating ASR with forced-alignment ground truth\u003c\/li\u003e\n\u003cli\u003eAnyone who wants clean, timestamp-aligned English speech data they actually own\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3\u003eData structure per audio file (with Japanese translation)\u003c\/h3\u003e\n\u003cpre\u003e\u003ccode\u003e{\n  \"audio_path\": \"wavs\/013429.wav\",\n  \"source_language\": \"english\",\n  \"target_language\": \"japanese\",\n  \"duration_seconds\": 142.5,\n  \"segments\": [\n    {\n      \"start\": 0.000,\n      \"end\": 4.230,\n      \"source_text\": \"We won't feel compelled...\",\n      \"target_text\": \"私たちは強制されることはないだろう…\",\n      \"words\": [\n        {\"word\": \"We\", \"start\": 0.00, \"end\": 0.15, \"score\": 0.98},\n        {\"word\": \"won't\", \"start\": 0.18, \"end\": 0.42, \"score\": 0.95}\n      ]\n    }\n  
]\n}\u003c\/code\u003e\u003c\/pre\u003e\n\u003c\/div\u003e","brand":"thunder出版","offers":[{"title":"JA","offer_id":53299302629675,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true},{"title":"CN","offer_id":53299302662443,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true},{"title":"TW","offer_id":53299302695211,"sku":null,"price":11000.0,"currency_code":"JPY","in_stock":true}]}],"url":"https:\/\/thunder-publication.com\/collections\/ada.oembed","provider":"thunder publication","version":"1.0","type":"link"}