Developer API
Use VoiceToTextOnline from your scripts, automations, and AI agents. REST APIs with Bearer token auth and JSON responses — speech-to-text, text-to-speech, and YouTube transcript extraction without building the infrastructure.
Authentication
All requests require a Bearer token in the Authorization header. Generate your key at Dashboard → API Keys. Keys begin with v2t_live_ and are shown only once — store them securely.
Authorization: Bearer v2t_live_<your-key>
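One way to keep the key out of source code is to read it from the environment and build the header from there. The sketch below assumes an environment variable named V2T_API_KEY and a helper called auth_headers; both names are hypothetical choices for this example, not part of the API. Only the v2t_live_ prefix comes from the documentation above.

```python
import os
from typing import Dict, Optional

KEY_PREFIX = "v2t_live_"  # prefix documented above


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for a request.

    Falls back to the V2T_API_KEY environment variable (a name assumed
    for this sketch) when no key is passed explicitly.
    """
    key = api_key if api_key is not None else os.environ.get("V2T_API_KEY", "")
    if not key.startswith(KEY_PREFIX):
        # Catch an empty or mistyped key before any request is sent.
        raise ValueError("API key should start with " + KEY_PREFIX)
    return {"Authorization": "Bearer " + key}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.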
Endpoints
POST /api/v1/youtube-transcript (live)
Extract a transcript from any YouTube video. Returns plain text or SRT subtitle format. No credits charged; rate limited to 20 requests per key per day.
POST /api/v1/tts (live)
Convert text to speech using Google Neural voices. Returns base64-encoded MP3. Uses your existing TTS character quota (shared with the web UI).
POST /api/v1/transcribe (live)
Upload an audio or video file for AI transcription via AssemblyAI. Returns transcript text, confidence, duration, word count, and optional speaker-labelled utterances.
YouTube Transcript
Request body
{
  "url": "https://youtube.com/watch?v=...",  // required: full URL or 11-char video ID
  "format": "text"                           // optional: "text" (default) or "srt"
}
Examples
curl -X POST https://voicetotextonline.com/api/v1/youtube-transcript \
  -H "Authorization: Bearer v2t_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

const res = await fetch('https://voicetotextonline.com/api/v1/youtube-transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://youtube.com/watch?v=dQw4w9WgXcQ' }),
})
const data = await res.json()
// data.transcript: plain text
// data.language: detected language code

import requests
resp = requests.post(
    'https://voicetotextonline.com/api/v1/youtube-transcript',
    headers={'Authorization': 'Bearer v2t_live_...'},
    json={'url': 'https://youtube.com/watch?v=dQw4w9WgXcQ'},
)
data = resp.json()
print(data['transcript'])
Response
{
  "success": true,
  "videoId": "dQw4w9WgXcQ",
  "language": "en",
  "transcript": "We're no strangers to love...",
  "format": "text",
  "wordCount": 312
}
Text to Speech
Request body
{
  "text": "Hello from VoiceToTextOnline.",  // required
  "voiceName": "en-US-Neural2-J",           // required: see GET /api/tts/voices
  "languageCode": "en-US",                  // required: BCP-47 language code
  "speed": 1.0                              // optional: 0.25 to 4.0, default 1.0
}
To list available voices and their language codes, call GET /api/tts/voices (no auth required). Quota is shared with your web UI usage: the same monthly character allowance applies.
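Because every request draws on the shared character allowance, it can help to validate a request body locally before sending it. The following is a minimal sketch, assuming a hypothetical helper named build_tts_payload; the field names and the 0.25 to 4.0 speed range are taken from the request body above, while the checks themselves are this example's own and may not match the server's validation exactly.

```python
from typing import Any, Dict

MIN_SPEED, MAX_SPEED = 0.25, 4.0  # range stated in the request body above


def build_tts_payload(text: str, voice_name: str, language_code: str,
                      speed: float = 1.0) -> Dict[str, Any]:
    """Check the documented fields and return a JSON-ready request body."""
    if not text:
        raise ValueError("text is required")
    if not voice_name or not language_code:
        raise ValueError("voiceName and languageCode are required")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "text": text,
        "voiceName": voice_name,
        "languageCode": language_code,
        "speed": speed,
    }


payload = build_tts_payload("Hello from VoiceToTextOnline.", "en-US-Neural2-J", "en-US")
print(len(payload["text"]))  # prints 29, the length of this sample text
```

The resulting dictionary can be sent as the JSON body of the POST request. Checking len(text) against your remaining allowance is only a rough guide, since how the server counts characters for every input is not specified here.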
Examples
curl -X POST https://voicetotextonline.com/api/v1/tts \
  -H "Authorization: Bearer v2t_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from VoiceToTextOnline.",
    "voiceName": "en-US-Neural2-J",
    "languageCode": "en-US",
    "speed": 1.0
  }'

const res = await fetch('https://voicetotextonline.com/api/v1/tts', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hello from VoiceToTextOnline.',
    voiceName: 'en-US-Neural2-J',
    languageCode: 'en-US',
    speed: 1.0,
  }),
})
const data = await res.json()
// data.audioBase64: MP3 as base64 string

import requests, base64
resp = requests.post(
    'https://voicetotextonline.com/api/v1/tts',
    headers={'Authorization': 'Bearer v2t_live_...'},
    json={
        'text': 'Hello from VoiceToTextOnline.',
        'voiceName': 'en-US-Neural2-J',
        'languageCode': 'en-US',
        'speed': 1.0,
    },
)
data = resp.json()
audio = base64.b64decode(data['audioBase64'])
with open('output.mp3', 'wb') as f:
    f.write(audio)
Response
{
  "success": true,
  "audioBase64": "//NExAA...",
  "format": "mp3",
  "charactersUsed": 29,
  "charsUsed": 1234,
  "charsLimit": 10000,
  "charsRemaining": 8766,
  "voiceName": "en-US-Neural2-J",
  "languageCode": "en-US"
}
Speech to Text
Form fields
file=@meeting.mp3     // required: audio or video file
language=en           // optional: AssemblyAI language code
speaker_labels=true   // optional: default true
This endpoint transcribes the uploaded file in memory and returns JSON directly. It does not store the uploaded file or add the result to dashboard history.
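Since the result is not kept in dashboard history, callers who need it later have to persist it themselves. A minimal sketch of one approach follows: it writes the JSON response to a file next to the audio. The helper name save_transcript and the .transcript.json naming scheme are assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_transcript(result: Dict[str, Any], audio_path: str) -> Path:
    """Write the API's JSON response next to the audio file and return the new path."""
    source = Path(audio_path)
    # meeting.mp3 -> meeting.transcript.json (naming chosen for this sketch)
    out_path = source.with_name(source.stem + ".transcript.json")
    out_path.write_text(
        json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

With the requests library, this could be called as save_transcript(resp.json(), 'meeting.mp3') right after the upload completes.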
Examples
curl -X POST https://voicetotextonline.com/api/v1/transcribe \
  -H "Authorization: Bearer v2t_live_..." \
  -F "file=@meeting.mp3" \
  -F "speaker_labels=true"

const form = new FormData()
form.append('file', fileInput.files[0])
form.append('speaker_labels', 'true')
const res = await fetch('https://voicetotextonline.com/api/v1/transcribe', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
  },
  body: form,
})
const data = await res.json()
// data.text: transcript text
// data.utterances: speaker-labelled segments, when available

import requests
with open('meeting.mp3', 'rb') as f:
    resp = requests.post(
        'https://voicetotextonline.com/api/v1/transcribe',
        headers={'Authorization': 'Bearer v2t_live_...'},
        files={'file': f},
        data={'speaker_labels': 'true'},
    )
data = resp.json()
print(data['text'])
Response
{
  "success": true,
  "filename": "meeting.mp3",
  "text": "Thanks everyone for joining...",
  "language": "en",
  "confidence": 0.94,
  "durationSeconds": 184,
  "wordCount": 512,
  "speakerLabels": true,
  "words": [],
  "utterances": [],
  "usage": {
    "tier": "starter",
    "durationMinutes": 4,
    "monthlyMinutesUsed": 18,
    "monthlyMinutesLimit": 200
  }
}
Rate Limits & Quotas
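/api/v1/youtube-transcript: 20 requests per key per day; no credits charged.
/api/v1/tts: draws on your monthly TTS character quota, shared with the web UI.
/api/v1/transcribe: draws on your plan's monthly transcription minutes, as reported in the usage object of the response.
When a limit is hit, the API returns 429 with an error field describing what was exceeded. Quotas are tied to the user account associated with the API key.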
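A client can surface the 429 case explicitly instead of failing later on a missing field. The sketch below assumes a hypothetical exception class QuotaExceeded and helper check_response; the only behaviour taken from the documentation is the 429 status and the error field in the body.

```python
from typing import Any, Dict


class QuotaExceeded(Exception):
    """Raised when the API answers 429 (a rate limit or quota was hit)."""


def check_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the body unchanged, or raise QuotaExceeded on a 429 status."""
    if status_code == 429:
        # The docs say the body carries an "error" field naming the exceeded limit.
        raise QuotaExceeded(body.get("error", "limit exceeded (no error field in body)"))
    return body
```

With the requests library this would be called as check_response(resp.status_code, resp.json()). Because the documented limits are daily or monthly, an immediate retry is unlikely to succeed, so the sketch raises rather than retrying.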
MCP Server — for AI Agents
If you are building with Claude, Cursor, or Windsurf, use the MCP server instead of the REST API. Your AI agent calls VoiceToTextOnline tools natively — no fetch calls, no response parsing, no glue code. Same API key, same quota.
{
  "mcpServers": {
    "voicetotextonline": {
      "url": "https://voicetotextonline.com/api/mcp",
      "headers": {
        "Authorization": "Bearer v2t_live_..."
      }
    }
  }
}
youtube_transcript (live): url, format? → plain text transcript
text_to_speech (live): text, voiceName, languageCode, speed? → base64 MP3
transcribe (soon): use REST POST /api/v1/transcribe for audio/video files
Why Build with VoiceToTextOnline?
No infra to manage
Google Cloud TTS, AssemblyAI, and YouTube caption extraction are handled for you behind one API and one key.
Works in any language
60+ languages for TTS. YouTube transcripts in whatever language the video was recorded in.
Designed for agents
All responses are clean JSON. Consistent error codes, predictable schema. Easy to chain into LLM pipelines and automations.
Same quota as the UI
API usage and web UI usage share the same character balance. No double billing, no separate API quota to track.
Ready to build?
API usage draws from your existing plan and credits — no separate API subscription required. Free accounts get 2 API keys with no credit card.
Questions? hello@voicetotextonline.com