Developer API

Use VoiceToTextOnline from your scripts, automations, and AI agents. REST APIs with Bearer token auth and JSON responses — speech-to-text, text-to-speech, and YouTube transcript extraction without building the infrastructure.


Authentication

All requests require a Bearer token in the Authorization header. Generate your key at Dashboard → API Keys. Keys begin with v2t_live_ and are shown only once — store them securely.

auth header

Authorization: Bearer v2t_live_<your-key>
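A minimal Python sketch of building this header without hard-coding the key. The environment variable name `V2T_API_KEY` and the helper `auth_headers` are assumptions made for illustration; only the Bearer scheme and the `v2t_live_` prefix come from the text above.

```python
import os


def auth_headers(api_key=None):
    """Return the Authorization header for a VoiceToTextOnline API key.

    Falls back to the V2T_API_KEY environment variable (a hypothetical name)
    so the key does not have to live in source code.
    """
    key = api_key or os.environ.get("V2T_API_KEY", "")
    # Keys are documented to begin with "v2t_live_"; catch obvious mistakes early.
    if not key.startswith("v2t_live_"):
        raise ValueError("expected an API key starting with 'v2t_live_'")
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed as `headers=` to `requests.post` in the examples further down.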

200: Request succeeded

401: Missing or invalid API key

429: Rate limit or quota reached

400: Bad request; check field names and values

404: Resource not found (e.g. no captions)

502 / 504: Upstream provider error or timeout
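One way a client might branch on these status codes, sketched in Python. The labels and the `retry_soon` flags are a judgment call for illustration (upstream errors are often transient, the others usually are not) and are not guidance from the API itself.

```python
# Status codes listed above, mapped to a short label and a suggested reaction.
# The retry flags are an assumption, not documented API behaviour.
STATUS_ACTIONS = {
    200: ("ok", False),
    400: ("bad_request", False),    # fix field names or values first
    401: ("unauthorized", False),   # check the API key
    404: ("not_found", False),      # e.g. the video has no captions
    429: ("rate_limited", False),   # wait for the limit or quota to reset
    502: ("upstream_error", True),
    504: ("upstream_timeout", True),
}


def classify_status(status_code):
    """Return (label, retry_soon) for an HTTP status code."""
    return STATUS_ACTIONS.get(status_code, ("unexpected", False))
```

For example, `classify_status(resp.status_code)` after any of the calls below tells a script whether a quick retry is worth attempting.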

Endpoints

POST /api/v1/youtube-transcript (live)

Extract a transcript from any YouTube video. Returns plain text or SRT subtitle format. No credits charged — rate limited to 20 requests per key per day.

POST /api/v1/tts (live)

Convert text to speech using Google Neural voices. Returns base64-encoded MP3. Uses your existing TTS character quota (shared with the web UI).

POST /api/v1/transcribe (live)

Upload an audio or video file for AI transcription via AssemblyAI. Returns transcript text, confidence, duration, word count, and optional speaker-labelled utterances.

YouTube Transcript

Rate limit: 20 requests / key / day

Credits: None charged

Formats: text (default), srt

Method: POST

Request body

json

{
  "url": "https://youtube.com/watch?v=...",  // required — full URL or 11-char video ID
  "format": "text"                           // optional — "text" (default) or "srt"
}
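A hedged Python sketch of validating the request body before sending it, which avoids spending one of the 20 daily requests on a 400. The helper name `build_transcript_request` is hypothetical, and the checks are client-side assumptions: the server may accept or reject inputs differently.

```python
import re

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
ALLOWED_FORMATS = {"text", "srt"}


def build_transcript_request(url_or_id, fmt="text"):
    """Build the JSON body for POST /api/v1/youtube-transcript."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    value = url_or_id.strip()
    looks_like_id = VIDEO_ID_RE.fullmatch(value) is not None
    looks_like_url = value.startswith(("http://", "https://"))
    if not (looks_like_id or looks_like_url):
        raise ValueError("expected a full URL or an 11-character video ID")
    return {"url": value, "format": fmt}
```

The result can be passed as `json=` to `requests.post`, as in the Python example below.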

Examples

curl

curl -X POST https://voicetotextonline.com/api/v1/youtube-transcript \
  -H "Authorization: Bearer v2t_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'

javascript

const res = await fetch('https://voicetotextonline.com/api/v1/youtube-transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://youtube.com/watch?v=dQw4w9WgXcQ' }),
})
const data = await res.json()
// data.transcript — plain text
// data.language   — detected language code

python

import requests

resp = requests.post(
    'https://voicetotextonline.com/api/v1/youtube-transcript',
    headers={'Authorization': 'Bearer v2t_live_...'},
    json={'url': 'https://youtube.com/watch?v=dQw4w9WgXcQ'},
)
data = resp.json()
print(data['transcript'])

Response

json

{
  "success": true,
  "videoId": "dQw4w9WgXcQ",
  "language": "en",
  "transcript": "We're no strangers to love...",
  "format": "text",
  "wordCount": 312
}

Text to Speech

Voices: 2,000+ Google voices

Output: Base64 MP3

Quota: Shared with web UI

Method: POST

Request body

json

{
  "text": "Hello from VoiceToTextOnline.",  // required
  "voiceName": "en-US-Neural2-J",           // required — see GET /api/tts/voices
  "languageCode": "en-US",                  // required — BCP-47 language code
  "speed": 1.0                              // optional — 0.25 to 4.0, default 1.0
}

To list available voices and their language codes, call GET /api/tts/voices (no auth required). Quota is shared with your web UI usage — the same monthly character allowance applies.
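Text longer than the per-request character limit (500 on free accounts, 2,000 on paid, per the Rate Limits & Quotas section) has to be sent in pieces. The Python sketch below splits text on whitespace and builds one request body per chunk. Both helper names are hypothetical, and it assumes the API counts characters the same way Python's `len()` does, which the documentation does not confirm.

```python
def split_text(text, max_chars):
    """Split text on whitespace into chunks of at most max_chars characters.

    Runs of whitespace (including newlines) are collapsed to single spaces.
    """
    chunks, current = [], ""
    for word in text.split():
        if len(word) > max_chars:
            raise ValueError("a single word exceeds max_chars")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def build_tts_payloads(text, voice_name, language_code, speed=1.0, max_chars=500):
    """Return one /api/v1/tts request body per chunk of text."""
    # The documented range for speed is 0.25 to 4.0.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return [
        {
            "text": chunk,
            "voiceName": voice_name,
            "languageCode": language_code,
            "speed": speed,
        }
        for chunk in split_text(text, max_chars)
    ]
```

Each payload still counts against the shared monthly character allowance, so splitting works around the per-request limit only.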

Examples

curl

curl -X POST https://voicetotextonline.com/api/v1/tts \
  -H "Authorization: Bearer v2t_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from VoiceToTextOnline.",
    "voiceName": "en-US-Neural2-J",
    "languageCode": "en-US",
    "speed": 1.0
  }'

javascript

const res = await fetch('https://voicetotextonline.com/api/v1/tts', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hello from VoiceToTextOnline.',
    voiceName: 'en-US-Neural2-J',
    languageCode: 'en-US',
    speed: 1.0,
  }),
})
const data = await res.json()
// data.audioBase64 — MP3 as base64 string

python

import requests, base64

resp = requests.post(
    'https://voicetotextonline.com/api/v1/tts',
    headers={'Authorization': 'Bearer v2t_live_...'},
    json={
        'text': 'Hello from VoiceToTextOnline.',
        'voiceName': 'en-US-Neural2-J',
        'languageCode': 'en-US',
        'speed': 1.0,
    },
)
data = resp.json()
audio = base64.b64decode(data['audioBase64'])
with open('output.mp3', 'wb') as f:
    f.write(audio)

Response

json

{
  "success": true,
  "audioBase64": "//NExAA...",
  "format": "mp3",
  "charactersUsed": 29,
  "charsUsed": 1234,
  "charsLimit": 10000,
  "charsRemaining": 8766,
  "voiceName": "en-US-Neural2-J",
  "languageCode": "en-US"
}
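In this sample, `charsUsed` plus `charsRemaining` equals `charsLimit` (1234 + 8766 = 10000), and `charactersUsed` matches the length of the 29-character sample text, so it appears to be the count for this request alone. That reading is an inference from the sample, not a documented guarantee. A small Python sketch of unpacking the response under that assumption follows; the helper name and the `error` fallback on non-429 failures are hypothetical.

```python
import base64


def decode_tts_response(data):
    """Return (mp3_bytes, chars_remaining) from a parsed /api/v1/tts response."""
    if not data.get("success"):
        # An "error" field is documented for 429 responses; assuming other
        # failures use the same field is a guess.
        raise RuntimeError(f"TTS request failed: {data.get('error', 'unknown error')}")
    audio = base64.b64decode(data["audioBase64"])
    return audio, data["charsRemaining"]
```

Checking the returned remaining count before sending the next chunk lets a script stop before it runs into a 429.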

Speech to Text

Input: Multipart file

Max file size: 25 MB (credits) · 100 MB (paid plans)

Usage: Rounded-up minutes

Method: POST

Form fields

multipart/form-data

file=@meeting.mp3           // required — audio or video file
language=en                  // optional — AssemblyAI language code
speaker_labels=true          // optional — default true

This endpoint transcribes the uploaded file in memory and returns JSON directly. It does not store the uploaded file or add the result to dashboard history.
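Because the file is uploaded in full before anything is returned, checking its size locally can save a wasted upload. A Python sketch under stated assumptions: the limits are read as binary megabytes, the plan labels `"credits"` and `"paid"` are shorthand taken from the summary above, and sending `"false"` to turn off speaker labels is a guess (only `true` is documented).

```python
MB = 1024 * 1024  # assumption: the documented limits are binary megabytes
MAX_UPLOAD_BYTES = {"credits": 25 * MB, "paid": 100 * MB}


def check_upload_size(size_bytes, plan="credits"):
    """Raise ValueError if size_bytes exceeds the upload limit for the plan."""
    limit = MAX_UPLOAD_BYTES[plan]
    if size_bytes > limit:
        raise ValueError(f"file is {size_bytes} bytes; the {plan} limit is {limit}")
    return size_bytes


def build_form_fields(language=None, speaker_labels=True):
    """Return the non-file multipart fields as strings."""
    fields = {"speaker_labels": "true" if speaker_labels else "false"}
    if language:
        fields["language"] = language
    return fields
```

A script would call `check_upload_size(os.path.getsize("meeting.mp3"))` first, then pass `build_form_fields(language="en")` as `data=` alongside `files=` in the Python example below.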

Examples

curl

curl -X POST https://voicetotextonline.com/api/v1/transcribe \
  -H "Authorization: Bearer v2t_live_..." \
  -F "file=@meeting.mp3" \
  -F "speaker_labels=true"

javascript

const form = new FormData()
form.append('file', fileInput.files[0])
form.append('speaker_labels', 'true')

const res = await fetch('https://voicetotextonline.com/api/v1/transcribe', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer v2t_live_...',
  },
  body: form,
})
const data = await res.json()
// data.text — transcript text
// data.utterances — speaker-labelled segments, when available

python

import requests

with open('meeting.mp3', 'rb') as f:
    resp = requests.post(
        'https://voicetotextonline.com/api/v1/transcribe',
        headers={'Authorization': 'Bearer v2t_live_...'},
        files={'file': f},
        data={'speaker_labels': 'true'},
    )

data = resp.json()
print(data['text'])

Response

json

{
  "success": true,
  "filename": "meeting.mp3",
  "text": "Thanks everyone for joining...",
  "language": "en",
  "confidence": 0.94,
  "durationSeconds": 184,
  "wordCount": 512,
  "speakerLabels": true,
  "words": [],
  "utterances": [],
  "usage": {
    "tier": "starter",
    "durationMinutes": 4,
    "monthlyMinutesUsed": 18,
    "monthlyMinutesLimit": 200
  }
}
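The sample is consistent with usage being rounded up to whole minutes: 184 seconds is just over 3 minutes and is reported as `durationMinutes: 4`. The Python sketch below reproduces that arithmetic. It assumes rounding is a plain ceiling of seconds divided by 60, and it does not settle whether `monthlyMinutesUsed` already includes the current request, which the sample does not show.

```python
import math


def billed_minutes(duration_seconds):
    """Round a duration in seconds up to whole minutes."""
    return math.ceil(duration_seconds / 60)


def minutes_remaining(usage):
    """Minutes left this month, from the response's usage object."""
    return usage["monthlyMinutesLimit"] - usage["monthlyMinutesUsed"]
```

Estimating `billed_minutes` from a local file's duration before uploading gives a rough idea of how much of the monthly allowance a request will consume.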

Rate Limits & Quotas

/api/v1/youtube-transcript

Limit: 20 requests per key per day

Credits: None charged

Reset: Midnight UTC

/api/v1/tts

Limit: Free 500 chars/req · Paid 2,000 chars/req

Credits: Deducted from monthly character quota

Reset: 1st of each month

/api/v1/transcribe

Limit: Credits 10 req/day · Starter 20 req/day · Pro 50 req/day

Usage: Rounded-up transcription minutes

Reset: Daily request count resets at midnight UTC

When a limit is hit, the API returns 429 with an error field describing what was exceeded. Quotas are tied to the user account associated with the API key.
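For the daily limits, retrying immediately after a 429 will not help; waiting for the midnight UTC reset will. A Python sketch of computing that wait and surfacing the `error` field follows. Both function names are hypothetical, and the helper applies only to the daily request limits, not to the monthly TTS character quota.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now=None):
    """Whole seconds from now until the next midnight UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


def describe_rate_limit(status_code, body, now=None):
    """Return a readable message for a 429 response, or None otherwise."""
    if status_code != 429:
        return None
    reason = body.get("error", "rate limit or quota reached")
    wait = seconds_until_utc_midnight(now)
    return f"{reason} (daily limits reset in {wait} s)"
```

A batch job could log this message and sleep or reschedule rather than continuing to send requests that will be rejected.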

MCP Server — for AI Agents

If you are building with Claude, Cursor, or Windsurf, use the MCP server instead of the REST API. Your AI agent calls VoiceToTextOnline tools natively — no fetch calls, no response parsing, no glue code. Same API key, same quota.

claude_desktop_config.json / .cursor/mcp.json

{
  "mcpServers": {
    "voicetotextonline": {
      "url": "https://voicetotextonline.com/api/mcp",
      "headers": {
        "Authorization": "Bearer v2t_live_..."
      }
    }
  }
}
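If the config file already lists other MCP servers, the entry above should be merged in rather than pasted over them. A minimal Python sketch of that merge on an already-parsed config dict; the helper name `add_mcp_server` is hypothetical, and reading and writing the file itself is left to the caller.

```python
def add_mcp_server(config, api_key):
    """Return a copy of an MCP client config with the voicetotextonline entry set.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["voicetotextonline"] = {
        "url": "https://voicetotextonline.com/api/mcp",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
    updated["mcpServers"] = servers
    return updated
```

Load the existing file with `json.load`, pass the result through `add_mcp_server`, and write it back with `json.dump(..., indent=2)`.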

youtube_transcript (live)

url, format? → plain text transcript

text_to_speech (live)

text, voiceName, languageCode, speed? → base64 MP3

transcribe (soon)

Use REST POST /api/v1/transcribe for audio/video files

Full MCP server documentation →

Why Build with VoiceToTextOnline?

No infra to manage

Google Cloud TTS, AssemblyAI, and YouTube caption extraction handled for you. One API, one key.

Works in any language

60+ languages for TTS. YouTube transcripts in whatever language the video's captions are in.

Designed for agents

All responses are clean JSON. Consistent error codes, predictable schema. Easy to chain into LLM pipelines and automations.

Same quota as the UI

API usage and web UI usage share the same character balance. No double billing, no separate API quota to track.

Ready to build?

API usage draws from your existing plan and credits — no separate API subscription required. Free accounts get 2 API keys with no credit card.


Questions? hello@voicetotextonline.com