What is the VoiceToTextOnline MCP server?

It is a Model Context Protocol (MCP) server that lets AI tools like Claude Desktop, Cursor, and Windsurf call VoiceToTextOnline tools — such as YouTube transcript extraction and text-to-speech — directly, without writing any fetch() code.

How do I add VoiceToTextOnline to Claude Desktop via MCP?

Generate a free API key at voicetotextonline.com/dashboard/api-keys, then add an entry to your claude_desktop_config.json with the server URL https://voicetotextonline.com/api/mcp and your v2t_live_ Bearer token. Claude will then be able to call youtube_transcript and text_to_speech tools natively.

What MCP tools are available?

Currently two tools are live: youtube_transcript (extracts captions from any YouTube video, up to 20 per day) and text_to_speech (converts text to MP3 using Google Neural voices). Audio file transcription is coming soon.

MCP Server for Claude, Cursor & AI Agents

Beta

Give your AI agent native access to YouTube transcript extraction and text-to-speech, with audio file transcription coming soon. No hand-written API calls, no glue code. Add one config entry to Claude Desktop, Cursor, or Windsurf and start using it immediately.

Get a free API key → | REST API docs

What is the Model Context Protocol?

MCP (Model Context Protocol) is an open standard created by Anthropic that lets AI tools call external services as native functions, with no custom integration and no REST plumbing. Add VoiceToTextOnline once and every project using that config can pull YouTube captions and generate speech directly inside the AI conversation. Audio file transcription will be available the same way once it launches.

Tools, not endpoints

Instead of writing fetch() calls, your AI agent calls tools like youtube_transcript and text_to_speech as if they were built-in functions.
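For readers curious what a tool call looks like underneath: MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method with a tool name and an arguments object. Your MCP client builds and sends this for you; the sketch below only illustrates the message shape. The helper name tool_call_request and the placeholder video ID are illustrative and not part of VoiceToTextOnline.

```python
import json
from itertools import count

# JSON-RPC request ids only need to be unique within a session.
_ids = count(1)


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder video ID; the argument names come from the tool listing on this page.
request = tool_call_request(
    "youtube_transcript",
    {"url": "dQw4w9WgXcQ", "format": "text"},
)
print(json.dumps(request, indent=2))
```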

Works mid-conversation

Pull a transcript or generate audio during any task — without leaving the chat or switching tools. Claude handles it automatically.

One config, every project

Add your API key to your MCP config once. Every project sharing that config gets instant access, no per-project setup.

Available MCP Tools

youtube_transcript (live)

YouTube Transcript

Extract the full transcript from any YouTube video URL or 11-character video ID. Returns the transcript as plain text or, with format set to "srt", as SRT. Well suited to summarisation, research, content repurposing, and note-taking workflows.

Parameters:

url: string (YouTube URL or video ID)

format?: "text" | "srt" (output format)

Limit: 20 requests / day per API key
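Because each call counts against the daily limit, it can be worth checking an input locally before sending it. The sketch below is a client-side pre-check only: the server accepts URLs and IDs directly, and which URL shapes it recognises is not documented here, so the patterns handled (watch, youtu.be, embed, shorts, live) are assumptions. The function name extract_video_id is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_".
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a URL or bare ID, else None."""
    value = value.strip()
    if _VIDEO_ID.match(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
                candidate = parts[1]

    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```

A None result means the input is unlikely to be a valid video reference, so the request can be skipped rather than spent.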

text_to_speech (live)

Text to Speech

Convert text to lifelike speech using Google Neural voices (Neural2, Chirp3 HD). Returns base64-encoded MP3 audio. Supports 55+ languages and speaking rate control.

Parameters:

text: string (text to synthesize)

voiceName: string (e.g. "en-US-Neural2-J")

languageCode: string (BCP-47, e.g. "en-US")

speed?: number (0.25–4.0, default 1.0)

Limit: 500 chars/request (free) · 2,000 chars/request (paid)
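The parameter ranges and per-request limits above can be checked before a call is made. The following is a minimal sketch, assuming the character limit is counted the way Python's len() counts characters (the page does not say how the server counts). The function names are hypothetical, and how the base64 string is wrapped inside the tool result is not specified here, so decode_mp3 simply takes the string itself.

```python
import base64

FREE_CHAR_LIMIT = 500    # per request, free plan (from the limits above)
PAID_CHAR_LIMIT = 2000   # per request, paid plan


def build_tts_arguments(text: str, voice_name: str, language_code: str,
                        speed: float = 1.0, paid: bool = False) -> dict:
    """Validate inputs and return the arguments object for text_to_speech."""
    limit = PAID_CHAR_LIMIT if paid else FREE_CHAR_LIMIT
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > limit:
        raise ValueError(f"text is {len(text)} chars; limit is {limit}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "text": text,
        "voiceName": voice_name,
        "languageCode": language_code,
        "speed": speed,
    }


def decode_mp3(audio_base64: str) -> bytes:
    """Decode the base64-encoded MP3 payload into raw bytes."""
    return base64.b64decode(audio_base64)
```

The decoded bytes can be written to a file opened in binary mode to produce a playable .mp3.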

transcribe (coming soon)

Audio File Transcription

Submit an audio or video file URL for AI transcription with speaker labels and timestamps. Returns a full transcript via AssemblyAI.

Parameters:

fileUrl: string

language?: string

speakerLabels?: boolean

Pricing: credit-based (same as dashboard)

How to Set Up the MCP Server

Works with Claude Desktop, Cursor, Windsurf, and any MCP-compatible client. The server URL and Authorization header are identical regardless of which client you use.

1. Generate a free API key

Go to Dashboard → API Keys and click Generate Key. Free accounts include 2 keys with no credit card required.

2. Add to your MCP config

Paste the snippet below into your client's MCP config file. Replace v2t_live_... with your actual key.

3. Restart your AI tool and use it naturally

Tell Claude "get the transcript of this YouTube video" or "convert this text to speech" and it calls VoiceToTextOnline automatically.

MCP config (Claude Desktop / Cursor / Windsurf)

{
  "mcpServers": {
    "voicetotextonline": {
      "url": "https://voicetotextonline.com/api/mcp",
      "headers": {
        "Authorization": "Bearer v2t_live_..."
      }
    }
  }
}
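If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows, assuming the file is plain JSON with a top-level mcpServers object as in the snippet above. The function name add_server is hypothetical, and the key placeholder is left as-is for you to replace.

```python
import json
from pathlib import Path

SERVER_ENTRY = {
    "url": "https://voicetotextonline.com/api/mcp",
    # Replace the placeholder with your actual key before use.
    "headers": {"Authorization": "Bearer v2t_live_..."},
}


def add_server(config_path: Path, name: str = "voicetotextonline",
               entry: dict = SERVER_ENTRY) -> dict:
    """Add or update one server entry, preserving any existing servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the mcpServers object if missing, then set this server's entry.
    config.setdefault("mcpServers", {})[name] = dict(entry)

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call add_server with the path to your client's config file, then restart the client so it reloads the file.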

Config file locations:

Claude Desktop: claude_desktop_config.json, in ~/Library/Application Support/Claude/ (macOS)

Cursor: .cursor/mcp.json in the project root, or ~/.cursor/mcp.json

Windsurf: mcp_config.json, in ~/.codeium/windsurf/

Any MCP client: the client's MCP config file (see client docs)

MCP vs REST API — Which Should You Use?

No glue code required

The REST API requires fetch calls, response parsing, and error handling. MCP tools are invoked directly by the AI — the agent decides when to call them and handles the result.

Works mid-conversation

Your agent can pull a YouTube transcript or generate audio mid-task without any tool-switching. It's invisible to the user.

Chain tools together

Pull a transcript → summarise → convert the summary to speech, all in one agent flow. The agent passes each tool's output to the next step, so multi-step pipelines need no orchestration code.
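In practice the agent performs this chaining itself, but the data flow can be sketched in code. The example below assumes two injected callables that are not part of VoiceToTextOnline: call_tool(name, arguments), standing in for whatever your MCP client uses to invoke a tool, and summarise(text), standing in for the model's summarisation step. It also splits the summary so each text_to_speech request stays within the free plan's 500-character limit.

```python
from typing import Callable, List


def split_for_tts(text: str, limit: int = 500) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def transcript_to_speech(url: str,
                         call_tool: Callable[[str, dict], str],
                         summarise: Callable[[str], str],
                         voice_name: str = "en-US-Neural2-J",
                         language_code: str = "en-US") -> List[str]:
    """Fetch a transcript, summarise it, and synthesise the summary in chunks."""
    transcript = call_tool("youtube_transcript", {"url": url, "format": "text"})
    summary = summarise(transcript)
    return [
        call_tool("text_to_speech", {
            "text": chunk,
            "voiceName": voice_name,
            "languageCode": language_code,
        })
        for chunk in split_for_tts(summary)
    ]
```

Passing the callables in keeps the sketch independent of any particular client library and makes it easy to test with stubs.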

Same auth, same quota

Your existing API key and plan cover both MCP and REST. No new account, no extra billing. Quota is shared across both.

Prefer direct HTTP? View the REST API docs →

Frequently Asked Questions

Is the MCP server free to use?

Yes. Free accounts include 2 API keys and access to all MCP tools within the free plan quota (20 YouTube transcripts/day, 500 chars/request for TTS). Paid plans unlock higher quotas and Google Neural voices.

Does this work with Cursor and Windsurf?

Yes. Any MCP-compatible client works. For Cursor, add the config to .cursor/mcp.json. For Windsurf, add it to ~/.codeium/windsurf/mcp_config.json. The server URL and Authorization header are identical regardless of client.

How is this different from the REST API?

The REST API requires you to write fetch() calls, parse responses, and handle errors yourself. With MCP, your AI agent calls the tools directly mid-conversation with zero boilerplate. Both use the same API key and quota.

What happens when I hit the daily YouTube limit?

The tool returns a clear error message. The counter resets at midnight UTC. Paid API plans will have higher limits in a future update.
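If a script or agent needs to know how long to wait after hitting the limit, the time to the next midnight UTC can be computed locally. This is a small sketch; the function names are hypothetical, and it expects timezone-aware datetimes (a naive datetime would be interpreted as local time by astimezone).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(now: Optional[datetime] = None) -> datetime:
    """Return the next midnight UTC strictly after `now`."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    tomorrow = now.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def seconds_until_reset(now: Optional[datetime] = None) -> float:
    """Seconds remaining until the daily counter resets at midnight UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (next_reset(now) - now).total_seconds()


print(f"Quota resets in {seconds_until_reset() / 3600:.1f} hours")
```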

Can I use the MCP server for commercial projects?

Yes, subject to the standard Terms of Service. API keys can be used in production applications — same rules as the REST API.

Ready to add voice tools to your AI agent?

Generate a free API key, add one config entry, and your AI agent gains access to YouTube transcript extraction and text-to-speech — immediately. No credit card required.

Get a free API key → | REST API docs
