Portuguese Voice to Text — Ditado de Voz em Português

Portuguese is spoken by 260 million people across 9 countries — but Brazilian and European Portuguese have diverged so significantly over 500 years that they sometimes feel like separate languages. European Portuguese swallows unstressed vowels so aggressively that "professor" can sound like "prfesr." Brazilian Portuguese keeps all its vowels open and clear, with a singing intonation that European speakers sometimes find cartoonishly exaggerated. Speech recognition models trained on one will make systematic errors on the other. This page explains the divide precisely, where regional Brazilian accents create additional challenges, and how to get the best results from wherever you're dictating.

O português brasileiro e o europeu divergiram tanto que os modelos de reconhecimento de voz treinados num podem errar sistematicamente no outro. Esta página explica as diferenças fonológicas precisas, os desafios dos sotaques regionais, e como obter os melhores resultados para o seu português.

Brazilian vs European Portuguese: The Phonological Divide

The BP/EP split is the most consequential dialect divide in Portuguese speech recognition — more impactful than regional Brazilian accents and more impactful than the African varieties. It begins with a single fundamental difference: what happens to unstressed vowels.

🇧🇷

Brazilian Portuguese (pt-BR)

Brazilian Portuguese preserves unstressed vowels with full or near-full quality. "Temperatura" is spoken as written — all five syllables clearly present. Vowels are open and distinct. The /e/ and /o/ in unstressed syllables retain their quality. This makes BP significantly more predictable for automatic speech recognition — the acoustic form closely matches the orthographic form.

BP: palavra "secretária"

se-cre-TÁ-ri-a

All vowels present and clear

🇵🇹

European Portuguese (pt-PT)

European Portuguese reduces or completely deletes unstressed vowels — particularly unstressed /e/ and /o/. "Professor" is spoken closer to "prfessor." "Secretária" becomes closer to "scrtrya." This vowel reduction is so extreme that EP can be unintelligible to BP speakers and nearly incomprehensible to learners. It is also much harder for ASR systems, which must reconstruct deleted vowels from context.

EP: palavra "secretária"

scr-TÁ-rya (vowels deleted)

Unstressed vowels reduced/deleted

Critical: always select the right locale

Use pt-BR for Brazilian Portuguese and pt-PT for European Portuguese. A model trained on BP will systematically fail on EP vowel reduction. This locale selection matters more than speaking speed or clarity — wrong locale means the model is listening for sounds that don't match what you produce.

Vocabulary Divide — As Duas Normas em Palavras

Beyond phonology, BP and EP have diverged substantially in vocabulary — particularly for modern technology and everyday objects. A model trained primarily on BP data will not recognise EP words for the same concepts, and vice versa. These vocabulary differences are a significant source of errors when the wrong locale is selected:

Concept🇧🇷 BP🇵🇹 EPASR risk
Mobile phonecelulartelemóvelHigh — completely different words
BusônibusautocarroHigh — completely different words
FridgegeladeirafrigoríficoHigh
TraintremcomboioHigh
Bathroombanheirocasa de banhoMedium
Computer mousemouseratoMedium — BP uses English loanword
Download (verb)baixar / fazer downloaddescarregarMedium
To handle/managelidar comgerirLow — overlapping use
ApartmentapartamentoapartamentoNone — same word

Regional Brazilian Accents — Sotaques do Brasil

Brazil is larger than continental Europe, and its regional accents reflect that scale. The pt-BR model is trained predominantly on São Paulo and Rio de Janeiro speech — the dominant media centres. Here's how regional varieties perform:

🏙️

Paulistano (São Paulo) — Best Results

The São Paulo urban accent is the effective reference model for Brazilian Portuguese ASR — it dominates in media, corporate communication, and digital content production. Relatively flat intonation, clear vowels, and high media representation mean pt-BR models perform best on Paulistano speech. Expect word error rates of 7–12% in quiet conditions.

🏖️

Carioca (Rio de Janeiro) — Very Good

Rio's Carioca accent is the second most represented variety in training data. The distinctive features — palatalisation of /t/ and /d/ before /i/ ("tia" sounds like "tchya," "dia" sounds like "djya") and the characteristic rising intonation — are well-modelled. Palatalisation is the most recognition-relevant feature; models handle it correctly for common words.

🌵

Nordestino (Northeast Brazil) — Moderate

Northeast Brazilian Portuguese (Bahia, Pernambuco, Ceará, Maranhão) has the most open vowels of any Brazilian variety and a distinct rhythm — more syllable-timed than the stress-timed São Paulo standard. The open /e/ and /o/ in positions where SP speakers use closed vowels can confuse models. Vocabulary from regional culture (forró, nordestino slang) may not be in training data. Moderate error rates for natural speech.

🧉

Gaúcho (Rio Grande do Sul) — Moderate

Rio Grande do Sul's Portuguese is influenced by Italian and German immigration — producing a distinctive rolled /r/, different vowel qualities, and Spanish-influenced vocabulary near the border. The Gaúcho accent's retroflex /r/ (different from both the São Paulo and Rio varieties) and Spanish borrowings ("bah," "tchê," "guri") cause moderate error rates on vocabulary items.

🌾

Caipira (Interior São Paulo / Minas) — Challenging

The Caipira dialect — spoken in the interior of São Paulo, Minas Gerais, Goiás, and Mato Grosso — has the most distinctive feature of any Brazilian variety: strong retroflexion of post-vocalic /r/ (the "r caipira"), making it sound American-English-like. "Porta" becomes "pohta" with a distinct retroflex quality. This sound is underrepresented in training data from media-focused corpora. Caipira speakers doing formal dictation benefit from reducing retroflexion.

🏔️

Mineiro (Minas Gerais) — Moderate/Challenging

Minas Gerais Portuguese is known for its distinctive drawling intonation — vowels are lengthened and the melodic contour is exaggerated. "Uai" as an interjection, distinctive "uê" exclamations, and a particular stress pattern on final syllables all diverge from São Paulo standard. Models handle educated Mineiro speech acceptably; strong regional features cause moderate errors.

Nasal Vowels — Vogais Nasais: ã, em, im, om, um

Portuguese has one of the most complex nasal vowel systems of any Romance language — five nasal vowel phonemes, including the nasal diphthong /ã̃w̃/ (as in "não," "mão," "pão") that has no equivalent in Spanish, French, or Italian. For speech recognition, nasal vowels create two specific challenges:

🔤 Diacritic Selection (ã vs a)

The tilde (ã) and the nasal vowel digraphs (em, im, om, um, an, en, in, on, un) are output automatically — you never need to say "a with tilde." The model correctly outputs "irmã" not "irma," "coração" not "coracao." Where errors occur is in distinguishing nasal from oral vowels in fast speech — particularly the distinction between "a" and "ã" at word boundaries in rapid connected speech.

🌊 The -ão Diphthong

The word-final "-ão" diphthong (as in "não," "ação," "população") is phonetically complex — a nasal vowel followed by a nasal glide, producing a sound that has no equivalent in other European languages. In BP, it is produced relatively clearly. In EP, it undergoes additional reduction. The model handles "-ão" reliably for common words; less common words or fast speech may produce "-am" or "-on" in the output. Check "-ão" endings in formal text.

Nasal vowel pairs that cause errors

a / ã

la / lã (wool)

ao / ão

mao / mão (hand)

em / êm

tem / têm (has/have)

vem / vêm

comes (sing./pl.)

The Portuguese R Problem — O Problema do R

No feature of Portuguese phonology creates more regional variation — or more ASR confusion — than the letter R. Portuguese has at least five distinct R sounds across its varieties, and they are not interchangeable. A speech model must learn to associate all of them with the same orthographic "r" or "rr":

🇧🇷 São Paulo / Formal BP

Word-initial and doubled rr: uvular fricative /χ/ or /ʁ/ (similar to French R). Post-vocalic single r: often a retroflex approximant /ɻ/ or vocalized to a vowel-like quality in certain positions.

🏖️ Rio de Janeiro (Carioca)

Word-initial and rr: aspirated /h/ or /χ/ — "rio" sounds like "hio." This is the most distinctive Carioca feature. Post-vocalic r: similar to other BP but sometimes devoiced at syllable ends.

🌾 Caipira (Interior SP/MG)

Post-vocalic r: strongly retroflex /ɻ/ — the "r caipira." "Porta," "carro," "mar" all have a distinctive American-English-like retroflex quality. This is the ASR-challenging feature — underrepresented in media-based training data.

🇵🇹 European Portuguese

Word-initial and rr: uvular trill /ʀ/ or fricative /ʁ/. Intervocalic single r: flap /ɾ/. Word-final r: often deleted in informal EP speech entirely. The EP pattern is closer to French R in many positions.

🧉 Gaúcho (RS)

Influenced by Italian and Spanish immigration — some speakers produce a rolled alveolar trill /r/ similar to Spanish or Italian, rather than the uvular Brazilian standard. This feature causes specific recognition errors for Gaúcho speakers.

🌍 African Portuguese (AO/MZ)

Angolan and Mozambican Portuguese often use a flap or trill /r/ influenced by Bantu language phonology — neither the BP uvular nor EP uvular. Models trained on European/Brazilian data handle African Portuguese R patterns with moderate success.

Lusophone Africa — Português de África

More than 50 million people in Africa speak Portuguese — in Angola, Mozambique, Cape Verde, Guinea-Bissau, São Tomé and Príncipe. African Portuguese varieties have developed distinctively, influenced by Bantu languages (in Angola and Mozambique), Creole languages (Cape Verde), and other African linguistic substrates. Here's how speech recognition handles each:

🇦🇴

Angola — Moderate

Angolan Portuguese has a distinct rhythm influenced by Bantu languages — more syllable-timed, with clearer vowel articulation than EP. The Angolan educated standard performs moderately well with pt-PT locale. Vocabulary unique to Angola (influenced by Kimbundu and Umbundu) may not be recognised.

🇲🇿

Mozambique — Moderate

Mozambican Portuguese is influenced by Bantu languages (Ronga, Tsonga, Sena, Makua) and has distinctive vowel harmony features from those substrates. The Maputo educated standard is reasonably handled by pt-PT locale. Code-switching with local languages is common but not handled by standard models.

🇨🇻

Cape Verde — Most Challenging

Cape Verde's situation is unique: the native language of most Cape Verdeans is Kriolu (Cape Verdean Creole), a Portuguese-based creole. Formal Portuguese is used in education and government but spoken with heavy Kriolu phonological influence. Standard Portuguese ASR models handle formal Cape Verdean Portuguese with difficulty — Kriolu itself has no ASR support.

Best practice for African Portuguese speakers

Select pt-PT locale (not pt-BR — African Portuguese orthographic norms follow the European standard). Speak in a formal register close to European Portuguese phonology for best results. Code-switching into local languages should be typed manually — ASR will not handle it.

Como Começar — How to Start

1

Selecione a variante correta: pt-BR para o Brasil, pt-PT para Portugal e os PALOP (países africanos de língua oficial portuguesa)

Locale selection is the single most important step for Portuguese. pt-BR and pt-PT use different acoustic models — wrong selection causes systematic errors regardless of how clearly you speak.

2

Clique em "Start 🎤" e permita o acesso ao microfone quando solicitado

Click Start and allow microphone access. Chrome on desktop gives best Portuguese results — it uses Google's Portuguese ASR engine.

3

Fale em ritmo moderado. Falantes do EP: articulem as vogais átonas com mais clareza do que no discurso natural para melhorar a precisão

EP speakers: slightly over-articulate unstressed vowels compared to natural speech — the model needs more acoustic signal than rapid EP provides. BP speakers: natural pace works well.

4

Copie o texto ou baixe como TXT. Acentos (ã, ç, á, â, é, ê, ó, ô) e cedilha são inseridos automaticamente

All Portuguese diacritics — ã, ç, á, â, é, ê, í, ó, ô, ú — are output automatically. No special commands needed for accented characters.

Diacritics and Orthography — Acentos e Ortografia

Portuguese has more diacritics than Spanish but fewer than French. All are output automatically — you never spell them out. Here's what the model handles and where to double-check:

✅ Handled automatically and reliably

  • Cedilha ç — "cabeça," "coração," "açúcar" — always correct
  • Acute accent á, é, í, ó, ú — stress markers output correctly
  • Tilde ã, õ — "irmã," "não," "corações" — correct for common words
  • Circumflex â, ê, ô — "câmera," "você," "avô" — correct for common words
  • Grave à — used only in contractions "à," "às" — correct

⚠️ Check manually in formal writing

  • tem / têm (he has / they have) — circumflex distinguishes 3rd sg./pl.
  • vem / vêm (he comes / they come) — same issue
  • pelo / pélo (by the / hair) — acute distinguishes meaning
  • BP vs EP spelling differences — "fato/facto," "ótimo/óptimo" — post-2009 Orthographic Agreement affects these; select the right locale
  • Proper nouns with unusual diacritics — verify manually

Orthographic Agreement (AO 1990 / 2009)

The 1990 Portuguese Language Orthographic Agreement, fully implemented by 2016, unified some BP/EP spellings — removing certain consonant letters silent in BP ("facto" → "fato," "óptimo" → "ótimo" in BP). Models trained on post-2009 data use the reformed spellings for BP. EP still accepts both forms in some contexts. If you're writing for a specific publication, verify their house style.

Portunhol, Portuñol, and English Code-Switching

Brazilian tech and business professionals heavily mix English terms into Portuguese — a practice so common it's almost the default mode in startups and multinationals:

"Preciso de um feedback sobre o pitch antes do deadline de amanhã."

"Manda o link do drive no WhatsApp, preciso baixar o dashboard."

"Vamos fazer um brainstorming antes da call com o cliente."

The pt-BR model handles common English tech terms well — "feedback," "link," "drive," "dashboard," "deadline" are output in Roman script within the Portuguese text. Less common English words may be phonetically transcribed in Portuguese orthography — "fedbaque" for "feedback" in a model with limited English exposure. For formal documents, pause and type English terms manually. For internal notes and messages, the mixed output is typically clean enough.

Portunhol (Portuguese-Spanish mixing) is common near Brazil's borders with Spanish-speaking neighbours and among diaspora communities. The pt-BR model will attempt to transcribe Spanish words into Portuguese — often incorrectly. Spanish-Portuguese bilinguals should dictate entirely in one language and translate/edit the other-language portions manually.

Dicas para Melhor Precisão — Tips for Best Accuracy

✅ O que melhora a precisão

  • • Selecione pt-BR ou pt-PT corretamente — é o passo mais importante
  • • Falantes do EP: articule as vogais átonas com mais clareza
  • • Termine a frase completa antes de pausar
  • • Sotaque Caipira forte: reduza o r retrofléxico em ditados formais
  • • Ambiente silencioso — ruído de fundo prejudica a segmentação
  • • Chrome oferece o melhor suporte para português
  • • Para vocabulário técnico em inglês: pause e digite manualmente

⚠️ Erros comuns e soluções

  • tem / têm, vem / vêm — verifique em escrita formal
  • -ão final — confira em palavras menos comuns
  • Nomes próprios incomuns — sempre verificar
  • Palavras diferentes BP/EP (ônibus/autocarro) — locale errado
  • Inglês transcrito foneticamente — digitar manualmente
  • Números grandes financeiros — confirme "bilhão" vs "milhão"

Quem Usa — Who Uses Portuguese Voice to Text

📱

WhatsApp Brazil

Brazil has the second largest WhatsApp user base in the world after India — over 120 million users. Voice dictation for long Portuguese messages, group updates, and client communication is extremely common. Voice-to-text is faster than typing Portuguese diacritics on a mobile keyboard.

🚀

Brazilian Tech Startups

São Paulo's vibrant startup ecosystem — one of Latin America's largest — generates huge volumes of Brazilian Portuguese business communication. Founders and teams use voice dictation for emails, PRDs, meeting notes, and investor updates in Portuguese. Faster than typing, especially for longer documents.

🎬

Content Creators

Brazilian YouTube is the third largest YouTube market globally. Portuguese content creators use voice-to-text for video scripts, subtitles, descriptions, and social posts. Dictating in Portuguese and editing the transcript is significantly faster than writing from scratch.

📚

Students & Academics

University students across Brazil and Portugal use Portuguese voice dictation for thesis drafts, research summaries, and essay outlines. Speaking academic Portuguese and cleaning up the transcript is faster than composing formal prose on a keyboard.

⚖️

Legal & Medical Professionals

Brazilian and Portuguese lawyers, judges, and doctors use voice dictation for case notes, petitions, and clinical documentation. Formal Portuguese legal vocabulary ("doravante," "conforme," "nos termos de") transcribes accurately. Always proofread before signing or filing.

🌍

Portuguese Diaspora

Brazilian communities in the US, UK, Portugal, and Japan, and Portuguese communities across Europe, use voice dictation for family communication, official Portuguese documents, and community work. Voice bypasses the friction of Portuguese keyboard layouts on non-Portuguese devices.

Comandos de Voz em Português — Voice Commands

Diga estas palavras durante o ditado para adicionar pontuação e formatação:

Pontuação / Punctuation

Diga / SayInsere / Inserts
"ponto". (full stop)
"vírgula", (comma)
"ponto e vírgula"; (semicolon)
"dois pontos": (colon)
"ponto de interrogação"?
"ponto de exclamação"!
"reticências"
"aspas"" " (quotes)
"travessão"— (em dash)

Formatação / Formatting

Diga / SayAção / Action
"nova linha"New line
"novo parágrafo"New paragraph
"apagar"Delete last word
"pausar"Pause recognition

BP vs EP punctuation terms

"Ponto final" (EP) and "ponto" (BP) both work for full stop. "Vírgula" is consistent across both varieties. Chrome's Portuguese mode responds to both BP and EP command variants when the correct locale is selected.

Transcrever arquivos de áudio em português — MP3, WAV, MP4

Upload Portuguese audio recordings — meetings, podcasts, lectures, interviews. Pro plan supports files up to 5 hours with timestamps. / Envie gravações em português e receba o texto com marcações de tempo.

Ver planos Pro →

Perguntas Frequentes — FAQ

Por que o português europeu é mais difícil para reconhecimento de voz do que o brasileiro?

O principal motivo é a redução vocálica. O português europeu reduz ou elimina vogais átonas de forma muito agressiva — "professor" pode soar como "prfessor," "secretária" como "scrtrya." O modelo de reconhecimento de voz precisa reconstruir as vogais apagadas a partir do contexto, o que aumenta a taxa de erro. O português brasileiro mantém as vogais átonas com qualidade plena, tornando a correspondência entre fala e grafia muito mais direta.

Does it handle all Portuguese diacritics (ã, ç, â, ê, etc.) automatically?

Yes, completely automatically. You never need to say "a with tilde" or "c with cedilla." The model outputs ã, ç, á, â, é, ê, í, ó, ô, ú, à based on phonology and word recognition. The cedilha (ç) is handled reliably for common words. The main diacritics to double-check manually are the circumflex-distinguished verb forms: tem/têm, vem/vêm — where the circumflex marks the difference between 3rd person singular and plural.

Funciona para o português de Angola e Moçambique?

Sim, com limitações. Selecione pt-PT (o padrão ortográfico dos PALOP segue o europeu). O português formal angolano e moçambicano é razoavelmente reconhecido — especialmente em registos mais próximos do europeu padrão. Vocabulário específico de Angola ou Moçambique influenciado por línguas bantu não estará no modelo. Alternância de código com línguas locais (kimbundu, ronga, etc.) não é suportada — digite essas partes manualmente.

I have a strong Caipira accent — how do I improve accuracy?

The Caipira retroflex /r/ (the "r caipira") is the primary source of ASR errors for interior São Paulo and Minas Gerais speakers. The model was trained predominantly on urban SP/RJ speech and the strong retroflex post-vocalic r is underrepresented. Two practical strategies: first, consciously reduce the retroflexion in post-vocalic positions during dictation — articulate word endings (porta, carro, mar) with a less retroflex quality than your natural speech. Second, speak slightly more slowly to give the model more time to process the unfamiliar acoustic pattern. These changes can reduce Caipira-specific error rates by 30–40%.

Funciona no celular no Brasil e em Portugal?

Sim. Funciona no Android com Chrome e no iPhone com Safari em todos os países de língua portuguesa. O Chrome no Android oferece os melhores resultados para o português. Nenhuma instalação é necessária — funciona diretamente no navegador. Especialmente útil para mensagens longas no WhatsApp — dite, copie e cole, sem precisar navegar pelo teclado com diacríticos.

What changed with the 2009 Orthographic Agreement — does it affect transcription?

The Acordo Ortográfico de 1990 (fully implemented by 2016) unified some spellings between BP and EP — primarily removing silent consonants from BP words: "acção" → "ação," "facto" → "fato," "óptimo" → "ótimo" in Brazilian Portuguese. The pt-BR model uses reformed spellings. The pt-PT model uses the reformed EP spellings (which differ slightly — "ação" is the same, but some words retain both forms in Portugal). If you're writing for a publication with a specific house style, verify which post-AO spellings they use.

Ferramentas relacionadas

Comece a Ditar em Português Agora

Gratuito, sem instalação, sem cadastro. Acentos e cedilha automáticos.

Chrome recomendado — melhor suporte para reconhecimento de voz em português