German Voice to Text — Spracherkennung auf Deutsch

German presents one challenge more prominently than most major languages: compound words. German builds new nouns by joining existing ones without spaces. "Donaudampfschifffahrtsgesellschaft" (Danube Steamship Company) is one word; so is "Kraftfahrzeughaftpflichtversicherung" (motor vehicle liability insurance). A speech recognition model must decide, in real time, whether what you just said is one compound word or two separate words, usually without any acoustic boundary to guide it. German also has three grammatical genders, four cases with distinct article forms, and three major national standard varieties (German, Austrian, Swiss) with significant phonological differences. This page explains all of it.

The biggest challenge in German speech recognition is compound segmentation. The system has to decide in real time whether consecutive syllables form one long compound or two separate words. Umlauts, case endings, and the differences between Standard German, Austrian German, and Swiss German add further difficulty.

Compound Words: German's Defining ASR Challenge — Komposita

German is a compounding language. It creates new nouns by joining existing words, and the result is normally written as a single word without spaces. This is not a quirk; it is the standard mechanism for expressing complex concepts. Where English says "health insurance," German says "Krankenversicherung." Where English says "General Data Protection Regulation," German says "Datenschutzgrundverordnung" (officially hyphenated as Datenschutz-Grundverordnung). Where English says "speed limit," German says "Geschwindigkeitsbegrenzung."

For speech recognition, this creates a segmentation problem that English rarely poses. When you say "Kranken versicherung" (health insurance), the model must output "Krankenversicherung" as a single word. When you say "kranken Versicherungsvertreter" (sick insurance agent), however, the adjective "kranken" stays a separate word, and the noun that follows is itself a compound. In fluent speech the two junctions sound essentially the same, so the acoustics alone cannot settle the question. The model relies on a language model (statistical word co-occurrence) to decide, and it performs well for established compound words. It fails mainly on novel compounds, technical jargon, and very long compounds that are rare in its training data.
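The join-or-split decision can be sketched as a comparison between two language-model scores: the probability of the compound as one token versus the probability of the two-word sequence. The minimal Python sketch below uses hypothetical toy counts (`UNIGRAM_COUNTS`, `BIGRAM_COUNTS`) and simple add-one style smoothing. A real recogniser combines far larger models with acoustic scores, so treat this as an illustration of the idea, not as how any particular engine works.

```python
import math

# Hypothetical toy counts standing in for a real language model's statistics.
UNIGRAM_COUNTS = {
    "krankenversicherung": 900,
    "kranken": 400,
    "versicherung": 1200,
    "versicherungsvertreter": 60,
}
BIGRAM_COUNTS = {
    ("kranken", "versicherungsvertreter"): 5,
    ("kranken", "versicherung"): 2,
}
TOTAL = sum(UNIGRAM_COUNTS.values())
VOCAB = len(UNIGRAM_COUNTS)


def log_p_word(word: str) -> float:
    """Smoothed unigram log-probability, so unseen words are not impossible."""
    return math.log((UNIGRAM_COUNTS.get(word, 0) + 1) / (TOTAL + VOCAB + 1))


def log_p_pair(first: str, second: str) -> float:
    """log P(first) + log P(second | first), with smoothing on the bigram."""
    pair = BIGRAM_COUNTS.get((first, second), 0)
    given = UNIGRAM_COUNTS.get(first, 0)
    return log_p_word(first) + math.log((pair + 1) / (given + VOCAB + 1))


def join_or_split(first: str, second: str) -> str:
    """Return the compound if the toy LM prefers it, otherwise the two separate words."""
    compound = (first + second).lower()
    if log_p_word(compound) > log_p_pair(first.lower(), second.lower()):
        # Compounds are nouns: keep the first part's capital, lower-case the rest.
        return first + second.lower()
    return f"{first} {second}"


print(join_or_split("Kranken", "versicherung"))          # → Krankenversicherung
print(join_or_split("kranken", "Versicherungsvertreter"))  # → kranken Versicherungsvertreter
```

The unseen compound "krankenversicherungsvertreter" gets only the smoothing probability, so the frequent bigram wins and the words stay apart. This mirrors why rare or novel compounds are the ones that get split.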

Compound word examples by length (letter count)

Word | Length | Expected result
Haustür | Short (7 letters) | Always correct
Krankenversicherung | Common (19 letters) | Correct
Bundesverfassungsgericht | Established (24 letters) | Usually correct
Donaudampfschifffahrtsgesellschaft | Long (34 letters) | Check output
Kraftfahrzeughaftpflichtversicherung | Very long (36 letters) | Verify manually

Practical tip for compound words

Speak compound words with continuous airflow — no pause or breath between the component parts. A micro-pause between "Kranken" and "versicherung" tells the model they are separate words. For very long technical compounds, speak them slightly more slowly but without any pause. If a compound splits incorrectly in the output, simply delete the space — correction is faster than re-dictating.
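If you post-process transcripts in your own tooling, you can automate the "delete the space" fix for compounds you already know. The minimal Python sketch below assumes a hypothetical lexicon, `KNOWN_COMPOUNDS`, of lower-cased compounds; a real list would come from a domain dictionary. The sketch merges up to three adjacent tokens whenever their concatenation appears in the lexicon.

```python
# Hypothetical lexicon of known compounds (lower-cased); a real one would be much larger.
KNOWN_COMPOUNDS = {"krankenversicherung", "haustür", "bundesverfassungsgericht"}


def rejoin_compounds(text: str, lexicon: set = KNOWN_COMPOUNDS, max_parts: int = 3) -> str:
    """Merge adjacent tokens whose concatenation (ignoring trailing punctuation) is a known compound."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible merge first, down to two tokens.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            parts = tokens[i:i + n]
            core = "".join(parts).rstrip(".,;:!?")
            if core.lower() in lexicon:
                # Keep the first part's capitalisation; later parts become lower case.
                out.append(parts[0] + "".join(p.lower() for p in parts[1:]))
                i += n
                break
        else:
            # No merge found: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(rejoin_compounds("Die Kranken versicherung zahlt."))  # → Die Krankenversicherung zahlt.
print(rejoin_compounds("Das Bundes verfassungs gericht."))  # → Das Bundesverfassungsgericht.
```

The helper rejoins only listed compounds, so a genuine two-word phrase such as "kranken Versicherungsvertreter" is left alone. The trade-off is that unlisted novel compounds still need manual correction.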

Three National Standards — Drei Standardvarietäten

German is an official language of Germany, Austria, Switzerland, Liechtenstein, Luxembourg, and parts of Belgium and Italy. The three major national standard varieties — German Standard (Bundesdeutsch), Austrian Standard (Österreichisches Deutsch), and Swiss Standard (Schweizer Hochdeutsch) — differ significantly in phonology, vocabulary, and in the Swiss case, orthography. Speech recognition models trained primarily on German Standard (de-DE) make systematic errors on Austrian and Swiss input:

🇩🇪

Bundesdeutsch (de-DE) — Best Results

Standard German as spoken in northern and central Germany. It is used in national broadcasting and official contexts, and it is the prestige variety most trained speakers aim for. It serves as the reference variety for German ASR models. Typical features include relatively even intonation, clear consonant articulation, and a uvular /r/ in most positions. As a rough guide, expect word error rates of about 6–10% for clear speech in quiet conditions.
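Figures like these are based on word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recogniser's output, divided by the number of reference words. The minimal Python sketch below implements that textbook definition. Real evaluations usually normalise case and punctuation first; this version deliberately does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "Die Krankenversicherung zahlt den Betrag"
hypothesis = "Die Kranken Versicherung zahlt den Betrag"
print(word_error_rate(reference, hypothesis))  # → 0.4
```

The example shows why split compounds hurt German scores. One wrongly split compound counts as two errors (a substitution plus an insertion): here, 2 of 5 reference words, or 40%.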

🇦🇹

Österreichisches Deutsch (de-AT) — Good

Austrian German is a recognised national standard with its own orthographic norms and an official dictionary, the Österreichisches Wörterbuch (ÖWB). Key phonological differences from the German Standard:

  • Vowel qualities differ, notably those of the /a/ vowels.
  • Word-initial s before a vowel is usually voiceless ([s] rather than [z]).
  • The suffix -ig is pronounced [ɪk] rather than [ɪç].
  • A final /r/ is often vocalised.

Vocabulary differences are substantial: "Jänner" not "Januar" for January, "Erdäpfel" not "Kartoffel" for potato, "Marille" not "Aprikose" for apricot. Select the de-AT locale for Austrian users.

🇨🇭

Schweizer Hochdeutsch (de-CH) — Moderate

Swiss Standard German differs in one critical orthographic rule: Swiss spelling does not use ß at all. It is always written as "ss" (Strasse not Straße, heiss not heiß). Phonologically, Swiss speakers are often described as having a more syllable-timed rhythm than speakers of the stress-timed German and Austrian varieties. Depending on the region, many use an alveolar trill rather than a uvular /r/, and their vowel length distinctions differ from the German Standard. Select de-CH. The model handles Swiss Standard reasonably well, though heavy Swiss dialect influence increases errors.

Schweizerdeutsch ≠ Schweizer Hochdeutsch

Schweizerdeutsch refers to the Swiss German dialects (Zürichdeutsch, Berndeutsch, Baseldeutsch), the native spoken varieties of German-speaking Switzerland. They are not Hochdeutsch with an accent. They are genuinely distinct varieties with different vowel systems, consonant inventories, and grammar, and in their broadest forms they are largely unintelligible to speakers of Standard German. Standard German ASR models perform poorly on natural Schweizerdeutsch, so Swiss German speakers should switch to Schweizer Hochdeutsch (the formal register) for dictation.

National Vocabulary Differences — Bezeichnungsvarianten

Each German-speaking country uses different words for many common concepts. A de-DE model hearing Austrian vocabulary may recognise the word but output the German equivalent — or fail to recognise it entirely:

Concept | 🇩🇪 German | 🇦🇹 Austrian | 🇨🇭 Swiss
January | Januar | Jänner | Januar
Potato | Kartoffel | Erdäpfel | Erdäpfel / Härdöpfel
Apricot | Aprikose | Marille | Aprikose / Barille
Tomato | Tomate | Paradeiser | Tomate
Butcher | Metzger / Fleischer | Fleischhauer | Metzger
Tram | Straßenbahn | Straßenbahn / Bim | Tram
Elevator | Aufzug / Fahrstuhl | Lift | Lift
Apartment | Wohnung | Wohnung | Wohnung
Bicycle | Fahrrad | Fahrrad / Rad | Velo
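If a transcript was produced with the de-DE locale but is meant for an Austrian audience, you can swap in some of the table's vocabulary in a post-processing step. The Python sketch below uses a small hypothetical mapping, `DE_TO_AT`, drawn from the table. It replaces whole words only and knows nothing about inflection or compounds (Kartoffelsalat stays unchanged), so selecting de-AT at dictation time remains the better option.

```python
import re

# Hypothetical German-Standard -> Austrian mapping drawn from the table above.
# Singular and plural forms are listed separately because inflection is not handled.
DE_TO_AT = {
    "Januar": "Jänner",
    "Kartoffel": "Erdapfel",
    "Kartoffeln": "Erdäpfel",
    "Aprikose": "Marille",
    "Aprikosen": "Marillen",
    "Tomate": "Paradeiser",
    "Tomaten": "Paradeiser",
}

# Longest keys first so "Kartoffeln" is tried before "Kartoffel"; \b keeps matches to whole words.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, DE_TO_AT), key=len, reverse=True)) + r")\b"
)


def localise_at(text: str) -> str:
    """Replace whole-word German-Standard terms with their Austrian equivalents."""
    return _PATTERN.sub(lambda m: DE_TO_AT[m.group(1)], text)


print(localise_at("Im Januar kaufen wir Tomaten und Kartoffeln."))
# → Im Jänner kaufen wir Paradeiser und Erdäpfel.
```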

Umlauts, ß and German Orthography — Umlaute und Eszett

✅ Umlauts Are Automatic

The three German umlauts, ä, ö and ü, are output automatically and reliably. You never need to say "a umlaut" or "oe." In words such as "Über," "größer," "Mädchen," and "Höflichkeit," the model infers the umlaut characters from the sound and from word recognition. It also recognises both short and long umlaut vowels (the short ö in "öffnen," the long ö in "Österreich"). This matters for spelling the rest of the word correctly, for example the double consonant after a short vowel. Umlauts are one of German ASR's strongest features.

⚠️ The ß / ss Decision

The ß (Eszett or scharfes S) represents a voiceless /s/ after a long vowel or diphthong. "Straße" (street), "heiß" (hot), and "Spaß" (fun) use ß; "Wasser" (water), "dass" (that), and "muss" (must) use ss after short vowels. The model applies the 1996 orthographic reform rules correctly for common words. It may fail on uncommon words, proper names, and words whose spelling changed under the reform. For example, the former "daß" is now "dass," and models trained on older data may output the old form. Swiss output always uses ss (never ß), regardless of vowel length.
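Two of these ß cases are mechanical enough to fix in post-processing: converting any output to Swiss spelling, and replacing known pre-reform spellings. The minimal Python sketch below uses `PRE_REFORM`, a small hypothetical sample rather than a complete reform dictionary. It matches only the exact listed forms, so a sentence-initial "Daß" would need its own entry.

```python
import re

# Hypothetical sample of pre-1996 spellings and their reformed forms.
PRE_REFORM = {"daß": "dass", "muß": "muss", "Fluß": "Fluss", "Schloß": "Schloss"}


def fix_pre_reform(text: str) -> str:
    """Replace whole words found in PRE_REFORM; all other words are left untouched."""
    return re.sub(r"\w+", lambda m: PRE_REFORM.get(m.group(0), m.group(0)), text)


def to_swiss(text: str) -> str:
    """Swiss Standard German spelling: every ß (and capital ẞ) becomes ss / SS."""
    return text.replace("ẞ", "SS").replace("ß", "ss")


print(fix_pre_reform("Ich weiß, daß er muß."))  # → Ich weiß, dass er muss.
print(to_swiss("Die Straße ist heiß."))         # → Die Strasse ist heiss.
```

"weiß" keeps its ß because it follows a diphthong, which is exactly the reform rule. Only the listed pre-reform forms are changed.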

🔡 Capitalisation of Nouns

German capitalises all nouns, not just proper nouns. In "Das Haus ist groß" (The house is big), Haus is capitalised because it is a noun. The rule is mandatory and applies to every common noun. Speech recognition handles noun capitalisation automatically, inferring from sentence context which words function as nouns. For ordinary nouns this is very accurate. Errors occur in borderline cases: nominalised verbs ("das Laufen"), nominalised adjectives ("das Schöne"), and words that are nouns in some contexts and adjectives in others.
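These borderline cases are easy to miss in proofreading, so a simple review aid can flag likely candidates. The Python sketch below is a hypothetical heuristic, not how any recogniser works. It flags a lowercase word ending in -en or -e directly after "das" or a contracted preposition (beim, zum, vom, im, am), the typical position of a nominalised verb or adjective. It will also flag correct text, such as an adjective before its noun ("das schöne Wetter") or a superlative ("am schönsten"), so treat each hit as a prompt to check rather than as an error.

```python
import re

# Hypothetical review heuristic: a lowercase "-en"/"-e" word right after "das" or a
# contracted preposition is a common spot for a missed nominalisation.
_CANDIDATE = re.compile(
    r"\b([Dd]as|[Bb]eim|[Zz]um|[Vv]om|[Ii]m|[Aa]m)\s+([a-zäöüß]+(?:en|e))\b"
)


def flag_nominalisations(text: str) -> list:
    """Return article + word pairs worth checking for a missing capital letter."""
    return [m.group(0) for m in _CANDIDATE.finditer(text)]


print(flag_nominalisations("Beim schwimmen und das laufen am Morgen."))
# → ['Beim schwimmen', 'das laufen']
```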

🔢 Numbers and Ordinals

German number words are long: "zweitausendvierundzwanzig" (2024), "neunzehnhundertfünfundachtzig" (1985). The model outputs Arabic numerals (2024, 1985) for numbers spoken in full — this is standard in modern German writing. Ordinals ("der dritte," "zum zweiten Mal") are output correctly as text, not numbers. Large financial figures should be verified — "eine Million zweihunderttausend Euro" should produce "1.200.000 €" or "1,2 Millionen Euro" depending on context.
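If you generate or check such figures in your own scripts, note that German formatting is the reverse of English: a full stop groups thousands and a comma marks decimals. The minimal Python sketch below swaps the separators of Python's English-style format output. It avoids the `locale` module, because German locale names vary between operating systems. The plain space before € is a simplification; typeset German often uses a non-breaking space.

```python
def format_de(value: float, decimals: int = 0) -> str:
    """Format a number German-style: '.' groups thousands, ',' marks decimals."""
    english = f"{value:,.{decimals}f}"  # e.g. '1,234,567.89'
    # Swap the two separators via a placeholder character.
    return english.replace(",", "\x00").replace(".", ",").replace("\x00", ".")


def format_euro(value: float, decimals: int = 0) -> str:
    """Amount followed by the euro sign, as in '1.200.000 €'."""
    return f"{format_de(value, decimals)} €"


print(format_euro(1_200_000))                 # → 1.200.000 €
print(f"{format_de(1.2, 1)} Millionen Euro")  # → 1,2 Millionen Euro
print(format_de(1234567.891, 2))              # → 1.234.567,89
```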

Three Genders, Four Cases — Drei Genera, Vier Kasus

German has three grammatical genders (masculine der, feminine die, neuter das) and four grammatical cases (nominative, accusative, dative, genitive), producing a complex article and adjective declension system. This matters for speech recognition in a specific way: the spoken forms of many case endings are reduced or merged in natural speech, and the model must reconstruct the correct written form.

Case | Masculine | Feminine | Neuter | ASR challenge
Nominative | der Mann | die Frau | das Kind | None: clear forms
Accusative | den Mann | die Frau | das Kind | der/den distinction: "-en" can be swallowed
Dative | dem Mann | der Frau | dem Kind | Medium: "dem" vs "den" in fast speech
Genitive | des Mannes | der Frau | des Kindes | High: "-es" ending often reduced in speech

The genitive case is particularly prone to errors — in spoken German, genitive constructions are increasingly replaced by dative ("wegen dem Regen" instead of "wegen des Regens"), and the model may output the colloquial dative form when you intended a formal genitive. For legal, academic, or formal business writing, check genitive constructions manually.
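A simple search can speed up this manual check by surfacing the most common colloquial pattern. The Python sketch below is a hypothetical helper. It flags "dem" or "den" directly after prepositions that take the genitive in formal German (wegen, trotz, während, statt, anstatt). It cannot catch feminine nouns, because the dative and genitive articles are both "der," so it narrows the proofreading rather than replacing it.

```python
import re

# Prepositions that take the genitive in formal written German.
GENITIVE_PREPOSITIONS = ("wegen", "trotz", "während", "statt", "anstatt")

# "dem"/"den" right after one of them usually means the colloquial dative was used.
_PATTERN = re.compile(
    r"\b(" + "|".join(GENITIVE_PREPOSITIONS) + r")\s+(dem|den)\b", re.IGNORECASE
)


def flag_colloquial_dative(text: str) -> list:
    """Return preposition + article pairs that may need a genitive form instead."""
    return [m.group(0) for m in _PATTERN.finditer(text)]


print(flag_colloquial_dative("Wegen dem Regen fiel das Spiel aus."))   # → ['Wegen dem']
print(flag_colloquial_dative("Wegen des Regens fiel das Spiel aus."))  # → []
```

Because the pattern requires whitespace after the preposition, the adverb "trotzdem" is not flagged.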

German Dialect Accuracy — Dialektale Variation

German has an unusually rich dialect landscape. In their strongest forms, its regional varieties diverge from Standard German to the point of mutual incomprehensibility. Mainstream German ASR models target Hochdeutsch (Standard German). Here is how the major dialect regions perform:

📻

Hochdeutsch / Bühnendeutsch — Best

Standard German as heard in Hannover, on national TV (ARD, ZDF), and from trained speakers. It has minimal regional colouring, clear consonants, and a standard /r/ realisation. This is the variety ASR models are built around. Educated speakers from northern Germany (Hamburg, Hannover, Berlin) come closest to it.

🍺

Bavarian / Bairisch — Moderate

Bavarian (Bairisch) is spoken in much of Bavaria (Bayern), most of Austria, and South Tyrol. Its characteristic features include:

  • distinct vowel quality and diphthongisation, such as "ua" where Standard German has "u" ("Bua" for "Bub," boy)
  • reduced unstressed syllables ("kaufen" → "kau'n")
  • the -l/-erl diminutive suffix ("Dirndl," "Hendl")
  • distinct negation ("ned" for "nicht")

Educated Bavarian speech that approximates Standard German performs moderately; natural Bavarian dialect increases errors significantly.

🏭

Ruhrgebiet / Kölsch — Moderate

The Ruhr area and Cologne have softened regional varieties. Their features include spirantisation of stops (a "Reibelaut," i.e. a fricative, where Standard German has a stop), characteristic Kölsch vocabulary, and a distinctive intonation pattern that can sound rising even on statements. Educated speech from these regions is handled well. Kölsch dialect proper (as used in Cologne pubs and at Carnival) has vocabulary and pronunciation that the de-DE model will not recognise.

🏰

Sächsisch (Saxon) — Moderate/Challenging

Saxon German (Sachsen: Dresden, Leipzig, Chemnitz) is frequently cited as the most strongly marked regional variety in Germany. Distinctive features include softened voiceless stops ("Kaffee" sounding like "Gaffee"), vowel shifts such as "ooch" for "auch," and a characteristic melodic pattern. Saxon-accented speech causes moderate errors; strong Saxon dialect produces higher error rates. Saxons doing formal dictation benefit from consciously approximating northern Standard German pronunciation.

🎻

Wienerisch (Viennese) — Challenging

Natural Viennese dialect, as spoken in traditional working-class neighbourhoods, diverges substantially from Standard Austrian German. It has characteristic vowel shifts (for example "ü" from a vocalised l, as in "vü" for "viel"), apocope (deletion of word-final vowels), its own vocabulary, and a distinctive cadence. ASR models handle educated Viennese German well, but natural Wienerisch increases errors considerably.

🏔️

Schweizerdeutsch — Most Challenging

Natural Swiss German dialects (Zürichdeutsch, Berndeutsch, Baseldeutsch) are the most challenging for Standard German ASR. They carried the High German consonant shift further than Standard German did, so Standard /k/ often corresponds to a fricative [x]. The famous "Chuchichästli" (kitchen cupboard, Standard "Küchenkästchen") demonstrates this. The dialects also have different vowel systems, a syllable-timed rhythm, no ß in writing, and completely different vocabulary for many items. Swiss Standard German (Schweizer Hochdeutsch) performs moderately; natural Schweizerdeutsch produces high error rates with standard de-DE/de-CH models.

Wie man beginnt — How to Start

1

Choose the right language variant: de-DE for Germany, de-AT for Austria, de-CH for Switzerland

Locale selection matters for vocabulary and phonology. Austrian and Swiss users should select de-AT and de-CH respectively — wrong locale causes vocabulary recognition failures for region-specific words.

2

Click "Start 🎤" and allow microphone access when prompted

Chrome on desktop gives the best German recognition results.

3

Speak compound words with no pause between their parts: one breath, one word

The most important German dictation tip: say compound words with no pause between components. A micro-pause splits them into separate words in the output.

4

Copy the text or download it as TXT. Umlauts (ä, ö, ü) and ß are set correctly automatically

In formal writing, check genitive case endings and long compound words.

Sie vs du — Formal and Informal Address in Dictation

German distinguishes formal address (Sie: capitalised, with third-person plural verb forms) from informal address (du: lowercase, with second-person singular verb forms). The distinction affects not just the pronoun but the entire verb conjugation and the possessive forms (Ihr vs dein). Compare "Haben Sie das Dokument?" (formal) with "Hast du das Dokument?" (informal).

For speech recognition, the key point is that the model usually capitalises "Sie" when it is the formal pronoun, distinguishing it by context from "sie" ("she" or "they"). When you dictate "Können Sie mir helfen?", the model outputs "Sie" with a capital; when you dictate "sie geht nach Hause", it outputs lowercase "sie" for "she." This context-sensitive capitalisation is one of German ASR's most impressive features, but it can fail in ambiguous sentences. Check every Sie/sie in formal correspondence.

Tipps für bessere Ergebnisse — Tips for Best Accuracy

✅ What improves accuracy

  • Choose the right language variant: de-DE, de-AT or de-CH
  • Speak compound words without a pause: one word, one breath
  • Speak complete sentences before pausing
  • Dialect speakers: approximate Standard German for formal dictation
  • Avoid Schweizerdeutsch: use Schweizer Hochdeutsch instead
  • Articulate final syllables clearly: don't swallow "-en", "-em", "-es"
  • Chrome offers the best German speech recognition

⚠️ Common errors and fixes

  • Compound split apart: delete the space manually
  • Wrong genitive ending: check manually in formal texts
  • ß vs ss error: correct according to the reform rules; Switzerland always uses ss
  • Sie/sie capitalisation: check in letters and emails
  • Austrian words not recognised: use the de-AT locale
  • Dialect form in the output: use the Standard German equivalent

Wer nutzt deutsche Spracherkennung — Who Uses German Voice to Text

⚖️

Lawyers & Notaries

German legal writing has elaborate formal requirements — genitive constructions, complex subordinate clauses, precise case endings. Voice dictation for first drafts of legal documents, contracts, and court submissions is widely used in German law firms. Always proofread case endings and genitive constructions before signing.

🏥

Doctors & Medical Professionals

German medical terminology is dominated by compound nouns such as "Bluthochdruck," "Herzrhythmusstörung," and "Gallenblasenentzündung." Doctors dictate clinical notes and reports in a formal, precise, compound-heavy register that suits German ASR well, although rare specialist terms should still be checked. This is a major use case across German, Austrian, and Swiss hospitals.

💼

Business Professionals

German business writing has a formal register with specific compound noun vocabulary ("Geschäftsführung," "Jahresabschluss," "Gewinnbeteiligung"). Business professionals dictate emails, reports, and memos — the formal written register of German business correspondence matches well with ASR training data.

🎓

Students & Researchers

University students dictate essay drafts, seminar papers, and thesis sections in German. Academic German, which is formal, structured, and noun-heavy, is one of the registers German ASR handles best. Speaking academic German and then editing the output is often much faster than typing, especially for long texts.

🌍

German Diaspora

German-speaking communities in the US, UK, Australia, and South America use voice dictation for family communication in German, official German documents, and business correspondence with German-speaking partners. Voice input avoids the hassle of typing ä, ö, ü and ß on non-German keyboard layouts.

📝

Journalists & Authors

German journalists and authors use voice dictation for article drafts, interview transcriptions, and book sections. Speaking content aloud and editing the transcript is faster than writing German from scratch — particularly for complex subordinate clause constructions that benefit from being spoken naturally first.

Deutsche Sprachbefehle — German Voice Commands

Say these words while dictating to insert punctuation. Chrome has the most comprehensive support for German voice commands:

Satzzeichen / Punctuation

Sag / Say | Einfügen / Inserts
"Punkt" | . (full stop)
"Komma" | , (comma)
"Semikolon" | ; (semicolon)
"Doppelpunkt" | : (colon)
"Fragezeichen" | ? (question mark)
"Ausrufezeichen" | ! (exclamation mark)
"Anführungszeichen" | „ “ (German quotes)
"Gedankenstrich" | — (em dash)
"Auslassungszeichen" |

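A dictation front end could turn these spoken words into symbols with a simple text substitution. The Python example below is hypothetical (it is not how Chrome implements its commands) and covers only the single-character marks from the table. It has an obvious weakness: a literal "Punkt" in a sentence would also be converted. Real systems avoid this with context or an explicit command mode.

```python
import re

# Hypothetical spoken-command map covering part of the table above.
COMMANDS = {
    "Punkt": ".",
    "Komma": ",",
    "Semikolon": ";",
    "Doppelpunkt": ":",
    "Fragezeichen": "?",
    "Ausrufezeichen": "!",
}

# Longest names first; the leading \s* also removes the space before the punctuation mark.
_PATTERN = re.compile(
    r"\s*\b(" + "|".join(sorted(COMMANDS, key=len, reverse=True)) + r")\b"
)


def apply_commands(text: str) -> str:
    """Replace spoken punctuation words with the symbols they stand for."""
    return _PATTERN.sub(lambda m: COMMANDS[m.group(1)], text)


print(apply_commands("Hallo Komma wie geht es dir Fragezeichen"))
# → Hallo, wie geht es dir?
```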
Format / Formatting

Sag / Say | Aktion / Action
"Neue Zeile" | New line
"Neuer Absatz" | New paragraph
"Löschen" | Delete last word
"Pause" | Pause recognition

German Anführungszeichen

German uses "low-99, high-66" quotation marks: the opening „ sits at the bottom and the closing “ at the top. This differs from English “66…99” quotes. Chrome's German mode outputs „ “ correctly when you say "Anführungszeichen." Book typography in Germany and Austria often uses inward-pointing guillemets »like this«, while Swiss German uses «like this». Verify the house style for formal documents.
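When text arrives with straight quotes (for example, pasted from another tool), you can convert them to the local style. The minimal Python sketch below assumes quotes come in balanced pairs and leaves an unmatched quote as it is. The style names (`"de"`, `"book"`, `"ch"`) are hypothetical labels for the three conventions above.

```python
import re

# Opening and closing marks for each convention described above.
QUOTE_STYLES = {
    "de": ("\u201e", "\u201c"),    # „…“  standard German
    "book": ("\u00bb", "\u00ab"),  # »…«  German/Austrian book typography
    "ch": ("\u00ab", "\u00bb"),    # «…»  Swiss German
}


def localise_quotes(text: str, style: str = "de") -> str:
    """Replace each pair of straight double quotes with the chosen German-style marks."""
    opening, closing = QUOTE_STYLES[style]
    return re.sub(r'"([^"]*)"', lambda m: f"{opening}{m.group(1)}{closing}", text)


print(localise_quotes('Er sagte: "Guten Tag."'))           # → Er sagte: „Guten Tag.“
print(localise_quotes('Er sagte: "Grüezi."', style="ch"))  # → Er sagte: «Grüezi.»
```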

Deutsche Audiodateien transkribieren — MP3, WAV, MP4

Upload German audio recordings such as meetings, lectures, interviews, and podcasts. The Pro plan handles files up to 5 hours long and returns text with timestamps.


Häufige Fragen — FAQ

How does the system recognise long compounds such as "Kraftfahrzeughaftpflichtversicherung"?

The language model uses statistical word frequency and context. "Kraftfahrzeughaftpflichtversicherung" is a well-known German compound that very likely appears in the training corpus. Common, established compounds are output correctly and reliably. Very long or rare compounds may be split, but the error is easy to fix by deleting the space manually. The key point when speaking is to leave no breath or pause between the parts of the compound.

Are umlauts (ä, ö, ü) and ß set correctly automatically?

Umlauts are set automatically and reliably, so you never need to say "a umlaut" or "ae." The model recognises the sound and outputs the correct letter. ß is applied according to the rules of the 1996 spelling reform: after long vowels and diphthongs (heiß, Straße, Spaß), but not after short vowels (dass, muss, Wasser). The Swiss variant (de-CH) always outputs "ss" instead of "ß." ß errors can occur in rare words or in words whose spelling changed under the reform, so check formal texts.

Does it work with Austrian German?

Yes, it works well with the de-AT locale. Austrian Standard German, as used on ORF, is recognised reliably. Be sure to select the de-AT locale so that Austrian vocabulary variants (Jänner, Marille, Paradeiser, Erdäpfel) are recognised correctly. Under de-AT, natural Austrian dialects (Wienerisch, Steirisch, Vorarlbergerisch) are recognised with moderate accuracy, and strong dialect features increase the error rate. For formal dictation, Austrian Standard German is recommended.

Does German voice recognition capitalise nouns automatically?

Yes, and this is one of German ASR's most impressive features. The model infers from sentence context which words function as nouns and capitalises them automatically. In "Das Haus ist groß," for example, Haus is capitalised correctly without any special command. Accuracy for common nouns is very high. Borderline cases may occasionally be capitalised incorrectly: nominalised verbs such as "das Laufen," nominalised adjectives such as "das Schöne," and contextually ambiguous words. Always check capitalisation in formal writing before submission.

Can non-native German speakers use this effectively?

Yes, generally from around B2 level upward. The main challenges for non-native speakers are:

  • (1) The /r/ sound. German typically uses a uvular /ʁ/ quite unlike English /r/. Substituting a non-native /r/ causes some errors, but context compensates for common words.
  • (2) Final devoicing. German devoices obstruents at the end of a syllable ("Weg" is pronounced [veːk], not [veːg]), a rule non-native speakers sometimes break.
  • (3) Consonant clusters such as /ʃtʁ/ (Straße), /pfl/ (Pflege) and /kn/ (Knie). Speak these clearly.

German ASR is generally robust to mild non-native accents for standard vocabulary.

Is Schweizerdeutsch recognised?

Standard ASR models recognise natural Schweizerdeutsch (Zürichdeutsch, Berndeutsch, Baseldeutsch) poorly, because the dialects differ too much from Standard German in vowel quality, consonant system, and vocabulary. For dictation, Swiss speakers should use Schweizer Hochdeutsch (the formal variety, as spoken on Swiss radio and television) rather than dialect (Mundart). de-CH recognises Schweizer Hochdeutsch well and always outputs "ss" instead of "ß," in line with the Swiss spelling standard.


Jetzt auf Deutsch diktieren — Start Dictating in German

Free, no installation, no registration. Umlauts and ß are set correctly automatically.

Chrome recommended: best support for German speech recognition