Korean Voice to Text — 한국어 음성 텍스트 변환

Korean presents two challenges that have few parallels in Latin-script languages or even in other East Asian languages. First, Korean has some of the most complex spacing rules of any major language: where to put spaces between words is a grammatical question, not just a stylistic one, and the rules require morphological analysis that even native speakers frequently get wrong. Second, Korean has seven distinct speech levels, each with its own set of verb endings, and selecting the wrong one is a social error, not just a grammatical one. Add consonant assimilation patterns that substantially change how words sound in connected speech, SOV word order that places verbs at the end, and the North/South Korean vocabulary split, and you have one of the most linguistically demanding ASR targets among major languages.

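The biggest challenges in Korean speech recognition are spacing rules (띄어쓰기) and honorific speech levels (경어법). Phonological changes such as consonant assimilation (자음 동화) and liaison (연음법칙) are also significant challenges. This page explains in detail how Korean speech recognition works and how to get the best results.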

Korean Spacing Rules — 띄어쓰기: The Most Complex Word Spacing in Any Language

Korean spacing rules are a genuine grammatical system, not just a convention. The official spelling standard (한글 맞춤법), as published by the National Institute of Korean Language (국립국어원), devotes an entire chapter to spacing, and native Korean speakers, including highly educated ones, make spacing errors regularly. Unlike English, where spaces simply separate words, Korean spaces separate units called 어절 (eojeol), each of which can consist of a content word plus its attached grammatical particles.

For speech recognition, this creates a unique challenge: the model must apply the full morphological analysis of Korean spacing rules to continuous spoken input. The acoustic signal contains no information about spacing — the model must infer correct spacing entirely from grammatical and lexical knowledge. This is Korean ASR's most persistent source of errors, and it affects output that is otherwise phonetically accurate.

Common spacing situations — correct and incorrect

Correct: 나는 학교에 갔다 | Incorrect: 나는학교에갔다

Particles (에, 는) attach to the preceding noun without space — but the noun unit is separated from the next unit

Correct: 먹을 수 있다 | Incorrect: 먹을수있다, 먹을수 있다

수 (bound noun meaning "can/possibility") requires a space before and after — bound nouns are a major spacing difficulty

Correct: 그것이 맞는 것 같다 | Incorrect: 그것이 맞는것 같다

것 (thing/fact) as a bound noun requires a space before it

Correct: 책 한 권 | Incorrect: 책한권

Counter words (권 for books) are separate from numbers and nouns

Practical approach to spacing errors

Spacing errors are the most common type of error in Korean ASR output, more common than phoneme recognition errors. The good news: a spacing error leaves every syllable intact, so the output remains readable and its meaning is usually clear. For formal documents, run the spell-check in a Korean word processor (한글 맞춤법 검사) after dictation, which catches most spacing errors automatically. For informal use (messages, notes), the content is clear despite spacing issues.
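To see how much of a transcript's error count comes from spacing rather than from misheard sounds, you can compare the edit distance with and without spaces. The sketch below is a minimal illustration; the function names `levenshtein` and `error_breakdown` are made up for this example and are not part of any ASR product.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def error_breakdown(reference: str, hypothesis: str) -> dict:
    """Split the edit distance into content edits and spacing-only edits."""
    def strip_spaces(s: str) -> str:
        return "".join(s.split())

    total = levenshtein(reference, hypothesis)
    content = levenshtein(strip_spaces(reference), strip_spaces(hypothesis))
    return {"total": total, "content": content, "spacing": total - content}


print(error_breakdown("먹을 수 있다", "먹을수있다"))
# {'total': 2, 'content': 0, 'spacing': 2}
```

A transcript whose `content` count is zero is phonetically correct and only needs a spell-checker pass for spacing.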

Speech Levels — 경어법: Seven Ways to End a Sentence

Korean has seven grammatically distinct speech levels — called 경어법 (gyeong-eo-beop) or speech level honorifics — that are expressed primarily through verb endings. Unlike Japanese keigo (which involves separate vocabulary), Korean speech levels primarily change the verb ending while keeping the same core vocabulary. The choice of speech level encodes the social relationship between speaker and listener: age, rank, familiarity, and context all determine which level is appropriate.

Level | Name | "I eat" (먹다) | Context | ASR accuracy
Formal polite | 합쇼체 | 먹습니다 | News, formal presentations, business | ✅ Excellent
Informal polite | 해요체 | 먹어요 | Everyday polite speech, strangers | ✅ Excellent
Plain/Intimate | 해체 | 먹어 | Close friends, younger people, self | ✅ Very good
Blunt | 해라체 | 먹는다 (command: 먹어라) | Commands, written narrative | ✅ Good
Familiar | 하게체 | 먹네 | Older to younger adult (formal) | ⚠️ Less common, moderate
Semiformal | 하오체 | 먹소 | Literary, archaic, rarely spoken | ⚠️ Rare, may error
Formal high | 하십시오체 | 드십시오 (honorific command, "please eat") | Maximum deference, formal announcements | ✅ Good for common forms

The two most commonly used levels, 합쇼체 (formal polite) and 해요체 (informal polite), are the best supported by Korean ASR models, as they dominate written Korean content and most media. The 하게체 and 하오체 levels are archaic in spoken use and may produce errors because they are underrepresented in training data. Note that many grammars treat 하십시오체 as another name for 합쇼체 rather than a separate level; the seventh level in traditional counts is the archaic 하소서체, which is essentially absent from modern speech. For dictation, use whichever speech level is natural for your context, but be aware that archaic levels will have higher error rates.
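Because speech level is carried by the verb ending, a transcript can be roughly sorted by level from its last few syllables. The sketch below is a toy heuristic, not how an ASR model works: the function names are invented for illustration, the labels follow the table above, and several endings are genuinely ambiguous (for example, 먹네 can also be 해체), so treat the result as a guess.

```python
def has_final_bieup(syllable: str) -> bool:
    """True if a precomposed Hangul syllable ends in the final consonant ㅂ."""
    index = ord(syllable) - 0xAC00
    return 0 <= index < 11172 and index % 28 == 17  # 17 = ㅂ in final position


def guess_speech_level(sentence: str) -> str:
    """Guess the speech level from the sentence-final ending (heuristic)."""
    text = sentence.strip().rstrip(".?!")
    if text.endswith("십시오"):
        return "하십시오체"
    # -ㅂ니다 / -ㅂ니까: the syllable before 니다/니까 must end in ㅂ,
    # which keeps plain-form words such as 아니다 out of this branch.
    if len(text) >= 3 and text[-2:] in ("니다", "니까") and has_final_bieup(text[-3]):
        return "합쇼체"
    if text.endswith("요"):
        return "해요체"
    if text.endswith(("는다", "어라", "아라")):
        return "해라체"
    if text.endswith("네"):
        return "하게체"
    if text.endswith(("소", "오")):
        return "하오체"
    return "해체"  # fallback: bare 어/아 endings and anything unrecognised


for form in ["먹습니다", "먹어요", "먹어", "먹어라", "먹네", "먹소", "드십시오"]:
    print(form, guess_speech_level(form))
```

A check like this can flag dictated text that mixes levels, which is worth reviewing by hand before sending a formal message.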

Consonant Assimilation — 자음 동화: When Sounds Transform Completely

Korean has a systematic set of phonological rules that change how consonants are pronounced in connected speech. These rules are predictable and apply consistently — but they mean that the spoken form of a word can sound dramatically different from how it's written. Speech recognition must "reverse" these transformations to output the correct written form.

연음법칙 — Liaison (Sound Linking)

When a syllable ending in a consonant is followed by a syllable beginning with ㅇ (a null initial consonant), the final consonant moves to the beginning of the next syllable. "먹어요" (meog-eo-yo, I eat) is pronounced "머거요" (meo-geo-yo). The written form stays 먹어요 but the spoken form sounds like 머거요. ASR models handle this correctly for common words — it's deeply embedded in training data.

먹어요 written → 머거요 spoken

입어요 written → 이버요 spoken

닭이 written → 달기 spoken
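The liaison rule is mechanical enough to sketch in code using the arithmetic layout of precomposed Hangul syllables in Unicode. This is an illustrative sketch only: `apply_liaison` is a made-up name, the code goes from written to spoken form (the opposite direction from what an ASR model must learn), and it ignores related changes such as palatalisation (같이 is pronounced 가치) and ㅎ finals.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals
# Complex finals: first consonant stays, second moves to the next syllable.
SPLIT = {"ㄳ": "ㄱㅅ", "ㄵ": "ㄴㅈ", "ㄺ": "ㄹㄱ", "ㄻ": "ㄹㅁ", "ㄼ": "ㄹㅂ",
         "ㄽ": "ㄹㅅ", "ㄾ": "ㄹㅌ", "ㄿ": "ㄹㅍ", "ㅄ": "ㅂㅅ"}
# No final, final ㅇ (ng never moves), and ㅎ finals (not handled in this sketch).
SKIP = {"", "ㅇ", "ㅎ", "ㄶ", "ㅀ"}


def split_syllable(ch):
    """Return [initial, vowel, final] indices, or None for non-Hangul."""
    index = ord(ch) - 0xAC00
    if not 0 <= index < 11172:
        return None
    return [index // 588, (index % 588) // 28, index % 28]


def apply_liaison(text: str) -> str:
    """Move a final consonant onto a following syllable that starts with ㅇ."""
    parts = [split_syllable(ch) for ch in text]
    for cur, nxt in zip(parts, parts[1:]):
        if cur is None or nxt is None:
            continue
        final = JONG[cur[2]]
        if CHO[nxt[0]] != "ㅇ" or final in SKIP:
            continue
        stay, move = SPLIT.get(final, ("", final))
        cur[2] = JONG.index(stay)
        nxt[0] = CHO.index(move)
    return "".join(
        ch if p is None else chr(0xAC00 + (p[0] * 21 + p[1]) * 28 + p[2])
        for ch, p in zip(text, parts)
    )


print(apply_liaison("먹어요"), apply_liaison("입어요"), apply_liaison("닭이"))
# 머거요 이버요 달기
```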

비음화 — Nasalisation

Korean stops (ㄱ, ㄷ, ㅂ) become the corresponding nasals (ㅇ, ㄴ, ㅁ) before nasal consonants. "국민" (gukmin, citizen) is pronounced "궁민" (gungmin): the ㄱ becomes ㅇ before ㅁ. "백만" (baekman, one million) is pronounced "뱅만" (baengman). The model outputs 국민 and 백만 (the correct written forms) despite hearing the nasalised spoken versions.

국민 written → 궁민 spoken

백만 written → 뱅만 spoken

입니다 written → 임니다 spoken
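Nasalisation can be sketched the same way, by rewriting the final consonant of one syllable when the next syllable begins with ㄴ or ㅁ. This is a simplified illustration: `nasalise` is a made-up name, and only the three plain stops named above are covered, although in real speech other finals that are pronounced as stops (such as ㅅ or ㅍ) nasalise too.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals
NASAL_OF = {"ㄱ": "ㅇ", "ㄷ": "ㄴ", "ㅂ": "ㅁ"}  # stop final -> nasal final


def is_hangul(ch: str) -> bool:
    """True for precomposed Hangul syllables (U+AC00 to U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


def nasalise(text: str) -> str:
    """Written form -> spoken form for stop + nasal sequences."""
    out = list(text)
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if not (is_hangul(cur) and is_hangul(nxt)):
            continue
        final = JONG[(ord(cur) - 0xAC00) % 28]
        initial = CHO[(ord(nxt) - 0xAC00) // 588]
        if final in NASAL_OF and initial in "ㄴㅁ":
            # Swap only the final-consonant slot of the current syllable.
            out[i] = chr(ord(cur) - JONG.index(final) + JONG.index(NASAL_OF[final]))
    return "".join(out)


print(nasalise("국민"), nasalise("백만"), nasalise("입니다"))
# 궁민 뱅만 임니다
```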

경음화 — Tensification (Fortition)

Korean has three sets of stops: lax (ㅂ,ㄷ,ㄱ), aspirated (ㅍ,ㅌ,ㅋ), and tense/fortis (ㅃ,ㄸ,ㄲ). Tensification (경음화) makes lax consonants become tense after certain environments. "학교" (school) is pronounced "학꾜" — the ㄱ of 교 becomes tense ㄲ. "식당" (restaurant) is pronounced "식땅." The model handles tensification correctly for common words because native speakers apply it consistently and training data reflects natural speech.

학교 written → 학꾜 spoken

식당 written → 식땅 spoken

국자 written → 국짜 spoken
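Tensification after a stop final follows the same pattern, except that it changes the initial consonant of the following syllable. Again this is a hedged sketch: `tensify` is an invented name, only the finals ㄱ, ㄷ, ㅂ are treated as triggers, and other tensification environments in Korean (for example inside some compounds) are not modelled.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals
TENSE_OF = {"ㄱ": "ㄲ", "ㄷ": "ㄸ", "ㅂ": "ㅃ", "ㅅ": "ㅆ", "ㅈ": "ㅉ"}  # lax -> tense
TRIGGER_FINALS = {"ㄱ", "ㄷ", "ㅂ"}  # simplifying assumption for this sketch


def is_hangul(ch: str) -> bool:
    """True for precomposed Hangul syllables (U+AC00 to U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


def tensify(text: str) -> str:
    """Written form -> spoken form: lax initial becomes tense after a stop final."""
    out = list(text)
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if not (is_hangul(cur) and is_hangul(nxt)):
            continue
        final = JONG[(ord(cur) - 0xAC00) % 28]
        offset = ord(nxt) - 0xAC00
        initial = CHO[offset // 588]
        if final in TRIGGER_FINALS and initial in TENSE_OF:
            # Replace the initial, keep the vowel and final (offset % 588).
            out[i + 1] = chr(0xAC00 + CHO.index(TENSE_OF[initial]) * 588 + offset % 588)
    return "".join(out)


print(tensify("학교"), tensify("식당"), tensify("국자"))
# 학꾜 식땅 국짜
```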

ㅎ 탈락 — H-Deletion

The consonant ㅎ is frequently deleted in certain phonological environments — between voiced sounds, it often disappears entirely. "좋아요" (joayo — it's good) is pronounced "조아요" with no ㅎ. "괜찮아요" (gwaenchanayo — it's okay) undergoes similar reduction. Models handle common ㅎ-deletion patterns correctly. Errors occur with less common vocabulary where ㅎ deletion is phonologically predictable but lexically unusual.

좋아요 written → 조아요 spoken

많아요 written → 마나요 spoken

싫어요 written → 시러요 spoken
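The three examples above can be reproduced with a small rule: a final ㅎ disappears before a vowel-initial syllable, and in the complex finals ㄶ and ㅀ the remaining ㄴ or ㄹ links onto that syllable. This is an illustrative sketch with an invented function name (`delete_h`); it covers only the final-ㅎ-before-vowel case and not other ㅎ interactions such as aspiration (좋다 pronounced 조타).

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals
# ㅎ-type final -> consonant that survives and links to the next syllable.
H_FINALS = {"ㅎ": "", "ㄶ": "ㄴ", "ㅀ": "ㄹ"}


def is_hangul(ch: str) -> bool:
    """True for precomposed Hangul syllables (U+AC00 to U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


def delete_h(text: str) -> str:
    """Written form -> spoken form for final ㅎ before a vowel-initial syllable."""
    out = list(text)
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if not (is_hangul(cur) and is_hangul(nxt)):
            continue
        final = JONG[(ord(cur) - 0xAC00) % 28]
        offset = ord(nxt) - 0xAC00
        if final not in H_FINALS or CHO[offset // 588] != "ㅇ":
            continue
        out[i] = chr(ord(cur) - JONG.index(final))  # current syllable loses its final
        survivor = H_FINALS[final]
        if survivor:
            # The surviving ㄴ or ㄹ becomes the initial of the next syllable.
            out[i + 1] = chr(0xAC00 + CHO.index(survivor) * 588 + offset % 588)
    return "".join(out)


print(delete_h("좋아요"), delete_h("많아요"), delete_h("싫어요"))
# 조아요 마나요 시러요
```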

Key insight: speak naturally, don't over-articulate

Unlike some languages where over-articulating helps, Korean ASR performs better when you speak naturally — applying all phonological rules as a native speaker would. The model is trained on natural Korean speech including assimilation. Artificially "correcting" your pronunciation to match spelling (saying 먹어요 as "meok-eo-yo" rather than "meo-geo-yo") can actually hurt accuracy because it produces unnatural phoneme sequences.

North vs South Korean — 남북한 언어 차이

Since the division of Korea in 1945, North Korean (조선어, Joseoneo) and South Korean (한국어, Hangugeo) have diverged substantially — in vocabulary, pronunciation norms, and even orthographic conventions. ASR models trained on South Korean data (the overwhelmingly dominant variety in training corpora) will make systematic errors on North Korean speech.

Vocabulary Differences

After 1948, North Korea deliberately replaced many Sino-Korean words and foreign loanwords with native Korean equivalents. South Korea freely adopted English loanwords; North Korea prohibited most of them. "Computer" in South Korea is "컴퓨터" (keompyuteo, an English loanword); in North Korea it is "전자계산기" (electronic calculator) or "콤퓨터." "Ice cream" is "아이스크림" in the South and "얼음보숭이" in the North.

Pronunciation Differences

North Korean preserves word-initial ㄹ (r/l) and word-initial ㄴ before ㅣ and y-vowels, sounds that South Korean changed or dropped under the "두음법칙" (initial sound rule). "노동" (labour) is 로동 in North Korean, and "여자" (woman) is 녀자. The ㅐ/ㅔ vowel distinction is maintained more clearly in North Korean. South Korean ASR models handle these features poorly.
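The initial sound rule itself is regular enough to sketch: word-initial ㄹ becomes ㅇ before ㅣ or a y-vowel and ㄴ elsewhere, and word-initial ㄴ becomes ㅇ before ㅣ or a y-vowel. The function below is an illustration with an invented name; it assumes the caller passes a single Sino-Korean word, because loanwords (라디오) and native words are exempt from the rule in South Korean spelling.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # 19 initial consonants, Unicode order
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels, Unicode order
I_OR_Y_VOWELS = set("ㅣㅑㅕㅛㅠㅖㅒ")


def apply_initial_sound_rule(word: str) -> str:
    """Map a North Korean style spelling to the South Korean one (두음법칙).

    Assumes `word` is one Sino-Korean word; only its first syllable changes.
    """
    if not word or not 0xAC00 <= ord(word[0]) <= 0xD7A3:
        return word
    offset = ord(word[0]) - 0xAC00
    initial = CHO[offset // 588]
    vowel = JUNG[(offset % 588) // 28]
    if initial == "ㄹ":
        new_initial = "ㅇ" if vowel in I_OR_Y_VOWELS else "ㄴ"
    elif initial == "ㄴ" and vowel in I_OR_Y_VOWELS:
        new_initial = "ㅇ"
    else:
        return word
    first = chr(0xAC00 + CHO.index(new_initial) * 588 + offset % 588)
    return first + word[1:]


print(apply_initial_sound_rule("로동"), apply_initial_sound_rule("녀자"))
# 노동 여자
```

A mapping like this could be used to normalise North Korean spellings in reference text before comparing them with the output of a South Korean model.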

For North Korean defectors and diaspora

North Korean defectors living in South Korea report that their accent causes recognition errors — particularly the initial ㄹ/ㄴ features and vocabulary differences. Speaking South Korean standard vocabulary and approximating Seoul pronunciation gives the best ASR results. Over time, many defectors naturally acquire South Korean pronunciation features, which improves recognition accuracy.

Korean Regional Varieties — 지역 방언

Korean has distinct regional dialects (방언, bangeon) that differ from Seoul Standard Korean in phonology, vocabulary, and grammar. ASR models are trained on Seoul Standard Korean (표준어, pyojuneo). Here is how regional varieties perform:

🏙️

Seoul / Gyeonggi — Best Results

Seoul and Gyeonggi-do (the capital region) speech is the reference model for Korean ASR. Modern educated Seoul speech — used in national broadcasting (KBS, MBC, SBS), corporate settings, and formal contexts — gives word error rates of 8–13% in clear quiet conditions. The "표준어" standard is closely modelled on this variety.

🏖️

Busan / Gyeongsang — Moderate

Gyeongsang dialect (부산, 대구, 경상도) is the most widely spoken regional dialect in South Korea and one of the most acoustically distinct — it has a pitch accent system (like Japanese) that Seoul Korean lost centuries ago. "친구" (friend) has a different pitch pattern in Gyeongsang than Seoul. Characteristic vocabulary ("마이" for "많이," "와예" for "왜요") and pitch patterns cause moderate errors. Models handle educated Busan speech; strong 경상도 dialect features increase errors.

🌿

Jeolla / Honam — Moderate

Jeolla dialect (전라도 — Gwangju, Jeonju, Mokpo) has a distinctive melodic intonation pattern and characteristic vocabulary and endings ("~당께," "~그랬응께"). The vowel system is similar to Seoul but intonation patterns differ substantially. Models handle educated Jeolla speech reasonably; strong dialect features and the characteristic melodic pattern cause moderate errors.

🌊

Jeju — Most Challenging

Jeju language (제주어) — spoken on Jeju Island — is classified by UNESCO as a critically endangered language and is by some analyses a separate language from Korean rather than a dialect. It preserves the archaic vowel "아래아" (ㆍ), has a completely different vocabulary for many concepts, distinct grammar, and phonology. Standard Korean ASR models fail almost completely on natural Jeju. Jeju speakers must use standard Korean for dictation.

사용 방법 — How to Start

1

Select "Korean (한국어)" or "ko-KR" from the language menu

Chrome on desktop provides the best Korean ASR results; Google's Korean model is among the most mature available.

2

Click "Start 🎤" and allow microphone access

A quiet environment is important: recognition errors involving consonant assimilation increase significantly with background noise.

3

Speak naturally and do not try to pronounce words as spelled; apply liaison (연음법칙), nasalisation (비음화), and tensification (경음화) as you normally would

The model is trained on natural Korean speech, including its phonological rules. Over-articulating to match spelling hurts accuracy.

4

Copy the text or download it as TXT, then post-process spacing errors with a Korean spell-checker

For formal documents, run a spell-check (맞춤법 검사), which catches most spacing errors automatically.

Konglish: Korean-English Code-Switching

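Korean has absorbed an enormous volume of English loanwords, called 외래어 (oeraeeo, "words of foreign origin"), adapted to Korean phonology and written in Hangul. Unlike Chinese or Tamil, which resist foreign phonology, Korean has developed a productive system for adapting English sounds to its syllable structure, (C)V(C), which allows no consonant clusters within a syllable and only a small set of syllable-final consonants. This creates some specific recognition challenges:

"내일 미팅에서 프레젠테이션 어떻게 할 거야?" (How are you going to handle the presentation at tomorrow's meeting?)

"데드라인이 내일인데 파일 업로드했어?" (The deadline is tomorrow; did you upload the file?)

"카페에서 아메리카노 한 잔 마실래?" (Want to grab an americano at the café?)

English-origin loanwords in these examples, fully adapted into Hangul: 미팅 (meeting), 프레젠테이션 (presentation), 데드라인 (deadline), 파일 (file), 업로드 (upload), 카페 (café), 아메리카노 (americano).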

Because these loanwords are spoken with Korean phonology — "meeting" becomes "미팅" (mi-ting), "presentation" becomes "프레젠테이션" (peu-re-jen-te-i-syeon) — the Korean ASR model recognises them as Korean words and outputs the standard Hangul form. This is different from languages like Hindi where English words appear in Roman script — in Korean, all speech is output in Hangul including the English loanwords. The output is always in Hangul unless you type Roman characters separately.

Loanword standardisation

Some English loanwords have multiple accepted Hangul spellings — "coffee" can be 커피 (standard) or 코피 (older). "Computer" is 컴퓨터 (standard) but some older speakers say 콤퓨터. The model outputs the standard National Institute of Korean Language (국립국어원) spelling. If you use a non-standard pronunciation of a loanword, you may get the non-standard Hangul form — use the standard pronunciation for best results.

Verb-Final Structure and ASR — 동사 문말: Why Korean Sentences Must Be Completed

Korean is strictly verb-final (SOV — Subject-Object-Verb): "나는 밥을 먹었다" (I rice ate) not "I ate rice." This has a specific implication for speech recognition that parallels Japanese: the verb at the end of the sentence carries crucial information for interpreting everything that came before it — including the speech level, the negation, the aspect, and the meaning of many particles.

If you pause mid-sentence before the verb — "나는 밥을..." — the model has accumulated particles and nouns but has no verb to anchor the sentence structure. The spacing analysis, speech level determination, and certain ambiguous particle choices all depend on what verb follows. Completing sentences before pausing is the single highest-leverage technique for Korean ASR accuracy, for exactly the same reason as Japanese: the information the model needs arrives at the end.
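If you post-process dictation in chunks, a rough completeness check can tell you whether a chunk ended on a sentence-final ending or was cut off before the verb. The sketch below is a crude heuristic under stated assumptions: the ending list is a small hand-picked sample, `looks_complete` is an invented name, and nouns that happen to end in one of these syllables will be misjudged.

```python
# A small, hand-picked sample of syllables that commonly close a Korean sentence.
# This list is an assumption for illustration, not an exhaustive inventory.
SENTENCE_FINAL_SYLLABLES = ("다", "요", "까", "죠", "어", "아", "야", "라", "자")


def looks_complete(utterance: str) -> bool:
    """Heuristic: does the chunk end on a likely sentence-final ending?"""
    text = utterance.strip().rstrip(".?!")
    if not text:
        return False
    last_word = text.split()[-1]
    return last_word.endswith(SENTENCE_FINAL_SYLLABLES)


print(looks_complete("나는 밥을 먹었다"))  # True
print(looks_complete("나는 밥을"))         # False: ends on the object particle 을
```

A chunk that fails the check is a candidate for re-dictating as a full sentence rather than editing by hand.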

정확도 향상을 위한 팁 — Tips for Best Accuracy

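✅ How to improve accuracy

  • Finish the sentence before pausing: the context is complete only once the verb arrives
  • Pronounce naturally; do not try to pronounce words as spelled
  • Dialect speakers: speak close to Standard Korean
  • Dictate in a quiet environment
  • Use the Chrome browser, which gives the best Korean ASR results
  • For official documents, post-process with a spell-checker
  • Say English words with the standard loanword pronunciation

⚠️ Common errors and fixes

  • Spacing errors: post-process with a spell-checker (the most common error)
  • Spacing of bound nouns (수, 것, 바): fix with an automatic checker
  • Gyeongsang/Jeolla dialect features: speak in Standard Korean
  • 하게체/하오체 errors: 해요체 or 합쇼체 recommended
  • Non-standard loanword pronunciation: use the National Institute of Korean Language (국립국어원) standard pronunciation
  • Jeju language (제주어): you must speak Standard Korean to be recognised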

한국어 음성 입력 사용자 — Who Uses Korean Voice to Text

📱

KakaoTalk & Messaging

Korea has near-universal KakaoTalk penetration: the vast majority of smartphone users communicate via KakaoTalk. Typing Korean on a touchscreen (using one of the various Korean keyboard layouts: cheonjiin, naratgeul, or QWERTY-based) is slower than typing English for many users, so voice dictation is faster for paragraph-length and longer messages.

💼

Office Workers & Business

Korean business writing has a highly formal register — elaborate honorifics, specific formal vocabulary, complex sentence structure. Professionals dictate email drafts, meeting notes, and reports. The 합쇼체 speech level used in formal business correspondence is one of the best-supported levels in Korean ASR.

🎓

Students & Researchers

University students dictate essay drafts, seminar papers, and thesis sections in Korean. Korean academic writing has a formal register with specific grammatical constructions — dictating in formal 합쇼체 and post-editing spacing errors is faster than typing long academic Korean from scratch.

🎬

Content Creators

Korean YouTube and social media content is globally influential — K-pop commentary, gaming, beauty, and food content. Korean creators use voice dictation for scripts, captions, and descriptions. Speaking naturally and editing the output is faster than typing Korean, particularly for fast-turnaround content.

🌍

Overseas Korean Communities

Korean diaspora communities in the US, Canada, Australia, Japan, and China use voice dictation for Korean family communication, official Korean documents, and community work. Voice removes the friction of Korean keyboard setup on non-Korean systems — particularly useful for second-generation Koreans who speak Korean but are less comfortable with Korean keyboard layouts.

📖

Korean Learners

Advanced Korean learners (TOPIK 4–6 level) use voice-to-text as pronunciation and phonological rule feedback — if consonant assimilation, tensification, and nasalisation produce the correct Hangul output, the phonological rules are being applied correctly. Particularly useful for confirming that natural phonological processes have been acquired, not just the spelling rules.

한국어 음성 명령 — Voice Commands in Korean

Saying the following words during dictation inserts punctuation. Chrome supports the most complete set of Korean commands:

문장부호 / Punctuation

Say | Inserts
"마침표" | . (full stop)
"쉼표" | , (comma)
"물음표" | ?
"느낌표" | !
"콜론" | :
"세미콜론" | ;
"따옴표" | " " (quotes)
"줄표" | — (em dash)

서식 / Formatting

Say | Action
"줄 바꿈" | New line
"새 단락" | New paragraph
"삭제" | Delete last word

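If you work with a raw transcript in which the command words were not converted, the same mapping can be applied afterwards. The sketch below is a hypothetical re-implementation for illustration, not the tool's actual code: `apply_voice_commands` is an invented name, and it covers only a subset of the commands listed above (quotes and the dash are left out).

```python
PUNCTUATION = {"마침표": ".", "쉼표": ",", "물음표": "?", "느낌표": "!",
               "콜론": ":", "세미콜론": ";"}
FORMATTING = {"줄 바꿈": "\n", "새 단락": "\n\n"}  # two-word commands


def apply_voice_commands(transcript: str) -> str:
    """Replace spoken command words with punctuation and line breaks."""
    words = transcript.split()
    pieces = []  # (kind, value) pairs; kind is "word", "punct", or "break"
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in FORMATTING:
            pieces.append(("break", FORMATTING[pair]))
            i += 2
        elif words[i] in PUNCTUATION:
            pieces.append(("punct", PUNCTUATION[words[i]]))
            i += 1
        elif words[i] == "삭제":
            if pieces and pieces[-1][0] == "word":
                pieces.pop()  # delete the last dictated word
            i += 1
        else:
            pieces.append(("word", words[i]))
            i += 1

    text = ""
    for kind, value in pieces:
        # Words are separated by a space, except at the start of the text or
        # of a line; punctuation and breaks attach directly to what precedes.
        needs_space = kind == "word" and text and not text.endswith("\n")
        text += (" " if needs_space else "") + value
    return text


print(apply_voice_commands("안녕하세요 마침표 잘 지내요 물음표"))
# 안녕하세요. 잘 지내요?
```

One consequence of this design is that the command words themselves cannot be dictated literally; a real implementation would need an escape mechanism for that.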
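Spacing post-processing recommended

For official documents, always run a Korean spell-checker (Naver, Hancom, etc.) after dictation. The most common errors in Korean ASR are spacing errors, not phonological errors, and a spell-checker corrects most of them automatically.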

한국어 오디오 파일 변환 — MP3, WAV, MP4

Upload Korean audio recordings: meetings, lectures, interviews, podcasts. The Pro plan handles files up to 5 hours and returns text with timestamps.

View Pro plan →

자주 묻는 질문 — FAQ

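Why does Korean speech recognition make so many spacing errors?

Korean spacing rules are among the most complex in the world. Because the audio signal contains no spacing information at all, the model must infer spacing entirely through morphological analysis. Even highly educated Koreans often get spacing wrong. The fix: run the Hancom or Naver spell-checker after dictation, and most spacing errors are corrected automatically.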

Does Korean voice recognition handle consonant assimilation rules (연음, 비음화, 경음화)?

Yes — this is actually one of Korean ASR's strongest features. The model is trained on natural Korean speech which applies all phonological rules consistently, and it has learned to reverse them to output correct written forms. "먹어요" spoken as "머거요" → outputs 먹어요. "국민" spoken as "궁민" → outputs 국민. "학교" spoken as "학꾜" → outputs 학교. Speak naturally and apply phonological rules as you normally would — do not try to pronounce words as spelled. That would produce unnatural phoneme sequences that hurt accuracy.

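Is recognition worse if I speak Gyeongsang dialect?

Gyeongsang dialect (Busan, Daegu, Gyeongsang-do) has a pitch system quite different from Seoul Standard Korean, so ASR errors are more frequent. Characteristic Gyeongsang vocabulary ("마이," "와예," "아이가") and intonation patterns are the main sources of error. For formal dictation, speaking close to Standard Korean is recommended. Educated Gyeongsang pronunciation (with fewer dialect features) achieves moderate accuracy.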

Can non-native Korean speakers (Korean learners) use this effectively?

Yes, from TOPIK 4 level upward. The main points for learners: (1) phonological rules: if you are not yet naturally applying 연음, 비음화, and 경음화, the model may produce wrong output; (2) speech level endings: the model recognises 해요체 and 합쇼체, the levels learners typically use, very well; (3) loanword pronunciation: using the standard Korean pronunciation of loanwords (not the English pronunciation) is essential. Advanced learners use this as phonological feedback: if saying 먹어요 with natural liaison ("머거요") produces 먹어요 in the output, the rule is being applied as the model expects, while garbled or unexpected output suggests the pronunciation was not natural connected speech.

Does it work on Android and iPhone too?

Yes. Use Chrome on Android and Safari on iPhone. Chrome on Android gives the best results for Korean ASR. It runs directly in the browser with no separate app to install. To use it with KakaoTalk, dictate, copy the text, and paste it into KakaoTalk. Hangul displays correctly on all devices.

Does it work for North Korean speakers?

Poorly for natural North Korean speech. North Korean preserves features like initial ㄹ/ㄴ (두음법칙 not applied), has different vocabulary (전자계산기 instead of 컴퓨터), and distinct pronunciation patterns. South Korean ASR models make systematic errors on these features. North Korean defectors in South Korea get better results by using South Korean standard vocabulary and approximating Seoul pronunciation — many naturally acquire South Korean features over time, which improves recognition accuracy accordingly.


지금 한국어로 말하기 시작하세요 — Start Dictating in Korean

Free, no installation, no sign-up. Automatic Hangul output.

Get started →

Chrome recommended: the best support for Korean speech recognition