FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters

 

Hong-Sun Yang, Ji-Hoon Kim, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Shuk-Jae Choi, Hyung-Yong Kim

ABSTRACT

Text-to-speech models are increasingly expected to synthesize natural speech from language-mixed sentences, which are common in real-world applications. However, most models do not accept transliterated words as input. When generating speech from transliterated text, pronouncing transliterated words exactly as they are written is not always natural, for example in song titles. To address this issue, we introduce FACTSpeech, a system that synthesizes natural speech from transliterated text while letting users control whether the pronunciation is native or literal. Specifically, we propose a new language shift embedding (LSE) that controls the pronunciation of the input text between native and literal pronunciation. Moreover, we leverage conditional instance normalization (CIN) to improve pronunciation while preserving speaker identity. The experimental results show that FACTSpeech generates native-sounding speech even from sentences in transliterated form.
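As a rough illustration of the conditional instance normalization mentioned in the abstract, the sketch below normalizes each channel of a feature map over time and then applies a scale and shift predicted from a conditioning vector. It is a minimal NumPy sketch of the general technique, not the FACTSpeech implementation: the function name, the linear projections `w_gamma` and `w_beta`, and the choice of conditioning vector are all assumptions made for illustration.

```python
import numpy as np

def conditional_instance_norm(x, cond, w_gamma, w_beta, eps=1e-5):
    """Generic conditional instance normalization (illustrative only).

    x:       features of shape (channels, time)
    cond:    conditioning vector of shape (cond_dim,); what it encodes is an assumption
    w_gamma: hypothetical projection of shape (channels, cond_dim) producing the scale
    w_beta:  hypothetical projection of shape (channels, cond_dim) producing the shift
    """
    # Normalize each channel independently over the time axis.
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    x_norm = (x - mean) / (std + eps)
    # Predict a per-channel scale and shift from the conditioning vector.
    gamma = w_gamma @ cond
    beta = w_beta @ cond
    return gamma[:, None] * x_norm + beta[:, None]

# Toy usage with random values.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 50))
cond = rng.normal(size=8)
w_gamma = rng.normal(size=(4, 8))
w_beta = rng.normal(size=(4, 8))
y = conditional_instance_norm(x, cond, w_gamma, w_beta)
```

After this operation, the per-channel statistics of `y` are determined by the conditioning vector rather than by the input, which is the property that makes the layer useful for injecting a condition into intermediate features.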

Controlling the pronunciation of transliterated words

Word-level LSE settings: eng2kor LSE, kor2eng LSE, zero LSE
The next song is pi ttam nunmul by bangtan sonyeondan.
오늘의 추천곡은 테일러 스위프트의 배드 블러드 입니다.
oneurui chucheongogeun "Taylor Swift"-ui "Bad Blood"-ipnida.
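The word-level control in these samples can be pictured as assigning one LSE label to each word and looking up the matching embedding. The following is a minimal sketch under several assumptions that the page does not confirm: the table is randomly initialized here rather than learned, the zero LSE is taken to be an all-zero vector, and the names `LSE_IDS`, `build_lse_table`, and `word_level_lse` are hypothetical.

```python
import numpy as np

# Hypothetical label-to-index mapping for the three LSE settings on this page.
LSE_IDS = {"zero": 0, "eng2kor": 1, "kor2eng": 2}

def build_lse_table(dim, seed=0):
    """Stand-in for a learned embedding table (random values for illustration)."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(LSE_IDS), dim))
    table[LSE_IDS["zero"]] = 0.0  # assumption: the zero LSE is an all-zero vector
    return table

def word_level_lse(words, shifts, table):
    """Return one LSE vector per word.

    words:  list of words in the sentence
    shifts: dict mapping a word index to an LSE label; unlisted words use "zero"
    """
    ids = [LSE_IDS[shifts.get(i, "zero")] for i in range(len(words))]
    return table[ids]

sentence = "The next song is pi ttam nunmul by bangtan sonyeondan."
words = sentence.split()
# Shift only the romanized Korean words toward Korean pronunciation.
shifts = {i: "eng2kor" for i in (4, 5, 6, 8, 9)}
table = build_lse_table(dim=16)
emb = word_level_lse(words, shifts, table)
```

In this toy setup, the English words keep the zero vector, so only the song title and artist name receive a shift.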

Transliterated text to speech

Four types of input:
EN: English
T-EN: English text transliterated into Korean characters
KO: Korean
T-KO: Korean text transliterated into English (Latin) characters
Ground Truth
(EN) Each crew was to compete against their rival from the other town.
(T-EN) 이취 크루 와즈 투 컴피트 어게인스트 데어 라이벌 프롬 더 아더 타운.
Ground Truth
(KO) 좋아하지도 않는 사람한테 자기 인생을 겁니까?
(T-KO) joahajido anneun saramhante jagi insaengeul geopnikka?

Language style interpolation

Script: 냉각수나 배터리는 미리 점검하세요.
naenggaksuna "battery"-neun miri jeomgeomhaseyo.

Interpolation: \(e_{LSE}^{zero}\) to \(e_{LSE}^{kor2eng}\)




Script: The next song is pi ttam nunmul by bangtan sonyeondan.

Interpolation: \(e_{LSE}^{zero}\) to \(e_{LSE}^{eng2kor}\)




Script: 룩 포워드 투 씨잉 웨어 유 고 넥쓰트.
Look forward to seeing where you go next.

Interpolation: \(e_{LSE}^{zero}\) to \(e_{LSE}^{kor2eng}\)




Script: Annyeonghaseyo je ireumeun honggildong ipnida.

Interpolation: \(e_{LSE}^{zero}\) to \(e_{LSE}^{eng2kor}\)
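One simple way to read the interpolation samples above is as a blend between the zero LSE and a target LSE, with a slider position `alpha` between 0 and 1. The sketch below assumes linear interpolation, which this page does not state explicitly; the function name `interpolate_lse` and the toy vectors are hypothetical.

```python
import numpy as np

def interpolate_lse(e_zero, e_target, alpha):
    """Linearly blend two LSE vectors (assumed interpolation scheme).

    alpha = 0.0 returns e_zero (literal pronunciation of the written text);
    alpha = 1.0 returns e_target (fully shifted pronunciation).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * e_zero + alpha * e_target

# Toy vectors standing in for learned embeddings.
e_zero = np.zeros(4)
e_kor2eng = np.array([1.0, -2.0, 0.5, 3.0])
halfway = interpolate_lse(e_zero, e_kor2eng, 0.5)
```

Sweeping `alpha` from 0 to 1 would then correspond to moving the slider from the left endpoint to the right endpoint.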




Additional audio

Five types of input:
EN: English
T-EN: English text transliterated into Korean characters
KO: Korean
T-KO: Korean text transliterated into English (Latin) characters
CM: Code-mixed
(EN) In fact, these are often quite dangerous to obtain, and use.

Systems: FACTSpeech, w/o CIN, Y. Zhang et al., SANE-TTS

(T-EN) 인 팩트, 디즈 아 옵튼 콰잇 덴저러쓰 투 옵테인, 엔드 유즈.

Systems: FACTSpeech, w/o CIN, Y. Zhang et al., SANE-TTS

(KO) 주문하시겠습니까?

Systems: FACTSpeech, w/o CIN, Y. Zhang et al., SANE-TTS

(T-KO) jumunhasigetseupnikka?

Systems: FACTSpeech, w/o CIN, Y. Zhang et al., SANE-TTS

(CM) 이 연구의 key contribution은 무엇인가요?

Systems: FACTSpeech, w/o CIN, Y. Zhang et al., SANE-TTS