Cantonese Diction Guide

I use the romanization system Jyutping to transliterate Cantonese text, with some of my own modifications in the score to aid English-speaking singers. The text page at the front of the score will show the unmodified Jyutping, as well as the original Chinese characters.


aa is like the [a] in father.

a is closer to the [ɜ] in us. 

e is usually pronounced like the [ɛ] in bed, but before i it is more like the [e] in egg.

i is usually like the [i] in bee, but before k and ng it is closer to the [ɪ] in bit.*

*Some people use [e] in these cases, but I prefer [ɪ] for my music.

o is usually pronounced [ɔ], a bit like the [ʌ] in fun but with rounded lips. Before u, it is [o] as in go.

u by itself or before n is like [u] as in food. Before ng and t it is like [ʊ] as in put, and I use that symbol in the score. After y it is [y] like the German ü, and I replace it with ü in the score. 

Two vowel sounds are signified by pairs of letters:

  • oe is [œ], like the German ö — [ɛ] with rounded lips.
  • eo is [ɵ], probably the most difficult vowel for English speakers. This sound is somewhat like [y] but with the tongue farther back.

Diphthongs are just combinations of the above vowels, with the two exceptions already mentioned (ei and ou). I prefer that singers emphasize the more open vowel of the pair.
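For anyone who wants to check a syllable quickly, the vowel rules above can be collected into a small lookup table. This is only a rough sketch in Python: the names `VOWEL_RULES`, `vowel_ipa`, and `after_y` are made up for illustration and are not part of Jyutping or of any library. The table simply transcribes the rules as stated in this guide, and it says nothing about tones or about which half of a diphthong to emphasize.

```python
# Vowel values transcribed from the rules in this guide (illustrative only).
# Each entry: Jyutping vowel -> (default IPA, {following letters: IPA override})
VOWEL_RULES = {
    "aa": ("a", {}),
    "a": ("ɜ", {}),
    "e": ("ɛ", {"i": "e"}),               # ei
    "i": ("i", {"k": "ɪ", "ng": "ɪ"}),    # ik, ing
    "o": ("ɔ", {"u": "o"}),               # ou
    "u": ("u", {"ng": "ʊ", "t": "ʊ"}),    # ung, ut
    "oe": ("œ", {}),
    "eo": ("ɵ", {}),
}


def vowel_ipa(vowel, following="", after_y=False):
    """Return the IPA symbol this guide gives for a Jyutping vowel.

    `following` is the letter (or "ng") right after the vowel, if any.
    `after_y` marks the Jyutping sequence "yu", which this guide sings as [y].
    """
    if vowel == "u" and after_y:
        return "y"
    default, overrides = VOWEL_RULES[vowel]
    return overrides.get(following, default)


print(vowel_ipa("i", "k"))   # i before k
print(vowel_ipa("o", "u"))   # o before u
print(vowel_ipa("u", "n"))   # u before n keeps its default value
```

Keeping the rules in one table rather than in a chain of conditions makes it easy to change a single entry, for instance for singers who prefer [e] to [ɪ] before k and ng.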


Most consonants are close to their English equivalents. A few clarifications:

b, d, and g are not voiced the way they are in English; they are just written this way to differentiate them from aspirated consonants p, t, and k. For the purpose of singing it’s fine to use the English sounds, but don’t voice them too heavily.

p, t, and k at the ends of words are not aspirated. They sound more like glottal stops. I recommend singing the particle dik (的), for example, as a staccato [dɪ]. 

c is like the [ts] in cats. z is similar, but not aspirated; I think of it as the [dz] in kids, but unvoiced.

Jyutping, logically enough, uses j to represent the sound [j]. I use y for this sound, however, because j denotes a consonant closer to [dʒ] in Japanese Romaji and Mandarin Pinyin, and consistency across languages is useful for my multilingual works. 

In everyday speech, many speakers pronounce initial n more like l, so the word for “you” sounds like lei instead of nei. Another common feature of modern spoken Cantonese is dropping the initial ng from words like ngo (“I”, or “me”). Singers seem to prefer the “old-fashioned” pronunciations, though, so those are usually what I use in my music.

My works that include Cantonese