Japanese Diction Guide

I think Japanese is one of the easier languages for singing, and I would love to hear it more often in concerts. Most of its phonemes can be found in other languages, and many of the difficulties of spoken pronunciation do not apply to singing (though I will mention them below, for completeness).


Vowels

There are only five vowels, and they are similar to the ones in Italian and Spanish:

  • [a] as in father
  • [i] as in bee
  • [u] as in food*
  • [e] as in egg
  • [o] as in core

* Technically, most native speakers pronounce u with compressed rather than rounded lips ([ɯᵝ]), but I don’t feel that’s necessary for non-native singers.


Consonants

Consonants are generally close to their English equivalents. The ones that may need clarification are:

  • ts is like the [ts] in cats. (The English pronunciation of the Japanese word tsunami, with a silent t, is wrong.)
  • j is similar to the [dʒ] in jump, though somewhat more palatalized.
  • r should be flipped like the single r in Spanish or Italian (a quick tap, [ɾ]), not rolled.
  • g is generally pronounced like the [g] in go, but in the middle of a word some people use a sound closer to the [ŋ] in sing, particularly when they want to sound more formal. I like [g] for modern texts and [ŋ] for older ones, but I encourage singers to do what they find most comfortable.
  • Sometimes n functions as its own syllable, and you will need to “hum” an n without any vowel.
  • f is usually pronounced somewhere between the English [f] and [h], with the air passing between both lips ([ɸ]); the exact sound varies by individual speaker. If that’s difficult for you, a light [f] is fine.

Additional Notes

Some minor points that will probably only arise if you are very diligent and inquisitive:

Double vowels, where a vowel sound is held twice as long as usual, are an important nuance in spoken Japanese, although in singing the durations are prescribed by the music. There are a few different ways to write these long vowels, so you might see the same word spelled differently in other contexts. For example, sensei (“expert” or “teacher”) ends with a double [e] sound, not [e] followed by [i]. It can also be romanized sensē or sensee, but sensei is the most common spelling. I would probably spell it sensei on your text page but leave out the i in the score. Don’t worry about the discrepancy!

In some situations, i and u vowels are devoiced in speech, but generally not in song. For example, the word for “moon” is usually pronounced tski in speech but tsuki in song. This might cause confusion if you ask a Japanese speaker to read the text to you and find it differs from your score. Again, don’t worry. Enjoy the music!

My works that include Japanese