Kakasi kanji to roomaji converter encoding difficulties

General Tech Learning Aids/Tools 3 years ago



manpreet 3 years ago

I am trying to use the Kakasi kanji/hiragana/katakana to romaji converter as an aid to learning how kanji are pronounced within specific sentences. I am using the following command and parameters:

kakasi -Ja -Ha -Ka -Ea -s

For example, converting today's date gives:

$ echo "7月31日" | kakasi -Ja -Ha -Ka -Ea -s
7 shin ?? 1 ka �

There is clearly a configuration problem, which I suspect comes from the tool not correctly understanding the input encoding (UTF-8).

Could anybody with experience in this matter please either advise how to make kakasi accept Unicode input, or suggest an alternative open-source conversion tool that works better? (Please, no Windows software.)

manpreet 3 years ago

 

Thanks to comments by @Earthliŋ and @blutorange (recognition where recognition is due), combining iconv with kakasi has finally worked. The input must first be converted from UTF-8 to Shift-JIS, which can be done with:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -Ja -Ha -Ka -Ea -s

7 gatsu 31 nichi
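To see why the iconv step makes a difference, the sketch below (an illustration, not part of the original answer) dumps the raw bytes of the same date in UTF-8 and in Shift-JIS. It assumes an iconv that accepts the same encoding names used above (GNU iconv does) plus the standard od and tr tools; the helper name hex is made up for the example. Each kanji takes three bytes in UTF-8 but two in Shift-JIS, so kakasi receives entirely different byte sequences depending on the input encoding.

```shell
#!/bin/sh
# hex (hypothetical helper): print stdin as one run of lowercase hex digits.
hex() { od -An -tx1 | tr -d ' \n'; }

date_ja='7月31日'

# UTF-8: 月 and 日 take three bytes each.
utf8_bytes=$(printf '%s' "$date_ja" | hex)

# Shift-JIS: the same kanji take two bytes each.
sjis_bytes=$(printf '%s' "$date_ja" | iconv -f utf8 -t shift-jis | hex)

echo "UTF-8:     $utf8_bytes"   # 37e69c883331e697a5
echo "Shift-JIS: $sjis_bytes"   # 378c8e333193fa
```

The digits 7, 3 and 1 (0x37, 0x33, 0x31) are the same in both dumps; only the kanji bytes change.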

Converting back in the other direction is not needed when the output is romaji, since romaji consists of plain ASCII letters, digits and spaces, which are encoded identically in Shift-JIS and UTF-8. If necessary, the output can be converted from Shift-JIS back to UTF-8 with:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -Ja -Ha -Ka -Ea -s | iconv -f shift-jis -t utf8

7 gatsu 31 nichi

The back-conversion matters, for instance, when converting into hiragana:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -JH -KH -Ea -s | iconv -f shift-jis -t utf8

7 がつ 31 にち
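The difference between the romaji and hiragana cases can be checked at the byte level. The sketch below (an illustration with a made-up hex helper, assuming the same iconv encoding names as above) shows that an ASCII romaji string passes through a Shift-JIS to UTF-8 conversion unchanged, while the hiragana がつ has different bytes in the two encodings, which is why the final iconv step is needed for kana output.

```shell
#!/bin/sh
# hex (hypothetical helper): print stdin as one run of lowercase hex digits.
hex() { od -An -tx1 | tr -d ' \n'; }

# Romaji is plain ASCII, so Shift-JIS -> UTF-8 leaves the bytes untouched.
romaji='7 gatsu 31 nichi'
romaji_before=$(printf '%s' "$romaji" | hex)
romaji_after=$(printf '%s' "$romaji" | iconv -f shift-jis -t utf8 | hex)

# Hiragana is not: the same two characters have different bytes.
kana='がつ'
kana_utf8=$(printf '%s' "$kana" | hex)                               # e3818ce381a4
kana_sjis=$(printf '%s' "$kana" | iconv -f utf8 -t shift-jis | hex)  # 82aa82c2

if [ "$romaji_before" = "$romaji_after" ]; then
  echo "romaji: identical bytes in both encodings"
fi
echo "hiragana: UTF-8 $kana_utf8 vs Shift-JIS $kana_sjis"
```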

Update

Update

As pointed out by @oals in the comments, newer versions of kakasi have the little-documented options -iutf8 and -outf8, which set UTF-8 as the input and output encoding respectively. The conversion to hiragana above can then be done more simply, without iconv:

$ echo "7月31日" | kakasi -JH -KH -Ea -s -iutf8 -outf8

7 がつ 31 にち

Thanks for your help.

