Converting from ISO-IR-87 to UTF-8 encoding
I am working on Debian and derivative systems. I’d like to convert input encoded in ISO-IR-87 to UTF-8. Is there an easy way to do it? The iconv on my system does not list that encoding:
    % iconv -l | grep "IR-8"
    ISO-IR-8-1//
    ISO-IR-84//
    ISO-IR-85//
    ISO-IR-86//
    ISO-IR-88//
    ISO-IR-89//
    % dpkg -S /usr/bin/iconv
    libc-bin: /usr/bin/iconv
    % apt-cache policy libc-bin
    libc-bin:
      Installed: 2.36-9+deb12u3
      Candidate: 2.36-9+deb12u3
      Version table:
     *** 2.36-9+deb12u3 500
            500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
            100 /var/lib/dpkg/status
         2.36-9+deb12u2 500
            500 http://deb.debian.org/debian bookworm/main amd64 Packages
recode seems to be working on my system (thanks @frostschutz):
    % echo -n 'ＡＢＣ' > t.txt
    % recode -v UTF-8..JIS_X0208 t.txt
    Request: UTF-8..:libiconv:..JIS_X0208
    Shrunk to: UTF-8..JIS_X0208
    Recoding t.txt... done
    % recode -v JIS_X0208..UTF-8 t.txt
    Request: JIS_X0208..:libiconv:..UTF-8
    Shrunk to: JIS_X0208..UTF-8
    Recoding t.txt... done
recode seems to support it:
    $ recode -l | grep -i ISO-IR-87
    JIS_X0208 csISO87JISX0208 ISO-IR-87 JIS0208 JISX0208.1983-0 JISX0208.1990-0 JIS_X0208-1983 JIS_X0208-1990 X0208
The charset has many other names (see https://en.wikipedia.org/wiki/JIS_X_0208 for even more), but none of them seems to be supported by the iconv of the GNU libc. That Wikipedia article suggests that this Japanese character set was not properly specified, with incompatibilities between implementations, and that it is no longer in use. That may explain why it has not been included in the GNU libc, even though it is in GNU’s standalone iconv library, as indicated by @frostschutz.
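If recode is not available, one possible workaround with the glibc iconv is to go through ISO-2022-JP, where the escape sequence ESC $ B switches to JIS X 0208-1983. The following is only a sketch, assuming the input consists purely of two-byte JIS X 0208 codes with no escape sequences of its own; the function name jisx0208_to_utf8 is made up for this illustration.

```shell
# Hypothetical helper: convert raw two-byte JIS X 0208 (ISO-IR-87) data to UTF-8.
# Assumption: the input holds only two-byte JIS X 0208 codes (bytes 0x21-0x7E),
# with no escape sequences of its own.
jisx0208_to_utf8() {
    {
        printf '\033$B'    # ESC $ B: switch to JIS X 0208-1983
        cat -- "$@"        # the raw data, from the named files or from stdin
        printf '\033(B'    # ESC ( B: switch back to ASCII
    } | iconv -f ISO-2022-JP -t UTF-8
}
```

With hypothetical file names, usage could look like `jisx0208_to_utf8 input.jis > output.txt`. For instance, the raw bytes `#A#B#C` (0x23 0x41, 0x23 0x42, 0x23 0x43) are the JIS X 0208 codes of ＡＢＣ.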