Converting from ISO-IR-87 to UTF-8 encoding
I am working on a Debian-based system (Debian or one of its derivatives). I’d like to convert input encoded in ISO-IR-87 to UTF-8. Is there an easy way to do it?
For reference:
% iconv -l | grep "IR-8"
ISO-IR-8-1//
ISO-IR-84//
ISO-IR-85//
ISO-IR-86//
ISO-IR-88//
ISO-IR-89//
% dpkg -S /usr/bin/iconv
libc-bin: /usr/bin/iconv
% apt-cache policy libc-bin
libc-bin:
Installed: 2.36-9+deb12u3
Candidate: 2.36-9+deb12u3
Version table:
*** 2.36-9+deb12u3 500
500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
100 /var/lib/dpkg/status
2.36-9+deb12u2 500
500 http://deb.debian.org/debian bookworm/main amd64 Packages
recode seems to be working on my system (thanks @frostschutz):
% echo -n 'ABC' > t.txt
% recode -v UTF-8..JIS_X0208 t.txt
Request: UTF-8..:libiconv:..JIS_X0208
Shrunk to: UTF-8..JIS_X0208
Recoding t.txt... done
% recode -v JIS_X0208..UTF-8 t.txt
Request: JIS_X0208..:libiconv:..UTF-8
Shrunk to: JIS_X0208..UTF-8
Recoding t.txt... done
GNU recode seems to support it:
$ recode -l | grep -i ISO-IR-87
JIS_X0208 csISO87JISX0208 ISO-IR-87 JIS0208 JISX0208.1983-0 JISX0208.1990-0 JIS_X0208-1983 JIS_X0208-1990 X0208
So:
recode ISO-IR-87..UTF-8
It looks like it has many other names (see https://en.wikipedia.org/wiki/JIS_X_0208 for even more), but none of them seem to be supported by the iconv of the GNU libc. That Wikipedia article suggests that this Japanese character set was not properly specified, with incompatibilities between implementations, and that it is not currently in use, which may explain why it has not been included in the GNU libc (even though it is in GNU’s standalone iconv library, as indicated by @frostschutz).
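If recode is not available, a possible workaround with the GNU libc iconv is to wrap the raw bytes in ISO-2022-JP escape sequences, since ISO-2022-JP selects the JIS X 0208 set with ESC $ B. The following is only a sketch under assumptions: the function name jisx0208_to_utf8 is made up for illustration, and the input is assumed to consist purely of two-byte JIS X 0208 codes (any ASCII text mixed in would be misread as part of a two-byte code).

```shell
# Hypothetical helper: read raw ISO-IR-87 (JIS X 0208) bytes on stdin,
# write UTF-8 on stdout. Assumes the input holds only two-byte codes.
jisx0208_to_utf8() {
    {
        printf '\033$B'   # ESC $ B: switch to JIS X 0208-1983
        cat               # pass the raw two-byte codes through unchanged
        printf '\033(B'   # ESC ( B: switch back to ASCII at the end
    } | iconv -f ISO-2022-JP -t UTF-8
}

# Example: bytes 0x24 0x22 (row 4, cell 2) are HIRAGANA LETTER A
printf '$"' | jisx0208_to_utf8
```

Whether this gives the same result as recode for every code point has not been checked here, so it is worth comparing both on a sample of the real data.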