How can I convert full-width characters to half-width characters (and vice versa)?

Here is my simple problem: how can I convert half-width characters to full-width (or the reverse) from the command line? I thought this would be built into the iconv command, but I did not find anything:

$ iconv  -l | grep -i full
-> nothing
$ iconv  -l | grep -i half
-> nothing

Typical input would be:

$ echo -n "Ａb９８７６５４３２１０" | iconv -f utf8 -t utf16be | hexdump -C
00000000  ff 21 00 62 ff 19 ff 18  ff 17 ff 16 ff 15 ff 14  |.!.b............|
00000010  ff 13 ff 12 ff 11 ff 10                           |........|
00000018
Asked By: malat

||

If you have the uconv utility from the ICU tools (icu-devtools package on Debian-based OSes):

$ echo 'Ａb９８７６５４３２１０' | uconv -x Fullwidth-Halfwidth
Ab9876543210

(Beware that it also converts characters that are normally full-width, such as those in the Korean or Japanese scripts, to their half-width representation.)

Change to Halfwidth-Fullwidth for the reverse.

If you don't have uconv and are only interested in converting the full-width variants of the ASCII printable characters:

$ echo 'Ａb９８７６５４３２１０' | perl -C -pe 'y/\x{ff01}-\x{ff5e}/!-~/'
Ab9876543210

Or also converting U+3000 (ideographic space) to ASCII space:

$ echo 'Ａb９８７６５４３２１０' | perl -C -pe 'y/\x{3000}\x{ff01}-\x{ff5e}/ !-~/'
Ab9876543210

Running

curl -s https://www.unicode.org/Public/UNIDATA/UnicodeData.txt | grep '<wide>'

will reveal a few extra characters that are the full-width variants of some non-ASCII characters, which you can add to the list:

perl -C -pe 'y/\x{3000}\x{ff01}-\x{ff60}\x{ffe0}-\x{ffe6}/ !-~\x{2985}\x{2986}\xa2\xa3\xac\xaf\xa6\xa5\x{20a9}/'

(and searching for <narrow> will show the half-width variants of some normally full-width characters, but that’s a large list and with non-contiguous ranges so adding those would render the expression much larger).

On some systems, you may be able to do the same with tr in a UTF-8 locale such as C.UTF-8, though not with current versions of GNU tr unless patched by your OS vendor.

$ uname
FreeBSD
$ echo 'Ａb９８７６５４３２１０' | LC_ALL=C.UTF-8 tr $'\u3000\uff01-\uff5e' ' !-~'
Ab9876543210

(also assuming a shell with support for zsh’s $'\uXXXX').

For the reverse conversion, just change the y/from/to/ to y/to/from/.

perl also has an interface to the Unicode data in its Unicode::UCD module, so you could also do:

perl -C -MUnicode::UCD=charprop -pe '
  s{\p{Decomposition_Type: Wide}}{
    $cache{$&} //= charprop(ord($&), "Decomposition_Mapping")
  }ge'

Though it’s quite slow even if mitigated here by the use of caching. See perldoc perluniprops and perldoc Unicode::UCD for details.

Or using the NFKD decomposition for those characters that have a wide decomposition type:

perl -MUnicode::Normalize=NFKD -C -pe 's/\p{Dt=Wide}/NFKD$&/ge'

If the goal is to convert to ASCII, then on GNU systems at least, iconv -t ASCII//translit would also convert those (and more characters) to their closest ASCII character¹ representation:

$ echo 'Stéphane' | iconv -t ASCII//translit
Stephane

Obviously, there’s no way to do the reverse.

In any case, what you want here is not a conversion of the same characters from one charset to another, but some form of transliteration from some characters to other characters.

iconv -l, like uconv -l, lists the supported encodings/charsets. uconv -L lists the transliterators. GNU iconv only has //translit, which gives a possible approximation if the character doesn’t exist in the target charset (besides //ignore, which just discards such characters instead).


¹ or characters, as in æ -> ae or ﬃ -> ffi. Those, by the way, don’t have full-width forms themselves, but their approximations do; for instance, you might want to convert aﬃx to ａｆｆｉｘ rather than ａﬃｘ when converting to full-width, which none of the solutions mentioned here handle.

Answered By: Stéphane Chazelas