How do I correctly unzip a zip file whose file names contain unusual characters?

I'm really new to the Linux world, so please bear with me if this is a basic question.

I just downloaded a zip file using this link: https://io.genesis-ark.club/common/20240401/三国名将录(2).zip

The first time, I downloaded it using wget "https://io.genesis-ark.club/common/20240401/三国名将录(2).zip" and got ???????.zip after the download completed, so I downloaded it again using wget "https://io.genesis-ark.club/common/20240401/三国名将录(2).zip" -O mydownload.zip. But when I try to extract the zip, I get weird output like this:

[screenshot of the unzip output]

So I looked for a way to get details about the zip file and, while googling, found zipinfo mydownload.zip, but its output also looks weird to me:

[screenshot of the zipinfo output]

Could this be caused by the archive having been created with a non-standard program (that is, not one like rar, zip or 7z)? How do I fix it? I've tried unzip -a and 7zip, but neither works.

FYI: I'm using CentOS 7.0

Additional Information as per comment:

Output of locale charmap:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968

Output of locale:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Asked By: flix

||

The character encoding in your locale is ANSI_X3.4-1968, which is another name for ASCII. That charset only has 128 different characters, the American (the A in both ASCII and ANSI) English ones.

It doesn’t have any Chinese character.

You land in such a locale because the LC_CTYPE environment variable is set to UTF-8, and UTF-8 is not a valid locale name on your CentOS system (see the output of locale -a for the list of supported locales). It is a valid locale on very few systems, a known exception being Apple macOS (formerly known as OS X and Mac OS X, or whatever Apple's marketing team fancied on a given day).
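
As a rough way to check whether a given name is one your system supports, you can compare it against the locale -a list. The sketch below is only illustrative: locale_in_list is a made-up helper name, and folding case and hyphens is a heuristic to cope with glibc listing en_US.UTF-8 as en_US.utf8.

```shell
# Hypothetical helper: succeeds if the locale name given as $1 appears in a
# `locale -a` style list read from stdin.
# Heuristic (an assumption): lower-case everything and drop hyphens so that
# en_US.UTF-8 and en_US.utf8 compare equal.
locale_in_list() (
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    tr 'A-Z' 'a-z' | tr -d '-' | grep -xF -- "$want" >/dev/null
)
```

For example, locale -a | locale_in_list en_US.UTF-8 && echo supported would be expected to print supported on your system, while the same check for the bare name UTF-8 would not.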

When the locale for the LC_CTYPE category, which covers the character encoding among other things, cannot be found, that category defaults to the C locale, in which, on most systems including CentOS, the charmap is ASCII.

Your output of locale also shows that the LANG environment variable is set to en_US.UTF-8.

That is a valid locale on your system. The LC_CTYPE value is bogus and should be left unset so that, as for the other LC_* categories, it is derived from $LANG. The en_US.UTF-8 locale has a UTF-8 charmap, and UTF-8 is an encoding of Unicode, which covers the characters of virtually all writing systems, Chinese included.

You can address the problem locally in your shell environment by doing:

unset -v LC_CTYPE

(assuming a POSIX-like shell)

After which you’ll see the output of locale showing:

LC_CTYPE="en_US.UTF-8"

(the double quotes indicate that the value is inferred, here from $LANG as $LC_ALL is not set)
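
Going by that quoting convention, a small filter can pick out the categories that are set explicitly in the environment. This is a sketch only; explicit_categories is a hypothetical helper name, not a standard tool.

```shell
# Hypothetical helper: read `locale` output on stdin and print the names of
# the LC_* variables whose value is NOT in double quotes, i.e. the ones set
# explicitly rather than inferred from $LANG or $LC_ALL.
# An empty value (such as "LC_ALL=") is not reported.
explicit_categories() {
    awk -F= '/^LC_[A-Z_]+=[^"]/ { print $1 }'
}
```

Run as locale 2>/dev/null | explicit_categories, it prints LC_CTYPE for the output shown in the question, and nothing once that variable has been unset.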

To fix the problem more permanently, you’d need to figure out what sets that environment variable.

One possibility is that you're logging in over ssh to that CentOS system from a macOS system whose environment includes LC_CTYPE=UTF-8 (as it's valid there). As ssh is often configured by default to pass the localisation variables along to the server¹, that variable ends up being propagated to the remote shell session on the CentOS system.
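
One way to test that theory on the client side is to look at the SendEnv patterns in the effective configuration that ssh -G prints. The sketch below assumes the keyword/value line format of that output; would_send is a hypothetical helper name, and negated patterns are not handled.

```shell
# Hypothetical helper: succeeds if the variable name given as $1 matches one
# of the SendEnv patterns found in `ssh -G` style output read from stdin.
would_send() (
    name=$1
    set -f   # keep patterns such as LC_* from expanding against file names
    while read -r key patterns; do
        case $key in
            [Ss]end[Ee]nv) ;;   # only SendEnv lines are of interest
            *) continue ;;
        esac
        for pat in $patterns; do
            # $pat is left unquoted on purpose so it is used as a glob pattern
            case $name in
                $pat) exit 0 ;;
            esac
        done
    done
    exit 1
)
```

For instance, ssh -G centos-machine | would_send LC_CTYPE && echo forwarded. The -G option needs OpenSSH 6.8 or later, and this only covers the client; the server still has to accept the variable.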

If that's the case, you need to tell ssh on the macOS system to stop sending the LC_CTYPE variable, or to send it with a value that is portable to other systems and is not macOS-specific. For instance, you could add:

SetEnv LC_CTYPE=en_US.UTF-8

to your ~/.ssh/config for it to send a specific value for that variable. Stopping it from sending your $LC_CTYPE variable may be more difficult or more clunky, as it may involve changing the system-wide ssh_config. You could also raise it as an issue with Apple, as it's obviously a nuisance that it sends bogus environment variable values to non-macOS systems.

You could also do:

env -u LC_CTYPE ssh centos-machine

Instead of:

ssh centos-machine

to make sure the LC_CTYPE environment variable is not passed to ssh, and therefore not passed along to the remote shell.


¹ usually via some SendEnv LANG LC_* configuration directive client-side and AcceptEnv LANG LC_* server-side

Answered By: Stéphane Chazelas