Enabling command hashing in tcsh

It seems command hashing is disabled by default in our tcsh environment, and I’m not permitted to get it enabled across the board. Instead I’m looking to enable command hashing within individual scripts, all of which contain while loops, so I’d expect the first iteration to loop through all the paths defined in $PATH and subsequent iterations to hit the exact path from the internal hash table. The purpose is to reduce the number of failed execve calls captured by the audit service.

First question, is there a command similar to "hash" in tcsh to output the internal hash table? Hashstat doesn’t appear to work, it doesn’t output anything on the prompt, perhaps because hashing is disabled? When I did get it to print something, it printed only the number and size of hash buckets, not any specific commands.

Main question, I’ve tried adding "rehash" to the beginning of the script, which has helped reduce the number of execve calls per command from ~5 to ~2 (even on the first iteration). For some reason, it still always tries to run commands from "/sbin" first. Any suggestions on what to check to see why after running rehash it still tries to execute a command from an invalid path, or is there an alternative way to enable command hashing from within a script?

Side question, bash on the other hand still manages to hit the correct path even with the hash table disabled. Any idea how it does this without command hashing?

Lastly, strace hasn’t captured the failed execve calls captured by audit. I’ve tried simple strace sleep, and strace -f -e trace=execve sleep, both essentially just showing the correct entry, but not the failed ones:

execve("/bin/sleep", ["sleep"], 0x7ffe0d773ff8 /* 32 vars */) = 0

Asked By: Maikol

||

It’s quite a complicated issue, but I’ll try to explain.

How commands are hashed in tcsh

Command hashing in tcsh is quite different from command hashing in bash. In bash, the first time a command is executed, the shell searches for its location in the PATH environment variable and caches it. The next time the same command is executed, bash uses the full path it already hashed.

Here’s the relevant section from the tcsh man page:

rehash

Causes the internal hash table of the contents of the directories in
the path variable to be recomputed. This is needed if new commands
are added to directories in path while you are logged in. This
should be necessary only if you add commands to one of your own
directories, or if a systems programmer changes the contents of one of
the system directories. Also flushes the cache of home directories
built by tilde expansion.

path

A list of directories in which to look for executable commands. […]
A shell which is given neither the -c nor the -t option hashes the
contents of the directories in path after reading ~/.tcshrc and each
time path is reset. If one adds a new command to a directory in
path while the shell is active, one may need to do a rehash for the
shell to find it.

In other words, tcsh hashes all the commands found in the directories in the path variable, and it does so only when:

  1. The shell is started.
  2. The path is being modified.
  3. You manually run the rehash command.

So after you start the shell it will hash all the commands found in path, and further commands will not be hashed unless you modify the path or run rehash.

What is the reason your hashing might be disabled?

The man page specifies exactly the conditions under which this happens:

This hashing mechanism is not used:

  1. If hashing is turned explicitly off via unhash.
  2. If the shell was given a -f argument.

Here’s an example. If I start the shell with the -f flag, hashstat will be empty because the hashing is disabled.

$ tcsh -f
> hashstat

The same would happen if somewhere you run the unhash command.

However, in both scenarios, you can still enable it by running rehash.

> rehash
> hashstat
512 hash buckets of 8 bits each

Why does tcsh try to execute some commands from different paths even if hashing is enabled?

Understanding hashstat

Let’s look at the output of hashstat:

$ hashstat
512 hash buckets of 8 bits each

It says there are 512 hash buckets. tcsh uses a hashing function to calculate the hash id of each command, and in this case the value will be between 0 and 511. This means that some commands might (and probably will) have the same hash. The initial number of buckets and bits might differ depending on the number of directories in your PATH.

So let’s say two commands, A and B, have the same hash (let’s say the hash of both commands is 123). Command A is located in /bin, and command B is located in /usr/bin. What happens then?

It means that in the bucket of hash id 123, there will be two paths: /bin and /usr/bin. When you execute either A or B, tcsh will try to exec the command in both of those directories (in their order in PATH) until it succeeds. You’ll see an example further down in this answer. But for now let’s talk about rehash.
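To make the bucket idea concrete, here is a minimal Python sketch of the scenario just described. The hash function (`toy_hash`) and the table layout are hypothetical stand-ins chosen for illustration; they are not tcsh's real hash function or data structures. Only the directory and command names come from the example above.

```python
# Toy model of per-bucket directory sets; NOT tcsh's real hash function or layout.
PATH_DIRS = ["/bin", "/usr/bin"]             # search order, as in $PATH
INSTALLED = {"A": "/bin", "B": "/usr/bin"}   # where each command really lives


def toy_hash(name, buckets):
    """Hypothetical stand-in for the shell's hash function."""
    return sum(name.encode()) % buckets


def build_table(buckets):
    """Map bucket id -> set of directory indexes that hold a command with that hash."""
    table = {}
    for name, directory in INSTALLED.items():
        table.setdefault(toy_hash(name, buckets), set()).add(PATH_DIRS.index(directory))
    return table


def candidate_paths(name, table, buckets):
    """Paths that would be tried, in PATH order, for a given command name."""
    dirs = sorted(table.get(toy_hash(name, buckets), ()))
    return [f"{PATH_DIRS[i]}/{name}" for i in dirs]


# A single bucket forces A and B to collide, mirroring the scenario above.
table = build_table(buckets=1)
print(candidate_paths("B", table, 1))  # ['/bin/B', '/usr/bin/B']
```

In this model the first candidate for B is a miss, which corresponds to one failed exec attempt before the successful one.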

rehash "secret" parameters

The rehash builtin in tcsh has some "secret" undocumented optional arguments that you can only see in the source code. The options are:

rehash [hashlength [hashwidth [debug]]]

For instance, you can change the number of hash buckets the following way:

$ hashstat
512 hash buckets of 8 bits each   # At the beginning, 512 buckets

$ rehash 4096                     # increasing the bucket number

$ hashstat
4096 hash buckets of 8 bits each  # Now there are 4096 buckets instead of 512

I won’t talk about the hash width, because it only has an effect if the length is set to 0 (in which case the width is used internally to calculate the length of the hash). Beyond that, it has no effect.

Important note

Increasing the table length could help you minimize the possible number of conflicts between different commands. So if you’re doing some tests you can try that and see if there is some improvement.
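A small simulation can show why a longer table tends to help. This is only a sketch under stated assumptions: the command names are synthetic, and `zlib.crc32` is used as a stand-in hash, not the function tcsh uses. It counts how many commands share a bucket with a command from a different directory, which is the situation that produces extra exec attempts.

```python
import zlib


def ambiguous_commands(names_to_dir, buckets):
    """Count commands whose bucket also holds a command from another directory."""
    dirs_per_bucket = {}
    for name, directory in names_to_dir.items():
        bucket = zlib.crc32(name.encode()) % buckets  # stand-in hash, not tcsh's
        dirs_per_bucket.setdefault(bucket, set()).add(directory)
    return sum(
        1
        for name in names_to_dir
        if len(dirs_per_bucket[zlib.crc32(name.encode()) % buckets]) > 1
    )


# Synthetic data: 1000 made-up command names spread over 4 directory indexes.
commands = {f"cmd{i}": i % 4 for i in range(1000)}

for buckets in (512, 4096):
    print(buckets, "buckets:", ambiguous_commands(commands, buckets), "ambiguous commands")
```

Because 512 divides 4096, any two names that collide with 4096 buckets in this model also collide with 512, so the larger table can never be worse here. The real numbers on your system depend on tcsh's own hash function and the contents of your PATH.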

Anyway, the really interesting part is yet to come.

Showing the internal hashing table

You can raise the debug level for hash operations by setting the last argument to 1 or 3. Let’s see how it works.

$ rehash 0 0 3 
hash=19   dir=0  prog=addgnupghome
hash=0    dir=0  prog=addpart
hash=206  dir=0  prog=agetty
hash=498  dir=0  prog=alternatives
hash=463  dir=0  prog=applygnupgdefaults
hash=323  dir=0  prog=blkdiscard
hash=342  dir=0  prog=blkid
hash=277  dir=0  prog=blkzone
hash=410  dir=0  prog=blockdev
hash=500  dir=0  prog=cfdisk
[...]

It shows you exactly which commands are hashed and where.

For instance, the command addgnupghome has a hash value of 19, and it was found in directory number 0. What is directory number 0?

Those directories are numbered, starting from 0, according to their order in your PATH environment variable (which is linked to the path shell variable in tcsh).

$ echo $PATH
/usr/sbin:/usr/bin:/sbin:/bin

The directories in my PATH are numbered the following way:

  0. /usr/sbin
  1. /usr/bin
  2. /sbin
  3. /bin

So if command addgnupghome was found in directory number 0, it means it’s located in /usr/sbin.
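The mapping from a dir number to a directory can be sketched in a few lines of Python. The helper name `dir_for_index` is made up for this example, and it assumes the numbering is simply the zero-based position in PATH, as described above.

```python
def dir_for_index(path_value, index):
    """Return the PATH entry for a zero-based directory index."""
    return path_value.split(":")[index]


path_value = "/usr/sbin:/usr/bin:/sbin:/bin"  # PATH from the example above
print(dir_for_index(path_value, 0))  # /usr/sbin
print(dir_for_index(path_value, 1))  # /usr/bin
```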

Examples

Let’s save the output of our rehash command (after I’ve set the debug to 3) to a file:

$ rehash > rehash.log

Now, let’s look at a certain hash:

$ grep hash=498 rehash.log
hash=498  dir=0  prog=alternatives
hash=498  dir=1  prog=passwd

There are two commands that are mapped to the same hash (498):

  • alternatives
    • found in dir 0 (/usr/sbin)
  • passwd
    • found in dir 1 (/usr/bin).
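Instead of grepping one hash at a time, you could scan the whole log for buckets that are shared across directories. The following Python sketch assumes the log lines look exactly like the debug output shown above (`hash=N  dir=N  prog=NAME`); the function name `shared_buckets` is made up for this example.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the debug output shown above.
LINE = re.compile(r"hash=(\d+)\s+dir=(\d+)\s+prog=(\S+)")


def shared_buckets(lines):
    """Return {hash: [(dir, prog), ...]} for hashes seen in more than one directory."""
    buckets = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            buckets[int(match.group(1))].append((int(match.group(2)), match.group(3)))
    return {h: entries for h, entries in buckets.items()
            if len({d for d, _ in entries}) > 1}


sample = [
    "hash=498  dir=0  prog=alternatives",
    "hash=498  dir=1  prog=passwd",
    "hash=102  dir=1  prog=curl",
]
print(shared_buckets(sample))
# {498: [(0, 'alternatives'), (1, 'passwd')]}
```

To run it on the real file, pass the open file instead of the sample list, for example `shared_buckets(open("rehash.log"))`.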

Another nice effect of increasing the debug level shows up in the where builtin ("Reports all known instances of command, including aliases, builtins and executables in path"). Usually it just shows the locations where the executable was found, but with the debug level raised it also shows the locations it missed:

$ where alternatives
/usr/sbin/alternatives
hash miss: /usr/bin/alternatives

$ where passwd
hash miss: /usr/sbin/passwd
/usr/bin/passwd

You can see that it searches for both commands in both directories: /usr/sbin and /usr/bin!

Confirming with strace

You wrote:

Lastly, strace hasn’t captured the failed execve calls captured by
audit. I’ve tried simple strace sleep, and strace -f -e trace=execve sleep, both essentially just showing the correct entry,
but not the failed ones:

execve("/bin/sleep", ["sleep"], 0x7ffe0d773ff8 /* 32 vars */) = 0

If you run strace sleep, it’s no longer the shell that searches for the location of sleep – it’s the strace command. The shell only searches for the location of the command that you actually run, which is the first word in your line (in this case, strace). The rest of the words are considered parameters that are just passed to the command. So before strace executes the sleep command, strace itself searches for the command in the directories in your PATH.
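For illustration, here is a Python sketch of a launcher-side PATH search. It assumes the launcher checks each candidate with a file test and only then executes the one match, which would explain why a single successful execve shows up in the trace. This is a model of the idea, not strace's actual code, and `find_in_path` is a made-up helper name.

```python
import os


def find_in_path(command, path_value):
    """Return the first PATH entry holding an executable regular file, else None."""
    for directory in path_value.split(":"):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Only the returned path would be handed to execve, so no failed attempts appear.
print(find_in_path("sleep", os.environ.get("PATH", "")))
```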

If you want to actually see how your shell finds the location of your commands, you’ll need to attach strace to your shell (from another terminal) and then run the command.

For instance, I have a tcsh process running with pid 21033. From another terminal, I attach strace to this pid.

$ strace -f -qq -e trace=execve -p 21033

If I run alternatives on the attached shell, it will immediately find it in the first directory in the list of this hash.

[pid 19134] execve("/usr/sbin/alternatives", ["alternatives", "--help"], [/* 125 vars */]) = 0

But the passwd command is located in the second directory, so the execve in the first one fails:

[pid 20919] execve("/usr/sbin/passwd", ["passwd", "-h"], [/* 125 vars */]) = -1 ENOENT (No such file or directory)
[pid 20919] execve("/usr/bin/passwd", ["passwd", "-h"], [/* 125 vars */]) = 0
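If you capture this strace output to a file, a short Python sketch like the one below could list the failed attempts. It assumes the lines have the same shape as the strace output shown above; the function name `failed_execves` is made up for this example.

```python
import re

# Captures the path argument and the numeric return value of an execve line.
EXECVE = re.compile(r'execve\("([^"]+)".*\)\s*=\s*(-?\d+)')


def failed_execves(lines):
    """Return the paths whose execve call returned -1."""
    matches = (EXECVE.search(line) for line in lines)
    return [m.group(1) for m in matches if m and m.group(2) == "-1"]


sample = [
    '[pid 20919] execve("/usr/sbin/passwd", ["passwd", "-h"], [/* 125 vars */]) = -1 ENOENT (No such file or directory)',
    '[pid 20919] execve("/usr/bin/passwd", ["passwd", "-h"], [/* 125 vars */]) = 0',
]
print(failed_execves(sample))  # ['/usr/sbin/passwd']
```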

Here’s the opposite example, of a hash that’s mapped only to one command:

$ grep hash=102 rehash.log
hash=102  dir=1  prog=curl   # Only curl has hash 102

$ where curl
/usr/bin/curl                # And indeed "where" only tries one dir

Disclaimer

Everything I wrote here is based on what I saw in the tcsh source code and on testing in my own environment (I have version 6.20.00). The behavior might differ depending on the version and on how tcsh was compiled for your distro.

Answered By: aviro