manipulate ls text output to add path to filenames

I sometimes receive files containing ls output in the following format:

/etc/cron.d:
-rw-r--r-- 1 root root 128 May 15  2020 0hourly
-rw------- 1 root root 235 Dec 17  2020 sysstat
/etc/cron.daily:
-rw------- 1 root root 235 Dec 17  2020 sysstat

Is there a way, using standard GNU tools or even plain bash builtins, to transform that content into:

-rw-r--r-- 1 root root 128 May 15  2020 /etc/cron.d/0hourly
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.d/sysstat
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.daily/sysstat

That would be great.

The easiest thing is to remove the directory path lines, like this:
cat <filename> | grep -v -E "^/[a-z]"

But, as I said, how can I carry those paths down to the following lines that contain the filenames?

The command that produces the file is: ls -lR /etc/cron* > <filename>.

I have no influence over that output; I just receive the output of that ls command, redirected to a separate file <filename> that is transferred to me.

What I would like to do is transform its content into the second form shown above: basically take the path from the first line and apply it to the file lines 2 and 3, then take the path from line 4 and apply it to line 5, and turn that into a general approach.

I think that should be possible using awk.

Asked By: André Letterer


You haven’t shown us what command you are using or why you’re getting this output, but if the objective is to list all files and directories matching /etc/cron*, you could just use find instead:

find /etc/cron*

Or, if you need the full listing (GNU find):

find /etc/cron* -ls

Any find:

find /etc/cron* -exec ls -ld {} +

Here is example output on my Arch Linux:

$ ls /etc/cron*
/etc/cron.deny  /etc/crontab  /etc/crontab~  /etc/crontab.pacnew

And with find:

$ find /etc/cron* -ls
   262172      4 drwxr-xr-x   2 root     root         4096 Jan 23 19:41 /etc/cron.d
   263666      4 -rw-r--r--   1 root     root          128 Jan 14 14:59 /etc/cron.d/0hourly
   262173      4 drwxr-xr-x   2 root     root         4096 Sep 30 11:38 /etc/cron.daily
   262618      4 -rw-r--r--   1 root     root           74 Jan 14 14:59 /etc/cron.deny
   262174      4 drwxr-xr-x   2 root     root         4096 Jan 23 19:41 /etc/cron.hourly
   263665      4 -rwxr-xr-x   1 root     root          843 Jan 14 14:59 /etc/cron.hourly/0anacron
   262175      4 drwxr-xr-x   2 root     root         4096 Jun 30  2016 /etc/cron.monthly
   262632      0 -rw-r--r--   1 root     root            0 Oct 31  2017 /etc/crontab
   262633      4 -rw-r--r--   1 root     root           49 Sep 22  2017 /etc/crontab~
   272465      4 -rw-r--r--   1 root     root          119 Jan 14 14:59 /etc/crontab.pacnew
   262176      4 drwxr-xr-x   2 root     root         4096 Sep 30 11:38 /etc/cron.weekly
   275802      4 -rwxr--r--   1 root     root           68 Sep 30 11:37 /etc/cron.weekly/
Answered By: terdon

Not entirely sure what you want, but try this command:

$ ls -la | awk -v path="$PWD" '{$NF=path"/"$NF;print}' | sed 's| /| \t/|g'

You can drop the sed part if not interested in the alignment of the paths.

Answered By: user9101329

Solution with TXR Lisp.

Let’s take it for granted you got this ls output from somewhere and have to work with it; you cannot go back to the original time and machine and obtain the information in a different format.

$ txr < lsdata
-rw-r--r-- 1 root root 128 May 15  2020 /etc/cron.d/0hourly
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.d/sysstat
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.daily/sysstat

where the TXR Lisp program is:

(let ((curdir ""))
  (whilet ((line (get-line)))
    (match-case line
      (`@dir:` (set curdir dir))
      (`@{metadata 39} @name` (put-line `@metadata @curdir/@name`)))))

This isn’t perfect: it will be fooled by a file name ending in :. If we can assume that the directory lines are always absolute paths, we can include that in the match:

(let ((curdir ""))
  (whilet ((line (get-line)))
    (match-case line
      (`/@dir:` (set curdir dir))
      (`@{metadata 39} @name` (put-line `@metadata /@curdir/@name`)))))
Answered By: Kaz

You can just do:

ls -ld /etc/cron*/*

The point being to pass the full paths of all the files to ls and be sure to pass the -d option so that for files of type directory, ls shows the info about the directory files themselves rather than list the contents of the directory.

The list of paths there is generated by the shell by expanding that /etc/cron*/* glob.
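To see the effect of -d in isolation, here is a minimal sketch on a throwaway directory tree. The names demo_ls_d, subdir and nested are made up for illustration, and it runs ls -d rather than ls -ld only so that the output contains no dates or sizes:

```shell
# Hypothetical demo: a scratch tree stands in for /etc/cron*
demo_ls_d() (
    tmp=$(mktemp -d) || exit 1
    mkdir -p "$tmp/cron.d/subdir"
    : > "$tmp/cron.d/0hourly"
    : > "$tmp/cron.d/subdir/nested"
    cd "$tmp" || exit 1
    # The shell expands the glob into full relative paths;
    # -d makes ls report subdir itself instead of listing its contents
    ls -d cron*/*
    cd / && rm -rf "$tmp"
)
demo_ls_d
```

This should print cron.d/0hourly and cron.d/subdir, one per line, with no mention of nested. Adding -l gives the long format with the same paths.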

In the fish shell, you can also do:

ls -ld /etc/cron**

To list all the files whose path starts with /etc/cron, so including /etc/crontab, /etc/cron.d and all the files within.

You can achieve something similar with find with:

find /etc -path '/etc/cron*' -exec ls -ld {} +

Or with zsh with

set -o extendedglob
ls -ld /etc/**/*~^/etc/cron*

(or ls -ld /etc/**~^/etc/cron* if you also enable the globstarshort option)

Answered By: Stéphane Chazelas

If none of your file or directory names contain white space then you could do the following using any POSIX awk:

$ awk '
    NF==1 && sub(/:$/,"/") { dir=$0; next }
    match($0,/[^[:space:]]+$/) { $0=substr($0,1,RSTART-1) dir substr($0,RSTART) }
    { print }
' file
-rw-r--r-- 1 root root 128 May 15  2020 /etc/cron.d/0hourly
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.d/sysstat
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.daily/sysstat

or if your file/directory names can contain spaces but your directory paths always start with / and your ls output always has exactly the same number of fields before the file name as shown in your example then you could do something like this:

$ awk '
    /^\// && sub(/:$/,"/") { dir=$0; next }
    match($0,/^([^[:space:]]+[[:space:]]+){8}/) { $0=substr($0,1,RLENGTH) dir substr($0,RLENGTH+1) }
    { print }
' file

But ls doesn’t always produce output with those fields: what ls outputs for the date/time depends on the age of your files and your locale setting, and user and group names can contain spaces, for example. Every character in the per-file lines could also be present in a directory name, and file names can end with :, since file and directory names can contain any character except / and NUL. So YMMV with whatever you come up with to tell the lines apart and then figure out where the file name starts in the per-file lines. Plus, file names can contain newlines, which is a whole other world of problems.

So there is no robust way to parse the output of ls for every possible output it could produce. If you want to do this then you just have to figure out what kind of pattern matching you think/hope will be good enough for your needs given whatever context you call ls in and then write your script based on that.
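As one illustration of "good enough for a given context", here is a sketch that also passes through the blank lines and "total N" lines that real ls -lR output contains. It rests on stated assumptions, not guarantees: directory headers start with / and end with :, and exactly 8 whitespace-separated fields precede the file name. The function names add_paths and name_offset are made up for this example, and it avoids interval expressions and character classes so that older awks can run it:

```shell
add_paths() {
    awk '
        # Offset just past the 8th whitespace-separated field, or 0 when
        # the line has fewer than 9 fields. Original spacing is kept.
        function name_offset(line,    i, off) {
            off = 0
            for (i = 1; i <= 8; i++) {
                if (!match(substr(line, off + 1), /^[^ \t]+[ \t]+/)) return 0
                off += RLENGTH
            }
            return (off < length(line)) ? off : 0
        }
        # Header such as "/etc/cron.d:" (assumed absolute, ending in a colon)
        /^\/.*:$/ { dir = substr($0, 1, length($0) - 1) "/"; next }
        # Blank lines and "total N" lines are printed unchanged
        NF == 0 || /^total [0-9]+$/ { print; next }
        {
            off = name_offset($0)
            if (off) $0 = substr($0, 1, off) dir substr($0, off + 1)
            print
        }
    ' "$@"
}
```

It can be called as add_paths file or fed from a pipe. Files named directly on the ls command line, which appear before any header, are left as they are because dir is still empty at that point.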

Since some other tool is creating a file of ls output for you to then parse, you should try to get that other tool fixed: it’s well known that you shouldn’t try to parse the output of ls (see Why *not* parse `ls` (and what to do instead)?), so that tool is setting you up for failure.

Answered By: Ed Morton

Simple solution for the simple case:

% awk 'NF == 1 { dir = $1; sub(/:$/, "", dir); next }
       NF >= 9 { $9 = dir "/" $9; print; next }
       { print }' input.txt
-rw-r--r-- 1 root root 128 May 15 2020 /etc/cron.d/0hourly
-rw------- 1 root root 235 Dec 17 2020 /etc/cron.d/sysstat
-rw------- 1 root root 235 Dec 17 2020 /etc/cron.daily/sysstat

On lines with just one field (NF == 1), remove the colon and remember the directory name. On lines with at least nine fields, add the last seen directory name to the start of the ninth space-separated field ($9), since that’s where the (start of the) filename is in the common ls output format. Lines with any other number of fields are printed as-is (that would include both empty lines and the "total 123" lines that ls -R outputs, not that your sample input includes them).
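Since the question also asks about shell builtins, the same two rules can be sketched without awk. This is only an illustration under the same assumptions (one-field lines are directory headers, the name starts at the ninth field); the function name add_dir_sh is made up, and like the awk version it squeezes runs of spaces into one:

```shell
add_dir_sh() (
    set -f                      # keep the unquoted $line below from globbing
    dir=
    while IFS= read -r line; do
        set -- $line            # split on whitespace into $1, $2, ...
        if [ "$#" -eq 1 ]; then
            dir=${1%:}          # "/etc/cron.d:" -> "/etc/cron.d"
        elif [ "$#" -ge 9 ]; then
            meta="$1 $2 $3 $4 $5 $6 $7 $8"
            shift 8             # what remains is the file name
            printf '%s %s/%s\n' "$meta" "$dir" "$*"
        else
            printf '%s\n' "$line"
        fi
    done
)
```

It reads from standard input, e.g. add_dir_sh < input.txt. The body runs in a subshell, so set -f and the variables do not leak into the calling shell.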

But more generically, the output of ls can vary, so we need to be careful. For older files, the common timestamp format is May 15 2020, but for recent files, the year is replaced with the hour and minutes, e.g. Mar 29 15:38. Luckily, the number of fields doesn’t change there. But the timestamp format may change depending on the locale, and if the listing contains device files or symlinks, other fields in the output change.

With symlinks, the symlink target is added after an arrow, and for device files, the size field is replaced with the device information, which might be multiple fields, or not (the first line with the null device below is from Mac, the second from GNU ls):

lrwxr-xr-x  1 user  group  9 Mar 29 15:38 link.txt -> hello.txt
crw-rw-rw-  1 root  wheel  0x3000002 Mar 29 15:39 null
crw-rw-rw-  1 root  root 1, 3 Sep  2  2022 null

Of course, if the username or group name can also contain whitespace, that would also produce issues.
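A rough sketch of how a script might guess where the name starts on such lines: it assumes that only block and character device lines carry the extra field, and that GNU-style output can be recognised by a trailing comma in the fifth field. The function name name_field is made up for this example, and other ls implementations or locales may well need different rules:

```shell
name_field() {
    awk '{
        n = 8                   # common case: 8 metadata fields, name at $9
        # GNU-style device line: $5 is "major," and the minor follows in $6
        if ($1 ~ /^[bc]/ && $5 ~ /,$/) n = 9
        # Print the guessed field number and the first word of the name
        print n + 1, $(n + 1)
    }'
}
```

Fed the three example lines above, this reports field 9 for link.txt, field 9 for the Mac null line and field 10 for the GNU null line.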

Also, the AWK script above compresses multiple spaces to one, turning e.g.
"May 15  2020" into "May 15 2020" and messing up the alignment of other fields. If you care about that, it might be easier to switch to Perl:

% perl -lne  'chomp; if (/^\S+:/) { $dir = s/:$//r; next; } s#((\S+\s+){8})(.*)#$1$dir/$3#; print' input.txt
-rw-r--r-- 1 root root 128 May 15  2020 /etc/cron.d/0hourly
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.d/sysstat
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.daily/sysstat
-rw------- 1 root root 235 Dec 17  2020 /etc/cron.daily/foo bar

Here, the key is the regex ((\S+\s+){8}), which matches and captures eight instances of non-whitespace characters followed by whitespace characters, so the following (.*) matches the rest of the line.
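To try that capture pattern on a single line without Perl, roughly the same substitution can be written for sed. This sketch assumes a sed that accepts -E (GNU and BSD sed do) and a directory name without #, & or backslash characters; the function name prefix_name is made up for illustration:

```shell
prefix_name() {
    # $1: directory to prepend (assumed free of "#", "&" and backslashes)
    # \1 = the eight metadata fields with their spacing, \3 = rest of the line
    sed -E 's#^(([^[:space:]]+[[:space:]]+){8})(.*)#\1'"$1"'/\3#'
}

printf '%s\n' '-rw------- 1 root root 235 Dec 17  2020 foo bar' |
    prefix_name /etc/cron.daily
```

The double space before 2020 and the space inside "foo bar" both survive, because only the boundary between the eighth field and the rest of the line is touched.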

Answered By: ilkkachu