sort in shell using multiple -t

I have a file that contains:

192.168.130.175 2014-09-04      10:25:01        /index.html
192.168.138.244 2014-09-04      11:23:00        /index.html
192.168.138.244 2014-09-04      10:29:37        /products.html
192.168.138.244 2014-09-04      11:22:49        /products.html
192.168.83.173  2014-09-04      10:05:17        /products.html
192.168.130.175 2014-09-04      12:24:24        /products/004.html
192.168.130.175 2014-09-04      10:09:13        /products/296.html
192.168.130.175 2014-09-04      11:01:20        /products/296.html
192.168.83.173  2014-09-04      12:19:55        /products/560.html

I want to sort first by IP address, numerically, and, where the IPs are the same, by the fourth tab-separated field, alphabetically.
The file name is access-2014-09-04.log. I tried sort -t. -k1,2 -k3,4 -n -t $'\t' -k4 access-2014-09-04.log

but it tells me sort: incompatible tabs.

Asked By: 龚健翔

||

You can split your command into a pipeline with two sorts:

sort -t $'\t' -k4 access-2014-09-04.log | sort -t. -k1,2 -k3,4 -n -s

Output:

192.168.83.173  2014-09-04      12:19:55        /products/560.html
192.168.83.173  2014-09-04      10:05:17        /products.html
192.168.130.175 2014-09-04      10:25:01        /index.html
192.168.130.175 2014-09-04      12:24:24        /products/004.html
192.168.130.175 2014-09-04      10:09:13        /products/296.html
192.168.130.175 2014-09-04      11:01:20        /products/296.html
192.168.138.244 2014-09-04      11:23:00        /index.html
192.168.138.244 2014-09-04      10:29:37        /products.html
192.168.138.244 2014-09-04      11:22:49        /products.html

First I sort by your 2nd criterion, the path, so the list is in path order. Then I sort by the 1st criterion, adding -s to make it a stable sort so that the ordering from the first pass is not lost. The result is that lines with different IPs are ordered by IP, while lines with the same IP keep the order from the first pass (ordered by the 2nd criterion).
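
The two-pass idea can be tried on a tiny input. This is a minimal sketch, assuming a sort that supports -s (GNU and BSD sort do); the three-line sample and the variable names are made up for illustration and are not from the log file:

```shell
# Hypothetical tab-separated sample: primary key in column 1, secondary key in column 2.
tab=$(printf '\t')   # portable stand-in for bash's $'\t'
data=$(printf 'b\t2\na\t2\nb\t1')

# Pass 1 orders by the secondary key; pass 2 is a stable sort on the primary key,
# so lines with equal primary keys keep the order produced by pass 1.
result=$(printf '%s\n' "$data" |
  LC_ALL=C sort -t "$tab" -k2,2n |
  LC_ALL=C sort -t "$tab" -s -k1,1)

printf '%s\n' "$result"
```

Within the group of lines whose first column is b, the line with 1 in the second column should come before the line with 2, which is the order left behind by the first pass.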

By the way, your -k1,2 -k3,4 does not seem to sort IP addresses correctly, because each key then spans two octets and is read as a single decimal number. If you want to simply sort with . as the separator, you need to use -k1,1 -k2,2 -k3,3 -k4,4:

sort -t $'\t' -k4 access-2014-09-04.log | sort -t. -k1,1 -k2,2 -k3,3 -k4,4 -n -s

The --debug option revealed this to me. However I consider the -V option to be cleaner than a bunch of -k. The LC_ALL=C is also nice. For that cleaner solution, please see the second half of @steeldriver’s answer.
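
To see the difference between the two key specifications, here is a small sketch on a few bare addresses. The addresses and variable names are invented for the example, and the behaviour described is that of GNU sort's -n:

```shell
# Hypothetical addresses, one per line (not taken from the log).
ips=$(printf '192.168.130.175\n192.168.83.173\n10.0.0.2\n192.168.138.244')

# One numeric key per octet: each octet is compared as its own integer.
per_octet=$(printf '%s\n' "$ips" | LC_ALL=C sort -t . -k1,1n -k2,2n -k3,3n -k4,4n)
printf '%s\n' "$per_octet"

# Two-octet keys: "3.10" and "3.9" are read as decimals, so 3.10 sorts before 3.9.
pairs=$(printf '1.2.3.10\n1.2.3.9\n' | LC_ALL=C sort -t . -k1,2n -k3,4n)
printf '%s\n' "$pairs"
```

The second command should put 1.2.3.10 ahead of 1.2.3.9, which is the wrong order for IP addresses and the reason for using one key per octet.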

Answered By: Daniel T

A numeric (-n) sort on the IP addresses will only consider the first two octets as a "number" – as you can verify using the --debug option:

$ LC_ALL=C sort --debug --stable -t $'\t' -k1,1n access-2014-09-04.log
sort: text ordering performed using simple byte comparison
192.168.130.175>2014-09-04>10:25:01>/index.html
_______
192.168.138.244>2014-09-04>11:23:00>/index.html
_______
192.168.138.244>2014-09-04>10:29:37>/products.html
_______
192.168.138.244>2014-09-04>11:22:49>/products.html
_______
192.168.83.173>2014-09-04>10:05:17>/products.html
_______
192.168.130.175>2014-09-04>12:24:24>/products/004.html
_______
192.168.130.175>2014-09-04>10:09:13>/products/296.html
_______
192.168.130.175>2014-09-04>11:01:20>/products/296.html
_______
192.168.83.173>2014-09-04>12:19:55>/products/560.html
_______

(I added --stable to omit the final whole-line lexical sort just for clarity).

AFAIK, you can’t use two different field separators within a single sort command, although you can break a key into further subfields using a KEYDEF of the form F.C, where F is the actual field number and C is a character position within the field. In this particular case, where the first two octets together are always 7 characters long, you could fake a "dot separated" numeric sort of the first tab-separated field using -k1.1,1.7n -k1.9,1n. Don’t really do this: it’s just meant to illustrate the F.C notation, and it will fail IRL, where the octets may be 1, 2, or 3 characters long.

$ LC_COLLATE=C sort --debug -t $'\t' -k1.1,1.7n -k1.9,1n -s access-2014-09-04.log
sort: text ordering performed using simple byte comparison
192.168.83.173>2014-09-04>10:05:17>/products.html
_______
        ______
192.168.83.173>2014-09-04>12:19:55>/products/560.html
_______
        ______
192.168.130.175>2014-09-04>10:25:01>/index.html
_______
        _______
192.168.130.175>2014-09-04>12:24:24>/products/004.html
_______
        _______
192.168.130.175>2014-09-04>10:09:13>/products/296.html
_______
        _______
192.168.130.175>2014-09-04>11:01:20>/products/296.html
_______
        _______
192.168.138.244>2014-09-04>11:23:00>/index.html
_______
        _______
192.168.138.244>2014-09-04>10:29:37>/products.html
_______
        _______
192.168.138.244>2014-09-04>11:22:49>/products.html
_______
        _______
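
The F.C notation is easier to see on data where the character offset really is fixed. A minimal sketch, assuming made-up identifiers with a two-character prefix (none of this comes from the log file):

```shell
# Hypothetical IDs: a fixed two-character prefix "x-" followed by a number.
ids=$(printf 'x-10\nx-9\nx-100')

# -k1.3,1n: the key starts at character 3 of field 1, runs to the end of field 1,
# and is compared numerically, so the prefix is skipped.
sorted=$(printf '%s\n' "$ids" | LC_ALL=C sort -k1.3,1n)

printf '%s\n' "$sorted"
```

This only works because the prefix has the same length on every line, which is exactly the property that IP octets lack.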

I suspect what you actually want is to natural sort the IP by version (-V) and then sort by name with the default lexical ordering:

$ sort -t $'\t' -k1,1V -k4 access-2014-09-04.log
192.168.83.173  2014-09-04      12:19:55        /products/560.html
192.168.83.173  2014-09-04      10:05:17        /products.html
192.168.130.175 2014-09-04      10:25:01        /index.html
192.168.130.175 2014-09-04      12:24:24        /products/004.html
192.168.130.175 2014-09-04      10:09:13        /products/296.html
192.168.130.175 2014-09-04      11:01:20        /products/296.html
192.168.138.244 2014-09-04      11:23:00        /index.html
192.168.138.244 2014-09-04      10:29:37        /products.html
192.168.138.244 2014-09-04      11:22:49        /products.html

Specifying stable sort (-s / --stable) does not seem to be necessary since the remaining fields (date and time) appear to be lexically ordered anyhow. Note that this will sort the filename column in your locale’s lexical order; add LC_ALL=C to force strict byte order.
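
As a sketch of the byte-order variant, the following runs the same keys under LC_ALL=C on four lines copied from the sample data. It assumes a sort with -V (GNU sort has it); the variable names are only for the example:

```shell
tab=$(printf '\t')   # portable stand-in for bash's $'\t'

# Four lines from the sample log, rebuilt with real tabs between the fields.
log=$(printf '%s\t%s\t%s\t%s\n' \
  192.168.130.175 2014-09-04 10:25:01 /index.html \
  192.168.83.173  2014-09-04 12:19:55 /products/560.html \
  192.168.83.173  2014-09-04 10:05:17 /products.html \
  192.168.138.244 2014-09-04 11:23:00 /index.html)

# Version-sort the IP field, then compare from field 4 onwards in strict byte order.
sorted=$(printf '%s\n' "$log" | LC_ALL=C sort -t "$tab" -k1,1V -k4)

# Show only the IP and path columns of the result.
printf '%s\n' "$sorted" | cut -f1,4
```

In byte order "." (0x2E) comes before "/" (0x2F), so /products.html should land ahead of /products/560.html here, unlike in the locale-ordered output above.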

Answered By: steeldriver

Using the Decorate-Sort-Undecorate idiom with any awk+sort+cut:

$ awk -v OFS='\t' '{ip=$1; gsub(/\./,OFS,ip); print ip, $0}' file |
    sort -k1,1n -k2,2n -k3,3n -k4,4n -k8 | cut -f5-
192.168.83.173  2014-09-04      10:05:17        /products.html
192.168.83.173  2014-09-04      12:19:55        /products/560.html
192.168.130.175 2014-09-04      10:25:01        /index.html
192.168.130.175 2014-09-04      12:24:24        /products/004.html
192.168.130.175 2014-09-04      10:09:13        /products/296.html
192.168.130.175 2014-09-04      11:01:20        /products/296.html
192.168.138.244 2014-09-04      11:23:00        /index.html
192.168.138.244 2014-09-04      10:29:37        /products.html
192.168.138.244 2014-09-04      11:22:49        /products.html
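
To make the decorate and undecorate steps visible, here is a sketch that pushes a single line from the sample through the awk stage and then strips the decoration again. The variable names are just for illustration:

```shell
# One line from the sample log, with real tabs between the fields.
line=$(printf '192.168.83.173\t2014-09-04\t10:05:17\t/products.html')

# Decorate: prepend the IP with its dots turned into tabs, giving four extra
# leading fields (one per octet) in front of the untouched original line.
decorated=$(printf '%s\n' "$line" |
  awk -v OFS='\t' '{ip=$1; gsub(/\./,OFS,ip); print ip, $0}')

printf '%s\n' "$decorated" | cut -f1-4   # the four octets as separate fields
printf '%s\n' "$decorated" | cut -f5-    # undecorate: the original line again
```

The decorated line has eight tab-separated fields, which is why the sort keys on fields 1 to 4 for the octets and on field 8 for the path, and why cut -f5- recovers the input.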

Answered By: Ed Morton