Write a regular expression that outputs only rows in the range 01/03/2020 – 01/03/2021

I have a file that contains dates from 01/01/2020 to 04/04/2021.
I want to get only the dates between 01/03/2020 and 01/03/2021 using egrep. I tried:

egrep "([0][1-9]|[1-2][0-9]|[3][0]/[0][3-9]|[1][0-2]/[2][0][2][0-1])$" dates.txt

but it still gives me all the dates in the file:

$ cat dates.txt 
01/01/2020
24/01/2020
04/02/2020
23/02/2020
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021
25/03/2021
04/04/2021
Asked By: Mohamad

||

From your description you need any date in 2020 from 01/03/2020 onward. That would be:

$ egrep "(../(0[3-9]|1[0-2])/2020$)" dates.txt

And also all dates from 2021 up to 01/03/2021. That part would be:

$ egrep "((/0[1-2]/|01/03/)2021$)" dates.txt

Joining both ranges:

$ egrep "(../(0[3-9]|1[0-2])/2020$|(/0[1-2]/|01/03/)2021$)" dates.txt

Simplifying a little bit, changing to grep -E (which is the present day equivalent to egrep), and listing the output:

$ grep -E "(/(0[3-9]|1[0-2])/2020|(/0[1-2]/|01/03/)2021)$" dates.txt
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021
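
As a quick sanity check on the two ends of the range, the final pattern can be fed the days on either side of each boundary. This is only a sketch: the date_range function name is a hypothetical wrapper, and the four input dates are made up for the test rather than taken from dates.txt.

```shell
# Hypothetical helper wrapping the final pattern from above.
date_range() { grep -E '(/(0[3-9]|1[0-2])/2020|(/0[1-2]/|01/03/)2021)$'; }

# One day before and on the start date, on and one day after the end date.
printf '%s\n' 29/02/2020 01/03/2020 01/03/2021 02/03/2021 | date_range
# prints:
# 01/03/2020
# 01/03/2021
```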

Your source file seems to be:

$ cat dates.txt 
01/01/2020
24/01/2020
04/02/2020
23/02/2020
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021
25/03/2021
04/04/2021
Answered By: QuartzCristal

Using the example file given, where dates are in order and the start + end date are present in the file, you might find a solution using awk to be more straightforward.

$ awk '$1=="01/03/2020",$1=="01/03/2021"' dates.txt
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021
$
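
To illustrate why the two assumptions above matter, here is a small sketch of how the range pattern behaves when one of the delimiting dates is missing. The range_filter name is a hypothetical wrapper, and the input lines are made up for the demonstration.

```shell
# Hypothetical helper wrapping the awk range pattern from above.
range_filter() { awk '$1=="01/03/2020",$1=="01/03/2021"'; }

# Start date absent: the range never opens, so nothing is printed.
printf '%s\n' 23/02/2020 13/03/2020 01/03/2021 | range_filter

# End date absent: the range never closes, so every line after the start
# is printed, including 25/03/2021.
printf '%s\n' 01/03/2020 24/02/2021 25/03/2021 | range_filter
```

If the file might lack either date, or might not be sorted, a comparison on the date value itself is the safer choice.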

As an aside, do be aware that use of egrep is deprecated, in favour of the POSIX-compliant grep -E approach.

Answered By: steve

I really wouldn’t try to do this with just regular expressions. More sophisticated tools will make it easier. For example, with awk:

$ awk -F/ '($3==2020 && $2 > 2) || ($3==2021 && ($2 < 3 || ($1 < 2 && $2 == 3)))' dates.txt
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021

The awk command sets the field separator to / and then selects lines that match one of these criteria:

  • the last field (the year) is 2020 and the second field (the month) is greater than 2. This will match all dates from 01/03/2020 until 31/12/2020.
  • the last field (the year) is 2021 and either
    • the second field (the month) is smaller than 3 OR
    • the first field (the day of the month) is less than 2 and the second field (the month) is exactly 3.
Answered By: terdon

Just use awk:

$ awk -F'/' '{d=$3$2$1} (20200301 <= d) && (d <= 20210301)' dates.txt
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021

The above will work whether the input is sorted or not and whether the range-delimiting dates are present in the input or not.
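
A minimal sketch of that claim, using made-up input that is out of order and contains neither 01/03/2020 nor 01/03/2021 (the in_range name is a hypothetical wrapper around the same awk program):

```shell
# Hypothetical helper wrapping the awk comparison from above.
in_range() { awk -F'/' '{d=$3$2$1} (20200301 <= d) && (d <= 20210301)'; }

# Unsorted input with neither delimiting date present.
printf '%s\n' 25/03/2021 13/03/2020 23/02/2020 24/02/2021 | in_range
# prints:
# 13/03/2020
# 24/02/2021
```

Rearranging each date to yyyymmdd is what makes this work: every key has the same eight digits, so ordering the keys is the same as ordering the dates.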

Just change <= to < if by between you meant excluding the delimiting dates.

Answered By: Ed Morton

Using Raku (formerly known as Perl_6)

raku -ne 'my $ts = .subst(/ ^ (\d**2) \/ (\d**2) \/ (\d**4) /, {"$2-$1-$0"}).Date; say $ts if Date.new("2020-03-01") < $ts < Date.new("2021-03-01");'

Raku handles ISO-8601 dates by default, as long as you provide a string in the right format (hyphen-separated yyyy-mm-dd). The code above captures the digits, re-arranges them, and creates Date objects. Then a "startdate < $timestamp < enddate" conditional is used to select out the desired range.

Sample Input:

01/01/2020
24/01/2020
04/02/2020
23/02/2020
01/03/2020
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021
01/03/2021
25/03/2021
04/04/2021

Sample Output:

2020-03-13
2020-04-14
2020-05-29
2020-06-16
2020-07-17
2020-08-18
2020-09-19
2020-10-20
2020-11-21
2020-12-22
2021-01-23
2021-02-24

[With the Raku code described here, a nice check with ISO-8601 Date conversion is that it will balk at months > 12, helping to ensure that months/days don’t get scrambled].

Below, there’s a more compact solution (possibly at the expense of readability), that preserves the dates in the original format:

~$ raku -ne '.say if Date.new("2020-03-01") < S/ ^ (\d**2) \/ (\d**2) \/ (\d**4) /{"$2-$1-$0"}/.Date < Date.new("2021-03-01");' file
13/03/2020
14/04/2020
29/05/2020
16/06/2020
17/07/2020
18/08/2020
19/09/2020
20/10/2020
21/11/2020
22/12/2020
23/01/2021
24/02/2021

https://docs.raku.org/type/Date
https://raku.org

Answered By: jubilatious1

To answer the question with the tool requested, the problem lies in grouping. In particular, if you write the regular expression as:

([0][1-9]|[1-2][0-9]|[3][0]/[0][3-9]|[1][0-2]/[2][0][2][0-1])$

This will match anything matching any of:

[0][1-9]$
[1-2][0-9]$
[3][0]/[0][3-9]$
[1][0-2]/[2][0][2][0-1]$
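
To see why every line gets through, it may help to run the second alternative by itself. The sketch below uses a hypothetical sample function with three made-up lines; the second command only illustrates per-field grouping and is not a full answer to the question.

```shell
# Hypothetical sample input: one date before the range and two after it.
sample() { printf '%s\n' 01/01/2020 25/03/2021 04/04/2021; }

# '[1-2][0-9]$' only needs the line to end in two digits between 10 and 29.
# The "20" or "21" ending every year satisfies that, so all lines match.
sample | grep -E '[1-2][0-9]$'

# Parenthesizing day, month and year separately keeps each | local to its field.
# This fixes the grouping only: it still accepts months 03 to 12 in both years.
sample | grep -E '^(0[1-9]|[12][0-9]|3[01])/(0[3-9]|1[0-2])/(2020|2021)$'
```

The first command prints all three lines; the second drops only 01/01/2020, which shows that correct grouping is necessary but does not express the date range on its own.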

Try instead the command:

egrep "[0-9][0-9]/(((0[3-9]|1[012])/2020)|(0[12]/2021))$" dates.txt

Note that I have removed any attempt at date validation in the expression, just matching. I wouldn’t recommend trying to do both.

It isn’t clear to me if you wanted 01/03/2021 to match. If you did, I would add an or clause to match just that, as:

egrep "([0-9][0-9]/(((0[3-9]|1[012])/2020)|(0[12]/2021)))$|01/03/2021$" dates.txt
Answered By: David G.

GNU grep supports a PCRE mode (-P), in which we can write:

grep -P '(?x) (?:01/03|/0[12])/2021 | /(?!0[12])../2020' file

We can do the same in Perl, writing the regex over multiple lines:

perl -ne 'print if
  m{
    (?:
      (?:01/03/2021)      |
      (?:/(?:0[12])/2021) |
      (?:/(?!0[12])../2020)
    )
  }x;' file

A sed equivalent follows, where we keep deleting lines that cannot match, so that what remains at the end is the desired answer.

sed -e '
  \:01/03/2021$:!{
  \:/0[12]/2021$:!{
  \:/2020$:!d
  \:/0[12]/:d
  };}
' file

sed -e '
  \:01/03/2021$:b
  \:/0[12]/2021$:b
  \:/2020$:{
    \:/0[12]/:!b
  };d
' file

sed -En '1{x
  s:.*:20200301-20210301#0123456789:
x;}
  s:^(..)/(..)/(.{4}):&\n\3\2\1:;G
  /\n(.*)(.).*\n\1(.).*#.*\2.*\3/d
  /\n(.*)(.).*-\1(.).*#.*\3.*\2/d
  P
' file

Answered By: guest_7