Fetch data from a large file based on time and date recursively

Fetch data based on date(column 1) and time (column 2). Each date has time in column 2. Based on each date in column1 two files are to be generated which will have all the fields based on time 6:00 to 21:50:08 (day) and 22:00:00 to 5:50:00 (night). Trying to get two files for each date as date_day and date_night based on the times specified.

Input file:

Date       Time     R1      R2      R3
03/10/2023 19:00:08 19.06   39.870  5.12
03/10/2023 19:10:08 18.87   39.970  4.98
03/10/2023 19:20:08 18.68   39.940  4.80
03/10/2023 19:30:08 18.84   40.110  5.01
03/10/2023 19:40:08 18.89   38.960  4.64
03/10/2023 19:50:08 18.60   39.100  4.43
03/10/2023 23:30:08 18.03   34.200  2.03
03/10/2023 23:40:08 17.94   33.930  1.84
03/10/2023 23:50:08 17.87   33.840  1.74
03/11/2023 00:00:08 17.75   33.790  1.61
03/11/2023 00:10:08 17.96   34.060  1.91
03/11/2023 00:20:08 18.13   33.690  1.91
03/11/2023 00:30:08 17.91   33.620  1.68

Output files:

03/10/2023_day

Date        Time    R1      R2      R3
03/10/2023 19:00:08 19.06   39.870  5.12
03/10/2023 19:10:08 18.87   39.970  4.98
03/10/2023 19:20:08 18.68   39.940  4.80
03/10/2023 19:30:08 18.84   40.110  5.01
03/10/2023 19:40:08 18.89   38.960  4.64

03/10/2023_night

Date        Time     R1      R2      R3
03/10/2023 19:50:08 18.60   39.100  4.43
03/10/2023 23:30:08 18.03   34.200  2.03
03/10/2023 23:40:08 17.94   33.930  1.84
03/10/2023 23:50:08 17.87   33.840  1.74

03/11/2023_night:

Date       Time      R1     R2      R3
03/11/2023 00:00:08 17.75   33.790  1.61
03/11/2023 00:10:08 17.96   34.060  1.91
03/11/2023 00:20:08 18.13   33.690  1.91
03/11/2023 00:30:08 17.91   33.620  1.68

I tried the following to fetch day and night files. This I will have to do for each date again and again. I removed the ‘:’ from the time column in my code to get a number without :. Can anyone please help to put it in a loop and the separate day and night files for each date?

awk '$1 ~ /03/10/2023/ && $2 >= 060000 && $2 <= 215000' data |sed 's/t/,/g' > 03_10_23_day.csv
awk '$1 ~ /03/10/2023/ && $2 > 215000' data |sed 's/t/,/g' > 03_10_23_night.csv
Asked By: Gavin

||

You want string comparisons rather than number comparisons in awk and generate all output files in one awk invocation. Using a loop wouldn’t make sense here:

awk -v OFS=, '
  {$1 = $1} # force reformatting with comma delimiters
  NR == 1 {header = $0; next}
  {
    split($1, f, "/")
    outfile = f[1] "_" f[2] "_" substr(f[3], 3) "_" 
              ($2 >= "06:00:00" && $2 < "22:00:00" ? "day" : "night") 
              ".csv"
    if (!seen[outfile]++) print header > outfile
    print > outfile
  }' < data

($2 >= "06" && $2 < "22" would also work here)

I’d suggest naming your files 2023-10-03-night.csv rather than 03_10_23_night.csv (assuming it’s meant to be the third of October rather than the tenth of March) which would mean ls for instance would show them in chronological order and that 2023-10-03 is an international standard unambiguous format.

Answered By: Stéphane Chazelas

Using Raku (formerly known as Perl_6)

#OUTPUT A SPECIFIED 'TIME-OF-DAY' RANGE FOR ALL DATES IN FILE:

~$ raku -e 'my $hdr = get; my @a = lines.map: *.split(" ");  
            my @b = do for @a { .[0..1].join("T").subst(/ (d**2) / (d**2) / (d**4) /, {"$2-$0-$1"} ).DateTime, .[2..*] };  
            put $hdr; for @b {   
                my $start = .[0].truncated-to("day") + Duration.new(21600);  
                my $stop  = .[0].truncated-to("day") + Duration.new(71408);  
                put $_ if  $_.[0] ~~ $start ..^ $stop };'  file 

Above (first answer) is an approach using Raku, a member of the Perl-family of programming languages. An advantage of using Raku is that ISO-8601 DateTimes are built-in. Above filters lines of input, giving output within a defined $start .. $stop range. The ..^ range operator (with caret) excludes the RHS timepoint from output.

#OUTPUT A 'TIME-OF-DAY' RANGE FOR A SPECIFIED DATE IN FILE:

~$ raku -e 'my $target_date = DateTime.new("2023-03-10");   
            say $target_date; my $hdr = get;  
            my @a = lines.map: *.split(" "); my @b = do for @a { .[0..1].join("T").subst(/ (d**2) / (d**2) / (d**4) /, {"$2-$0-$1"} ).DateTime, .[2..*] };  
            put $hdr;  for @b {   
                my $start = $target_date + Duration.new(21600);  
                my $stop  = $target_date + Duration.new(71408);   
                put $_ if  $_.[0] ~~ $start ..^ $stop };'    file

Above (second answer) somewhat more specifically you can define a $target_date and only retain a ‘time-of-day’ range for that Date in output.

Sample Input (OP’s example plus two extra lines added at end):

Date       Time     R1      R2      R3
03/10/2023 19:00:08 19.06   39.870  5.12
03/10/2023 19:10:08 18.87   39.970  4.98
03/10/2023 19:20:08 18.68   39.940  4.80
03/10/2023 19:30:08 18.84   40.110  5.01
03/10/2023 19:40:08 18.89   38.960  4.64
03/10/2023 19:50:08 18.60   39.100  4.43
03/10/2023 23:30:08 18.03   34.200  2.03
03/10/2023 23:40:08 17.94   33.930  1.84
03/10/2023 23:50:08 17.87   33.840  1.74
03/11/2023 00:00:08 17.75   33.790  1.61
03/11/2023 00:10:08 17.96   34.060  1.91
03/11/2023 00:20:08 18.13   33.690  1.91
03/11/2023 00:30:08 17.91   33.620  1.68
03/12/2023 19:00:08 19.06   39.870  5.12
03/12/2023 19:10:08 18.87   39.970  4.98

Sample Output (1):

Date       Time     R1      R2      R3
2023-03-10T19:00:08Z 19.06   39.870  5.12
2023-03-10T19:10:08Z 18.87   39.970  4.98
2023-03-10T19:20:08Z 18.68   39.940  4.80
2023-03-10T19:30:08Z 18.84   40.110  5.01
2023-03-10T19:40:08Z 18.89   38.960  4.64
2023-03-12T19:00:08Z 19.06   39.870  5.12
2023-03-12T19:10:08Z 18.87   39.970  4.98

Sample Output (2):

2023-03-10T00:00:00Z
Date       Time     R1      R2      R3
2023-03-10T19:00:08Z 19.06   39.870  5.12
2023-03-10T19:10:08Z 18.87   39.970  4.98
2023-03-10T19:20:08Z 18.68   39.940  4.80
2023-03-10T19:30:08Z 18.84   40.110  5.01
2023-03-10T19:40:08Z 18.89   38.960  4.64

https://docs.raku.org/language/temporal
https://docs.raku.org/type/DateTime
https://raku.org

Answered By: jubilatious1

Assumptions/Understandings:

  • output filenames are to convert dates from using / to dates using _ (eg, 03/10/2023 becomes 03_10_2023)
  • output filenames are to be of the format DD_MM_YYYY_day or DD_MM_YYY_night – per expected outputs (ie, for this answer we’ll ignore the .csv extentions as shown in OP’s sample code)
  • input/output field delimiters are white space – as per sample inputs/outputs (ie, for this answer we’ll ignore the implied tab/comma delimiters from OP’s sed s/t/,/g)
  • input data is already sorted by date (column #1) and time (column #2)
  • time ranges are defined as follows (removing gaps introduced by OP’s definitions):
  • day = 06:00:00 to 21:59:59 (vs OP: 06:00:00 to 21:50:08 ???)
  • night = 22:00:00 to 05:59:59 (vs OP: 22:00:00 to 05:50:00 ???)
  • the 03/10/2023 19:50:08 entry should reside in the resulting _day file and not the _night file (as OP has shown in the expected output)
  • OP wants to place 0[0-5] (early morning) entries in the date’s _night file (as opposed to either a) placing in previous date’s _night file or b) placing in a new _morning file)
  • NOTE: if any of these assumptions/understandings are incorrect then OP will need to update the question to provide more details and clarity

Adding a couple more lines to sample input file:

$ cat data
Date       Time     R1      R2      R3
03/10/2023 03:10:08 19.06   39.870  5.12          # new: to be placed in '_night' file
03/10/2023 05:30:08 18.87   39.970  4.98          # new: to be placed in '_night' file
03/10/2023 19:00:08 19.06   39.870  5.12
03/10/2023 19:10:08 18.87   39.970  4.98
03/10/2023 19:20:08 18.68   39.940  4.80
03/10/2023 19:30:08 18.84   40.110  5.01
03/10/2023 19:40:08 18.89   38.960  4.64
03/10/2023 19:50:08 18.60   39.100  4.43
03/10/2023 23:30:08 18.03   34.200  2.03
03/10/2023 23:40:08 17.94   33.930  1.84
03/10/2023 23:50:08 17.87   33.840  1.74
03/11/2023 00:00:08 17.75   33.790  1.61
03/11/2023 00:10:08 17.96   34.060  1.91
03/11/2023 00:20:08 18.13   33.690  1.91
03/11/2023 00:30:08 17.91   33.620  1.68

NOTE: file does not contain comments

One awk idea:

awk '
NR==1               { hdr = $0; next }                        # save header

$1 != prev_dt       { close(out_day)                          # if new date then close output files
                      close(out_night)

                      prev_dt = out_dt = $1                   # make note of new date
                      gsub(///,"_",out_dt)                   # replace "/" with "_"

                      out_day   = out_dt "_day"               # define new output file names
                      out_night = out_dt "_night"

                      hdr_flag_day = hdr_flag_night = 1       # reset "print header?" flag
                    }

$2 >= "06:00:00" &&                                           # "day"
$2 <= "21:59:59"    { if ( hdr_flag_day ) {
                         print hdr > out_day
                         hdr_flag_day = 0
                      }
                      print $0 > out_day
                      next
                    }

                    { if ( hdr_flag_night ) {                 # "night"
                         print hdr > out_night
                         hdr_flag_night = 0
                      }
                      print $0 > out_night
                    }
' data

This generates:

$ head 03*20??_[dn]*
==> 03_10_2023_day <==
Date       Time     R1      R2      R3
03/10/2023 19:00:08 19.06   39.870  5.12
03/10/2023 19:10:08 18.87   39.970  4.98
03/10/2023 19:20:08 18.68   39.940  4.80
03/10/2023 19:30:08 18.84   40.110  5.01
03/10/2023 19:40:08 18.89   38.960  4.64
03/10/2023 19:50:08 18.60   39.100  4.43

==> 03_10_2023_night <==
Date       Time     R1      R2      R3
03/10/2023 03:10:08 19.06   39.870  5.12
03/10/2023 05:30:08 18.87   39.970  4.98
03/10/2023 23:30:08 18.03   34.200  2.03
03/10/2023 23:40:08 17.94   33.930  1.84
03/10/2023 23:50:08 17.87   33.840  1.74

==> 03_11_2023_night <==
Date       Time     R1      R2      R3
03/11/2023 00:00:08 17.75   33.790  1.61
03/11/2023 00:10:08 17.96   34.060  1.91
03/11/2023 00:20:08 18.13   33.690  1.91
03/11/2023 00:30:08 17.91   33.620  1.68
Answered By: markp-fuso
Categories: Answers Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.