awk, sed, or grep command to output true or false for each word in file1, depending on whether it exists in file2

I have two files. If a word in file 1 does not exist in file 2, I want to write the word false on the corresponding line of a new file, say file 3. Otherwise, I want to write true on that line.

file 1:

a
b
c
d

file 2:

a
d
c
e
t
y

file 3:

true
false
true
true

Is there a way to do that using awk/sed/grep commands?

Asked By: M.A.G

||

Assuming file2 isn’t empty and isn’t so huge that it can’t fit in memory without causing problems:

awk 'NR==FNR{a[$0]; next} {print ($0 in a ? "true" : "false")}' file2 file1
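As a rough sketch of why the "isn't empty" assumption matters: with an empty file2, NR==FNR stays true while file1 is being read, so nothing is printed. The helper names in_file and in_file_safe and the scratch directory below are made up for illustration; the second variant preloads the lookup file with getline, which is one possible way around the problem and not the only one.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' a b c d > "$dir/file1"
printf '%s\n' a d c e t y > "$dir/file2"

# Hypothetical helper: print true/false for each line of $2,
# depending on whether the same line occurs in $1.
in_file() {
  awk '
    NR == FNR { seen[$0]; next }                 # first file: remember each line
    { print (($0 in seen) ? "true" : "false") }  # second file: look each line up
  ' "$1" "$2"
}

# Hypothetical variant: preload the lookup file with getline, so an empty
# lookup file yields "false" for every line instead of no output at all.
in_file_safe() {
  awk -v lookup="$1" '
    BEGIN { while ((getline line < lookup) > 0) seen[line] }
    { print (($0 in seen) ? "true" : "false") }
  ' "$2"
}

# Write the result to file3, as the question asks, and show it.
in_file "$dir/file2" "$dir/file1" > "$dir/file3"
cat "$dir/file3"
```

With the sample data this should print true, false, true, true, one per line. Redirecting to file3 is the only step needed to get the "new file" from the question.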
Answered By: Ed Morton

Read man grep bash, and do something like this (UNTESTED):

for pat in $(cat "file 1") ; do
  ans="False"
  grep --quiet -- "^$pat$" "file 2" &&
    ans="True"
  echo -e "$pat\t$ans" >>"file 3"
done
Answered By: waltinator

$ perl -le '
  # construct a partial regex from the first filename argument
  my $re = join("|", split /\n+/, do { local(@ARGV,$/) = shift; <> });

  # complete and pre-compile the regex
  $re = qr/\b(?:$re)\b/;

  # read and process stdin and/or remaining filename args
  while(<>) {
    print /$re/ ? "true" : "false"
  }' file2 file1 
true
false
true
true

This perl script reads in the entire file indicated by the first argument (file2) and constructs a single regular expression that matches the individual words on each line of the file. Each word in the regex is separated by the | alternation character. It assumes that the file contains one word per line and that lines are separated by one or more newline characters (which has the useful side-effect of ignoring blank lines).

NOTE: Each word of file2 will be interpreted as a regular expression. If you want them to be interpreted as fixed strings, change the my $re ... line to:

my $re = join("|", map { quotemeta $_ } split /\n+/, do { local(@ARGV,$/) = shift; <> });

The quotemeta function "quotes" all regexp metacharacters in a string so that they lose their special meaning and are treated as literal characters. See perldoc -f quotemeta. The map function causes the { quotemeta $_ } block to be applied to each element of the list returned by split. See perldoc -f map and perldoc -f split.

BTW, do { local(@ARGV,$/) = shift; <> } is a fairly common perl idiom for "slurping" a file, i.e. reading in an entire file at once. There are many other methods to do the same thing, including modules like File::Slurper, but this is simple and portable and doesn’t require any library modules to be used or installed.

The script then uses the qr quoting operator to pre-compile the regex to improve performance (recompiling the same regex on every pass through the loop would be an enormous waste of CPU time). Word-boundary markers, \b, are used to prevent partial matches, and ?: is used to prevent capture of matches (which would just waste time since we only need to detect that a match occurred and don’t need to use the matches for anything). See perldoc -f qr and man perlre.

The regex is case-sensitive, but could be made case-insensitive with the i regex modifier:

$re = qr/\b(?:$re)\b/i;

The script then reads any remaining input and, for each line of input, it prints "true" if the line matches the regex and "false" otherwise. Because it uses the while(<>) it will read data from stdin and/or any filename arguments. In this example, it reads the input from file1.

Memory usage is proportional to the size of the first file – the more words in file2, the more RAM it will use. Run-time is, of course, proportional to the size of the first file and all other input.

Answered By: cas

Convert your second file into a sed program:

$ sort -u file2 | sed 's:.*:s/^&$/true/; t:'
s/^a$/true/; t
s/^c$/true/; t
s/^d$/true/; t
s/^e$/true/; t
s/^t$/true/; t
s/^y$/true/; t

This sed program replaces any input line that exactly matches a line from file2 with the string true. If a substitution is performed, t immediately branches to the end of the script; otherwise, the next substitution is attempted.

For this to work, the input in file2 obviously can’t contain anything that could be interpreted as a regular expression. It also can’t contain the character /.
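If file2 might contain such characters, one possible workaround is to backslash-escape them before generating the sed program. The following is only a sketch: the helper name sed_lookup and the sample data are made up, and it assumes GNU sed on a system where sed -f /dev/stdin works.

```shell
#!/bin/sh
# Made-up sample data containing a regex metacharacter and a slash.
dir=$(mktemp -d)
printf '%s\n' 'a.c' 'x/y' 'd' > "$dir/file2"
printf '%s\n' 'abc' 'a.c' 'x/y' 'e' > "$dir/file1"

# Hypothetical helper: build the sed program from $1 and apply it to $2.
# The first expression backslash-escapes ] [ \ / . ^ $ * in each line,
# the second wraps the escaped line in an anchored substitution.
sed_lookup() {
  sort -u "$1" |
    sed -e 's:[][\\/.^$*]:\\&:g' -e 's:.*:s/^&$/true/; t:' |
    sed -f /dev/stdin -e 's/.*/false/' "$2"
}

sed_lookup "$dir/file2" "$dir/file1"
```

With this data the output should be false, true, true, false. Without the escaping step, a.c would also match abc, and x/y would produce a syntax error in the generated script.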

Executing this on the first file, and adding a last substitution that defaults to replacing the line with the string false, we get

$ sort -u file2 | sed 's:.*:s/^&$/true/; t:' | sed -f /dev/stdin -e 's/.*/false/' file1
true
false
true
true
Answered By: Kusalananda

tmp=$(mktemp)
comm -2 <(sort file1) <(sort file2) \
| sed -e 's/^\t.*/true/;t
  c false' > "$tmp"

paste <(cat -n < "$tmp") \
  <(cat -n file1 | sort -bk2) \
| sort -bk3,3n | cut -f2

Output:

true
false
true
true

Notes:

  • First, we sort both files and run them through comm with the second column (lines unique to file2) suppressed.
  • Next, sed marks the tab-indented lines (common to both files) as true and all other lines as false.
  • Finally, to recover the original order, we paste that output next to the numbered, sorted input and re-sort by line number.
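To make the first two steps easier to follow, here is a small sketch that keeps the intermediate results in files so they can be inspected. It uses the question's sample data, temporary files instead of process substitution, and awk in place of sed for the marking step; the file names s1, s2, step1 and step2 are made up.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' a b c d > "$dir/file1"
printf '%s\n' a d c e t y > "$dir/file2"

# comm needs sorted input; the C locale keeps the ordering predictable.
LC_ALL=C sort "$dir/file1" > "$dir/s1"
LC_ALL=C sort "$dir/file2" > "$dir/s2"

# Step 1: with column 2 suppressed, lines found only in file1 start at
# column 0, and lines common to both files are indented by one tab.
LC_ALL=C comm -2 "$dir/s1" "$dir/s2" > "$dir/step1"
cat "$dir/step1"

# Step 2: tab-indented (common) lines become true, the rest false.
awk '{ print (/^\t/ ? "true" : "false") }' "$dir/step1" > "$dir/step2"
cat "$dir/step2"
```

At this point step2 is in the sorted order of file1, which is why the answer needs the final paste and sort step when file1 is not already sorted.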
Answered By: guest_7

$ for i in $(<file1.txt); do grep -qxF -- "$i" file2.txt && echo true || echo false; done
true
false
true
true

$ xargs -I{} sh -c 'grep -qxF -- "$1" file2.txt && echo true || echo false' sh {} < file1.txt
true
false
true
true
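One thing to watch with grep-based lookups is that a plain pattern is treated as a regular expression and matches anywhere in a line. The sketch below contrasts that with grep's -x (whole line) and -F (fixed string) options. The helper names loose and strict and the sample word are made up for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' apple > "$dir/file2"

# Hypothetical helpers: report whether $1 is found in file $2.
loose()  { grep -q -- "$1" "$2" && echo true || echo false; }    # regex, matches substrings
strict() { grep -qxF -- "$1" "$2" && echo true || echo false; }  # fixed string, whole line only

loose  app     "$dir/file2"   # true:  "app" is a substring of "apple"
strict app     "$dir/file2"   # false: no line is exactly "app"
loose  'a..le' "$dir/file2"   # true:  each "." matches any character
strict 'a..le' "$dir/file2"   # false: the dots are taken literally
strict apple   "$dir/file2"   # true:  exact whole-line match
```

For the single-letter sample data in the question both forms give the same answer, so the difference only shows up with longer or more unusual words.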

#!/bin/bash

find_in_array() {
  local string=$1
  shift
  for element in "$@"; do [[ "$element" == "$string" ]] && return 0; done
  return 1
}

readarray -t arr1 <file1.txt
readarray -t arr2 <file2.txt

for str in "${arr1[@]}"; do
    find_in_array "$str" "${arr2[@]}" && echo true || echo false
done

Output:

true
false
true
true
Answered By: ufopilot