Bash built-in regex vs sed and grep commands

Bash has built-in regex support for pattern matching. The sed and grep commands can also do this.

What is the benefit of choosing the built-in feature over the external commands? I would like to know which one is faster, along with other points of comparison.

UPDATE:

Sorry, I may have mistaken some Bash features for regex.

By "built-in regex", I meant Bash’s string manipulation as described in Bash String Manipulation, in particular,

String removal

stringZ=abcABC123ABCabc
echo ${stringZ#a*C}      # 123ABCabc

String replacement

stringZ=abcABC123ABCabc
echo ${stringZ/a?c/xyz}       # xyzABC123ABCabc
                              # Replaces first match of 'abc' with 'xyz'.

Are they regex?

Asked By: oldpride


Addressing the updated question:

What you are showing are, strictly speaking, not applications of regular expressions in the shell. Both are parameter expansions using shell globs, the same sort of patterns that you’d use as filename globbing patterns to do filename expansions, e.g. things like cat text*.txt >combined.

The first expansion is a standard prefix string removal, while the second is a non-standard (but implemented by bash and some other shells) more general substitution. Neither uses regular expressions, and you would not be able to use these shell globbing patterns with grep, sed, or awk, as those tools expect regular expressions, not globs.
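As a minimal sketch of how these glob-based expansions behave (the variable name is taken from the question; the `##` greedy variant is added for contrast):

```shell
stringZ=abcABC123ABCabc

# '*' and '?' here are shell globs, not regex operators:
# '*' matches any string, '?' matches any single character.
echo "${stringZ#a*C}"      # shortest prefix match removed: 123ABCabc
echo "${stringZ##a*C}"     # longest prefix match removed:  abc
echo "${stringZ/a?c/xyz}"  # first match of 'a?c' replaced: xyzABC123ABCabc
```

Note that `${var#pattern}` and `${var##pattern}` are standard, while `${var/pattern/string}` is a bash extension.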

To use regular expressions in the shell, the shell must support it (it is not a standard feature of a Unix shell, although many shells provide it), and you must use the syntax that the shell provides, which in the case of bash is by using the =~ operator within [[ ... ]].
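A minimal sketch of that bash syntax (the version string and variable names are invented for illustration); a successful match also populates the BASH_REMATCH array with the captured groups:

```shell
#!/usr/bin/env bash
# Match an extended regular expression against a string and extract
# the captured groups via BASH_REMATCH.
version=1.24.3
if [[ $version =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
    printf 'major=%s minor=%s patch=%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi
```

The right-hand side of =~ must be left unquoted (or stored in a variable) to be treated as a regular expression; quoted parts are matched literally.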

The use of basic regular expressions (as opposed to extended regular expressions) is also made possible in a limited way by the standard expr utility. But this is very rarely used.
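As a sketch, expr's colon operator matches a BRE anchored at the start of the string and prints either the length of the match or, with a \( ... \) group, the captured text (the example string mirrors the one in the question):

```shell
string=abcABC123ABCabc

# Number of characters matched from the start of the string:
expr "$string" : 'abc[A-Z]*'           # 6

# With a \( ... \) group, the matched group is printed instead:
expr "$string" : '[a-z]*\([A-Z]*\)'    # ABC
```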


Addressing the original formulation of the question:

You pick the tools that are appropriate for the job at hand.

The tools and their basic usages:

  1. You would use =~ within [[ ... ]] in the bash shell to apply a regular expression to a string stored in a shell variable. This is typically used for testing whether a string matches a certain expression and potentially to extract substrings. It’s ideal for tasks such as validating user-supplied input or handling short strings; tasks that don’t involve line-by-line processing in a loop.

  2. You may use grep for simpler file-processing tasks. It’s useful for extracting lines from a stream, or from one or several files, based on patterns, either regular expressions or plain strings. It can also test whether one or several patterns are present in the input data. Most tasks you’d use grep for may also be performed by sed, but the opposite is not true.

  3. To perform more advanced processing of files, you may employ sed. It allows you to edit a stream, or one or several documents, using substitutions with regular expressions within lines. Additionally, you can prepend, append, replace, or delete lines based on absolute line numbers, regular expressions, or specified ranges. Being a stream editor, the editing done with sed is often of the same type as you would otherwise have needed to do using a text editor. Most tasks you’d use sed for may also be performed by awk, but the opposite is not true.

  4. When dealing with structured text data and requiring versatile data manipulation, awk may be more suitable than sed. You would use awk to process text files, particularly for tasks like extracting specific columns, performing mathematical operations, and applying custom logic to filter, transform, or aggregate data. Some of this processing would potentially involve awk’s built-in ability to apply custom code to records matching particular regular expressions, or to use regular expressions in substitutions, etc.

  5. Some structured formats, such as JSON, YAML, XML, and CSV (using more advanced quoting rules than simple comma-separated values), require care and knowledge about how the rules of the format work with regards to quoting and character encoding etc. For these types of data, specialized processing software should be used, such as jq, Miller (mlr), xmlstarlet, csvkit etc. Many of these tools allow you to safely work with the given data using regular expressions if the task at hand requires it.
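The division of labour among the line-oriented tools above can be sketched with a small throwaway data file (the file name, contents, and field layout are invented for illustration):

```shell
# A throwaway colon-separated data file for the demonstration.
printf '%s\n' 'alice:42:admin' 'bob:17:user' 'carol:99:admin' >users.txt

grep 'admin' users.txt                    # extract lines matching a pattern
sed 's/:user$/:guest/' users.txt          # regex substitution within lines
awk -F: '$2 > 40 { print $1 }' users.txt  # per-column logic and filtering
```

Each step up the list buys generality at the cost of simplicity: grep only selects lines, sed can also rewrite them, and awk can additionally reason about fields and state.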

It is more common to start with a task and select the tool than to do the opposite.

Answered By: Kusalananda