Split string by delimiter and get N-th element

I have a string:

one_two_three_four_five

I need to save the value two in variable A and the value four in variable B, both taken from the string above.

I am using ksh.

Asked By: Alex


Use cut with _ as the field delimiter and get desired fields:

A="$(cut -d'_' -f2 <<<'one_two_three_four_five')"
B="$(cut -d'_' -f4 <<<'one_two_three_four_five')"

You can also use echo and pipe instead of Here string:

A="$(echo 'one_two_three_four_five' | cut -d'_' -f2)"
B="$(echo 'one_two_three_four_five' | cut -d'_' -f4)"

Example:

$ s='one_two_three_four_five'

$ A="$(cut -d'_' -f2 <<<"$s")"
$ echo "$A"
two

$ B="$(cut -d'_' -f4 <<<"$s")"
$ echo "$B"
four

Beware that if $s contains newline characters, that will return a multiline string that contains the 2nd/4th field in each line of $s, not the 2nd/4th field in $s.
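To make that caveat concrete, here is a small sketch showing cut working on each line separately. The variable names multi and result and the two-line value are made up for illustration.

```shell
# Hypothetical two-line value, for illustration only.
multi='one_two_three
four_five_six'

# cut processes each input line on its own, so the result holds
# the 2nd field of every line, joined by a newline.
result="$(printf '%s\n' "$multi" | cut -d'_' -f2)"
printf '%s\n' "$result"
```

Here $result ends up containing two lines (two and five), not a single field.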

Answered By: heemayl

Using only POSIX sh constructs, you can use parameter substitution constructs to parse one delimiter at a time. Note that this code assumes that the string has the requisite number of fields; otherwise, the last field is repeated.

string='one_two_three_four_five'
remainder="$string"
first="${remainder%%_*}"; remainder="${remainder#*_}"
second="${remainder%%_*}"; remainder="${remainder#*_}"
third="${remainder%%_*}"; remainder="${remainder#*_}"
fourth="${remainder%%_*}"; remainder="${remainder#*_}"
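The same idea can be wrapped in a loop to fetch an arbitrary field. The following is only a sketch: nth_field and its nf_* variables are hypothetical names, not part of any shell. Unlike the straight-line code above, it returns a non-zero status instead of repeating the last field when the string has too few fields.

```shell
# nth_field STRING DELIMITER N   (hypothetical helper; N is 1-based)
nth_field() {
    nf_rest=$1
    nf_i=1
    while [ "$nf_i" -lt "$3" ]; do
        case $nf_rest in
            *"$2"*) nf_rest=${nf_rest#*"$2"} ;;  # drop the leading field and its delimiter
            *) return 1 ;;                       # fewer than N fields
        esac
        nf_i=$((nf_i + 1))
    done
    nf_field=${nf_rest%%"$2"*}                   # keep text up to the next delimiter
    printf '%s\n' "$nf_field"
}

A=$(nth_field 'one_two_three_four_five' _ 2)
B=$(nth_field 'one_two_three_four_five' _ 4)
printf '%s %s\n' "$A" "$B"
```

Because the result is captured with command substitution, trailing newlines in a field would be lost.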

Alternatively, you can use an unquoted parameter substitution with wildcard expansion disabled and IFS set to the delimiter character (this only works if the delimiter is a single non-whitespace character or if any whitespace sequence is a delimiter).

string='one_two_three_four_five'
set -f; IFS='_'
set -- $string
second=$2; fourth=$4
set +f; unset IFS

This clobbers the positional parameters. If you do this in a function, only the function’s positional parameters are affected.
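A minimal sketch of the function variant follows. The name split_2_4 is hypothetical, and, like the snippet above, it assumes globbing was enabled and IFS had its default value beforehand.

```shell
string='one_two_three_four_five'

# split_2_4 is a hypothetical name; it sets the globals "second" and "fourth".
split_2_4() {
    set -f; IFS='_'
    set -- $1            # unquoted on purpose; replaces only this function's "$@"
    set +f; unset IFS    # assumes globbing was on and IFS was at its default
    second=$2; fourth=$4
}

set -- caller args       # pretend the script had its own arguments
split_2_4 "$string"
printf '%s %s\n' "$second" "$fourth"   # two four
printf '%s\n' "$*"                     # caller args
```

The caller's positional parameters survive the call because set -- inside the function only touches the function's own parameters.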

Yet another approach for strings that don’t contain newline characters is to use the read builtin.

IFS=_ read -r first second third fourth trail <<'EOF'
one_two_three_four_five
EOF
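The extra trail variable matters because read assigns whatever is left over, separators included, to the last name. A small sketch of the difference (a, b, c and rest are made-up names for the first call):

```shell
# Without a catch-all name, the last variable swallows the rest of the line.
IFS=_ read -r a b c rest <<'EOF'
one_two_three_four_five
EOF
printf '%s\n' "$rest"       # four_five

# With an extra name (trail), "fourth" holds exactly one field.
IFS=_ read -r first second third fourth trail <<'EOF'
one_two_three_four_five
EOF
printf '%s\n' "$fourth"     # four
```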

Wanted to see an awk answer, so here’s one:

A=$(awk -F_ '{print $2}' <<< 'one_two_three_four_five')
B=$(awk -F_ '{print $4}' <<< 'one_two_three_four_five')  

Answered By: Paul Evans

Is a python solution allowed?

# python3 -c "import sys; print(sys.argv[1].split('_')[1])" one_two_three_four_five
two

# python3 -c "import sys; print(sys.argv[1].split('_')[3])" one_two_three_four_five
four
Answered By: fhgd

Here string

The simplest way (for shells with <<<) is:

IFS='_' read -r a second a fourth a <<<"$string"

A temporary variable $a is used instead of $_ because one shell complains.

In a full script:

string='one_two_three_four_five'
IFS='_' read -r a second a fourth a <<<"$string"
echo "$second $fourth"
  • No IFS changing
  • No issues with set -f (Pathname expansion)
  • No changes to the positional parameters ($@).

Heredoc
For a solution portable to all shells (yes, all POSIX included) without changing IFS or set -f, use the (a bit more complex) heredoc equivalent:

string='one_two_three_four_five'

IFS='_' read -r a second a fourth a <<_EOF_
$string
_EOF_

echo "$second $fourth"

Note that both of these solutions (the here-doc and <<<) will remove all trailing newlines.

They are also designed for variables whose content is a single line.

Solutions for multi-liners are possible but need more complex constructs.


Bash 4.4+
A very simple solution is possible in bash version 4.4 and later:

readarray -d _ -t arr <<<"$string"

echo "array ${arr[1]} ${arr[3]}"   # array numbers are zero based.

Beware that a newline character is added at the end of the last element, and that an empty $string is split into one element containing a newline character.

readarray -t -d _ arr < <(printf %s "$string")

This would create an empty array for an empty $string, but beware that a trailing empty field, as in string=foo_, would not produce an empty trailing element.

readarray -t -d _ arr < <(printf %s_ "$string")

This would preserve all elements and split an empty string into one empty element.

readarray -t -d _ arr < <(printf %s "${string+${string}_}")

This would split an empty string into one empty element, but would give an empty list if $string was unset.

There is no equivalent for POSIX shells, as many POSIX shells do not have arrays.

Arrays
For shells that have arrays, it may be as simple as (tested working in attsh, lksh, mksh, ksh, and bash, but not zsh):

set -f; IFS=_; arr=($string)

But with a lot of additional plumbing to keep and reset variables and options:

string='one_* *_three_four_five'
    
case $- in
    *f*) noglobset=true; ;;
    *) noglobset=false;;
esac
    
oldIFS="$IFS"

set -f; IFS=_; arr=($string)
    
if $noglobset; then set -f; else set +f; fi

IFS=$oldIFS
    
echo "two=${arr[1]} four=${arr[3]}"

In zsh, array indices start at 1, and no split+glob is performed by default upon parameter expansions, so some changes are needed to get this working in zsh:

IFS=_; arr=( $=string )
echo "two=${arr[2]} four=${arr[4]}"

Here $=string requests word splitting explicitly (globbing is still not done, so it doesn’t need to be disabled globally). Also note that while foo_ would be split into foo only in ksh/bash/yash, it’s split into foo and the empty string in zsh.

Answered By: user232326

With zsh you could split the string (on _) into an array:

non_empty_elements=(${(s:_:)string})
all_elements=("${(@s:_:)string}")

and then access each/any element via array index:

print -r -- ${all_elements[4]}

Keep in mind that in zsh (like most other shells, but unlike ksh/bash) array indices start at 1.

Or directly in one expansion:

print -r -- "${${(@s:_:)string}[4]}"

Or using an anonymous function for the elements to be available in its $1, $2…:

(){print -r -- $4} "${(@s:_:)string}"
Answered By: don_crissti

Another awk example; simpler to understand.

A=$(echo one_two_three_four_five | awk -F_ '{print $1}')
B=$(echo one_two_three_four_five | awk -F_ '{print $2}')  
C=$(echo one_two_three_four_five | awk -F_ '{print $3}')  
... and so on...  

Can be used with variables also.

Suppose:

this_str="one_two_three_four_five"  

Then the following works:

A=$(printf '%s\n' "${this_str}" | awk -F_ '{print $1}')
B=$(printf '%s\n' "${this_str}" | awk -F_ '{print $2}')
C=$(printf '%s\n' "${this_str}" | awk -F_ '{print $3}')
... and so on...  

That assumes ${this_str} doesn’t contain newline characters; otherwise awk would return the requested field from each line of the variable’s contents instead of the requested field of the contents as a whole.
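One way around the per-line behaviour, sketched here under the assumption of a POSIX awk, is to hand the value to awk through the environment and split it in a BEGIN block, so that awk never reads it as input records. The environment variable name str and the array name f are arbitrary choices for this example.

```shell
this_str='one_two_three_four_five'

# Pass the value via the environment ("str" is an arbitrary name) and
# split it inside awk; no input lines are read, so embedded newlines
# would simply be part of whichever field contains them.
A=$(str=$this_str awk 'BEGIN { split(ENVIRON["str"], f, "_"); print f[2] }')
B=$(str=$this_str awk 'BEGIN { split(ENVIRON["str"], f, "_"); print f[4] }')
printf '%s %s\n' "$A" "$B"
```

Command substitution still strips trailing newlines from the extracted field.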

Answered By: user274900

With due respect to everyone who has posted excellent answers, I wonder if we are over-engineering this problem. Three simple lines to just answer the question asked without generalizing:

str="one_two_three_four_five"            # create a string

A=$(echo "$str" | awk -F_ '{print $2}')  # tell awk to use _ as the delimiter; assign the second field to A

B=$(echo "$str" | awk -F_ '{print $4}')  # tell awk to use _ as the delimiter; assign the fourth field to B

You can then use the variables as usual. Here is an example:

$ echo "The value of A is: $A; the value of B is: $B"
The value of A is: two; the value of B is: four
$ 
Answered By: Hopping Bunny

Using Raku (formerly known as Perl_6)

A=$(raku -e  'print $*IN.split("_")[1];' <<< 'one_two_three_four_five')
B=$(raku -e  'print $*IN.split("_")[3];' <<< 'one_two_three_four_five')

This answer complements the awk answer by @Paul_Evans. You can place print at the right end of the method chain, if you find that more readable. Also, if you have an issue with quoting, then the .split("_") call can be replaced by .split(q[_]).

Putting these two options together:

A=$(raku -e  '$*IN.split(q[_])[1].print;' <<< 'one_two_three_four_five')
B=$(raku -e  '$*IN.split(q[_])[3].print;' <<< 'one_two_three_four_five')

Finally, a word about indexing. You can take the first element after splitting with head, or the first 2 elements with head(2). If you want to take elements from the right end, use tail in a similar manner. The way to numerically index from the right end in Raku is to use the * "whatever-star" idiom. So the last (zero-indexed) element is [*-1], the second-to-last is [*-2], etc.

~$ raku -e  'print $*IN.split(q[_])[*-4];' <<< 'one_two_three_four_five'
two
~$ raku -e  'print $*IN.split(q[_])[*-2];' <<< 'one_two_three_four_five'
four

https://raku.org

Answered By: jubilatious1