Creating a sequence with a specific string and varying numbers and letters

I need to create a single row with columns that have a shared string, vary by number, and share repeating letters. My desired output looks like:

SNP1a  SNP1b  SNP2a  SNP2b ... SNP3502a  SNP3502b

I am new to using unix/linux, so my attempts have been rather rudimentary. So far I have done:

seq -f "SNP%1g" 1 3502 > header
awk '{print;print;}' header > header2

So that header2 is:

SNP1
SNP1
SNP2
SNP2
...
SNP3502
SNP3502

However, I am stuck on how to add an alternating a and b to each row.

Any help would be greatly appreciated!

Asked By: Zoe Diaz-Martin

||

With the zsh shell:

() { print ${(j[  ])@}; } SNP{1..3502}{a,b}

Where:

  • SNP{1..3502}{a,b} generates the list using brace expansion
  • that’s passed to an anonymous function, where the list is available in the $@ (aka $argv) array
  • we join the elements of the array with two spaces in between using the j[  ] parameter expansion flag
  • and pass that to print which prints it.
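
For readers without zsh, the same idea (hand the list to a function and join its arguments) can be sketched in plain sh. This is only an illustration: join_args is a hypothetical helper name, the list is typed out by hand because brace expansion is not part of POSIX sh, and "$*" can only join with a single character, so the separator here is one space rather than two.

```shell
# join_args is a hypothetical helper, not part of the answer above.
# The ( ... ) body runs in a subshell, so the IFS change stays local.
join_args() (
    IFS=' '
    # "$*" expands to all arguments joined by the first character of IFS
    printf '%s\n' "$*"
)

row=$(join_args SNP1a SNP1b SNP2a SNP2b)
printf '%s\n' "$row"
```

In ksh, bash or zsh the hand-typed list could be replaced by a brace expansion such as SNP{1..3502}{a,b}.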

From another shell:

zsh -c '() { print ${(j[  ])@}; } SNP{1..3502}{a,b}'

If your lists of prefixes, numbers and suffixes are in separate arrays:

pre=( SNP )
num=( {1..3502} )
suf=( a b )
() { print ${(j[  ])@}; } $^pre$^num$^suf

With perl:

perl -le 'print join "  ", map {$n=$_; map "SNP$n$_", qw(a b)} (1..3502)'
Answered By: Stéphane Chazelas

With bash:

printf '%s ' SNP{1..3502}{a..b}

If the trailing space is a problem, wrap it in a function:

headers(){
    local pieces=( SNP{1..3502}{a..b} ) IFS=' '
    printf '%s' "${pieces[*]}" # use '%s\n' to get a newline at the end
}
Answered By: DanieleGrassini

Using any awk in any shell on every Unix box:

awk '
    BEGIN {
        n = split("a b", lets)
        for ( i=1; i <= 5; i++ ) {
            for ( j=1; j <= n; j++ ) {
                printf "%sSNP%d%s", sep, i, lets[j]
                sep = OFS
            }
        }
        print ""
    }
'
SNP1a SNP1b SNP2a SNP2b SNP3a SNP3b SNP4a SNP4b SNP5a SNP5b
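
For comparison, here is a sketch that stays closer to the pipeline in the question: awk prints each line from seq twice, once with a and once with b appended, and paste -s joins the lines into one row. It assumes seq is installed (POSIX does not require it) and uses a short range of 1 to 3 for readability.

```shell
# Short range (1..3) for illustration; use 3502 for the real header.
# awk prints every input line twice, once with "a" and once with "b";
# paste -s then joins all lines into a single space-separated row.
row=$(seq -f 'SNP%g' 1 3 | awk '{ print $0 "a"; print $0 "b" }' | paste -s -d ' ' -)
printf '%s\n' "$row"
```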
Answered By: Ed Morton

With any of ksh, bash or zsh just use echo:

$ echo SNP{1..3502}{a,b}       # {a..b} also works here.
SNP1a SNP1b SNP2a SNP2b SNP3a SNP3b SNP4a SNP4b SNP5a .....

In this specific case echo is perfectly fine, as no item in the generated list starts with ‘-’ and none contains special characters.

If you must use printf, then try:

printf '%s\n' SNP{1..3}{a,b} | paste -s -d ' ' -
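
As a sketch of why printf is the more robust choice in general, here is the same kind of pipeline with a made-up list whose first item starts with a dash. With an explicit format string, printf prints every item literally, whereas what echo does with such an item depends on the shell.

```shell
# Made-up items for illustration; the first one looks like an option.
set -- -n SNP1a SNP1b
# The format string is the first argument, so "-n" is treated as data.
row=$(printf '%s\n' "$@" | paste -s -d ' ' -)
printf '%s\n' "$row"
```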

And if you must use awk, then use Ed Morton’s answer.

Answered By: QuartzCristal

Using Raku (formerly known as Perl_6)

raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502;
         for @nbr -> $i {put $i ~ "a"; put $i ~ "b"};'   

OR

raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502; my @ltr = "a".."b"; 
         for @nbr -> $i {put $i ~ @ltr[0]; put $i ~ @ltr[1]};'  

OR

raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502; my @ltr = "a".."b"; 
         for @nbr -> $i {put $i ~ $_ for @ltr};' 

This no doubt can be improved, but it gets the job done. The code uses Raku’s Z infix operator, in conjunction with Raku’s ~ (tilde) string-concatenation operator. Each identifier prints on a separate line. In the third example, @ltr letters load into $_ (a.k.a. Raku’s topic variable).
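
Since the question asks for a single row, the one-per-line output can be joined afterwards. Here is a minimal shell sketch using awk, with two spaces as the separator as in the desired output. In it, printf merely stands in for the raku command so that the example runs without Raku installed.

```shell
# printf stands in for the raku one-liner (one identifier per line).
# awk prints each line preceded by a separator that is empty for the
# first line and two spaces afterwards, then ends the row with a newline.
row=$(printf '%s\n' SNP1a SNP1b SNP2a SNP2b |
      awk '{ printf "%s%s", sep, $0; sep = "  " } END { print "" }')
printf '%s\n' "$row"
```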

For more ideas on how to create sequences of identifier strings in Raku, see the SO link below:

https://stackoverflow.com/questions/47999523/concatenating-lists-in-raku?

Answered By: jubilatious1