Creating a sequence with a specific string and varying numbers and letters
I need to create a single row of column names that share a string, vary by number, and end in repeating letters. My desired output looks like:
SNP1a SNP1b SNP2a SNP2b ... SNP3502a SNP3502b
I am new to using unix/linux, so my attempts have been rather rudimentary. So far I have done:
seq -f "SNP%1g" 1 3502 > header
awk '{print;print;}' header > header2
So that header2 is:
SNP1
SNP1
SNP2
SNP2
...
SNP3502
SNP3502
However, I am stuck on how to add an alternating a and b to each row.
Any help would be greatly appreciated!
With the zsh shell:
() { print ${(j[ ])@}; } SNP{1..3502}{a,b}
Where SNP{1..3502}{a,b} generates the list using brace expansion; that list is passed to an anonymous function, where it is available in the $@ (aka $argv) array; we join the elements of the array with a space in between using the j[ ] parameter expansion flag, and pass the result to print, which prints it.
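Brace expansion builds a Cartesian product in which the leftmost brace varies slowest, which is why each number comes out paired with a and then b. A quick check, run here in bash as an assumption (zsh expands these particular braces in the same order):

```shell
# Leftmost brace is the outer loop: each number is emitted with a, then b.
echo SNP{1..2}{a,b}   # SNP1a SNP1b SNP2a SNP2b
# Swapping the braces swaps the nesting order.
echo SNP{a,b}{1..2}   # SNPa1 SNPa2 SNPb1 SNPb2
```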
From another shell:
zsh -c '() { print ${(j[ ])@}; } SNP{1..3502}{a,b}'
If your numbers, prefixes and suffixes are in separate arrays:
pre=( SNP )
num=( {1..3502} )
suf=( a b )
() { print ${(j[ ])@}; } $^pre$^num$^suf
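bash has no counterpart to zsh's $^ (rc_expand_param) expansion, so with the same three arrays a bash version needs explicit nested loops. A minimal sketch; snp_product is a hypothetical function name, and it reads the global pre, num and suf arrays:

```shell
# Cartesian product of three bash arrays, joined with single spaces.
snp_product() {
  local p n s out=()
  for p in "${pre[@]}"; do
    for n in "${num[@]}"; do
      for s in "${suf[@]}"; do
        out+=( "$p$n$s" )   # prefix + number + suffix
      done
    done
  done
  local IFS=' '             # "${out[*]}" joins on the first character of IFS
  printf '%s\n' "${out[*]}"
}

pre=( SNP )
num=( {1..3502} )
suf=( a b )
snp_product
```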
With perl:
perl -le 'print join " ", map {$n=$_; map "SNP$n$_", qw(a b)} (1..3502)'
With bash:
printf '%s ' SNP{1..3502}{a..b}
If the trailing space is a problem, wrap it in a function:
headers(){
  local pieces=( SNP{1..3502}{a..b} ) IFS=' '
  printf '%s' "${pieces[*]}"   # use '%s\n' to get a newline at the end
}
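Another option in bash is to build the line in a variable and strip the single trailing space with ${var% }. This is a sketch relying on bash's printf -v, which is not POSIX:

```shell
# printf -v stores the formatted output in a variable instead of printing it.
printf -v line '%s ' SNP{1..3502}{a,b}
line=${line% }             # remove the one trailing space
printf '%s\n' "$line"      # print the header row with a final newline
```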
Using any awk in any shell on every Unix box:
awk '
BEGIN {
    n = split("a b", lets)
    for ( i=1; i <= 5; i++ ) {          # 5 keeps the sample short; use 3502 for the real header
        for ( j=1; j <= n; j++ ) {
            printf "%sSNP%d%s", sep, i, lets[j]
            sep = OFS
        }
    }
    print ""
}
'
SNP1a SNP1b SNP2a SNP2b SNP3a SNP3b SNP4a SNP4b SNP5a SNP5b
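The loop bound above is hard-coded to 5 to keep the sample output short. A variation that passes the count and the letters in with awk's -v option instead; the shell function name snp_header and its argument order are hypothetical:

```shell
# Usage: snp_header COUNT [LETTERS]; LETTERS defaults to "a b".
snp_header() {
  awk -v max="$1" -v letters="${2:-a b}" '
  BEGIN {
    n = split(letters, lets)            # split on whitespace into lets[1..n]
    for (i = 1; i <= max; i++) {
      for (j = 1; j <= n; j++) {
        printf "%sSNP%d%s", sep, i, lets[j]
        sep = OFS                       # empty before the first name, then " "
      }
    }
    print ""                            # terminate the row with a newline
  }'
}

snp_header 3502 > header
```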
With any of ksh, bash or zsh, just use echo:
$ echo SNP{1..3502}{a,b} # {a..b} also works here.
SNP1a SNP1b SNP2a SNP2b SNP3a SNP3b SNP4a SNP4b SNP5a .....
In this specific case echo is perfectly fine, as the generated list has no leading '-' and contains no special characters.
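To see why a leading '-' matters, a small illustration assuming bash's builtin echo (other shells' echo implementations differ, for example in how they treat backslashes):

```shell
# bash's echo takes a leading -n as an option, so nothing is printed here:
echo -n
# printf passes its arguments through the format, so -n comes out verbatim:
printf '%s\n' -n
```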
If you must use printf, then try:
printf '%s\n' SNP{1..3}{a,b} | paste -s -d ' ' -
And if you must use awk, then use Ed Morton's answer.
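The paste -s step from the printf example can also finish the approach in the question: after the asker's seq and awk commands, header2 holds each name on two consecutive lines, so appending a to odd-numbered lines and b to even-numbered ones gives the pairs. A sketch, with header_row as a hypothetical output file name:

```shell
seq -f "SNP%1g" 1 3502 > header          # SNP1 .. SNP3502, one per line
awk '{print;print;}' header > header2    # each name on two consecutive lines
# NR % 2 is 1 on odd lines (append a) and 0 on even lines (append b);
# paste -s then joins all lines into one row separated by spaces.
awk '{ print $0 (NR % 2 ? "a" : "b") }' header2 | paste -s -d ' ' - > header_row
```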
Using Raku (formerly known as Perl 6):
raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502;
for @nbr -> $i {put $i ~ "a"; put $i ~ "b"};'
OR
raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502; my @ltr = "a".."b";
for @nbr -> $i {put $i ~ @ltr[0]; put $i ~ @ltr[1]};'
OR
raku -e 'my @nbr = "SNP" xx 3502 Z~ 1..3502; my @ltr = "a".."b";
for @nbr -> $i {put $i ~ $_ for @ltr};'
This can no doubt be improved, but it gets the job done. The code uses Raku's Z (zip) infix operator, combined with Raku's ~ (tilde) string-concatenation operator as Z~. Each identifier prints on a separate line (pipe the output through paste -s -d ' ' - to get a single row). In the third example, the @ltr letters are loaded in turn into $_ (a.k.a. Raku's topic variable).
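For comparison, the "SNP" xx 3502 Z~ 1..3502 zip has a rough shell analogue in paste, which joins corresponding lines of its inputs. A sketch assuming bash (for the <( ) process substitution); zip_snp is a hypothetical name:

```shell
# Pair line k of a stream of "SNP" with line k of 1..N (like Raku's Z~),
# then emit the a/b variants and join everything onto one row.
zip_snp() {
  paste -d '\0' <(yes SNP | head -n "$1") <(seq "$1") |
    awk '{ print $0 "a"; print $0 "b" }' |
    paste -s -d ' ' -
}

zip_snp 3502
```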
For more ideas on how to create sequences of identifier strings in Raku, see the SO link below:
https://stackoverflow.com/questions/47999523/concatenating-lists-in-raku