Assumptions:
- generate $durchlauf (a number) random line numbers; we'll refer to a single number as n ...
- delete lines numbered n and n+1 from the input file and in their place ...
- insert $string (a randomly generated base64 string)
- this list of random line numbers must not have any consecutive line numbers
As others have pointed out you want to limit yourself to a single gawk call per input file.
New approach:
- generate $durchlauf (count) random numbers (see gen_numbers() function)
- generate $durchlauf (count) base64 strings (we'll reuse Ed Morton's code)
- paste these 2 sets of data into a single input stream/file
- feed 2 files to gawk ... the paste result and the actual file to be modified
- we won't be able to use gawk's -i inplace so we'll use an intermediate tmp file
- when we find a matching line in our input file we'll 1) insert the base64 string and then 2) skip/delete the current/next input lines; this should address the issue where we have two random numbers that differ by 1
One idea to ensure we do not generate consecutive line numbers:
- break our set of line numbers into ranges, eg, 100 lines split into 5 ranges => 1-20 / 21-40 / 41-60 / 61-80 / 81-100
- reduce the end of each range by 1, eg, 1-19 / 21-39 / 41-59 / 61-79 / 81-99
- use $RANDOM to generate one number within each range (this tends to be at least an order of magnitude faster than comparable shuf calls)
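The range arithmetic in the list above can be sketched on its own, without any randomness. show_ranges is a hypothetical helper name used only for illustration (it is not part of the original code); it just prints the shortened ranges for a given line count and number of ranges.

```shell
# show_ranges: hypothetical helper that prints the shortened ranges, one per line
# $1 = total number of lines (eg, 100), $2 = number of ranges (eg, 5)
show_ranges () {
    max=$1
    count=$2
    interval=$(( max / count ))             # eg, 100 / 5 = 20
    i=0
    while [ "$i" -lt "$count" ]
    do
        start=$(( i * interval + 1 ))       # first line number of this range
        end=$(( start + interval - 2 ))     # last line number, already reduced by 1
        echo "${start}-${end}"
        i=$(( i + 1 ))
    done
}

show_ranges 100 5
```

For 100 lines and 5 ranges this prints 1-19, 21-39, 41-59, 61-79 and 81-99. The highest number of one range and the lowest number of the next always differ by 2, so picks from neighbouring ranges can never be consecutive.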
We'll use a function to generate our list of non-consecutive line numbers:
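gen_numbers () {
    max=$1                              # $zeilen     eg, 100
    count=$2                            # $durchlauf  eg, 5
    interval=$(( max / count ))         # eg, 100 / 5 = 20
    # loop exactly $count times so we never emit more numbers than base64 strings,
    # even when max is not an exact multiple of count
    for (( i=0; i<count; i++ ))
    do
        start=$(( i * interval + 1 ))   # eg, 1 / 21 / 41 / 61 / 81
        end=$(( start + interval - 2 )) # eg, 19 / 39 / 59 / 79 / 99
        out=$(( ( RANDOM % interval ) + start ))
        [[ $out -gt $end ]] && out=${end}
        echo "${out}"
    done
}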
Sample run:
$ zeilen=100
$ durchlauf=5
$ gen_numbers ${zeilen} ${durchlauf}
17
31
54
64
86
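One way to convince yourself that a generated list meets the "no consecutive line numbers" requirement is to pipe it through a small check. This is only a sketch: check_gaps is a hypothetical helper name, and it assumes the numbers arrive in ascending order, one per line, which is how gen_numbers emits them.

```shell
# check_gaps: hypothetical helper; reads ascending numbers from stdin (one per line)
# and exits non-zero if any two neighbouring numbers differ by less than 2
check_gaps () {
    awk '
    NR > 1 && $1 - prev < 2 { printf "too close: %d and %d\n", prev, $1; bad=1 }
    { prev = $1 }
    END { exit bad+0 }
    '
}

# the sample run from above passes; a list containing 31 and 32 does not
printf '%s\n' 17 31 54 64 86 | check_gaps && echo "ok"
printf '%s\n' 17 31 32 64 86 | check_gaps || echo "rejected"
```

In real use the input would come straight from the function, eg, gen_numbers ${zeilen} ${durchlauf} | check_gaps.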
Demonstration of the paste/gen_numbers/base64/tr/gawk idea:
$ zeilen=300
$ durchlauf=3
$ paste <( gen_numbers ${zeilen} ${durchlauf} ) <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' )
This generates:
74 7VFhnDN4J...snip...rwnofLv
142 ZYv07oKMB...snip...xhVynvw
261 gifbwFCXY...snip...hWYio3e
Main code:
tmpfile=$(mktemp)
while/for loop ...                      # whatever OP is using to loop over list of input files
do
    zeilen=$(wc -l < "testfile${filecount}".txt)
    durchlauf=$(( zeilen / 20 ))
    awk '
    # process 1st file (ie, paste/gen_numbers/base64/tr/gawk)
    FNR==NR { ins[$1]=$2                # store base64 string in ins[] array
              del[$1]=del[($1)+1]       # touch both indices so line numbers n and n+1 exist in del[] (ie, are marked for deletion)
              next
            }
    # process 2nd file
    FNR in ins { print ins[FNR] }       # insert base64 string?
    ! (FNR in del)                      # if current line number not in del[] array then print the line
    ' <( paste <( gen_numbers ${zeilen} ${durchlauf} ) <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' )) "testfile${filecount}".txt > "${tmpfile}"
    # the last line with line continuations for readability:
    #' <( paste \
    #       <( gen_numbers ${zeilen} ${durchlauf} ) \
    #       <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' ) \
    #   ) \
    #"testfile${filecount}".txt > "${tmpfile}"
    mv "${tmpfile}" "testfile${filecount}".txt
done
Simple example of awk code in action:
$ cat orig.txt
line1
line2
line3
line4
line5
line6
line7
line8
line9
$ cat paste.out # simulated output from paste/gen_numbers/base64/tr/gawk
1 newline1
5 newline5
$ awk '...' paste.out orig.txt
newline1
line3
line4
newline5
line7
line8
line9
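To reproduce this example, here is a self-contained sketch with the awk program from the main code written out in place of the '...'. The apply_edits function name and the scratch directory from mktemp -d are assumptions made for illustration only.

```shell
# apply_edits: hypothetical wrapper around the awk program from the main code
# $1 = file with "line-number string" pairs, $2 = file to modify (result goes to stdout)
apply_edits () {
    awk '
    FNR==NR { ins[$1]=$2                # 1st file: remember string to insert at line n
              del[$1]=del[($1)+1]       # and mark lines n and n+1 for deletion
              next
            }
    FNR in ins { print ins[FNR] }       # 2nd file: print the string in place of line n
    ! (FNR in del)                      # print every line not marked for deletion
    ' "$1" "$2"
}

workdir=$(mktemp -d)                                             # scratch directory
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 > "${workdir}/orig.txt"      # same content as orig.txt above
printf '%s\n' '1 newline1' '5 newline5' > "${workdir}/paste.out" # same content as paste.out above

apply_edits "${workdir}/paste.out" "${workdir}/orig.txt"
rm -r "${workdir}"
```

Lines 1-2 and 5-6 are dropped and each pair is replaced by its string, which matches the output shown above.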
to:mark-fuso: I copied it from another posting. It is too hard to understand awk for a small job. "loop change same fileprint" is a copy error, not from me. I will delete it – kumpel4 Sep 19 '21 at 08:48