TL (see TL;DR near the end of the question)
I came across data that uses pipes (|) as field delimiters and backslash-quote pairs (\") as quotes around fields, which may themselves contain the delimiter, such as:
1|\"2\"|\"3.1|3.2\"|4 # basically 1, 2, 3.1|3.2, 4
that is (in awk):
$1==1
$2==\"2\"
$3==\"3.1|3.2\"
$4==4
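For anyone who wants to reproduce this, here is one way to create the input. It is a sketch that assumes a single-line file named data, the name used by the commands at the end of the question:

```shell
# Single quotes keep the shell from touching the backslashes;
# the '%s\n' format prints the argument verbatim plus a newline.
printf '%s\n' '1|\"2\"|\"3.1|3.2\"|4' > data
cat data
```

The backslashes are literal characters in the file, not shell or awk escapes.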
I decided to try GNU awk's FPAT to handle the field splitting, since writing a negative-match regex for \" didn't seem that bad.
I came across this answer to "Regular expression to match a line that doesn't contain a word", which links (offsite) to an online generator that builds a negative regular expression from an input phrase.
As the generator currently supports only alphanumeric and space characters, I replaced \" (backslash-quote) with bq, and the generator produced this regex:
^([^b]|b+[^bq])*b*$
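As a quick sanity check of that pattern, the sketch below runs it against a few made-up strings (they are not from the real data). It is a plain ERE, so any POSIX awk should do here, gawk included:

```shell
# Made-up test strings; only those without "bq" should match the pattern.
check=$(printf '%s\n' abc qb bb abq bbq | awk '{
    print $0, ($0 ~ /^([^b]|b+[^bq])*b*$/ ? "no bq" : "has bq")
}')
printf '%s\n' "$check"
```

The first three strings are reported as "no bq" and the last two as "has bq", so the pattern matches exactly the strings that do not contain bq.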
I then replaced | with p, which turns the data above into:
1pbq2bqpbq3.1p3.2bqp4
1|\"2\"|\"3.1|3.2\"|4 # original for comparison
Starting from the sample FPAT in the GNU awk documentation (FPAT="([^,]*)|(\"[^\"]+\")"), I built this FPAT:
FPAT="([^p]*)|(bq([^b]|b+[^bq])*b*bq)"
and tried it:
$ gawk 'BEGIN {
FPAT="([^p]*)|(bq([^b]|b+[^bq])*b*bq)"
OFS=ORS
}
{
print $1,$2,$3,$4
}' data
which output:
1
bq2bq
bq3.1p3.2bq
4
which is right. Replacing p with | and q with " in the program gave:
$ gawk 'BEGIN {
FPAT="([^|]*)|(b\"([^b]|b+[^b\"])*b*b\")"
OFS=ORS
}
{
print $1,$2,$3,$4
}' data
which outputs:
1
b"2b"
b"3.1|3.2b"
4
which is still right. However, replacing each b with \ and adding some escaping resulted in:
(TL;DR: how do I fix the escaping in the script below?)
$ gawk 'BEGIN {
FPAT="([^|]*)|(\\\"([^\\]|\\+[^\\\"])*\\*\\\")"
OFS=ORS
}
{
print $1,$2,$3,$4
}' data
and the output is wrong, differing from the previous runs:
1
\"2\"
\"3.1
3.2\"
so there is probably something wrong with my \\s, but after too many rounds of trial and error my head is filled with backslashes and all thoughts have pretty much escaped (pun intended). And as the community is all about sharing, I thought I'd share my headache with you guys.
Edit: Apparently it has something to do with backslashes inside a quoted string: if, instead of defining FPAT="...", I use one of GNU awk's strongly typed regexp constants, FPAT=@/.../, I get the correct output:
$ gawk 'BEGIN {
FPAT=@/([^|]*)|(\\\"([^\\]|\\+[^\\\"])*\\*\\\")/
OFS=ORS
}
{
print $1,$2,$3,$4
}' data
Output now:
1
\"2\"
\"3.1|3.2\"
4