Fairly new to R but from my research I believe this is a unique question.
I am working with water quality data that is provided to me by another agency in a wide format which looks like so:
library(tidyverse)
library(hablar)
wq.data<-as.data.frame(read.csv('C:/WQdata.csv',stringsAsFactors = FALSE))
wq.data<-wq.data%>%
convert(chr(1:8))
wq.data
# A tibble: 9 x 8
Station.ID SampleTime AreaName Sample.Depth.m Temp.Mid.degC Temp.Mid.Q DO.mg.l DO.Q
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 220 07/09/2019 14:44 Blue River 1.5 29.69 "" 1.66 ""
2 220 06/10/2019 14:17 Blue River 2.1 28.67 K 3.62 ""
3 220 05/22/2019 14:31 Blue River 1.7 29.63 "" 7.66 LP
4 220 04/08/2019 14:15 Blue River 1.7 25.9 "" 6.27 ""
5 220 03/13/2019 14:19 Blue River 1.4 23.04 K 5.51 ""
6 220 02/13/2019 14:34 Blue River 2 20.17 "" 6.05 L
7 220 01/15/2019 14:16 Blue River 1.7 18.75 MK 5.45 ""
8 220 12/10/2018 14:53 Blue River 1.1 20.18 "" 5.64 ""
9 220 11/14/2018 14:43 Blue River 1.1 27.43 "" 5.62 ""
I converted to chr to avoid issues of the data having no common type. You will see that besides the observation data included in Station.ID, SampleTime, and AreaName, the rest of the data includes both the parameter and units in the column name, which I want to extract. Most columns have an associated column for which the name always ends with Q (i.e. Temp.Mid.Q), which stands for Qualifier and is the QA/QC codes which match that specific observation and are important to keep with each observation.
This is only a subset, the actual data is much wider with many more parameters.
Ideally, I am looking for an end product like this:
Station.ID SampleTime AreaName Parameter Units Value Qualifier
1 220 07/09/2019 14:44 Blue River Sample.Depth m 1.50
2 220 07/09/2019 14:44 Blue River Temp.Mid degC 29.69
3 220 07/09/2019 14:44 Blue River DO mg.l 1.66
4 220 06/10/2019 14:17 Blue River Sample.Depth m 2.10
5 220 06/10/2019 14:17 Blue River Temp.Mid degC 28.67 K
6 220 06/10/2019 14:17 Blue River DO mg.l 3.62
7 220 05/22/2019 14:31 Blue River Sample.Depth m 1.70
8 220 05/22/2019 14:31 Blue River Temp.Mid degC 29.63
9 220 05/22/2019 14:31 Blue River DO mg.l 7.66 LP
I thought this question on stacking/melting multiple columns would help, but there the column names at least have a standard value and year. I am not familiar with regex syntax which may or may not be helpful here.
So, since I need to extract the parameters from my column names, I have resorted to attempting to do this "by hand" using an intermediary spec data frame as described near the end of the Tidyr Pivoting Vignette.
I began by creating the following spec:
spec<-tribble(
~.name, ~.value, ~parameter, ~units,
"Sample.Depth.m", "value", "Sample.Depth", "m",
"Temp.Mid.degC", "value", "Temp.Mid", "deg.C",
"DO.mg.l", "value", "DO", "mg.l",
"Temp.Mid.Q", "Qual", NA, NA,
"DO.Q", "Qual", NA, NA,
)
However, perhaps obviously, when I execute:
wq.data.longer<-pivot_longer_spec(wq.data,spec)
I am returned with an extra row with NA inserted but the actual Qualifier character is absent from Qual and no longer associated with a specific observation:
wq.data.longer
# A tibble: 36 x 7
Station.ID SampleTime AreaName parameter units value Qual
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 220 07/09/2019 14:44 Blue River Sample.Depth m 1.5 NA
2 220 07/09/2019 14:44 Blue River Temp.Mid deg.C 29.69 NA
3 220 07/09/2019 14:44 Blue River DO mg.l 1.66 NA
4 220 07/09/2019 14:44 Blue River NA NA NA ""
5 220 06/10/2019 14:17 Blue River Sample.Depth m 2.1 NA
6 220 06/10/2019 14:17 Blue River Temp.Mid deg.C 28.67 NA
7 220 06/10/2019 14:17 Blue River DO mg.l 3.62 NA
8 220 06/10/2019 14:17 Blue River NA NA NA ""
9 220 05/22/2019 14:31 Blue River Sample.Depth m 1.7 NA
10 220 05/22/2019 14:31 Blue River Temp.Mid deg.C 29.63 NA
# ... with 26 more rows
Recall my Qualifier would ideally show here in row 6 (for the above view) if all was correct:
Station.ID SampleTime AreaName parameter units value Qual
6 220 06/10/2019 14:17 Blue River Temp.Mid deg.C 28.67 K
This may be a fairly simple solution but I haven't been able to make any progress. Any assistance is very much appreciated. I haven't found evidence of many others using pivot_longer_spec nor much more guidance from the Tidyr vignette so I'm generally curious how to get farther with this method.
Many thanks!