Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
| id | activity |
|---|---|
| 12345678 | {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"} |
| 98765432 | {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"} |
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
| id | Date (05/19/2020) | Type (05/19/2020) | Date (06/03/2020) | Type (06/03/2020) | Type (10/23/2020 | Date (10/23/2020) | Type (10/23/2020) |
|---|---|---|---|---|---|---|---|
| 10629465 | null | null | 06/01/2020 | E | |||
| 98765432 | 05/18/2020 | B | 10/26/2020 | T |
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
| id | ActivityDate | Date | Type |
|---|---|---|---|
| 12345678 | 05/19/2020 | null | null |
| 12345678 | 06/03/2020 | 06/01/2020 | E |
| 98765432 | 05/19/2020 | 05/18/2020 | B |
| 98765432 | 10/23/2020 | 10/26/2020 | T |