Consider the dataframe, tests, of individual tests in some boreholes:
borehole depthTest
0 B-1 1.5
1 B-1 20.0
2 B-42 1.0
3 B-42 2.0
4 B-42 15.0
5 B-42 30.0
6 B-09 1.0
7 B-09 10.0
8 B-09 15.0
I have another dataframe, liths, with ranges of lithologies of each borehole:
borehole depthTop lith
0 B-1 0 sand
1 B-1 5 clay
2 B-1 18 shale
3 B-42 0 sand
4 B-42 1 clay
5 B-42 26 shale
6 B-09 0 sand
7 B-09 12 shale
The lithologies are a continuous sequence for each borehole. For example: in B-1 there is sand from a depth of 0 to 5 m, clay from 5 to 18 m, and shale from 18 m onwards. The bottom of each lithology is the top of the next. In other words, the bottom of each lithology would be liths.groupby('borehole').depthTop.shift(-1)
Edit: I want to join the two dfs so I can get the lithology of each test: I want to match on the borehole and then find the lith that has the closest depthTop <= depthTest.
For example: In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26.
Here's the desired result:
borehole depthTest lith
0 B-1 1.5 sand
1 B-1 20.0 shale
2 B-42 1.0 clay
3 B-42 2.0 clay
4 B-42 15.0 clay
5 B-42 30.0 shale
6 B-09 1.0 sand
7 B-09 10.0 sand
8 B-09 15.0 shale
This seems like a groupby and merge_asof problem, but I can't figure out how to get them together.
My solution so far, which works, is to dump this into sqlite3 and then do a between join (like I did here), but that really seems like defeat.