Consider the following problem. One data frame has a column of timestamps, which we will represent as integers (we cannot necessarily use datetime for this problem, so we will stick with integers). A second data frame has a column of timestamps matched to those in the first, but the measurements contain errors, so each timestamp from the first data frame appears in the second shifted by at most ±0.5.
We want to match up the timestamps, and where a match is ambiguous we should put NaN.
import numpy as np
import pandas as pd

df1 = pd.DataFrame.from_dict({"time1": [1, 2, 3, 4]})
df2 = pd.DataFrame.from_dict({"time2": [1.2, 4.3, 2.3, 7.9, 2.9, 3.9]})
df3 = pd.DataFrame.from_dict({"time1": [1, 2, 3, 4], "time2": [1.2, 2.3, 2.9, np.nan]})
Here df1 is the exactly measured data frame, df2 contains the same measurements with noise of at most ±0.5, and df3 is my desired output. Note that the match for 4 is ambiguous: both |4 - 3.9| = 0.1 and |4 - 4.3| = 0.3 fall within 0.5, and we cannot tell which is correct, so that row should contain np.nan.
Please note, crucially, that df2 may have more measurements than df1 does (here, 7.9 has no counterpart in df1).
My current attempt uses a standard .apply with a lambda function: for each timestamp in df1, it brute-force searches df2 for a timestamp at most ±0.5 away. This works, but it is not fast enough for the number of measurements in my experiment, so I need the most efficient solution possible. A sketch of my current approach follows.
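Roughly, the slow version looks like this (a minimal sketch; the named helper find_match is just for readability, since my real code inlines it as a lambda):

import numpy as np

def find_match(t):
    # keep every df2 timestamp within ±0.5 of t
    candidates = df2.loc[(df2["time2"] - t).abs() <= 0.5, "time2"]
    # exactly one candidate -> unambiguous match; zero or several -> NaN
    return candidates.iloc[0] if len(candidates) == 1 else np.nan

df1["time2"] = df1["time1"].apply(find_match)

On the frames above this reproduces df3, but it scans all of df2 once per row of df1, which is what makes it too slow.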