Note: This question is inspired by the ideas discussed in this other post: DataFrame algebra in Pandas
Say I have two dataframes A and B, and that for some column col_name their values are:
A[col_name] | B[col_name]
------------|------------
1           | 3
2           | 4
3           | 5
4           | 6
I want to compute the set difference between A and B based on col_name. The result of this operation should be:
The rows of A where A[col_name] didn't match any entries in B[col_name].
Below is the result for the above example (showing other columns of A as well):
A[col_name] | A[other_column_1] | A[other_column_2]
------------|-------------------|------------------
1           | 'foo'             | 'xyz' ....
2           | 'bar'             | 'abc'
Keep in mind that some entries in A[col_name] and B[col_name] could hold the value np.nan. I would like to treat those entries as undefined but distinct from one another: a NaN in A should never count as matching a NaN in B, so the set difference should return the rows of A where col_name is NaN.
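To make the desired semantics concrete, here is a minimal single-column sketch of the behaviour I'm after. Only the col_name values 1 to 4 and 3 to 6 and the 'foo'/'bar' rows come from the example above; the NaN rows and the remaining column values are made up for illustration. As far as I can tell, Series.isin treats NaN as matching NaN, which is why the NaN rows of A are added back explicitly. I'm not sure this is the idiomatic way, and it doesn't cover the multi-column case.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the NaN rows and most "other" column values are
# invented for illustration; they are not part of the original example.
A = pd.DataFrame({
    "col_name": [1, 2, 3, 4, np.nan],
    "other_column_1": ["foo", "bar", "baz", "qux", "quux"],
    "other_column_2": ["xyz", "abc", "def", "ghi", "jkl"],
})
B = pd.DataFrame({"col_name": [3, 4, 5, 6, np.nan]})

# True where A's value does not appear anywhere in B's column.
no_match = ~A["col_name"].isin(B["col_name"])

# Assumption: isin matches NaN against NaN, so rows of A that are NaN
# in col_name are OR-ed back in to keep them in the difference.
result = A[no_match | A["col_name"].isna()]

print(result)
```

With this data, the rows I'd expect back are the ones with col_name 1 and 2, plus the row where col_name is NaN.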
How can I do this in Pandas? (Generalizing to a difference on multiple columns would be great as well.)