I have two dataframes, say df1 and df2, with the same column names.
Example:
df1
C1 | C2 | C3 | C4
A 1 2 AA
B 1 3 A
A 3 2 B
df2
C1 | C2 | C3 | C4
A 1 3 E
B 1 2 C
Q 4 1 Z
I would like to filter out rows in df1 based on common values in a fixed subset of columns between df1 and df2. In the above example, if the columns are C1 and C2, I would like the first two rows to be filtered out, as their values in both df1 and df2 for these columns are identical.
What would be a clean way to do this in Pandas?
So far, based on this answer, I have been able to find the common rows.
common_df = pandas.merge(df1, df2, how='inner', on=['C1','C2'])
This gives me a new dataframe with only those rows that have common values in the specified columns, i.e., the intersection.
I have also seen this thread, but the answers all seem to assume a difference on all the columns.
The expected result for the above example (rows common on specified columns removed):
C1 | C2 | C3 | C4
A 3 2 B