I have a two Pandas dataframes and want to intersect (aka inner join) them on a key. I understand how a Pandas dataframe merge() works, but the problem I have is that I want to keep the columns from one dataframe, not the columns from both.
For example, suppose I have the following two dataframes, df_users and df_valid_users.
df_users df_valid_users
user_id email dob user_id last_login
------- ----- --- ------- ----------
123 ... ... 456 ...
456 ... ... 710 ...
789 ... ...
Desired result:
user_id email dob
------- ----- ---
456 ... ...
I want to get all the columns from df_users as long as df_users.user_id is also in df_valid_users.
I tried to use pandas.merge(), but the result contains the columns from both dataframes. How do keep only the columns from the left dataframe? My real data has dozens of columns, not just a few as in this example.
import pandas as pd
rows_users = [ ['123', 'foo', '1970-01-01'],
['456', 'bar', '1980-01-01'],
['789', 'baz', '1990-01-01']]
df_users = pd.DataFrame(rows_users, columns=['user_id', 'email', 'dob'])
df_users
# user_id email dob
# 0 123 foo 1970-01-01
# 1 456 bar 1980-01-01
# 2 789 baz 1990-01-01
rows_valid_users = [ ['456', '2018-01-10'],
['710', '1918-01-02']]
df_valid_users = pd.DataFrame(rows_valid_users, columns=['user_id', 'last_login'])
df_valid_users
# user_id last_login
# 0 456 2018-01-10
# 1 710 1918-01-02
pd.merge(df_users, df_valid_users, how='inner', on=['user_id'])
# user_id email dob last_login
# 0 456 bar 1980-01-01 1918-01-02
Any help would be appreciated.