Is it possible to return all columns of the data, not just the ones we group by and aggregate?
For example, I have a dataframe with 5 columns. One of those columns contains distance, another is timestamp, and the last important one is name. I grouped the dataframe by timestamp and applied the min aggregation to distance. As a result I get a correctly grouped dataframe with timestamp and distance, but how can I add the name column to it? If I group by name as well, timestamp becomes duplicated, and it has to stay unique. As a final result I need a dataframe like this:
| timestamp | name | distance |
|---|---|---|
| 2020-03-03 15:30:235 | Billy | 123 |
| 2020-03-03 15:30:435 | Johny | 111 |
But instead I get this:
| timestamp | distance |
|---|---|
| 2020-03-03 15:30:235 | 123 |
| 2020-03-03 15:30:435 | 111 |
The whole table has more than 700k rows, so joining the result back on distance produces more rows than my PC can handle.
Here is my groupby, which gives me the second table:
grouped_df = df1.groupby('timestamp')['distance'].min()
Here is what I tried in order to get name into the table:
grouped_df.merge(df1, how='left',
                 left_on=['timestamp', 'distance'],
                 right_on=['timestamp', 'distance'])
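For reference, one way this is often handled without a merge is to look up the row label of the minimum distance in each timestamp group with `idxmin()` and then select those full rows with `.loc`. The snippet below is only a minimal sketch: the sample rows (including the names Anna and Mike and their distances) are made up for illustration, and it assumes `df1` has a unique index such as the default `RangeIndex`.

```python
import pandas as pd

# Hypothetical sample data; only Billy/123 and Johny/111 come from the tables above.
df1 = pd.DataFrame({
    "timestamp": ["2020-03-03 15:30:235", "2020-03-03 15:30:235",
                  "2020-03-03 15:30:435", "2020-03-03 15:30:435"],
    "name": ["Billy", "Anna", "Johny", "Mike"],
    "distance": [123, 456, 111, 222],
})

# For each timestamp, find the index label of the row with the smallest distance.
# Assumption: the index of df1 is unique; otherwise call df1.reset_index(drop=True) first.
idx = df1.groupby("timestamp")["distance"].idxmin()

# Select those complete rows, so every other column (e.g. name) comes along.
result = df1.loc[idx, ["timestamp", "name", "distance"]].reset_index(drop=True)

print(result)
```

Because exactly one row is picked per group, timestamp stays unique and the result has one row per timestamp. If several rows tie for the minimum within a timestamp, `idxmin()` keeps only the first of them.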