To group by one or multiple columns while aggregating others you can use groupBy followed by aggregate (or its alias agg).
Example
Given input:
Waiter Revenue Hours Tables Gender
0 Alice 3000 3.0 Outside f
1 Bob 2000 4.0 Inside m
2 Alex 1000 2.0 Inside d
3 Alex 500 0.5 Outside None
Expected output:
Revenue Hours Tables Gender
Waiter
Alex 1500 1.25 {Inside, Outside} d
Alice 3000 3.00 {Outside} f
Bob 2000 4.00 {Inside} m
Grouped by Waiter shows:
- the sum of
Revenue
- the mean of worked
Hours
- the set of unique
Tables served
- the first (non undefined) value for
Gender
How to group-by with specific aggregation
Code:
import pandas as pd
df = pd.DataFrame({'Waiter': ['Alice','Bob','Alex', 'Alex'], 'Revenue': [3000, 2000, 1000, 500], 'Hours': [3, 4, 2, 0.5], 'Tables': ['Outside', 'Inside', 'Inside', 'Outside'], 'Gender': ['f', 'm', 'd', None]})
df.groupby(['Waiter']).agg({'Revenue': 'sum', 'Hours': 'mean', 'Tables': lambda x: set(x), 'Gender': 'first'})
Explained:
groupby can be done be multiple of a single column, here only Waiter
agg (or aggregate) by a dict defining the aggregation for each column. The column is specified as key like Revenue with the aggregate function specified either as function name 'sum' (in quotes or as reference like list) or as lambda like lambda x: set(x)
Note: to get the list of Tables we could also define following value as aggregation function:
'unique' produces a list with unique values (see Series.unique)
set for a set as we did with the lambda
list for a list (which may contain duplicates)
See also