Suppose I have the following df:
df=pd.DataFrame({'g1':['a','b','c'],
'g2':['x','y','z'],
'val':[1,2,3]})
df
g1 g2 val
a x 1
b y 2
c z 3
whose 'primary key' (using SQL's jargon here) is g1 and g2.
How do I expand the dataframe such that it has a row for every combination of the existing values in g1 and g2. For the rows that are added, leaving np.nan under the val column is fine. So the result will look like the following:
g1 g2 val
a x 1.0
a y NaN
a z NaN
b x NaN
b y 2.0
b z NaN
c x NaN
c y NaN
c z 3.0
I am currently doing the following, but I am sure it is not the best solution:
df.set_index(['g1','g2']).T.stack().unstack().T.reset_index()
I can also do merges, but when the number of keys is large the code looks cumbersome.