Suppose I have a data frame like this:
A B C D
0 1 10 x 5
1 1 20 y 5
2 1 30 z 5
3 2 40 x 6
4 2 50 y 6
5 2 60 z 6
This, can be viewed, as a table that stores the value of B as a function of A, C, and D. Now, I would look like to transform the B column into three columns B_x, B_y, B_z, like this:
A B_x B_y B_z D
0 1 10 20 30 5
1 2 40 50 60 6
I.e., B_x stores B(A, D) when C = 'x', B_y stores B(A, D) when C = 'y', etc.
What is the most efficient way to do this?
I have a found a solution like this:
frames = []
for c, subframe in df.groupby('C'):
subframe = subframe.rename(columns={'B': f'B_{c}'})
subframe = subframe.set_index(['A', 'D'])
del subframe['C']
frames.append(subframe)
out = frames[0]
for frame in frames[1:]:
out = out.join(frame)
out = out.reset_index()
This gives the correct response, but I feel that it is highly inefficient. I am also not too happy with the fact that to implement this solution one would need to know which columns should not get the prefix in column C explicitly. (In this MWE there were only two of them, but there could be tens in real life.)
Is there a better solution? I.e., a method that says, take a column as a suffix column (in this case C) and a set of 'value' columns (in this case only B); turn the value column names into name_prefix and fill them appropriately?