I have a dataframe with the following structure:
import pandas as pd

df = pd.DataFrame({'TIME': list('12121212'), 'NAME': list('aabbccdd'), 'CLASS': list('AAAABBBB'),
                   'GRADE': [4, 5, 4, 5, 4, 5, 4, 5]}, columns=['TIME', 'NAME', 'CLASS', 'GRADE'])
print(df)
  TIME NAME CLASS  GRADE
0    1    a     A      4
1    2    a     A      5
2    1    b     A      4
3    2    b     A      5
4    1    c     B      4
5    2    c     B      5
6    1    d     B      4
7    2    d     B      5
What I need to do is split the above dataframe into multiple dataframes according to the variable CLASS, convert each one from long to wide (so that the NAME values become the columns and GRADE fills the cells of the data matrix), and then iterate other functions over the smaller per-CLASS dataframes. If I create a dict object as suggested here, I obtain:
d = dict(tuple(df.groupby('CLASS')))
print(d)
{'A':   TIME NAME CLASS  GRADE
0    1    a     A      4
1    2    a     A      5
2    1    b     A      4
3    2    b     A      5, 'B':   TIME NAME CLASS  GRADE
4    1    c     B      4
5    2    c     B      5
6    1    d     B      4
7    2    d     B      5}
In order to convert the dataframe from long to wide, I used the function pivot_table from pandas:
for names, classes in d.items():
    newdata = df.pivot_table(index="TIME", columns="NAME", values="GRADE")
print(newdata)
NAME  a  b  c  d
TIME
1     4  4  4  4
2     5  5  5  5
So far so good. However, once I obtain the newdata dataframe I can no longer get back to the smaller dataframes created in d, since the variable CLASS is now missing from the dataframe (as it should be). Suppose I then need to iterate a function over the two smaller subframes, CLASS==A and CLASS==B. How can I do this with a for loop if I can no longer identify the subframes by the column CLASS?
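For reference, a minimal sketch of the structure I am after, assuming the pivot is applied to each subframe rather than to the full df, so that the CLASS label survives as a dict key instead of a column. The names wide, results and summarize are hypothetical, and summarize is only a stand-in for whatever function has to run per class:

```python
import pandas as pd

df = pd.DataFrame({'TIME': list('12121212'), 'NAME': list('aabbccdd'),
                   'CLASS': list('AAAABBBB'), 'GRADE': [4, 5, 4, 5, 4, 5, 4, 5]},
                  columns=['TIME', 'NAME', 'CLASS', 'GRADE'])

# Pivot each CLASS group on its own and store the wide frame under its CLASS key.
# Note: pivot_table aggregates with the mean by default, so GRADE comes back as float.
wide = {
    name: group.pivot_table(index='TIME', columns='NAME', values='GRADE')
    for name, group in df.groupby('CLASS')
}

def summarize(frame):
    # Hypothetical placeholder for the per-class function: mean grade per NAME.
    return frame.mean()

# Apply the function to every per-class wide frame, again keyed by CLASS.
results = {name: summarize(frame) for name, frame in wide.items()}

for name, frame in wide.items():
    print(name)
    print(frame)
```

With this layout the loop variable name carries the class, so the wide frames never need a CLASS column of their own.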