I have a pandas DataFrame with two categorical columns that together contain 150 categories. A value that appears in column A may not appear in column B. For example:
import pandas as pd
a = pd.DataFrame({'A':list('bbaba'), 'B':list('cccaa')})
a['A'] = a['A'].astype('category')
a['B'] = a['B'].astype('category')
The output is
Out[217]:
   A  B
0  b  c
1  b  c
2  a  c
3  b  a
4  a  a
I then convert the categorical columns to their integer codes:
cat_columns = a.select_dtypes(['category']).columns
a[cat_columns] = a[cat_columns].apply(lambda x: x.cat.codes)
a
The output is
Out[220]:
   A  B
0  1  1
1  1  1
2  0  1
3  1  0
4  0  0
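As far as I can tell, the codes differ because each column keeps its own list of categories and `.cat.codes` is just the position in that list. A small check, as a sketch on a fresh copy of the frame (the name `df` is only for illustration, since `a` was overwritten above):

```python
import pandas as pd

# Fresh copy of the example frame; 'df' is an illustrative name.
df = pd.DataFrame({'A': list('bbaba'), 'B': list('cccaa')}).astype('category')

# Each column gets its own sorted category list,
# and .cat.codes are positions within that list.
cats_a = list(df['A'].cat.categories)
cats_b = list(df['B'].cat.categories)

print(cats_a)  # categories of column A
print(cats_b)  # categories of column B
```

So code 1 points at the second category of each column, which is a different label in A than in B.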
My problem is that the codes are assigned per column: in column A, b is encoded as 1, but in column B, c is also encoded as 1. I want each label to get the same code in both columns, like this:
Out[220]:
   A  B
0  1  2
1  1  2
2  0  2
3  1  0
4  0  0
where c is encoded as 2 (and a as 0, b as 1) in both columns.
Please note that my real data has 150 different labels, so I cannot write the mapping by hand.
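To make the goal concrete, here is a minimal sketch of one way this might be done: build a single category list from both columns and use it for each of them. It assumes there are no missing values and that sorted label order is acceptable; `df`, `all_labels`, `shared` and `codes` are illustrative names, not part of my real code.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'A': list('bbaba'), 'B': list('cccaa')})

# Collect every label seen in either column, sorted for a stable order.
# Assumes no NaN values are present.
all_labels = sorted(set(df['A']) | set(df['B']))

# One dtype shared by both columns, so a label maps to the same code everywhere.
shared = CategoricalDtype(categories=all_labels)

# Cast each column to the shared dtype and take its integer codes.
codes = df.apply(lambda col: col.astype(shared).cat.codes)
print(codes)
```

Because the category list is computed from the data, the same approach should carry over to 150 labels without listing them manually.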