I would like to re-enumerate rows in a given df based on some conditions. My question is an extension of this question.
Example of df:
   ind  seq status
0    1    2     up
1    1    3    mid
2    1    5   down
3    2    1     up
4    2    2    mid
5    2    3   down
6    3    1     up
7    3    2    mid
8    3    3    oth
The df contains an ind column, which represents a group. The seq column might contain some bad data. That's why I would like to add another column, seq_corr, that corrects the seq enumeration based on some conditions:
- the first value in a group in the status column equals up
- the last value in a group in the status column equals down OR oth
- in all other cases, copy the number from the seq column.
I know the logic I want, but I have trouble converting it to Python, especially when it comes to slicing properly and accessing the first and the last element of each group.
Below is my non-working code:
def new_id(x):
    if (x.loc['status',0] == 'up') and ((x.loc['status',-1]=='down') or (x['status',-1]=='oth')):
        x['ind_corr'] = np.arange(1, len(x) + 1)
    else:
        x['seq_corr']= x['seq']
    return x

df.groupby('ind', as_index=False).apply(new_id)
Expected result:
   ind  seq status  seq_corr
0    1    2     up         1
1    1    3    mid         2
2    1    5   down         3
3    2    1     up         1
4    2    2    mid         2
5    2    3   down         3
6    3    5     up         1
7    3    2    mid         2
8    3    7    oth         3
I hope someone can point me to a solution.
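One possible way to express the rules listed above is sketched below; it is a suggestion under stated assumptions, not a definitive answer. It assumes that a group is renumbered 1, 2, 3, ... only when both conditions hold (first status is up AND last status is down or oth), and that seq is copied otherwise, which is how the attempted new_id function reads. The helper name add_seq_corr is hypothetical, and the frame is rebuilt from the example df so the snippet runs on its own. Instead of positional slicing inside apply, it broadcasts the first and last status of each group with transform and numbers the rows with cumcount.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question so the sketch is self-contained.
df = pd.DataFrame({
    "ind": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "seq": [2, 3, 5, 1, 2, 3, 1, 2, 3],
    "status": ["up", "mid", "down", "up", "mid", "down", "up", "mid", "oth"],
})


def add_seq_corr(frame):
    """Return a copy of frame with a seq_corr column (hypothetical helper)."""
    status = frame.groupby("ind")["status"]

    # Broadcast each group's first/last status to every row of that group.
    # Caveat: "first" and "last" skip missing values by default.
    starts_up = status.transform("first").eq("up")
    ends_ok = status.transform("last").isin(["down", "oth"])

    # 1-based position of each row within its group.
    position = frame.groupby("ind").cumcount() + 1

    out = frame.copy()
    # Assumption: renumber only when BOTH conditions hold, else keep seq.
    out["seq_corr"] = np.where(starts_up & ends_ok, position, frame["seq"])
    return out


result = add_seq_corr(df)
print(result)
```

Because everything is computed column-wise, the rows keep their original index and order, and a group that fails either check simply keeps its seq values.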