I'd like to find the start and stop points of a column and flag them as shown below:
| value | flag |
|---|---|
| NaN | NaN |
| NaN | NaN |
| 1 | start |
| 2 | NaN |
| 1 | NaN |
| 3 | NaN |
| 2 | stop |
| NaN | NaN |
| 1 | start |
| 2 | stop |
I'd like to find the start and stop points of a column and flag them as shown below:
| value | flag |
|---|---|
| NaN | NaN |
| NaN | NaN |
| 1 | start |
| 2 | NaN |
| 1 | NaN |
| 3 | NaN |
| 2 | stop |
| NaN | NaN |
| 1 | start |
| 2 | stop |
start occurs when the current value is notnull and the previous value isnullstop occurs when the current value is notnull and the next value isnullGenerate these conditions using shift and assign using loc:
start = df.value.notnull() & df.value.shift().isnull()
stop = df.value.notnull() & df.value.shift(-1).isnull()
df.loc[start, 'flag'] = 'start'
df.loc[stop, 'flag'] = 'stop'
# value flag
# 0 NaN NaN
# 1 NaN NaN
# 2 1.0 start
# 3 2.0 NaN
# 4 1.0 NaN
# 5 3.0 NaN
# 6 2.0 stop
# 7 NaN NaN
# 8 1.0 start
# 9 2.0 stop
Alternatively assign using mask:
df['flag'] = df['flag'].mask(start, 'start')
df['flag'] = df['flag'].mask(stop, 'stop')
Here I iterated through the rows and used a flag to indicate if we were starting or not.
start_flag = 0
for index, row in df.iterrows():
if row['val'].isnull():
df.loc[index, 'flag'] = "NaN"
start_flag = 0
else:
if start_flag == 0:
df.loc[index, 'flag'] = "start"
start_flag = 1
else:
if (index < df.shape[0]-1 and df.loc[index+1, 'val'].isnull()) or index == df.shape[0]-1:
df.loc[index, 'flag'] = "stop"
Here's what you need:
# Auxiliar columns to detect start and end
df['Past'] = df['Value'].shift(-1)
df['Future'] = df['Value'].shift(1)
# Auxiliar function to complete new column
def Find_start_stop_Null(row):
flag = np.nan
if ((not pd.isnull(row['Value'])) and (pd.isnull(row['Future']))):
flag = 'start'
elif ((not pd.isnull(row['Value'])) and (pd.isnull(row['Past']))):
flag = 'stop'
return flag
df['flag'] = df.apply(lambda row: Find_start_stop_Null(row), axis=1)
# Drop unnecessary columns
df = df.drop('Past', axis=1)
df = df.drop('Future', axis=1)