First, it is not possible to store NaNs in a column with a NumPy integer dtype, because NaN is a float by design.
Your solution works if the column has mixed types, i.e. numbers together with strings:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a', 1, 2.3, 'a', 7],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
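The float output above follows from how missing values are stored. As a minimal sketch (the variable names here are only for illustration), this shows that NaN itself is a float and that a Series built from integers plus a missing value is upcast to a float dtype:

```python
import numpy as np
import pandas as pd

# NaN is an ordinary Python float
nan_is_float = isinstance(np.nan, float)
print(nan_is_float)

# Integers only: pandas keeps an integer dtype
only_ints = pd.Series([1, 2, 7])
print(only_ints.dtype)

# Integers plus a missing value: None becomes NaN and the dtype becomes float
with_missing = pd.Series([1, None, 7])
print(with_missing.dtype)
print(with_missing)
```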
But if all values are strings, you need to_numeric with errors='coerce' to convert the non-numeric values to NaN:
df = pd.DataFrame({
'Regular Price': ['a','1','2.3','a','7'],
'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
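To illustrate why errors='coerce' matters, here is a small sketch on a standalone Series (the name prices is hypothetical). With the default error handling, to_numeric raises on the first value it cannot parse; with errors='coerce' those values become NaN instead:

```python
import pandas as pd

prices = pd.Series(['a', '1', '2.3', 'a', '7'])

# Default behaviour: an unparseable string raises ValueError
try:
    pd.to_numeric(prices)
    raised = False
except ValueError as exc:
    raised = True
    print('to_numeric failed:', exc)

# errors='coerce': unparseable strings are replaced by NaN
coerced = pd.to_numeric(prices, errors='coerce')
rounded = coerced.round()
print(rounded)
```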
EDIT:
I also need to remove floating points and use int only
This is possible by converting NaNs to None and casting the remaining values to int:
df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()
df['Regular Price'] = np.where(df['Regular Price'].isnull(),
                               None,
                               df['Regular Price'].fillna(0).astype(int))
print (df)
B Regular Price
0 0 None
1 1 1
2 2 2
3 3 None
4 4 7
print (df['Regular Price'].apply(type))
0 <class 'NoneType'>
1 <class 'int'>
2 <class 'int'>
3 <class 'NoneType'>
4 <class 'int'>
Name: Regular Price, dtype: object
But this hurts performance, so it is best not to use it. It can also cause other problems, because some functions fail on such an object column, so it is best to keep floats when working with NaNs.
Testing some operations, such as diff, on a 50k-row DataFrame:
df = pd.DataFrame({
'Regular Price': ['a','1','2.3','a','7'],
'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
df['Regular Price1'] = np.where(df['Regular Price'].isnull(),
                                None,
                                df['Regular Price'].fillna(0).astype(int))
In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()
TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop
In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop
In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop
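As a possible alternative that the timings above do not cover: assuming pandas 0.24 or later, the nullable integer dtype 'Int64' (capital I) can hold whole numbers and missing values in one column without falling back to object. This is only a sketch on a standalone Series, and no performance claim is made for it:

```python
import pandas as pd

prices = pd.Series(['a', '1', '2.3', 'a', '7'])

# Coerce to float and round first, so every non-missing value is a whole number
rounded = pd.to_numeric(prices, errors='coerce').round()

# Cast to the nullable integer dtype; NaN becomes a missing marker
as_int = rounded.astype('Int64')
print(as_int)

# Arithmetic keeps the integer dtype and propagates the missing values
scaled = as_int * 1000
print(scaled)
```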
EDIT:
df = pd.DataFrame({
'Regular Price': ['a','1','2.3','a','7'],
'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
First, you can remove the rows that have NaN in the Regular Price column and then convert the column to int.
df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price'] = df1['Regular Price'].astype(int)
print (df1)
B Regular Price
1 1 1
2 2 2
4 4 7
Do whatever processing you need, but do not change the index.
# e.g. some processing
df1['Regular Price'] = df1['Regular Price'] * 100
Finally, use combine_first, which aligns on the index and adds the NaN rows back to the Regular Price column.
df2 = df1.combine_first(df)
print (df2)
B Regular Price
0 0.0 NaN
1 1.0 100.0
2 2.0 200.0
3 3.0 NaN
4 4.0 700.0
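The three steps above could be wrapped in one helper. This is a sketch only: process_valid_rows and its parameters are hypothetical names, and it assumes the column already holds whole-number floats and NaNs:

```python
import pandas as pd

def process_valid_rows(df, column, func):
    """Apply func to the int-cast, non-NaN values of column, then restore the NaN rows."""
    # Step 1: keep only rows with a value and cast them to int
    valid = df.dropna(subset=[column]).copy()
    valid[column] = valid[column].astype(int)
    # Step 2: process the integers; func must preserve the index
    valid[column] = func(valid[column])
    # Step 3: fill the dropped rows back in from the original frame
    return valid.combine_first(df)

df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

result = process_valid_rows(df, 'Regular Price', lambda s: s * 100)
print(result)
```

Because NaN rows are merged back in, the resulting column is float again; the int cast only applies while processing.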