On this Kaggle page (https://www.kaggle.com/alexisbcook/categorical-variables) there is this section of code:
s = (X_train.dtypes == 'object')
object_cols = list(s[s].index)
What is s (what kind of object is it), and how does s[s].index work?
Let's take this DataFrame:
In [1]: import pandas as pd
In [2]: X_train = pd.DataFrame([("f", 2)])
In [3]: X_train
Out[3]:
   0  1
0  f  2
The first line, s = (X_train.dtypes == 'object'), creates a boolean Series, indexed by column name, that indicates whether each column in X_train has the object dtype (here column 0 holds str values, which pandas stores with the object dtype in this session):
In [4]: s = (X_train.dtypes == 'object')
In [5]: s
Out[5]:
0     True
1    False
dtype: bool
The second line selects the entries of s whose value is True and returns their index labels, which are the column names, as a list. This uses boolean indexing (also called boolean masking): indexing a pandas object with a boolean Series or array of the same length keeps only the positions where the mask is True. Here s serves as both the object being filtered and the mask:
In [7]: object_cols = list(s[s].index)
In [8]: object_cols
Out[8]: [0]
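To make the s[s].index step easier to follow, here is a minimal sketch that splits it into its parts. The frame and its column names ("name", "count") are made up for illustration and are not from the Kaggle notebook. The string column is created with dtype=object explicitly, on the assumption that dtype inference for strings can differ between pandas versions. The last line shows select_dtypes, which should give the same list for this frame.

```python
import pandas as pd

# Hypothetical example frame; dtype=object is forced so the result does not
# depend on how a given pandas version infers string columns.
X_train = pd.DataFrame({
    "name": pd.Series(["f", "g"], dtype=object),
    "count": [2, 3],
})

s = (X_train.dtypes == "object")     # boolean Series indexed by column name
true_only = s[s]                     # keep only the entries whose value is True
object_cols = list(true_only.index)  # the surviving labels are the column names

# Assumed equivalent for this frame: let pandas filter by dtype directly.
same_cols = X_train.select_dtypes(include="object").columns.tolist()

print(type(s).__name__, object_cols, same_cols)
```

Writing s[s] looks odd only because the same Series is both the data and the mask; any boolean Series with the same index would work as the mask.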