Consider the following dataframe df, in which each value of the features column is a string of comma-separated feature names from a dataset (df can potentially be large).
index features
1 'f1'
2 'f1, f2'
3 'f1, f2, f3'
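For reproducibility, the example dataframe above could be built roughly as follows. This is a minimal sketch that assumes the index is the integer labels 1 to 3 and that features is a plain string column:

```python
import pandas as pd

# Example input: one comma-separated string of feature names per row.
# The integer index 1..3 is an assumption based on the table above.
df = pd.DataFrame(
    {"features": ["f1", "f1, f2", "f1, f2, f3"]},
    index=pd.Index([1, 2, 3], name="index"),
)
```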
I also have a function get_weights that accepts a comma-separated string of feature names and returns a list containing one weight for each given feature. The implementation details are not important; for the sake of simplicity, let's assume the function returns equal weights for all features:
import numpy as np

def get_weights(features):
    features = features.split(', ')
    return np.ones(len(features)) / len(features)
Using pandas, how can I apply get_weights to df and get the results in a new dataframe like the one below?
index f1 f2 f3
1 1 0 0
2 0.5 0.5 0
3 0.33 0.33 0.33
That is, in the resulting dataframe, each feature named in df.features becomes a column holding that feature's weight for each row, with 0 where the row does not include the feature.
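To make the desired transformation concrete, here is one possible way it could be produced. This is a sketch only, not necessarily the most efficient option for a large df. It assumes the names are always separated by ', ' exactly as in get_weights, and the names records and result are introduced purely for illustration:

```python
import numpy as np
import pandas as pd

def get_weights(features):
    features = features.split(', ')
    return np.ones(len(features)) / len(features)

df = pd.DataFrame({"features": ["f1", "f1, f2", "f1, f2, f3"]}, index=[1, 2, 3])

# One dict per row, mapping each feature name to its weight.
records = [dict(zip(s.split(', '), get_weights(s))) for s in df["features"]]

# Columns are the union of all feature names; features absent
# from a row come out as NaN and are filled with 0.
result = pd.DataFrame(records, index=df.index).fillna(0)
```

The weights here are unrounded floats, so the last row holds 1/3 in each column rather than 0.33.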