I have a pandas DataFrame in which one column, `resources`, holds a list of tuples in each row. For example, take the following DataFrame:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3],
                   "resources": [[(1, 3), (1, 1), (2, 9)],
                                 [(3, 1), (3, 1), (3, 4)],
                                 [(9, 0), (2, 6), (5, 5)]]
                   })
Now I want to add the following columns to my DataFrame:
- A column `first` containing a list with the unique first elements of the tuples in `resources` (so basically a set of all the first elements)
- A column `second` containing a list with the unique second elements of the tuples in `resources` (so basically a set of all the second elements)
- A column `same` containing the number of tuples in `resources` having the same first and second element
- A column `different` containing the number of tuples in `resources` having different first and second elements
The desired output columns would look like this:
- `first`: [[1, 2], [3], [9, 2, 5]]
- `second`: [[1, 3, 9], [1, 4], [0, 6, 5]]
- `same`: [1, 0, 1]
- `different`: [2, 3, 2]
How can I achieve this in the least time-consuming way? I first thought of using `Series.str`, but could not find enough functionality there to achieve my goal.
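For reference, a plain-Python baseline for the four columns might look like the sketch below. It is not claimed to be the fastest option. It assumes the order of the unique elements does not matter, and the helper name `summarize` is hypothetical. Each row's list is traversed in a single function call, `dict.fromkeys` keeps unique values in order of first appearance, and the results are joined back onto the original frame.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "resources": [[(1, 3), (1, 1), (2, 9)],
                  [(3, 1), (3, 1), (3, 4)],
                  [(9, 0), (2, 6), (5, 5)]],
})


def summarize(pairs):
    """Hypothetical helper: summarize one row's list of 2-tuples."""
    # dict.fromkeys drops duplicates while keeping first-appearance order
    first = list(dict.fromkeys(a for a, _ in pairs))
    second = list(dict.fromkeys(b for _, b in pairs))
    # Count the tuples whose two elements are equal
    same = sum(a == b for a, b in pairs)
    return first, second, same, len(pairs) - same


# One tuple of results per row, turned into a frame aligned on df's index
summary = pd.DataFrame(
    [summarize(pairs) for pairs in df["resources"]],
    columns=["first", "second", "same", "different"],
    index=df.index,
)
out = df.join(summary)
print(out)
```

With this approach the `second` list for the first row comes out as [3, 1, 9] rather than [1, 3, 9], because order of first appearance is kept. Wrapping the values in `sorted(...)` would give a sorted list instead.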