I have the following data:
Rank  Platforms       Technology
high  Windows||Linux  Unity
high  Linux
low   Windows         Unreal
low   Linux||MacOs    GameMakerStudio||Unity||Unreal
low                   GameMakerStudio
Both Platforms and Technology are categorical variables. The issue is that each cell can hold a single value, be empty, or, most importantly, hold multiple values such as GameMakerStudio||Unity||Unreal. I am building a logistic regression model to predict Rank.
I am trying to encode these variables for my model, but I have not found a solution for list-type categorical values. I have read the page "Encoding Categorical Variables" and found that one-hot encoding is the closest match, but it still does not address my issue, since it assumes one value per cell.
I could, of course, encode it manually. For example, the Platforms column has around 7 distinct values, so if Platforms = Windows||Linux, I could set two columns, is_windows = true and is_linux = true. But the Technology column has 21 distinct values, which makes doing this by hand tedious.
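To make the manual approach concrete, here is a minimal sketch in plain Python of what I mean for the Platforms column. The names rows, KNOWN_PLATFORMS and encode_platforms are just illustrative, the platform list only covers the three values in the sample, and I am assuming an empty cell is stored as an empty string.

```python
# Sample rows from the data above; an empty cell is assumed to be stored as "".
rows = [
    {"Rank": "high", "Platforms": "Windows||Linux", "Technology": "Unity"},
    {"Rank": "high", "Platforms": "Linux", "Technology": ""},
    {"Rank": "low", "Platforms": "Windows", "Technology": "Unreal"},
    {"Rank": "low", "Platforms": "Linux||MacOs",
     "Technology": "GameMakerStudio||Unity||Unreal"},
    {"Rank": "low", "Platforms": "", "Technology": "GameMakerStudio"},
]

# Hand-maintained list of platforms (illustrative: only the three in the sample).
KNOWN_PLATFORMS = ["Windows", "Linux", "MacOs"]


def encode_platforms(row):
    """Return one boolean indicator per known platform for a single row."""
    # Split the cell on the "||" separator; an empty cell yields {""},
    # which matches no platform, so every indicator is False.
    present = set(row["Platforms"].split("||"))
    return {"is_" + name.lower(): name in present for name in KNOWN_PLATFORMS}


encoded = [encode_platforms(row) for row in rows]
print(encoded[0])  # {'is_windows': True, 'is_linux': True, 'is_macos': False}
```

This works, but the list of known values has to be written and kept up to date by hand for every such column.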
Is there a way to do this encoding automatically?