As in Java and JavaScript, nan in numpy does not equal itself:
>>> np.nan == np.nan
False
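This means that when the set constructor checks "do I have an instance of nan in this set yet?", the equality test always returns False, so every separate nan object is added as a new element. (The only exception is when the set sees the very same object twice, because Python checks identity before equality.)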
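To see this in action, here is a minimal sketch. The array `values` is made-up sample data, not something from the question:

```python
import numpy as np

# Hypothetical sample data with two nan entries and a repeated 1.0.
values = np.array([1.0, np.nan, np.nan, 1.0])

# Iterating over the array produces a separate np.float64 object for each
# element, so the two nan entries are neither identical nor equal.
unique_values = set(values)
nan_count = sum(1 for v in unique_values if np.isnan(v))

print(len(unique_values))  # 3: one 1.0 and two nan
print(nan_count)           # 2

# The same object twice is deduplicated, because identity is checked first.
same_object = set([np.nan, np.nan])
print(len(same_object))    # 1
```

So the duplicates show up when the nan values are distinct objects, which is what you get when iterating over an array or a Series.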
So… why?
In each case nan ("not a number") is the floating-point marker for a value that cannot be represented as an ordinary number. Because it does not stand for any particular number, comparing it with anything, itself included, comes out false. It also has no meaningful place in a sort order, because there's no way to tell if nan is supposed to be larger or smaller than any number.
After all, which is bigger, "cat" or 7? And is "goofy" == "pluto"?
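A quick way to convince yourself is to try the comparisons directly. This is only an illustrative sketch, and the number 7 is arbitrary:

```python
import numpy as np

# Every ordering or equality comparison involving nan is False;
# only "not equal" is True.
comparisons = {
    "nan < 7": np.nan < 7,
    "nan > 7": np.nan > 7,
    "nan == 7": np.nan == 7,
    "nan != nan": np.nan != np.nan,
}

for label, result in comparisons.items():
    print(label, "->", result)
```

Since neither "less than" nor "greater than" ever holds, the comparisons themselves give no basis for ordering nan relative to real numbers.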
So… what do I do?
There are a couple of ways to resolve this problem. Personally, I generally try to fill nan before processing: Series.fillna (or DataFrame.fillna) will help with that, and I would always use Series.unique() to get the unique values, since it treats all nan as a single value.
no_nas = s.dropna().unique()
with_nas = s.unique()
with_replaced_nas = s.fillna(-1).unique() # using a placeholder
(Note: all of the above can be passed into the set constructor.)
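For a self-contained version of the three lines above, here is a sketch with a made-up Series (`s` and its values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: repeated values plus two nan entries.
s = pd.Series([1.0, np.nan, 2.0, np.nan, 1.0])

no_nas = s.dropna().unique()               # values 1.0, 2.0
with_nas = s.unique()                      # values 1.0, nan, 2.0 (a single nan)
with_replaced_nas = s.fillna(-1).unique()  # values 1.0, -1.0, 2.0

# Because unique() has already collapsed the nan values into one,
# the set constructor no longer produces duplicates.
print(len(set(with_nas)))  # 3
```

The placeholder -1 only works if -1 cannot occur as a real value in your data, so pick it with that in mind.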
What if I don't want to use the Pandas way?
There are reasons not to use Pandas, or to rely on native objects instead of Pandas. The approaches below should suffice.
One option is to filter out the nan:
unqs = set(item for item in s if not np.isnan(item))
You could also replace things inline:
placeholder = '{placeholder}' # There are a variety of placeholder options.
unqs = set(item if not np.isnan(item) else placeholder for item in s)
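One caveat: np.isnan raises a TypeError for non-numeric items such as strings. If your data is mixed, a small helper along these lines may be safer. Note that `is_nan`, `items`, and the "missing" placeholder are hypothetical names for this sketch, not part of numpy or Pandas:

```python
import math


def is_nan(value):
    """Return True only for float nan; anything that is not a float is never nan."""
    return isinstance(value, float) and math.isnan(value)


# Hypothetical mixed data: floats, strings, and two nan entries.
items = [1.0, float("nan"), "cat", float("nan"), 1.0]

# Option 1: drop nan entirely.
unqs = {item for item in items if not is_nan(item)}
print(unqs == {1.0, "cat"})  # True

# Option 2: replace nan with a placeholder (assumed not to occur in the data).
placeholder = "missing"
unqs_replaced = {placeholder if is_nan(item) else item for item in items}
print(unqs_replaced == {1.0, "cat", "missing"})  # True
```

The isinstance check also covers np.float64, which is a subclass of Python's float.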