Be aware that your original method is actually O(#words2) time and thus will not work on large datasets of perhaps more than 10000 words.
groupby one-liner:
One of the most elegant weirdest use cases I've ever seen for itertools.groupby:
>>> [list(v) for k,v in groupby(sorted(words,key=sorted),sorted)]
[['cat', 'act'], ['wer', 'erw']]
defaultdict three-liner:
Using collections.defaultdict, you can do:
anagrams = defaultdict(list)
for w in words:
anagrams[tuple(sorted(w))].append(w)
As for If doing it your original way without any imports, you can emulate collections.defaultdict as follows:
anagrams = {}
for w in words:
key = tuple(sorted(w))
anagrams.setdefault(key,[]).append(w)
example:
>>> anagrams
{('e', 'r', 'w'): ['wer', 'erw'], ('a', 'c', 't'): ['cat', 'act']}
(Also written up in whi's answer.)
map-reduce:
This problem is also the poster child for map-reduce, where the reduction key you use is the sorted letters (or more efficiently, a hash). This will allow you to massively parallelize the problem.
If we assume the length of words is bounded, the groupby solution is O(#words log(#words)), while the hash solution is expected O(#words). In the unlikely event the length of words is arbitrary in length, sorting (O(length log(length)) per word) is less efficient than using an order-agnostic hash of the letters (O(length) per word). Sadly, collections.Counter is not hashable so you'd have to write your own.