list of lists of anagram in python

Question

If my input is a list like this:

words = ['cat','act','wer','erw']

I want to make a list of lists of anagrams like this -

[['cat','act'],['wer','erw']]

I have tried to do something like this:

[[w1 for w in words if w!=w1 and sorted(w1)==sorted(w)] for w1 in words]

but it doesn't work. The output was :

[['cat'], ['act'], ['wer'], ['erw']]

In addition, I don`t want to use any import (except string). What is the mistake?

score 3 · Answer 1 · edited May 23 '17 at 12:20

Be aware that your original method is actually O(#words²) time and thus will not work on large datasets of perhaps more than 10000 words.

groupby one-liner:

One of the most ~~elegant~~ weirdest use cases I've ever seen for itertools.groupby:

>>> [list(v) for k,v in groupby(sorted(words,key=sorted),sorted)]
[['cat', 'act'], ['wer', 'erw']]

defaultdict three-liner:

Using collections.defaultdict, you can do:

anagrams = defaultdict(list)
for w in words:
    anagrams[tuple(sorted(w))].append(w)

As for If doing it your original way without any imports, you can emulate collections.defaultdict as follows:

anagrams = {}
for w in words:
    key = tuple(sorted(w))
    anagrams.setdefault(key,[]).append(w)

example:

>>> anagrams
{('e', 'r', 'w'): ['wer', 'erw'], ('a', 'c', 't'): ['cat', 'act']}

(Also written up in whi's answer.)

map-reduce:

This problem is also the poster child for map-reduce, where the reduction key you use is the sorted letters (or more efficiently, a hash). This will allow you to massively parallelize the problem.

If we assume the length of words is bounded, the groupby solution is O(#words log(#words)), while the hash solution is expected O(#words). In the unlikely event the length of words is arbitrary in length, sorting (O(length log(length)) per word) is less efficient than using an order-agnostic hash of the letters (O(length) per word). Sadly, collections.Counter is not hashable so you'd have to write your own.

Elegant and concise, but unfortunately the OP said without using any import — Nick Craig-Wood, Jun 02 '12 at 08:24
@NickCraig-Wood: ah, missed that; think I'll leave this here for future reference though. — ninjagecko, Jun 02 '12 at 08:25

score 2 · Answer 2 · answered Jun 02 '12 at 08:27

2

words = ['cat','act','wer','erw']
dic={}
for w in words:
    k=''.join(sorted(w))
    dic.setdefault(k,[])
    dic[k].append(w)
print dic.values()

this is better in perform: O(n)

answered Jun 02 '12 at 08:27

whi

2,685
6
33
40

Slightly more eficient to write `dic.setdefault(k,[]).append(w)` - saves looking the key up twice in the dictionary – Nick Craig-Wood Jun 02 '12 at 08:32

score 0 · Answer 3 · answered Jun 02 '12 at 08:12

You can find various solutions to anagrams of a single word at a time by googleing. It is likely that there would be a more efficient solver around than the obvious "search through all the words I know and see if they have the same letters".

Once you have one, you can put it into a function:

def anagrams(word):
    "return a list of all known anagrams of *word*"

Once you have that, generalising it to a list of words is trivial:

[anagrams(word) for word in words]

score 0 · Answer 4 · answered Jun 02 '12 at 08:38

0

This one should do the trick in the style you prefer

[[w, w1] for w1 in words for w in words if w!=w1 and sorted(w1)==sorted(w)][::2]

answered Jun 02 '12 at 08:38

xvatar

3,229
17
20

This is O(len(words)**2) so if there are lots of words this will be very slow. You've nested `for w1 in words` in `for w in words`. – Nick Craig-Wood Jun 02 '12 at 08:41
@NickCraig-Wood you are right, this is not a fast solution, but it's "pythonic" :) – xvatar Jun 02 '12 at 08:43
and what [::2] means in your sulotion – Eyal Dreifuss Jun 02 '12 at 08:52
@EyalDreifuss [::2] means from the first element, take every second element – xvatar Jun 02 '12 at 09:06
@EyalDreifuss for your answer, this part `[w1 for w in words if w!=w1 and sorted(w1)==sorted(w)]` only gets each single element and makes each of them a list; even for that case, it should be `w` instead of `w1` at the beginning, otherwise you are just taking the `w1` from the outside loop – xvatar Jun 02 '12 at 09:10

list of lists of anagram in python

4 Answers4