I'm working with two lists produced by NLTK's PlaintextCorpusReader, and I'd like to combine them into a single dictionary.
The keys of the dictionary should be the sentences in the corpus, which I've extracted with PlaintextCorpusReader's .sents(). The values should be the fileid of the file each sentence appears in, which I've extracted with .fileids().
.fileids() returns a list of strings, e.g.:
['R_v_Cole_2007.txt', 'R_v_Sellick_2005.txt']
.sents() returns a list of lists of strings (list(list(str))), e.g.:
[[u'1', u'.'], [u'The', u'Registrar', u'has', u'referred', u'to', u'this', u'Court', u'two', u'applications', u'for', u'permission', u'to', u'appeal', u'against', u'conviction', u'to', u'be', u'heard', u'together', u'.'], ...]
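The two lists also describe different things: one entry per file versus one entry per sentence. Pairing them by position therefore cannot produce a sentence-to-file mapping, even once the keys are made hashable. The sketch below shows this with the sample values above. The third sentence is hypothetical and is only there so that there are more sentences than fileids. Each sentence is converted to a tuple so that the `TypeError` does not hide the problem:

```python
# Sample output from the question; the third sentence is hypothetical,
# added only so that there are more sentences than fileids.
fileids = ['R_v_Cole_2007.txt', 'R_v_Sellick_2005.txt']
sents = [
    [u'1', u'.'],
    [u'The', u'Registrar', u'has', u'referred', u'to', u'this', u'Court',
     u'two', u'applications', u'for', u'permission', u'to', u'appeal',
     u'against', u'conviction', u'to', u'be', u'heard', u'together', u'.'],
    [u'2', u'.'],  # hypothetical
]

# zip() pairs items by position and stops at the shorter input, so the
# third sentence is dropped. The pairing also says nothing about which
# file a sentence actually came from.
paired = dict(zip((tuple(s) for s in sents), fileids))
print(len(paired))  # 2
```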
I've tried a range of things, mostly based on this question about a similar issue, but everything I try results in the following error:
TypeError: unhashable type: 'list'
Where am I going wrong?
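The error can be reproduced without NLTK. Each element of .sents() is a list, and lists are unhashable (they are mutable), so a list cannot be used as a dictionary key. An immutable form of the same sentence, such as a tuple of its tokens or the tokens joined into one string, does work as a key. This is a minimal sketch; the printed output shown in the comments assumes Python 3:

```python
sentence = [u'1', u'.']  # one element of .sents()

# Using the list itself as a key raises TypeError.
try:
    {sentence: 'R_v_Cole_2007.txt'}
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # mentions: unhashable type: 'list'

# Immutable versions of the same sentence are hashable and work as keys.
as_tuple = {tuple(sentence): 'R_v_Cole_2007.txt'}
as_string = {' '.join(sentence): 'R_v_Cole_2007.txt'}
print(as_tuple)   # {('1', '.'): 'R_v_Cole_2007.txt'}
print(as_string)  # {'1 .': 'R_v_Cole_2007.txt'}
```

A tuple keeps the tokenization intact, while a joined string is easier to read but loses the exact token boundaries.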
Here is the code I'm using to get the keys and values for the dictionary:
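from nltk.corpus.reader import PlaintextCorpusReader

corpus_root = '/Users/danielhoadley/Documents/Python/NLTK/text/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')  # every file under corpus_root
dictionary = {}
values = wordlists.fileids()  # list of file names (str)
keys = wordlists.sents()      # list of sentences, each a list of tokens (str)
## How do I get the keys and values into a dictionary from here?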
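One way to build the dictionary is sketched below, assuming the goal is one entry per sentence. It iterates over the fileids, asks the reader for that one file's sentences with `sents(fileid)`, and uses a tuple of each sentence's tokens as the key. The function name `sentences_to_fileids` is hypothetical. The function relies only on `fileids()` and `sents(fileid)`, both of which PlaintextCorpusReader provides:

```python
def sentences_to_fileids(reader):
    """Map each sentence of a corpus reader to the fileid it was read from.

    `reader` is assumed to behave like NLTK's PlaintextCorpusReader:
    fileids() returns a list of str, and sents(fileid) returns the
    sentences of that one file as lists of token strings.
    """
    mapping = {}
    for fileid in reader.fileids():
        for sent in reader.sents(fileid):
            # Lists are unhashable, so turn each token list into a tuple
            # before using it as a dictionary key. A sentence seen in an
            # earlier file is overwritten by the later file.
            mapping[tuple(sent)] = fileid
    return mapping
```

With the reader from the question this would be called as `dictionary = sentences_to_fileids(wordlists)`. Identical sentences collapse into a single key. For example, a paragraph number such as `1 .` probably appears in many judgments, and only the last file containing it is kept. Mapping each key to a list of fileids instead would preserve every occurrence.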