I am trying to reduce the memory consumption of a Python dict, which in my case serves as a word -> document_id "inverted index". Each word is hashed to an integer, and each such integer object takes up 24 bytes.
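For reference, this is roughly how the per-object cost can be checked. It is a minimal sketch assuming CPython, where `sys.getsizeof` reports the size of a single object without following references. The `index` contents and the helper `approx_index_size` are made up for illustration, and the exact byte counts depend on the Python version and platform.

```python
import sys

# Hypothetical toy index: hashed word (int) -> list of document ids (ints).
index = {123456: [1, 2, 3], 654321: [2, 4000000]}

def approx_index_size(idx):
    """Rough size in bytes: the dict, plus each key, each list, each doc id.

    This is only an estimate: getsizeof does not follow references, and
    CPython may share small int objects between containers.
    """
    total = sys.getsizeof(idx)
    for key, doc_ids in idx.items():
        total += sys.getsizeof(key) + sys.getsizeof(doc_ids)
        total += sum(sys.getsizeof(d) for d in doc_ids)
    return total

int_size = sys.getsizeof(2**22 - 1)    # size of one int object below 2**22
total_size = approx_index_size(index)  # estimate for the whole toy index
print(int_size, total_size)
```

Each int object costs far more than the 22 bits of payload it carries, which is what motivates looking for a more compact representation.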
I was wondering whether I can store each key of the dict, and each element of its values, as a bit array instead. The largest int I have encountered is less than 2^22, so perhaps I can allocate just 22 bits per value.
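To make the idea concrete, here is a minimal sketch of what I mean by 22 bits per value, using only the standard library. The names `pack`, `unpack` and `BITS` are my own, not from any library, and this is not tuned for speed: building one big int is quadratic for long lists.

```python
BITS = 22  # assumption from above: every value is below 2**22

def pack(values, bits=BITS):
    """Pack a sequence of non-negative ints into bytes, `bits` bits each."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc |= v << (i * bits)          # place value i at bit offset i * bits
    nbytes = (len(values) * bits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")

def unpack(data, count, bits=BITS):
    """Inverse of pack; `count` must be stored separately by the caller."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]

doc_ids = [0, 1, 4194303, 12345]
packed = pack(doc_ids)
restored = unpack(packed, len(doc_ids))
```

The resulting `bytes` object still carries its own fixed per-object overhead, so packing like this would mainly pay off for long lists of document ids, not for individual keys.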
How can this be done? So far I have looked at the gmpy2 and bitarray libraries, as well as std::bitset from the C++ standard library, which I could use through Cython. I've read in this post that bitarray is not as fast as gmpy. With gmpy, I am not sure how to set the size. Finally, I wonder whether the memory overhead of gmpy or bitarray objects in Python is worth it when I could just use std::bitset, which probably uses the least memory of all.