I'm using Python 2.7 and NumPy on a Linux machine.
I am running a program which involves a time-consuming function computeGP(level, grid), which takes as input a numpy array level and an object grid that is not modified by this function.
My goal is to parallelize computeGP locally (i.e. across different cores) for different values of level but the same grid. Since grid stays invariant, this can be done without synchronization hassle using shared memory. I've read a bit about threading in Python and the GIL, and it seems to me that I should go with the multiprocessing module rather than threading. This answer and this one recommend using multiprocessing.Array to share data efficiently, while noting that on Unix machines the default behaviour is that the object is not copied.
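To make the setup concrete, here is a minimal sketch of the fork-based approach (compute_gp and the toy grid below are placeholders for the real computeGP and data, not the actual code): on Unix, multiprocessing.Pool workers are forked from the parent, so they inherit grid and can read it without any explicit sharing.

```python
import multiprocessing as mp

import numpy as np

# Toy stand-in for the real grid: a list of numpy arrays, read-only in workers.
grid = [np.arange(4, dtype=np.float64), np.arange(4, dtype=np.float64) * 2.0]

def compute_gp(level):
    # Placeholder for the real computeGP(level, grid): 'grid' is inherited
    # via fork on Unix, so the child reads the parent's arrays directly
    # (copy-on-write) as long as it never writes to them.
    return float(np.sum(grid[0]) + level)

def run_parallel(levels):
    # Map the work over the levels; each worker sees the same shared grid.
    pool = mp.Pool(2)
    try:
        return pool.map(compute_gp, levels)
    finally:
        pool.close()
        pool.join()
```

This relies on the Unix fork semantics mentioned above; on platforms that spawn fresh interpreters instead, grid would be re-imported or re-sent to each worker.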
My problem is that the object grid is not a numpy array.
It is a list of numpy arrays, because the way my data structure works, I need to access array (list element) N and then access its row K.
Basically the list just fakes pointers to the arrays.
So my questions are:
- My understanding is that on Unix machines I can share the object `grid` without any further use of the `multiprocessing` datatypes `Array` (or `Value`). Is that correct?
- Is there a better way to implement this pointer-to-array data structure so that it can use the more efficient `multiprocessing.Array`?
I don't want to assemble one large array containing the smaller ones from the list, because the smaller ones are not really small either...
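For what it's worth, one way to keep the list-of-arrays structure while still backing each element with explicitly shared memory (rather than one merged array) is to allocate a separate multiprocessing.RawArray per list element and view each one through np.frombuffer. The helper below is a sketch under that assumption; make_shared_grid and the float64 dtype are my own choices, not anything from the original setup:

```python
import multiprocessing as mp

import numpy as np

def make_shared_grid(shapes):
    """Build a list of numpy arrays, one per shape, each backed by its own
    block of shared memory, so the pointer-to-array list structure is kept."""
    grid = []
    for shape in shapes:
        # RawArray allocates shared memory without a lock, which is fine
        # here because the grid is read-only after construction.
        buf = mp.RawArray('d', int(np.prod(shape)))
        # frombuffer creates a numpy view on the shared buffer: no copy,
        # and writes through the view are visible to all forked children.
        arr = np.frombuffer(buf, dtype=np.float64).reshape(shape)
        grid.append(arr)
    return grid
```

Indexing then works exactly as before: grid[N][K] reaches row K of list element N, but the underlying storage can be inherited by worker processes without copying.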
Any thoughts welcome!