What is the best way to lay this out in local memory to reduce bank conflicts?
I was thinking:
RRRRRRRRRRRR...
GGGGGGGGGGGG...
BBBBBBBBBBBB...
AAAAAAAAAAAA...
I would like to grab all four channels at once to use in vector operations.
Thanks!
Then use "RGBARGBARGBARGBA..." and you can grab all four channels at once to use in a vector. Plus, it's one read instead of 4.
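To make the difference concrete, here is a rough sketch in plain C (not OpenCL) of the index arithmetic for the two layouts. The rgba_t struct and both helper names are made up for illustration; in a kernel the interleaved case would be a single vload4 or a float4 pointer read.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4, used only in this sketch. */
typedef struct { float r, g, b, a; } rgba_t;

/* Interleaved "RGBARGBA...": the four channels of pixel i are adjacent,
   so one contiguous 4-float read fetches the whole pixel. */
static rgba_t load_interleaved(const float *buf, size_t i)
{
    const float *p = buf + 4 * i;
    rgba_t px = { p[0], p[1], p[2], p[3] };
    return px;
}

/* Planar "RRR...GGG...BBB...AAA...": each channel lives in its own plane
   of n floats, so assembling one pixel takes four separate reads. */
static rgba_t load_planar(const float *buf, size_t n, size_t i)
{
    rgba_t px = { buf[i], buf[n + i], buf[2 * n + i], buf[3 * n + i] };
    return px;
}
```

Both helpers return the same pixel; the only difference is whether the four floats come from one contiguous span or from four places that are n floats apart.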
Bank conflicts happen when multiple work-items access different addresses that map to the same bank at the same time, which is what you get when their addresses are a multiple of the bank count apart. So your image layout doesn't matter as much as your row pitch when it comes to causing a bank conflict.
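If it helps, here is a toy model in plain C of how the access stride maps onto banks. It assumes 32 banks of one 4-byte word each, with word address w living in bank w % 32; that is an assumption for the sketch, so check the figures for your own device. worst_conflict is a made-up helper name, and the model ignores how real hardware splits wider (for example 128-bit) reads.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 32   /* assumption: 32 banks, each one 4-byte word wide */

/* Worst-case number of accesses that land in a single bank when num_items
   work-items read word addresses base, base + stride, base + 2*stride, ...
   Returns 1 for a conflict-free access and k for a k-way conflict in this
   model. stride must be non-zero, so every address is distinct. */
static int worst_conflict(size_t base, size_t stride, int num_items)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(stride > 0 && num_items >= 0);
    for (int i = 0; i < num_items; ++i) {
        size_t bank = (base + (size_t)i * stride) % NUM_BANKS;
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

In this model, 32 work-items walking down a column with a row pitch of 32 words all hit the same bank (a 32-way conflict), while padding the pitch to 33 words spreads them over all 32 banks. A stride of 4 words, which is what reading one channel out of interleaved RGBA looks like, gives a 4-way conflict.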
On my target architecture, HD7700, the planar layout gave the best performance; the interleaved layout read with vload4 was much slower. I think this must be due to bank conflicts, but I am not sure.