I have an xlsx file with 11 columns, 15M rows, and about 198 MB in size. It's taking forever to read and work with in pandas. After reading Stack Overflow answers, I switched to dask and modin. However, I'm receiving the following error when using dask:
import dask.dataframe as dd

df = dd.read_csv('15Lacs.csv', encoding='unicode_escape')
C error: out of memory
When I use modin with the Ray engine (modin[ray]), I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 112514: invalid start byte
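The Modin attempt looks roughly like this (a minimal sketch; I select the Ray engine through the MODIN_ENGINE environment variable, and the exact read_csv arguments may have differed, but the traceback points at the default utf-8 decoding):

import os
os.environ["MODIN_ENGINE"] = "ray"  # tell Modin to use the Ray execution engine
import modin.pandas as pd

df = pd.read_csv('15Lacs.csv')  # default utf-8 decoding raises the UnicodeDecodeError above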
Is there a more efficient way to import large xlsx or csv files into Python on average hardware?