Our Python program failed with the following error when we ran it on a new dataset:
```text
    dfs = pool.map(partial(pd.read_parquet, **kwargs), file_list)
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[ id ... email_send_date
[77 rows x 4 columns]]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
```
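I quickly found the matching issue on the Python bug tracker: https://bugs.python.org/issue17560. The cause seems to be that Python's multiprocessing module encodes the length of each pickled object it sends between processes as a signed 32-bit integer. A result whose pickled size exceeds 2,147,483,647 bytes (about 2 GiB) therefore cannot be sent back from a worker, which is exactly what the "'i' format requires -2147483648 <= number <= 2147483647" message says. This problem persisted up to Python 3.8.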
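To see the limit in isolation, here is a minimal sketch using the standard `struct` module. It assumes, based on the `'i' format` wording in the error above, that the affected Python versions write the size of each pickled result as a 4-byte big-endian signed integer (format `"!i"`); the helper name `pack_length_header` is hypothetical and not part of multiprocessing.

```python
import struct

# Largest value a signed 32-bit integer ('i' format) can hold: 2**31 - 1.
INT32_MAX = 2**31 - 1


def pack_length_header(n):
    # Hypothetical helper. Assumption: mirrors the 4-byte big-endian length
    # header ("!i") that multiprocessing used for pickled results in the
    # Python versions discussed here.
    return struct.pack("!i", n)


print(len(pack_length_header(INT32_MAX)))  # 4

try:
    # A pickled result just over 2 GiB would need a length that does not fit.
    pack_length_header(INT32_MAX + 1)
except struct.error as exc:
    print("struct.error:", exc)
```

Any length up to `INT32_MAX` fits in the 4-byte header, while one byte more raises the same kind of `struct.error` that appears at the end of the traceback.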
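The solution is simply to use multiple threads instead of multiple processes. Threads run inside the same process and share its memory, so each worker's result is handed back to the caller directly rather than being pickled and sent over a pipe, and the 32-bit length limit never applies.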
Previous code:
```python
from multiprocessing import Pool

with Pool(processes=n_jobs) as pool:
    pool.map(...)
```
Solution code:
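```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    # Unlike Pool.map, Executor.map returns a lazy iterator, so list() is
    # needed to collect the results and to surface any worker exception.
    list(pool.map(...))
```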
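A slightly fuller sketch of the threaded version, mirroring the `pool.map(partial(pd.read_parquet, **kwargs), file_list)` call from the traceback. To keep it self-contained, it uses a hypothetical `read_file` function in place of `pd.read_parquet`; the `read_all` helper and the file names are also made up for illustration. It checks that each result reaching the caller is the very object the worker created, i.e. nothing was pickled on the way back.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Records every object the reader creates, so we can compare identities later.
created = []


def read_file(path, columns=None):
    # Hypothetical stand-in for pd.read_parquet: builds an in-memory result.
    result = {"path": path, "columns": columns, "data": bytearray(1024)}
    created.append(result)  # list.append is safe to call from several threads
    return result


def read_all(file_list, n_jobs, **kwargs):
    """Read every file on n_jobs threads (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in input order; list() collects them
        # and re-raises any exception a worker raised.
        return list(pool.map(partial(read_file, **kwargs), file_list))


dfs = read_all(["a.parquet", "b.parquet", "c.parquet"], n_jobs=2, columns=["id"])
print([d["path"] for d in dfs])  # ['a.parquet', 'b.parquet', 'c.parquet']

# Each result is the same object the worker built: no pickling, no copy.
print(all(any(d is c for c in created) for d in dfs))  # True
```

Threads still share the GIL, so this swap pays off mainly when the per-file work is I/O-bound or runs in native code that releases the GIL.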