Our Python program reported the following error when run on a new dataset:
    dfs = pool.map(partial(pd.read_parquet, **kwargs), file_list)
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[ id ... email_send_date
[77 rows x 4 columns]]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
I then quickly found this issue on the Python bug tracker: https://bugs.python.org/issue17560. The cause appears to be that Python's multiprocessing mechanism uses a signed 32-bit integer to encode the length of an object sent between processes, so a result larger than 2147483647 bytes (about 2 GiB) cannot be sent back from a worker. The problem persisted up to Python 3.8.
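The limit in the error message can be reproduced with the standard struct module alone. The sketch below assumes the length header is packed with the "!i" format (a signed 32-bit integer), which is what the 'i' in the error message suggests; the helper name can_encode_length is made up for this illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error


def can_encode_length(n_bytes: int) -> bool:
    """Return True if n_bytes fits in a signed 32-bit '!i' header.

    Hypothetical helper for illustration; it only mimics the assumed
    length encoding and is not part of multiprocessing.
    """
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        # Raised when the number is outside the signed 32-bit range.
        return False


print(can_encode_length(INT32_MAX))      # True
print(can_encode_length(INT32_MAX + 1))  # False
```

Any result whose serialized size exceeds INT32_MAX bytes would hit the same struct error when a worker process tries to send it back.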
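The solution is simply to use multiple threads instead of multiple processes. Threads share the parent's memory, so results are returned directly rather than being serialized and sent between processes, and the length limit never comes into play.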
Previous code:
from multiprocessing import Pool

with Pool(processes=n_jobs) as pool:
    pool.map(...)
Solution code:
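from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    # Unlike Pool.map, this returns a lazy iterator; wrap it in list() if a list is needed.
    pool.map(...)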
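For reference, here is a minimal, self-contained sketch of the thread-based version. To keep it runnable without pandas, load_file is a hypothetical stand-in for pd.read_parquet, and load_all, its suffix keyword, and the file names are likewise made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def load_file(path, suffix=""):
    """Hypothetical stand-in for pd.read_parquet: returns a label for the path."""
    return f"{path}{suffix}"


def load_all(file_list, n_jobs=4, **kwargs):
    """Load every file in a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results lazily in input order; list() collects
        # them before the pool shuts down.
        return list(pool.map(partial(load_file, **kwargs), file_list))


results = load_all(["a", "b", "c"], n_jobs=2, suffix=".parquet")
print(results)  # ['a.parquet', 'b.parquet', 'c.parquet']
```

Threads tend to work well here when the per-file work is mostly I/O or runs in native code that releases the GIL; for CPU-bound pure-Python work, the speedup over a single thread may be small.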