Our Python program reported an error when run on a new dataset:

    dfs = pool.map(partial(pd.read_parquet, **kwargs), file_list)
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[                          id  ... email_send_date
[77 rows x 4 columns]]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'

I then quickly found this issue on the Python bug tracker: https://bugs.python.org/issue17560. The cause appears to be that Python's multiprocessing encodes the length of a serialized object as a signed 32-bit integer, so a result larger than 2147483647 bytes (about 2 GiB) cannot be sent from one process to another. The problem persisted even up to Python 3.8.
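The limit can be reproduced without multiprocessing at all. The sketch below assumes the length header is packed with the `struct` format `!i`, which is what the `'i' format requires ...` message suggests; the helper name `fits_in_length_header` is made up for illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error message


def fits_in_length_header(n_bytes):
    """Return True if n_bytes can be packed as a signed 32-bit integer."""
    try:
        # Assumption: the message length is packed as a signed 32-bit int.
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


print(fits_in_length_header(INT32_MAX))      # True
print(fits_in_length_header(INT32_MAX + 1))  # False
```

Any result whose serialized size is one byte past `INT32_MAX` fails in the same way, which matches the range shown in the traceback.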

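The fix is simply to use multiple threads instead of multiple processes. Threads share the same memory, so results are never serialized and sent between processes, and the 32-bit length limit does not apply.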

Previous code:

from multiprocessing import Pool

with Pool(processes=n_jobs) as pool:
    pool.map(...)

Solution code:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    pool.map(...)
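
One difference is worth keeping in mind when swapping the two: `Pool.map` returns a list, while `ThreadPoolExecutor.map` returns a lazy iterator. Here is a minimal sketch of a drop-in wrapper. The names `map_in_threads` and `fake_load` are hypothetical, and `fake_load` only stands in for a real loader such as `partial(pd.read_parquet, **kwargs)`.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_threads(func, items, n_jobs):
    """Apply func to every item with a thread pool; return a list in input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map returns a lazy iterator; list() collects the results
        # in input order, matching what Pool.map returns.
        return list(pool.map(func, items))


def fake_load(path):
    """Hypothetical stand-in for a loader such as pd.read_parquet."""
    return {"path": path, "rows": len(path)}


results = map_in_threads(fake_load, ["a.parquet", "bb.parquet"], n_jobs=2)
print([r["path"] for r in results])
```

With the real loader, the call would look like `dfs = map_in_threads(partial(pd.read_parquet, **kwargs), file_list, n_jobs)`. Whether threads are as fast as processes depends on how much of the loading work releases the GIL, so it is worth timing on your own data.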