We were using pandas to get the number of rows in a Parquet file:

import pandas as pd
df = pd.read_parquet("my.parquet")  # loads every column into memory
num_rows = len(df)

This is easy, but it costs a lot of time and memory when the Parquet file is very large, because pandas has to read and decompress every column. For example, reading a 10GB Parquet file can take more than 100GB of memory.

If we only need the number of rows rather than the whole dataset, PyArrow is a better choice:

import pyarrow.parquet as pq
table = pq.read_table("my.parquet", columns=[])  # read no column data
num_rows = table.num_rows

For the same file, this takes only a couple of seconds and about 2GB of memory.