My previous code read the whole file just to get the one column I need:
import pandas as pd
df = pd.read_csv("data.csv")["card_id"]
In the test environment, this program used more than 10 GB of memory because the data file is large.
To reduce the memory usage, I changed it to:
import pandas as pd
df = pd.read_csv("data.csv", usecols=["card_id"])
With this change, the program used less than 1 GB of memory.
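A minimal way to check the savings yourself is to compare the in-memory footprint of the two approaches on a small synthetic file (the extra column names below are made up for illustration):

```python
import pandas as pd

# Build a small synthetic CSV with several columns (names are illustrative).
pd.DataFrame({
    "card_id": range(10_000),
    "amount": [1.0] * 10_000,
    "merchant": ["m"] * 10_000,
}).to_csv("data.csv", index=False)

full = pd.read_csv("data.csv")                       # every column is materialized
only = pd.read_csv("data.csv", usecols=["card_id"])  # only card_id is materialized

print(full.memory_usage(deep=True).sum())
print(only.memory_usage(deep=True).sum())
```

Note that usecols still parses every row of the CSV (the format is row-oriented), but discards the unused columns as it goes, so it mostly saves memory rather than parsing time.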
One note on the other readers: usecols is specific to read_csv(). read_parquet() offers the same thing through its columns parameter (and, since Parquet is a columnar format, it can skip the unneeded columns on disk entirely), while for read_sql() you restrict columns in the SQL query itself (SELECT card_id FROM ...).
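For SQL sources, selecting the column in the query achieves the same effect as usecols; here is a sketch with an in-memory SQLite database (the table and column names are made up for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (card_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(i, 1.5) for i in range(5)])

# Only card_id travels from the database into the DataFrame.
df = pd.read_sql("SELECT card_id FROM transactions", conn)
print(df.columns.tolist())  # ['card_id']

# For Parquet, pd.read_parquet("data.parquet", columns=["card_id"]) reads just
# that column from disk; it requires the pyarrow or fastparquet engine.
```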