Efficient reading in pandas

My previous code read the whole file and then kept only the one column I needed:

import pandas as pd
df = pd.read_csv("data.csv")["card_id"]

In the test environment, this program used more than 10GB of memory because the data file is so large.

To reduce memory usage, I switched to the usecols argument, which tells read_csv() to keep only the listed columns:

import pandas as pd
df = pd.read_csv("data.csv", usecols=["card_id"])

With that change, the program used less than 1GB of memory.
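To see the effect without a multi-gigabyte file, here is a small sketch that builds a tiny CSV in memory and compares the two approaches. The extra column names (name, note) and the row count are made up for illustration, and memory_usage() only measures the resulting DataFrame, not the peak memory used while parsing, so treat the numbers as a rough indication only.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: card_id plus two wide text columns.
rows = [f"{i},{'x' * 50},{'y' * 50}" for i in range(1000)]
csv_text = "card_id,name,note\n" + "\n".join(rows) + "\n"

# Old approach: parse every column, then pick one.
full = pd.read_csv(io.StringIO(csv_text))

# New approach: the parser keeps only card_id.
narrow = pd.read_csv(io.StringIO(csv_text), usecols=["card_id"])

full_bytes = full.memory_usage(deep=True).sum()
narrow_bytes = narrow.memory_usage(deep=True).sum()
print(f"all columns:  {full_bytes} bytes")
print(f"card_id only: {narrow_bytes} bytes")

# usecols gives back a one-column DataFrame; index it to get a Series,
# which is what the original read_csv(...)["card_id"] produced.
card_ids = narrow["card_id"]
```

Note that the usecols version returns a DataFrame rather than a Series, so code that expected the old result may need the extra ["card_id"] at the end.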

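One caveat: the argument name differs between readers. read_csv() uses usecols, while read_parquet() and read_sql() use columns instead. In read_sql(), columns only applies when reading a whole table by name; when passing a query, select the needed columns in the SQL itself.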
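For Parquet files, pandas' read_parquet() does accept a columns argument, and because Parquet stores data column by column, the engine can skip the columns that are not requested. A minimal sketch, assuming a Parquet engine such as pyarrow is installed; the helper name read_parquet_columns and the file name data.parquet are hypothetical:

```python
import pandas as pd


def read_parquet_columns(path, columns):
    """Load only the named columns from a Parquet file.

    Hypothetical helper: a thin wrapper around pd.read_parquet.
    """
    return pd.read_parquet(path, columns=list(columns))


# Example call (data.parquet is an assumed file name):
# df = read_parquet_columns("data.parquet", ["card_id"])
```

I have not measured the memory savings for this case, so it is worth checking on your own data.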

Robin on Linux
