Accelerate reading of NumPy array from files

During training, I need to read array data from a .npy file and take only some parts of it:

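import numpy as np
data = np.load("sample1.npy")  # reads the whole array into memory
# start1, end1, start2, end2 are the offsets of the segments we need
sound1 = data[start1:end1]
sound2 = data[start2:end2]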

Because the .npy files are large, loading an entire file just to extract a few small parts of it is slow. Is there a simple way to read only those small parts?

Yes, and it only takes a one-line change: pass mmap_mode to np.load():

import numpy as np
data = np.load("sample1.npy", mmap_mode="r")  # the only change: memory-map the file read-only
sound1 = data[start1:end1]
sound2 = data[start2:end2]
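A self-contained sketch of both approaches side by side, in case you want to try it. The file name sample1.npy comes from the post; the array contents (a 1-D float32 array) and the offsets start1/end1/start2/end2 are made-up values for illustration. Slicing a memory-mapped array gives another np.memmap view rather than a copy. Wrapping a slice in np.array() copies just that segment into an ordinary in-memory array, which can be useful if the segment is reused many times.

```python
import os
import tempfile

import numpy as np

# Hypothetical test data: a 1-D float32 "sound" array of 1,000,000 samples.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample1.npy")
np.save(path, np.arange(1_000_000, dtype=np.float32))

# Hypothetical segment offsets.
start1, end1 = 1_000, 1_500
start2, end2 = 900_000, 900_250

# Eager load: the whole array is read into memory before slicing.
eager = np.load(path)
sound1_eager = eager[start1:end1]

# Memory-mapped load: only the header is parsed here; the array bytes are
# paged in from disk when the slices are actually accessed.
data = np.load(path, mmap_mode="r")
sound1 = data[start1:end1]
sound2 = data[start2:end2]

print(type(eager).__name__)   # ndarray
print(type(data).__name__)    # memmap
print(type(sound1).__name__)  # memmap (a view, nothing copied yet)

# np.array() copies just this segment into a regular in-memory ndarray.
sound2_in_ram = np.array(sound2)
print(type(sound2_in_ram).__name__)          # ndarray
print(np.array_equal(sound1, sound1_eager))  # True
```

Both versions return the same values; the difference is only in how much of the file has to come off the disk.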

Now load() only parses the small file header and memory-maps the rest, so it performs almost no disk I/O up front. When we later process a segment (such as sound1 or sound2), the operating system loads only the pages that contain that segment, which cuts the total I/O dramatically.
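Memory mapping works here because a .npy file is just a short header followed by the raw array bytes. For a 1-D array, element i sits at byte offset header_length + i * itemsize, so a segment maps to one contiguous byte range. The sketch below reads such a range by hand to make that layout visible. It is a manual alternative for illustration, not what np.load does internally (with mmap_mode the OS does this via paging, at page granularity). read_segment is a hypothetical helper; it assumes a 1-D array with a fixed-size dtype and relies on read_magic and read_array_header_1_0/read_array_header_2_0 from numpy.lib.format.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def read_segment(path, start, end):
    """Hypothetical helper: read elements [start, end) of a 1-D .npy array
    without loading the rest of the file."""
    with open(path, "rb") as f:
        # Parse the small header: magic string + version, then the header
        # dict describing shape, memory order, and dtype.
        version = npy_format.read_magic(f)
        if version == (1, 0):
            shape, _fortran_order, dtype = npy_format.read_array_header_1_0(f)
        else:
            # Format 2.0 and later use a 4-byte header length field.
            shape, _fortran_order, dtype = npy_format.read_array_header_2_0(f)
        if len(shape) != 1:
            raise ValueError("this sketch only handles 1-D arrays")
        if not 0 <= start < end <= shape[0]:
            raise ValueError("segment bounds out of range")
        # The raw array data starts right after the header.
        data_offset = f.tell()
        count = end - start
        # Seek straight to the segment's byte range and read only those bytes.
        f.seek(data_offset + start * dtype.itemsize)
        buf = f.read(count * dtype.itemsize)
    # Returns a read-only array backed by the bytes that were read.
    return np.frombuffer(buf, dtype=dtype, count=count)


# Usage with a hypothetical sample file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample1.npy")
np.save(path, np.arange(1_000_000, dtype=np.float32))
segment = read_segment(path, 1_000, 1_005)
print(segment.tolist())  # [1000.0, 1001.0, 1002.0, 1003.0, 1004.0]
```

In practice mmap_mode="r" is the simpler choice, since it also handles multi-dimensional arrays and lets the OS cache pages across reads.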

After I made this change, the disk read bandwidth dropped from 300 MB/s to less than 100 MB/s.

Robin on Linux