During training, I need to read array data from a .npy file and take only part of it:

import numpy as np
data = np.load("sample1.npy")  # reads the entire array into memory
sound1 = data[start1: end1]
sound2 = data[start2: end2]

Because the .npy files are large, reading a whole file just to extract a few small parts of it is slow. Is there a simple way to read only those small parts?

Yes, it takes a one-line change: pass mmap_mode to np.load():

import numpy as np
data = np.load("sample1.npy", mmap_mode="r")  # just change this line
sound1 = data[start1: end1]
sound2 = data[start2: end2]

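Now np.load() only reads the small .npy header and memory-maps the file instead of reading the array data from disk. Slicing the result (data[start1: end1]) returns a view backed by that mapping, so nothing is read yet. Only when a segment such as sound1 or sound2 is actually used does the operating system read the pages that contain it, which reduces the total IO dramatically.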
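A minimal, self-contained sketch of this behaviour, assuming a hypothetical 8 MB float64 array written to a temporary "sample1.npy" and made-up segment boundaries. It shows that slices of a memory-mapped array are still np.memmap views, and that np.array() copies a segment into an ordinary in-memory array, which is the point where its pages are actually read:

```python
import os
import tempfile

import numpy as np

# Hypothetical setup: write a 1M-element float64 array (~8 MB) to a temp .npy file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample1.npy")
np.save(path, np.arange(1_000_000, dtype=np.float64))

# Memory-map the file read-only: only the .npy header is parsed here.
data = np.load(path, mmap_mode="r")

# Hypothetical segment boundaries (element indices).
start1, end1 = 1_000, 2_000
start2, end2 = 500_000, 500_500

# Slicing a memmap returns another memmap view; no array data is copied yet.
sound1 = data[start1:end1]
sound2 = data[start2:end2]
print(type(sound1).__name__)  # memmap

# np.array() copies each segment into a regular ndarray; accessing the data
# here is what makes the OS read (roughly) just the pages holding it.
segment1 = np.array(sound1)
segment2 = np.array(sound2)
print(type(segment1).__name__, segment1.shape, segment2.shape)  # ndarray (1000,) (500,)
```

Copying with np.array() is optional; most NumPy operations work on the memmap slices directly, but an explicit copy gives you an array that no longer depends on the underlying file.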

After I changed this line, the disk read bandwidth dropped from 300 MB/s to less than 100 MB/s.