During training, I need to read array data from a .npy file and use only part of it:
    import numpy as np

    data = np.load("sample1.npy")
    sound1 = data[start1:end1]
    sound2 = data[start2:end2]
Since the .npy files are large, reading a whole file just to get a few small parts of it is slow. Is there a simple way to read only those small parts?
Yes, it only takes a one-line change: pass mmap_mode to np.load():
    import numpy as np

    data = np.load("sample1.npy", mmap_mode="r")  # just change this line
    sound1 = data[start1:end1]
    sound2 = data[start2:end2]
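To check that the memory-mapped version returns the same segments as a full load, here is a small self-contained sketch. The temporary file, its contents, and the segment boundaries are all made up for illustration; they stand in for a real training file.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a large training file: one million float32 samples.
path = os.path.join(tempfile.mkdtemp(), "sample1.npy")
np.save(path, np.arange(1_000_000, dtype=np.float32))

# Hypothetical segment boundaries.
start1, end1 = 1_000, 2_000
start2, end2 = 500_000, 501_000

full = np.load(path)                   # reads the whole array into memory
mapped = np.load(path, mmap_mode="r")  # maps the file; data is read on access

sound1 = mapped[start1:end1]
sound2 = mapped[start2:end2]

# Both approaches should yield identical segments.
same1 = np.array_equal(sound1, full[start1:end1])
same2 = np.array_equal(sound2, full[start2:end2])
print(same1, same2)
```

The slicing code is unchanged between the two versions; only the way the file is opened differs.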
Now the load() function reads little more than the file header from disk and maps the rest of the file into memory. Only when we process a segment (like sound1 or sound2) are the pages that contain that segment loaded, which decreases the total IO tremendously.
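One thing worth knowing is that a slice of a memory-mapped array is itself still backed by the file, and with mmap_mode="r" it is read-only. The sketch below (again using a made-up temporary file and made-up indices) shows one way to copy a segment into ordinary memory when a writable, in-RAM array is needed.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file for illustration.
path = os.path.join(tempfile.mkdtemp(), "sample1.npy")
np.save(path, np.arange(100_000, dtype=np.int32))

data = np.load(path, mmap_mode="r")

# Slicing does not copy: the segment is still a view onto the mapped file.
segment = data[10_000:10_100]
is_lazy = isinstance(segment, np.memmap)

# np.array() makes an explicit copy, so only this segment ends up in RAM.
in_memory = np.array(segment)
is_plain = type(in_memory) is np.ndarray

# The mapped view is read-only in "r" mode; the copy can be modified freely.
in_memory[0] = -1
```

Copying is optional: if the segment is only read, it can be used directly as a memory-mapped view.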
After I changed this line of code, the read bandwidth dropped from 300MB/s to less than 100MB/s.