I implemented a dataset class for my image samples, but it cannot handle the case where a corrupted image is read:
import cv2
import torch.utils.data as data

class MyDataset(data.Dataset):
    ...
    def __getitem__(self, index):
        # cv2.imread() returns None when the file cannot be read or decoded
        image = cv2.imread(image_list[index])
        if image is None:
            # What should we do?
            ...
The PyTorch forum has a solution: return None for the unreadable sample and filter it out in a custom collate function. So I changed my code:
class MyDataset(data.Dataset):
    ...
    def __getitem__(self, index):
        image = cv2.imread(image_list[index])
        if image is None:
            # Mark the corrupted sample so the collate function can drop it
            return None
        # Other preprocessing
        ...

def my_collate(batch):
    # Drop the samples that failed to load
    batch = filter(lambda img: img is not None, batch)
    return data.dataloader.default_collate(batch)

dataset = MyDataset()
loader = data.DataLoader(dataset, collate_fn=my_collate)
But it reports:
Loading data exception: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 108, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "train.py", line 197, in my_collate
    return data.dataloader.default_collate(batch)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 34, in default_collate
    elem_type = type(batch[0])
TypeError: 'filter' object is not subscriptable
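The same error can be reproduced without PyTorch or any image files. The snippet below is only a minimal sketch: the strings stand in for loaded images, and the indexing mimics the batch[0] access shown in the traceback.

```python
# Stand-in batch in which the second sample failed to load.
batch = ["img0", None, "img2"]

filtered = filter(lambda img: img is not None, batch)
print(type(filtered).__name__)  # a lazy iterator, not a list

try:
    filtered[0]  # same kind of access as batch[0] in default_collate
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```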
In Python 3, filter() returns a lazy iterator instead of a list, so default_collate() cannot index it with batch[0]. Don’t worry. We just need to wrap the result in list():
def my_collate(batch):
    ...
    return data.dataloader.default_collate(list(batch))
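One case the fix above does not cover is a batch in which every image is corrupted: the filtered list is then empty, and batch[0] inside default_collate() would fail again. Here is a hedged sketch of one way to guard against that. The name collate_skipping_none and its base_collate parameter are my own illustration, not part of PyTorch, and list is used as a stand-in collate function so the snippet runs without PyTorch installed.

```python
from functools import partial


def collate_skipping_none(batch, base_collate):
    """Drop None samples, then delegate to base_collate (hypothetical helper)."""
    # A list comprehension builds a real list, so indexing works downstream.
    kept = [sample for sample in batch if sample is not None]
    if not kept:
        # Every sample in this batch was unreadable; the caller must skip it.
        return None
    return base_collate(kept)


# Stand-in collate function for demonstration only.
demo_collate = partial(collate_skipping_none, base_collate=list)

print(demo_collate(["img0", None, "img2"]))
print(demo_collate([None, None]))
```

With PyTorch, the assumed usage would be collate_fn=partial(collate_skipping_none, base_collate=data.dataloader.default_collate). The training loop then has to skip any batch that comes back as None, and batches with dropped samples will be smaller than the configured batch size. A module-level function wrapped in partial is used instead of a nested closure so that it can still be pickled when worker processes are spawned.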