How to ignore illegal samples of a dataset in PyTorch?

I have implemented a dataset class for my image samples, but it can't handle the case where a corrupted image is read:

import cv2
import torch.utils.data as data

class MyDataset(data.Dataset):
  ...
  def __getitem__(self, index):
    # cv2.imread() returns None (it does not raise) when a file can't be read or decoded
    image = cv2.imread(image_list[index])
    if image is None:
      # What should we do?
      ...

The solution I found on the PyTorch Forum is to return None for a corrupted sample and filter those out in a custom collate_fn, so I changed my code:

import cv2
import torch.utils.data as data

class MyDataset(data.Dataset):
  ...
  def __getitem__(self, index):
    image = cv2.imread(image_list[index])
    if image is None:
      # Mark the corrupted sample; my_collate() drops it later
      return None
    # Other preprocessing
    ...

def my_collate(batch):
    # Drop the None placeholders returned for corrupted images
    batch = filter(lambda img: img is not None, batch)
    return data.dataloader.default_collate(batch)

dataset = MyDataset()
loader = data.DataLoader(dataset, collate_fn=my_collate)

But it reports:

Loading data exception: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 108, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "train.py", line 197, in my_collate
    return data.dataloader.default_collate(batch)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 34, in default_collate
    elem_type = type(batch[0])
TypeError: 'filter' object is not subscriptable
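The last line of the traceback can be reproduced in plain Python, without PyTorch. In Python 3, filter() returns a lazy iterator, and default_collate() begins by indexing its argument with batch[0], which an iterator doesn't support. A minimal sketch, where a list of integers stands in for real samples:

```python
# Minimal reproduction of the TypeError; no PyTorch needed.
batch = [1, None, 3]  # None stands in for a corrupted sample

lazy = filter(lambda x: x is not None, batch)
try:
    lazy[0]  # default_collate() does the equivalent of batch[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'filter' object is not subscriptable

# Materializing the iterator with list() makes indexing and len() work again.
filtered = list(filter(lambda x: x is not None, batch))
print(filtered[0], len(filtered))  # 1 2
```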

It seems default_collate() can't handle the 'filter' object: in Python 3, filter() returns a lazy iterator rather than a list, so it can't be indexed. Don't worry, we just need to wrap it in list():

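def my_collate(batch):
  batch = filter(lambda img: img is not None, batch)
  # list() turns the lazy filter iterator into an indexable list
  return data.dataloader.default_collate(list(batch))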
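For reference, here is a minimal, self-contained sketch of the same approach that runs without any image files. FlakyDataset and skip_none_collate are hypothetical names used only for illustration; the dataset returns None for every third index to stand in for cv2.imread() failing. The sketch also guards against one case the collate above doesn't cover: if every sample in a batch is bad, default_collate() receives an empty list and fails on batch[0], so here the collate returns None and the loop skips that batch.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate


class FlakyDataset(Dataset):
    """Hypothetical toy dataset: every third index simulates a corrupted image."""

    def __init__(self, n=10):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if index % 3 == 0:
            return None  # stands in for cv2.imread() returning None
        return torch.full((2,), float(index)), index


def skip_none_collate(batch):
    # Drop the None placeholders (a list comprehension, so the result is indexable)
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # every sample in this batch was bad; the caller skips it
    return default_collate(batch)


loader = DataLoader(FlakyDataset(), batch_size=4, collate_fn=skip_none_collate)
seen, sizes = [], []
for batch in loader:
    if batch is None:
        continue
    images, indices = batch
    sizes.append(images.shape[0])
    seen.extend(indices.tolist())
print(seen)   # [1, 2, 4, 5, 7, 8]
print(sizes)  # [2, 3, 1]
```

Note that the batches end up with different sizes (2, 3 and 1 here), so if your model or loss assumes a fixed batch size, you need to account for that. With the default batch_size=1, every corrupted image produces an empty batch, which is exactly the case the None check handles.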

Robin on Linux
