How to split the images and annotations into train, test and validation sets for an object detection task?

I know that similar threads exist, but I have not been able to find an appropriate answer.

I have seen solutions for the cases where the images are stored based on certain classes, but this is not what I am looking for. I would like to know how to properly split the image dataset into train, test and validation sets, where each image has a corresponding annotation (labeling) file.

The images and their respective annotations (Pascal VOC / YOLO Darknet formats) can be kept either in the same directory or in separate directories, whichever makes splitting easier.

Thanks in advance.


I think this might be the fastest solution. The same approach can later be applied to obtain the validation set as well.

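import torch

# Assumes full_dataset is a PyTorch Dataset that yields (image, annotation) pairs.
train_size = int(0.8 * len(full_dataset))
test_size = len(full_dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])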

or let's say:

train, val, test = torch.utils.data.random_split(full_dataset, [1000, 100, 100])

given there are 1200 total images.
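
If the images and annotations are plain files on disk rather than a dataset object, a rough sketch along the following lines might work. It pairs each image with the annotation file that has the same stem, then shuffles and slices the pairs. The function names, the `.jpg`/`.xml` extensions, the 80/10/10 fractions and the seed are all assumptions made for illustration, not something taken from the question.

```python
import random
from pathlib import Path


def pair_files(image_dir, annotation_dir, image_ext=".jpg", annotation_ext=".xml"):
    """Return sorted (image_path, annotation_path) pairs that share a file stem."""
    pairs = []
    for image_path in sorted(Path(image_dir).glob(f"*{image_ext}")):
        annotation_path = Path(annotation_dir) / (image_path.stem + annotation_ext)
        if annotation_path.exists():  # skip images that have no annotation file
            pairs.append((image_path, annotation_path))
    return pairs


def split_pairs(pairs, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle the pairs once and cut them into train/val/test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains goes to test
    }
```

Because the image and its annotation travel together as one tuple, they always land in the same subset. For YOLO Darknet labels, `annotation_ext=".txt"` would be the likely choice. Copying each subset into its own folder (for example with `shutil.copy`) is left out here.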

Read more here:

Content Attribution

This content was originally published by Karen at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
