I know that similar threads exist, but I have not been able to find an appropriate answer.
I have seen solutions for the cases where the images are stored based on certain classes, but this is not what I am looking for. I would like to know how to properly split the image dataset into train, test and validation sets, where each image has a corresponding annotation (labeling) file.
The images and their respective annotations (Pascal VOC / YOLO Darknet formats) can be kept either in the same directory or in separate directories, whichever is more convenient for splitting.
Thanks in advance.
Edit:
I think this might be the fastest solution. The same approach can later be applied to obtain the validation set as well.
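import torch
# full_dataset is assumed to be a torch.utils.data.Dataset over the image/annotation pairs
train_size = int(0.8 * len(full_dataset))
test_size = len(full_dataset) - train_size  # remainder, so the two sizes sum to len(full_dataset)
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])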
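To get a validation set in the same way, the three sizes have to add up to exactly the dataset length. Here is a minimal sketch of how the sizes could be computed from fractions. The helper name `split_lengths` and the 80/10/10 default are my own assumptions for illustration, not part of PyTorch.

```python
def split_lengths(n_items, fractions=(0.8, 0.1, 0.1)):
    """Turn fractions into integer lengths that sum exactly to n_items.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Round each share down to a whole number of items.
    lengths = [int(f * n_items) for f in fractions]
    # Rounding down can leave a few items unassigned; give them to the first split.
    lengths[0] += n_items - sum(lengths)
    return lengths


print(split_lengths(1200))
```

The resulting list should be usable as the second argument of `torch.utils.data.random_split`, e.g. `random_split(full_dataset, split_lengths(len(full_dataset)))`.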
Or, splitting three ways with fixed sizes:
train, val, test = torch.utils.data.random_split(dataset, [1000, 100, 100])
given that there are 1200 images in total (the lengths passed to random_split must sum to the dataset size).
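If the goal is to split the files on disk rather than a Dataset object, the main point is to shuffle images and annotations as pairs so they never end up in different sets. Below is a rough sketch using only the standard library. The function name `split_pairs`, the output layout (`train/val/test`, each with `images` and `labels`), and the default extensions are assumptions made for illustration. It matches an annotation to an image by file stem, which is the usual convention for both YOLO `.txt` and Pascal VOC `.xml` files.

```python
import random
import shutil
from pathlib import Path


def split_pairs(image_dir, label_dir, out_dir, fractions=(0.8, 0.1, 0.1),
                image_ext=".jpg", label_ext=".txt", seed=0):
    """Copy image/annotation pairs into out_dir/{train,val,test}/{images,labels}.

    Hypothetical helper: names, layout and defaults are illustrative assumptions.
    image_dir and label_dir may be the same directory.
    """
    image_dir, label_dir, out_dir = Path(image_dir), Path(label_dir), Path(out_dir)

    # Keep only images that have an annotation file with the same stem.
    pairs = []
    for img in sorted(image_dir.glob(f"*{image_ext}")):
        lbl = label_dir / (img.stem + label_ext)
        if lbl.exists():
            pairs.append((img, lbl))

    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(pairs)

    n = len(pairs)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    splits = {
        "train": pairs[:n_train],
        "val": pairs[n_train:n_train + n_val],
        "test": pairs[n_train + n_val:],  # whatever remains
    }

    for name, items in splits.items():
        (out_dir / name / "images").mkdir(parents=True, exist_ok=True)
        (out_dir / name / "labels").mkdir(parents=True, exist_ok=True)
        for img, lbl in items:
            shutil.copy2(img, out_dir / name / "images" / img.name)
            shutil.copy2(lbl, out_dir / name / "labels" / lbl.name)

    # Report how many pairs landed in each split.
    return {name: len(items) for name, items in splits.items()}
```

For Pascal VOC annotations the call would be something like `split_pairs("data", "data", "split", label_ext=".xml")`. Images without a matching annotation file are skipped rather than copied.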
Read more here: https://stackoverflow.com/questions/63139017/how-to-split-the-images-and-annotations-into-train-test-and-validation-sets-for
Content Attribution
This content was originally published by Karen at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.