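For the past couple of days I have been working with segmentation datasets non-stop. Unlike detection labels, a segmentation label is a mask image, so I like to list each image/mask pair in a single txt file, one pair per line, like this: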
img1_path ann1_path
img2_path ann2_path
img3_path ann3_path
img4_path ann4_path
I end up rewriting this script every time, so I am parking it here; next time I can just copy it and adjust the paths.
import os
from pathlib import Path
from os import fspath

sep = " "  # "\t" is usually the better choice, since paths may contain spaces
assert isinstance(sep, str)

train_list = []
train_img_root = "/root/baseline_dataset/img_dir/train"
# sorted() keeps the line order stable across runs
for img_obj in sorted(Path(train_img_root).glob("**/*.jpg")):
    img_path = fspath(img_obj)
    # Swap the directory name, then only the file extension; a plain
    # replace("jpg", "png") would also hit "jpg" elsewhere in the path
    ann_path = fspath(Path(img_path.replace("img_dir", "ann_dir")).with_suffix(".png"))
    assert os.path.exists(img_path), img_path
    assert os.path.exists(ann_path), ann_path
    train_list.append([img_path, ann_path])

train_txt = "/root/baseline_dataset/train.txt"
# One "img_path<sep>ann_path" line per pair, each ending with "\n"
write_str = "".join(sep.join(item) + "\n" for item in train_list)
with open(train_txt, "w") as f:
    f.write(write_str)
print("File written")
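# One small gotcha: the file ends with a trailing "\n". If you read it back and call split("\n") directly, the last entry is an empty string, so strip the content before splitting.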
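To go with the note about the trailing newline, here is a minimal sketch of reading the list back. The helper name read_pairs is hypothetical (it is not part of the script above), and it assumes that neither path contains the separator character.

```python
from pathlib import Path


def read_pairs(list_path, sep=" "):
    """Return a list of (img_path, ann_path) tuples read from a list file.

    Hypothetical helper; assumes each non-blank line holds exactly two
    fields and that the paths themselves do not contain `sep`.
    """
    pairs = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if not line:
            # Skip blank lines, including one left by a trailing newline
            continue
        img_path, ann_path = line.split(sep)
        pairs.append((img_path, ann_path))
    return pairs
```

Calling read_pairs("/root/baseline_dataset/train.txt") would then give the pairs in file order; pass sep="\t" if the file was written with tabs.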