Use Models

裴理

2023-12-01

Models in detectron2 can be built with build_model, build_backbone, and build_roi_heads:

from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module

Let's look at the input and output formats that a built model expects.

Model Input Format

The output of DatasetMapper is a dict. Because the data loader works in batches, the actual output is a list[dict], with one dict per image. This list is the input to the model.
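To make the list[dict] shape concrete, here is a minimal sketch in plain Python of batching that keeps one dict per image instead of stacking images together. The helper name iter_batches and the placeholder string values are assumptions for illustration only; in a real pipeline the values are tensors and detectron2's own data loader does the batching.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_batches(
    samples: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group per-image dicts into lists of at most `batch_size` items.

    Hypothetical helper: each batch stays a plain list of dicts, so images
    of different sizes never have to be stacked into one array.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[Dict[str, Any]] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Placeholder per-image dicts; real ones hold a (C, H, W) tensor under "image".
mapped = [{"image": f"<tensor {i}>", "height": 480, "width": 640} for i in range(5)]
batches = list(iter_batches(mapped, batch_size=2))
```

Each element of batches is a list[dict] in the sense used here: one dict per image, in the order the samples were produced.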

Each dict may contain the following keys:

"image": Tensor in (C, H, W) format.
"instances": an Instances object, with the following fields:
- "gt_boxes": Boxes object storing N boxes, one for each instance.
- "gt_classes": Tensor, a vector of N labels, in range [0, num_categories).
- "gt_masks": a PolygonMasks object storing N masks, one for each instance.
- "gt_keypoints": a Keypoints object storing N keypoint sets, one for each instance.
"proposals": an Instances object used in Fast R-CNN style models, with the following fields:
- "proposal_boxes": Boxes object storing P proposal boxes.
- "objectness_logits": Tensor, a vector of P scores, one for each proposal.
"height", "width": the desired output height and width of the image, not necessarily the same as the height or width of the image when input into the model, which might be after resizing. For example, it can be the original image height and width before resizing.

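If "height" and "width" are provided, the model produces its output at that resolution rather than at the resolution of the image as it entered the model. This is more efficient and more accurate.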
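The effect of "height" and "width" on a predicted box can be sketched with plain arithmetic. The function below only illustrates the idea and is not detectron2's implementation; the name rescale_box and the example sizes are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def rescale_box(box: Box, input_hw: Tuple[int, int], output_hw: Tuple[int, int]) -> Box:
    """Map a box from model-input resolution to the desired output resolution."""
    in_h, in_w = input_hw
    out_h, out_w = output_hw
    scale_x = out_w / in_w  # horizontal scale factor
    scale_y = out_h / in_h  # vertical scale factor
    x1, y1, x2, y2 = box
    return (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y)


# Hypothetical case: a 400x600 (H x W) image was resized to 800x1200
# before entering the model, and "height"=400, "width"=600 were supplied.
box_at_input_scale = (100.0, 200.0, 300.0, 400.0)
box_at_output_scale = rescale_box(box_at_input_scale, (800, 1200), (400, 600))
```

With these numbers both scale factors are 0.5, so the box comes back in the coordinates of the original 400x600 image.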

"sem_seg": Tensor[int] in (H, W) format. The semantic segmentation ground truth.

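Model Output Format

In inference mode, the standard models output a list[dict], one dict per image. Each dict may contain the following fields: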

"instances": Instances object with the following fields:
- "pred_boxes": Boxes object storing N boxes, one for each detected instance.
- "scores": Tensor, a vector of N scores.
- "pred_classes": Tensor, a vector of N labels in range [0, num_categories).
- "pred_masks": a Tensor of shape (N, H, W), masks for each detected instance.
- "pred_keypoints": a Tensor of shape (N, num_keypoint, 3). Each row in the last dimension is (x, y, score).
"sem_seg": Tensor of (num_categories, H, W), the semantic segmentation prediction.
"proposals": Instances object with the following fields:
- "proposal_boxes": Boxes object storing N boxes.
- "objectness_logits": Tensor, a vector of N scores.
"panoptic_seg": A tuple of (Tensor, list[dict]). The tensor has shape (H, W), where each element represents the segment id of the pixel. Each dict describes one segment id and has the following fields:
- "id": the segment id
- "isthing": whether the segment is a thing or stuff
- "category_id": the category id of this segment. It represents the thing class id when isthing==True, and the stuff class id otherwise.
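As a rough illustration of how the two parts of "panoptic_seg" fit together, the sketch below counts the pixels belonging to each described segment. Nested Python lists stand in for the (H, W) tensor, and the helper name summarize_panoptic and the sample values are assumptions made for this example.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_panoptic(
    seg_ids: List[List[int]], segments_info: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Attach a pixel count to every segment described in `segments_info`."""
    # Count how many pixels carry each segment id.
    pixel_counts = Counter(seg_id for row in seg_ids for seg_id in row)
    summary = []
    for info in segments_info:
        summary.append(
            {
                "id": info["id"],
                "kind": "thing" if info["isthing"] else "stuff",
                "category_id": info["category_id"],
                "pixels": pixel_counts.get(info["id"], 0),
            }
        )
    return summary


# A 3x4 map of segment ids (stand-in for the (H, W) tensor).
seg_ids = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 2],
]
# One dict per segment id, using the three fields listed above.
segments_info = [
    {"id": 1, "isthing": True, "category_id": 0},
    {"id": 2, "isthing": True, "category_id": 0},
    {"id": 3, "isthing": False, "category_id": 5},
]
summary = summarize_panoptic(seg_ids, segments_info)
```

In this made-up example, segments 1 and 2 are two separate instances of the same thing class, which is why they share a category_id but have different ids, while segment 3 is a stuff region.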