Skip to content

Latest commit

 

History

History
65 lines (54 loc) · 2.58 KB

ABSTRACTIONS.md

File metadata and controls

65 lines (54 loc) · 2.58 KB

Abstractions

The main abstractions introduced by maskrcnn_benchmark that are useful to have in mind are the following:

ImageList

In PyTorch, the first dimension of the input to the network generally represents the batch dimension, and thus all elements of the same batch have the same height / width. In order to support images with different sizes and aspect ratios in the same batch, we created the ImageList class, which holds internally a batch of images (os possibly different sizes). The images are padded with zeros such that they have the same final size and batched over the first dimension. The original sizes of the images before padding are stored in the image_sizes attribute, and the batched tensor in tensors. We provide a convenience function to_image_list that accepts a few different input types, including a list of tensors, and returns an ImageList object.

from maskrnn_benchmark.structures.image_list import to_image_list

images = [torch.rand(3, 100, 200), torch.rand(3, 150, 170)]
batched_images = to_image_list(images)

# it is also possible to make the final batched image be a multiple of a number
batched_images_32 = to_image_list(images, size_divisible=32)

BoxList

The BoxList class holds a set of bounding boxes (represented as a Nx4 tensor) for a specific image, as well as the size of the image as a (width, height) tuple. It also contains a set of methods that allow to perform geometric transformations to the bounding boxes (such as cropping, scaling and flipping). The class accepts bounding boxes from two different input formats:

  • xyxy, where each box is encoded as a x1, y1, x2 and y2 coordinates)
  • xywh, where each box is encoded as x1, y1, w and h.

Additionally, each BoxList instance can also hold arbitrary additional information for each bounding box, such as labels, visibility, probability scores etc.

Here is an example on how to create a BoxList from a list of coordinates:

from maskrcnn_baseline.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT

width = 100
height = 200
boxes = [
  [0, 10, 50, 50],
  [50, 20, 90, 60],
  [10, 10, 50, 50]
]
# create a BoxList with 3 boxes
bbox = BoxList(boxes, size=(width, height), mode='xyxy')

# perform some box transformations, has similar API as PIL.Image
bbox_scaled = bbox.resize((width * 2, height * 3))
bbox_flipped = bbox.transpose(FLIP_LEFT_RIGHT)

# add labels for each bbox
labels = torch.tensor([0, 10, 1])
bbox.add_field('labels', labels)

# bbox also support a few operations, like indexing
# here, selects boxes 0 and 2
bbox_subset = bbox[[0, 2]]