For the past nine months or so, I’ve worked at an internship that required me to work extensively with object detection. While data augmentation can be leveraged for great gains, I quickly realised that most data augmentation libraries and code bases out there don’t support what you might call bounding box transforms.
What I mean is this: consider the torchvision package from PyTorch, which supports rotating an image randomly. When an image is rotated, the bounding boxes containing its objects change as well, but torchvision doesn’t support updating the annotation/label for the image we are rotating. The only library I have found that supports such augmentations is imageaug, which only supports scaling and translation, and not more advanced transforms like rotating, shearing and resizing.
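To make the problem concrete, here is a minimal sketch of why the labels have to move with the pixels, using the simplest case, a horizontal flip. This is not the library's code: the function name `hflip_with_boxes`, the nested-list image and the `[x1, y1, x2, y2]` pixel box format are all assumptions chosen for illustration.

```python
def hflip_with_boxes(image, boxes):
    """Flip an image left-to-right and mirror its bounding boxes.

    image: list of rows (each row a list of pixel values) -- assumed format.
    boxes: list of [x1, y1, x2, y2] in pixel coordinates -- assumed format.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]

    flipped_boxes = []
    for x1, y1, x2, y2 in boxes:
        # Mirroring swaps the roles of the left and right edges:
        # the old right edge becomes the new left edge, and vice versa.
        flipped_boxes.append([width - x2, y1, width - x1, y2])
    return flipped_image, flipped_boxes


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
boxes = [[0, 0, 1, 2]]  # a box hugging the left edge
new_image, new_boxes = hflip_with_boxes(image, boxes)
```

After the flip, the box that hugged the left edge hugs the right edge. If only the image were flipped, the old annotation would point at the wrong pixels.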
Most of the open source implementations of object detectors I came across therefore implement their own augmentations. So I decided to implement a tiny library of my own that currently supports bounding box augmentations for flipping, rotation, shearing, scaling, translation and resizing. I’m looking to add more augmentations, so it would be greatly helpful if you could chip in with augmentations that work well for you.
Here is the GitHub repo:
and the documentation can be found by opening the
If you want to know how I implemented it, whether for pedagogical purposes or because you feel like critiquing the design decisions, here’s a tutorial series that covers the implementation from scratch. The series covers the implementation in gory detail, and goes over:
How to set up a uniform interface for defining an augmentation, so you could define your own.
What to do when a bounding box crosses the boundary of the image. Do we keep it, or do we drop it? Something in between?
How to combine multiple augmentations where each augmentation is applied in a stochastic manner.
How to incorporate these augmentations into your input pipelines. I cover this because people use many different annotation tools, and annotations come in different formats.
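The four topics above can be sketched in a few dozen lines. This is only a rough illustration under assumptions, not the library's actual API: the names `xywh_to_xyxy`, `clip_boxes`, `HorizontalFlip`, `Translate` and `Sequence`, the `min_area_ratio` threshold, and the nested-list image format are all hypothetical. Each augmentation is a callable that takes `(image, boxes)` and returns `(image, boxes)`, so you can define your own the same way.

```python
import random


def xywh_to_xyxy(box):
    """Convert an assumed [x, y, width, height] annotation to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def clip_boxes(boxes, width, height, min_area_ratio=0.25):
    """Clip boxes to the image; drop a box if too little of it remains.

    This is the "something in between" option: keep a box that crosses the
    boundary only if at least min_area_ratio of its area is still visible.
    """
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        clipped = [max(0, x1), max(0, y1), min(width, x2), min(height, y2)]
        area = box_area(box)
        if area > 0 and box_area(clipped) / area >= min_area_ratio:
            kept.append(clipped)
    return kept


class HorizontalFlip:
    def __call__(self, image, boxes):
        width = len(image[0])
        image = [row[::-1] for row in image]
        boxes = [[width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in boxes]
        return image, boxes


class Translate:
    """Shift the image by whole pixels (dx, dy), filling uncovered pixels with 0."""

    def __init__(self, dx, dy):
        self.dx, self.dy = dx, dy

    def __call__(self, image, boxes):
        height, width = len(image), len(image[0])
        moved = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                sx, sy = x - self.dx, y - self.dy
                if 0 <= sx < width and 0 <= sy < height:
                    moved[y][x] = image[sy][sx]
        shifted = [[x1 + self.dx, y1 + self.dy, x2 + self.dx, y2 + self.dy]
                   for x1, y1, x2, y2 in boxes]
        return moved, clip_boxes(shifted, width, height)


class Sequence:
    """Apply augmentations in order, each with its own probability."""

    def __init__(self, augmentations, probs=1.0, rng=None):
        if isinstance(probs, (int, float)):
            probs = [probs] * len(augmentations)
        self.augmentations = augmentations
        self.probs = probs
        self.rng = rng or random.Random()

    def __call__(self, image, boxes):
        for augmentation, p in zip(self.augmentations, self.probs):
            # random() is in [0, 1), so p=1.0 always applies and p=0.0 never does.
            if self.rng.random() < p:
                image, boxes = augmentation(image, boxes)
        return image, boxes


image = [[1, 2, 3, 4] for _ in range(4)]
# Annotations arrive as [x, y, w, h] here and are converted once, up front.
boxes = [xywh_to_xyxy(b) for b in [[0, 0, 2, 2], [2, 2, 2, 2]]]
pipeline = Sequence([Translate(3, 0), HorizontalFlip()], probs=1.0)
new_image, new_boxes = pipeline(image, boxes)
```

In this usage example the second box is pushed entirely out of the image by the translation and is dropped, while the first keeps half of its area and survives in clipped form. Converting every annotation format to one internal box format before augmenting keeps the augmentations themselves format-agnostic.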
Feedback on either the code or the quality of the articles would be highly appreciated.