For the past 9 months or so, I’ve worked an internship that has required me to work extensively with object detection. While data augmentation can be leveraged for great gains, I quickly realised that most of the augmentation libraries and code bases out there don’t support what you might call bounding box transforms.

Here is what I mean. Consider the torchvision package from PyTorch, which supports rotating an image randomly. When the image is rotated, the bounding boxes containing the objects change as well, but torchvision doesn’t support updating the annotation/label for the image we are rotating. The only library I have found that supports such augmentations is imageaug, which only supports scaling and translation, and not advanced stuff like rotating, shearing and resizing.
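To make the problem concrete, here is a minimal sketch of the simplest case, a horizontal flip. It is an illustration only and not code from the library: the function name `horizontal_flip` and the `[x1, y1, x2, y2]` pixel box format are assumptions made for this example.

```python
import numpy as np


def horizontal_flip(image, boxes):
    """Flip an H x W x C image left-right and mirror its boxes.

    boxes: float array of shape (N, 4) holding [x1, y1, x2, y2] in pixels
    (an assumed format for this sketch).
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :]

    new_boxes = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes
```

If only the image were flipped, the labels would still point at the un-flipped positions, which is exactly the mismatch described above.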

As a result, most of the open source object detector implementations I came across wrote their own augmentations. So I decided to implement a tiny library of my own that currently supports bounding box augmentations for flipping, rotation, shearing, scaling, translation and resizing. I’m looking to add more augmentations, so it would be greatly helpful if you could chip in with augmentations that work well for you.

Here is the GitHub repo:

The documentation can be found by opening the docs/build/html/index.html file.

If you want to know how I implemented it, whether for pedagogical purposes or just because you feel like critiquing the design decisions, here’s a tutorial series that covers the implementation from absolute scratch. The series goes into gory detail on:

  1. How to set up a uniform interface for defining an augmentation, so you could define your own.

  2. What to do when a bounding box crosses the boundary of the image. Do we keep it, or do we drop it? Something in between?

  3. How to combine multiple augmentations where each augmentation is applied in a stochastic manner.

  4. How to incorporate these augmentations into your input pipelines. I cover this because people use many different annotation tools, and annotations come in different formats.
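The four points above can be sketched roughly as follows. This is a hedged illustration and not the library’s actual API: the names `xywh_to_xyxy`, `clip_boxes`, `HorizontalFlip`, `Translate` and `Sequence`, the `[x1, y1, x2, y2]` box format, and the `min_keep` area threshold are all assumptions made for this example.

```python
import random

import numpy as np


def xywh_to_xyxy(boxes):
    """Convert [x, y, width, height] boxes to [x1, y1, x2, y2] corners."""
    out = boxes.astype(float)
    out[:, 2] = boxes[:, 0] + boxes[:, 2]
    out[:, 3] = boxes[:, 1] + boxes[:, 3]
    return out


def clip_boxes(boxes, width, height, min_keep=0.25):
    """Clip boxes to the image; drop boxes that keep < min_keep of their area."""
    area_before = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    clipped = boxes.copy()
    clipped[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width)
    clipped[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height)
    area_after = (clipped[:, 2] - clipped[:, 0]) * (clipped[:, 3] - clipped[:, 1])
    return clipped[area_after >= min_keep * area_before]


class HorizontalFlip:
    """Every augmentation is a callable: (image, boxes) -> (image, boxes)."""

    def __call__(self, image, boxes):
        width = image.shape[1]
        new_boxes = boxes.copy()
        new_boxes[:, 0] = width - boxes[:, 2]
        new_boxes[:, 2] = width - boxes[:, 0]
        return image[:, ::-1], new_boxes


class Translate:
    """Shift the image by whole pixels (dx, dy), padding with zeros."""

    def __init__(self, dx, dy, min_keep=0.25):
        self.dx, self.dy, self.min_keep = dx, dy, min_keep

    def __call__(self, image, boxes):
        height, width = image.shape[:2]
        dx, dy = self.dx, self.dy
        # This sketch assumes the shift is smaller than the image itself.
        assert abs(dx) < width and abs(dy) < height
        shifted = np.zeros_like(image)
        shifted[max(dy, 0):height + min(dy, 0), max(dx, 0):width + min(dx, 0)] = \
            image[max(-dy, 0):height + min(-dy, 0), max(-dx, 0):width + min(-dx, 0)]
        moved = boxes + np.array([dx, dy, dx, dy], dtype=float)
        return shifted, clip_boxes(moved, width, height, self.min_keep)


class Sequence:
    """Apply each transform in order, each with its own probability."""

    def __init__(self, transforms, probs, seed=None):
        self.transforms = transforms
        self.probs = probs
        self.rng = random.Random(seed)

    def __call__(self, image, boxes):
        for transform, prob in zip(self.transforms, self.probs):
            if self.rng.random() < prob:
                image, boxes = transform(image, boxes)
        return image, boxes
```

A pipeline would then be built as something like `Sequence([HorizontalFlip(), Translate(7, 0)], probs=[0.5, 0.5])` and called on each `(image, boxes)` pair after converting the annotations to corner format. Dropping boxes below an area threshold is only one possible answer to the boundary question; keeping every clipped box or dropping every box that touches the edge are equally valid choices.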

Feedback on either the code or the quality of the articles would be highly appreciated.
