13 February 2018
The concept of computer vision was first introduced in the 1970s. The original ideas were exciting, but the technology to bring them to life was just not there. Only in recent years has the world witnessed a significant leap in technology that has put computer vision on the priority list of many industries.
Since 2012, when the first significant breakthroughs in computer vision were made at the University of Toronto, computer vision tech has been improving exponentially. Convolutional neural networks (CNNs) in particular have become the neural network of choice for many data scientists, as they require very little pre-programming compared to other image processing algorithms. In the last few years, CNNs have been successfully applied to identify faces, objects, and traffic signs, as well as to power vision in robots and self-driving cars.
Greater access to images has also contributed to the growing popularity of computer vision. Websites such as ImageNet make it possible to have almost instant access to images that can be used to train algorithms. And this is only the beginning. The worldwide library of images and videos is growing every day. According to an analysis from Morgan Stanley, 3 million images are shared online every day through Snapchat, Facebook, Facebook Messenger, Instagram, and WhatsApp, and most of these platforms are owned by Facebook.
While computer vision was barely mentioned in the news only three years ago, 2017 was truly the year of computer vision. According to CB Insights, news coverage of the topic has grown by more than 500% since 2015.
The future of computer vision
Computer vision is a booming industry that is being applied to many of our everyday products. E-commerce companies, like Asos, are adding visual search features to their websites to make the shopping experience smoother and more personalized. Apple unveiled their facial recognition feature with their newest iPhone, a technology that was made possible through their acquisitions of companies like PrimeSense, RealFace, and Faceshift.
And more money is being invested in new ventures every year. AngelList, a U.S.-based platform that connects startups and investors, lists 529 companies under the label of computer vision. The average valuation of these companies is $5.2M each. Many of them are in the process of raising between $5M and $10M in different stages of funding. It’s safe to say there is a lot of money being poured into computer vision development.
So, why is computer vision gaining such popularity? Because of the potential gains that can be reaped from replacing a human with a computer in certain areas of our lives.
As human beings, we use our eyes and brains to analyze our visual surroundings. This feels natural to us and we do it pretty well. A computer, on the other hand, cannot do that automatically. It needs algorithms and data in order to learn what it’s “seeing”. It takes a lot of effort but once a computer learns how to do that, it can do it better than almost any human on earth.
This can make processes faster and simpler by taking over visual tasks. Unlike humans, who can get overwhelmed or biased, a computer can see many things at once, in high detail, and analyze without getting “tired”. The accuracy of computer analysis can bring tremendous time savings and quality improvements, and thereby free up resources for tasks that require human interaction. So far, this can be applied to simple processes only, but many industries are successfully pushing the limits of what computer vision can do.
Computer vision applications in different industries
Computer vision technology is very versatile and can be adapted to many industries in very different ways. Some use cases happen behind the scenes, while others are more visible. Most likely, you have already used products or services enhanced by computer vision.
One of the most famous applications of computer vision has been developed by Tesla with its Autopilot function. The automaker launched its driver-assistance system back in 2014 with only a few features, such as lane centering and self-parking, but it aims to deliver fully self-driving cars sometime in 2018.
Features like Tesla’s Autopilot are possible thanks to startups such as Mighty AI, which offers a platform to generate accurate and diverse annotations on the datasets to train, validate, and test algorithms related to autonomous vehicles.
Computer vision has made a splash in the retail industry as well. The Amazon Go store opened its doors to customers on January 22 this year. It’s a partially automated store that has no checkout stations or cashiers. Thanks to computer vision, deep learning, and sensor fusion, customers are able to simply exit the store with products of their choice and get charged for their purchases through their Amazon account. The technology is not 100% perfect yet, as several official tests of the store’s technology showed that some items were left out of the final bill. However, it’s an impressive step in the right direction.
A startup called Mashgin is working on a solution similar to Amazon Go. The company is building a self-checkout kiosk that uses computer vision, 3D reconstruction, and deep learning to scan several items at the same time without the need for barcodes. The company claims the product reduces checkout time by up to 10x. Its main customers are cafeterias and dining halls operated by Compass Group.
Although computer vision has not yet proved to be a disruptive technology in the world of insurance and banking, a few big players have implemented it in the onboarding of new customers. In 2016, the Spanish banking group BBVA introduced a new way of signing up for its services. New customers could get a bank account within minutes by uploading a photo of their ID and a selfie. BBVA utilized computer vision technology to analyze the photos. Number26, an online bank based in Germany, is also working on similar technology and plans to introduce it to its future clients late in 2018.
In healthcare, computer vision has the potential to bring in some real value. While computers won’t completely replace healthcare personnel, they can complement routine diagnostics that require a lot of time and expertise from human physicians but don’t contribute significantly to the final diagnosis. This way, computers serve as a helping tool for healthcare personnel.
For example, Gauss Surgical is producing a real-time blood monitor that solves the problem of inaccurate blood loss measurement during injuries and surgeries. The monitor comes with a simple app that uses an algorithm that analyses pictures of surgical sponges to accurately predict how much blood was lost during a surgery. This technology can save around $10 billion in unnecessary blood transfusions every year.
One of the main challenges the healthcare system is experiencing is the amount of data that is being produced by patients. It’s estimated that the volume of healthcare-related data triples every year. Today, we as patients rely on the knowledge bank of medical personnel to analyze all that data and produce a correct diagnosis. This can be difficult at times. Microsoft’s project InnerEye is working on solving parts of that problem by developing a tool that uses AI to analyze three-dimensional radiological images. The technology can potentially make the process 40 times quicker and suggest the most effective treatments.
Challenges of applied computer vision
As illustrated above, computer vision has come a long way in terms of what it can do for different industries. However, this field is still relatively young and prone to challenges.
Not accurate enough for the real world
One major factor behind most of these challenges is that computer vision is still not comparable to the human visual system, which is what it essentially tries to imitate.
Computer vision algorithms can be quite brittle. A computer can only perform tasks it was trained to execute, and falls short when introduced to new tasks that require a different set of data. For example, teaching a computer what a concept is remains hard, but it is necessary in order for the computer to learn by itself. A good example is the concept of a book. As kids, we know what a book is, and after a while can distinguish between a book, a magazine, and a comic while understanding that they belong to the same overall category of items. For a computer, that learning is much more difficult. The problem escalates further when we add e-books and audiobooks to the equation. As humans, we understand that all those items fall under the same concept of a book, while for a computer the parameters of a book and an audiobook are too different to be put into the same group of items.
In order to overcome such obstacles and function optimally, computer vision algorithms today require human involvement. Data scientists need to choose the right architecture for the input data type so that the network can automatically learn features. An architecture that is not optimal might produce results that have no value for the project. In some cases, an output of a computer vision algorithm can be enhanced with other types of data, such as audio and text, in order to produce highly accurate results.
In other words, computer vision still lacks the high level of accuracy that is required to function optimally in the real, diverse world. As the development of this technology is still in progress, much tolerance for mistakes is required from the data science teams working on it.
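The gap between training data and real-world data can be illustrated with a toy sketch. It is not a real vision model: each "image" is reduced to a single hypothetical brightness value, and the classifier simply learns a threshold between light and dark objects. All names and numbers below are made up for illustration. The point is only that a model that scores perfectly on data resembling its training set can fail once conditions change, here by simulating dim lighting.

```python
def fit_threshold(samples):
    """Learn a brightness threshold halfway between the two class means."""
    light = [b for b, label in samples if label == "light"]
    dark = [b for b, label in samples if label == "dark"]
    return (sum(light) / len(light) + sum(dark) / len(dark)) / 2


def accuracy(threshold, samples):
    """Share of samples whose predicted label matches the true label."""
    correct = sum(
        ("light" if b > threshold else "dark") == label for b, label in samples
    )
    return correct / len(samples)


# Hypothetical data: (average brightness of the image, true label)
train = [(0.9, "light"), (0.8, "light"), (0.2, "dark"), (0.1, "dark")]
test_same = [(0.85, "light"), (0.75, "light"), (0.25, "dark"), (0.15, "dark")]

# The same objects photographed in dim lighting: every brightness value is halved.
test_dim = [(b * 0.5, label) for b, label in test_same]

threshold = fit_threshold(train)
print("Accuracy on data similar to training:", accuracy(threshold, test_same))
print("Accuracy in dim lighting:", accuracy(threshold, test_dim))
```

In this sketch the light objects fall below the learned threshold once the lighting changes, so half of the dim test set is misclassified even though nothing about the objects themselves is different.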
Lack of high-quality data
Neural networks used for computer vision applications are easier to train than ever before, but training them still requires a lot of high-quality data. This means that the algorithms need a lot of data that is specifically related to the project in order to produce good results. Despite the fact that images are available online in bigger quantities than ever, the solution to many real-world problems calls for high-quality labeled training data. That can get rather expensive because the labeling has to be done by a human being.
Let’s take the example of Microsoft’s project InnerEye, a tool that utilizes computer vision to analyze radiological images. The algorithm behind this most likely requires well-annotated images where different physical anomalies of the human body are clearly labeled. Such work needs to be done by a radiologist with experience and a trained eye. According to Glassdoor, an average base salary for a radiologist is $290,000 a year, or just short of $200 an hour. Given that around 4-5 images can be analyzed per hour, and an adequate data set could contain thousands of them, proper labeling of images can get very expensive.
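To put rough numbers on this, here is a back-of-the-envelope sketch. The hourly rate (rounded up to $200) and the 4-5 images per hour come from the figures above; the dataset size of 5,000 images is a hypothetical value chosen only to stand in for "thousands", and the function name is made up for this example.

```python
def labeling_cost(num_images, images_per_hour, hourly_rate):
    """Estimate the cost of having an expert label num_images."""
    hours = num_images / images_per_hour
    return hours * hourly_rate


HOURLY_RATE = 200     # rounded up from "just short of $200 an hour"
DATASET_SIZE = 5_000  # hypothetical dataset size, an assumption for illustration

# 5 images per hour gives the lower bound, 4 images per hour the upper bound.
low = labeling_cost(DATASET_SIZE, images_per_hour=5, hourly_rate=HOURLY_RATE)
high = labeling_cost(DATASET_SIZE, images_per_hour=4, hourly_rate=HOURLY_RATE)

print(f"Estimated labeling cost: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, each labeled image costs between $40 and $50, so even a modest dataset runs into the hundreds of thousands of dollars.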
In order to combat this issue, data scientists sometimes use pre-trained neural networks that were originally trained on millions of pictures as a base model. In the absence of good data, it’s an adequate way to get better results. However, the algorithms can learn about new objects only by “looking” at the real-world data.
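The idea of reusing a base model can be sketched in a few lines. This is a simplified illustration, not how a production system would be built: in practice the base would be a real network pre-trained on millions of pictures and loaded through a deep learning framework. Here the base is replaced by a hand-written stand-in function (an assumption made so the example runs on its own), and only a small classifier on top of it is trained on a handful of labeled examples. All function names and data are hypothetical.

```python
import math


def frozen_features(image):
    """Stand-in for a pre-trained base model (hypothetical).

    It is never updated during training. It maps a 2x2 grayscale image
    to two features: average brightness of the top and bottom rows.
    """
    top = (image[0][0] + image[0][1]) / 2
    bottom = (image[1][0] + image[1][1]) / 2
    return [top, bottom]


def predict(head, features):
    """Small logistic-regression head: the only part that is trained."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1 / (1 + math.exp(-z))


def train_head(samples, epochs=500, lr=0.5):
    """Train the head with plain gradient descent on top of frozen features."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    # The base is frozen, so its features can be computed once up front.
    data = [(frozen_features(image), label) for image, label in samples]
    for _ in range(epochs):
        for features, label in data:
            error = predict(head, features) - label
            head["weights"] = [
                w - lr * error * f for w, f in zip(head["weights"], features)
            ]
            head["bias"] -= lr * error
    return head


# A tiny, made-up labeled set: 1 = bright on top, 0 = bright on the bottom.
samples = [
    ([[0.9, 0.8], [0.1, 0.2]], 1),
    ([[1.0, 0.7], [0.0, 0.3]], 1),
    ([[0.2, 0.1], [0.8, 0.9]], 0),
    ([[0.0, 0.3], [0.7, 1.0]], 0),
]

head = train_head(samples)
new_image = [[0.8, 0.9], [0.2, 0.1]]
print("Bright on top?", predict(head, frozen_features(new_image)) > 0.5)
```

Because only the three numbers in the head are learned, four labeled examples are enough here. The same reasoning explains why a pre-trained base reduces the amount of project-specific data needed, although the new examples still have to come from the real problem.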
Now that the technology has finally caught up with the original ideas of computer vision pioneers from the 70s, we are seeing this technology being implemented in many different industries. Big players like Facebook, Tesla, and Microsoft, as well as small startups, are finding new ways in which computer vision can make banking, driving, and healthcare better.
The main benefit of computer vision is the high accuracy with which it can replace human vision if trained correctly. Many processes that are done by people today could be handled by artificial intelligence applications, eliminating mistakes due to tiredness, saving time, and cutting costs significantly.
As great as computer vision algorithms are today, they still suffer from some big challenges. The first is the lack of well-annotated images to train the algorithms to perform optimally, and the second is the lack of accuracy when they are applied to real-world images that differ from those in the training dataset.
Work with InData Labs on your computer vision project
Have a project in mind but need some help implementing it? Schedule an intro consultation with our deep learning engineers to explore your idea and find out if we can help.