What if you could interact with your computer the same way you communicate with other people? Until recently, we signaled our intentions to machines through a handful of input devices, each with its own learning curve: first the keyboard, then the mouse, and most recently the touchscreen. Imagine being able to make the same gestures you would use in a conversation and having a machine respond as intended.
Computer vision aims to bridge the gap between humans and machines. Its goal is to give devices intelligence and human-like perception. Once equipped with perception, these devices could act as our trusted guides in the worlds of Virtual Reality (VR) and Augmented Reality (AR).
Why Use Computer Vision?
The world described by futurists such as Asimov may not be so far away. We will be able to read each other like open books or see the world as an interactive map right in front of our eyes.
In fact, the next step is Merged Reality (MR), a new way of interacting with virtual worlds through ordinary gestures. This approach could reduce side effects of VR such as headaches.
Once a VR device can detect all of a user's body movements, it can use high-speed rendering to create the illusion of entire worlds. Such an immersive experience can be a place for learning, for conducting experiments in a safe environment, or even for evoking emotions.
Computer vision is already used in e-commerce, medicine, security, and design. The computer vision company InData Labs listed the following as the top uses of computer vision aided by AI:
· Multi-Object Detection — useful for automatic e-commerce tagging, counting, and logistics;
· Image Segmentation — best used for medical imaging and facial recognition;
· Image Similarity — great for visual search based on shape, color or even texture;
· Dense Point Cloud Creation — modeling life-like objects and scenes for VR and gaming.
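To make "image similarity based on color" concrete, here is a minimal sketch of one classic technique, color histogram intersection. It is not a description of what InData Labs or any production visual search system does (those typically rely on learned features); the function names and the 4-bins-per-channel quantization are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255

def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> Counter:
    """Quantize each channel into `bins_per_channel` bins and count pixels per bin."""
    # v * bins // 256 maps 0..255 onto 0..bins-1 without overflowing the last bin.
    return Counter(
        (r * bins_per_channel // 256, g * bins_per_channel // 256, b * bins_per_channel // 256)
        for r, g, b in pixels
    )

def histogram_intersection(h1: Counter, h2: Counter) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized color distributions."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    keys = set(h1) | set(h2)
    # Overlap of the two normalized histograms, bin by bin.
    return sum(min(h1[k] / n1, h2[k] / n2) for k in keys)
```

An all-red image compared with itself scores 1.0, against an all-blue image 0.0, and against a half-red, half-blue image 0.5. Shape and texture similarity need different descriptors; this only captures color distribution.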
Technical Aspects of Computer Vision
To create the illusion of reality, the system must look inward and outward at the same time. It uses outward-facing cameras or sensors to map the environment, along with eye tracking and gyroscopes to follow the user's gaze and head orientation.
Fast computation places the user in an immersive environment that reacts almost instantly to their inputs. Any lag in rendering is perceived as a flaw in the system and makes the user less inclined to accept the virtual environment as real.
Let's go into a few details about mapping the user and the environment, and about other technical requirements.
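A gyroscope reports angular velocity, not orientation, so the headset has to integrate those readings over time. The sketch below shows one common way to do that with unit quaternions; the function names are hypothetical, and it deliberately omits the sensor fusion (with accelerometers or cameras) that real headsets use to correct the drift that pure integration accumulates.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit length

def quat_multiply(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def integrate_gyro(q: Quat, omega: Tuple[float, float, float], dt: float) -> Quat:
    """Advance orientation q by body-frame angular velocity omega (rad/s) over dt seconds."""
    wx, wy, wz = omega
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = rate * dt
    if angle == 0.0:
        return q
    s = math.sin(angle / 2) / rate  # sin(angle/2) times the unit rotation axis
    dq = (math.cos(angle / 2), wx * s, wy * s, wz * s)
    w, x, y, z = quat_multiply(q, dq)  # body-frame rates are applied on the right
    norm = math.sqrt(w * w + x * x + y * y + z * z)  # renormalize to limit numeric drift
    return (w / norm, x / norm, y / norm, z / norm)

def yaw_degrees(q: Quat) -> float:
    """Rotation about the z axis (Z-Y-X Euler convention), in degrees."""
    w, x, y, z = q
    return math.degrees(math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z)))
```

Feeding it 100 samples of 90 degrees per second about the z axis, 10 ms apart, turns the head estimate by about 90 degrees. Note that this tracks orientation only; position requires the cameras or sensors described below.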
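How little time "almost instantly" leaves can be shown with simple arithmetic: a display refreshing at a given rate allows one frame period to produce each image. The sketch below assumes 90 Hz as an example refresh rate; the helper names are hypothetical.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to render one frame at the given display refresh rate."""
    return 1000.0 / refresh_hz

def misses_frame(render_ms: float, refresh_hz: float) -> bool:
    """True if a frame taking render_ms cannot be ready for the next refresh."""
    return render_ms > frame_budget_ms(refresh_hz)
```

At 90 Hz the budget is about 11.1 ms per frame, so a frame that takes 12 ms misses its refresh and the user sees a stale image, while a 10 ms frame fits.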
Putting one or more cameras on the Head-Mounted Display (HMD) gives it the ability to perceive the environment and guide the user away from obstacles such as walls, items or other users.
A significant advantage of this approach is its cost-effectiveness, since smartphone-grade cameras are very affordable.
The relevant algorithms for this job are SLAM (Simultaneous Localization and Mapping) and SfM (Structure from Motion). The algorithm analyzes the live camera feed and tracks how specific points, such as corners, move from frame to frame. This is similar to what we as people do when we judge distances and try to avoid objects. Using data fusion techniques, the computer then builds a 3D model of the environment, which can be used to create an accurate VR viewpoint.
The downside of this approach is its heavy computational load, which can cause glitches. These temporary disruptions are mostly associated with rapid movements and can result in frozen images or late warnings. This is potentially dangerous, since it could lead to collisions and injuries.
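The point-tracking step can be illustrated with brute-force block matching: take a small patch around a feature in one frame and search the next frame for the offset with the smallest sum of squared differences (SSD). This is a simplified stand-in, assumed for illustration; practical SLAM pipelines use faster trackers such as pyramidal Lucas-Kanade optical flow, and the names below are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale, indexed as image[row][col]

def ssd(a: Image, b: Image, ay: int, ax: int, by: int, bx: int, r: int) -> int:
    """SSD between (2r+1)x(2r+1) patches centered at (ay, ax) in a and (by, bx) in b."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d = a[ay + dy][ax + dx] - b[by + dy][bx + dx]
            total += d * d
    return total

def track_point(prev: Image, curr: Image, y: int, x: int,
                r: int = 2, search: int = 4) -> Tuple[int, int]:
    """Return the (dy, dx) motion of the patch around (y, x) from prev to curr."""
    h, w = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if not (r <= ny < h - r and r <= nx < w - r):
                continue  # candidate patch would fall outside the image
            cost = ssd(prev, curr, y, x, ny, nx, r)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Corners are preferred for this because a patch on a corner matches in only one place, whereas a patch on a straight edge matches equally well anywhere along that edge. For a bright square whose top-left corner moves from (5, 5) to (7, 8) between frames, tracking the corner returns a displacement of (2, 3).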
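Why a freeze is dangerous comes down to distance covered while the system is blind. The sketch below checks whether an obstacle warning should fire, widening a safety margin by how far the user travels during the current processing delay. The 0.5 m margin and the function names are hypothetical, not taken from any headset.

```python
def distance_during_freeze(speed_m_s: float, freeze_ms: float) -> float:
    """Distance (m) the user covers while the image is frozen or tracking is stale."""
    return speed_m_s * freeze_ms / 1000.0

def should_warn(distance_to_obstacle_m: float, speed_m_s: float,
                latency_ms: float, safety_margin_m: float = 0.5) -> bool:
    """Warn while the obstacle is within the margin plus the distance lost to latency.

    safety_margin_m is an illustrative default, not a value from a real device.
    """
    return distance_to_obstacle_m <= safety_margin_m + distance_during_freeze(speed_m_s, latency_ms)
```

At an assumed pace of 1.5 m/s, a 200 ms glitch means 0.3 m traveled without updates, so an obstacle 0.7 m away already warrants a warning, whereas with no delay it would not.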
3D Sensor Mapping
Rapid movement and standing still are both difficult for mapping with regular cameras: fast motion blurs the image, while a stationary camera provides no parallax from which to infer depth. In these situations, where the camera-based approach cannot produce an accurate model and the sense of immersion suffers, 3D mapping is an alternative.
The simplest solution is stereoscopic vision, which mimics human eyes by combining images from two 2D sensors and triangulating depth from the differences between them.
Another option that is gaining traction is time-of-flight (ToF). This technique resembles a bat's echolocation. The device emits an optical signal, and the algorithm computes the distance to surrounding objects from how long the reflection takes to return. The system can map both stationary items, like walls, and moving features, like the user's hands, in real time.
The downsides of these systems are their sensitivity to lighting conditions and their demands on camera performance. Since these methods rely on high-resolution cameras that must also be small, costs can rise significantly.
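For two rectified, parallel cameras, triangulation reduces to one formula: depth equals focal length times baseline divided by disparity (the horizontal shift of a point between the two images). The sketch below assumes an ideal pinhole model; the 700-pixel focal length and 6 cm baseline in the usage note are illustrative values, not specifications of any device.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (m) of a point seen by two rectified, parallel cameras.

    focal_px:     focal length in pixels (assumed equal for both cameras)
    baseline_m:   distance between the two camera centers
    disparity_px: horizontal shift of the point between the left and right images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero corresponds to a point at infinity")
    return focal_px * baseline_m / disparity_px
```

With these values, a 21-pixel disparity gives 2.0 m and a 20-pixel disparity about 2.1 m, whereas at 1 m the same one-pixel error changes depth by only a few centimeters. Depth precision therefore degrades with distance, which is one reason sensor resolution matters.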
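ToF sensors come in two broad variants: pulsed systems time a light pulse's round trip directly, and continuous-wave systems infer it from the phase shift of a modulated signal. The sketch below shows both distance formulas; the 20 MHz modulation frequency in the usage note is an assumed example, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def pulsed_tof_distance(round_trip_s: float) -> float:
    """Distance from a pulse's round-trip time; halved because light travels out and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

def phase_tof_distance(phase_rad: float, modulation_hz: float) -> float:
    """Distance from the phase shift of a continuous-wave signal modulated at modulation_hz.

    Only valid within the unambiguous range; beyond it the phase wraps around.
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)

def unambiguous_range(modulation_hz: float) -> float:
    """Largest distance a continuous-wave sensor can report before its phase wraps."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)
```

A 20 ns round trip corresponds to about 3.0 m. At 20 MHz modulation, a phase shift of pi corresponds to about 3.75 m and the unambiguous range is about 7.5 m, which is ample for mapping a room or tracking hands.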
To create believable VR scenes, it is necessary to follow the user's eye movements closely. A camera inside the VR headset can track the user's gaze and render the scene in the direction where they are focusing their attention. Because the eye sees sharply only in a small central region (the fovea), the algorithms can optimize rendering by producing fine detail only where the person is looking. This saves computing power and system memory without noticeably affecting perceived image quality.
Taking individual characteristics into account, such as the focal distance of the eyes, can make the virtual scene feel more real. A device that is not fitted to its wearer produces uncalibrated views and can cause dizziness.
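Rendering detail only where the person looks is usually called foveated rendering. The sketch below picks a resolution scale for a screen point from its angular distance to the gaze point. The tier thresholds, the scales, and the 20 pixels-per-degree figure are all hypothetical, and the linear pixels-to-degrees conversion ignores lens distortion.

```python
import math

# Hypothetical zones: (max eccentricity in degrees, fraction of full resolution).
FOVEATION_TIERS = [(5.0, 1.0), (15.0, 0.5), (math.inf, 0.25)]

def eccentricity_deg(px: float, py: float, gaze_x: float, gaze_y: float,
                     pixels_per_degree: float) -> float:
    """Approximate angle between a screen point and the gaze point."""
    return math.hypot(px - gaze_x, py - gaze_y) / pixels_per_degree

def resolution_scale(px: float, py: float, gaze_x: float, gaze_y: float,
                     pixels_per_degree: float = 20.0) -> float:
    """Resolution fraction to render at (px, py) given where the user is looking."""
    ecc = eccentricity_deg(px, py, gaze_x, gaze_y, pixels_per_degree)
    for max_ecc, scale in FOVEATION_TIERS:
        if ecc <= max_ecc:
            return scale
    return FOVEATION_TIERS[-1][1]  # fallback; the infinite last tier normally catches everything
```

With the gaze at (500, 500), the point under the gaze renders at full resolution, a point 200 pixels away (10 degrees) at half, and a point 400 pixels away (20 degrees) at a quarter. Because the outer zones cover most of the screen, most pixels are rendered at reduced cost.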
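One individual measurement that headsets commonly let users adjust is the interpupillary distance (IPD), the spacing between the pupils. A minimal sketch, assuming a simple model in which each eye's virtual camera sits half the IPD to either side of the head center; the 64 mm value and the function name are illustrative.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

def eye_positions(head_pos: Vec3, right_dir: Vec3, ipd_m: float) -> Tuple[Vec3, Vec3]:
    """Place left and right virtual cameras half the IPD to each side of the head center.

    right_dir must be a unit vector pointing toward the wearer's right.
    """
    half = ipd_m / 2.0
    left = tuple(h - half * r for h, r in zip(head_pos, right_dir))
    right = tuple(h + half * r for h, r in zip(head_pos, right_dir))
    return left, right
```

For a head at (0, 1.6, 0) facing so that +x is to the right, an IPD of 64 mm puts the eye cameras at x = -0.032 m and x = +0.032 m. If this spacing does not match the wearer, the stereo images no longer line up with the eyes, producing the kind of uncalibrated views described above.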
Putting it All Together and Future Developments
Computer vision is not a new toy, but rather an old one which has become more affordable through technical developments. Although the first generation of VR devices is already obsolete, the entire industry is still in its infancy. We can expect further refinements of the technology and the creation of more immersive experiences.
Most people still experience VR as an exceptional event, for example at a roadshow or technology fair. Existing headsets are not compatible with our active lifestyles, and their costs are still prohibitive for most users. Once the technology becomes more affordable, it will move beyond the niche of gaming and novelty into the mainstream, most likely through education or retail. When these limitations are overcome, AR, VR, and MR will proliferate.