How do robots see? Robotic vision systems

By Jeremy Cook

The short answer to the question, “How do robots see?” is via machine vision or industrial vision systems. The details are much more involved. In this article, we’ll frame the question around physical robots that accomplish a real-world task, rather than software-only applications used for filtering visual materials on the internet.

Machine vision systems capture images with a digital camera (or multiple cameras), processing this data on a frame-by-frame basis. The robot uses this interpreted data to interact with the physical world via a robotic arm, mobile agricultural system, automated security setup, or any number of other applications.

Computer vision became prominent in the latter part of the twentieth century, using a range of hard-coded criteria to determine simple facts about captured visual data. Text recognition is one such basic application; checking for the presence of component x or verifying the size of hole y in an industrial assembly are others. Today, computer vision applications have expanded dramatically by incorporating AI and machine learning.
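
As a concrete illustration of the hard-coded approach, here is a minimal OpenCV sketch that checks a hole diameter against a tolerance. The image file name, threshold, tolerance window, and calibration constant are all hypothetical placeholders, not values from any real system.

```python
import cv2

# Minimal hard-coded inspection sketch, assuming a backlit grayscale
# image "part.png" in which the hole appears as a dark circle. The file
# name and calibration constant are hypothetical.
PX_PER_MM = 20.0  # assumed camera calibration: pixels per millimeter

img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# Threshold so the dark hole becomes a white blob on a black background.
_, mask = cv2.threshold(img, 60, 255, cv2.THRESH_BINARY_INV)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hole = max(contours, key=cv2.contourArea)   # largest dark blob
    (_, _), radius = cv2.minEnclosingCircle(hole)
    diameter_mm = 2 * radius / PX_PER_MM
    # Hard-coded rule: pass if the hole is 5 mm within +/- 0.1 mm.
    print("PASS" if 4.9 <= diameter_mm <= 5.1 else "FAIL", f"{diameter_mm:.2f} mm")
else:
    print("FAIL: no hole found")
```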

Importance of machine vision

While vision systems based on specific criteria are still in use, machine vision is now capable of much more, thanks to AI-based processing. In this paradigm, robot vision systems are no longer programmed explicitly to recognize conditions like a collection of pixels (a so-called “blob”) in the correct position. A robot vision system can instead be trained with a dataset of bad and good parts, conditions, or scenarios to allow it to generate its own rules. So equipped, it can manage tasks like unlocking a door for humans and not animals, watering plants that look dry, or moving an autonomous vehicle when the stoplight is green.
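
To make the training paradigm concrete, here is a minimal sketch using PyTorch (one of many possible frameworks) that fits a tiny classifier on labeled example images. The "parts" folder with "good" and "bad" subdirectories is an assumed layout, and the model is deliberately toy-sized.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical dataset layout: parts/good/*.png and parts/bad/*.png.
tfm = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
data = datasets.ImageFolder("parts", transform=tfm)
loader = DataLoader(data, batch_size=32, shuffle=True)

# A tiny CNN: the network derives its own decision rules from examples
# instead of being programmed with explicit pixel ("blob") criteria.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 2),  # classes: bad, good
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```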

While cloud-based computing is often used to train an AI model, edge processing is typically preferable for real-time decision-making. Processing robotic vision tasks locally reduces latency and avoids dependence on cloud infrastructure for critical tasks. Autonomous vehicles provide a great example of why this matters: a half-second machine vision delay can lead to an accident, and no one wants their vehicle to stop working when network resources are unavailable.

Cutting-edge robotic vision technologies: multi-camera, 3D, AI techniques

While one camera allows the capture of 2D visual information, two cameras working together enable depth perception. For example, the NXP i.MX 8 family of processors can use two cameras at 1080p resolution for stereo input. With the proper hardware, multiple cameras and camera systems can be integrated via video stitching and other techniques. Other sensor types, such as LIDAR, IMU, and sound, can be incorporated, giving a picture of a robot’s surroundings in 3D space and beyond.
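
As a rough illustration of the stereo principle (not specific to the i.MX 8), the sketch below uses OpenCV block matching to compute a disparity map from a rectified left/right image pair. The file names are placeholders, and a calibrated, rectified camera pair is assumed.

```python
import cv2

# Assumes a rectified left/right image pair from a calibrated stereo rig;
# "left.png" and "right.png" are hypothetical file names.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block-matching stereo: nearer objects produce larger disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # fixed-point values, scaled by 16

# Normalize for display; a real system would convert disparity to metric
# depth using the camera baseline and focal length.
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", disp_vis)
```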

The same class of technology that allows a robot to interpret captured images also allows a computer to generate new images and 3D models. One application that combines these two sides of the robot vision coin is augmented reality, where camera and other sensor inputs are interpreted and the results are displayed for human consumption.



[Image: An industrial engineer uses a tablet to manage automated robot arms]


How to get started with machine vision

We now have a wide range of options for getting started with machine vision. From a software standpoint, OpenCV is a great place to start: it is free, and it works with rules-based machine vision as well as newer deep learning models. You can get started with your computer and a webcam, but specialized hardware like the Jetson Nano Developer Kit or the Google Coral line of products is well suited to vision and machine learning. The NVIDIA® Jetson Orin™ NX 16GB offers 100 TOPS of AI performance in the familiar Jetson form factor.
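
As a starting point, the sketch below uses OpenCV to grab frames from a webcam and apply a trivial rules-based check. The threshold and the "large dark blob" detection rule are arbitrary placeholders meant only to show the capture-process-decide loop.

```python
import cv2

# Grab frames from the default camera and flag any frame containing a
# large dark blob (a stand-in "detection" rule; thresholds are arbitrary).
cap = cv2.VideoCapture(0)  # device 0 is usually the built-in webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY_INV)
    if cv2.countNonZero(mask) > 0.2 * mask.size:  # arbitrary rule
        cv2.putText(frame, "DETECTED", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.imshow("machine vision demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```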

Companies like NVIDIA have a range of software assets available, including training datasets. If you would like to implement an AI application but would rather not source the needed pictures of people, cars, or other objects, these can give you a massive head start. Expect such resources to keep improving as cutting-edge AI techniques like attention mechanisms and vision transformers enhance how models learn from them.

Robot vision algorithms

Robots see through the constant interpretation of a stream of images, processing that data with human-coded algorithms or an AI-generated ruleset. Of course, on a philosophical level, one might flip the question and ask, “How do robots see themselves?” Given our ability to peer inside the code—as convoluted as an AI model may be—it could be a more straightforward question than how we see ourselves!

Related product links

MCIMX8M-EVK | NXP Semiconductors Evaluation Kit (Embedded System Development Boards and Kits)
945-13450-0000-100 | NVIDIA Jetson Nano Developer Kit (Embedded System Development Boards and Kits)
