
Stereo Vision Cameras

Unlock true 3D depth perception for your Automated Guided Vehicles (AGVs) by replicating human binocular vision. Stereo vision cameras allow robots to navigate complex, dynamic environments, identify obstacles with precision, and operate safely alongside human workers without relying solely on active emission sensors.


Core Concepts

Binocular Triangulation

Similar to human eyes, stereo cameras use two lenses separated by a baseline. The system calculates depth by measuring the shift in position of an object between the left and right images.

Disparity Mapping

The software generates a disparity map where pixel intensity represents distance. Closer objects have a high disparity (large shift), while distant objects have low disparity.
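
As a rough illustration, the sketch below computes a disparity map with OpenCV's semi-global block matcher; the file names and matcher parameters are assumptions for the example, not values from any specific camera.

```python
import cv2
import numpy as np

# Load a rectified stereo pair (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-Global Block Matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,  # disparity search range in pixels
    blockSize=5,        # odd-sized matching window
)

# compute() returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0
```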

Dense Point Clouds

Stereo vision generates dense 3D point clouds, providing rich geometric data that allows AGVs to recognize the shape and volume of obstacles, not just their presence.
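
Continuing that sketch, the disparity map can be reprojected into a dense point cloud. The calibration values below (focal length f, baseline B, principal point cx/cy) are invented for illustration; on a real rig the reprojection matrix Q comes from `cv2.stereoRectify`.

```python
import cv2
import numpy as np

# `disparity` is the float32 map from the previous sketch. The
# calibration values below are assumptions, not real camera data.
f, B, cx, cy = 700.0, 0.10, 320.0, 240.0  # focal (px), baseline (m)
Q = np.float32([
    [1, 0, 0,    -cx],
    [0, 1, 0,    -cy],
    [0, 0, 0,      f],
    [0, 0, 1 / B,  0],   # yields Z = f * B / d (metres)
])

points = cv2.reprojectImageTo3D(disparity, Q)  # H x W x 3 array
valid = disparity > 0                          # drop unmatched pixels
cloud = points[valid]                          # N x 3 point cloud
```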

Passive Sensing

Unlike LiDAR or Time-of-Flight (ToF) sensors, stereo cameras are passive. They do not emit light, making them immune to interference from other robots' sensors in crowded fleets.

Semantic Understanding

Because the input is visual RGB data, AI models can run on the feed to identify objects (e.g., "Person" vs. "Pallet"), enabling intelligent decision-making logic.
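
As a minimal sketch of this idea, an off-the-shelf COCO-pretrained detector can run on the left RGB frame. The model choice here is an assumption, not a recommendation, and COCO includes "person" but not "pallet", so a production system would fine-tune on its own classes.

```python
import torch
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights, fasterrcnn_resnet50_fpn)

# A COCO-pretrained detector stands in for whatever model your stack
# uses; the stereo camera's left image is an ordinary colour frame,
# so any 2D detector applies.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]

frame = torch.zeros(3, 480, 640)   # stand-in for a captured RGB frame
with torch.no_grad():
    detections = model([frame])[0]

for label, score in zip(detections["labels"], detections["scores"]):
    if score > 0.8:                # confidence threshold
        print(categories[label])   # e.g. "person" -> trigger a stop
```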

Hardware Ruggedness

Modern industrial stereo cameras are solid-state with no moving parts, making them highly resistant to the vibrations and shocks common in warehouse environments.

How It Works

The fundamental principle behind stereo vision is epipolar geometry. The camera consists of two synchronized image sensors separated by a known horizontal distance, called the "baseline." When the camera captures a scene, it produces two slightly different images.

An onboard processor or an external computer then performs "stereo matching." The algorithm searches for the same feature points in both the left and right images. The distance between these matching points in the image plane (the disparity) is inversely proportional to the depth of the object.

This process results in a depth map where every pixel contains distance information (Z-coordinate). When combined with the 2D position (X, Y), the AGV receives a full 3D coordinate for every point in the scene, allowing for precise SLAM (Simultaneous Localization and Mapping) and obstacle avoidance.
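
The underlying relation is Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. A quick worked example with illustrative numbers:

```python
def depth_from_disparity(f_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Classic stereo relation: Z = f * B / d."""
    return f_px * baseline_m / disparity_px

# A 700 px focal length and a 10 cm baseline with a 35 px disparity
# place the point 2 metres away.
print(depth_from_disparity(700.0, 0.10, 35.0))  # -> 2.0
```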


Real-World Applications

Warehouse Obstacle Avoidance

AGVs in fulfillment centers use stereo vision to detect dynamic obstacles that LiDAR might miss, such as the forks of a manual forklift or hanging wires, greatly reducing the risk of collisions.

Outdoor Yard Automation

Standard IR sensors struggle in direct sunlight. Stereo vision works effectively in outdoor lighting, allowing autonomous mobile robots (AMRs) to transport goods between buildings or perform yard jockeying.

Pallet Pocket Detection

For autonomous forklifts, stereo cameras provide the high-resolution 3D data needed to precisely locate pallet pockets and align forks, even if the pallet is skewed or damaged.

Negative Obstacle Detection

A critical safety feature for mobile robots is detecting drop-offs, such as loading docks or stairwells. Stereo vision analyzes the floor plane to detect "negative obstacles" (holes) effectively.
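
A minimal sketch of that floor-plane analysis, assuming the point cloud from the earlier snippet and an upward-oriented plane normal: fit the dominant plane with RANSAC, then flag points that fall well below it.

```python
import numpy as np

def fit_floor_plane(cloud: np.ndarray, n_iters: int = 200,
                    tol: float = 0.02):
    """RANSAC plane fit over an N x 3 point cloud (metres).

    Returns (normal, d) for the plane normal . p + d = 0 that has
    the most points within `tol` of it.
    """
    rng = np.random.default_rng(0)
    best, best_count = None, 0
    for _ in range(n_iters):
        p1, p2, p3 = cloud[rng.choice(len(cloud), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        length = np.linalg.norm(normal)
        if length < 1e-9:          # skip degenerate (collinear) samples
            continue
        normal /= length
        d = -normal @ p1
        count = np.sum(np.abs(cloud @ normal + d) < tol)
        if count > best_count:
            best, best_count = (normal, d), count
    return best

# `cloud` is the N x 3 array from the point-cloud sketch above.
# Assuming the normal points up, strongly negative signed distances
# lie below the floor: a potential drop-off.
normal, d = fit_floor_plane(cloud)
signed = cloud @ normal + d
drop_off = cloud[signed < -0.10]   # more than 10 cm below the floor
```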

Frequently Asked Questions

How does Stereo Vision compare to LiDAR for AGVs?

LiDAR offers excellent range and precision but is often expensive and provides sparse vertical data (depending on channel count). Stereo vision is generally more cost-effective, provides dense 3D data across the whole field of view, and includes color information for AI object recognition, though it requires more computational power.

Does Stereo Vision work in the dark?

Standard passive stereo vision requires ambient light to see textures. However, many industrial stereo cameras include an active pattern projector (IR texture projector) that allows them to function reliably in low light or total darkness by projecting an artificial texture onto the scene.

What is the typical range of a stereo camera?

The range is determined by the baseline (the distance between the lenses). A small baseline (e.g., 5 cm) suits close-range manipulation (0.2 m to 2 m), while a wider baseline (e.g., 20 cm+) allows navigation sensing out to 10-20 m. Accuracy degrades roughly quadratically as distance increases.
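
Differentiating Z = f · B / d shows why: the depth error grows as ΔZ ≈ Z² · Δd / (f · B). A small sketch with illustrative numbers (the 700 px focal length and quarter-pixel matching error are assumptions):

```python
def depth_error_m(z_m: float, f_px: float, baseline_m: float,
                  disparity_err_px: float = 0.25) -> float:
    """Approximate depth uncertainty: dZ = Z^2 * dd / (f * B)."""
    return z_m ** 2 * disparity_err_px / (f_px * baseline_m)

print(depth_error_m(2.0, 700, 0.05))    # ~2.9 cm at 2 m, 5 cm baseline
print(depth_error_m(10.0, 700, 0.20))   # ~17.9 cm at 10 m, 20 cm baseline
```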

Does it work on white walls or textureless surfaces?

Pure stereo matching fails on featureless surfaces because the algorithm cannot find matching points. This is why industrial units often use "Active Stereo," projecting a random dot pattern onto the wall to create artificial texture for the software to lock onto.

What is the computational load on the robot's CPU?

Generating depth maps from high-resolution stereo images is computationally intensive. However, modern cameras often feature onboard FPGA or ASIC chips that calculate the depth map inside the camera, sending only the processed data to the host computer to save CPU resources.

How often do these cameras need calibration?

Industrial stereo cameras are factory-calibrated and typically housed in rigid metal casings to maintain alignment. Re-calibration is usually only necessary if the unit suffers a significant physical impact or if the lens mounts loosen over time.

Can stereo cameras detect glass or transparent objects?

Generally, no. Stereo vision relies on visible surface features. Transparent glass does not reflect light in a way that allows matching points to be established. Fusing stereo vision with ultrasonic sensors is a common strategy to mitigate this limitation.

Is it compatible with ROS/ROS2?

Yes, virtually all major industrial stereo camera manufacturers provide robust ROS (Robot Operating System) drivers. They typically output standard messages like `sensor_msgs/Image`, `sensor_msgs/CameraInfo`, and `sensor_msgs/PointCloud2`.
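
For example, a minimal ROS 2 (rclpy) node that consumes the camera's point cloud might look like the sketch below; the topic name is an assumption and depends on the driver.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

class CloudListener(Node):
    """Minimal ROS 2 node consuming a stereo camera's point cloud."""

    def __init__(self):
        super().__init__("cloud_listener")
        # Topic name is an assumption; check `ros2 topic list` for
        # what your camera driver actually publishes.
        self.create_subscription(
            PointCloud2, "/camera/points", self.on_cloud, 10)

    def on_cloud(self, msg: PointCloud2) -> None:
        self.get_logger().info(
            f"{msg.width * msg.height} points in frame {msg.header.frame_id}")

def main():
    rclpy.init()
    rclpy.spin(CloudListener())

if __name__ == "__main__":
    main()
```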

How does sunlight affect performance?

Unlike structured light cameras (like the original Kinect) which are blinded by sunlight, stereo vision works well outdoors. However, direct glare into the lens or extremely high contrast scenes (deep shadows vs. bright sun) can reduce accuracy.

What latency is typically involved?

With onboard processing, latency is very low, typically between 15 ms and 30 ms depending on resolution and frame rate. This is fast enough for AGVs moving at standard warehouse speeds (up to 2-3 m/s): at 2 m/s, a 30 ms delay corresponds to roughly 6 cm of travel before the depth data reaches the controller.

Can I use stereo vision for SLAM?

Absolutely. Visual SLAM (vSLAM) is a primary use case. By tracking static feature points in the 3D environment over time, the robot can estimate its ego-motion and build a map simultaneously.

What happens if one lens is blocked?

If one lens is obstructed, depth perception is lost immediately, similar to closing one eye. The camera essentially becomes a monocular sensor. Most systems have error flags to alert the robot controller to stop or switch to backup sensors.

Ready to implement Stereo Vision Cameras in your fleet?

Explore Our Robots