Chapter 2: Physical AI Landscape & Sensors
Learning Objectives
After completing this chapter, you will be able to:
- Understand the current state of humanoid robotics and key players in the field
- Explain why sensor systems are critical for Physical AI systems
- Describe how LIDAR sensors work and their applications in robotics
- Understand different camera systems (RGB, depth, stereo) and their use cases
- Explain the role of IMUs in robot balance and orientation
- Recognize the importance of force/torque sensors for manipulation tasks
1. Humanoid Robotics Landscape
Humanoid robotics has evolved from research curiosities to increasingly capable commercial systems. This section provides an overview of where the field stands today and who the major players are.
Key Players in Humanoid Robotics
Boston Dynamics (Atlas, Spot)
- The pioneer of dynamic, agile humanoid and quadruped robots
- Atlas: Full humanoid capable of running, parkour, backflips
- Spot: Quadruped used in industrial inspection, construction, security
- Notable achievements: First humanoid to perform backflips, dynamic obstacle navigation
Tesla (Optimus)
- Consumer robotics focused on mass-market humanoid robots
- Optimus: General-purpose humanoid designed for factory and home automation
- Leveraging Tesla's AI/ML expertise from Autopilot and FSD
- Goal: Produce humanoid robots at scale similar to automobiles
Agility Robotics (Digit)
- Commercial focus on warehouse and logistics applications
- Digit: Bipedal robot designed for box handling and warehouse navigation
- Strong ROS 2 integration and developer-friendly APIs
- Deployed in pilot programs with major retailers
Figure AI (Figure 01)
- Startup focused on human-robot collaboration
- Figure 01: Full-sized humanoid with dexterous hands for manipulation
- Emphasis on safety and natural human-robot interaction
- Applications: Manufacturing, healthcare, retail
Unitree (H1, G1)
- Chinese robotics company focusing on cost-effective humanoids
- H1: High-performance humanoid for research and development
- G1: Consumer-oriented humanoid at lower price point (~$16k)
- Known for ROS 2 compatibility and open development
Toyota Research Institute (T-HR3)
- Research-focused humanoid with remote telepresence capabilities
- Master-slave control system for intuitive operation
- Focus on eldercare assistance and disaster response
- Advanced force feedback for safe human-robot collaboration
Current Capabilities and Limitations
What humanoids can do today:
- Walk bipedally on flat and uneven terrain
- Lift and manipulate objects (with varying dexterity)
- Navigate using vision and LIDAR sensors
- Perform predefined tasks in structured environments (factories, warehouses)
- Execute dynamic motions (running, jumping, balancing)
What remains challenging:
- General-purpose manipulation in unstructured environments (homes, outdoors)
- Long-duration autonomy (battery life typically limits operation to 2-8 hours)
- Robust perception in adverse conditions (rain, snow, darkness)
- Fast learning from few examples (humans can learn a task in minutes, robots need hours/days)
- Natural language understanding and task planning
The Path Forward
The field is advancing rapidly due to:
- Improved sensors: Better LIDAR, cameras, and IMUs at lower cost
- AI advances: Better perception, planning, and control algorithms
- Simulation tools: Isaac Sim, Gazebo enable faster development cycles
- ROS 2 ecosystem: Standardized middleware reduces integration complexity
- Hardware cost reduction: Components become cheaper, enabling wider adoption
In the next 5-10 years, we expect humanoid robots to transition from research demonstrations to widespread deployment in manufacturing, logistics, and eventually consumer applications.
2. Sensor Systems Overview
Sensor systems are the "eyes, ears, and sense of touch" for Physical AI systems. Without sensors, robots are blind and cannot interact meaningfully with their environment.
Why Sensors Matter
Physical AI systems operate in dynamic, uncertain environments. To navigate safely and perform useful tasks, robots must:
- Perceive what's around them (objects, people, obstacles)
- Understand where they are (localization, mapping)
- Sense their own state (orientation, balance, joint angles)
- Feel what they touch (force, pressure, vibration)
Each sensor modality provides different information. By combining multiple sensors (sensor fusion), robots can build a robust understanding of their environment.
Sensor Categories
Vision Sensors (Cameras, Depth Cameras, LIDAR)
- Provide rich, high-resolution information about the environment
- Enable object recognition, scene understanding, navigation
- Limitations: Affected by lighting conditions, occlusion
Proprioceptive Sensors (IMUs, Encoders, Joint Position)
- Sense robot's internal state (orientation, position, velocity)
- Critical for balance, coordination, and control
- Limitations: Drift over time, accumulate errors
Tactile Sensors (Force/Torque Sensors, Pressure Arrays)
- Provide information about physical interactions
- Enable gentle manipulation, object grasping, safe human-robot contact
- Limitations: Limited spatial resolution, calibration required
Figure 2.1: Overview of sensor categories in Physical AI systems
Sensor Selection Considerations
When choosing sensors for a robot system, consider:
- Task requirements: What does the robot need to sense? (e.g., navigation needs LIDAR, manipulation needs force sensors)
- Environment: Where will it operate? (e.g., indoor vs outdoor, lighting conditions, weather)
- Budget: Sensor costs vary from $50 to $10,000+ per unit
- Processing requirements: Some sensors require significant compute (e.g., high-resolution cameras)
- Reliability: Industrial environments demand rugged, reliable sensors
- Integration: Compatibility with ROS 2, existing hardware, and software stacks
Multi-Modal Sensor Fusion
The most capable robots use multiple sensor modalities simultaneously:
- LIDAR + Cameras: LIDAR provides accurate 3D geometry, cameras provide color and texture
- IMU + Encoders: IMU provides high-rate orientation, encoders provide precise joint positions
- Vision + Force: Vision enables object detection, force sensing enables careful manipulation
By fusing data from complementary sensors, robots can compensate for individual sensor limitations and build more robust, accurate world models.
3. LIDAR Sensors
LIDAR (Light Detection and Ranging) is a remote sensing technology that measures distance by illuminating a target with laser light and measuring the reflected pulses. It's a cornerstone sensor for autonomous vehicles, mobile robots, and aerial systems.
How LIDAR Works
LIDAR operates on the principle of time-of-flight:
- A laser emits a pulse of light (typically near-infrared, 905nm or 1550nm)
- The light travels to a target object and reflects back
- The sensor measures the time delay between emission and reception
- Distance is calculated as:
  distance = (speed of light × time delay) / 2
- The laser is steered (mechanically rotating or solid-state) to scan the environment
Modern LIDAR systems emit thousands to millions of laser pulses per second, building detailed 3D point clouds of the environment.
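The time-of-flight calculation above takes only a few lines of code. The 66.7 ns round-trip time below is an illustrative value, not from any particular sensor:

```python
# Time-of-flight distance calculation (illustrative values).
C = 299_792_458.0  # speed of light, m/s

def tof_distance(time_delay_s: float) -> float:
    """Distance to target: the pulse travels out and back, so divide by 2."""
    return C * time_delay_s / 2.0

# A pulse returning after 66.7 nanoseconds corresponds to roughly 10 m.
print(round(tof_distance(66.7e-9), 2))  # ≈ 10.0
```

Note the tight timing requirement this implies: resolving centimeters requires measuring delays on the order of tens of picoseconds.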
Types of LIDAR
2D LIDAR (Planar Scanning)
- Scans a single plane (usually horizontal)
- Produces 2D slice of the environment
- Common in industrial mobile robots (AMRs) for 2D navigation and obstacle avoidance
- Examples: Hokuyo, SICK (cost: $1,000-$5,000)
3D LIDAR (Volumetric Scanning)
- Scans in multiple dimensions (elevation and azimuth)
- Produces full 3D point cloud
- Essential for autonomous vehicles, drones, and humanoid robots
- Examples: Velodyne (now Ouster), Livox (cost: $500-$75,000)
Solid-State LIDAR
- No moving parts (uses optical beam steering or phased arrays)
- More reliable, smaller, and potentially cheaper than mechanical LIDAR
- Still emerging technology with limited field-of-view in some models
- Examples: Livox, Innovusion, Cepton
Figure 2.2: LIDAR operation principle and 3D point cloud output
LIDAR Specifications
Key parameters when evaluating LIDAR systems:
| Specification | What it means | Typical values |
|---|---|---|
| Range | Maximum measurable distance | 10m (indoor) to 500m (outdoor) |
| Resolution | Angular spacing between beams | 0.1° (high-end) to 1° (consumer) |
| Update Rate | How often new data is generated | 5-20 Hz (mechanical), 50+ Hz (solid-state) |
| Field of View | Angular coverage of sensor | 360° (omnidirectional) to 120° (solid-state) |
| Accuracy | Distance measurement precision | ±2-5 cm (typical) |
| Point Density | Points per second | 100K (low-end) to 2M+ (high-end) |
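The Point Density row follows directly from a mechanical scanner's channel count, angular resolution, and spin rate. The figures below are hypothetical but typical of a mid-range 16-channel unit:

```python
# Rough point-rate estimate for a mechanical 3D LIDAR (hypothetical specs).
channels = 16             # vertical beams (elevation)
horizontal_res_deg = 0.2  # angular spacing between firings (azimuth)
rotation_hz = 10          # revolutions per second (update rate)

points_per_rev = channels * (360 / horizontal_res_deg)  # 16 * 1800 firings
points_per_sec = points_per_rev * rotation_hz
print(int(points_per_sec))  # 288000 points/s
```

At 288K points/s this hypothetical unit sits comfortably inside the 100K-2M+ range in the table; doubling either the channel count or the spin rate scales the point rate linearly.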
Advantages of LIDAR
- Direct 3D measurement: Provides accurate, direct 3D geometry (no stereo vision ambiguity)
- Works in darkness: Doesn't require ambient lighting (active sensor)
- Precise distance measurement: More accurate than depth cameras at long range
- Partial occlusion handling: Multi-return LIDAR can register both sparse foreground obstacles (e.g., tree branches) and the surfaces behind them
Limitations of LIDAR
- Cost: High-performance 3D LIDAR costs $10,000-$75,000 (though solid-state may reduce costs)
- Weather sensitivity: Rain, fog, snow scatter laser light, reducing range and accuracy
- Reflective surfaces: Highly reflective (mirrors, windows) or dark/absorbent materials can be challenging
- Limited resolution: Point clouds are sparse compared to camera images (hard to see small objects)
- Motion artifacts: Fast-moving objects appear distorted due to scan time
Applications in Physical AI
- Autonomous vehicles: Detecting pedestrians, other cars, obstacles in 3D
- Indoor navigation: Mapping warehouses, hospitals, and office buildings
- Outdoor drones: Avoiding trees, power lines, and obstacles during flight
- Humanoid robots: Perceiving 3D structure of the environment for manipulation and navigation
- Industrial robots: Detecting obstacles and ensuring safe human-robot collaboration
4. Camera Systems
Cameras provide rich visual information about the environment and are the most versatile sensors in robotics. They enable object recognition, scene understanding, and provide human-like perception.
Types of Cameras
RGB Cameras (Color, 2D)
- Capture 2D color images similar to human vision
- Provide rich texture and color information
- Limited depth perception (must infer from perspective, size, motion)
- Examples: Raspberry Pi Camera, Intel RealSense (RGB only)
- Cost: $30-$500
Depth Cameras (Structured Light, ToF)
- Measure depth directly using infrared illumination patterns or time-of-flight
- Provide depth maps aligned with RGB images
- Limited range compared to LIDAR (typically 0.5-10 meters)
- Examples: Intel RealSense D435i, Microsoft Kinect Azure
- Cost: $200-$700
Stereo Cameras (Binocular Vision)
- Use two cameras separated by known baseline to compute depth
- Mimic human stereo vision
- Effective for mid-range depth estimation (1-20 meters)
- Can be passively illuminated (no IR emitter needed)
- Examples: ZED 2, Intel RealSense stereo
- Cost: $400-$1,500
Event Cameras (Neuromorphic)
- Pixel-level operation: each pixel triggers events when it detects brightness changes
- Extremely high temporal resolution (microsecond response)
- Very low power consumption
- Still emerging technology with limited software ecosystem
- Examples: Prophesee, iniVation
- Cost: $1,000-$5,000
How Depth Cameras Work
Structured Light (e.g., Intel RealSense D400 series):
- Project a known infrared pattern (dots, grid) onto the scene
- Capture the pattern deformation with an IR camera
- Compare observed pattern to reference pattern
- Compute depth from triangulation based on pattern distortion
Time-of-Flight (ToF) (e.g., Microsoft Kinect Azure):
- Emit infrared light pulses from an LED array
- Measure round-trip time for light to reflect back
- Directly compute distance from time of flight (similar to LIDAR principle)
- Output depth map aligned with RGB camera
Stereo Vision (e.g., ZED cameras):
- Two cameras capture synchronized images
- Find corresponding points between the two images (stereo matching)
- Use triangulation with known camera baseline to compute depth
- Output depth map with confidence scores
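The stereo triangulation step reduces to one formula: Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity between matched points. A minimal sketch with made-up but plausible numbers:

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from stereo triangulation: Z = f * B / d (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means point at infinity)")
    return focal_px * baseline_m / disparity_px

# 700-pixel focal length, 12 cm baseline, 42-pixel disparity -> 2.0 m
print(stereo_depth(700.0, 0.12, 42.0))  # 2.0
```

The formula also explains the range limits in the table: disparity shrinks as 1/Z, so beyond roughly 20 m a one-pixel matching error produces large depth errors unless the baseline is widened.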
Camera Specifications
| Specification | What it means | Typical values |
|---|---|---|
| Resolution | Image dimensions | 640×480 (VGA) to 4K+ |
| Frame Rate | Images per second | 30-60 FPS (typical), up to 200 FPS (high-speed) |
| Field of View | Angular coverage | 60° (narrow) to 120°+ (wide) |
| Depth Range | Effective depth measurement | 0.5-10m (structured light), 1-20m (stereo) |
| Depth Accuracy | Precision of depth measurement | ±1-5% at 1m distance |
| Focal Length | Affects field of view and depth perception | 2-8mm (typical for robotic cameras) |
Advantages of Cameras
- Rich information: Provide color, texture, and semantic information
- Object recognition: Enable computer vision for detecting and classifying objects
- Human-like perception: Similar to how humans perceive the world
- Cost-effective: RGB cameras are inexpensive compared to LIDAR
- Versatile: Used for navigation, manipulation, inspection, human-robot interaction
Limitations of Cameras
- Lighting dependency: Performance degrades in low light, darkness, or extreme brightness
- No direct 3D: Must infer depth (except depth cameras, which have limited range)
- Occlusion sensitivity: Cannot see through obstacles
- Motion blur: Fast movement can cause blur, reducing accuracy
- Privacy concerns: Cameras capture visual data, which may raise privacy issues in public spaces
Applications in Physical AI
- Object detection and recognition: Identifying items to manipulate (cups, boxes, tools)
- Visual SLAM: Building maps and localizing using camera imagery
- Human-robot interaction: Detecting people, faces, gestures, and emotions
- Quality inspection: Visual inspection of manufacturing defects
- Navigation: Lane detection, traffic sign recognition, obstacle avoidance
5. Inertial Measurement Units (IMUs)
IMUs are sensors that measure and report a body's specific force, angular rate, and sometimes the magnetic field surrounding the body. They're critical for robot balance, orientation, and motion estimation.
What is an IMU?
An IMU typically consists of:
- Accelerometers: Measure linear acceleration (forces in X, Y, Z axes)
- Gyroscopes: Measure angular velocity (rotation rates around X, Y, Z axes)
- Magnetometers (optional): Measure magnetic field (for heading relative to Earth's magnetic field)
Together, these sensors provide 6 or 9 degrees of freedom (DOF) of motion data:
- 6 DOF: 3 accelerometers + 3 gyroscopes
- 9 DOF: 3 accelerometers + 3 gyroscopes + 3 magnetometers
How IMUs Work
Accelerometers
- Measure proper acceleration (deviation from free-fall)
- Based on Micro-Electro-Mechanical Systems (MEMS) technology
- Detect changes in capacitance caused by micro-mass movement when accelerated
- Important: Stationary accelerometer measures gravity (1g on Z-axis)
- Units: m/s² or g (1 g ≈ 9.81 m/s²)
Gyroscopes
- Measure angular velocity (rotation rate)
- Based on MEMS vibrating structures or Coriolis effect
- Rotation causes mass to experience Coriolis force proportional to angular rate
- Units: degrees/second (°/s) or radians/second (rad/s)
Magnetometers (Compass)
- Measure magnetic field strength and direction
- Used as digital compass to determine heading (North, South, East, West)
- Affected by magnetic interference (motors, metal objects)
- Units: microtesla (µT) or Gauss
Figure 2.3: IMU components and rotation axes (Roll, Pitch, Yaw)
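Because a stationary accelerometer measures gravity, roll and pitch can be recovered from a single static reading; yaw cannot, which is one reason magnetometers are added. A minimal sketch using the standard atan2 formulation (only valid when the sensor is not otherwise accelerating):

```python
import math

def tilt_from_accel(ax: float, ay: float, az: float) -> tuple:
    """Roll and pitch (radians) from a static accelerometer reading.
    Assumes gravity dominates, i.e. the sensor is not accelerating."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.sqrt(ay**2 + az**2))
    return roll, pitch

# Level and stationary: gravity appears on the Z axis only -> zero roll and pitch.
r, p = tilt_from_accel(0.0, 0.0, 9.81)
print(r, p)  # 0.0 -0.0
```

During motion this estimate is corrupted by linear acceleration, which is exactly why it is fused with gyroscope data in practice (see the next section).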
IMU Drift and Integration
A critical challenge with IMUs is integration drift:
- To get position, integrate acceleration twice (acceleration → velocity → position)
- To get orientation, integrate angular velocity once (angular velocity → orientation)
- Small sensor errors accumulate with every integration step; a constant accelerometer bias, for example, produces a position error that grows quadratically with time
- Without correction, position estimates can drift by meters within seconds
Solution: Sensor fusion with other sensors:
- Use GPS to correct position drift (outdoor)
- Use vision or LIDAR to correct position (indoor)
- Use magnetometers to correct heading drift
- Use sensor fusion algorithms (Kalman filters, complementary filters)
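The complementary filter mentioned above is the simplest of these fusion algorithms: integrate the gyro for short-term accuracy, then nudge the estimate toward the drift-free (but noisy) accelerometer angle. A sketch with a deliberately biased gyro; all numeric values are illustrative:

```python
def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend gyro integration (fast, drifting) with the accelerometer angle
    (noisy, drift-free). alpha near 1 trusts the gyro over short timescales."""
    return alpha * (angle_prev + gyro_rate * dt) + (1 - alpha) * accel_angle

# Simulate 10 s at 100 Hz with a constant 0.01 rad/s gyro bias and a true
# angle of 0. Pure integration would drift to 0.1 rad; the filter settles
# near alpha * bias * dt / (1 - alpha) ≈ 0.0049 rad instead.
angle = 0.0
for _ in range(1000):
    angle = complementary_filter(angle, gyro_rate=0.01, accel_angle=0.0, dt=0.01)
print(round(angle, 4))
```

Kalman filters achieve the same goal with statistically weighted gains instead of a fixed alpha, at the cost of more tuning and computation.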
IMU Specifications
| Specification | What it means | Typical values |
|---|---|---|
| Accelerometer Range | Max measurable acceleration | ±2g (consumer) to ±16g+ (high-dynamic) |
| Gyroscope Range | Max measurable angular rate | ±250°/s (consumer) to ±2000°/s (high-dynamic) |
| Accelerometer Noise | Random sensor noise | 100-200 µg/√Hz (MEMS) |
| Gyroscope Bias Stability | How much the zero-rate error drifts | 10-100°/hr (consumer) to <1°/hr (tactical) |
| Update Rate | How often new measurements arrive | 100-1000 Hz (typical) |
| Power Consumption | Electrical power usage | 10-100 mW |
Advantages of IMUs
- High update rate: Provide data at hundreds of Hz, enabling fast control loops
- No external reference: Work in any environment (dark, underground, underwater)
- Low cost: MEMS IMUs cost $10-$100 (industrial-grade IMUs cost $1,000-$10,000)
- Small size: Can be integrated into robot bodies, limbs, or joints
- Low power: Suitable for battery-powered robots
Limitations of IMUs
- Drift: Accumulated errors require correction from other sensors
- No absolute position: Only measure changes (relative motion)
- Magnetic interference: Magnetometers affected by motors, metal structures
- Vibration sensitivity: Can be confused by external vibrations (e.g., from robot motion)
- Temperature sensitivity: Sensor characteristics change with temperature
Applications in Physical AI
- Balance control: Detect robot orientation and prevent falling
- Motion estimation: Track robot velocity and position for navigation
- State estimation: Combine with other sensors (GPS, vision) for robust localization
- Stabilization: Keep cameras steady during robot motion (e.g., drone gimbal)
- Human-robot interaction: Detect gestures, head orientation, posture
6. Force/Torque Sensors
Force/torque (F/T) sensors measure forces and torques applied to a robot. They're critical for manipulation tasks, safe human-robot interaction, and delicate operations where too much force would damage objects.
What are Force/Torque Sensors?
Force/torque sensors measure:
- Forces: Linear forces in X, Y, Z axes (measured in Newtons, N)
- Torques: Rotational forces around X, Y, Z axes (measured in Newton-meters, Nm)
Six-axis F/T sensors provide complete information about forces and torques in all directions:
- Fx, Fy, Fz (forces)
- Tx, Ty, Tz (torques)
How F/T Sensors Work
Strain Gauge Sensors (most common):
- Sensor contains a deformable structure with attached strain gauges
- Strain gauges are resistive elements that change resistance when stretched or compressed
- When force is applied, the structure deforms slightly
- Strain gauges detect deformation and change resistance
- Wheatstone bridge circuit converts resistance changes to voltage
- Calibration maps voltage to force/torque values
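For the common quarter-bridge configuration (one active gauge), the small-strain bridge output reduces to Vout ≈ Vex·GF·ε/4, where GF is the gauge factor and ε the strain. A sketch with illustrative values (5 V excitation, gauge factor 2, 500 microstrain):

```python
def quarter_bridge_output(v_excitation: float, gauge_factor: float,
                          strain: float) -> float:
    """Approximate output of a quarter Wheatstone bridge with one active
    strain gauge: Vout ≈ Vex * GF * strain / 4 (small-strain approximation)."""
    return v_excitation * gauge_factor * strain / 4.0

# 5 V excitation, gauge factor 2.0, 500 microstrain -> 1.25 mV output
v_out = quarter_bridge_output(5.0, 2.0, 500e-6)
print(v_out * 1000, "mV")  # 1.25 mV
```

The millivolt-level signal is why F/T sensors need careful amplification and shielding; the final calibration step then maps these voltages (six of them, for a six-axis sensor) to forces and torques.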
Capacitive Sensors:
- Measure changes in capacitance between parallel plates
- Applied force changes distance between plates, changing capacitance
- More sensitive than strain gauges but more complex electronics
- Used in high-precision applications
Optical Sensors:
- Use optical encoders or interferometry to measure displacement
- Extremely sensitive and accurate
- Used in high-precision robotic research applications
F/T Sensor Types
Wrist-Mounted F/T Sensors
- Mounted at robot's wrist (between arm and end-effector)
- Measure forces applied to gripper or tool
- Used for manipulation, assembly, delicate tasks
- Examples: ATI Industrial Automation, Robotiq
- Cost: $1,000-$10,000
Fingertip Force Sensors
- Embedded in robotic fingers or gripper pads
- Measure local forces at contact points
- Enable fine manipulation and grasping force control
- Examples: Tactile sensor arrays, force-sensitive resistors (FSR)
- Cost: $50-$1,000 per finger
Joint Torque Sensors
- Measure torque at robot joints (in actuators or gearboxes)
- Provide proprioceptive information about joint loads
- Used for impedance control and safe human-robot interaction
- Often integrated into robot actuators
- Cost: $500-$5,000 per joint
Tactile Sensor Arrays
- Matrix of force sensors distributed over a surface
- Provide spatial force distribution (e.g., on gripper pads)
- Enable detecting object shape, slip, and contact distribution
- Used for delicate manipulation and tactile feedback
- Cost: $500-$5,000 per array
F/T Sensor Specifications
| Specification | What it means | Typical values |
|---|---|---|
| Force Range | Maximum measurable force | 10N (precision) to 5000N+ (industrial) |
| Torque Range | Maximum measurable torque | 0.5Nm (precision) to 100Nm+ (industrial) |
| Resolution | Smallest measurable change | 0.01N - 0.1N (force), 0.001Nm (torque) |
| Accuracy | Deviation from true value | 0.1% - 1% of full scale |
| Bandwidth | Update rate | 100 Hz (typical) to 1000+ Hz (high-speed) |
| Stiffness | Resistance to deformation | Very high (minimal compliance) |
Advantages of F/T Sensors
- Delicate manipulation: Apply just enough force to grasp objects without damage
- Safe human-robot interaction: Detect accidental contact with humans and stop immediately
- Assembly tasks: Align parts and detect proper fit through force feedback
- Impedance control: Make robot "soft" and compliant for safe interaction
- Quality assurance: Detect assembly defects through abnormal force profiles
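The compliance idea above is often realized as admittance control: the measured force error is mapped to a commanded motion so the robot yields to contact instead of fighting it. The gain and force values below are purely illustrative, not from any particular controller:

```python
def admittance_velocity(force_measured: float, force_desired: float,
                        compliance: float) -> float:
    """Simple one-axis admittance law: convert force error into a commanded
    velocity so the robot backs away from excess contact force."""
    return compliance * (force_measured - force_desired)

# Robot pressed with 5 N while targeting a 2 N contact force:
# it retreats along the force direction at 3 cm/s.
v = admittance_velocity(5.0, 2.0, 0.01)  # compliance gain in (m/s) per N
print(v)  # 0.03
```

Real implementations run this at the F/T sensor's bandwidth (hundreds of Hz), act in all six axes, and usually add damping and stiffness terms for a full spring-damper behavior.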
Limitations of F/T Sensors
- Cost: High-precision F/T sensors are expensive
- Fragility: Sensitive to overload (can damage with excessive force)
- Calibration: Require periodic calibration for accuracy
- Complex integration: Need careful mounting and signal conditioning
- Limited range: Trade-off between precision and measurement range
Applications in Physical AI
- Grasping: Detect when an object is securely gripped
- Assembly: Align and fit parts together (e.g., inserting pegs into holes)
- Surface following: Move tools along surfaces with consistent contact force
- Collision detection: Detect collisions with people or obstacles and trigger safety stops
- Physical human-robot interaction: Guide robot by hand (lead-through programming)
- Tactile exploration: Explore objects to understand shape, hardness, texture
Summary
This week, we've explored the humanoid robotics landscape and the sensor systems that enable Physical AI robots to perceive and interact with the physical world.
Humanoid robotics landscape: The field is rapidly evolving with key players like Boston Dynamics, Tesla, Agility Robotics, and others developing increasingly capable humanoid robots. While significant progress has been made in dynamic movement and manipulation, challenges remain in general-purpose operation, long autonomy, and natural task learning.
Sensor systems: We examined four critical sensor categories:
- LIDAR: Provides direct 3D measurement, essential for navigation and obstacle avoidance
- Cameras: Offer rich visual information for perception, recognition, and human-robot interaction
- IMUs: Enable balance control, orientation sensing, and motion estimation
- Force/torque sensors: Allow delicate manipulation and safe human-robot interaction
These sensors work together through sensor fusion, each compensating for others' limitations. Understanding these sensors is foundational because they provide the data that AI models process to make decisions and control robot motion.
Next week, we'll learn about ROS 2 (Robot Operating System), the middleware that integrates these sensors, actuators, and AI models into cohesive robotic systems. ROS 2 provides the infrastructure to stream sensor data, process it with AI algorithms, and send commands to actuatorsโall in real-time.
Further Reading
- "Introduction to Autonomous Mobile Robots" by Siegwart, Nourbakhsh, and Scaramuzza - Comprehensive introduction to robot perception and navigation. https://www.amazon.com/Introduction-Autonomous-Mobile-Robotics-Scaramuzza/dp/0262045777
- "Probabilistic Robotics" by Thrun, Burgard, and Fox - Foundational textbook on probabilistic methods in robotics (SLAM, state estimation, sensor fusion). https://mitpress.mit.edu/9780262201629
- "LIDAR Principles and Applications" - Technical overview of LIDAR technology and applications in robotics and autonomous systems. https://www.sciencedirect.com/topics/engineering/lidar
- "Depth Camera Technology Comparison" - Intel whitepaper comparing structured light, time-of-flight, and stereo vision depth cameras. https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-structured-light-depth-cameras.html
- "MEMS IMU Technology and Applications" - Overview of MEMS inertial sensors and their use in robotics and navigation. https://www.analog.com/en/technical-articles/imu-technology.html