Latest Posts
-
Event Cameras Are the Right Sensor for Fast Robots. The Field Is Finally Catching Up.
Every frame-based camera attached to a robot is a small lie about time. The sensor pretends the world updates at 30, 60, maybe 120 frames per second — but a robot’s actuators, contacts, and collisions play out in microseconds. This mismatch has been tolerated for decades because the algorithms built on dense, synchronous image frames are so mature. But as robots push into faster manipulation, aggressive locomotion, and low-light deployment, the frame-based paradigm is quietly becoming a bottleneck. Event cameras offer a fundamentally different contract, and the robotics community is finally building the infrastructure to exploit it.
-
The Chinchilla Question for Robots: Why Scaling Laws Don't Transfer Cleanly from Language to Physical AI
The scaling hypothesis — more parameters, more data, more compute equals better models — rewrote the trajectory of language AI. Chinchilla showed that the optimal frontier scales predictably. GPT-4 confirmed it at production scale. Now every robotics lab and humanoid startup is asking the same question: will robots scale the same way? The honest answer, backed by two years of cross-embodiment experiments, is that scaling in robotics is real but structurally different — and the community is only beginning to understand where the analogy breaks down.
-
Touch Is the Missing Modality: Why Tactile Sensing Will Define the Next Phase of Dexterous Manipulation
Every major VLA model released in the last two years shares a quiet limitation: they are, fundamentally, eye-hand coordination systems. Vision-Language-Action models ingest pixels and produce motor commands, and for a wide class of tasks — pick-and-place, tabletop rearrangement, coarse assembly — that’s enough. But watch a robot try to thread a cable, cap a syringe, or recover an object mid-grasp when something slips. Vision alone fails, not because of resolution or latency, but because the information isn’t there. The geometry of a contact event lives in forces and surface deformation, not in photons. This is the tactile gap, and it’s quietly becoming the central bottleneck in dexterous manipulation research.
-
3D Gaussian Splatting Is Quietly Becoming Infrastructure for Robot Perception
The field of robot scene understanding has been quietly colonized by a representation nobody originally designed for it. 3D Gaussian Splatting (3DGS), introduced by Kerbl et al. at INRIA in 2023 for novel-view synthesis, is now appearing in papers on grasp planning, semantic scene queries, sim-to-real transfer, and training data generation at a rate that suggests it is becoming infrastructure — not just a technique. Understanding why requires looking at the specific properties that make 3DGS robotics-friendly almost by accident.
-
Whole-Body Control Is the Unsolved Core of Humanoid Robotics
The humanoid moment is real — Figure 02 is assembling BMW door panels, Unitree’s G1 is doing backflips, Tesla Optimus is folding shirts. But beneath every demo, there is a piece of mathematics that nobody outside the lab talks about: whole-body control. WBC is the computational layer that decides, every few milliseconds, how to distribute forces across every joint in a robot’s body to achieve a desired task while simultaneously respecting physics, joint limits, and contact constraints. It is unglamorous, deeply mathematical, and arguably more consequential to the humanoid future than any foundation model running on top of it.
-
Flow Matching Is Replacing Diffusion Policy — Here's the Mechanism
The action generation layer of robot learning has quietly undergone a revolution in the past eighteen months. Diffusion Policy — Chi et al.’s 2023 paper that demonstrated score-based generative models could handle the multimodal, high-dimensional distributions that plague imitation learning — was a genuine breakthrough. But the field is now migrating to something faster, simpler to train, and better suited to the latency demands of real hardware: flow matching. Understanding why this transition is happening, and what it unlocks, matters whether you’re building manipulation systems or trying to read the next wave of robotics papers.
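The core mechanism can be sketched in a few lines of NumPy. This is an illustrative toy, not the training code of any published system: a real policy would train a network to regress the velocity field (omitted here), and the "expert action" values below are made up. For the straight-line probability path x_t = (1 - t)·x0 + t·x1, the regression target is simply x1 - x0 — a plain MSE, with no noise schedule and no score parameterization.

```python
import numpy as np

# Conditional flow matching, minimal sketch (illustrative numbers only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))          # noise samples (e.g. in action space)
x1 = np.array([[1.0, -0.5]] * 4)      # hypothetical "expert action" targets

def target_velocity(x0, x1, t):
    # For the linear interpolation path, the target velocity is constant in t.
    return x1 - x0

# Training would regress net(x_t, t) onto target_velocity with an MSE loss.
# Inference integrates dx/dt = v(x, t) from t=0 to t=1, e.g. with Euler steps:
def integrate(x, v_fn, steps=10):
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * v_fn(x, i * dt)
    return x

# With the exact straight-line field, even coarse integration lands on x1 —
# the property that lets flow-matching policies get away with very few steps.
x_final = integrate(x0, lambda x, t: target_velocity(x0, x1, t), steps=5)
print(np.allclose(x_final, x1))
```

The low step count at inference is the latency win the teaser alludes to: diffusion policies typically need many denoising iterations, while near-straight flows integrate accurately in a handful of Euler steps.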
-
World Models for Robots: Learning to Predict Before Acting
One of the most significant shifts in embodied AI research over the past year has been the rise of world models — learned internal representations that allow a robot to simulate the consequences of its actions before executing them. Rather than reacting to the environment purely through trial and error, a robot equipped with a world model can reason about what will happen next, plan across longer horizons, and transfer learned behaviors far more efficiently to new settings. This is a fundamental architectural idea, and 2025–2026 has seen it move from theory into serious deployment-grade research.
-
Embodied AI in 2025: A Year of Breakthroughs
It has been almost a year since our last post. A lot has happened in the world of embodied AI. This post is a catch-up covering the most significant developments from 2025 — a year that may well be remembered as the inflection point where AI truly entered the physical world.
-
Universal Manipulation Interface (UMI)
UMI is a framework designed to bridge the gap between human demonstration and robotic execution, enabling robots to learn complex manipulation tasks directly from human actions performed in natural settings. This approach addresses the limitations of traditional robot teaching methods, which often rely on controlled environments and expensive equipment.
-
Human-Robot Interaction (HRI)
Human-Robot Interaction (HRI) is fundamentally different from Human-Computer Interaction (HCI). For decades, HCI has shaped the way we engage with digital systems—through keyboards, touchscreens, and increasingly, voice assistants. But as robots move from factories into homes, hospitals, and workplaces, a new challenge emerges: how do we design interactions for machines that share our physical space?
-
Computer Vision
Human beings have survived by relying on rapid visual cues—detecting subtle movements in tall grass, discerning edible plants from poisonous ones, and telling friend from foe in split seconds. Sight was the original survival mechanism, granting us the power to parse our environment swiftly and accurately. Today, machines can approximate that life-preserving instinct through computer vision.
-
Intersection of Edge AI and Embodied AI
Edge AI is the ability to run artificial intelligence algorithms directly on local devices—smartphones, sensors, robots—without constantly relying on cloud computing. Instead of sending data back and forth to a remote server, the device processes it on the spot. That means real-time decisions, lower latency, improved privacy, and independence from unreliable internet connections.
-
Sensor Fusion
Embodied AI agents (robots, autonomous vehicles, etc.) are equipped with multiple sensors (e.g. cameras, LiDAR, radar, ultrasonic, IMU, GPS) to perceive their environment. Sensor fusion is the process of combining data from these sensors to produce a more accurate or robust understanding than any single sensor could provide.
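The simplest instance of this idea is inverse-variance weighting of two noisy measurements of the same quantity — the one-dimensional heart of a Kalman update. A minimal sketch, with made-up noise figures for a camera depth estimate and a LiDAR range:

```python
# Fuse two independent Gaussian measurements of the same quantity.
# The less noisy sensor gets more weight, and the fused variance is
# lower than either sensor alone. Noise values are illustrative.

def fuse(z1, var1, z2, var2):
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)
    return fused, fused_var

# e.g. camera depth (noisy) and LiDAR range (precise) to the same obstacle
camera_z, camera_var = 4.8, 0.25   # meters, variance
lidar_z, lidar_var = 5.1, 0.01

estimate, variance = fuse(camera_z, camera_var, lidar_z, lidar_var)
print(f"fused estimate: {estimate:.3f} m, variance: {variance:.4f}")
```

Note that the fused estimate sits close to the precise LiDAR reading, and its variance is smaller than either input's — the quantitative payoff that motivates fusing sensors at all.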
-
Markov Decision Processes
PART I - What Is an MDP? A Markov Decision Process is a mathematical framework for making good decisions when outcomes aren't 100% certain. While it sounds complicated, the main idea is straightforward.
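The idea fits in a short sketch. Below is a toy MDP with made-up states, rewards, and probabilities (not from the post), solved by value iteration — the classic algorithm for computing how good each state is under optimal behavior:

```python
# transitions[state][action] = list of (probability, next_state, reward).
# A robot can run "fast" (more reward, risk of overheating) or "slow".
# All numbers are illustrative.
transitions = {
    "cool": {
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
        "slow": [(1.0, "cool", 1.0)],
    },
    "hot": {
        "fast": [(1.0, "broken", -10.0)],
        "slow": [(1.0, "cool", 1.0)],
    },
    "broken": {},  # terminal state
}

def value_iteration(gamma=0.9, tol=1e-6):
    """Iterate the Bellman optimality update until values stop changing."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal states keep value 0
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
print({s: round(v, 2) for s, v in V.items()})
```

The uncertainty lives in the transition probabilities (running fast from "cool" only overheats half the time), and the discount factor gamma trades immediate reward against future consequences — the two ingredients that make MDPs more than simple lookup tables.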
-
Adversarial Attacks
What Are Adversarial Attacks? Over the past few years, researchers have demonstrated various ways to fool state-of-the-art systems. In one high-profile study, carefully crafted stickers on traffic signs confused self-driving cars. In another, hackers manipulated the LED lights on a robot vacuum, tricking its camera-based obstacle detector. These are just a few real-world examples of adversarial attacks.
-
AI Agents
When we think about artificial intelligence, we often picture algorithms crunching data, generating text, or analyzing images. But what happens when AI needs to interact with the world—whether in a video game, a financial system, or even a physical robot? This is where AI agents come in.
-
A Brief History of Embodied AI
Today, many people associate Artificial Intelligence with chatbots and algorithms analyzing vast data sets. But there’s another side to AI that’s all about real-world interaction: Embodied AI. It’s the branch of AI that puts machines (or agents) into physical environments—whether in actual hardware or simulations—so they can perceive, act, and learn more like living beings. Below is a concise tour of how embodied AI evolved from early robotic explorations to the dynamic field we see today.
-
Glossary Top 50
Embodied AI is an area of artificial intelligence focused on agents that interact with the world through a physical (or simulated) body. Embodied AI goes beyond purely abstract computational tasks by integrating perception (sight, hearing, touch, etc.), action (motor control), and decision-making to learn from and adapt to changing environments.