Glossary Top 50
Embodied AI is an area of artificial intelligence focused on agents that interact with the world through a physical (or simulated) body. Embodied AI goes beyond purely abstract computational tasks by integrating perception (sight, hearing, touch, etc.), action (motor control), and decision-making to learn from and adapt to changing environments.
Here are the top 50 terms that commonly appear in discussions of Embodied AI:
-
Action Space: The set of all possible actions (movements, motor commands, control signals) an agent can take in an environment. In robotics, actions might be turning a wheel, bending a joint, or applying force to a gripper. In a simulated environment, they could be discrete commands (e.g., move forward, turn left) or continuous values (e.g., change a velocity by some amount).
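A minimal sketch of the discrete vs. continuous distinction, assuming the Gymnasium library (the action meanings and bounds are illustrative):

```python
# Minimal sketch of discrete vs. continuous action spaces, assuming Gymnasium.
from gymnasium import spaces

# Discrete commands: 0 = move forward, 1 = turn left, 2 = turn right, 3 = stop.
discrete_actions = spaces.Discrete(4)

# Continuous commands: target velocities for 6 joints, each bounded in [-1, 1].
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(6,))

sample = continuous_actions.sample()  # a random valid action, handy for exploration
```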
-
Action Primitives: Basic, reusable movements or maneuvers (e.g., “grasp,” “push,” “lift”) that can be combined for more complex behaviors. Often learned or pre-programmed in robotics.
-
Active Learning: A learning framework where the agent strategically selects which data (observations, samples) to label or explore to improve learning efficiency. In embodied AI, an agent using active learning could decide where to look or how to move to gain the most informative new data.
-
Actuator: A mechanical device that moves or controls a system (e.g., motors, servos, hydraulic cylinders). Actuators execute the actions in robotic systems.
-
Adversarial Attacks: Manipulating sensor inputs (e.g., adding perturbations to camera images) or environment factors to deceive or degrade an agent’s policy. Understanding adversarial robustness is crucial for safety and reliability.
-
Affordance: Opportunities for action that an environment provides to an agent based on the agent’s capabilities. In embodied AI, learning affordances means learning which objects can be picked up, walked on, or manipulated, and how.
-
Agent: A system—often a robot or virtual entity—that perceives its environment through sensors and acts upon that environment through effectors or actuators. In embodied AI, the agent has a physical or simulated presence, enabling it to explore, manipulate, and learn from its surroundings.
-
Behavior Trees: A hierarchical decision-making structure often used in robotics and game AI. Behavior trees break tasks into modular, reusable nodes for flexibility and readability.
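A minimal behavior-tree sketch, not tied to any particular BT library; the node classes, status strings, and the "fetch" behavior are illustrative:

```python
# Minimal behavior-tree sketch: each node's tick() returns a status string.

class Action:
    """Leaf node wrapping a callable that returns "SUCCESS", "FAILURE", or "RUNNING"."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order; stops as soon as a child fails or is still running."""
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != "SUCCESS":
                return status
        return "SUCCESS"

# Example: a "fetch" behavior composed from reusable leaves (placeholder callables).
fetch = Sequence([
    Action(lambda: "SUCCESS"),  # stand-in for navigate_to_object()
    Action(lambda: "SUCCESS"),  # stand-in for grasp_object()
    Action(lambda: "SUCCESS"),  # stand-in for return_to_base()
])
print(fetch.tick())  # -> "SUCCESS"
```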
-
Contact Dynamics: The study and modeling of how objects (including robot links) interact via forces, collisions, and friction. Precise contact modeling is often key to reliable manipulation.
-
Continuous Control: Problems in which the agent’s actions are continuous variables (e.g., joint angles, accelerations, or velocities). Many robotic tasks, such as arm manipulation, require fine-grained continuous control rather than discrete steps.
-
Controller: A system or algorithm (e.g., PID, MPC) that determines how to apply actuator inputs (forces, torques) to achieve desired states or trajectories.
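As a concrete example, a minimal discrete-time PID controller might look like this; the gains and time step are illustrative, not tuned for any real system:

```python
# Minimal sketch of a discrete-time PID controller.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Control output, e.g., a motor torque or velocity command.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)
torque = pid.update(setpoint=1.0, measurement=0.8)
```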
-
Curriculum Learning: A training method that starts with easier or simpler versions of a task and gradually introduces more complex challenges. For embodied AI, gradually ramping up task difficulty can help the agent learn more stably than if it attempts a very difficult task from the start.
-
Curriculum Transfer: Using knowledge gained in simpler curriculum tasks to accelerate learning in more complex tasks, combining the benefits of curriculum learning and transfer learning.
-
Domain Adaptation: A specialized form of transfer learning where an agent trained in one domain (e.g., synthetic or simulation data) adapts to a different but related domain (e.g., real-world data). This term is often used when bridging the gap between synthetic and real image or sensor data.
-
Domain Invariance: A property of a model or representation that remains robust across different but related domains (e.g., different lighting conditions, shapes, or physics), boosting Sim2Real performance.
-
Domain Randomization: A strategy for sim2real transfer in which various aspects of a simulation (lighting, object textures, physics parameters) are randomized. By confronting an agent with many simulation variants, domain randomization helps it learn robust behaviors that transfer better to real-world conditions.
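A sketch of the idea, with hypothetical parameter ranges and a placeholder `make_sim_env` constructor standing in for whatever simulator is actually used:

```python
# Illustrative sketch: resample simulator parameters each episode so the policy
# never overfits to one fixed physics/appearance configuration.
import random

def randomized_sim_params():
    return {
        "friction":         random.uniform(0.4, 1.2),
        "object_mass_kg":   random.uniform(0.1, 0.5),
        "light_intensity":  random.uniform(0.3, 1.0),
        "camera_noise_std": random.uniform(0.0, 0.02),
    }

for episode in range(1000):
    params = randomized_sim_params()
    # env = make_sim_env(**params)   # hypothetical: build the simulator with these params
    # run_training_episode(env)      # hypothetical: collect experience under this variant
```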
-
End-to-End Learning: An approach where the AI model learns directly from raw sensor inputs (like pixels) to produce actions (motor commands), without manual feature engineering. Neural networks are the usual vehicle for end-to-end learning, since they can learn task-specific internal representations directly from data.
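A minimal pixels-to-actions network sketch, assuming PyTorch; the layer sizes and image resolution are illustrative:

```python
# Minimal sketch of an end-to-end pixels-to-actions network, assuming PyTorch.
import torch
import torch.nn as nn

class PixelPolicy(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),   # raw RGB frames in
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, num_actions),                            # motor commands out
        )

    def forward(self, image):            # image: (batch, 3, H, W), values in [0, 1]
        return self.net(image)

policy = PixelPolicy(num_actions=4)
logits = policy(torch.rand(1, 3, 84, 84))   # one 84x84 camera frame
```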
-
Energy Efficiency: A design and control objective aiming to minimize power consumption. In embodied AI, energy-efficient policies are critical for longer battery life or operational time.
-
Exploration vs. Exploitation: A core trade-off in RL and learning: exploration involves trying new actions to gather information, while exploitation uses known actions that yield high rewards.
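Epsilon-greedy action selection is the simplest common way to trade the two off; a minimal sketch:

```python
# Epsilon-greedy: with probability epsilon explore (random action),
# otherwise exploit (pick the action with the highest estimated value).
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: list of estimated returns, one per discrete action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit

action = epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.1)
```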
-
Forward Kinematics: Calculating the pose (position and orientation) of the end-effector (e.g., a robot hand) given the joint angles. Usually straightforward to compute for serial robotic arms, and essential for control.
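For the standard planar two-link arm, forward kinematics reduces to a couple of lines; the link lengths here are arbitrary:

```python
# Forward kinematics of a planar 2-link arm (textbook case).
import math

def fk_2link(theta1, theta2, l1=0.3, l2=0.2):
    """Return (x, y) of the end-effector given joint angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

print(fk_2link(math.pi / 4, math.pi / 6))
```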
-
Hierarchical Reinforcement Learning (HRL): A variant of RL that decomposes complex tasks into multiple levels of abstraction. Higher-level policies might select sub-tasks or goals, while lower-level policies handle the details of achieving those sub-tasks. This hierarchical structure can make learning more tractable in large or complex environments.
-
Human-Robot Interaction (HRI): An interdisciplinary field focusing on how humans and robots understand and influence each other. In embodied AI, agents often need to communicate with humans, understand social cues, or safely and effectively operate around people.
-
Imitation Learning (Behavior Cloning): An approach where an agent learns to perform tasks by mimicking expert demonstrations. Instead of learning solely through trial and error, the agent can use labeled examples (e.g., from human teleoperation or demonstration) to speed up the acquisition of task-specific behaviors.
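Behavior cloning is essentially supervised learning on recorded (observation, action) pairs; a minimal sketch assuming PyTorch, with random tensors standing in for real demonstrations:

```python
# Behavior cloning as plain supervised regression on expert demonstrations.
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 3
demo_obs = torch.randn(256, obs_dim)   # stand-ins for recorded observations
demo_act = torch.randn(256, act_dim)   # stand-ins for the expert's actions

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(100):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)   # match the expert's actions
    optim.zero_grad()
    loss.backward()
    optim.step()
```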
-
Inverse Kinematics: Finding joint angles/configurations that achieve a desired end-effector pose or position. Often more complex than forward kinematics, sometimes requiring iterative solutions.
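For the same planar two-link arm used in the forward-kinematics sketch above, an analytic solution exists; this sketch returns the "elbow-down" branch:

```python
# Analytic inverse kinematics for a planar 2-link arm (one of two solution branches).
import math

def ik_2link(x, y, l1=0.3, l2=0.2):
    """Return (theta1, theta2) reaching (x, y), or None if the target is out of reach."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        return None                       # target outside the workspace
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```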
-
Kalman Filter: An algorithm for optimal state estimation in linear systems with Gaussian noise, widely used for sensor fusion and tracking.
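A one-dimensional sketch with a random-walk motion model; the noise values are illustrative:

```python
# 1D Kalman filter sketch (e.g., estimating a position from noisy range readings).
def kalman_1d(measurements, process_var=1e-3, meas_var=0.1):
    x, p = 0.0, 1.0                 # state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var            # predict: uncertainty grows between measurements
        k = p / (p + meas_var)      # Kalman gain
        x += k * (z - x)            # update: blend prediction and measurement
        p *= (1 - k)
        estimates.append(x)
    return estimates

print(kalman_1d([1.2, 0.9, 1.1, 1.0]))
```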
-
Markov Decision Process (MDP): A framework defining an environment in terms of states, actions, transition probabilities, and rewards. RL commonly assumes an MDP structure when states are fully observable.
-
Model Predictive Control (MPC): A control method that uses a model of the agent’s dynamics to predict future states. At each time step, MPC optimizes a control input over a finite horizon, then applies only the first control in the computed sequence before re-optimizing at the next time step.
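A sketch of the receding-horizon loop using simple random-shooting optimization; `dynamics_model` and `cost` are placeholders for the user's own learned or analytic model and objective:

```python
# Random-shooting MPC sketch: sample candidate action sequences, roll them out
# through a dynamics model, apply only the first action of the best one, replan.
import numpy as np

def mpc_step(state, dynamics_model, cost, horizon=10, num_candidates=500, act_dim=2):
    sequences = np.random.uniform(-1, 1, size=(num_candidates, horizon, act_dim))
    best_seq, best_cost = None, float("inf")
    for seq in sequences:
        s, total = state, 0.0
        for a in seq:                       # predict `horizon` steps ahead
            s = dynamics_model(s, a)        # placeholder dynamics model
            total += cost(s, a)             # placeholder cost function
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]                      # apply only the first action, then replan
```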
-
Motion Planning: The technique of determining a sequence of valid configurations that move an agent from its initial position to a goal position without collisions. Methods include algorithms like Rapidly-exploring Random Trees (RRTs) or Probabilistic Roadmaps (PRMs), often combined with optimization for smoothness or energy efficiency.
-
Multi-Agent Systems: Systems in which multiple embodied agents interact or collaborate in a shared environment. Research includes studying cooperation, coordination, or competition among multiple agents, each with its own goals or shared objectives.
-
Multi-modal Learning: A technique that uses different types of data—vision, audio, tactile, proprioception, etc.—to train AI models. In embodied AI, combining multiple modalities helps an agent develop a richer understanding of the environment and improves task performance.
-
Navigation: The task of planning and executing a route or path in an environment. An embodied agent may use mappings, cost functions, and motion planning algorithms to navigate efficiently while avoiding obstacles.
-
Partial Observability: A scenario where an agent cannot fully observe the true state of the environment, leading to uncertainties. Often modeled using POMDPs (Partially Observable Markov Decision Processes).
-
Particle Filter: A sampling-based method for state estimation in non-linear, non-Gaussian systems. Maintains a set of hypotheses (“particles”) of the current state.
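A minimal one-dimensional sketch of the predict, update, and resample cycle; the noise levels are illustrative:

```python
# Minimal 1D particle filter step: propagate hypotheses with noise, weight them by
# how well they explain the measurement, then resample in proportion to the weights.
import numpy as np

def particle_filter_step(particles, control, measurement, meas_std=0.2):
    # Predict: move every particle by the commanded motion plus process noise.
    particles = particles + control + np.random.normal(0, 0.05, size=particles.shape)
    # Update: weight particles by the likelihood of the observed measurement.
    weights = np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2) + 1e-12
    weights /= weights.sum()
    # Resample: keep particles in proportion to their weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = np.random.uniform(0, 5, size=1000)   # initial position hypotheses
particles = particle_filter_step(particles, control=0.1, measurement=2.3)
```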
-
Perception: The process of interpreting sensory data from an agent’s environment (e.g., images from cameras, signals from tactile sensors). In embodied AI, perception is central to guiding action: the agent uses sensor inputs to make sense of its current state before deciding what to do next.
-
Policy: A mapping from the perceived state (or observation) to an action. In RL, a policy can be learned using algorithms like Q-learning, policy gradients, or actor-critic methods. A well-learned policy lets an agent select actions that maximize rewards based on its state.
-
POMDP (Partially Observable Markov Decision Process): An extension of MDPs where the agent receives observations that are probabilistically related to the underlying (hidden) states, reflecting real-world uncertainty.
-
Proprioception: An organism’s sense of the relative position of its own parts—muscle tensions, joint angles, orientation. In robotics, proprioception can refer to data from internal sensors (encoders, current sensing, etc.) indicating joint angles or torque, used for feedback and control.
-
Reinforcement Learning (RL): A learning paradigm where an agent interacts with an environment and improves its performance by maximizing a reward signal. In embodied AI, RL is often used to teach robots or virtual agents to perform tasks through trial and error, guided by rewards for actions that lead to desired outcomes.
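The tabular Q-learning update is the textbook trial-and-error example; a minimal sketch with illustrative hyperparameters:

```python
# Tabular Q-learning update rule.
import numpy as np

num_states, num_actions = 16, 4
Q = np.zeros((num_states, num_actions))
alpha, gamma = 0.1, 0.99   # learning rate and discount factor

def q_update(s, a, reward, s_next):
    """One Q-learning step after observing (state, action, reward, next state)."""
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```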
-
Reward Function: A mathematical function that quantifies how “good” or “bad” an action’s outcome is for achieving an agent’s goal. In RL, the agent tries to maximize cumulative reward over time. Designing a suitable reward function is often crucial in embodied tasks to encourage desired behavior.
-
Reward Shaping: Modifying or adding auxiliary rewards to guide an agent’s learning toward desired behaviors, especially when the main reward is sparse or delayed.
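A common, well-studied form is potential-based shaping, which (per Ng et al., 1999) leaves the optimal policy unchanged; a sketch with a hypothetical distance-based potential:

```python
# Potential-based reward shaping: add F = gamma * phi(s') - phi(s) to the task reward.
gamma = 0.99

def phi(state):
    """Hypothetical potential: higher when the agent is closer to the goal."""
    return -state["distance_to_goal"]

def shaped_reward(sparse_reward, state, next_state):
    return sparse_reward + gamma * phi(next_state) - phi(state)
```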
-
Sensor Fusion: Combining data from multiple sensors (e.g., cameras, IMUs, depth sensors) to achieve more robust, accurate perception and state estimation. Sensor fusion helps an embodied agent handle noise or missing information from any single sensor.
-
Simulation-to-Real Transfer (Sim2Real): The process of training or developing an AI policy in a simulated environment and then transferring it to the real world. Sim2Real is a major challenge because simulations never perfectly replicate reality (the “reality gap”). Researchers use techniques like domain randomization to improve robustness to this gap.
-
SLAM (Simultaneous Localization and Mapping): The process of building a map of an unknown environment while simultaneously tracking the agent’s position within that map. Commonly used in robotics for navigation and obstacle avoidance.
-
Sparse Rewards: A reward scheme where an agent only receives a reward upon completing a task or after a significant event. Learning is harder under sparse rewards because the agent gets little intermediate feedback to guide its behavior.
-
State Representation: How an agent internally represents the environment and itself, typically as a compressed set of features or latent variables. Good state representations can significantly improve learning efficiency and policy performance in embodied tasks.
-
Task and Motion Planning (TAMP): A combined approach to planning high-level tasks (e.g., “pick up object”) and low-level motions (e.g., joint trajectories). TAMP ensures logical feasibility (task) and physical feasibility (motion).
-
Trajectory Optimization: An approach to motion planning that formulates the agent’s path as an optimization problem (e.g., minimizing energy or time). Trajectory optimization calculates the entire path from start to goal under dynamic, kinematic, and environmental constraints.
-
Transfer Learning: A method of reusing knowledge gained from one task or domain in another, related task or domain. In embodied AI, transfer learning can accelerate training by leveraging skills already learned in one environment or scenario for a new one.
-
Waypoint Navigation: A navigation strategy that splits a route into multiple intermediate goals or “waypoints,” simplifying path planning and control in large or complex spaces.
-
Zero-Shot / Few-Shot Learning: Learning paradigms that aim to generalize from extremely limited labeled data (none or very little). For embodied AI, zero- or few-shot learning can be crucial in scenarios where large amounts of data are expensive or difficult to collect (e.g., complicated or unsafe tasks).