Google has launched two new AI models, Gemini Robotics and Gemini Robotics ER, designed to help robots think, move, and interact like humans. According to Google, for AI to be truly useful in the real world, it must have 'embodied reasoning': the ability to understand, react, and take action safely in its environment, just like humans do.
What is Gemini Robotics?
Gemini Robotics is an advanced AI model that integrates vision, language, and action, enabling robots to understand commands, interact with objects, and carry out tasks. Built on Gemini 2.0, it introduces physical actions as a new output modality, allowing robots to move and respond to their surroundings. The model focuses on three key capabilities: adaptability, interaction, and precision. It enables robots to adjust to new situations, engage naturally with people, and complete delicate tasks such as folding paper or opening a bottle cap.
Gemini Robotics also responds to natural language instructions in real time: it continuously monitors its surroundings, detects changes, and adjusts its actions accordingly. This makes it particularly useful in homes, workplaces, and industrial settings.
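To make that loop concrete, here is a minimal sketch of what a closed perceive-instruct-act cycle could look like from a developer's perspective. It uses the public google-generativeai Python SDK, but the model name and the action-text convention are hypothetical: Google has not released Gemini Robotics through this API, so treat the snippet as an illustration of the pattern, not the product.

```python
# Minimal sketch of a closed perceive-instruct-act loop (illustrative only).
# The model name "gemini-robotics" is hypothetical; the real model is not
# publicly available through this SDK.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-robotics")  # hypothetical model name

def control_step(camera_frame: Image.Image, instruction: str) -> str:
    """Send the latest camera frame plus a natural-language instruction,
    and receive a description of the next action to execute."""
    response = model.generate_content([
        f"Instruction: {instruction}. Given the current scene, "
        "describe the next low-level action for the robot.",
        camera_frame,
    ])
    return response.text

# Calling control_step() on every camera frame is what lets the model notice
# mid-task changes (say, an object being moved) and revise its behaviour.
```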
Google emphasises that robots come in different shapes and sizes, so Gemini Robotics is designed to be flexible. It was trained primarily on the bi-arm robotic platform ALOHA 2, but it also works with other hardware, such as the Franka arms commonly used in research labs.
What is Gemini Robotics ER?
Alongside Gemini Robotics, Google has introduced Gemini Robotics ER, which enhances how robots understand space, objects, and motion. It is specifically designed to help engineers integrate AI with existing robotic systems, making AI-powered robots more efficient and capable. The model significantly improves spatial awareness, 3D object recognition, and movement planning. For example, when shown a coffee mug, it can intuitively determine an appropriate way to grasp it and plan a safe path to pick it up.
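Google has documented that Gemini 2.0 models can return 2D bounding boxes and points when prompted, which hints at how this kind of spatial query might look in code. The sketch below uses a general-purpose Gemini model as a stand-in, since Gemini Robotics ER itself is not generally available; the prompt wording and the JSON schema are assumptions for illustration.

```python
# Sketch of a spatial-understanding query: ask the model where an object is
# and where to grasp it. "gemini-2.0-flash" is a stand-in for Gemini
# Robotics ER; the JSON schema below is an assumed convention.
import json
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash")  # stand-in model

scene = Image.open("workbench.jpg")
prompt = (
    "Locate the coffee mug. Reply with JSON only, with keys "
    "'box_2d' ([ymin, xmin, ymax, xmax], coordinates normalised to 0-1000) "
    "and 'grasp_point' ([y, x]) for a stable two-finger grasp on the handle."
)
response = model.generate_content([prompt, scene])

# Models often wrap JSON in a markdown fence; strip it before parsing.
text = response.text.strip()
if text.startswith("```"):
    text = text.strip("`").removeprefix("json").strip()
result = json.loads(text)
print(result["box_2d"], result["grasp_point"])
```

A real system would still need to convert these normalised image coordinates into the robot's workspace frame using its camera calibration before planning any motion.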
Gemini Robotics ER performs all the essential steps needed for robot control, including perception, spatial understanding, movement planning, and generating code to drive the robot. By combining advanced reasoning with real-world interaction, it takes AI-driven robotics one step closer to working the way humans do.
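The code-generation step can be pictured as follows: the model is shown the scene and a short description of the robot's control functions, and is asked to write a program against them. This is a hedged sketch only; the model name and the robot functions (move_to, open_gripper, close_gripper) are invented stand-ins, and any generated code would need to be reviewed or sandboxed before running on real hardware.

```python
# Sketch of robot control via code generation (illustrative only).
# Both the model name and the robot API below are hypothetical stand-ins.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-robotics-er")  # hypothetical model name

ROBOT_API_DOC = """
Available functions (workspace frame, millimetres):
  move_to(x, y, z)   # move the gripper to a position
  open_gripper()
  close_gripper()
"""

def plan_as_code(camera_frame: Image.Image, task: str) -> str:
    """Ask the model to write a short Python program for the task,
    restricted to the documented robot functions."""
    response = model.generate_content([
        f"{ROBOT_API_DOC}\nTask: {task}\n"
        "Write Python code that uses only the functions listed above.",
        camera_frame,
    ])
    return response.text  # review/sandbox before executing on hardware

# Example: plan_as_code(frame, "pick up the coffee mug by its handle")
```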