Disney Unveils New Lifelike Animatronic Robots with a Human-Like Gaze

Disney has created new humanoid robots to be implemented at its parks. (Image courtesy of Disney Research.)

Disney, a leader in cutting-edge technology for shows, movies and storytelling overall, has created yet another robot to add to its theme parks.

After years of research, the company has released information about its new animatronic figures (Disney’s second robotic addition to its parks, following the Stickman robot), which aim to be lifelike robots combining audio and visual elements. They will be used at Disney theme parks to provide interactive entertainment for guests.

In a recent research paper titled “Realistic and Interactive Robot Gaze,” Disney delves into how it created a system for lifelike gaze in human-robot interactions using a humanoid Audio-Animatronics bust. The team built the technology around a general architecture that uses animation principles to reinforce believability. According to the mega entertainment corporation, the new robot will let it offer an interactive human-robot experience with human-like gaze behaviors.

Disney's Audio-Animatronics figures. (Image courtesy of Disney Research.)

The robot was developed by engineers at Disney Research and Walt Disney Imagineering, along with robotics researchers from the University of Illinois at Urbana-Champaign and the California Institute of Technology.

Each interaction focuses on accuracy and believability to present an illusion of life. Animators have worked extensively to design lifelike motions, with a particular focus on the robot’s gaze. The humanoid identifies a guest and the surrounding environment so it can interact with them appropriately.

Gaze is a social signal that reveals one’s emotions and thoughts. Interestingly, the Audio-Animatronics figure can saccade to show visual interest, a complex human behavior. The robot rapidly looks back and forth between a guest’s eyes and nose, holding each fixation for 0.1 to 0.5 seconds, at a frequency of 20 Hz. This occurs in the glance and engage states.
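
To make the timing concrete, here is a minimal sketch of such a saccade pattern, assuming the 20 Hz figure refers to the rate of the gaze update loop and that fixation targets and dwell times are chosen at random; the names and values here are illustrative, not from Disney’s implementation.

```python
import random

# Hypothetical gaze targets on a guest's face (illustrative, not from the paper).
TARGETS = ["left_eye", "right_eye", "nose"]

def saccade_stream(duration_s=5.0, rate_hz=20.0):
    """Yield one fixation target per tick of a 20 Hz loop,
    dwelling 0.1-0.5 s on each target before saccading to a new one."""
    dt = 1.0 / rate_hz
    t = 0.0
    target = random.choice(TARGETS)
    dwell = random.uniform(0.1, 0.5)
    while t < duration_s:
        yield t, target
        t += dt
        dwell -= dt
        if dwell <= 0:  # dwell expired: saccade to a different point
            target = random.choice([x for x in TARGETS if x != target])
            dwell = random.uniform(0.1, 0.5)

for t, target in saccade_stream(duration_s=1.0):
    print(f"{t:4.2f}s -> {target}")
```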

The following video describes how the figure interacts with visitors. Animatronics take on personas and scripts to tell a story or enact a scene. In this case, the humanoid animatronic bust plays an old man reading a book. The man has poor hearing and eyesight but is constantly distracted by his environment. He often stares at people who interrupt him, with either disapproval or acknowledgement.

To make the Animatronics figure look more real, Disney focused primarily on the animation principle of overlapping action: parts of the robot naturally move at different rates during a motion. For example, the figure moves its eyes first when turning to glance or engage, which makes the action look more natural as it transitions into mutual gazing.
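
A simple way to picture overlapping action is two first-order lags chasing the same target at different rates, so the eyes arrive before the head. This is an illustrative sketch, not Disney’s controller; the rates are invented.

```python
def step_toward(current, target, rate, dt):
    """First-order lag: a higher rate means the joint reaches the target sooner."""
    return current + (target - current) * min(1.0, rate * dt)

def turn_to_gaze(target_deg, steps=20, dt=0.05):
    eye, neck = 0.0, 0.0
    for _ in range(steps):
        eye = step_toward(eye, target_deg, rate=12.0, dt=dt)   # eyes lead
        neck = step_toward(neck, target_deg, rate=3.0, dt=dt)  # head lags behind
        print(f"eye={eye:6.1f} deg   neck={neck:6.1f} deg")

turn_to_gaze(30.0)
```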

Rather than simply imitating human qualities, Disney relies on basic animation principles. It uses arcs, natural motions that follow curved trajectories, and slow in and slow out, the acceleration and deceleration at the beginning and end of actions.
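
Slow in and slow out is commonly implemented as an easing curve that starts and ends with zero velocity. The sketch below uses a cosine smoothstep as one plausible choice; the paper does not specify the actual easing function.

```python
import math

def ease_in_out(u):
    """Cosine smoothstep easing on u in [0, 1]:
    zero velocity at both ends (slow in / slow out)."""
    return 0.5 - 0.5 * math.cos(math.pi * u)

# A head turn from 0 to 40 degrees, retimed with easing.
for i in range(11):
    u = i / 10
    print(f"t={u:3.1f}  angle={40 * ease_in_out(u):5.1f} deg")
```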

To perform its actions, the Audio-Animatronics figure has 19 degrees of freedom (DOFs). In the current system, three DOFs are in the robot’s neck, two in the eyes, two in the eyelids, and two in the eyebrows. The remaining DOFs will eventually drive jaw and lip movements but are not part of the current system.
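
As a rough data-structure sketch, the joint budget could be laid out as follows; the grouping matches the article, but the names are assumptions.

```python
# Joint layout as described in the article; names are illustrative.
CURRENT_DOFS = {
    "neck": 3,
    "eyes": 2,
    "eyelids": 2,
    "eyebrows": 2,
}
TOTAL_DOFS = 19
# The remaining 10 DOFs are reserved for jaw and lip movement (not yet active).
PLANNED_DOFS = {"jaw_and_lips": TOTAL_DOFS - sum(CURRENT_DOFS.values())}
print(PLANNED_DOFS)  # {'jaw_and_lips': 10}
```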

The robot employs a MYNT EYE D-1000 RGB-D camera for depth perception. A sensor in the robot’s chest area alerts it when to turn and face a person in front of it. The camera and perception engine first detect guests as 2D skeletons. The camera’s integrated depth computation then augments this feed to provide 3D joint locations for the eyes, nose, ears, neck, shoulders, elbows, wrists, hips, knees, ankles, big toes, small toes and heels.
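
Lifting a 2D keypoint plus its depth reading into a 3D joint location is a standard pinhole back-projection. The sketch below illustrates the idea with made-up camera intrinsics; it is not Disney’s perception code.

```python
import numpy as np

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift a 2D keypoint (pixels) and its depth reading into 3D camera space."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Illustrative intrinsics and a detected 'nose' keypoint (values made up).
fx = fy = 700.0
cx, cy = 640.0, 360.0
nose_3d = backproject(u=700.0, v=300.0, depth_m=1.8, fx=fx, fy=fy, cx=cx, cy=cy)
print(nose_3d)  # [x, y, z] in meters
```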

The camera’s infrared module emits and detects infrared radiation to sense the surroundings and to see in all lighting conditions. It has horizontal and vertical fields of view (FOVs) of 105° and 58°, compared to a human’s roughly 200° and 135°, and it senses depth within a range of 0.3 to 10 meters.

The humanoid’s main operating system runs custom proprietary software on a 100 Hz real-time loop. The system comprises three modules: the attention engine, the behavior selection engine, and the behavior library.
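
Structurally, that amounts to a fixed-rate loop that polls the three modules in turn. The sketch below is a guess at the shape of such a loop, with the engines stubbed out; the real software is proprietary.

```python
import time

class AttentionEngine:
    def update(self):
        return {}  # placeholder: would return curiosity scores per guest

class BehaviorSelection:
    def update(self, stimuli):
        return "read"  # placeholder: would pick glance/engage/acknowledge

class BehaviorLibrary:
    def play(self, state):
        pass  # placeholder: would command the figure's motors

def run_loop(steps=500, rate_hz=100.0):
    attention, selection, library = AttentionEngine(), BehaviorSelection(), BehaviorLibrary()
    dt = 1.0 / rate_hz
    for _ in range(steps):
        start = time.perf_counter()
        state = selection.update(attention.update())
        library.play(state)
        # Sleep off the rest of the 10 ms budget to hold 100 Hz.
        time.sleep(max(0.0, dt - (time.perf_counter() - start)))

run_loop(steps=100)
```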

The attention engine identifies features of guests in the environment that attract the Animatronics figure’s attention (salient stimuli), based on the movements of the fitted skeletons. It generates a curiosity score for each person in view, weighing what that person is doing and how far away they are. If a person waves at the robot, their score rises and the robot responds to them accordingly.
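
A toy version of such a scoring function might combine distance with simple activity flags, as below. The weights and features are invented for illustration; the paper’s actual scoring is more involved.

```python
def curiosity_score(distance_m, is_waving, is_moving):
    """Toy scoring: closer and more animated guests draw more attention.
    Weights are illustrative, not from the paper."""
    score = max(0.0, 1.0 - distance_m / 10.0)  # nearer guests score higher
    if is_moving:
        score += 0.3
    if is_waving:
        score += 0.6  # waving is a strong bid for attention
    return score

print(curiosity_score(distance_m=2.0, is_waving=True, is_moving=True))
```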

The attention engine works with the behavior selection engine to store data about familiar guests and their interactions. The process, similar to habituation, keeps the character from repeatedly responding to a single guest while ignoring others.
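
One plausible way to model this habituation is to discount a guest’s score by how often the figure has already engaged with them. The mechanism below is a sketch under that assumption, not Disney’s implementation.

```python
import collections

class Habituation:
    """Dampens the score of guests the figure has already engaged with."""
    def __init__(self, decay=0.5):
        self.familiarity = collections.defaultdict(float)
        self.decay = decay

    def adjust(self, guest_id, raw_score):
        # The more familiar a guest, the smaller their effective score.
        return raw_score / (1.0 + self.decay * self.familiarity[guest_id])

    def note_interaction(self, guest_id):
        self.familiarity[guest_id] += 1.0

h = Habituation()
print(h.adjust("guest_7", 1.5))   # first look: full score
h.note_interaction("guest_7")
print(h.adjust("guest_7", 1.5))   # repeat looks count for less
```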

The behavior selection engine represents higher-level reasoning. It directs the robot’s behavioral state and maintains information about the humanoid’s current state, curiosity thresholds, and state timeout durations, as well as corresponding information about the guests. In this case, the old man has four behavioral states. The default state is reading. The man glances when someone earns a high curiosity score, and then engages, locking onto the person of interest with both eyes and head. The last state is acknowledgement, triggered when the person of interest is deemed to be familiar.
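
Those rules suggest a small state machine. The sketch below mirrors the transitions described above with an invented threshold, and omits the timeout logic for brevity.

```python
# Behavioral states: read (default), glance, engage, acknowledge.
def next_state(state, score, locked_on, familiar, glance_threshold=0.8):
    """Toy transition rules mirroring the article; the threshold is invented."""
    if familiar:
        return "acknowledge"   # familiar person of interest
    if state == "read" and score > glance_threshold:
        return "glance"        # someone earned a high curiosity score
    if state == "glance" and locked_on:
        return "engage"        # eyes and head both on the person of interest
    return state               # otherwise stay in the current state

print(next_state("read", score=1.2, locked_on=False, familiar=False))  # glance
```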

The behavior library stores all the motor actions and movements in the Animatronics system. These can run simultaneously, sequentially, on repeat, or at random. The zero show is when the robot is turned off and its jaw remains closed. Next, there is an alive show, in which the robot breathes and blinks. There is also a show for each behavioral state: read, glance, engage and acknowledge.
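
Conceptually, the library maps each show to a set of motion clips with a playback mode. The sketch below is a minimal illustration; all show and clip names are made up.

```python
import itertools, random

# Show -> motion clips; names are invented for illustration.
SHOWS = {
    "zero": ["hold_neutral_jaw_closed"],
    "alive": ["breathe", "blink"],
    "read": ["look_down", "turn_page"],
    "glance": ["eyes_to_target"],
    "engage": ["head_to_target", "mutual_gaze"],
    "acknowledge": ["nod"],
}

def playback(show, mode="sequence"):
    """Return the clips for a show under one playback mode.
    (Simultaneous playback would layer shows, e.g. 'alive' under 'read'.)"""
    clips = SHOWS[show]
    if mode == "sequence":
        return iter(clips)             # one after the other
    if mode == "repeat":
        return itertools.cycle(clips)  # loop indefinitely
    if mode == "random":
        return iter([random.choice(clips)])
    raise ValueError(mode)

for clip in playback("alive"):
    print(clip)
```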

The overall system architecture. (Image courtesy of Disney Research.)

It’s hard to tell how it would all come together if the robot had a face. There are also still some challenges with the Animatronics figure. It can depict human behaviors, or at least appear to depict them accurately, at a distance and for short periods. But it is harder for the robot to be believable up close and over longer periods, since complex behaviors and social cues are difficult to mimic in a dynamic environment.

Disney is currently working to incorporate real-time AI and subsumption architecture, a behavior-based approach from autonomous robotics that uses sensory information to select actions. Subsumption architecture can add an emotional range to the robot, layering behaviors on incoming sensory inputs much as humans do. Higher behavioral levels will be able to override lower-level behaviors such as heartbeat, breathing and blinking.
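
In a classic subsumption architecture, behaviors are stacked in priority order and the highest active layer overrides the ones beneath it. The sketch below shows that pattern with invented layers and conditions; it is not Disney’s planned design.

```python
class Layer:
    def __init__(self, name, active_fn, command_fn):
        self.name, self.active, self.command = name, active_fn, command_fn

def subsumption_step(layers, sensors):
    """Highest active layer wins; it subsumes (overrides) lower layers."""
    for layer in layers:  # ordered from highest priority to lowest
        if layer.active(sensors):
            return layer.command(sensors)
    return None

# Lower layers keep the figure 'alive'; higher ones respond to guests.
layers = [
    Layer("acknowledge", lambda s: s.get("familiar_guest"), lambda s: "nod"),
    Layer("engage", lambda s: s.get("curiosity", 0) > 0.8, lambda s: "mutual_gaze"),
    Layer("alive", lambda s: True, lambda s: "breathe_and_blink"),
]
print(subsumption_step(layers, {"curiosity": 0.9}))  # -> 'mutual_gaze'
```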

As the technology improves, the impressive animatronic attractions may edge into Uncanny Valley territory: the idea that as robots appear more humanlike they become more appealing, until, at a certain point, the resemblance instead produces eerie feelings in guests. And while the Audio-Animatronics figures aim to be as lifelike as possible as the company attempts to create an illusion of life, Disney stresses that it is concentrating on animation rather than biology. Drawing on an enormous body of human-robot interaction (HRI) research, Disney just wants the robots to look right, and guests are already excited to see them.