Jenga-Playing Robot Might Build Your Next Smartphone

The party game Jenga is a test of players’ skill in gently removing blocks and placing them on top of the tower, with lots of laughter and teasing for whoever ends up knocking the structure over. It isn’t all fun and games for this most recent challenger, however.

Instead, a robot arm developed by engineers at MIT's MCube Lab is learning to play Jenga in order to master both tactile and visual feedback that will one day enable it to assemble smartphones and other small, delicate parts on a manufacturing line.

Carefully considering each move, the robot gently pokes at a Jenga tower, looking for the best block to extract without toppling the whole shebang, in a solitary, slow-moving, yet surprisingly agile game.

The robot is equipped with a soft-pronged gripper, a force-sensing wrist cuff, and an external camera. These enable the robot to both see and feel the tower and its individual blocks.

When the robot pushes against a block, a computer collects visual and tactile feedback from the camera and wrist cuff and compares these measurements with those from the robot's previous moves. The computer also weighs the outcomes of those earlier moves, such as whether a block was successfully extracted, given the configuration of the surrounding blocks and the amount of force used to push. This enables the robot to learn in real time whether to continue pushing a block or move on to a new one, in order to prevent the tower from falling down.
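The article doesn't specify how the robot makes this comparison, but the idea of "compare the current reading to past moves and their outcomes" can be sketched as a simple nearest-neighbor vote. Everything here is illustrative: the measurements, the `should_keep_pushing` function, and the example readings are assumptions, not the MIT system's actual code.

```python
# Hypothetical sketch: decide whether to keep pushing a block by
# comparing the current (force, displacement) reading to past attempts
# and voting among the most similar ones.
from math import dist

# (force in N, block displacement in mm) -> did the extraction succeed?
past_attempts = [
    ((0.2, 1.5), True),
    ((0.8, 0.1), False),   # high force, block barely moved: stuck
    ((0.3, 2.0), True),
    ((0.9, 0.2), False),   # another stuck block
]

def should_keep_pushing(reading, history, k=3):
    """Vote among the k past attempts most similar to this reading."""
    nearest = sorted(history, key=lambda a: dist(a[0], reading))[:k]
    successes = sum(1 for _, ok in nearest if ok)
    return successes > len(nearest) // 2

print(should_keep_pushing((0.25, 1.8), past_attempts))   # a freely moving block
print(should_keep_pushing((0.85, 0.15), past_attempts))  # a stuck block
```

In this toy version, a block that moves easily under light force looks like past successes, so the robot keeps pushing; a block that resists looks like past failures, so the robot moves on.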

Alberto Rodriguez, an assistant professor in MIT’s Department of Mechanical Engineering, says their robot demonstrates something that’s been difficult to achieve in previous systems: the ability to quickly learn the best way to carry out a task using tactile physical interactions as well as visual cues.

“Unlike in more purely cognitive tasks or games such as chess or Go, playing the game of Jenga also requires mastery of physical skills such as probing, pushing, pulling, placing and aligning pieces. It requires interactive perception and manipulation, where you have to go and touch the tower to learn how and when to move blocks,” Rodriguez explains. “This is very difficult to simulate, so the robot has to learn in the real world, by interacting with the real Jenga tower. The key challenge is to learn from a relatively small number of experiments by exploiting common sense about objects and physics.”

The Jenga-playing robot demonstrates something that’s been tricky to attain in previous systems: the ability to quickly learn the best way to carry out a task, not just from visual cues, as it is commonly studied today, but also from tactile, physical interactions. (Image courtesy of MIT.)

Rodriguez believes that the tactile learning system developed for the robot can be used in applications beyond Jenga, particularly tasks that require careful physical interaction: anything from separating recyclable objects from landfill trash to assembling delicate consumer products.

“In a cellphone assembly line, in almost every single step, the feeling of a snap-fit, or a threaded screw, is coming from force and touch rather than vision,” Rodriguez says. “Learning models for those actions is prime real-estate for this kind of technology.”

Learning When to Push and When to Pull

In the game of Jenga — named from the Swahili word for “build” — 54 rectangular blocks are stacked in 18 layers of three blocks each. The blocks in each layer are oriented perpendicular to the layers above and below. The aim of the game is to carefully extract a block and place it at the top of the tower to build a new level, without toppling the entire structure.
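The tower layout described above is easy to model directly. This short sketch (an illustration, not part of the researchers' system) builds the 54-block, 18-layer structure with alternating layer orientations.

```python
# A minimal model of a Jenga tower: 18 layers of 3 blocks each,
# with each layer rotated 90 degrees from the layers above and below.
def build_tower(layers=18, per_layer=3):
    tower = []
    for level in range(layers):
        orientation = "x" if level % 2 == 0 else "y"  # alternate axes per layer
        tower.append([(level, slot, orientation) for slot in range(per_layer)])
    return tower

tower = build_tower()
print(sum(len(layer) for layer in tower))  # 54 blocks in total
```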

Programming a robot to play Jenga using traditional machine-learning schemes would need to capture everything that could possibly happen between a block, the robot and the rest of the tower. This would be a computationally expensive undertaking requiring data from thousands—if not tens of thousands—of block-extraction attempts.

Rather than pursuing such a laborious method, Rodriguez and his colleagues sought a more data-efficient way for the robot to learn to play Jenga—a method inspired by human cognition and the way a person might approach the game.

The robot is a customized version of an industry-standard ABB IRB 120 robotic arm. The researchers set up a Jenga tower within reach of the robot, and then began a training period in which the robot first chose a random block and a location on the block against which to push. The robot then exerted a small amount of force against the block in an attempt to push it out of the tower.

A computer recorded the visual and force measurements associated with each block attempt, and labeled whether each attempt was successful.
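The training procedure described above — pick a random block and push point, apply a small force, record the measurements, and label the outcome — can be sketched as a simple data-collection loop. The `push_block` physics stub and all numeric thresholds below are assumptions for illustration; the real robot measures these quantities from its sensors.

```python
# Hypothetical sketch of the training loop: random block, random push
# point, small push, then record (measurements, success/failure label).
import random

random.seed(0)  # reproducible toy run

def push_block(level, slot, push_point):
    """Stand-in for the real robot/tower interaction (an assumption)."""
    force = random.uniform(0.1, 1.0)         # measured push force (N)
    displacement = random.uniform(0.0, 2.0)  # how far the block moved (mm)
    success = displacement > 0.5 and force < 0.7  # toy success criterion
    return force, displacement, success

training_data = []
for _ in range(300):  # roughly the number of attempts in the real training
    level = random.randrange(18)                # pick a random layer
    slot = random.randrange(3)                  # pick a block in that layer
    push_point = random.uniform(0.0, 1.0)       # where on the block face to push
    force, displacement, success = push_block(level, slot, push_point)
    training_data.append(((force, displacement), success))

print(len(training_data))  # 300 labeled attempts
```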

Unlike typical machine-learning methods that rely on huge data sets to decide their next best action, MIT’s Jenga-playing robot learns and uses a hierarchical model that enables the gentle and accurate extraction of pieces. This model allows the robot to estimate the state of a piece, simulate possible moves and decide on a favorable one. (Image courtesy of MIT.)

Rather than having the robot carry out tens of thousands of such attempts—which would involve reconstructing the tower almost as many times—the researchers trained it on just around 300. Attempts that had similar measurements and outcomes were grouped into clusters representing certain block behaviors. For example, one cluster of data might represent attempts on a block that was hard to move, versus a block that was easier to move, or that toppled the tower when moved. For each data cluster, the robot developed a simple model to predict a block’s potential behavior according to its current visual and tactile measurements.
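The clustering idea above — group attempts with similar measurements and outcomes, then fit a simple model per cluster — can be sketched with a crude discretization. This is a simplified stand-in, not the authors' actual method: the bucket size, the example readings, and the per-cluster "model" (an observed success rate) are all assumptions.

```python
# Sketch: cluster attempts by discretized (force, displacement) readings,
# then keep a per-cluster success rate as a tiny predictive model.
from collections import defaultdict

# (force, displacement) reading and whether the extraction succeeded
attempts = [
    ((0.2, 1.6), True), ((0.3, 1.8), True), ((0.25, 1.5), True),   # easy blocks
    ((0.8, 0.1), False), ((0.9, 0.2), False), ((0.85, 0.15), False),  # stuck blocks
    ((0.5, 0.9), True), ((0.55, 0.8), False),                      # borderline
]

def cluster_key(reading, bucket=0.5):
    """Discretize a reading so similar attempts fall in the same cluster."""
    force, disp = reading
    return (int(force / bucket), int(disp / bucket))

clusters = defaultdict(list)
for reading, success in attempts:
    clusters[cluster_key(reading)].append(success)

# Per-cluster model: the observed probability of a successful extraction.
models = {key: sum(out) / len(out) for key, out in clusters.items()}

def predict_success(reading):
    return models.get(cluster_key(reading), 0.5)  # 0.5 for unseen behavior

print(predict_success((0.27, 1.7)))  # falls in the "easy block" cluster
```

Instead of one model covering every possible block interaction, each small cluster gets its own cheap model, which is what makes learning from only a few hundred attempts plausible.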

Nima Fazeli, lead author of the paper and a graduate student at MIT, says this clustering technique is inspired by the natural way in which humans group similar behaviors, and that it dramatically increases the efficiency with which the robot can learn to play the game.

“The robot builds clusters and then learns models for each of these clusters, instead of learning a model that captures absolutely everything that could happen,” Fazeli said.

Stack ’Em Up to Knock ’Em Down

The researchers tested their approach against other state-of-the-art machine-learning algorithms in a computer simulation of the game built with the MuJoCo physics simulator. The lessons learned in simulation informed the researchers of how the robot would learn to play Jenga in the real world.

“We provide to these algorithms the same information our system gets, to see how they learn to play Jenga at a similar level,” explains Miquel Oller, another member of the research team. “Compared with our approach, these algorithms need to explore orders of magnitude more towers to learn the game.”

Curious to see how their machine-learning approach stacks up against actual human players, the team carried out a few informal trials with several volunteers.

“We saw how many blocks a human was able to extract before the tower fell, and the difference was not that much,” Oller says.

This means the researchers still have a way to go before they can pit their robot against a human in competitive play. Truly successful Jenga playing requires strategy in addition to physical skill, such as extracting the specific block that will make it difficult for an opponent to pull out the next block without toppling the tower.

Right now, however, the team is less interested in developing a robotic Jenga champion and more focused on applying the robot’s new skills to other applications.

“There are many tasks that we do with our hands where the feeling of doing it ‘the right way’ comes in the language of forces and tactile cues,” Rodriguez says. “For tasks like these, a similar approach to ours could figure it out.”