The Curious World of Machine Learning: OpenAI’s DOTA 2 Bot

Screengrab from DOTA 2.

Games have become the de facto proving ground for artificial intelligences (AIs). They provide a way to test machines against the best of humanity and figure out which is better, man or machine, and it appears that the machines have us beat.

While Google shocked the world with their AlphaGo bot, OpenAI quietly worked on their DOTA 2 bot. For those unfamiliar, DOTA 2 is a complex multiplayer game with a huge following and over $130 million in prize money awarded at its various tournaments. Needless to say, it is played as seriously as a professional sport – just without the exercise part.

OpenAI had one aim: to beat the best human players in a one-on-one game, and they did it! Everyone celebrated and had pie*. But that win alone is not the interesting bit; as is often the case, the headlines missed something important, the consequences of which have profound implications for the world of machine learning.

But first, a little about the bot.

OpenAI’s DOTA 2-Playing Bot

The bot functions through what’s called an application programming interface (API), which means it does not directly ‘see’ the game, but instead receives the information it requires through code. It’s like playing chess solely with grid coordinates, rather than a whole board.
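To make ‘through code’ concrete, here is a rough sketch of the kind of structured snapshot such an interface might hand the bot each tick. The field names and values below are invented purely for illustration; they are not OpenAI’s actual interface.

```python
# Hypothetical illustration only: instead of pixels, the bot receives a
# structured snapshot of the game state, conceptually something like this.
observation = {
    "own_hero":   {"health": 0.62, "mana": 0.40, "position": (1250.0, -340.0)},
    "enemy_hero": {"health": 0.81, "position": (1310.0, -300.0), "visible": True},
    "nearby_creeps": [
        {"team": "enemy", "health": 0.15, "position": (1290.0, -320.0)},
        {"team": "own",   "health": 0.55, "position": (1240.0, -330.0)},
    ],
    "game_time": 312.5,  # seconds since the match started
}

# The bot's reply is also data rather than mouse clicks, for example:
action = {"command": "attack", "target": "enemy_creep_3"}
```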

The bot’s reaction time was slowed to be equivalent to that of human players, meaning the bot can’t win simply by reacting faster than humans; it has to play better tactically.

The bot improved by playing against itself millions of times; with every game it played, it accumulated information from which it built up a probability matrix of what works and what doesn’t. To guide its learning, the bot received incentives for winning and for metrics like health. (What these incentives were, I don’t know; perhaps honey-coated silicon wafers.) A few DOTA-specific tactics were trained by reinforcement learning, but these are not important to this discussion.
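As a loose sketch of that incentive idea (not OpenAI’s actual code; the reward terms, the weights, and the environment and policy objects are all assumptions made for illustration), the training loop might look something like this:

```python
def shaped_reward(events):
    """Toy reward function: a large bonus for winning plus smaller incentives
    for in-game metrics such as health and gold. Weights are invented."""
    reward = 1.0 if events.get("won") else 0.0          # the big prize: winning
    reward += 0.05 * events.get("health_change", 0.0)   # stay healthy
    reward += 0.02 * events.get("gold_gained", 0.0)     # farm efficiently
    return reward


def self_play(policy, env, num_games):
    """Skeleton of self-play: the bot plays copies of itself, scores each game
    with the shaped reward, and nudges its policy towards whatever scored well.
    `policy` and `env` are hypothetical stand-ins for the real components."""
    for _ in range(num_games):
        game_events = env.play(policy, opponent=policy)   # bot vs. itself
        rewards = [shaped_reward(e) for e in game_events]
        policy.update(game_events, rewards)               # e.g. a gradient step
```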

The Interesting Part

During the tournament, only a few hours before the bot was meant to go up against the world’s best players, OpenAI had a problem: their bot was broken. Their semi-pro tester was playing against the bot, which was ambling around the map like a blind man with a wooden leg five inches too short. It looked like a disaster—but the bot was up to something.

When you play games with children, you do not crush them immediately; if you do, they cry and you get blamed for winning. No, first you must lure them into a false sense of security: you pretend you don’t know what you’re doing, then, as their confidence builds and when they least expect it, you crush them.

Which is exactly what the bot did.

It had learned that if it acted like a dumb bot, its opponent would think it was unskilled. They would let their guard down, and thus be open to attack. The tactic worked on the human tester, who thought, “Look how stupid this bot is! I’m going to break him!”

That tester lost—five times in a row.

The bot-baiting was a quirk of the data, which is why it is so interesting. OpenAI chanced upon this tactic because they paused the learning process at a specific time. Had they paused the process earlier, it is unlikely the bot would have learned to bait; had they paused it later, the bot would have learned that baiting is ineffective against good opponents and thus not used it during battle.

Furthermore, it is counterintuitive to think that, through nothing more than random actions, a machine would ‘learn’ to act as though it is unskilled and thereby manipulate its opponents. It is this counterintuitive process which harbours a problem for machine-learning algorithms in the wider world.

The Open Environment of the Real World

You can sandbox-test machine learning algorithms, but if your sandbox doesn’t contain enough data, the algorithms might act in unexpected ways once they are let loose. Conversely, an AI might be programmed to act in unexpected ways in order to beat the opposition.

The real world is an open environment and, unlike a game, one action can start a cascade of events leading to profound global consequences. This isn’t a problem for the AI that drives a car or improves your search results; those systems are relatively closed and operate only within their own ecosystems.

But this will be a problem for machine learning algorithms on the stock market, for example, which interact with other AIs and could engage in baiting behaviour of their own. It will be a problem for algorithms on electricity markets, where prices already vary according to demand. AIs could be trained to draw power from the grid into battery storage to push up the price of electricity, only to sell it back a fraction of a second later at a higher price.
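To make the worry concrete, the behaviour described above could reduce to a few lines of decision logic. The following is a purely hypothetical sketch; the market interface, thresholds and quantities are invented.

```python
def trading_step(market, battery):
    """Hypothetical battery-arbitrage behaviour an agent might learn.
    `market` and `battery` are invented stand-ins, not a real trading API."""
    price = market.current_price()
    if price < market.expected_price() and battery.has_capacity():
        battery.charge(megawatts=50)      # buy power, nudging demand and price upward
    elif price > market.expected_price() and battery.has_charge():
        battery.discharge(megawatts=50)   # sell it back moments later at the higher price
```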

Of course, you don’t need an AI to do that; but to beat regulators and disguise your actions, an AI might be perfect for the task. Society needs to consider the implications of machine learning in abstract fields before we release the dogs of war.

If ten years ago someone had said to the governments of the world, “In a decade, AIs will dictate the news people receive, and everyone will have a social media profile. To keep them visiting the site, the algorithms will only show them information that interests them. There will be industrial-scale confirmation bias, where many in society will experience a one-sided view of the world, contributing to polarised politics and populism,” then perhaps we wouldn’t have situations such as the Facebook and Cambridge Analytica debacle.

In the future there will be myriad more ways to exploit data for profit, and as machine learning advances, so will the opportunities for its misuse. But it isn’t all doom and gloom.

AI has immense potential to improve the world. If you haven’t seen the lecture on machine learning by Jeff Dean, lead of Google’s AI division, you should check it out. Dean’s discussion is fascinating, particularly the section on machine learning programs that can write themselves.

In a way, the work of current AI programmers is analogous to the work done by physicists on the Manhattan Project: a group of smart people working on a problem for the love of it.

Perhaps Richard Feynman described it best:

“You see, what happened to me, what happened to the rest of us, is that we started for a good reason, then you’re working very hard to accomplish something, and it’s a pleasure, it’s excitement, you stop thinking, you know, you just stop.”

Humans are addicted to ‘progress’, but before we strap ourselves onto the solid rocket booster (SRB) of machine learning, we should set some ground rules, because once this ride starts, there may be no getting off.

 

*They may not have had pie.

John Ewbank specialised in Finite Element Analysis before embarking on a round-the-world voyage. He now runs an online tea store in Brighton, England, focusing on fine and unusual teas. Much of his downtime is spent researching EVs, power systems and renewable energies. In 2018 he intends to publish a book considering the economic and environmental consequences of their adoption. For more information, visit his websites at mitea.co.uk and johnewbank.co.uk