A group at DeepMind used the "Capture the Flag" mode of the multiplayer game "Quake III Arena" as its testbed. In "Science", the team around Max Jaderberg explains how they taught their AI system to play "Quake III Arena" successfully.
Every human being pursues their own goals and coordinates their actions accordingly. But people can also come together in groups, and even form organizations and societies, to solve tasks jointly. In AI terminology, this is called "multi-agent learning": many individual agents must act on their own while simultaneously learning to cooperate with other agents. The problem is notoriously hard to analyze because agents that adapt to one another constantly change the environment itself.
The researchers investigated this using Quake III Arena as an example. They trained agents that learn and act individually but have to play in teams, with and against other agents that may be AI systems or humans. In "Quake III Arena", two teams each try to grab the opposing team's flag and carry it to their own base. Players can take opponents out of action with laser shots; tagged players respawn at their own base after a short delay. The symmetrically arranged rooms and corridors are meant to give both teams equal chances and are generated randomly.
Under the CTF rules, each game is played on a given map. The DeepMind researchers made the setup more demanding: the map changes from game to game, so the agents are forced to learn general strategies; memorizing a single map is not enough. The players move through the virtual terrain from the first-person perspective of a team member. They have to cooperate with their teammates and keep their opponents in check.
After about 200,000 games, the AI agents were on average better than the best human players. After around 450,000 games, they won clearly: when two human players competed against two AI agents, the agents captured on average 16 more flags. The AI agents reacted to the appearance of an opponent after an average of 258 milliseconds, humans after 559 milliseconds. But even when the researchers artificially slowed the agents' reaction times, the artificial players remained superior to the human ones.
The researchers focused on three basic ideas:
- The agents are trained together in populations instead of individually.
- Each agent in a population learns its own internal reward signals, so it can set its own goals individually, such as capturing the flag.
- The agents operate on two timescales, slow and fast, which improves their ability to produce consistent sequences of actions using a shared memory.
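Taken together, the first two ideas can be sketched as a toy population-based training loop. This is a minimal illustration under assumptions: the agent class, event names, and update rules below are invented placeholders, not DeepMind's actual implementation.

```python
import random

# Game events a reward scheme could weight (illustrative placeholders).
GAME_EVENTS = ["flag_capture", "flag_pickup", "tag_opponent"]

class Agent:
    def __init__(self, rng):
        # Each agent carries its own internal reward weights.
        self.reward_weights = {e: rng.uniform(0.0, 1.0) for e in GAME_EVENTS}
        self.skill = 0.0  # stand-in for learned policy quality

    def internal_reward(self, event_counts):
        # Dense internal reward derived from game events, instead of
        # relying only on the sparse win/loss outcome.
        return sum(self.reward_weights[e] * n for e, n in event_counts.items())

def play_match(a, b, rng):
    # Toy stand-in for a CTF match: the higher-skilled agent wins more often.
    return a if rng.random() < 0.5 + 0.1 * (a.skill - b.skill) else b

def pbt_step(population, rng):
    # Population-based training: agents train by playing each other;
    # the loser copies and perturbs the winner's internal reward scheme.
    a, b = rng.sample(population, 2)
    winner = play_match(a, b, rng)
    loser = b if winner is a else a
    winner.skill += 0.01  # placeholder for learning from the match
    # Exploit: copy the winner's reward weights.
    loser.reward_weights = dict(winner.reward_weights)
    # Explore: perturb them to keep the population diverse.
    for e in loser.reward_weights:
        loser.reward_weights[e] *= rng.uniform(0.9, 1.1)

rng = random.Random(0)
population = [Agent(rng) for _ in range(10)]
for _ in range(1000):
    pbt_step(population, rng)
best = max(population, key=lambda ag: ag.skill)
```

The key design point is that no agent ever trains in isolation: its opponents, teammates, and even its reward weights come from the evolving population.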
The learning process is based on recurrent neural networks. An agent that has gone through this training, which the researchers call a "For The Win" (FTW) agent, learns to play "Quake III Arena" at a very high level. In the future, the researchers want to apply their methods beyond "Capture the Flag" to the full range of "Quake III Arena". The results already indicate that the agents can cope with other game modes and maps and challenge the researchers' own playing skills. Overall, the work shows that AI systems can be further improved through multi-agent training, and that AI agents will be able to work in teams with humans.
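The two-timescale recurrent idea can be illustrated with a minimal sketch. The state sizes, update period, and simple Elman-style cells are assumptions chosen for brevity, not the FTW agent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8

def rnn_step(h, x, W, U):
    # Simple Elman-style recurrent update.
    return np.tanh(W @ h + U @ x)

# Separate weights for the slow and fast recurrent cores.
W_slow, U_slow = (rng.standard_normal((HIDDEN, HIDDEN)) * 0.1,
                  rng.standard_normal((HIDDEN, HIDDEN)) * 0.1)
W_fast, U_fast = (rng.standard_normal((HIDDEN, HIDDEN)) * 0.1,
                  rng.standard_normal((HIDDEN, HIDDEN)) * 0.1)

h_slow = np.zeros(HIDDEN)
h_fast = np.zeros(HIDDEN)
SLOW_PERIOD = 5  # the slow core updates only every 5 steps

for t in range(20):
    obs = rng.standard_normal(HIDDEN)  # placeholder observation
    if t % SLOW_PERIOD == 0:
        # Slow timescale: updated rarely, summarizes longer-term context
        # from the fast core's state.
        h_slow = rnn_step(h_slow, h_fast, W_slow, U_slow)
    # Fast timescale: updated every step, conditioned on the slow state,
    # so consecutive actions stay consistent with the longer-term plan.
    h_fast = rnn_step(h_fast, obs + h_slow, W_fast, U_fast)
```

Because the slow state changes only every few steps, it acts as a shared memory that keeps the fast, per-step decisions consistent over longer stretches of play.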