University of Bonn Software Looks into the Future

What happens next? Prof. Dr. Jürgen Gall (right) and Yazan Abu Farha from the Institute of Computer Science at the University of Bonn.

Computer scientists at the University of Bonn have developed software that can look a few minutes into the future: the program learns typical sequences of actions from training videos. This allows it to predict, in new situations, which actions a person will perform next and when.

Prof. Dr. Jürgen Gall's research group wants to teach computers to predict the timing and duration of actions, minutes or even hours before they take place.

A kitchen robot could then assist with cooking by having ingredients ready as soon as they are needed, preheating the oven in good time, and perhaps even warning the cook about to skip a preparation step. A robot vacuum cleaner, in turn, would know it has no business in the kitchen while cooking is under way and could clean other rooms instead.

Humans are good at anticipating the actions of others; computers, so far, are not. Researchers at the Institute of Computer Science at the University of Bonn have now developed self-learning software that estimates the timing and duration of future actions with surprising accuracy, over periods of several minutes.

Salad Videos as Training Data

The training data consisted of 40 videos in which people prepared various salads. Each recording was about six minutes long and contained an average of 20 different actions. The videos were also annotated with exactly when each action started and how long it lasted.
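One way to picture such annotations: each training video becomes a list of (action, duration) segments, from which per-action statistics can be estimated. The Python sketch below is purely illustrative; the action names, numbers, and the simple counting scheme are assumptions, not the Bonn group's actual data format or model.

    from collections import defaultdict

    # Each training video as a list of (action, duration_in_seconds) segments.
    # Action names and durations here are invented for illustration.
    train_videos = [
        [("cut_tomato", 35.0), ("place_in_bowl", 10.0), ("add_dressing", 20.0)],
        [("cut_cucumber", 40.0), ("cut_tomato", 30.0), ("place_in_bowl", 12.0)],
    ]

    def learn_statistics(videos):
        """Count which action follows which, and average each action's duration."""
        follows = defaultdict(lambda: defaultdict(int))
        durations = defaultdict(list)
        for segments in videos:
            for (action, _), (successor, _) in zip(segments, segments[1:]):
                follows[action][successor] += 1
            for action, dur in segments:
                durations[action].append(dur)
        # Turn raw successor counts into transition probabilities.
        probs = {
            action: {nxt: n / sum(succ.values()) for nxt, n in succ.items()}
            for action, succ in follows.items()
        }
        mean_dur = {action: sum(ds) / len(ds) for action, ds in durations.items()}
        return probs, mean_dur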

From these salad videos, the algorithm learned which actions typically follow one another during the task and how long each one lasts, according to the University of Bonn. The researchers then tested how well it had learned. »We confronted the software with videos that it hadn't seen before,« explains Gall. These, too, showed the preparation of a salad. For the test, the computer was told what could be seen in the first 20 or 30 percent of each new video; on this basis, it then had to predict what would happen in the rest of the film.
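A minimal way to imagine this prediction step is to roll the learned statistics forward from the last observed action. The sketch below builds on the hypothetical learn_statistics helper above and simply chains most-likely successors; the actual Bonn models are learned networks and considerably more sophisticated.

    def predict_remainder(observed, probs, mean_dur, total_len):
        """From the last observed action, repeatedly append the most likely
        successor with its mean duration until the predicted timeline covers
        the full video length (or the chain runs out of known successors)."""
        timeline = list(observed)
        elapsed = sum(dur for _, dur in timeline)
        current = timeline[-1][0]
        while elapsed < total_len and probs.get(current):
            nxt = max(probs[current], key=probs[current].get)
            dur = mean_dur[nxt]
            timeline.append((nxt, dur))
            elapsed += dur
            current = nxt
        return timeline

    # Observe roughly the first 20 percent of a hypothetical 2-minute test
    # video, then predict the remaining action segments.
    probs, mean_dur = learn_statistics(train_videos)
    observed = [("cut_cucumber", 24.0)]
    print(predict_remainder(observed, probs, mean_dur, total_len=120.0))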

According to Gall, the predictions worked surprisingly well. »Accuracy was over 40 percent for short forecast horizons, but dropped the further into the future the algorithm had to look. For actions lying more than three minutes ahead, the computer was still right in 15 percent of cases, and it had to predict both the action and its timing correctly.«
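One plausible way to score such a forecast is frame by frame: expand predicted and ground-truth segments into per-frame labels and count matches within a window that starts where the observed part ends. The helpers below are an illustrative assumption, not necessarily the exact metric used in the study.

    def framewise_labels(timeline, fps=1.0):
        """Expand (action, duration) segments into one label per frame."""
        labels = []
        for action, dur in timeline:
            labels.extend([action] * int(round(dur * fps)))
        return labels

    def horizon_accuracy(predicted, truth, observed_frames, horizon_frames):
        """Fraction of correctly labelled frames inside a forecast window
        beginning where the observed part of the video ends."""
        start, end = observed_frames, observed_frames + horizon_frames
        pairs = list(zip(predicted[start:end], truth[start:end]))
        return sum(p == t for p, t in pairs) / max(len(pairs), 1)

Evaluated this way, a short window right after the observed portion would score well above a window lying several minutes further out, matching the drop Gall describes.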

The study is only a first step into the new field of activity forecasting, especially since the algorithm performs noticeably worse when it is not told what is happening in the first part of the video but has to recognize it on its own. That analysis is never 100 percent correct; Gall speaks of »noisy« data. »Our procedure works with it,« he says, »but unfortunately not nearly as well yet.«