Mastering the game of Go with deep neural networks and tree search – a very recent paper in Nature:
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, (7587), 484-489.
Go (in Chinese: 圍棋 , in Japanese 囲碁) is a two-player board strategy game (EXPTIME-complete, resp. PSPACE-complete) for two players aiming to surround more territory than the opponent; the number of he number of possible moves is enormous (10761 with a 19 x 19 board) compared to approximately 10120 in chess with a 8 x 8 board) – despite simple rules.
According to the new article by Silver et al (2016), Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. The authors introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. The authors introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, the program AlphaGo (see: http://deepmind.com/alpha-go.html) achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
There is also a news report on BBC:
Congrats to the Google Deepmind people!