The match was to be played over five games scheduled between 9 and 15 March 2016 at the Four Seasons hotel in Seoul, and would be broadcast live across the internet. The winner would receive a prize of a million dollars. Although the venue was public, the precise location within the hotel was kept secret and was isolated from noise – not that AlphaGo was going to be disturbed by the chitchat of the press and the whispers of curious bystanders. It would assume a perfect Zen-like state of concentration wherever it was placed.
Sedol wasn’t fazed by the news that he was up against a machine that had beaten Fan Hui. Following Fan Hui’s loss he had declared: ‘Based on its level seen … I think I will win the game by a near landslide.’
Although he was aware that the machine he would be playing was learning and evolving, this did not concern him. But as the match approached, doubts began to creep into his view of whether AI would ultimately be too powerful for humans to defeat, even at the game of Go. In February he stated: ‘I have heard that DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win … at least this time.’
Most people still felt that, despite great inroads in programming, an AI Go champion was a distant goal. Rémi Coulom, the creator of Crazy Stone, the only program to get close to playing Go at a high standard, was still predicting another decade before computers would beat the best humans at the game.
As the date for the match approached, the team at DeepMind felt they needed someone to really stretch AlphaGo and test it for weaknesses. So, in the final few weeks, they invited Fan Hui back to play the machine. Despite having suffered a 5–0 defeat and been humiliated by the press back in China, he was keen to help out. Perhaps a part of him felt that if he could help make AlphaGo good enough to beat Sedol, his own defeat would seem less humiliating.
As Fan Hui played he could see that AlphaGo was extremely strong in some areas but he managed to reveal a weakness that the team was not aware of. There were certain configurations in which it seemed to completely fail to assess who had control of the game, often becoming totally delusional that it was winning when the opposite was true. If Sedol tapped into this weakness, AlphaGo wouldn’t just lose, it would appear extremely stupid.
The DeepMind team worked around the clock trying to fix this blind spot. Eventually they just had to lock down the code as it was. It was time to ship the laptop they were using to Seoul.
The stage was set for a fascinating duel as the players, or at least one player, sat down on 9 March to play the first of the five games.
‘Beautiful. Beautiful. Beautiful’
It was with a sense of existential anxiety that I fired up the YouTube channel broadcasting the matches that Sedol would play against AlphaGo and joined 280 million other viewers to see humanity take on the machines. Having for years compared creating mathematics to playing the game of Go, I had a lot on the line.
Lee Sedol picked up a black stone and placed it on the board and then waited for the response. Aja Huang, a member of the DeepMind team, would play the physical moves for AlphaGo. This, after all, was not a test of robotics but of artificial intelligence. Huang stared at AlphaGo’s screen, waiting for its response to Sedol’s first stone. But nothing came.
We all stared at our screens wondering if the program had crashed! The DeepMind team was also beginning to wonder what was up. The opening moves are generally something of a formality. No human would think so long over move 2. After all, there was nothing really to go on yet. What was happening? And then a white stone appeared on the computer screen. It had made its move. The DeepMind team breathed a huge sigh of relief. We were off! Over the next couple of hours the stones began to build up across the board.
One of the problems I had as I watched the game was assessing who was winning at any given point. It turns out that this isn’t just because I’m not a very experienced Go player. It is a characteristic of the game, and indeed one of the main reasons why programming a computer to play Go is so hard. There isn’t an easy way to turn the current state of the board into a robust score of who leads and by how much.
Chess, by contrast, is much easier to score as you play. Each piece has a different numerical value, which gives you a simple first approximation of who is winning. Chess is destructive: one by one, pieces are removed, so the state of the board simplifies as the game proceeds. Go, on the other hand, increases in complexity as you play; it is constructive. The commentators kept up a steady stream of observations but struggled to say whether anyone was in the lead right up until the final moments of the game.
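To make the contrast concrete, here is a minimal sketch in Python (my own illustration, not anything from the book or from DeepMind) of the simple material-count heuristic available in chess, using the conventional piece values:

```python
# A toy material-count evaluation of a chess position (illustrative only).
# Each piece type gets its conventional point value; the difference in
# totals gives a rough first approximation of who is winning.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_score(white_pieces, black_pieces):
    """Return White's material advantage (positive means White is ahead)."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black

# Example: White has lost a knight, Black has lost a rook.
white = ["queen", "rook", "rook", "bishop", "bishop", "knight"] + ["pawn"] * 8
black = ["queen", "rook", "bishop", "bishop", "knight", "knight"] + ["pawn"] * 8
print(material_score(white, black))  # 2 -> White is roughly two points ahead
```

No equivalent tally exists for Go: every stone stays on the board and its worth depends entirely on the patterns forming around it, which is exactly why commentators, and programs, find it so hard to say who is ahead.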
What they were able to pick up quite quickly was Sedol’s opening strategy. If AlphaGo had learned to play from games played in the past, then Sedol was working on the principle that he could gain an advantage by disrupting the expectations it had built up, playing moves outside the conventional repertoire. The trouble was that this required Sedol to play an unconventional game – one that was not his own.
It was a good idea but it didn’t work. Any conventional machine programmed on a database of accepted openings wouldn’t have known how to respond and would most likely have made a move with serious consequences for the grand arc of the game. But AlphaGo was not a conventional machine. It could assess the new moves and determine a good response based on what it had learned over the course of its many games. As David Silver, the lead programmer on AlphaGo, explained in the lead-up to the match: ‘AlphaGo learned to discover new strategies for itself, by playing millions of games between its neural networks, against themselves, and gradually improving.’ If anything, Sedol had put himself at a disadvantage by playing a game that was not his own.
As I watched I couldn’t help feeling for Sedol. You could see his confidence draining out of him as it gradually dawned on him that he was losing. He kept looking over at Huang, the DeepMind representative who was playing AlphaGo’s moves, but there was nothing he could glean from Huang’s face. By move 186 Sedol had to recognise that there was no way to overturn the advantage AlphaGo had built up on the board. He placed a stone on the side of the board to indicate his resignation.
By the end of day one it was AlphaGo 1, Humans 0. Sedol admitted at the press conference that day: ‘I was very surprised because I didn’t think I would lose.’
But it was game 2 that would truly shock not just Sedol but every human player of the game of Go. The first game was one that experts could follow, and they could appreciate why AlphaGo was playing the moves it did. They were moves a human champion would play. But as I watched game 2 on my laptop at home, something rather strange happened. Sedol played move 36 and then retired to the roof of the hotel for a cigarette break. While he was away, on move 37, AlphaGo instructed Huang, its human representative, to place a black stone on the line five steps in from the edge of the board. Everyone was shocked.
The conventional wisdom is that during the early part of the game you play stones on the outer four lines. The third line builds up short-term territory strength on the edge of the board while playing on the fourth line contributes to your strength later in the game as you move into the centre of the board. Players have always found that there is a fine balance between playing on the third and fourth lines. Playing on the fifth line has always been regarded as suboptimal, giving your opponent the chance to build up territory that has both short- and long-term influence.
AlphaGo had broken this orthodoxy, built up over centuries of competition. Some commentators declared it a clear mistake. Others were more cautious. Everyone was intrigued to see what Sedol would make of the move when he returned from his cigarette break. As he sat down, you could see him physically flinch as he took in the new stone on the board. He was certainly as shocked as the rest of us. He sat there thinking for over twelve minutes. Like chess, the game was being played under time constraints, and using up twelve minutes was very costly. It is a mark of how surprising this move was that it took Sedol so long to respond. He could not understand what AlphaGo was doing. Why had the program abandoned the region of stones they were competing over?
Was this a mistake by AlphaGo? Or did it see something deep