The new AlphaGo revealed for the first time: runs on a single machine with 4 TPUs, and is stronger (interview + keynote)

2017-05-28 19:12:27
By Shu Shi and Tang Xu, from East Yao Village

Reported by QbitAI (WeChat public account: QbitAI)

Who actually beat Ke Jie?

The answer seems obvious. But the reason for asking is that the AlphaGo that has now beaten Ke Jie and the AlphaGo that beat Lee Sedol last time are fundamentally different.

DeepMind roughly divides AlphaGo into several versions:

  • The first generation is AlphaGo Fan, which defeated Fan Hui. Compared with earlier Go programs such as Zen and Crazy Stone, it is about 4 stones stronger.

  • The second generation is AlphaGo Lee, which defeated Lee Sedol. It is about 3 stones stronger than the previous generation.

  • The third generation is Ke Jie's current opponent, which also went on a 60-game winning streak at the beginning of the year: AlphaGo Master. Compared with the version that defeated Lee Sedol, its strength has risen by another 3 stones.

It is important to emphasize that AlphaGo Lee and AlphaGo Master are fundamentally different. As for where they differ: today DeepMind founder and CEO Demis Hassabis and AlphaGo team lead David Silver revealed the secrets of the latest version of AlphaGo for the first time.

This article is compiled from the keynote speeches Hassabis and Silver gave this morning, and from two interviews QbitAI held at midday today with key DeepMind people.

First, let the data speak.

AlphaGo Lee

  • Ran on Google Cloud, using 50 TPUs for computation

  • Each search looked 50 moves ahead, at a speed of 10,000 positions per second

  • Defeated Lee Sedol in Seoul in 2016

By contrast, IBM's Deep Blue, which defeated Kasparov 20 years ago, could search one hundred million positions per second. Silver said AlphaGo does not need to search that many positions.

AlphaGo Master

  • Runs on Google Cloud, but on only a single TPU machine

  • Self-taught: AlphaGo improves its strength by playing against itself

  • Has stronger policy / value networks

Because it uses more efficient algorithms, AlphaGo Master needs only 1/10 of the computation of its predecessor, AlphaGo Lee. So a single TPU machine is enough to support it.

Dr. Huang Shijie (Aja Huang) of the AlphaGo team also said on his WeChat Moments that the latest AlphaGo can be called the single-machine version, whereas the previous generation of AlphaGo used distributed computing.

In an interview after the session, Silver confirmed that this AlphaGo still uses first-generation TPUs, not the second-generation TPUs announced shortly before.

In addition, Silver clarified: "This year's upgraded version of AlphaGo runs on a single machine, with 4 TPUs deployed on its physical server."

Evidently, the slides were a little misleading.

If you want to learn more about TPUs, here are a few recommended QbitAI reports:

  • "Google's second-generation TPU in detail: what are its power consumption and performance? What does the giant want to do with it?"

  • "Google shows new AI strength: the second-generation TPU and AutoML"

  • "Google reveals the TPU in depth: one article to understand its internal principles and why it crushes GPUs"

Back to AlphaGo. You probably noticed that this new version of the Go AI has stronger policy / value networks. The rest of this part digs into that point.

To explain where the new policy / value networks are stronger, it helps to first describe how AlphaGo's algorithm is structured. QbitAI's summary of Silver's introduction follows.

One important reason the DeepMind team originally chose Go as a research direction is that Go is the best testbed for building and understanding computation. Its complexity also far exceeds that of chess, so computers cannot crack it with brute-force exhaustive search the way Deep Blue cracked chess.

The core of AlphaGo Lee, the version that defeated Lee Sedol, is a convolutional neural network. The DeepMind team hopes that AlphaGo will eventually understand Go and form a whole-board view. Silver said that AlphaGo Lee uses 12-layer neural networks, while AlphaGo Master uses 40-layer neural networks.

These neural networks are further divided into two networks with different functions:

  • Policy network

  • Value network

The two networks are trained in two ways: supervised learning and reinforcement learning.

First, the millions of parameters of the policy network are adjusted using data from a library of human expert games. The goal is for the policy network, faced with the same position, to reach the level of a human master: to play the same move.
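
The supervised step can be illustrated with a minimal sketch. It assumes a toy setting that does not come from the talk: positions are small integer IDs, the "policy network" is only a table of logits per position, and `expert_games`, `train_policy`, `best_move`, and the learning rate are hypothetical names and values chosen for illustration. Each update nudges the move probabilities toward the move the human expert played, using the standard softmax cross-entropy gradient.

```python
import math

def softmax(logits):
    """Turn raw scores into move probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(expert_games, num_positions, num_moves, epochs=300, lr=0.1):
    """Fit a tabular stand-in for a policy network to expert moves."""
    logits = [[0.0] * num_moves for _ in range(num_positions)]
    for _ in range(epochs):
        for position, expert_move in expert_games:
            probs = softmax(logits[position])
            for move in range(num_moves):
                target = 1.0 if move == expert_move else 0.0
                # Gradient of the cross-entropy loss with respect to each logit.
                logits[position][move] -= lr * (probs[move] - target)
    return logits

def best_move(logits, position):
    """Most probable move under the trained policy."""
    row = logits[position]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical expert data: (position id, move the expert chose).
expert_games = [(0, 2), (0, 2), (0, 1), (1, 0)]
policy_logits = train_policy(expert_games, num_positions=2, num_moves=3)
print(best_move(policy_logits, 0), best_move(policy_logits, 1))
```

A real policy network replaces the table with a deep convolutional network over board features, but the training signal has the same shape: raise the probability of the expert's move in each recorded position.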

Then comes reinforcement learning: the AI plays against itself, and once training is finished the value network is formed. It is used to predict the eventual winner of a game and to judge between different candidate moves.

The policy network reduces the width of the search: it cuts down the number of options and shrinks the complexity. It also keeps AlphaGo from playing wild, unreliable moves.

The value network, on the other hand, reduces the depth of the search: AlphaGo stops at a certain depth and does not need to search exhaustively to the end of the game.

Combining the two gives AlphaGo's tree search. It selects several possible paths using the policy network, evaluates those paths, and finally passes the results back up to the top of the tree. This process is repeated hundreds of thousands of times, and in the end AlphaGo arrives at the move with the highest probability of winning.
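
The two reductions can be sketched on a toy game. The code below is not AlphaGo's search: it is a plain depth-limited negamax on a take-1-to-3-stones game (whoever takes the last stone wins), and `policy_stub` and `value_stub` are hand-written stand-ins for the two networks, assumed here only for illustration. The `width` parameter keeps just the moves the policy ranks highest, and the `depth` parameter stops the search early and asks the value function instead of playing to the end.

```python
def legal_moves(pile):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= pile]

def policy_stub(pile):
    """Stand-in for a policy network: a prior over legal moves."""
    moves = legal_moves(pile)
    # Hypothetical prior: prefer moves that leave a multiple of 4.
    scores = [3.0 if (pile - m) % 4 == 0 else 1.0 for m in moves]
    total = sum(scores)
    return {m: s / total for m, s in zip(moves, scores)}

def value_stub(pile):
    """Stand-in for a value network: outcome estimate for the player to move."""
    return -1.0 if pile % 4 == 0 else 1.0

def search(pile, depth, width):
    """Return (value, move) for the player to move."""
    if pile == 0:
        return -1.0, None  # the opponent took the last stone
    if depth == 0:
        return value_stub(pile), None  # depth cut: trust the value estimate
    priors = policy_stub(pile)
    # Width cut: only expand the moves the policy considers most promising.
    candidates = sorted(priors, key=priors.get, reverse=True)[:width]
    best_value, best_move = -float("inf"), None
    for move in candidates:
        child_value, _ = search(pile - move, depth - 1, width)
        value = -child_value  # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move  # result is passed back up the tree

print(search(10, depth=2, width=2))
```

With `width=2` the search never looks at the third legal move, and with `depth=2` it never plays a game out to the end; it still reaches a decision because the stubs supply the missing judgment. AlphaGo itself uses a Monte Carlo tree search with learned networks rather than this fixed-depth scheme.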

So where are the new policy / value networks stronger?

This time AlphaGo Master became its own teacher; in Silver's words, the Go AI is self-taught. It learns from the games it plays against itself, which provide the best training data it has accumulated. "The previous generation of AlphaGo becomes the teacher of the next generation," Silver said.

By playing against itself, AlphaGo keeps absorbing experience and improving its strength. This time AlphaGo uses a policy network trained on self-play, which can give the next move directly without further computation.

This change significantly reduces the need for computational power.

The other part is the value network, which is also trained on AlphaGo's self-play games. By reviewing the games after they finish, the value network can learn which moves were the key ones. Through high-quality self-play, the value network is trained to predict which moves matter more.

"At any move, AlphaGo will accurately predict how to win," Silver said.

This process keeps iterating and eventually produces a stronger AlphaGo. Self-play improves the quality of the data, which in turn drives AlphaGo's rapid improvement.
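
The "previous generation teaches the next" loop can be sketched in a few lines. This is a toy under stated assumptions, not DeepMind's training pipeline: the game is again take-1-to-3 stones with the last stone winning, the "value network" is a plain table, and `greedy_policy`, `self_play_outcomes`, and `train_generations` are hypothetical helper names. Each generation plays against itself, and the results of those games become the value table of the next generation.

```python
MAX_PILE = 12

def greedy_policy(values):
    """For each pile, pick the move that leaves the opponent the worst estimate."""
    policy = {}
    for pile in range(1, MAX_PILE + 1):
        moves = [m for m in (1, 2, 3) if m <= pile]
        policy[pile] = min(moves, key=lambda m: values[pile - m])
    return policy

def self_play_outcomes(policy):
    """Result of the policy playing itself from every pile.

    1.0 means the player to move wins. The policy is deterministic, so the
    result from each pile follows from the result at the pile it moves to.
    """
    outcomes = {0: 0.0}  # no stones left: the player to move has already lost
    for pile in range(1, MAX_PILE + 1):
        outcomes[pile] = 1.0 - outcomes[pile - policy[pile]]
    return outcomes

def train_generations(generations):
    """Repeat: the current generation plays itself and labels data for the next."""
    values = {pile: 0.5 for pile in range(MAX_PILE + 1)}  # generation 0 knows nothing
    values[0] = 0.0
    for _ in range(generations):
        teacher = greedy_policy(values)        # current generation acts as teacher
        values = self_play_outcomes(teacher)   # its games label the next generation
    return values, greedy_policy(values)

values, policy = train_generations(12)
print(policy[5], policy[6], policy[7])
```

In this toy the early generations misjudge larger piles, and each round of self-play corrects a little more of the table, which mirrors the idea that better self-play data produces a better next generation.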

As before, DeepMind confirmed that it will publish a paper about this generation of AlphaGo. For more details, we can look forward to DeepMind's later release.

Solve intelligence, then solve problems

AlphaGo comes from DeepMind. Founded in London in 2010, DeepMind has 500 employees, half of whom are scientists. Hassabis said that DeepMind brings together scientists, data, and computing power to advance artificial intelligence.

△ Demis Hassabis

The company's vision is, first, to solve intelligence, and second, to use intelligence to solve all other problems.

In other words, DeepMind's goal is to build general-purpose AI. General artificial intelligence must, first, have the ability to learn, and second, be able to generalize from one case to others and carry out a variety of different tasks. How can this goal be reached? Hassabis said there are two tools: deep learning and reinforcement learning.

AlphaGo is a combination of deep learning and reinforcement learning. AlphaGo is also a step towards DeepMind's goal of general artificial intelligence, though for now it is focused on the world of Go.

Hassabis said he hopes that research on AlphaGo will give machines intuition and creativity.

Intuition here means implicit knowledge acquired directly through experience. It cannot be put into words, but its existence and its correctness can be confirmed through behavior.

Creativity is the ability to produce new or unique ideas by combining existing knowledge. AlphaGo has clearly demonstrated these capabilities, albeit in a limited domain.

"In the future, we will see the tremendous power of humans and machines working together, and human intelligence will be amplified by artificial intelligence," Hassabis said. At present, AlphaGo's technology has been used in data centers, where it can save 15% of the electricity; it can also be used in materials, medicine, smartphones, education, and other fields.

Even so, there is still room for AlphaGo to keep exploring. Hassabis and DeepMind still want to keep working on Go: how far are we from optimal play? What does a perfect game look like?

In today's society, more and more data is being produced. However, humans often cannot grasp the overall picture from this data. In such cases, artificial intelligence may help advance scientific research.

As the chess legend Kasparov said:

"Deep Blue was the end; AlphaGo is just the beginning."

Exclusive interview

△ Hassabis and Silver being interviewed by QbitAI

Question: Master has already won 60 straight games against human players, including Ke Jie. What is the point of holding this match?

Hassabis: Master's online games were fast games. Human players may not manage the time controls very precisely in them, and they are not necessarily fully focused when playing online, so we still needed a match against Ke Jie to test AlphaGo.

At the same time, the online games were meant, first, to test the AlphaGo system, and second, to offer some new ideas to Ke Jie and the Go community, giving him time to prepare and some material for analyzing AlphaGo's play.

Question: Which industry applications of AlphaGo are you most optimistic about? Will DeepMind develop industry applications in China in the future?

Hassabis: First, there are many technologies behind AlphaGo, and their application in other fields is still at an early, exploratory stage. The applications I talked about this morning are only a small part of what AlphaGo's technology can be applied to. In the future we will certainly apply AlphaGo's technology within Google, and perhaps there will be corresponding business in China.

Question: Has AlphaGo achieved unsupervised learning? Is it moving toward strong AI?

Silver: First of all, AlphaGo uses reinforcement learning. We can only say that AlphaGo has acquired, through direct training, its own intuition and awareness in one particular domain, which is quite different from what we call human consciousness. Because it is not that kind of human consciousness, it has the chance to be applied to other fields, not just Go.

Question: Mr. Hassabis mentioned this morning that artificial intelligence must be applied properly. What principles does "properly" include?

Hassabis: There are two levels. First, AI must be used for the benefit of humanity, in areas such as science and pharmaceuticals that help people, and not for bad things such as developing weapons. Second, AI should not be used by just one company or individual; it should be shared by all of humanity.

Question: The two of you mentioned in this morning's speeches that this generation of AlphaGo needs only one TPU to run, while the AlphaGo that played Lee Sedol used 50 TPUs; yet this generation needs only 1/10 of the computation of the previous one. Why don't the proportions match?

Silver: Let me clarify. This year's upgraded version of AlphaGo runs on a single machine, with 4 TPUs deployed on its physical server.

Question: Why does AlphaGo play at a constant pace?

Silver: When we were training AlphaGo, we found that it plays in a continuous, steady way, and its total amount of computation stays constant over the course of a game. We developed a safe time-control strategy for AlphaGo that makes maximum use of its playing time. If you want to make the most of the available time, a uniform pace is of course best.

△ Mustafa being interviewed

Question: Go is relatively simple. What are the obstacles to applying AI in the real world?

Mustafa: We have thought deeply about this. DeepMind's mission states that we should build general artificial intelligence technology and accept the corresponding oversight and regulation. We have previously formed an AI alliance with a number of organizations to develop algorithms in an ethical and safe manner.

Question: How do you avoid invading privacy when AI is deployed in practice?

Mustafa: When new technologies are deployed and applied, there is a mismatch with oversight and regulatory mechanisms. The power of technology is already very strong, and technology is developing rapidly. Striking a balance for digital technologies and devices is something we keep working on.

We want to strengthen the trust of doctors and patients in the technology. First, we demonstrate the clinical results of using it. Second, we said publicly from the start that the data-processing system operates entirely within the scope of regulation, and that the data is not used for any other business.

Question: What is DeepMind's current organizational structure?

Mustafa: DeepMind is divided into two parts: Hassabis is responsible for research and development, and I am responsible for business applications. The applied side is divided into three groups: 1. the Google group; 2. the medical group, which works with the British NHS; 3. the energy group, which is about to be set up. We hope to work with experts to obtain the necessary data.

We work with different departments of Google in different forms.

Question: Why did you apply AI to the medical field first, and not to finance?

Mustafa: Commercial profit is not our most important driving force. We choose industries based on two points: first, whether the work will contribute to technical research; second, whether it will help accomplish our social mission.

Inefficiency and stagnant technology have persisted in the medical industry for a long time.

Question: With research on one hand and commercialization on the other, are any technical details being held back?

Mustafa: When we open-source, we try to provide information that helps others. Of course, we do not publish 100% of the technical details, but we will open-source as much as possible.

Question: Is there enough of the data needed to drive AI applications?

Mustafa: We did some statistics. The world's best radiologists will look at about thirty thousand X-ray images in a lifetime, while our algorithm can look at millions, developing awareness and instinct for difficult and complicated diseases. We are able to improve the accuracy of the algorithm, and it shows very stable performance.

When human experts look at X-rays, they reach a consensus only 2/3 of the time. So our idea is to have the algorithm read the X-rays and then match them to experts on the relevant diseases, which works better.


