The 2017 documentary AlphaGo, released on Netflix in January 2018, is an underdog story. In March 2016, Lee Sedol, widely considered the best Go player in the world, was left confounded after DeepMind’s AI, known as AlphaGo, trounced him in a five-game series.

AlphaGo’s reign at the top didn’t last long. A year later, DeepMind researchers created AlphaGo Zero, an improved program that promptly beat its progenitor. The main difference between the two AIs? AlphaGo learned the game by studying games played by human players. AlphaGo Zero, on the other hand, was given only the rules of Go and set to play against itself, using a technique called “reinforcement learning.” Over the course of three days, AlphaGo Zero played 4.9 million games against itself, and shortly afterward supplanted AlphaGo as the most formidable Go-playing program on Earth.

But how impressive is this really? If you could do anything 4.9 million times, you’d probably become a master too. The ability to iterate a large number of times on a small time scale is what makes neural networks such useful tools in artificial intelligence. Running on hardware with relatively high processing power, neural networks are enjoying their fifteen minutes of fame as the machine learning architecture currently in vogue, used for everything from image recognition to joke generation.

It was only recently that artificial intelligence researchers acquired the processing power to build neural networks like AlphaGo Zero. But the brute force power of repetition has been an object of philosophical fascination for a long time. In 1927, the physicist Sir Arthur Eddington popularized a now-familiar thought experiment that demonstrates the problem-solving capabilities of a system that relies solely on randomness and repetition.

Imagine you have a cohort of monkeys, each trained to type away randomly on a word processor. If you have them working long enough, eventually they’ll type out a Shakespeare play. By virtue of pure chance, a monkey can come up with Romeo and Juliet. All that’s required is the proper tools and an adequate timeline.

It’s not much of a leap from the troupe of monkeys to a computer program, a random text generator that pumps out strings of letters, usually nonsensical, but sometimes yielding comprehensible language. The computer scientist Stephen Clausing was inspired to build a text-generating program meant to explore the problem, reverently named Eddington. Reinforcement learning could be applied to this system by picking out strings that resemble Shakespeare’s writing, then directing the program to work toward similar results.
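The contrast between blind typing and typing guided by feedback can be made concrete with a small sketch. This is not Clausing’s Eddington program, and it is a simple keep-what-scores-better search rather than reinforcement learning proper; the function names, the 27-character alphabet, and the scoring rule (count of characters matching a fixed target) are all assumptions chosen for illustration.

```python
import random
import string

# Assumed alphabet: lowercase letters plus space (27 symbols).
ALPHABET = string.ascii_lowercase + " "


def random_text(length, rng):
    """The 'monkey': a string of uniformly random characters."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def score(candidate, target):
    """Hypothetical resemblance measure: characters that match the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(text, rng):
    """Retype one randomly chosen position with a random character."""
    i = rng.randrange(len(text))
    return text[:i] + rng.choice(ALPHABET) + text[i + 1:]


def guided_search(target, rng, max_steps=100_000):
    """Keep a random change only if it resembles the target at least as well.

    The target must use only characters from ALPHABET.
    Returns the final string and the number of changes tried.
    """
    current = random_text(len(target), rng)
    for steps in range(max_steps):
        if current == target:
            return current, steps
        candidate = mutate(current, rng)
        if score(candidate, target) >= score(current, target):
            current = candidate
    return current, max_steps


if __name__ == "__main__":
    text, steps = guided_search("romeo and juliet", random.Random(0))
    print(text, steps)
```

For a 16-character phrase over this alphabet there are 27^16 possible strings, roughly 8 × 10^22, so a purely random typist would need an astronomical number of attempts on average. Once each attempt is scored and the better one kept, the same random keystrokes tend to reach the phrase in a far smaller number of steps.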

That’s essentially how AlphaGo Zero learned its craft. The first time it played Go, the neural network had no idea (to speak anthropomorphically) which moves were more likely to result in a win. It just knew which moves were possible. But by playing a game to completion, it could record, and learn, what a win looks like, and what led to that win. Each game played was a baby step, but 4.9 million baby steps is a massive leap.
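The loop of playing a game to completion, recording who won, and crediting the moves that led there can be sketched on a toy game. The example below is an illustration only and is far simpler than AlphaGo Zero, which pairs a deep neural network with tree search. It assumes a stand-in game (a pile of stones, players alternately remove one to three, and whoever takes the last stone wins) and a plain table of win counts in place of a network; every name and parameter here is hypothetical.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed rule: remove 1, 2, or 3 stones per turn


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def win_rate(stats, stones, move):
    wins, plays = stats[(stones, move)]
    return wins / plays if plays else 0.5  # untried moves get a neutral guess


def best_move(stats, stones):
    """Move with the highest recorded win rate from this position."""
    return max(legal_moves(stones), key=lambda m: win_rate(stats, stones, m))


def self_play(games=20_000, start=10, epsilon=0.1, seed=0):
    """Play the program against itself and tally which moves preceded wins."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])  # (stones, move) -> [wins, plays]
    for _ in range(games):
        stones, player = start, 0
        history = ([], [])  # moves made by player 0 and by player 1
        while stones > 0:
            if rng.random() < epsilon:
                move = rng.choice(legal_moves(stones))  # occasional exploration
            else:
                move = best_move(stats, stones)
            history[player].append((stones, move))
            stones -= move
            winner = player  # whoever moved last took the final stone
            player = 1 - player
        # Credit every move of the finished game with its outcome.
        for who in (0, 1):
            for key in history[who]:
                stats[key][1] += 1
                stats[key][0] += (who == winner)
    return stats


if __name__ == "__main__":
    table = self_play()
    for stones in range(1, 11):
        print(stones, "->", best_move(table, stones))
```

At the start the table is empty, so the program knows only which moves are legal. Each finished game nudges the win rates slightly, and over many games the table comes to prefer moves that have tended to precede wins, such as taking every remaining stone when three or fewer are left.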

That’s why iteration is such a powerful tool. If you have the time and energy to try something 4.9 million times, you don’t need other means of finding solutions. For humans, iteration is typically a foolish way to approach a problem. Imagine being given a pile of wood planks and trying to build a ship out of them. If you don’t know how to build a ship, this enterprise seems like pure folly. But if you can repeat the process over and over, do you really need to know much of anything? Trial and error eventually will get you there.

This suggests that you can circumvent comprehension (the knowledge of how to do something) and skip straight to competence (the act itself). This approach has been espoused by the philosopher Daniel Dennett as an explanation for a different medium: life. Dennett has argued that the key to understanding how complex life forms such as ourselves came to be can be found in Charles Darwin’s “strange inversion of reasoning.” This phrase, initially written as a disparaging remark by an early critic of Darwin, has since been appropriated by Dennett as an encapsulation of Darwin’s insight, the realization that allowed him to develop the theory of evolution.

Darwin dared to notice what others of his time deemed heretical: sentient life just kind of happened. According to the prevailing Darwinian narrative, human intelligence was built by an uninformed, haphazard designer—namely, natural selection. Natural selection operates with vast resources (all life that has come before) and on a massive timescale, experimenting (so to speak) with the most efficient ways of passing down genetic material. On this theory, the gestalt-shifting takeaway from the meandering evolutionary path toward modern humanity is that iteration can outstrip any time-constrained application of intellect. The cohort of monkeys will eventually win.


JSTOR is a digital library for scholars, researchers, and students. JSTOR Daily readers can access the original research behind our articles for free on JSTOR.

Computers and the Humanities, Vol. 27, No. 4 (1993), pp. 249-259
Proceedings of the National Academy of Sciences of the United States of America, Vol. 106, Supplement 1: In the Light of Evolution III: Two Centuries of Darwin (Jun. 16, 2009), pp. 10061-10065
National Academy of Sciences