The Master Algorithm explores the groundbreaking quest for artificial intelligence’s holy grail. Pedro Domingos takes you on a thrilling journey through the cutting-edge world of machine learning, unveiling its potential to reshape society. This book isn’t just for tech enthusiasts; it’s a must-read for anyone curious about the future of technology and its impact on our lives.
Dive into this eye-opening exploration of AI’s future and discover how it could transform your world.
Table of Contents
- Genres
- Review
- Recommendation
- Take-Aways
- Machine learning can solve important problems by looking at data and then finding an algorithm to explain it.
- To avoid hallucinating patterns, learning algorithms need to be restricted and tested for validity.
- Rules using deductive reasoning and decision trees can allow machines and algorithms to think logically.
- You can prevent effective algorithms from overfitting by keeping models open and restricting assumptions.
- Unsupervised learning algorithms are great at finding structure and meaning in raw data.
- There is no one perfect algorithm, and a unifying master algorithm is required to tackle the big problems.
- In modern business, finding the best algorithm and the best data is the key to success.
- In the future, you’ll have a digital model of yourself to help make life easier.
- Summary
- The “Machine Learning Revolution”
- “Learning Algorithms”
- The “Five Tribes of Machine Learning”
- 1. “Symbolists”
- 2. “Connectionists”
- 3. “Evolutionaries”
- 4. “Bayesians”
- 5. “Analogizers”
- One Master Algorithm
- A World of Learning Machines
- Conclusion
- About the author
- Table of Contents
Genres
Technology and the Future, Education, Science, Artificial Intelligence and Semantics, Computer Science, Business, Programming, Philosophy, Computers, Technical, Algorithms, Knowledge Representation, Machine Learning, Mathematical Analysis, Information Theory, Computer Neural Networks
Pedro Domingos’ The Master Algorithm delves into the search for a universal learning algorithm that could revolutionize artificial intelligence. He explores five main schools of machine learning: symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each approach offers unique strengths and limitations in the quest for AI supremacy.
Domingos argues that combining these approaches could lead to a master algorithm capable of learning anything from data. He discusses potential applications across various fields, including healthcare, finance, and scientific research. The book also addresses concerns about AI’s impact on jobs and privacy.
Throughout the text, Domingos provides accessible explanations of complex concepts, using analogies and real-world examples. He envisions a future where AI enhances human capabilities rather than replacing them, emphasizing the need for responsible development and ethical considerations.
Review
The Master Algorithm offers an engaging and thought-provoking look at the future of machine learning. Domingos’ writing style makes complex topics accessible to a general audience without sacrificing depth. His enthusiasm for the subject is contagious, inspiring readers to contemplate AI’s potential.
The book’s strength lies in its comprehensive overview of various machine learning approaches. Domingos presents a balanced view, discussing both the potential benefits and risks associated with advanced AI. However, some readers might find certain technical sections challenging.
While the concept of a universal learning algorithm is intriguing, Domingos’ optimism occasionally borders on speculation. Some of his predictions may seem overly ambitious, given current technological limitations.
Despite these minor criticisms, The Master Algorithm serves as an excellent introduction to machine learning for non-experts. It encourages critical thinking about AI’s role in society and provides a solid foundation for understanding this rapidly evolving field.
Recommendation
That sci-fi moment you’ve been waiting for is here: Some machines are already learning on their own and even learning to program themselves. This lively, necessary report from computer science professor Pedro Domingos shows you what this transformation does for science and what it will do for human society. Machine learning is complex and the subject is conceptually dense, but Domingos explains it with clear love for his topic. He details how computers came to learn on their own; how learning algorithms function; and how competing theories of thinking and learning work. His vivid writing, anecdotes and examples help make the topic more accessible than you might expect (though the reader may have to do some heavy lifting). Domingos explains a lot, but sometimes relies too much on his illustrations. We recommend his treatise to anyone interested in how computers are revolutionizing society, in “machine learning” or in scientific development.
Take-Aways
- “Machine learning” lets computers write their own algorithms – “precise and unambiguous” instructions that tell computers exactly “what to do.”
- Machine learning can revolutionize society, but it will require a “Master Algorithm” to reach its full potential.
- Five schools of thought about machine intelligence compete with each other:
- “Symbolists” represent intelligence by manipulating symbols.
- “Connectionists” model learning on the human brain.
- “Evolutionaries” see natural selection as the main learning mechanism.
- “Bayesians” hold that all knowledge is inherently uncertain and partial. They follow “Bayes’ theorem”: revise how much you believe a hypothesis when you discover new data.
- “Analogizers” believe that learning requires identifying similarities.
- A universal learner will unify all five models without embracing any of their weaknesses.
- Once a Master Algorithm exists, it will provide access to models that are increasingly accurate and useful.
Machine learning can solve important problems by looking at data and then finding an algorithm to explain it.
Though you might not be aware of it, machine learning algorithms are already seeping into every aspect of human life, becoming more and more powerful as they continue to learn from an ever-increasing amount of data. The Master Algorithm (2016) provides a broad overview of what kind of algorithms are already out there, the problems they face, the solutions they can provide and how they’re going to revolutionize the future.
One of the world’s greatest mysteries is how a pound of gray jelly in the head of a newborn can eventually produce a stream of consciousness, able to perceive the world and interact with it. Maybe more astounding is how little teaching the brain requires while it’s undergoing this transformation.
No machine in the history of mankind has a learning capacity comparable to the human brain. But things are changing. Our ability to create ever more sophisticated machines means that, in the future, they may be able to challenge the brain.
Machines may even surpass human ability. They can learn from the enormous amounts of data that we encounter and ignore every day. So let’s put on our thinking caps and explore the fascinating world of algorithms and machine learning.
In this summary of The Master Algorithm by Pedro Domingos, you’ll find out
- how machines will be able to learn without instruction in the future;
- why seeing patterns sometimes is a problem; and
- how an algorithm for winning Tetris could improve your route to work.
Have you ever been frustrated by recipes with imprecise instructions, like, “cook at medium heat for 15-20 minutes”? If so, you might be someone who prefers a good algorithm.
Unlike recipes, algorithms are sequences of precise instructions that produce the same result every time.
Though you might not be aware of their presence, algorithms are used everywhere. They schedule flights, route the packages you send and make sure factories run smoothly.
These standard algorithms are designed to accept information as an input, then perform a task and produce an output.
Let’s say an algorithm’s task is to give directions. When you input two points, the output would then be the shortest route between these two points.
But machine learning, or ML, algorithms are one step more abstract: they are algorithms that output other algorithms! Given lots of examples of input-output pairs to learn from, they find an algorithm that seems to turn the inputs into the outputs.
This comes in handy for finding algorithms for tasks that human programmers can’t precisely describe, such as reading someone’s handwriting. Like riding a bike, deciphering handwriting is something we do unconsciously. We would have trouble putting our process into words, let alone into an algorithm.
Thanks to machine learning, we don’t have to. We just give a machine learning algorithm lots of examples of handwritten text as input, and the meaning of the text as the desired output. The result will be an algorithm that can transform one into the other.
Once learned, we can then use that algorithm whenever we want to automatically decipher handwriting. And, indeed, that’s how the post office is able to read the zip code you write down on your packages.
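The book itself contains no code, but as a rough illustration of the idea – a learning algorithm turning labeled examples into a usable model – here is a minimal Python sketch. It assumes scikit-learn is available, and its small bundled 8×8 digits dataset stands in for scanned zip-code images.

```python
# A minimal sketch of "learning an algorithm from examples" with scikit-learn.
# The bundled digits dataset is a stand-in for handwritten zip-code images.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()                 # inputs: 8x8 pixel images; outputs: digit labels 0-9
X, y = digits.data, digits.target

learner = LogisticRegression(max_iter=2000)  # the machine learning algorithm
model = learner.fit(X, y)                    # the learned "output" algorithm

# The learned model can now read digits no one explicitly programmed it to read.
print(model.predict(X[:5]), y[:5])
```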
What’s great is that ML algorithms like this one can be used for many different tasks, and solving emergent problems is only a matter of collecting enough data.
This means that the initial underlying algorithm is often the same and requires no adjustments in order to solve seemingly unrelated problems.
For example, you might think that making a medical diagnosis, filtering spam from your email and figuring out the best chess move might all need completely different algorithms. But, actually, with one ML algorithm and the right kind of data, you can solve all these problems.
To avoid hallucinating patterns, learning algorithms need to be restricted and tested for validity.
To hallucinate is to see something that isn’t really there. Interestingly enough, hallucinations are a central problem in the world of algorithms. In 1998, a best-selling book, The Bible Code, claimed that the Bible contained hidden predictions that were revealed by selectively skipping certain lines and letters.
Critics disproved this assertion, however, by demonstrating that similar “patterns” could be found in Moby Dick and within Supreme Court rulings.
This is a good example of hallucinating patterns, which, in ML lingo, is the result of overfitting. Overfitting takes place when an algorithm is so powerful that it can “learn” anything. You see, when you throw enough computing power at a data set like the Bible, you will always find patterns because the computer can construct increasingly complex models until some arise. But the resulting model won’t work on any other data.
So, to get your algorithms under control, their power needs to be bounded by limiting their complexity.
With the right kind of limiting restrictions, you make sure the scope of your algorithm isn’t too big and ensure that the results are verifiable and consistent. If it’s too flexible, your algorithm can end up like The Bible Code, finding patterns in any given text or set of data.
But what if your algorithm discovers multiple patterns that explain the data you have but disagree on new data? Which result should you believe? And how can you be 100 percent sure your results aren’t just a fluke?
This is when holdout data comes in.
When you are preparing your original data set for the learning algorithm, it is important to divide it into a training set, which the algorithm uses to learn, and a holdout set, against which to test it.
This way you can double-check the results and confirm that the patterns found in the data are valid.
Ensuring the validity of the results is what an ML expert’s work is all about. Her job is to restrict the power of the algorithm by making sure the rules are not too flexible and that the results perform well against both the training-set data and the holdout-set data.
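As a concrete illustration of this workflow – not taken from the book – here is a hedged scikit-learn sketch: the data are split into a training set and a holdout set, and a large gap between the two scores signals overfitting.

```python
# A hedged sketch of a training/holdout split, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Keep a holdout set the learner never sees during training.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.25, random_state=0)

# An unrestricted tree is deliberately used here: it will fit the training data perfectly.
model = DecisionTreeClassifier().fit(X_train, y_train)

# A big gap between these two scores is the signature of overfitting:
# the model has "hallucinated" patterns that exist only in the training data.
print("training accuracy:", model.score(X_train, y_train))
print("holdout accuracy: ", model.score(X_hold, y_hold))
```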
Rules using deductive reasoning and decision trees can allow machines and algorithms to think logically.
Just as the medical world has specialists who have preferred ways of treating the body, the world of machine learning has specialized branches with their own perspectives and preferred style of algorithms.
Symbolists, for example, manipulate symbols and learn rules in order to create artificial intelligence (AI).
Symbolists are the oldest branch of the AI community; they’re rationalists who view the senses as unreliable, and therefore believe that all intelligence must be learned through logical methods.
For this reason, the symbolists’ preferred algorithm is inverse deduction.
Generally speaking, inverse deduction creates rules by working backward from separate statements: given the facts “Napoleon is human” and “Napoleon is mortal,” the algorithm can induce the broader rule that links them – “Humans are mortal.”
While this kind of algorithm is good for data mining and sorting through relatively large amounts of data, such as medical records, it is costly and inefficient for truly massive databases because it has to consider all possible relationships between all variables in the data, which results in exponentially growing complexity.
So, to make this work less complex, you can use decision trees to find these rules.
As the name suggests, decision trees branch the data off into smaller sets. They do this by basically playing a game of 20 questions, with each question or rule further narrowing down the options and possibilities.
For example, if you wanted to come up with rules for sifting through medical records, you could use a decision tree. You’d start out with all of the records, but then, at the various branching points in the tree, you’d divide them into groups like “healthy,” “leukemia,” “lung cancer” and so forth. The ML algorithm would then find suitable rules that would also result in this division.
This method prevents overfitting by restricting the number of questions the decision tree asks, so that only the most widely applicable and general rules are applied.
Decision trees are used in software that makes medical diagnoses by narrowing down someone’s symptoms. They were also used in an application that could predict the outcome of Supreme Court rulings with a 75-percent accuracy rate; the predictions from a panel of human experts had an accuracy rate of less than 60 percent.
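To make the idea concrete, here is a minimal, hypothetical scikit-learn sketch. The bundled breast-cancer dataset stands in for the medical records above, and `max_depth` plays the role of limiting the number of questions the tree may ask.

```python
# A minimal decision-tree sketch, assuming scikit-learn; the bundled
# breast-cancer dataset stands in for the medical records in the text.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# max_depth caps the number of "questions" the tree may ask,
# which keeps the learned rules general instead of overfitted.
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

print(export_text(tree))                         # the learned if/then rules
print("holdout accuracy:", tree.score(X_hold, y_hold))
```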
In the next section, we’ll see how to deal with data of a more difficult, and more human, sort – data that’s uncertain or even contradictory.
You can prevent effective algorithms from overfitting by keeping models open and restricting assumptions.
Bayesianism is another popular branch of machine learning, and its followers are practically religious in their devotion.
Contrary to the rationalists, Bayesians are empiricists who believe that logical reasoning is flawed and that true intelligence comes from observation and experimentation.
Their algorithm of choice is called Bayesian inference, which works by keeping a number of different hypotheses or models open simultaneously. The degree to which we believe in any one of these hypotheses or models will vary depending on the evidence found in the data, as some will invariably receive more support than others.
This approach can also help provide a medical diagnosis. While remaining open to many hypothetical diseases and their symptoms, the algorithm can sift through the data of a patient’s record to make the best match. The more data the record provides, the more diseases the algorithm can rule out, until one hypothesis becomes the statistical winner.
Bayesian inference is a powerful algorithm, and it prevents overfitting by restricting assumptions about causes and events.
For example, suppose it’s apparent that you have the flu and we want to reason about whether you also have a fever or a cough. We can categorize the flu as a cause and the fever and cough as events. The restricting assumption is that, given the cause, the events don’t influence each other – once we know you have the flu, having a cough doesn’t change your chances of also having a fever.
By ignoring possible connections between the events themselves and focusing strictly on the links between cause and effect, Bayesian inference avoids becoming too powerful and overfitting.
Similar assumptions are used by voice-recognition software like Siri. When you say, “Call the police!” Bayesian inference keeps its options open and considers how likely it is that you actually said, “Call the please!”
But when it checks its database of popular phrases, it only needs to look at how often certain words follow one another. And, in this case, it is clear that “police” follows the word “the” far more often than “please” does.
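As a toy illustration of that restricting assumption – the probabilities below are invented for illustration, not taken from the book – a few lines of plain Python can apply Bayes’ rule under the “events don’t influence each other” assumption.

```python
# A toy Bayesian-inference sketch in plain Python; all probabilities
# below are invented for illustration, not drawn from the book.
p_flu = 0.05                                   # prior belief that a patient has the flu
p_fever_given_flu, p_fever_given_not = 0.90, 0.10
p_cough_given_flu, p_cough_given_not = 0.80, 0.20

# The "naive" restricting assumption: given the cause (flu), the events
# (fever, cough) are treated as independent, so their likelihoods multiply.
weight_flu = p_fever_given_flu * p_cough_given_flu * p_flu
weight_not = p_fever_given_not * p_cough_given_not * (1 - p_flu)

posterior_flu = weight_flu / (weight_flu + weight_not)
print(f"P(flu | fever and cough) = {posterior_flu:.2f}")   # about 0.65 with these numbers
```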
Unsupervised learning algorithms are great at finding structure and meaning in raw data.
Have you ever noticed how you can hear when someone says your name, even if it’s said quietly and you’re surrounded by dozens of loudly talking people? We have an impressive ability to filter the information our ears pick up and focus on what matters. Could an algorithm learn to do the same thing?
As a matter of fact, unsupervised learning is a category of algorithms designed to learn from raw, unlabeled and noisy data.
The algorithms in the previous sections have all learned from data containing labeled examples, such as correct diagnoses or emails that have been marked as spam or non-spam.
However, clustering algorithms are one group of unsupervised learners that can discover categories from large amounts of raw data.
This is the kind of algorithm that can be used in image recognition or voice isolation software, which can identify a face or object among millions of pixels, or single out a voice in a noisy crowd.
These algorithms find such meaningful structures by reducing the dimensionality of the data – boiling the description of what you’re looking for down to its primary essentials.
Sketch artists, for instance, are able to reproduce faces with such accuracy because they memorize ten different variations of each facial feature – nose, eyes, ears and so on. This narrows their options down considerably, making it possible to produce a passable drawing based on a description alone. Similarly, facial recognition algorithms, after preprocessing all the different options, only need to compare a few hundred variables rather than a million pixels.
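For a rough, hypothetical illustration of both ideas – dimensionality reduction and discovering clusters in raw, unlabeled data – here is a short scikit-learn sketch; the digit images stand in for faces or voices.

```python
# A hedged sketch of unsupervised learning with scikit-learn:
# PCA reduces dimensionality, k-means discovers clusters without labels.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_digits(return_X_y=True)            # labels are ignored: this is unsupervised

# Compress each 64-pixel image down to a handful of "essential" features,
# much as the sketch artist works from a few variations of each facial feature.
X_small = PCA(n_components=10).fit_transform(X)

# Group the compressed images into 10 clusters using the raw data alone.
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(X_small)
print(X.shape, "->", X_small.shape, "; first cluster labels:", clusters[:10])
```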
Neural networks are another effective way to crunch massive amounts of raw data.
While other algorithms process data sequentially, neural networks work like a brain and process multiple inputs at the same time.
One of the biggest neural networks ever created spent three days sifting through ten million randomly selected YouTube videos. And without being told what to look out for, the program learned to recognize human faces and, perhaps unsurprisingly, cats.
Now that you know about all these different algorithms, you’ve surely begun to wonder what would happen if they were all combined into a master algorithm. Well, let’s find out.
There is no one perfect algorithm, and a unifying master algorithm is required to tackle the big problems.
So, with all these different algorithms, you might be asking, which one’s the best?
The truth is, there’s no such thing as a perfect algorithm; they all rely on different fundamental assumptions.
Remarkably, for every data set where an algorithm comes up with something useful, a devil’s advocate could use the same algorithm on another data set to show that everything it does is nonsensical. That’s why it’s important to make the right assumptions about the data you’re applying the algorithm to.
Luckily, this isn’t as big of an issue as it might seem.
The majority of the most difficult problems in computer science are fundamentally related and could be solved with one good algorithm.
Just consider a handful of famous computational problems: determining the shortest route to visit several cities, compressing data, controlling urban traffic flow, turning 2D images into 3D shapes, laying out components on a microchip and, last but not least, playing Tetris.
These problems turn out to be deeply related – an efficient solution to any one of them could be converted into an efficient solution for all the others, which is how an algorithm for winning at Tetris could, in principle, improve your route to work.
It might be hard to believe that one algorithm can address so much, but it’s true and it’s considered one of the most fascinating insights in all of computer science.
Unfortunately, the most important problems facing humanity require much more capable algorithms than are currently available.
For example, in order to come up with a cure for cancer, the ultimate algorithm needs to incorporate all the previously acquired knowledge, plus be able to keep up with the quick rate at which new scientific discoveries are being published. On top of that, it would need to consider the relevance of all of this data and discern an overarching structure that no one has yet been able to see.
While this is currently beyond the capability of algorithms, progress is being made.
Take Adam, for instance – a research robot at the Institute of Biology in Manchester that has learned general knowledge about genetics and can suggest hypotheses. It can even design and carry out experiments as well as test and analyze their results!
In modern business, finding the best algorithm and the best data is the key to success.
“Data is the new oil,” or at least so say the modern business prophets.
It’s hard to argue against, too, because, in the world of big business, many feel that the company with the best algorithms is the company that’s going to win.
In the pre-internet era, if a business had problems connecting with consumers, it could solve them with a physical solution, like coming up with a better ad campaign.
But with the internet came virtually unlimited consumer choice, and now the question is: How do you decide what to buy when there are 10 million options?
This is where machine learning comes in and helps narrow things down.
Amazon has led the race in offering intelligent suggestions on what products customers might like, and their service covers just about every market.
But the race is still on. And whoever has the best data can come up with the best algorithm, which is why data is a tremendous strategic asset. The average value of a user’s data trail for the online advertising industry is around $1,200 per year. Google’s data on you goes for about $20, while Facebook’s costs $5.
The business of buying data has become so big that experts believe data unions and data banks will eventually allow private citizens and companies to conduct fair negotiations about the use of their data.
Databanks could keep your information secure and also allow you to set the terms for when and how it is accessed. And a data union could operate like other worker unions, with like-minded people joining forces to ensure that information is being used fairly and accurately.
This kind of regulation could benefit everyone. It could help businesses, by improving their algorithms; you could get better purchase recommendations; and, with the extra security, more people might feel comfortable sharing their data to help advance medical and humanitarian causes.
In the future, you’ll have a digital model of yourself to help make life easier.
Chances are you’ve been caught talking to yourself at one point or another. Well, if you’ve ever wished you could talk back to yourself, your dreams may soon come true.
By sharing all your data with the ultimate master learning algorithm you will end up with a pretty accurate digital model of yourself.
Imagine the master algorithm: Seeded with a database containing all general human knowledge, then personalized with all the data that you’ve collected over the course of your life, including emails, phone records, web searches, purchases, downloads, health records, GPS directions and so on and so on.
You could then download the learned-model digital version of yourself onto a flash drive, carry it with you in your pocket and use it like a personal butler to help you run your life.
With your very own digital you, little annoyances could be quickly dealt with, saving you time and hassle.
In addition to simple things like automated web searches and recommending new books and movies, it could also file your tax returns, pay your credit-card bills, sort your email, plan your vacations and, if you’re single, it could even set you up on dates.
Or, if you’re feeling introspective, you could set it to a conversation mode and have a chat with your digital model.
In a society where digital models are a common thing, it could even interact with the rest of the world on your behalf.
Imagine you are looking for a new job. After spending a second on LinkedIn, your model could apply for every suitable job available, including some perfect jobs you might have otherwise overlooked.
Those companies might have personal models as well, and your digital self could interact with them and provide you with a list of every company that agreed to a personal interview. All you’d have to do is confirm your appointment.
Your digital self will be like power steering: you’ll get where you want to go with less hassle and a fraction of the effort.
Summary
The “Machine Learning Revolution”
You interact with “machine learning” every day. When Netflix suggests a movie or a search engine completes your query, that’s machine learning in action. This is revolutionary. Throughout history, if you wanted a machine to do something, you had to build it to do just that thing. For computers, you wrote a detailed algorithm explaining how it should do what you want it to.
“Machine learning is something new under the sun: a technology that builds itself.”
“Machine-learning algorithms,” or “learners,” work differently. Computerized learners figure things out by themselves. Computers that are “learners” can “program themselves.” Give them data, and they learn. The more data they have, the more effectively they think. This unprecedented development will revolutionize society. Machine learning is already transforming fields from politics to DNA sequencing.
“You may not know it, but machine learning is all around you.”
Understanding machine learning starts by becoming familiar with the term “algorithm.” Algorithms are “precise and unambiguous” instructions that tell computers exactly “what to do.” Designing algorithms is difficult, time-consuming and often counterintuitive. When programmers and computer scientists succeed in writing good algorithms, they build on each other’s work, producing more and more algorithms, which interact like the elements in an “ecosystem.” Just as ecosystems evolve predators, obstacles arise to threaten flourishing algorithms. These obstacles come in the form of different types of complexity, which slow or crash computerized systems, causing their algorithms to fail.
“Learning Algorithms”
Learning algorithms, or learners, write their own algorithms; learner algorithms – which might be housed in multiple computers – write the programs they use. The Industrial Revolution mechanized manual labor, and the Information Revolution automated mental labor. “Machine learning automates automation.” A human scientist generates, tests and discards or modifies hundreds of hypotheses in a lifetime of work. Machine learners check a hypothesis in less than a second. Machine learning lets scientists tackle data-dense, complex problems they could not handle on their own.
“An algorithm is a sequence of instructions telling a computer what to do.”
If you want to solve two different problems, you use two different tools or programs. Machine learning is different. You can often use “the same algorithms” to solve problems in different fields. Would it be possible to develop “a universal learner” that can derive all knowledge? A “Master Algorithm” would need more data to function than more specialized algorithms, but society generates increasing amounts of big data. Developments in physics, evolutionary biology, neuroscience, statistics and computer science suggest that a Master Algorithm is possible.
The “Five Tribes of Machine Learning”
Scientists have been researching machine learning for decades. Their inquiries emerge from five major “tribes.” Each one approaches the problem differently, cares most about one major aspect of the challenge and has “a set of core beliefs.” Each group advocates for its own Master Algorithm, which embodies its beliefs and approach:
1. “Symbolists”
The symbolists reduce intelligence to symbol manipulation. Symbolists recognize that learning can’t start “from scratch.” They include “pre-existing knowledge” in their model. Symbolists use “inverse deduction” as their Master Algorithm. Inverse deduction determines what constitutes knowledge through a process of deduction and then generalizes from the result. The symbolists’ family tree traces back to philosopher David Hume, one of the greatest empiricists and “the patron saint of the symbolists.” Hume asked a profound question: How can you generalize from what you’ve observed to what you haven’t experienced? All learning algorithms seek to find a solution to this query.
“The Master Algorithm is the unifier of machine learning: It lets any application use any learning.”
Some 250 years after Hume asked his question, physicist David Wolpert proved the “no free lunch” theorem: without built-in assumptions, no learner can beat random guessing across all possible problems. Learning therefore has to prime the pump of knowledge creation with what you already know. In practice, you give the learner “positive examples” of each concept to follow and “negative examples” of things that don’t illustrate the concept. To get a learner to identify cats, you’d add positive examples of cats and negative examples of animals that are not cats, such as dogs. To meet more learning goals, combine examples or assemble “sets of rules.”
“Machine learning is both a science and a technology and both characteristics give us hints on how to unify it.”
Because “induction is the inverse of deduction,” you can create rules by identifying what rule would let you deduce one fact from another. You can also “induce rules purely from other rules.” Since “inverse deduction” is “very computationally intensive,” applying it to “massive data sets” is very difficult.
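As a toy illustration only – the fact representation below is invented for this sketch, not Domingos’ – a few lines of Python can “induce” the rule connecting the Napoleon statements by checking which properties always co-occur.

```python
# A toy inverse-deduction sketch in plain Python; the fact representation
# here is a made-up stand-in, chosen only to mirror the examples in the text.
facts = {
    ("Napoleon", "human"), ("Napoleon", "mortal"),
    ("Socrates", "human"), ("Socrates", "mortal"),
}

def induce_rules(facts):
    """Propose 'X implies Y' whenever every entity with property X also has property Y."""
    entities = {e for e, _ in facts}
    properties = {p for _, p in facts}
    rules = []
    for x in properties:
        for y in properties:
            if x == y:
                continue
            holders = [e for e in entities if (e, x) in facts]
            if holders and all((e, y) in facts for e in holders):
                rules.append(f"{x} implies {y}")
    return rules

print(induce_rules(facts))   # e.g. ['human implies mortal', 'mortal implies human']
```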
2. “Connectionists”
Psychologist Donald Hebb described a key element of brain function in 1949: when one neuron repeatedly helps trigger another, the connection between them strengthens – a principle often summarized as “Neurons that fire together wire together.” Connectionists use algorithms to “simulate a brain.” Computers don’t have as many connections as the brain, so faster processing must compensate: where the brain might use 1,000 neurons, a computer would use “the same wire a thousand times.”
“Machine learning is a kind of knowledge pump: We can use it to extract a lot of knowledge from data, but first we have to prime the pump.”
Brains contain billions of neurons, which are shaped like little trees. Each neuron connects with “thousands of others” through synapses. Electricity runs along the trunk of each neuron, jumping across synapses to spark activity in nearby neurons. Applying this understanding to machine learning requires you to “turn it into an algorithm.” One early algorithm – the “perceptron” – modeled how a single neuron learns, but it couldn’t capture the layered interconnections essential to brain function. The mathematical proof of the perceptron’s limits had a terrible impact on machine learning: researchers largely abandoned “neural networks,” and many incorrectly concluded that they would have to “explicitly program” a system to produce intelligence.
“Although it is less well-known, many of the most important technologies in the world are the result of inventing a unifier, a single mechanism that does what previously required many.”
Connectionists “reverse engineer” the brain to create machine learning. “Backpropagation” is their Master Algorithm. This approach compares the output from a system with the output you want and changes the connections one layer of neurons at a time, improving the output each time.
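As a hedged illustration – scikit-learn’s small MLPClassifier stands in for a real connectionist system – the sketch below trains a tiny layered network with backpropagation.

```python
# A minimal connectionist sketch, assuming scikit-learn: a small
# multi-layer network trained with backpropagation.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# Two hidden layers of simulated neurons; training repeatedly compares the
# network's output with the desired output and backpropagates the error,
# adjusting the connection weights one layer at a time.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("holdout accuracy:", net.score(X_hold, y_hold))
```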
3. “Evolutionaries”
These scientists see “natural selection” as the engine for learning. Evolutionaries use “genetic programming” as their Master Algorithm: They evolve computer programs in much the same way that organisms evolve in nature. They have an advantage in creating machine learning – nature, by way of Darwin, already articulated their algorithm. Evolutionaries use “a genetic algorithm” which works based on “a fitness function” – a score given to programs according to how well they accomplish what designers created them to do. Genetic algorithms work like “sexual reproduction,” mating the fittest programs and producing offspring that contain somewhat different qualities. Genetic algorithms can test multiple hypotheses simultaneously, and excel at coming up with genuinely new things.
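For a rough feel of how a fitness function, selection, crossover and mutation interact, here is a toy genetic algorithm in plain Python. Evolving bit strings toward a fixed target is an invented stand-in for evolving real programs, so treat it as a sketch of the mechanism only.

```python
# A toy genetic algorithm in plain Python: candidate "programs" are bit
# strings, and the fitness function scores how close each is to a target.
import random
random.seed(0)

TARGET = [1] * 20
fitness = lambda genome: sum(g == t for g, t in zip(genome, TARGET))

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]

for generation in range(40):
    # Selection: the fittest half survives to reproduce.
    population.sort(key=fitness, reverse=True)
    parents = population[:15]
    # "Sexual reproduction": crossover of two parents, plus a little mutation.
    children = []
    while len(children) < 15:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(TARGET))
        child = a[:cut] + b[cut:]
        if random.random() < 0.3:
            i = random.randrange(len(TARGET))
            child[i] = 1 - child[i]
        children.append(child)
    population = parents + children

print("best fitness:", fitness(max(population, key=fitness)), "of", len(TARGET))
```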
4. “Bayesians”
Reverend Thomas Bayes (1701-1761) created an equation for incorporating new evidence into existing beliefs. Bayesians recognize the inherent uncertainty and incompleteness of all knowledge. They see learning as “a form of uncertain inference.” Their challenge is separating data from their surrounding noise and building systems that can deal with incompleteness. Their Master Algorithm is “Bayes’s theorem and its derivatives.” Bayes’s theorem says you should revise how strongly you believe a specific hypothesis when you discover new data. Bayesians see learning as a specialized use of this theorem.
“Whenever a learner finds a pattern in the data that is not actually true in the real world, we say that it has overfit the data.”
If the data support a hypothesis, you give the hypothesis more weight. If the data contradict it, you give the hypothesis less weight. Words are not the best tool for presenting this reasoning, because people neglect key steps in evaluating reasoning. Trying to integrate multiple chunks of evidence adds complexity. People deal with this by compromising and simplifying their evaluation process until it is workable. A machine learner applying Bayes is “a Naïve Bayes classifier.” The name recognizes a key point: the classifier rests on “a naïve assumption” – for example, that two symptoms of the flu occur independently of each other. Search engines use algorithms like Naïve Bayes to make basic assumptions about the terms that people search for most often.
5. “Analogizers”
Analogizers see “recognizing similarities” as central to learning; their challenge is determining just how alike two compared things really are. Their Master Algorithm is “the support vector machine.” While “neural networks” played a larger role in the early years of machine learning, analogy offers exciting possibilities for a Master Algorithm.
“Machine learning will not single-handedly determine the future, any more than any other technology; it’s what we decide to do with it that counts and now you have the tools to decide.”
Analogizers offer one of the best learning algorithms: “nearest neighbor.” It works so well because it does almost nothing up front. You don’t calculate anything. You just compare the new thing you encounter with records of existing objects in your database. If you want a machine to recognize faces, don’t define “face.” Instead, compare the new image to other pictures of faces. This reasoning works for online recommendations of books or movies. If you like X, you might like Y. You can also weight some similarities more heavily than others – for example, when your wishes resemble one recommender’s far more than another’s. The problem with the nearest neighbor algorithm is “the curse of dimensionality”: the more factors you try to integrate, the more difficult the algorithm becomes to use.
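As a minimal illustration of how little “learning” nearest neighbor does up front, here is a hedged scikit-learn sketch: training merely stores the examples, and prediction compares each new item with its closest stored neighbors.

```python
# A minimal nearest-neighbor sketch, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# "Training" just stores the examples; at prediction time each new image is
# compared with its closest stored neighbors and takes their majority label.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("holdout accuracy:", knn.score(X_hold, y_hold))
```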
One Master Algorithm
Machine learning is “a science and a technology.” If machine learning is a science, someone must combine its various theories. Many technological advances occur when someone invents “a unifier.” Unifiers are single mechanisms that combine the function of several different objects. The Internet functions as a unifier among different networks that cannot talk directly to one another. Microprocessors are unifiers; so are computers, and so is electricity.
“Society is changing, one learning algorithm at a time.”
The Master Algorithm will be the necessary unifier of the existing models. Creating the Master Algorithm requires “metalearning,” that is, learning about learning or learners. Metalearning requires running and combining multiple models. To combine different learners quickly, you might run the learners and tally their results. This is “stacking”; both Netflix and Watson use it. “Markov logic networks,” or “MLNs,” can unify these various approaches. MLNs are flexible, and you can apply them to any feature you want. Such learners can solve the problems of the five different tribes and provide a major step forward. For instance, if you combine MLNs with the posterior probability Bayesians use as an “evaluation function” and “gradient descent” as an optimizer, you have a universal learner.
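As a rough illustration of stacking – not of Markov logic networks, which this sketch does not attempt – the following scikit-learn snippet runs learners inspired by three of the tribes side by side and lets a simple metalearner combine their outputs.

```python
# A hedged sketch of "stacking" with scikit-learn: several base learners are
# run in parallel and a meta-learner combines their outputs.
from sklearn.datasets import load_digits
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("symbolist", DecisionTreeClassifier(max_depth=8)),   # rule-like learner
        ("bayesian", GaussianNB()),                           # probabilistic learner
        ("analogizer", KNeighborsClassifier(n_neighbors=5)),  # similarity-based learner
    ],
    final_estimator=LogisticRegression(max_iter=1000),        # the metalearner
)
stack.fit(X_train, y_train)
print("holdout accuracy:", stack.score(X_hold, y_hold))
```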
A World of Learning Machines
As the Master Algorithm emerges, it will reshape your world. Every time you use a computer, and do whatever you intend to do, you’ll be teaching “the computer about you.” Just as you show different aspects of yourself at work and at leisure, so you may choose which aspects you share with different algorithms. Make this decision based on your goals for using this algorithm, its functions and the potential effects of having it misunderstand you.
“Bottom line: Learning is a race between the amount of data you have and the number of hypotheses you construct.”
As more data on everyone become available, you will increasingly live in “a society of models.” The Master Algorithm will develop increasingly accurate models of your likes and desires and how those compare to what you think you like and want. It will seek out objects, experiences, jobs and people for you and negotiate on your behalf. Your “bot” will examine sales pitches, check their facts and cut through persuasive rhetoric. The information currently distributed throughout distinct sites like Yelp or Amazon will become unified. Your searches will be more comprehensive and objective.
Conclusion
The key message in this book:
Machine learning algorithms are universal problem solvers that need only a few assumptions and a whole lot of data to work their magic. Unifying the current branches of machine learning into one ultimate master algorithm would advance humanity like no other single event in history. Even as it stands today, advanced algorithms and access to personal data are already crucial for businesses to be competitive.
Actionable advice:
Be aware of your data trail.
Every digital interaction has two levels: getting you what you want and teaching the computer a little bit more about you. The second one will be more important over the long term, since it will be used both to serve you, by helping you perform tasks, and to manipulate you, by showing you ads and recommendations likely to make you buy. So be aware. If you don’t want a current internet session to influence your personalization, go into incognito mode. And if you don’t want your kids to be shown YouTube suggestions and ads based on your history, make sure to use different accounts.
About the author
Winner of the SIGKDD Innovation Award, Pedro Domingos is a professor of computer science at the University of Washington and a fellow of the Association for the Advancement of Artificial Intelligence. Readers can download the MLN learner “Alchemy” at alchemy.cs.washington.edu.
Table of Contents
1. The Machine Learning Revolution
2. The Master Algorithm
3. Hume’s Problem of Induction
4. How Does Your Brain Learn?
5. Evolution: Nature’s Learning Algorithm
6. In the Church of the Reverend Bayes
7. You Are What You Resemble
8. Learning Without a Teacher
9. The Pieces of the Puzzle Fall into Place
10. This Is the World on Machine Learning