Machine Learning Philosophy of Machine Learning

Slides for Big Data Institute 5th Anniversary Panel

Slides located here.

Machine Learning Philosophy of Machine Learning

This is good and true

Machine Learning

Let’s Be Careful About Capsule Networks vs. ConvNets

This seems to be a popular introduction to Capsule Networks. In Part I, on the “intuition” behind them, the author (not Geoffrey Hinton, although formatted the same as a quote from him immediately above) says:

Internal data representation of a convolutional neural network does not take into account important spatial hierarchies between simple and complex objects.

This is very simply not true. In fact, Sabour, Hinton and Frosst address this issue in their Dynamic Routing Between Capsules [pdf]:

Now that convolutional neural networks have become the dominant approach to object recognition, it makes sense to ask whether there are any exponential inefficiencies that may lead to their demise. A good candidate is the difficulty that convolutional nets have in generalizing to novel viewpoints. The ability to deal with translation is built in, but for the other dimensions of an affine transformation we have to chose between replicating feature detectors on a grid that grows exponentially with the number of dimensions, or increasing the size of the labelled training set in a similarly exponential way. Capsules (Hinton et al. [2011]) avoid these exponential inefficiencies…

This is fundamental, and I hope folks avoid the error in thinking that ConvNets can’t “take into account important spatial hierarchies between simple and complex objects”. That’s exactly what they do, but as models of how brains take into account these hierarchies under transformations, they are badly inefficient at doing so.

Machine Learning

How Reuters turns truth into a supervised learning task

This is a very good headline to be pushing these days:


Its source explains how Reuters attempts to turn the detection of fake news into a supervised learning problem.

News Tracer also must decide whether a tweet cluster is “news,” or merely a popular hashtag. To build the system, Reuters engineers took a set of output tweet clusters and checked whether the newsroom did in fact write a story about each event—or whether the reporters would have written a story, if they had known about it. In this way, they assembled a training set of newsworthy events. Engineers also monitored the Twitter accounts of respected journalists, and others like @BreakingNews, which tweets early alerts about verified stories. All this became training data for a machine-learning approach to newsworthiness. Reuters “taught” News Tracer what journalists want to see.

That’s how the labels are assigned.

Here’s how the features are assigned:

The system analyzes every tweet in real time—all 500 million or so each day. First it filters out spam and advertising. Then it finds similar tweets on the same topic, groups them into “clusters,” and assigns each a topic such as business, politics, or sports. Finally it uses natural language processing techniques to generate a readable summary of each cluster.


News Tracer assigns a credibility score based on the sorts of factors a human would look at, including the location and identity of the original poster, whether he or she is a verified user, how the tweet is propagating through the social network, and whether other people are confirming or denying the information. Crucially, Tracer checks tweets against an internal “knowledge base” of reliable sources. Here, human judgment combines with algorithmic intelligence: Reporters handpick trusted seed accounts, and the computer analyzes who they follow and retweet to find related accounts that might also be reliable.

Dr. David McCarty used to joke to me that people who didn’t understand the factive nature of “facts” wanted computers to detect them using logic. Of course, that’s impossible.

Machine learning yokes computers to the world. For this reason, the joke isn’t funny when it’s machine learning detecting facts. This is how “learning machines”, to use Turing’s term, contains the solution to the failures of logic-based AI. This is what Geoffrey Hinton was getting at in his short, pithy acceptance speech for the IEEE Maxwell Medal:

50 years ago, the fathers of artificial intelligence convinced everybody that logic was the key to intelligence. Somehow we had to get computers to do logical reasoning. The alternative approach, which they thought was crazy, was to forget logic and try and understand how networks of brain cells learn things. Curiously, two people who rejected the logic based approach to AI were Turing and Von Neumann. If either of them had lived I think things would have turned out differently… now neural networks are everywhere and the crazy approach is winning.