Let’s Be Careful About Capsule Networks vs. ConvNets

Post author By the owner of the establishment.
Post date January 18, 2018

This seems to be a popular introduction to Capsule Networks. In Part I, on the “intuition” behind them, the author (not Geoffrey Hinton, although the passage is formatted the same way as the quote from him immediately above it) says:

Internal data representation of a convolutional neural network does not take into account important spatial hierarchies between simple and complex objects.

This is simply not true. In fact, Sabour, Hinton and Frosst address this issue in their Dynamic Routing Between Capsules [pdf]:

Now that convolutional neural networks have become the dominant approach to object recognition, it makes sense to ask whether there are any exponential inefficiencies that may lead to their demise. A good candidate is the difficulty that convolutional nets have in generalizing to novel viewpoints. The ability to deal with translation is built in, but for the other dimensions of an affine transformation we have to chose between replicating feature detectors on a grid that grows exponentially with the number of dimensions, or increasing the size of the labelled training set in a similarly exponential way. Capsules (Hinton et al. [2011]) avoid these exponential inefficiencies…

This is fundamental, and I hope folks avoid the error of thinking that ConvNets can’t “take into account important spatial hierarchies between simple and complex objects”. That’s exactly what they do. The problem is that, as models of how brains handle these hierarchies under transformations, they are badly inefficient at it.
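To make “the ability to deal with translation is built in” concrete, here is a minimal pure-Python sketch. It assumes a 1-D signal and a hand-picked edge-detecting kernel (both made up for illustration) and checks that shifting the input simply shifts the feature map. No such shortcut exists for rotation or scaling, which is the inefficiency the paper is pointing at.

```python
def correlate1d(signal, kernel):
    """Slide the kernel over the signal (valid positions only), as a conv layer does."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift(values, n):
    """Shift a list right by n places, padding with zeros and keeping its length."""
    return [0] * n + values[: len(values) - n]


# Illustrative values only: a small bump and a simple edge detector.
signal = [0, 0, 1, 2, 1, 0, 0, 0, 0]
kernel = [1, 0, -1]

features = correlate1d(signal, kernel)
features_of_shifted = correlate1d(shift(signal, 2), kernel)

# Translating the input translates the feature map by the same amount.
print(features_of_shifted == shift(features, 2))
```

The same detector weights are reused at every position, so translation costs nothing extra. Other affine transformations get no such reuse in a plain ConvNet.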

Philosophy of Machine Learning

History and Philosophy of Science, Present Day

Post author By the owner of the establishment.
Post date October 9, 2017

From Andrew Ng’s recent video on end-to-end deep learning. Really helps me make sense of being in Cognitive Science/Computer Science graduate programs ~1999-2006.

“One interesting sociological effect in AI is that as end-to-end deep learning started to work better, there were some researchers that had for example spent many years of their career designing individual steps of the pipeline. So there were some researchers in different disciplines not just in speech recognition. Maybe in computer vision, and other areas as well, that had spent a lot of time you know, written multiple papers, maybe even built a large part of their career, engineering features or engineering other pieces of the pipeline. And when end-to-end deep learning just took the last training set and learned the function mapping from x and y directly, really bypassing a lot of these intermediate steps, it was challenging for some disciplines to come around to accepting this alternative way of building AI systems. Because it really obsoleted in some cases, many years of research in some of the intermediate components. It turns out that one of the challenges of end-to-end deep learning is that you might need a lot of data before it works well. So for example, if you’re training on 3,000 hours of data to build a speech recognition system, then the traditional pipeline, the full traditional pipeline works really well. It’s only when you have a very large data set, you know one to say 10,000 hours of data, anything going up to maybe 100,000 hours of data that the end-to-end approach then suddenly starts to work really well. So when you have a smaller data set, the more traditional pipeline approach actually works just as well. Often works even better. And you need a large data set before the end-to-end approach really shines.”

Tags MachineLearningScrapbook

Data Science

How to automatically convert number to letter grades

Post author By the owner of the establishment.
Post date April 22, 2017

This is a great resource, but because it doesn’t use absolute (anchored) cell references, its formulas break when you copy and paste them. Here is an example that is complete, with rounding to the nearest whole number.
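For anyone doing this outside a spreadsheet, the same lookup can be sketched in Python. This is a rough illustration, not a translation of the linked example: the cutoffs and letters below are hypothetical, so substitute your own scale. Note that Python’s built-in round() rounds halves to the nearest even number, so the sketch rounds halves up explicitly, which is what most grading schemes expect.

```python
import math
from bisect import bisect_right

# Hypothetical scale: below 60 is F, 60-69 is D, 70-79 is C, 80-89 is B, 90+ is A.
CUTOFFS = [60, 70, 80, 90]
LETTERS = ["F", "D", "C", "B", "A"]


def letter_grade(score):
    """Round a numeric score to the nearest whole number, then look up its letter."""
    rounded = math.floor(score + 0.5)  # round half up, so 59.5 becomes 60
    return LETTERS[bisect_right(CUTOFFS, rounded)]


for score in (59.4, 59.5, 72, 89.5):
    print(score, letter_grade(score))
```

Keeping the cutoffs in one list plays the same role as an anchored lookup range in the spreadsheet: every grade is computed against the same table.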

Machine Learning

How Reuters turns truth into a supervised learning task

Post author By the owner of the establishment.
Post date December 6, 2016

This is a very good headline to be pushing these days:

REUTERS BUILT A BOT THAT CAN IDENTIFY REAL NEWS ON TWITTER

Its source explains how Reuters attempts to turn the detection of fake news into a supervised learning problem.

News Tracer also must decide whether a tweet cluster is “news,” or merely a popular hashtag. To build the system, Reuters engineers took a set of output tweet clusters and checked whether the newsroom did in fact write a story about each event—or whether the reporters would have written a story, if they had known about it. In this way, they assembled a training set of newsworthy events. Engineers also monitored the Twitter accounts of respected journalists, and others like @BreakingNews, which tweets early alerts about verified stories. All this became training data for a machine-learning approach to newsworthiness. Reuters “taught” News Tracer what journalists want to see.

That’s how the labels are assigned.

Here’s how the features are assigned:

The system analyzes every tweet in real time—all 500 million or so each day. First it filters out spam and advertising. Then it finds similar tweets on the same topic, groups them into “clusters,” and assigns each a topic such as business, politics, or sports. Finally it uses natural language processing techniques to generate a readable summary of each cluster.

and

News Tracer assigns a credibility score based on the sorts of factors a human would look at, including the location and identity of the original poster, whether he or she is a verified user, how the tweet is propagating through the social network, and whether other people are confirming or denying the information. Crucially, Tracer checks tweets against an internal “knowledge base” of reliable sources. Here, human judgment combines with algorithmic intelligence: Reporters handpick trusted seed accounts, and the computer analyzes who they follow and retweet to find related accounts that might also be reliable.
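Put together, each tweet cluster becomes one row of training data: a vector of features plus a label. The sketch below only illustrates that framing. The field names are hypothetical, loosely based on the factors named in the quoted passages, and nothing here reflects Reuters’ actual schema or model.

```python
from dataclasses import dataclass


@dataclass
class TweetCluster:
    # Features (hypothetical names, inspired by the factors quoted above).
    poster_is_verified: bool
    confirmations: int
    denials: int
    followed_by_seed_account: bool
    # Label: did the newsroom write (or would it have written) a story?
    newsroom_covered: bool


def to_training_pair(cluster):
    """Turn one cluster into the (x, y) pair a supervised learner consumes."""
    features = [
        1.0 if cluster.poster_is_verified else 0.0,
        float(cluster.confirmations),
        float(cluster.denials),
        1.0 if cluster.followed_by_seed_account else 0.0,
    ]
    label = 1 if cluster.newsroom_covered else 0
    return features, label


# Made-up example values, purely to show the shape of the data.
x, y = to_training_pair(TweetCluster(True, 12, 1, True, True))
print(x, y)
```

The point is that the label comes from what journalists did, not from any rule about what counts as true.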

Dr. David McCarty used to joke to me that people who didn’t understand the factive nature of “facts” wanted computers to detect them using logic. Of course, that’s impossible.

Machine learning yokes computers to the world. For this reason, the joke isn’t funny when it’s machine learning detecting facts. This is how “learning machines”, to use Turing’s term, contain the solution to the failures of logic-based AI. This is what Geoffrey Hinton was getting at in his short, pithy acceptance speech for the IEEE Maxwell Medal:

50 years ago, the fathers of artificial intelligence convinced everybody that logic was the key to intelligence. Somehow we had to get computers to do logical reasoning. The alternative approach, which they thought was crazy, was to forget logic and try and understand how networks of brain cells learn things. Curiously, two people who rejected the logic based approach to AI were Turing and Von Neumann. If either of them had lived I think things would have turned out differently… now neural networks are everywhere and the crazy approach is winning.

Computer

Is this what Phil was on about?

Post author By the owner of the establishment.
Post date September 13, 2016

screen-shot-2016-09-13-at-5-59-21-pm

Source: “Technology and Courage” (warning, PDF) by Ivan Sutherland, April 1996, pg. 29. That last bit about what scientific progress is — what a gem. Anyone know where he got that from?

Issues

Violent crime is decreasing,
but is way worse than the early 60s

Post author By the owner of the establishment.
Post date July 23, 2016

Rplot

Source

Consider the following terrible visualization:

Screen Shot 2016-07-23 at 8.21.49 PM

Source

Here are some serious problems with the presentation of data here:

Plotting a statistic that hovers at around 10 times the size of the differences between the important numbers makes those differences look small and insignificant.
One scale applies to percentage of the population and another to year-over-year change.

Computer Issues

Exploring myweb.dal.ca

Post author By the owner of the establishment.
Post date June 24, 2016

I received this from my University admin two days ago — the “reply” address for this email is not monitored.

adminEmail

I’ve written a short bit of Python and given instructions for archiving myweb.dal.ca content. It’s up to you to supply the URL(s) to the script.
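If the linked script is unavailable, the general idea can be sketched with the standard library alone. This is not that script, just a minimal assumption of how such a tool might work: it saves each URL you supply under a local folder that mirrors the URL’s path. The folder name "archive" and the example URL are placeholders.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(url, root="archive"):
    """Map a URL to a file path under root, using index.html for directory URLs."""
    parsed = urlparse(url)
    relative = parsed.path.lstrip("/")
    if not relative or relative.endswith("/"):
        relative += "index.html"
    return Path(root) / parsed.netloc / relative


def archive(urls, root="archive"):
    """Download each URL and write its bytes to the mirrored local path."""
    for url in urls:
        target = local_path_for(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url) as response:
            target.write_bytes(response.read())


# Placeholder URL: replace "someuser" with the pages you actually want to keep.
print(local_path_for("http://myweb.dal.ca/someuser/"))
```

Calling archive([...]) with your own list of URLs does the downloading. The sketch does not follow links, so every page and file has to be listed explicitly.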

Data Science Issues

New York Primary: Google Trends

Post author By the owner of the establishment.
Post date April 12, 2016

Computer Music

What Sharky Laguana gets wrong

Post author By the owner of the establishment.
Post date August 18, 2015

Sharky Laguana, if that is your real name, you get at least two things wrong in your article about streaming music.

1. It’s spelled “aficionados”;
2. Your argument about inequity depends on facts you don’t have about distribution of choices.

We can do number 2 a couple of different ways.

Your Rdio spreadsheet example only works, with its difference between columns, on the premise that Brendan is the only person paying each of the artists for which the inequity is great.

Let’s do an example. Suppose that everyone, on average, has the same musical tastes, and listens to artists at the same rate. I know that’s not true — but if it were, then clearly there would be no difference between the Subscriber Share and Big Pool methods as you call them:

(1/8)x + (1/8)y = (1/8)(x + y)

Your argument requires the premise that distribution of artists listened to is very different at different streaming levels. Do you know this? If so, how?
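The dependence on that premise is easy to check numerically. The sketch below is my own reading of the two payout methods, with made-up subscribers, artists, play counts and a flat fee of 10. When listeners share the same proportions the two methods pay out identically, however much each person streams. They diverge only when heavy and light listeners favour different artists.

```python
from fractions import Fraction


def big_pool(plays, fee):
    """Pool every fee, then pay artists by their share of all plays."""
    pool = fee * len(plays)
    total_plays = sum(sum(p.values()) for p in plays.values())
    payout = {}
    for listened in plays.values():
        for artist, n in listened.items():
            payout[artist] = payout.get(artist, 0) + Fraction(n, total_plays) * pool
    return payout


def subscriber_share(plays, fee):
    """Split each subscriber's own fee across the artists that subscriber played."""
    payout = {}
    for listened in plays.values():
        own_plays = sum(listened.values())
        for artist, n in listened.items():
            payout[artist] = payout.get(artist, 0) + Fraction(n, own_plays) * fee
    return payout


# Hypothetical data: same tastes (80/20 split), very different volumes.
same_tastes = {"heavy": {"x": 80, "y": 20}, "light": {"x": 8, "y": 2}}
# Hypothetical data: the heavy and light listeners like different artists.
different_tastes = {"heavy": {"x": 100}, "light": {"y": 10}}

print(big_pool(same_tastes, 10) == subscriber_share(same_tastes, 10))
print(big_pool(different_tastes, 10) == subscriber_share(different_tastes, 10))
```

So the size of any inequity is an empirical question about how tastes vary with listening volume, which is exactly the data the article doesn’t present.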

Finally, I find it funny that you think that these companies aren’t aware of the problem of stream falsification, and aren’t working on addressing it.

Computer Music

Google Music* Got Smart

Post author By the owner of the establishment.
Post date August 16, 2015

Anyone can google “It’s Like That” and find out that it’s a 1983 song by American hip-hop group Run-D.M.C.

That’s the second result I get for it. The first is a video for the 1997 Jason Nevins remix of the song.

Suppose I ask Google Music to “Start radio” for the remix. Since it doesn’t have that version, I have to Start radio for the original 1983 version. What does “Start radio” actually mean? You can find the vague answer here.

Google should know that I like the 1997 version — after all, YouTube’s data is their data. If Google Music were interested in clever music discovery, it wouldn’t just use Wikipedia facts (American hip-hop, 1983) and build a playlist of the top 40 songs that more or less meet that description. Unfortunately, sometimes that’s exactly what it does. I’m pretty sure that every one of these songs was top 40 American R&B in the 1980s.

Non stop hip hop

Wouldn’t it be nice if it included, say, some “electro-hop”?

When I did “Start radio” on my phone for Farbwechsel by Myrone Aiden, something strange happened. Check this out:

PlayM

This radio station contains music that cannot be grouped together by era nor, therefore, by genre. That’s a funny thing about popular music: its genres are very specific to decades. Maybe that will change now that our corporate overlords are less in charge of defining them than they used to be.

There’s a thread running through these tunes. I’m not sure what it is, but the fact that Todd Terje is making light-touch remixes of Roxy Music and collaborating with Bryan Ferry to do covers of Robert Palmer suggests that it’s there.

This playlist contains music from a bunch of different decades/genres/styles/countries. It is even largely weighted towards artists that I’m familiar with and like. I see only one song that I’m sure I won’t like, having listened to its album on a road trip recently. There are even some songs where I want to cast a sly glance in Google’s direction and wonder how it knows me so well. (I know how it knows me so well, and the answer is exactly as creepy as you’d expect: it watches us.)

Google is a company that is defined by “being good at things that you need crazy amounts of data to be good at.” The technical term for ‘crazy amounts’ is ‘web-scale data.’ Their strengths at search, advertising and translation (among many others) are defined by their effective use of web-scale data.

Wouldn’t it be amazing if Google Music’s discovery tools always made such effective use of web-scale data? Of course, all data mining/machine learning algorithms need to be tuned by someone knowledgeable.

Google, here’s my suggestion: if you haven’t already, hire the folks at Soma.fm to do quality assurance for music discovery on your service for a while. They are in the (not-for-profit) business of defining new genres.

Right now their website says they need $940 by the end of the day. You can afford that, right?

* Google Play Music All Access is a terrible name for a service. I wonder what Apple would call a similar service?