The slides for my talk with Erik Nelson, “The Return of Mind Design: Cognitive Science and the Turing/Ashby Debate”, given June 1, 2019 at the University of British Columbia, are available here.
Slides located here.
Before you vote on Monday, please find out what your preferred candidate for Ward 13 will do about affordable housing. I believe the problem can’t be solved any time soon unless we take strong action against reseller websites.
A vote for Darren Abramson means a vote for Toronto city action that goes at least as far as Vancouver’s actions.
What will you do about Airbnb taking units from residents and driving up rent?
This seems to be a popular introduction to Capsule Networks. In Part I, on the “intuition” behind them, the author (not Geoffrey Hinton, although the passage is formatted the same way as a quote from him immediately above it) says:
Internal data representation of a convolutional neural network does not take into account important spatial hierarchies between simple and complex objects.
This is very simply not true. In fact, Sabour, Hinton and Frosst address this issue in their Dynamic Routing Between Capsules [pdf]:
Now that convolutional neural networks have become the dominant approach to object recognition, it makes sense to ask whether there are any exponential inefficiencies that may lead to their demise. A good candidate is the difficulty that convolutional nets have in generalizing to novel viewpoints. The ability to deal with translation is built in, but for the other dimensions of an affine transformation we have to chose between replicating feature detectors on a grid that grows exponentially with the number of dimensions, or increasing the size of the labelled training set in a similarly exponential way. Capsules (Hinton et al. ) avoid these exponential inefficiencies…
This is fundamental, and I hope folks avoid the error of thinking that ConvNets can’t “take into account important spatial hierarchies between simple and complex objects”. That’s exactly what they do. But as models of how brains take these hierarchies into account under transformations, they are badly inefficient at doing so.
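To make the “translation is built in” point concrete, here is a toy sketch of my own (not from the paper). It assumes a single hand-picked 2x2 filter and wrap-around image borders. Shifting the input and then filtering gives the same result as filtering and then shifting. For a 90-degree rotation the same check fails, so a ConvNet has to learn separate detectors, or see more training data, to cover that kind of transformation.

```python
def correlate(image, kernel):
    """Circular (wrap-around) 2D cross-correlation on lists of lists."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                kernel[i][j] * image[(r + i) % h][(c + j) % w]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(w)
        ]
        for r in range(h)
    ]


def shift(image, dr, dc):
    """Translate the grid by (dr, dc), wrapping around the borders."""
    h, w = len(image), len(image[0])
    return [[image[(r - dr) % h][(c - dc) % w] for c in range(w)] for r in range(h)]


def rotate90(image):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical toy input: a small blob on an empty 5x5 background.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# Hypothetical asymmetric filter (responds to one diagonal direction only).
kernel = [[1, 0], [0, -1]]

# Filtering commutes with translation...
translation_equivariant = (
    correlate(shift(image, 1, 2), kernel) == shift(correlate(image, kernel), 1, 2)
)
# ...but not with rotation, for this filter.
rotation_equivariant = (
    correlate(rotate90(image), kernel) == rotate90(correlate(image, kernel))
)

print(translation_equivariant, rotation_equivariant)
```

The wrap-around border is only there to keep the translation check exact. With zero padding the equality still holds away from the edges.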
From Andrew Ng’s recent video on end-to-end deep learning. Really helps me make sense of being in Cognitive Science/Computer Science graduate programs ~1999-2006.
“One interesting sociological effect in AI is that as end-to-end deep learning started to work better, there were some researchers that had for example spent many years of their career designing individual steps of the pipeline. So there were some researchers in different disciplines not just in speech recognition. Maybe in computer vision, and other areas as well, that had spent a lot of time you know, written multiple papers, maybe even built a large part of their career, engineering features or engineering other pieces of the pipeline. And when end-to-end deep learning just took the last training set and learned the function mapping from x and y directly, really bypassing a lot of these intermediate steps, it was challenging for some disciplines to come around to accepting this alternative way of building AI systems. Because it really obsoleted in some cases, many years of research in some of the intermediate components. It turns out that one of the challenges of end-to-end deep learning is that you might need a lot of data before it works well. So for example, if you’re training on 3,000 hours of data to build a speech recognition system, then the traditional pipeline, the full traditional pipeline works really well. It’s only when you have a very large data set, you know one to say 10,000 hours of data, anything going up to maybe 100,000 hours of data that the end-to-end approach then suddenly starts to work really well. So when you have a smaller data set, the more traditional pipeline approach actually works just as well. Often works even better. And you need a large data set before the end-to-end approach really shines.”
Source: “Technology and Courage” (warning, PDF) by Ivan Sutherland, April 1996, pg. 29. That last bit about what scientific progress is — what a gem. Anyone know where he got that from?
Consider the following terrible visualization:
Here are some serious problems with the presentation of data here:
- Plotting a statistic that hovers at around 10 times the differences between the important numbers makes those differences look small and insignificant.
- One scale applies to percentage of the population and another to year-over-year change.
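The first problem is easy to quantify. Here is a rough sketch with made-up numbers (they are not the values from the chart): if the differences that matter span about one unit and another series on the same axis sits near ten, those differences get roughly a tenth of the axis height.

```python
def share_of_axis(values, axis_max):
    """Fraction of a zero-based axis taken up by the spread of `values`."""
    return (max(values) - min(values)) / axis_max


# Hypothetical numbers, chosen only to mirror the "10 times" ratio above.
important = [2.0, 2.5, 3.0]  # the values whose differences matter
dominant = 10.0              # a statistic about 10x the spread of those values

# Share of the axis height when the dominant statistic sets the axis range.
shared = share_of_axis(important, axis_max=dominant)
# Share of the axis height when the important values get their own axis.
own = share_of_axis(important, axis_max=max(important))

print(f"shared axis: {shared:.0%}, own axis: {own:.0%}")
```

Giving the important numbers their own panel, rather than a second scale on the same plot, avoids both problems at once.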