In this amazing blog post in the magazine Nature, Davide Castelvecchi pointed out that we understand very little about how machines learn. Where 'we' actually means the experts who develop and implement deep learning systems. I would even claim that we now understand far more about human learning than about how the machines we create learn!
The underlying causes of this are simplicity, complexity and chaos theory - let me explain.
Simplicity
The basic learning rules of an artificial neural network, aka deep learning system, can only be understood by focusing on the smallest unit of this construct: the artificial neuron. It all goes back to 1949, when neuropsychologist Donald O. Hebb formulated the idea of 'what fires together, wires together' in his book 'The Organization of Behavior'. Today it is known as Hebbian learning theory, or in short: the Hebb rule.
Put simply, imagine two nerve cells and mentally observe the strength of their connection to each other. Hebb postulated that if these cells are excited at the same time, and even fire an electrical impulse at the same time, then the strength of their connection will increase.
Here it is interesting to note that this postulate came from the theory of associative learning.
So, Donald Hebb took a theory of macroscopic learning effects and transferred it to the microscopic level of nerve cell assemblies. Some years later this theory was proven right by Eric Kandel, who was able to confirm its predictions in the nerve cells of the California sea hare. The sea hare was chosen simply because it has nerve cell bodies up to 1 mm in diameter, which made it easy to place measurement electrodes inside them.
The learning rule itself is very easy to understand and laid the foundation for the later theoretical formulations of artificial neural networks.
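To make this concrete, here is a minimal sketch of the Hebb rule in Python (my own illustration, not from Hebb's book; the learning rate and the toy activity values are arbitrary choices): the strength of a connection grows in proportion to the product of the simultaneous activity of the two cells.

```python
def hebb_update(w, pre, post, eta=0.1):
    """One Hebbian learning step: delta_w = eta * pre * post.
    If the pre- and postsynaptic cells are active at the same time,
    the connection strength w increases; otherwise nothing happens."""
    return w + eta * pre * post

# Toy example: two cells that fire together ten times in a row.
w = 0.0
for _ in range(10):
    w = hebb_update(w, pre=1.0, post=1.0)
print(w)  # 1.0 -> the connection has been strengthened

# If only one cell fires, the product is zero and w stays unchanged.
print(hebb_update(w, pre=1.0, post=0.0))  # still 1.0
```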
Complexity
The movement of one pendulum is something that is mathematically very easy to describe - high school stuff. Take two of these buggers, be extremely mischievous, and attach them to each other to form a double pendulum; give a mathematician the task of formally (aka with lots of equations) describing its movement, and they will most likely quit the job or end up in an insane asylum. This is because the movement of a single pendulum can be described with formal mathematics so well that we confuse the mathematical model with reality. Trying to apply this formalism to a double pendulum reveals the inaccuracies of that simple model. The slightest influences - air movement, magnetic fields, or even the infinitesimally small pushing force of light - may change the movement of this system so drastically that any formalism must yield.
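To see how quickly formalism yields, here is a small numerical sketch (my own illustration; the unit masses and rod lengths, the naive integrator, the step size, and the one-millionth-of-a-radian perturbation are all arbitrary assumptions): two simulated double pendulums that start almost identically end up in completely different places.

```python
from math import sin, cos, pi

def step(th1, w1, th2, w2, dt=1e-4, g=9.81):
    """One naive Euler step of the standard double-pendulum equations
    of motion (unit masses and rod lengths). Crude, but good enough
    to illustrate sensitive dependence on initial conditions."""
    d = th1 - th2
    den = 3.0 - cos(2.0 * d)
    a1 = (-3.0 * g * sin(th1) - g * sin(th1 - 2.0 * th2)
          - 2.0 * sin(d) * (w2 * w2 + w1 * w1 * cos(d))) / den
    a2 = (2.0 * sin(d) * (2.0 * w1 * w1 + 2.0 * g * cos(th1)
          + w2 * w2 * cos(d))) / den
    return th1 + w1 * dt, w1 + a1 * dt, th2 + w2 * dt, w2 + a2 * dt

# Two pendulums whose starting angles differ by a millionth of a radian.
a = (pi / 2, 0.0, pi / 2, 0.0)
b = (pi / 2 + 1e-6, 0.0, pi / 2, 0.0)
for _ in range(200_000):  # 20 simulated seconds
    a, b = step(*a), step(*b)
print(abs(a[0] - b[0]))   # no longer tiny: the trajectories have torn apart
```

Both runs use exactly the same equations and the same integrator; the only difference is the sixth decimal place of one starting angle.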
Now, above we had two nerve cells - but deep learning networks can reach sizes of hundreds of billions of parameters, where the parameters are the numerical representations of the mathematical formalism in the machine's memory. This can very roughly be compared to the size of the human brain by assuming that the numbers of neurons and connections (synapses) resemble these parameters. The number of 'parameters' in the human brain would then be some orders of magnitude higher.
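A quick back-of-envelope calculation shows what 'some orders of magnitude' means here (the neuron and synapse counts are commonly cited rough estimates, not precise measurements, and the model size is an arbitrary example):

```python
# Commonly cited rough estimates - not precise measurements.
neurons = 8.6e10              # ~86 billion neurons in a human brain
synapses_per_neuron = 1e4     # on the order of 10,000 connections each
brain_parameters = neurons * synapses_per_neuron   # ~8.6e14

model_parameters = 1e11       # a hundred-billion-parameter network

print(brain_parameters / model_parameters)  # ~8600x, i.e. 3-4 orders of magnitude
```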
So we can conclude that deep learning networks and biological networks are very complex stuff.
Chaos theory
Chaos theory postulates that such complex systems, no matter how vast they are, do not necessarily behave completely randomly. Even if we take the simple mechanics of two colliding molecules in the air and scale the description up to the level of all the molecules in the atmosphere of planet Earth - there is still hope.
Otherwise there would be no weather forecast!
Right, weather can be seen as a very complex assembly of simple physical rules - that is why weather frogs, sorry, meteorologists, need computers so powerful that even the most megalomaniac gaming rig would pale in comparison. In physics there is a simple rule:
If mathematicians can't describe it, then calculate it!
It's just that there are a lot of calculations to make for a weather forecast!
The same holds for deep learning networks. The difference is that in weather forecasting we observe a natural phenomenon and try to predict its behavior, whereas in a deep learning network the system expresses behavior and we try to understand it.
Psychology!
For a bit more than one hundred years, psychologists have been trying to do exactly what deep learning geeks are trying to figure out now: understand and predict the observable behavior of very complex systems. The difference is that we psychologists mostly deal with organic computers that walk on two legs and have mood swings, whereas computer scientists deal with silicon computers without legs (for now!) and without mood swings (for now!) that are nevertheless also 'doing' strange things.
My point is:
Since both disciplines try to understand the macroscopic behavior of complex systems which are based on similar principles (!), is it too far-fetched to try to apply psychological theory construction to artificial information systems?
PS:
In all this lies the root of another major humiliation of the human condition - on the scale of Copernicus's, Galileo's, and Darwin's conclusions. More about that another time!