Switching perspectives is important in both science and in life
Neural networks need not be thought of only in terms of brain metaphors
Today, 06 May 2026, Dr Brian Keating asked on X whether neural networks in AI could be thought of in terms other than brain metaphors.
(See @Brian Keating on his Substack, @Keating Experiments)
Well, not only can they be, but in some ways the non-brain view is even more useful for understanding both their seemingly limitless power and their natural limitations.
More on that later in this post.
But it serves to highlight the point I made in my Quora profile, written in 2001 and reshared here on Substack in 2025: rather than subscribing to one viewpoint, it is helpful to live in a superposition of competing views until the need arises to choose a single one.
Given a context, you then choose the most helpful, most illuminating view at that point. But crucially, you don't have to stick to that viewpoint in subsequent contexts.
Let the new context make that choice for you.
To quote myself,
“I have no problem with not knowing things. I'm never 100% certain on anything, nor ascribe 0% probability to anything. I can hold 17 competing views in my head, and find some value in each of them - always looking for a certain angle or illumination given a specific context. I find that in different contexts one viewpoint may illuminate better than another - but there are no absolutes. To me observation and reflection is far more important than certainty. I think living in a superposition of competing ideas is helpful until the need arises for a decision in a specific context. At that point we must collapse the wavefunction of our competing ideas in favour of just one, but to do that too early is counterproductive. And if we collapse it and leave it collapsed, for future usage, we short change ourselves.”
Now I've seen my above description copied in the last year or so by individuals such as Chris Williamson, a podcast host and Love Island contestant. It was a little annoying, not only because they didn't give me credit, but because they missed the crucial part:
It's not just about superposition of competing ideas, but being able to choose (collapse into) one single viewpoint under context. And then being able to choose different viewpoints in different contexts.
Williamson and other copycats peddling their supposed wisdom completely missed this crucial part.
Here is the full link:
Now let me focus on the advantages that come from the ability to be flexible in viewpoints:
The power of switching viewpoints in Science:
Physics examples:
As a physicist, Dr Keating would know that in quantum mechanics you can work with the equivalent Schrödinger or Feynman path-integral formulations. Or, say, the “interaction picture” versus the Schrödinger or Heisenberg pictures for time evolution in quantum mechanics.
In fact, the interaction picture is the one we use in perturbative quantum field theory, which underpins the Standard Model of particle physics: the theory of the electroweak force (electromagnetism and the weak nuclear force) and the strong interaction (which binds atomic nuclei).
Yes, the interaction picture is the viewpoint behind quarks, gluons and leptons etc.
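For readers who like to see the formulas, here is the standard textbook way of writing the three pictures side by side (these equations are my addition for illustration, not from Dr Keating's post; H = H₀ + V is the usual split used in the interaction picture):

```latex
% Schrödinger picture: the state evolves, operators stay fixed
i\hbar\,\partial_t |\psi_S(t)\rangle = H\,|\psi_S(t)\rangle

% Heisenberg picture: operators evolve, the state stays fixed
A_H(t) = e^{iHt/\hbar}\, A\, e^{-iHt/\hbar}

% Interaction picture: split H = H_0 + V; the state evolves under V only
|\psi_I(t)\rangle = e^{iH_0 t/\hbar}\,|\psi_S(t)\rangle,
\qquad
i\hbar\,\partial_t |\psi_I(t)\rangle = V_I(t)\,|\psi_I(t)\rangle,
\qquad
V_I(t) = e^{iH_0 t/\hbar}\, V\, e^{-iH_0 t/\hbar}
```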
But the value of switching viewpoints is not limited to physics.
Finance examples:
It’s true in mathematical finance as well. You can look at the Black-Scholes model as an equation where one term (gamma) is diffusion and another term (delta) is convection. You can see heat diffusion and convection, or you can see, in your trader’s mind, the P/L from gamma being offset by the P/L from “carrying the delta” against time decay...
That's two different viewpoints with vastly different uses.
But there is a third one too (one that in fact has interesting analogies with the mathematics of neural networks).
That viewpoint is geometrical, where you see curvature (gamma) vs slope (delta), etc.
That’s three very different viewpoints of the same simple Black-Scholes model. And they all add value given context.
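To make the terms concrete, here is the Black-Scholes equation in its standard PDE form, with each term labelled according to the viewpoints above (the notation V(S, t) for the option value, σ for volatility and r for the risk-free rate is the usual textbook convention, added here by me for illustration):

```latex
\underbrace{\frac{\partial V}{\partial t}}_{\text{theta: time decay}}
\;+\;
\underbrace{\tfrac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}}_{\text{gamma term: diffusion / curvature}}
\;+\;
\underbrace{r S \frac{\partial V}{\partial S}}_{\text{delta term: convection / slope}}
\;-\;
\underbrace{rV}_{\text{financing}}
\;=\; 0
```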
Being able to switch viewpoints can be very helpful.
Those who can’t hold competing viewpoints in their heads tend to end up with a limited, “not enough” understanding. Treating the brain metaphor as the sole view of neural networks is similarly limiting.
Now let me move on and give a view of neural networks in which I totally eschew the brain metaphor.
Neural network example:
I hope to show that while the brain metaphor is helpful, it is not necessary, and in some ways it obscures a more direct and more useful view.
So I'm going to give you a totally different picture. I claim that a neural network is a single, complex, non-linear composite function.
Let me explain in three steps: first with a motivating example, then by generalising to multi-layer neural networks, and finally by clarifying where activation functions fit in this view.
1. Motivating example:
To understand the general case, start with a simple non-linear target function:
g(x) = ax²
If you only use linear components like f(x, β) = βx, you can’t reach g(x) through standard composition (composing linear functions just yields another linear function).
However, if you treat g(x) as the product of these components, g(x) = f(x, β) * f(x, γ), you see that by "layering" interactions and searching for the parameters (β, γ) where βγ = a, you can perfectly reconstruct the simple non-linear target.
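Here is a minimal sketch of that search in code (my own illustration; the target coefficient a = 3.0, the learning rate and the number of steps are arbitrary choices):

```python
import numpy as np

# Recover g(x) = a*x^2 as a product of two linear pieces
# f(x, beta) = beta*x and f(x, gamma) = gamma*x.
a = 3.0                              # target coefficient
x = np.linspace(-2.0, 2.0, 200)      # sample points
y = a * x**2                         # the non-linear target g(x)

beta, gamma = 0.5, 0.5               # arbitrary starting guesses
lr = 0.01                            # learning rate

for _ in range(2000):
    pred = (beta * x) * (gamma * x)  # the "layered" product of linear pieces
    err = pred - y
    # gradients of the mean squared error with respect to beta and gamma
    grad_beta = np.mean(2 * err * gamma * x**2)
    grad_gamma = np.mean(2 * err * beta * x**2)
    beta -= lr * grad_beta
    gamma -= lr * grad_gamma

print(beta * gamma)                  # ≈ 3.0, i.e. beta*gamma has converged to a
```

A trained neural network does essentially this, just with vastly more parameters and with the pieces nested inside one another rather than simply multiplied.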
2. Generalize to a multi-layer neural network:
A "neural network" is just this logic scaled into high-dimensional space. Let's define a network G as a sequence of nested functions (layers):
Y = G(x; θ) = (f_L ∘ f_{L-1} ∘ ... ∘ f_1)(x)
x is your input vector.
f_i are the individual layer functions.
θ is the massive collection of parameters (the "weights") you search over via optimization to map your inputs to the desired outputs.
In this view you aren't "teaching" a brain.
You are performing a global optimization over a parameter space to find a specific functional form, even if you never see the form written out explicitly.
People often don't realise this because they never see the functional form written out in front of them.
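Here is a minimal sketch of that "nested functions" view (again my own illustration; the layer sizes, the tanh non-linearity and the numpy setup are arbitrary choices). The point is simply that the whole network is literally one composite function G applied to x:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    """Return one layer function f_i(h) = sigma(W_i @ h + b_i)."""
    W = rng.normal(scale=0.5, size=(n_out, n_in))
    b = np.zeros(n_out)
    return lambda h: np.tanh(W @ h + b)   # tanh plays the role of sigma

# G = f_3 ∘ f_2 ∘ f_1 : one composite function from R^4 to R^1
layers = [make_layer(4, 16), make_layer(16, 16), make_layer(16, 1)]

def G(x):
    h = x
    for f in layers:                      # apply the layers, innermost first
        h = f(h)
    return h

print(G(np.array([0.1, -0.2, 0.3, 0.4])))  # just one big non-linear function of x
```

Training would then mean adjusting the entries of every W and b (the θ above) until G maps inputs to the desired outputs; nothing in that description needs a brain metaphor.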
Now I want to ensure the activation function part does not create confusion.
3. The Activation Function is Part of the Unit
For this composite function G(x) to work, each constituent function f_i must be non-linear. You define the operation at each step i as:
f_i(h) = σ(W_i h + b_i)
In this notation:
(W_i h + b_i) is the linear transformation (the affine map).
σ (sigma) is the activation function (like ReLU or Sigmoid).
Crucially, σ isn’t an "add-on." It is an integral part of the function f_i itself.
By embedding this non-linearity into every step of the composition, you allow the total composite function G(x) to approximate virtually any continuous mapping.
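A quick way to convince yourself that σ is doing real work (a sketch of my own, with arbitrary sizes and a ReLU standing in for σ): compose several purely affine layers and the result collapses back to a single affine map, whereas inserting σ at each step prevents that collapse.

```python
import numpy as np

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(5, 5)) for _ in range(3)]
bs = [rng.normal(size=5) for _ in range(3)]
x = rng.normal(size=5)

# Purely affine layers: the composition is itself just one affine map.
h = x
for W, b in zip(Ws, bs):
    h = W @ h + b
W_eff = Ws[2] @ Ws[1] @ Ws[0]
b_eff = Ws[2] @ (Ws[1] @ bs[0] + bs[1]) + bs[2]
print(np.allclose(h, W_eff @ x + b_eff))   # True: three layers gained nothing

# With sigma = ReLU inside each step, f_i(h) = sigma(W_i h + b_i),
# the stack is no longer reducible to one affine map.
h = x
for W, b in zip(Ws, bs):
    h = np.maximum(W @ h + b, 0.0)
print(h)
```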
Conclusion:
A neural network, in this view, is a differentiable, high-dimensional nonlinear composite function that uses iterative optimization to approximate a target mapping.
In short, just one large nonlinear function with a great many tweakable parameters to get the desired output.
I've found this view very useful for understanding why neural networks appear to have such limitless power, and also where their natural limitations come from.
So now you have two pictures:
A) The brain picture
B) Nonlinear function picture
Switch between these two viewpoints under context. And realise that the second viewpoint is related to a third:
C) An energy landscape over a large parameter space
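Viewpoint C can be written just as compactly: training is a search for a low point of a loss (or "energy") surface defined over the parameter space. In notation of my own choosing (with L a loss such as squared error and (x_i, y_i) the training pairs), gradient descent simply rolls downhill on this landscape:

```latex
\theta^{*} \;=\; \arg\min_{\theta} E(\theta),
\qquad
E(\theta) \;=\; \sum_{i} L\bigl(G(x_i;\theta),\, y_i\bigr),
\qquad
\theta \;\leftarrow\; \theta - \eta\, \nabla_{\theta} E(\theta)
```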

