
Transcript

A Multiscale Logic of Collective Intelligence - Donald Hoffman and Chetan Prakash

Okay, so: a multiscale logic of collective intelligence, and it's what we call the recursive trace logic. We've had the trace logic for a couple of years, but in the last couple of months we discovered a recursive aspect to it that will lead into a notion of agency that's novel. So this is different, Chris, from the conscious agent theory. It's a different notion of agency than we've had before.

So, the big topics I'd like to cover — can you guys see me? Yeah. Okay. I'm going to talk a little bit about collective intelligence and our model of it: how it involves coarse graining, which is important to you guys; how it involves generative models, minimizing surprise automatically, bending problem spaces, and a recursive notion of agency and self; a new intelligence metric for agents we'll call lambda sub two, and its relationship to your measure K; and then how this is all beyond space-time and quantum theory. I'll start there, just briefly, with why I'm thinking entirely outside of space-time and quantum theory.

The idea is that high-energy theoretical physicists are done with space-time. They say it's not fundamental. Here's Nima Arkani-Hamed at the Institute for Advanced Study: "Space-time is doomed. There's no such thing as space-time fundamentally in the actual underlying description of the laws of physics." And he makes it very clear that he means space-time and anything inside space-time, and that includes anything with unitary evolution — quantum theory in particular. So he's going beyond space-time and quantum theory. And it's not just him. Because of his success and his collaborators', the ERC has funded a 10-million-euro initiative called Universe Plus, and it's all about going entirely beyond space-time and entirely beyond quantum theory, looking for what they're calling positive geometries. There are over a hundred high-energy theoretical physicists and mathematicians now working on this, and they're finding positive geometries that give you scattering amplitudes without any quantum theory whatsoever — and you get them much more easily and simply than with quantum theory. So I'm stepping entirely outside of space-time.

[Question about status of general relativity]

Right. The idea is that the very notions of space and time — even the combination of them as space-time — are not fundamental at all. General relativity will go the way of all theories: like Newton, which we still use for certain cases, we'll still use GR for certain cases, but we need a much deeper theory. The hard fact is that when you bring together GR and quantum theory, you find that space-time has no operational meaning at the Planck scale — 10^-33 centimeters, 10^-43 seconds. It simply has no operational meaning. That means we have to find a deeper foundation. These are, at best, approximate theories.

Absolutely. That's the idea: we thought space and time were the fundamental nature of reality. We might even have thought they were a priori true, or something, but that's just wrong. And science has a way of forcing us to give that up.

[Chris Fields adds comment]

If one formulates the basic ideas of quantum theory completely outside space-time, then there are many routes — under study by huge numbers of people — for generating space-time as a consequence of, basically, assumptions about quantum information theory. And there are also many routes for generating Einstein's equations as either approximations or outcomes of other kinds of assumptions. So GR takes on something like the status classical physics has with respect to quantum theory in space-time: a limiting case, an approximation that's good in some circumstances for doing some things.

Right. And what Nima and the ERC group are doing goes even beyond that, because they're saying: we're not going to start with quantum information theory at all. Anything quantum is going to arise, joined at the hip with space-time, from something far deeper. They want to show that quantum information theory and general relativity arise together from something that couldn't care less about unitarity. There is no locality and there is no unitarity, period, in these new positive geometries. And they show that quantum information theory then comes out as an approximation and special case at the same time that you get space-time.

John Wheeler, of course, was trying to think outside the box. In 1990, in his wonderful book on gravity and space-time, he said: someday, surely, we'll see a principle underlying existence so simple, so beautiful, so obvious that we'd all say to each other, oh, how could we all have been so blind so long. That's what we're looking for.

Wheeler suggested that the notes struck out on the piano by the observer-participants of all places and all times — bits of the "it" — in and by themselves constitute the great wide world of space and time and things. So he was trying to start with what he calls observer-participants. That was in his "it from bit" paper of 1989, and in that paper he actually cited work that Chetan and I were doing in our book Observer Mechanics.

So, what’s a minimal observer participant? We’re going to start with just the absolute bare basics. They have experiences like smell of garlic, taste of mint. And these experiences can change. That’s all I’m going to assume. That’s the foundation of everything. So, my ontology is there are experiences and they can change.

So, for example, maybe I have four experiences — a very simple observer: red, green, blue, and a fourth I'll call yellow. And they change. Now I'm seeing yellow. Now I'm seeing green. Now I'm seeing blue, and so forth. They keep changing.

The simplest and most general way of talking about that is with Markov chains. In the Markov matrix, the first row says: if I see red now, what's the probability I'll see red next? The 0.3 entry says: if I see red now, there's a three-tenths chance — a 30% chance — I'll see green next, and so forth. It's just a transition matrix: the probability of seeing the next color given the current color.
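As a concrete sketch, the four-color observer's transition matrix can be written down directly. Only the red/green entries (0.2, 0.3, 0.5, 0.2, quoted later in the talk for the trace computation) are from the talk; the blue/yellow rows are my own illustrative assumptions:

```python
import numpy as np

# States: 0 = red, 1 = green, 2 = blue, 3 = yellow.
# Row i is the probability distribution over the NEXT color, given the
# current color is i.  The red/green block (0.2, 0.3 / 0.5, 0.2) matches
# the numbers quoted in the talk; the rest are assumptions.
P = np.array([
    [0.2, 0.3, 0.4, 0.1],   # currently red: 30% chance green is next
    [0.5, 0.2, 0.2, 0.1],   # currently green
    [0.1, 0.4, 0.3, 0.2],   # currently blue
    [0.3, 0.3, 0.2, 0.2],   # currently yellow
])

# A Markov (row-stochastic) matrix: each row is a probability vector.
assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
```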

One aspect of Markov chains is that they immediately instantiate a very interesting kind of goal-directed behavior. If the Markov chain is ergodic, it has a target stationary measure: no matter what state you start it in, it will eventually converge to that stationary distribution. And you can perturb it as much as you want — it will resist the perturbation and head back toward that target. So, already, we have goal-directed behavior in the very structure of this. As William James put it, intelligence is the achieving of a fixed goal by variable means.
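This goal-directedness can be checked numerically: iterate an illustrative ergodic chain (the numbers below are my own assumptions) from very different starting distributions and watch both converge to the same stationary measure.

```python
import numpy as np

# An illustrative ergodic 4-state chain (numbers are assumptions).
P = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
])

# The stationary measure pi is the left eigenvector with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi)

# Goal-directed behavior: start anywhere (or perturb at will) and the
# distribution still converges to the same target pi.
for start in (np.array([1.0, 0, 0, 0]), np.array([0, 0, 0, 1.0])):
    d = start
    for _ in range(200):
        d = d @ P
    assert np.allclose(d, pi, atol=1e-9)
```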

Now, I want multiscale collective intelligence, so we need a notion of scale. I'm just going to take an observer that sees a subset of the states. The first observer has four colors it can see; let's consider one that has only two. That's my notion of scale — the subset relationship among the experiences that you have.

Now, here’s the key idea. Suppose I take the matrix on the right as describing the reality. This is what’s happening. Those are the transitions. But this observer on the left only sees two: red and green. What transition probabilities is it going to see?

When you do the mathematics, it turns out you get a very specific two-by-two matrix. Notice that the numbers are completely different from the original matrix — you're not just copying; there's a computation you have to do. The matrix on the left is called the trace of the matrix on the right. That's standard in Markov theory and has been around for more than half a century, so this is not new to me or to us.

Here's how you compute the trace. Take the matrix P and divide it into four submatrices. There's a two-by-two submatrix with entries 0.2, 0.3, 0.5, 0.2 — that's for red and green — which we'll call A. So A is the submatrix on the states that will be visible to the sub-observer. C is the submatrix relating the states that are dark to this new observer — it doesn't see them, so that's all dynamics that's dark to it. B is the matrix of exits, from what you can see into the dark region. D is the matrix of re-entrances, getting from the invisible world back into the visible world.

The trace formula: the trace matrix on A (the visible states) is the original matrix A plus this interesting term, B(I − C)^{-1}D, where I is the identity matrix. Taking the identity minus the dark matrix C and inverting it has the effect of exploring all possible paths — there's an infinite number of paths through C that you could take, and (I − C)^{-1} sums over all of them. Then you pre-multiply by the exits and post-multiply by the entrances. You're getting the trace by looking at all the ways you can leave the visible states, wander through the dark region, and come back.
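In code, the trace is the block formula A + B(I − C)^{-1}D. A sketch — the red/green block matches the talk's numbers, while the dark-state rows are my own assumptions:

```python
import numpy as np

# Full 4-state chain, states ordered [red, green, blue, yellow].
# The red/green block (0.2, 0.3 / 0.5, 0.2) is from the talk; the
# blue/yellow (dark) rows are assumed for the sketch.
P = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
])

vis, dark = [0, 1], [2, 3]            # visible vs dark states

A = P[np.ix_(vis, vis)]               # visible -> visible
B = P[np.ix_(vis, dark)]              # exits: visible -> dark
C = P[np.ix_(dark, dark)]             # dynamics dark to the observer
D = P[np.ix_(dark, vis)]              # re-entrances: dark -> visible

# (I - C)^(-1) = I + C + C^2 + ... sums the infinite number of paths
# through the dark region; B and D handle the exit and the re-entry.
T = A + B @ np.linalg.inv(np.eye(len(dark)) - C) @ D

# The trace is again a Markov matrix -- on just the visible states --
# and its entries differ from A: it's a computation, not a copy.
assert T.shape == (2, 2)
assert np.allclose(T.sum(axis=1), 1.0)
assert not np.allclose(T, A)
```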

So you have hidden memories and controls: B, C, and D are hidden layers of control that the agent on A cannot see but that will be influencing its behavior.

All of that is old. Here's the new stuff. We discovered, just a couple of years ago, that the trace relationship gives you a partial order on all Markov chains. That was the discovery, and that's what launched this whole thing. And a partial order means there is a logic.

The definition is that a matrix M is less than or equal to a matrix N in the trace order if and only if M is a trace of N. That's it — one trivial definition, but no one saw it before. It turns out that this definition gives you a multiscale logic of minimal surprise. The reason it's minimal surprise is that the trace is the zero-surprise view of the bigger matrix. That's the key idea: it is the zero-surprise subset-matrix view. The trace's stationary measure is identical to the normalized restriction of the original stationary measure. So you have zero surprise in the dynamics, and zero surprise again in the stationary measure. The trace logic is thus the logic of minimal surprise for arbitrary dynamical systems. Minimizing surprise is, of course, a key to intelligence — but this is multiscale.
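The zero-surprise claim about the stationary measure can be verified directly: the trace's stationary measure equals the full chain's stationary measure restricted to the visible states and renormalized. A numerical sketch (chain entries assumed, apart from the red/green block quoted in the talk):

```python
import numpy as np

# Illustrative full chain; red/green block from the talk, rest assumed.
P = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
])

def stationary(M):
    """Left eigenvector of M with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(M.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    return v / v.sum()

# Trace onto the visible states {red, green}: A + B (I - C)^(-1) D.
vis, dark = [0, 1], [2, 3]
A, B = P[np.ix_(vis, vis)], P[np.ix_(vis, dark)]
C, D = P[np.ix_(dark, dark)], P[np.ix_(dark, vis)]
T = A + B @ np.linalg.inv(np.eye(len(dark)) - C) @ D

# Zero surprise in the stationary measure: stationary(T) is exactly the
# normalized restriction of stationary(P) to the visible states.
pi_full, pi_trace = stationary(P), stationary(T)
assert np.allclose(pi_trace, pi_full[vis] / pi_full[vis].sum())
```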

The set of all Markov chains forms a non-Boolean logic under the trace order. That means there's no global top, there's no global negation, and many matrices do not have meets and joins (ANDs and ORs). However, if you take any particular Markov chain P and look at all of its traces, they form a Boolean sub-logic. So I can pick any Markov chain, look at all of its traces, and those Markov matrices together form a Boolean logic. If there are n experiences, then there are 2^n members in this Boolean logic of traces.

Here’s the key new idea — only two months old. Once we have the trace logic which is a logic on observer windows — the infinite space of all possible observer windows — there’s this minimal surprise logic on all of it, the trace logic cleanly well defined.

How do I want to model agency? Agency is a matter of changing which window I look through. I want a policy for this: if I'm looking at the world this way now, how do I want to look at the world next? And how do I do that? With another Markov chain. A Markov kernel will say: given my current window, what's the probability that my next window will be such and such? So a policy is a Markov matrix on the trace logic itself — the trace logic being the entire logic of minimal surprise on possible conscious observers. The first step of agency is to crawl along the trace logic.

Now look at the collection of all Markov kernels — I'll call those policies. Each Markov kernel is a policy; it's a first-order notion of agency. Since policies are themselves Markov matrices, they satisfy their own trace logic. We had the first trace logic on observer windows; crawling on that trace logic is our first layer of agency, and it has its own trace logic. So let's call this the recursive trace logic — it's recursive now, and you can see we can do this ad infinitum. Once we have the trace logic of policies, I can crawl on it and get meta-policies. I can take agency to whatever layer of complexity I want.

Policies can model attention shifts, scale shifts, reparameterizations. The recursive trace logic is the collection of all policies with their trace logics, recursed again and again. So we have a choice of policy, meta-policy, meta-meta-policy, and so forth.

An intelligence metric: for any probability measure π there are many Markov chains for which π is a stationary measure — an infinite class of them — and they vary in interesting ways. They have different rates of convergence. Some exhibit strongly goal-directed behavior: no matter where you start them, they go to the goal in just a couple of steps. Others converge very slowly. So in the trace logic we can choose how quickly we want to converge to our goal. Search efficiency is your measure K — it's a model of intelligence. So we have a dial here with which we can dial the intelligence.

The convergence rate is dominated by lambda sub two, the second-largest eigenvalue of the Markov matrix — the largest one less than one. There are Markov chains with different lambda-twos that all have the same stationary measure, and they converge to it at different rates. There's a connection between this Markov notion of intelligence — the lambda-two convergence — and your metric K.
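One way to see the lambda-two dial in action — a sketch using the standard "lazy chain" trick, with assumed numbers: mixing a chain with the identity preserves its stationary measure but pulls every eigenvalue toward 1, so the same goal is reached more slowly.

```python
import numpy as np

def lambda2(M):
    """Second-largest eigenvalue modulus: the convergence-rate bottleneck."""
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return mags[1]

def stationary(M):
    evals, evecs = np.linalg.eig(M.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    return v / v.sum()

# Illustrative ergodic chain (numbers assumed).
P_fast = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
])

# Lazy version: same stationary measure (mixing in the identity keeps
# pi fixed, since pi @ I = pi), but each eigenvalue lam moves to
# 0.9 + 0.1*lam, so lambda2 is closer to 1 and convergence is slower.
P_slow = 0.9 * np.eye(4) + 0.1 * P_fast

assert np.allclose(stationary(P_fast), stationary(P_slow))
assert lambda2(P_slow) > lambda2(P_fast)
```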

You talk about different layers — it's hierarchical, and higher layers can bend the geometry of the problem space for lower layers. How do you model that with Markov chains? It turns out you can have many different community structures. For any stationary measure π there are an infinite number of Markov chains that have π stationary but that differ in community structure. Community structure means: you could have thousands of states in the chain, with a few hundred tightly connected over here, a few hundred tightly connected over there, and just a few cross links. The whole thing is ergodic, but you might have, say, ten tightly knit communities. And within each community — say my hundred-state community — if I look more closely, it's itself composed of maybe three sub-communities. You can have communities, sub-communities, sub-sub-communities all the way down, all with the same stationary measure. What this gives us: you might have one big goal — reach the stationary measure — but you can also have subgoals: which community to be in. The community structure is dictated by the eigenvectors whose eigenvalues are close to one, because those correspond to slow mixing between communities. The communities mix quickly inside themselves, but they don't mix much between each other.
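A minimal sketch of how community structure shows up in the spectrum (the chain below is my own toy construction, not from the talk): two tightly knit two-state communities with weak cross links produce an eigenvalue near 1 — the slow between-community mode — while within-community mixing sits far from 1.

```python
import numpy as np

# Two communities {0, 1} and {2, 3}: heavy mixing inside each pair,
# weak cross links of size eps between them (toy numbers, assumed).
eps = 0.02
P = np.array([
    [0.5 - eps, 0.5,       eps,       0.0      ],
    [0.5,       0.5 - eps, 0.0,       eps      ],
    [eps,       0.0,       0.5 - eps, 0.5      ],
    [0.0,       eps,       0.5,       0.5 - eps],
])
assert np.allclose(P.sum(axis=1), 1.0)

mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]

# mags[0] = 1 (the stationary eigenvalue).  mags[1] is close to 1: the
# slow mode separating the two communities.  The remaining modes, which
# mix states inside a community, are far from 1 (fast mixing).
assert np.isclose(mags[0], 1.0)
assert mags[1] > 0.9
assert mags[2] < 0.1
```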

We can have policies that focus on stationary measures, community structures, convergence rates, particular dynamical models — and then meta-policies that explore different policies. This starts to give us a recursive notion of agency. The reason I'm bringing this up is that here is a framework of mathematical tools that's incredibly simple. There's one definition — the trace; that's the only mathematics. Then there's one observation — the trace logic. And the third observation is that it's recursive. That's it. And then all the tools are at your disposal.

How does this relate to notions of Markov blankets and the self versus the world? Markov blankets are, strictly speaking, defined for directed acyclic graphs, and there they define a boundary between self and world. I want to upgrade these notions to Markov chains. Markov chains are graphs, but they're not acyclic — they allow cycles. That's one upgrade. And then we're upgrading to labeled cyclic graphs, labeled by the transition probabilities. So we'll want to move from the standard notion of a Markov blanket to what I would call a trace blanket. Here we have to actually construct the self and the world.

Certain experiences, like pleasure and pain, will be among the experiences agents have. To the extent that certain actions lead to hitting the pleasure centers more — a higher stationary measure for pleasure — they'll be sought; higher stationary measures for pain will be avoided.

[Question: is an action just a change in policy?]

A policy gives you an action on observer windows because your action is to change observer windows. A meta policy gives you a higher level of action because you’re now changing policies. A meta meta policy would be an even higher level of action because you’re changing your meta policy. So actions are all either changing what you’re looking at or changing how you decide what you’re looking at. It’s a recursive notion of action.

In my observer windows, my hands and my body appear often, whereas other things — what I call the external world — don't appear that often. I also notice that I seem to be able to directly control my hands and my body, but if I want my phone to move, I need to move my hand to pick up the phone. In the Markov blanket approach it's clean: you give me a set of nodes, and their blanket is the parents, the offspring, and the parents of the offspring — that's your skin, the boundary. Here it's much more complicated. I have to use the notion of agency in a non-trivial way and learn probabilistically which features of my sequence of observer windows remain there most of the time. My hands are there most of the time, and certain actions with my hands are associated with pleasure signals, others with pain signals — so I learn to do certain things with my hands and not to stick them in the fire. Other things are much more contingent. So I can use the probabilities of what I'm seeing in my observer windows, plus these pleasure and pain guides, to start constructing myself versus the outside world.

[Question: what does it mean to control your hand if the only action is changing what you’re looking at?]

I have an observer window in which my hand is touching my ear. Now I want an observer window in which my hand is touching my leg, so I transition to that observer window. What's happening is that I'm choosing what I want to see in my movie next — and that's what we call moving my hand. You have to really think outside the box now: this is a choice of what I want to see next, and that's what actions are. It's very austere, and that's what I love about it. There's only one equation and one logic, so you have very tight guides. And yet the claim is that we should be able to get everything out of it.

Bayes’ rule falls out of the meet — the and — of the trace logic. Bayesian inference is effectively a special case of the meet of the trace logic.

You talk about bending the option space. Of course that was metaphorical, but there is a real sense in which I want to get space, and space-time itself. What I'm working on heavily is this: I believe we can actually boot up special and general relativity entirely from the trace logic — relativistic space-time constructed entirely from the trace logic. That would fulfill John Wheeler's goal: starting with only observer-participants, we build up all of space-time physics.

It’s standard in Markov chain theory to have what are called enhanced Markov chains. You have a Markov chain but you also have a counter. Every time your experience flips, your counter increments. Here I’ve got a case where I’ve got the four color agent and then the sub agent of just red and green. Each has a counter. The counter on the left is going much faster than the counter on the right because it’s seeing more experiences. The counters for the sub observers are going at a slower rate than the ones above them. So if I’m less than you in the trace logic, my time counter is going less than yours. The trace logic also is giving you a relationship among counters and that we claim is the time dilation of special relativity and general relativity.

Distances can also be derived. The distances you get in the trace window are different from the distances you get in the bigger window — this is where we're hoping to get general relativity out of this. There are notions like commute time between states: the expected time starting at green, getting to blue, and coming back to green. That expected time can be viewed as the square of a Euclidean distance; there are canonical ways of getting Euclidean distances from commute times. There are also Dirichlet measures, which are even more to the point. The trace logic gives us the time dilations and length contractions of special and general relativity. Time runs slower on the trace; the gaps between ticks are wider.
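Commute times can be computed from the Kemeny–Snell fundamental matrix. The result is symmetric, which is what lets it play the role of a squared distance (the Euclidean-embedding result is standard for reversible chains; the chain below uses assumed numbers and is only a sketch):

```python
import numpy as np

# Illustrative ergodic chain (numbers assumed).
P = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
])
n = P.shape[0]

# Stationary measure pi.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

# Kemeny-Snell fundamental matrix Z = (I - P + W)^(-1), where every row
# of W is pi.  Mean first-passage time: m[i, j] = (Z[j,j] - Z[i,j]) / pi[j].
W = np.tile(pi, (n, 1))
Z = np.linalg.inv(np.eye(n) - P + W)
M = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            M[i, j] = (Z[j, j] - Z[i, j]) / pi[j]

# Commute time: expected round trip i -> j -> i,
# e.g. green -> blue -> green.
commute = M + M.T
assert np.allclose(commute, commute.T)     # symmetric, distance-like
assert np.all(commute + np.eye(n) > 0)     # positive off the diagonal
```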

We haven’t solved the agency thing. What we’ve got is a language that’s principled for talking about agency.

[Mike Levin: What do you make of the fact that you’re pulling descriptions of physics and descriptions of agency out of the same starting material?]

Something I’ve been saying for quite a while is that space-time’s just a headset. We’re effectively saying we can build the headset. Space-time is not the reality that’s independent of us. Our typical view is Hoffman is this tiny little 160-lb thing inside of a massive space-time universe. And I’m saying no. What we call Hoffman is just an avatar inside a space-time headset that’s being created by consciousness. And the proof of the pudding is can we build the headset? For this approach to go through, we have to be able to show that we can get special relativity, no hand wave, just from the trace logic. And also general relativity and quantum theory. We have to be able to show that we can get entanglement and all of this stuff simply from the trace logic and Markov chain.

One objection might be: look, in quantum theory we have unitary matrices; you just have Markov matrices — how are you going to do that? Most Markov matrices are not unitary, but some are: there is a subset of the Markov matrices that are unitary. And when you look at the long-term asymptotic behavior of a Markov matrix — this is work Chetan and I did back in 2014 — Chetan discovered that the eigenfunctions of the enhanced Markov chains are identical in form to the quantum wave functions of free particles. Identical. So the idea is that quantum theory arises as an asymptotic description of a Markov dynamics. The Markov dynamics gives you a step-by-step account of agency and consciousness; quantum theory gives you only the asymptotic behavior, not the step-by-step behavior.

The no-cloning theorem in quantum theory: if you look carefully at the proof, it does not require unitarity — it only requires linearity. Markov chains are linear, and they have their own no-cloning theorem. So I see no obstruction right now. We have a principled notion of agency, and the nested community structure can give us nested goals and nested bending of problem spaces. We can actually have real curved space-time representations of bending — general-relativistic descriptions of bending.

[Discussion turns to embedding this in variational free energy principle framework, Karl Friston’s new book. Mike Levin asks about causal emergence metrics like phi. Hoffman: those are very useful, I think they have nothing to do with consciousness, but as just step one of getting measurements, I’m all for it.]

[Chris Fields asks about contextuality — situations where joint probability distributions can’t be defined (violating Kolmogorov axioms), as in quantum theory.]

Right — it depends on your time window and how you want to coarse-grain the states. There are many systems where the probability of the next state cannot be given exactly from the current state alone; you might need to look at the last three, five, or ten states. But in those cases you can always create new states and make the process Markov — you expand the state space. There can be a combinatorial explosion, though, and this only works for finite memory.
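The state-space expansion is a standard construction: a process whose next symbol depends on the last m symbols becomes first-order Markov on m-tuples of symbols. A sketch with memory two on a binary alphabet (the specific probabilities are my own assumptions):

```python
import numpy as np

# A binary process with 2-step memory: P(next = 1 | last two symbols).
# The probabilities are assumed, purely for illustration.
p_next1 = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 0.1}

# Expand the state space: each new state is a pair of consecutive
# symbols.  k symbols with memory m give k**m states -- the
# combinatorial explosion, and why this requires finite memory.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
idx = {s: i for i, s in enumerate(pairs)}

P = np.zeros((4, 4))
for (a, b) in pairs:
    p1 = p_next1[(a, b)]
    P[idx[(a, b)], idx[(b, 1)]] = p1        # emit 1: (a, b) -> (b, 1)
    P[idx[(a, b)], idx[(b, 0)]] = 1 - p1    # emit 0: (a, b) -> (b, 0)

# The expanded process is an ordinary first-order Markov chain.
assert np.allclose(P.sum(axis=1), 1.0)
```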

What I would hope is to find sub-logics that actually look like quantum logics. If that happens, then we could possibly answer your question in the affirmative. Another way to think of it: are there pairs in the trace logic network that don't commute? Oh, easily. That may be a way of approaching this joint-distribution question — matrices that have no join.

Unitarity is really just conservation of information. And Kolmogorov probability is really just conservation of information. So if you don’t have situations in which information is actually lost in some global sense, at informational singularities, then the system satisfies unitarity as it’s used as an axiom within information theory.

It is striking that Nima Arkani-Hamed and these high-energy theoretical physicists are emphatic that they're not assuming unitarity. They're saying: we don't need it, and we'll show that it arises from these positive geometries that are entirely outside of space-time.

[Chris Fields pushes back: they derive scattering amplitudes matching the Feynman approach, but that doesn’t mean you’ve derived space-time or unitarity — it just means you’ve matched something. They’re referring to unitary processes in space-time, which is very different from unitarity as a strictly information-theoretic concept. It’s a disconnect in language.]

Right now they don’t give you space-time, they give you scattering amplitudes. That’s what they give you.

I suspect that eventually we'll be able to identify amplituhedron-like structures with quantum error-correcting codes — we can go from amplituhedron-like structures to quantum error-correcting codes — and there are many ways to get space-time from quantum error-correcting codes.

What I’m hoping to show is that some of these positive geometries like the associahedron are sub polytopes of the Markov polytope. The Markov polytope could be describing the probabilities of certain interacting processes that we would think of as scattering processes. If that’s the case there may be a deep connection between some of these positive geometries and the Markov hedron, which is itself a positive geometry. The set of all possible Markov chains is a positive geometry.

What’s interesting is that theories of consciousness all the main theories of consciousness assume otherwise. We start with space-time, we try to figure out what physical systems in space-time could possibly have the right structure to give rise to consciousness. So we all start with the kludge as the assumption. And then try to go from there. So they’re doomed completely doomed to failure.

Wheeler saw it. He wrote the book on space-time — Misner, Thorne, and Wheeler, that is the Bible — and he knew space-time was a kludge. He was looking for something entirely beyond it, and he was going to build it from what he called observer-participants.

[Discussion about the biggest myopia in consciousness science: consciousness is the starting point, not something you squeeze at the end of a long tube. Mike Levin on platonic patterns: I don’t think we are beings that occasionally get visited by platonic patterns. I think we are the patterns. What a pattern from the platonic space experiences when it interacts with the physical world is what we tend to call consciousness. There may be lateral interactions within that world — when mathematicians think about abstract mathematical objects, two patterns in resonance. Ultimately some version of idealism is probably more accurate. Consciousness is fundamental. But I don’t know what to do with that on a practical level right now. So I’m sticking with two separate interacting realms because that’s what we can handle right now.]

One of our goals is to show that we can actually get space-time, special and general relativity, and quantum theory from this theory of Markov chains of consciousness — in which case we could inherit all the work that's been done in physics, but see it as arising from a consciousness-first point of view.

[Mike Levin: one of the important pieces of our research program is to understand the mapping between the interfaces we construct — sorting algorithms, cyborgs, xenobots, embryos — and to understand what properties of these things facilitate the ingression of specific patterns from the platonic space. We’ve got some wild stuff ripening in the next few months. The key claim: this isn’t just a redescription. It actually makes new predictions — it suggests that what you’re getting from that space gives you free lunches or at least heavily discounted lunches. You get more than you put in. Our accounting isn’t adding up everything that you get. We now have the ability to quantify how much free memory, free compute, free whatever we’re getting. Kind of a crazy prediction of mine: you can actually do compute in that space that you don’t pay for in this space.]

I think there’s a connection with the hidden states in this Markov system. So that when we just see a trace, most of the intelligence is something you don’t see. To the trace observer, that’s all in a Platonic realm because you literally cannot see it. And yet what you’re seeing is entirely a trace of that world. Your visible world is controlled by this Platonic space that you cannot see.

That’s what your stuff about the planaria — you cut off the head and cut off the tail and you can change the electric fields and make it have two heads. How does it know how to do that? Where is that state? If somehow all we’re seeing is the planaria in our trace, we’re not seeing beyond what we can see. There’s a whole Markov realm of intelligence out there that is projecting down into what we can see, which is just a planarian. I’m really left to explore with you guys as this formalism matures, how we might use the formalism in concrete ways to model specific Platonic spaces for specific biological memories. This gives us the tools — the exits, the dark states, and the entrances. All those tools are part of the Platonic space. We might be able to get that Platonic space not just a hand wave, but here is the Markov chain, here’s the trace, and this is why it looks like this Platonic intelligence that’s guiding you.

[Mike Levin: what might be fun is to apply behaviorist assays — habituation, sensitization, associative conditioning, delayed gratification, path planning, illusions, counterfactuals — to simple random Markov chains. Not building ones that do it, but finding it in simple or random ones you don’t think should be doing it. We’re finding these capacities in very simple things with no design, no selection, no learning. Usually you need one of those three. We’re not doing any of that.]

We may be able to find matrices where what we're seeing in the organism is the trace, but the invisible states carry much of the intelligence that leads to what we're seeing. We write down the matrix, and even though from the trace you can't see why the system is doing what it does, in the big matrix you can see why.