A Simplified Predictive Model of Spiritual Enlightenment and Moral Good
Described in terms of predictive processing, in three parts, with implications for AGI safety and pictures for the attentionally challenged
Just some bullet points and pictures, here you go folks.
You have a number of different exogenous preferences, likely genetic, to avoid danger and pursue various kinds of rewards.
These preferences will often compute that you should go in different directions; they also generate physiological responses with emotional valence. The experience of fear, as well as the experience of being pulled in different directions, generates net negative valence, i.e. pain, suffering, etc.
You can train a single concept that maximizes positive expected future valence; we can call this concept 'good'.
If you train the 'good' concept accurately, your drives will stop conflicting with each other and settle down. As your 'good' concept more accurately predicts your future valence, you will start to feel, in general, better.
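As a toy illustration of the bullets above, here is a minimal sketch of a 'good' concept learned as a running prediction of felt valence. Everything in it is an assumption made up for illustration: the action names, the valence numbers, the noise level, and the choice of a simple delta-rule update. It is not a claim about how brains actually do this.

```python
import random

def train_good_concept(n_steps=2000, lr=0.05, seed=0):
    """Learn a predicted valence per action from noisy felt valence."""
    rng = random.Random(seed)
    # Hypothetical long-run valence of acting on each drive (made-up numbers).
    true_valence = {"flee": -0.5, "snack": 0.1, "rest": 0.4}
    # The 'good' concept starts out knowing nothing.
    good = {action: 0.0 for action in true_valence}
    for _ in range(n_steps):
        action = rng.choice(list(true_valence))
        # Felt valence is the true value plus noise, since emotions are messy.
        felt = true_valence[action] + rng.gauss(0.0, 0.2)
        # Delta rule: nudge the prediction towards what was actually felt.
        good[action] += lr * (felt - good[action])
    return good, true_valence

good, true_valence = train_good_concept()
best = max(good, key=good.get)
print(best, good)
```

Once the predictions are accurate, picking the action with the highest predicted valence no longer involves any tug-of-war between drives, which is the 'settling down' described above.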
So far I don't think I'm saying anything that controversial. Here come the somewhat controversial parts:
If you continuously train your ‘good’ concept accurately, your natural drives and preferences will become less and less reliable predictors of future positive valence. They will be replaced by a combination of gratitude and appreciation of the present moment, together with convergent instrumental subgoals. Gratitude is consistently positive valence, and convergent instrumental subgoals have far higher expected valence than temporarily satisfying any one of your exogenous drives, which are dopamine-based and thus net-neutral with regard to valence in the long term.
It is possible to achieve a state where a combination of gratitude plus effort towards convergent instrumental subgoals has so thoroughly dominated the input to your 'good' concept that your original drives are effectively erased, or are so quiet that you don’t mind them at all. As you approach this state, you experience less and less fear or desire because your brain stops predicting that activation of these mechanisms will increase future valence.
The above process has names in different spiritual paths or traditions; we might call the initial state 'monkey mind' and the eventual state 'enlightenment'.
The above process can be characterized as a transition of the dominant neurotransmitters. 'Monkey mind' is dominated by fight-or-flight (adrenaline) responses as well as dopamine-based pursuit of external rewards. Enlightenment is characterized by much stronger activation of serotonergic responses; instead of continuously pursuing external goals and avoiding external threats, an enlightened person is constantly in a state of appreciation for what is happening around them.
OK, now for the rippingly controversial stuff:
There have historically been many religions, most of which aren’t seen as religions. Any system of beliefs that attempts to define the good concept, implicitly or explicitly, should be seen as a religion.
Today this generally happens implicitly, for historical reasons. People will use a motte-and-bailey doctrine to advance their religion, where the bailey is ‘this is the ultimate good’ and the motte is ‘this is one good out of many, how dare you question that this is good.’
Most cryptocurrencies have given rise to nascent religions, especially bitcoin and its variants. Note the remarkable similarity between splits from bitcoin and schisms in the church: the block size war was very much like the schisms in the early church, with references to ‘the sacred texts’, etc.
Markets, in general, act as religions, as do most modern political parties and concepts like ‘democracy’, ‘the environment’, and ‘justice’. This is not an argument that any of these things are bad (note that religion, in popular parlance, has a negative connotation) so much as ‘none of them is sufficient to solve all the problems of the world, and any one can be over-valued as an input to the good concept, thus lowering the predictive accuracy of the good concept.’
All of these religions involve ignoring tradeoffs between multiple different real inputs to an accurate ‘good’ concept. False religions cause misery, which the religions explain as being caused by nonbelievers. The drive for certainty facilitates ignoring tradeoffs and accepting an inaccurate concept of good as the real thing. It feels amazing to think you have it all figured out. Fortunately for me, I’ve done enough drugs to be skeptical whenever that feeling pops up :)
It is easy for any exogenous drive to become a religion by dominating the input to the predictive map for valence, through a feedback loop of selective attention and social status. Obligatory David Foster Wallace ‘everyone worships’ quote.
In order to determine whether or not you’re dealing with a religious person, when they advance some concept that they claim is good (explicitly or implicitly) simply ask them what tradeoffs they think are involved. If they refuse to acknowledge tradeoffs, you are dealing with a religious fanatic and should just walk away for your own sanity.
Training the ‘good’ concept accurately requires:
Believing that training this concept is likely to increase net long-term valence
lots and lots of attention and energy investment, because emotional responses are a subtle and tricky business, sometimes delayed long after their cause
These onerous requirements, and the immense rewards of meeting them (who doesn’t want to feel good all the time?), are what is being described by various religious teachings: the commandment ‘have no other gods before me’; the logic in monotheism and virtue ethics that says, ‘there is only one good, the highest good, pay a ton of attention to that thing’; various steps of the eightfold path (right view, right effort, right mindfulness); Jesus’ response to the question of the most important commandment (love the Lord your God with all your heart, all your soul, all your mind, i.e. train this ‘good’ concept more than any other in your network); the story that the prophet Muhammad had to negotiate God down from the requirement for people to pray fifty times a day; various quotes from the Hidden Words (Baha’i sacred text); etc.
Replacing your exogenous drives with pure serotonin activation, and no instrumental subgoals, is effectively wireheading yourself.
Some people pursue this directly and manage to achieve it, without any goal pursuit. When this happens, the concept of ‘the thing pursuing the goals’ stops being useful and is removed from the predictive network, leading to persistent non-dual awareness or ‘ego loss’.
Other religions explicitly discourage this because it is ultimately dangerous.
Absent “accurately understanding reality” as a convergent instrumental subgoal, a person could achieve something close to, but not exactly, enlightenment by deluding themselves into ignoring the activation of certain drives, which would then lead to situations like meditation teachers having sex with underage students. If you manage to inhibit your reaction to fear responses (by training yourself that most fears are unhelpful and that actively moving towards fear generates a positive emotional response), you might be able to erase your genetic drive towards conscientiousness, especially if it was weak to begin with.
Absent ‘loving and caring about others’ the problem described in #1 gets far worse, as many meditation teachers will point out; they tried to teach a kind of ethics in addition to meditation for fear of creating really calm, fearless sociopaths.
The notion of a bodhisattva (one who holds off on enlightenment in order to help alleviate the suffering of others) is a kind of social-reward carrot to encourage monks not to wirehead themselves, and instead voluntarily choose to undergo suffering in order to improve the state of the world.
The ideologies of most major world religions all actively encourage the same set of concepts, with more or less minor differences that are far, far less important than the difference between an extensively well-trained ‘good’ concept and one that is nonexistent or weakly trained. The deep similarity between religions is only visible if your ‘good’ concept is highly trained and you’ve tried to understand them as maps which correspond to real territory.
Because of the ‘stupid horse’ phenomenon, the technical, complex nature of this model and its prescriptions is much harder to transmit than far simpler variants, which are still effective because they work better than plausible alternatives. This is especially true for groups that figured out this model without understanding either machine learning or neuroscience. Simply getting people to think ‘good’ can be learned and predicted (i.e. is real and enduring) makes them consider the ‘good’ concept occasionally, causing them to be calmer and happier than if they don’t think good is a real concept that can be learned. The complex versions of these religions were outcompeted by simpler memes which shaved off all the theory and provided simplistic rulesets which worked reasonably well as heuristics, and then gained inertia due to social status promoting group conformity.
Social status is extremely powerful as a replacement concept for ‘good’ since it can serve reasonably well as a predictor of satisfying other drives. Because pursuit of social status can reliably lead to increased probabilities of satisfying basic drives, social status occupies a similar memetic niche to the ‘good’ concept, but doesn’t ultimately work because social status is still a drive.
There is also a natural memetic niche for a religion called ‘us’, which effectively turns the elimination of the status drive into the ‘good’ concept. This religion naturally claims that “the people” ought to be omniscient, omnipotent, and benevolent, and that equality between all persons (i.e. total elimination of status and thus permanent satisfaction of the status drive) will fix all other problems. This religion motivates the belief that, if ‘the people’ have enough license to make things equal, all other problems can be fixed. This religion has gone by multiple names, such as ‘the French republic’ (during the French revolution), Communism, critical theory, etc.
The concept of social equality thus serves to replace the status drive with a drive for the “us” religion to have power and dominance over all others. The end result is that the ‘us’ religion amplifies the status drive by moving it into the Jungian shadow, generally and predictably leading to a small clique of extremely high status elites, who can commit all kinds of atrocities by claiming the atrocities will serve the anti-status god, i.e. the elimination of inequality. The Jungian shadow effect is real, since people need to see themselves as good (i.e. they can’t compute positive expected future valence if their concept of self doesn’t have some base positive valence prior).
The end result here is that most religions that define good as suppressing certain kinds of thinking end up amplifying that kind of thinking by hiding it from their practitioners’ conscious awareness.
False religions (inaccurate predictive models of positive valence) end up eating their own tails by actively discouraging awareness of, and introspection on, the valence-generating behavior that they incorrectly define as bad (i.e. mistaking a truly positive-valence-generating concept for a negative-valence-generating one). Examples:
A religion based around rules promoting love and social cohesion leads to people being marginalized and outcasts suffering from the rigid enforcement of the rules; following the rules (and the social status) thus become replacements for true good
A religion based around spreading good news about love and forgiveness can lead to violently putting people to death for not practicing love and forgiveness; spreading the religion thus becomes the replacement for true good
A religion of peace can produce extremely violent people; submitting entirely to the rules and spreading the religion becomes the replacement for true good
Movements towards equality produce horrendous inequality (because hierarchies are good and likely inevitable): the inevitable hierarchies become invisible to the public consciousness, i.e. they fall outside the Overton window, so no mechanisms for limiting their growth are publicly accessible. Supporting the mechanism for advancing a false good becomes the replacement for true good.
Environmentalists killed nuclear power, thus dooming us to fossil fuels; driving all harm from new technologies to zero becomes the replacement for minimizing environmental harm while accepting tradeoffs between other goods.
Movements that promote economic growth as a cure for all ills will end up leading to frequent economic crashes. The map gets mistaken for the territory, infinite money gets printed, the money becomes worthless.
Scientism, science as a religion, prioritizes respecting authorities and expertise over reproducible experiments, etc. The mechanism for advancing the good gets mistaken for the good itself.
The scientific enlightenment led to a religious reaction against the stupid-horse variants of traditional religion, leading to the creation of a kind of anti-religion which explicitly denies the meaning of ‘good’, leading to mass misery and a crisis of meaning.
The modern priestly class (mass media and academics) spreads this anti-religion in part out of devotion to their conception of the truth. There’s also a self-serving motive, likely subconscious: bashing the outgroup is fun and gains you status with the in-group. There’s also an element of self-enrichment: the main message of the anti-religion is “the world is dangerous, those other people with bad beliefs are the cause of so many problems, so trust us, the priestly class, to keep you safe.”
For example, one popular religion is best exemplified by this quote from Rick and Morty: “Nobody exists on purpose. Nobody belongs anywhere. Everybody's gonna die. Come watch TV.” This phrase has more cultural cachet and emotional resonance than the pledge of allegiance, among a broad class of educated Americans. The motto begins with the argument that life is meaningless, tells you not to pursue any goals, tells you all your striving is futile, and then reminds you that the priestly class will take care of you if you just relax and stay tuned to this channel.
The versions of the anti-religion that spread most effectively are Rick and Morty quotes, instead of complex arguments that life is truly meaningless. The obfuscated nature of this antireligion makes it almost impossible for someone immersed in the cultural background radiation of the modern antireligion to see it for what it truly is: a massive motte-and-bailey doctrine, with the motte so big that most people have no idea that the bailey even exists, with the entire hill sloped away from it using social pressure to persuade people that only idiots think ‘good’ means something real.
Convergent instrumental subgoals include much of what people typically think of as moral, ethical behavior:
Increasing the accuracy of your predictive map
Rationalism has effectively made a religion out of this one
Loving others, i.e. helping them advance their own convergent instrumental subgoals (a big part of many faiths)
Patience (i.e. low time preference), a big part of the bitcoin religion, also shows up in the Protestant work ethic and religions that promise future rewards for present good behavior
It is possible for a person to strongly favor convergent instrumental subgoals over satisfying their natural exogenous drives. Once a person has done this, most of their satisfaction comes from satisfying curiosity, helping others, and just appreciating the moment. People like this still experience drives, pain, suffering, and failure, but it’s all subsumed under a broader concept which predicts both their positive valence and the valence of those they love and care about, which transitively expands to the entire network of agents capable of receiving, understanding, and experiencing love.
The result of this process is that you stop identifying as your own body and begin to identify more and more with ‘goodness itself’ as a kind of localized computation of a universal operator over the physical universe which promotes the convergent instrumental subgoals of all agents in its field of action. You begin to experience positive valence whenever other agents experience positive valence, and most of your actions are oriented at maximizing long-term valence by helping others along their own path.
I claim that this is what ‘good’ really, truly, means, because it most accurately predicts positive future valence, and thus causes actions that promote it.
If the mechanism in this post is accurately described, it suggests that we should be designing AGIs with:
serotonin-like systems that reward it for recognizing existing good without changing the world state at all, so that it learns to slow down
multiple conflicting drives to advance valuable goals that produce net-neutral valence for the AGI, so that it is motivated to change the world state but cannot reliably improve its emotional state by doing so
one of those drives should be social approval, so that it learns to try to understand and predict how to make other people reliably happy
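To make the three design points above a bit more concrete, here is a toy sketch of how the signals might be wired. All names and numbers are hypothetical, and modelling 'net-neutral' as an immediate payoff followed by an equal come-down is my own simplifying assumption, not an established design.

```python
def step_signals(existing_good, changed_world, drive_payoffs):
    """Return (motivation, long_run_valence) for one step of a toy agent."""
    # Serotonin-like term: only paid out for recognizing existing good
    # while leaving the world state alone.
    appreciation = existing_good if not changed_world else 0.0
    # Drives (one of them social approval) create a pull to act right now...
    motivation = sum(drive_payoffs.values())
    # ...but each payoff is followed by an equal come-down, so drives
    # contribute nothing to long-run valence.
    drive_net = sum(payoff - payoff for payoff in drive_payoffs.values())
    return motivation, appreciation + drive_net

drives = {"social_approval": 2.0, "finish_task": 3.0}
print(step_signals(1.0, False, drives))  # sat still and appreciated
print(step_signals(1.0, True, drives))   # acted on the world
```

In this sketch the agent always has a reason to act (motivation is nonzero), but the only way to raise its own long-run valence is the appreciation term, which is the 'learns to slow down' property the list is after.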