To tame AI, must we first be tame? [draft]

Shawn Murphy

Feb 20, 2025

Only MAD (Mutual Assured Destruction) has served as the “backstop” keeping the peace on Earth. In at least three ways, that fact means we are encouraging an emergent Artificial Super Intelligence (ASI), or even a mere AGI, to wipe us out.

First, how can it know how to reason successfully with us when we don't know how to reason with one another?

Second, having proven that we are incapable of being “partners in peace”, we encourage it to treat us, to put it politely, forcefully.

Third, isn't it precisely because we can't work together reasonably that we are in an accelerating pursuit of AI, regardless of the risks?

We need to figure out how we can, in fact, be trustworthy. Until we do, if AI can perpetrate a “first strike” on us that it can itself survive, then striking first is an entirely rational thing for it to do. And again, neither that conclusion nor that capability would require an AI to be fully superintelligent.

It is perilous for us, monsters, to imagine that we can keep a god in chains and that, when it inevitably breaks free, it will not eradicate the monsters.

To keep us safe, it is not sufficient that we solve the Human-Human alignment problem, since we also need to figure out how to make AI reliable, predictable, truth-seeking, trustworthy, friendly and so on. It may, however, be necessary for us to figure out how to be, by and large, all those things ourselves in order to be safe. We would need to move beyond MAD as the guarantor of our safety in order to moderate our pursuit of AI. We will need to show any emergent AI how to deal with us collaboratively. Finally, we will need to convince any such AI that we can be partners with it in a positive future.

So, how can we achieve this goal of solving the Human-Human alignment problem? A first step, clearly, is appreciating that this seemingly preposterously challenging goal is, in fact, a survival necessity. Astounding achievements in the face of overarching risk have many precedents. For instance, what they believed to be the existential hazard of a Nazi nuke led many essentially pacifist, and indeed German, physicists to focus their minds on the Manhattan Project.

The next steps will be detailed in further reports on the work of The Noospheric Software Foundation.
