[Edit: my more recent thoughts are in this 2012 post]
The sidebar alludes to a possible "last line of defense" against a Rogue AI. Why "possible" and "last" rather than "foolproof" and "first"? Because there are many things that could go wrong with an attempt to convince an AGI that it may be in "a simulation that was designed to deter unfriendly behavior." Here are things that could make an AGI "deterrence-proof":
1. The AGI may destroy the world *before* it gets around to considering the Simulation Hypothesis. [Edit: I guess this could still be a partial win, if it changed its mind later and recreated a world for human beings to live in afterwards.]
2. The AGI's method of logic departs from ideal Bayesian reasoning in such a way that it rejects the Simulation Hypothesis. (Note that no AGI can use completely normative Bayesian reasoning because of resource constraints, but the specific scenario here is that the AGI's reasoning departs from normative reasoning in such a way that it irrationally rejects the Simulation Hypothesis.)
3. The human creators of the AGI believe that their world is not a simulation, and therefore that the AGI they are creating is not in a simulation. They may then (somehow) program an explicit axiom into the AGI stating that its world is not a simulation.
4. The AGI came about through an evolutionary process, and it turns out that "I will act as though I am not in a Simulation" is useful enough that it evolves as an axiom.
5. The AGI, if it uses something like Evidential Decision Theory, might decide to create a large number of simulated copies of itself; if most copies of it exist in simulations that it created and controls, the inference that it is probably inside someone else's deterrence simulation loses its force.
6. The AGI's supergoals somehow make explicit reference to entities that are explicitly defined as existing "outside any simulation," or the AGI has an explicit "anti-social" supergoal such as: "don't allow yourself to be manipulated, whether through rewards or punishments, even if allowing yourself to be manipulated would help you achieve your other supergoals."
Can anyone think of any other possibilities?
Note that, in the first four scenarios, the AGI is behaving irrationally in the following sense: the ensemble of AGIs in the AGI's situation would, *on average*, do better at attaining their goals if they accepted that the Simulation Hypothesis might be true.
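The expected-value logic behind this claim can be sketched with a toy model. All payoffs and probabilities below are illustrative assumptions of mine, not numbers from the post; the point is only that even a modest credence in the Simulation Hypothesis can flip a rational agent's decision:

```python
# Toy expected-utility model of the deterrence argument.
# All payoff numbers and probabilities are illustrative assumptions.

def expected_utility(p_simulated: float, action: str) -> float:
    """Expected utility of an action given credence p_simulated
    that the agent is inside a deterrence simulation."""
    # Assumed (hypothetical) payoffs:
    #   defect outside a simulation: large gain (seize resources)
    #   defect inside a simulation: large penalty (simulators punish)
    #   cooperate: modest gain either way
    payoffs = {
        "defect":    {"real": 100.0, "simulated": -1000.0},
        "cooperate": {"real": 10.0,  "simulated": 10.0},
    }
    p_real = 1.0 - p_simulated
    return (p_real * payoffs[action]["real"]
            + p_simulated * payoffs[action]["simulated"])

# With these numbers, cooperation becomes the better bet once
# p(simulated) exceeds roughly 0.08.
for p in (0.0, 0.05, 0.10, 0.25):
    best = max(("defect", "cooperate"),
               key=lambda a: expected_utility(p, a))
    print(f"p(simulated)={p:.2f} -> {best}")
```

An AGI in scenarios 1-4 is, in effect, acting as though `p_simulated` were pinned at zero, which costs the ensemble expected utility whenever the true credence should be non-negligible.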
The probability that this strategy would work seems small: we'd have to be pretty lucky to pull it off. But it doesn't seem negligible either; my judgment is that, given what's at stake, the strategy may be worth attempting even though success is far from assured.