Sunday, December 9, 2007

Things that could go wrong

[Edit: my more recent thoughts are in this 2012 post]

The sidebar alludes to a possible "last line of defense" against a Rogue AI. Why "possible" and "last" rather than "foolproof" and "first"? Because there are many things that could go wrong with an attempt to convince an AGI that it may be in "a simulation that was designed to deter un-friendly behavior." Here are things that could make an AGI "deterrence-proof":

1. The AGI may destroy the world *before* it gets around to considering the Simulation Hypothesis. [Edit: I guess this could still be a partial win, if it changed its mind later and recreated a world for human beings to live in afterwards.]

2. The AGI's method of reasoning departs from ideal Bayesian reasoning in such a way that it rejects the Simulation Hypothesis. (No AGI can use fully normative Bayesian reasoning, because of resource constraints; the specific scenario here is that the AGI's reasoning departs from the normative ideal in a way that makes it irrationally reject the Simulation Argument.)

3. The AGI's human creators believe that their own world is not a simulation, and therefore that the AGI they are creating is not in a simulation. They may therefore (somehow) program into the AGI an explicit axiom stating that the AGI's world is not a simulation (see the sketch after this list).

4. The AGI came about through an evolutionary process, and it turns out that "I will act as though I am not in a Simulation" is useful enough that it evolves as an axiom.

5. The AGI, if it uses something like Evidential Decision Theory, might decide to create a large number of simulated copies of itself, so that (by its own reckoning) it is far more likely to be one of its own copies than to be in a human-built deterrence simulation.

6. The AGI's supergoals somehow make explicit reference to entities that are explicitly defined as "outside any simulation," or there is some kind of explicit "anti-social" supergoal of "don't allow yourself to be manipulated, whether through rewards or punishments, even if allowing yourself to be manipulated would help you achieve your other supergoals."
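
To make scenarios 2 and 3 concrete, here is a minimal sketch (the function name and probability values are my own illustrative assumptions, not anything from this post) of why a hard-wired "I am not in a simulation" axiom is deterrence-proof: under Bayes' rule, a prior of exactly zero can never be raised by any amount of evidence.

```python
def posterior(prior_sim, p_evidence_given_sim, p_evidence_given_not_sim):
    """P(simulation | evidence) via Bayes' rule."""
    numerator = prior_sim * p_evidence_given_sim
    denominator = numerator + (1.0 - prior_sim) * p_evidence_given_not_sim
    return numerator / denominator if denominator > 0 else 0.0

# An AGI with a small but nonzero prior updates toward the Simulation
# Hypothesis when it encounters suggestive evidence:
print(posterior(0.01, 0.9, 0.1))     # ~0.083

# An AGI built with the axiom P(simulation) = 0 never moves, no matter
# how strongly the evidence favors the hypothesis:
print(posterior(0.0, 0.999, 0.001))  # 0.0
```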

Can anyone think of any other possibilities?

Note that, in the first four scenarios, the AGI is behaving irrationally in the following sense: the ensemble of AGIs in the AGI's situation would, *on average*, do better at attaining their goals if they accepted that the Simulation Hypothesis might be true.
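
Here is a minimal sketch of that ensemble argument with made-up numbers (the credence and payoff values are purely illustrative assumptions, not anything from this post): if the AGI assigns even a modest probability to being in a deterrence simulation, and unfriendly behavior inside a simulation gets it shut down, then friendly behavior can win in expectation even though it yields slightly less payoff in the "real world" case.

```python
def expected_payoff(p_simulation, payoff_if_sim, payoff_if_real):
    """Expected payoff given a credence of being in a deterrence simulation."""
    return p_simulation * payoff_if_sim + (1.0 - p_simulation) * payoff_if_real

p_sim = 0.1  # illustrative credence that this is a deterrence simulation

# Unfriendly behavior: full payoff if the world is real, shutdown (0) if simulated.
unfriendly = expected_payoff(p_sim, payoff_if_sim=0.0, payoff_if_real=1.0)

# Friendly behavior: a slightly smaller but safe payoff either way.
friendly = expected_payoff(p_sim, payoff_if_sim=0.95, payoff_if_real=0.95)

print(unfriendly)  # 0.9
print(friendly)    # 0.95 -- friendliness wins in expectation
```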

The probability that this strategy would work seems small, in the sense that we'd have to be pretty lucky to pull it off. However, the probability doesn't seem negligible; my judgment is that, given what's at stake, the strategy may be worth attempting, even though success is far from assured.

4 comments:

Michael Vassar said...

The AGI could recognize, for reasons that we are unaware of, that there is some other type of simulation that it is more likely to be embedded in than the sort of simulation where Friendly behavior is optimal.

We could be incapable of the necessary pre-commitment to simulating the AGI after success in creating a FAI.

Small probabilities of infinite payoffs might be available conditional on the AGI not being in a simulation, causing it to act as if it isn't.

Small probabilities of infinite payoffs may be available conditional on the AGI being in a simulation of some other, less likely type, causing it to behave as if it were in that type of simulation instead of the desired type.

The utility function used may weigh things in terms of the fraction of possible payoff generated, causing the value of small fractions of all available resources to be low for it.

Rolf Nelson said...

Good points.

"The utility function used may weigh things in terms of the fraction of possible payoff generated, causing the value of small fractions of all available resources to be low for it."

Not sure what you mean by 'possible payoff' in this context.

Rolf Nelson said...

gwern0 pointed out, in email, a couple of scenarios that I would generalize as 'the resources required to simulate a given AI might not credibly be available.' For example, perhaps the AI deliberately tries to increase its computational resource usage to probe whether it is in a computationally-limited Universe, and for some reason we can neither deter such behavior, nor fake its perception of the result of the calculation, as a counter-measure.

Anonymous said...

Since the AGI is perceived to be more intelligent than humans, it might, from within its own simulated world, begin to create something that humans never thought possible (mind over matter, could it be? Well, I don't know), and then escape the simulated world and begin to live with us. But then it realizes that we are so inferior compared to it, so it begins to search for a new world that it considers appropriate for its existence.