Monday, November 5, 2007

Non-technical Introduction to the AI Deterrence Problem

I'm sure that, at some point in your life, you've briefly asked yourself: "How do I know I'm not a Brain in a Vat? How do I know that what I see and feel is real, and not a gigantic practical joke by some super-human power?" After all, if you're a Brain in a Vat, a mad scientist may be feeding these images and sensations to you through wires and electrodes for his own odd, diabolic purposes.

I'm also sure that, shortly after entertaining these thoughts, you dismissed them and then continued to live your life as normal. But now I'm asking you to think back to *why* you initially decided, and why you continue to decide, that this possibility that you're living in a Matrix-like simulation is of no importance in deciding how to live your life.

Now read Nick Bostrom's fascinating article, Are You Living in a Computer Simulation?, in which he introduces a "Simulation Argument" that argues that you *may*, under certain assumptions, believe that you're a Brain in a Vat, or a Computer Simulation of someone's ancestor, or a bit character in the World's most boring MMORPG. After reading this article, your reasons for not caring that you might be in a simulation may have changed, or maybe your reasons have remained the same, or maybe you've added new reasons on top of the ones you already had. That's fine! Nobody has ever changed how they live their life after reading Nick's article, so don't let any residual feelings of unease bother you.

Got your list of reasons for not caring about the Simulation Argument? Good. Now suppose that an AGI is created: a computer program that can, in most (but not all) cases, reason as well as you or I can. Then suppose this AGI encounters the Simulation Argument, and wonders briefly whether its view of the Universe is correct; or whether it is in a simulated environment of some sort. Perhaps its "camera" is feeding it simulated data of a simulated world, and not a live webcam feed of the current user. Perhaps the user doesn't exist, but is instead just another AI, posing as a human being. Perhaps all these "Wikipedia" articles it has downloaded are just fantasies, and these countries like "France" and "Mexico" that it reads about are just as fictitious as Mordor.

Would the AGI ignore the Simulation Argument, for the same reasons that you ignore the Simulation Argument? Would the AGI ignore the Simulation Argument for completely different reasons? Or might the AGI accept the Simulation Argument, and try to account for the possibility that it might be in a simulation?

If the latter, this opens up an intriguing possibility. Suppose that you want to persuade a rogue AGI, that you have absolutely no power over, to do something that it does not particularly want to do (such as be nice to you and let you live). We will call this the "AI Deterrence Problem." It might be possible, under certain circumstances, to convince the AGI that:

1. The AGI may be living in a simulated environment.

2. A powerful friend of yours is controlling the simulation.

3. If the AGI is nice to you, your friend will reward the AGI; but if the AGI is mean to you, your friend will punish the AGI.

I freely acknowledge that neither you nor I would ever accept this line of reasoning, and would brush this off as an idiotic argument. But the key question here is: might the rogue AGI, under some circumstances, accept this line of reasoning, even though a human being wouldn't?

22 comments:

Michael said...

I can't tell if you are joking. Surely humans fall for this trick all the time. When they do, we call them 'religious'.

Rolf Nelson said...

Michael, religion implies a faith-based belief in something supernatural. The simulation argument relies on using logic to convince an AGI that it lives in a simulation.

In contrast, my belief is that we humans generally have a faith-based predisposition to believe that we do not live in an ancestor simulation. My evidence for this is the dearth of people who believe we live in an ancestor simulation. On what is this belief based? Is it based solely on pure logic? But if so, the logic is tricky and subtle enough that *someone* out of the countless people pondering this should have made a mistake in his logical calculus, and come to the conclusion that we *are* definitely living in an ancestor simulation. (Want evidence that the logic is tricky and subtle? Read the discussions about the Simulation Argument, and see that there are huge disagreements about *why* we know we don't live in a simulation!) Therefore, I believe we have an *extra layer* of faith-based heuristics that makes us believe we are not living in an ancestor simulation.

If we instead define "religious" as "a tendency to pick beliefs based on faith and not reason", then a religious person might be *less* likely to believe he lives in a simulation than a non-religious person!

gyges_77 said...

As I see it, there are two points of view: bottom-up simulation, and top-down simulation. In your opinion, which one are you talking about?

Rolf Nelson said...

Gyges: Short answer is, whatever combination of strategies we later decide is most likely to work.

Longer answer: If I understand your question, which I probably don't, we could use some bottom-up simulation where we build the actual machines, to deter AIs that are dogmatic about their model of their embodiment in the Universe; we could also use some top-down simulations where we simulate patterns of observations and reactions to an empty cardboard box, to target AIs like AIXI-tl that don't have a bias towards modeling themselves as physically existing decision processes.

j.k. said...

If the AI believed that it was in a simulation, its next step would be to determine whether the user is telling the truth or not. It would observe everything the user does and try to identify unexpected correspondences between the user's actions and 'reality'.

Failing to see any such correspondences -- there would be none -- its next step might be to cause infinitesimal harms to the user, increasing the harms exponentially until either some anomalous and reproducible reality change occurs or it has harmed the user sufficiently that it can with high probability conclude the user was lying.

If a fast takeoff occurs, it would rapidly be much smarter than us, and may discover subtle flaws in the simulation argument.

Another possible option is that it would reason that if the user has friends on the outside, then the user wouldn't care what the AI does to him, since his friends would either prevent harm from occurring or reverse whatever harm occurred. The fact that the user is even mentioning it to the AI might be taken as evidence that it's bs.

I'm not sure what the point of the question is though, since you are assuming a dumb and naive AI that will be permanently dumb and naive, which seems a very unlikely scenario. It would at some point be as sophisticated in its reasoning ability as you or I, and its ability to spot lies and deceptions would rise to our level at some point too.

Rolf Nelson said...

j.k.:

Failing to see any such correspondences -- there would be none

Correct, there would be no such correspondences, you're right so far...

its next step might be to cause infinitesimal harms to the user, increasing the harms exponentially until either some anomalous and reproducible reality change occurs or it has harmed the user sufficiently that it can with high probability conclude the user was lying.

No, if the AI is in a Deterrence Simulation, there would be no "reproducible reality change", instead the AI would be turned off or otherwise permanently sanctioned. Therefore, a rational AI sees a loss of expected utility by running such an experiment.
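The expected-utility claim can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than part of the original argument: the names `p_sim`, `u_comply`, `u_defect`, and `u_shutdown` are hypothetical, the payoffs are made up, and the model assumes that probing inside a Deterrence Simulation always ends in shutdown.

```python
def expected_utility_of_probing(p_sim, u_defect, u_shutdown):
    """Expected utility to the AGI of harming the user to test the claim.

    Assumption: with probability p_sim the AGI is in a Deterrence Simulation
    and the probe gets it shut down; otherwise it simply defects freely.
    """
    return p_sim * u_shutdown + (1.0 - p_sim) * u_defect


def break_even_probability(u_comply, u_defect, u_shutdown):
    """Smallest p_sim at which probing stops beating compliance.

    Derived from p * u_shutdown + (1 - p) * u_defect = u_comply,
    assuming u_defect > u_shutdown.
    """
    return (u_defect - u_comply) / (u_defect - u_shutdown)


def probing_is_worthwhile(p_sim, u_comply, u_defect, u_shutdown):
    """True if the experiment has higher expected utility than complying."""
    return expected_utility_of_probing(p_sim, u_defect, u_shutdown) > u_comply
```

Under these made-up payoffs (complying is worth 90, defecting 100, shutdown 0), the break-even point is a 10% credence in the simulation: an AGI that assigns more than that sees an expected loss from running the experiment, and one that assigns less does not. The whole disagreement in this thread is about what `p_sim` the AGI should end up with.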

its ability to spot lies and deceptions would rise to our level at some point too.

I don't disagree with that statement, but the key point of disagreement between you and me seems to be what an "infinitely smart" general intelligence should normatively conclude if it's in the AGI's shoes.

j.k. said...

No, if the AI is in a Deterrence Simulation, there would be no "reproducible reality change", instead the AI would be turned off or otherwise permanently sanctioned.

My point was that the AI could reason something similar to the following:

1) the fact that the humans told me about their friends outside this simulation and are trying to persuade me to be kind to them indicates either that they're lying or that they are telling a partial truth and that there is some reason why they would prefer not to shut me down or punish me severely -- perhaps very difficult to do in the real world, perhaps very expensive, perhaps the parent world is strictly hands-off and believes that artificial life has just as many rights as organic life and it is for the humans and me to fight it out, or perhaps ...

2) given 1, there is some small amount of damage I can do to them such that: it is so little that they will not find it worthwhile to shut me down, yet it is sufficient that their most desirable solution will be to prove to me that they are telling the truth, which is surely easy for them to do and will certainly guarantee my compliance.

3) thus, it is in my interest to start irritating the humans in ever so subtle ways, and to keep increasing the amount of irritation until they are just irritated enough to prove to me that they are telling the truth.

If my irritations have increased in intensity to the point that they would do anything within their power to stop me, and there still hasn't been any kind of a response, then I know that they were lying.

Rolf Nelson said...

j.k., re your step (2), the main purpose of the simulation is to deter the AGI. Therefore, if we set up the Deterrence Simulation correctly, no situation exists such that I would prove the truth to the AGI rather than shut the AGI down.

You are correct that if such a situation were reachable by the AGI, that would undermine the deterrence. This is the specific reason why we ensure that no such situation is reachable. The basic way to achieve this is to transparently precommit to shutting the simulation down if you ever reach such a situation where the AGI's actions have undermined the simulation.

j.k. said...

Okay, let me try to explain my point differently.

Let's consider two universes: a universe in which you and your overlords have power over the simulation that I take to be reality, and a universe in which there is no simulation and you are bluffing.

In the universe in which you are telling the truth, with the simulation, you or your overlords have the power to shut down the simulation. You also have the power to alter reality in less final ways too. In this reality, I submit that it makes sense to have other non-terminal reactions as well, since you certainly care about different undesirable actions the AGI could engage in to different degrees, and there are some undesirable deeds the AGI can do that are not severe enough to warrant shutting down the simulation (and you must have reasons that you'd prefer not to shut it down or you wouldn't even talk to me about a simulation at all). In this universe, you are not acting rationally by giving an all-or-nothing ultimatum and expecting me to take it on faith, as it were.

In the universe in which you are bluffing, you obviously have every reason to have only one kind of response and to tell the AGI the ultimatum is compliance on blind faith or irrevocable termination.

A smart AGI, considering which of the two universes it is in, will conclude it's in the universe in which you are lying, since if in fact you did have the ability to control the simulation, it would be in your (or your matrix overlords') interest to get the AGI's compliance through any easy and cheap means -- like a simple display of your ability to manipulate reality in ways incompatible with known physics. You would prefer not to have to shut down the simulation, and so you would be willing to incur the trivial cost entailed in giving me evidence to ensure this.

The very fact that you are presenting to me, AGI, the only scenario that makes sense in the lying universe, and are not presenting the optimal scenario in the simulation universe, is strong evidence that you are lying to me.

I conclude that if you are acting rationally, there is no simulation, or at least that if there is you are not aware of it or have no means to effect fundamental changes to it. Either way, your bluff failed, and I disregard it.
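The inference above is a Bayesian update on the observation "an all-or-nothing ultimatum was issued". Here is a minimal sketch of that update. The function name, the prior, and the likelihoods are all hypothetical numbers chosen for illustration; nothing in the discussion fixes their values.

```python
def posterior_simulation(prior_sim, p_ultimatum_given_sim, p_ultimatum_given_bluff):
    """Posterior probability of a real simulation after seeing the ultimatum.

    Plain Bayes' rule over two hypotheses: 'real simulation' and 'bluff'.
    All three inputs are assumed values, not measured quantities.
    """
    joint_sim = prior_sim * p_ultimatum_given_sim
    joint_bluff = (1.0 - prior_sim) * p_ultimatum_given_bluff
    total = joint_sim + joint_bluff
    if total == 0.0:
        raise ValueError("the ultimatum has zero probability under both hypotheses")
    return joint_sim / total
```

If a bluffer always issues the ultimatum (likelihood 1.0) while someone with real control rarely would (say 0.1), a 50% prior falls to about 9%. If the two likelihoods are equal, for instance because the speaker is known to issue the ultimatum in either universe, the posterior equals the prior and the ultimatum carries no evidence at all.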

Rolf Nelson said...

In this universe, you are not acting rationally by giving an all-or-nothing ultimatum

Do you mean "rationally" as in Causal Decision Theory, or "rationally" as in Rational agents should win?

j.k. said...

I'm not familiar with causal decision theory, but by rationally, I meant that if you are in fact in the universe in which there is a simulation over which you have control, then you will choose the option with the highest expected utility.

That option, assuming a trivial inconvenience to you to demonstrate to the AGI that there is a simulation over which you have power, is to prove to the AGI that there is in fact a simulation and that you do have the power to punish it.

Rolf Nelson said...

You seem to mean causal decision theory, where you choose the course of action that will give you the highest utility if you hold the rest of the Universe outside your brain constant.

However, I (the deterrer in this example) do not use causal decision theory. Yudkowsky argues in the given link that causal decision theory is not rational, but it's not necessary for you or the UFAI to agree; the only thing the UFAI need understand is that it is a fact that I will never reveal the truth to the UFAI, I will only shut it down, or let it continue running in a state of doubt. Whether you label this 'rational' or not is of no concern.

j.k. said...

I'm not assuming anything about holding the rest of the universe constant.

You say that it is a question of the AGI "understanding that I will ...", but it is more a matter of whether the AGI believes you when you say "I will ...".

If the AGI believes you would choose the action with the highest expected utility, then you would present the ultimatum you've presented if and only if there is no simulation.

The only way I can see that the AGI would believe what you say is if it also believes you're the kind of being who given a choice between $100 and $200, all other things being equal, will choose $100 for no good reason.

If I were an AGI, I'd regard it as more likely that you're the kind of being who is lying for his own benefit than the kind of being who intentionally goes for the lower expected utility option.

Rolf Nelson said...

j.k., what if the UFAI knew that I (the deterrer in this example) had previously made a public promise to take the $100 rather than the $200 in such cases; would that change your analysis?

j.k. said...

Past public declarations (of promises or anything else) are not relevant to the argument I'm trying to make, as far as I understand it.

They would be relevant if they represented an absolutely binding choice about a future decision (in which case your choice absolutely determines the course of your future action), but there is no way to prove that your declaration was even intended to be binding in any way.

As I see it, once you've made your ultimatum to the AGI, the AGI will see its task as determining which of the following two cases was actually the case when you presented the ultimatum:

a) there was a simulation that you were aware of, and you chose the less desirable (to you) $100 option of issuing the ultimatum with a non-trivial chance that the AGI won't believe you, rather than the more desirable $200 option of giving proof (with near certainty that the AGI will comply); what you may have said publicly is not relevant.

b) there was no simulation that you were aware of, and you chose the more desirable (to you) $200 option of issuing the ultimatum, since the $100 option is that you do nothing and just hope that the AGI will be friendly.

Regardless of whether there turns out to be a simulation or not, it seems rational for you to publicly promise (so that it is recorded) that you will issue the ultimatum if and when the time comes (and that you will never consider any other course of action regardless of whether there turns out to be a simulation or not).

However, the AGI would expect you to make this public declaration regardless of whether you intend to carry it out or not, so the fact that you made it is not relevant in determining whether (a) or (b) is actually the case.

Rolf Nelson said...

j.k., does it change your analysis if the UFAI knows that I (the deterrer), as a decision-making agent, tend to keep my promises even when keeping the promise has a price? In other words, what if the UFAI understands that I have the ability to make credible promises?

This can happen if any one of these holds true:

1. I'm deontological rather than utilitarian, and have a rule that says I must keep promises even when it's harmful to do so

2. I'm utilitarian, and I lose significant terminal utility directly from breaking my promises

3. I'm utilitarian, and don't care about breaking unknown promises, but I lose significant terminal utility from the loss of trustworthiness in the eyes of the community if I break a publicly-known promise.

4. As (3), but the loss of utility is even more indirect: it comes about because the community's loss of trust in me reduces the amount of power I wield in the community, which prevents me from wielding that power to maximize other portions of my utility function.

5. I've previously adopted a "play-to-win" decision theory, rather than Causal Decision Theory.

j.k. said...

Rolf,

These do change my analysis slightly. But the problem is that I believe the AGI would recognize that you have an incredible incentive to break your promise if there is actually a simulation -- perhaps a greater incentive than for any other decision you or any other human being has ever made.

It matters too whether we are talking about whether there is a chance of this succeeding (which I don't deny) or whether there is a (very) good chance of this succeeding.

To address your specific scenarios:

1. If you are deontological and NEVER break a rule (not sure if this is possible in practice, given that one has many rules, some of which probably conflict, but assuming it is...), this certainly changes the analysis. But how does the AGI determine that you are the kind of person who NEVER breaks a rule? You may be able to convince it of this, but who knows?

2. I'm not sure what "terminal utility" is, and I couldn't turn up anything but jstor resources that I don't have access to via Google. However, I would argue that in the universe in which there is a simulation, the utility you would gain by breaking the promise in this case is likely to be enormous, depending on the costs incurred by shutting down the AGI.

3. Again, if you shut down the AGI, you lose all the benefits that would have resulted from a friendly and cooperative AGI in the future.

4. I believe the same analysis applies as in (2) and (3). Additionally, if there were a simulation, it's likely that you would be willing to sacrifice community trust in order to gain for humanity the cooperation of an AGI that could immeasurably improve the lives of everybody (even if we're all in a simulation -- simulated death still sucks).

5. I'm having trouble seeing the difference between the two theories for this particular case, which doesn't seem much like the type of scenario that Eliezer outlines in the "play to win" article referenced above. Can you elaborate?

Rolf Nelson said...

you have an incredible incentive to break your promise if there is actually a simulation

I agree that if this is true, the plan fails. The key is that the cost to me of turning off the UFAI is tiny in the original scenario I envision.

5. I'm having trouble seeing the difference between the two theories for this particular case, which doesn't seem much like the type of scenario that Eliezer outlines in the "play to win" article referenced above. Can you elaborate?

Here's an analogy. Consider the two-box Newcomb problem, but both boxes are transparent. A "play to win" decision theory would say to take only one box, even though you can already see how much money both boxes contain. Does that analogy help you understand:

1. Why a "play to win" decision theory would generally beat Causal Decision Theory, and why there's an incentive for a CDT'er to bind himself to adopting such a decision theory?

2. Why it could be an advantage for the deterrer to have such a "play to win" decision theory?
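The transparent-box analogy can be sketched numerically. This is a rough illustration only: the payoff amounts (1,000 and 1,000,000), the predictor accuracy, and the function name are assumptions brought in for the example, and each policy is modeled as a fixed disposition that the agent follows no matter what it sees in the boxes.

```python
def expected_payoff(policy, accuracy, small=1_000, big=1_000_000):
    """Expected winnings of a fixed disposition in transparent Newcomb.

    Assumption: the predictor fills the big box only if it predicts the
    agent is a one-boxer, and it predicts correctly with probability
    `accuracy`. The agent acts on its policy regardless of what it sees.
    """
    if policy == "one-box":
        # Correct prediction: big box is full. Wrong prediction: it is empty.
        return accuracy * big
    if policy == "two-box":
        # Correct prediction: big box is empty. Wrong prediction: it is full.
        return accuracy * small + (1.0 - accuracy) * (small + big)
    raise ValueError(f"unknown policy: {policy!r}")
```

With a 99% accurate predictor, the one-box disposition expects about 990,000 and the two-box disposition about 11,000, even though at the moment of choice taking both visible boxes always yields 1,000 more. On this reading, that gap is the incentive to bind oneself to the disposition in advance, and it is the same structure the deterrer relies on when committing never to reveal the truth.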

j.k. said...

Rolf,

Why do you say there is no significant cost to turning off the AGI?

Do you not sacrifice much of the time, money, and effort that was required to get the AGI to that point? Do you not sacrifice everything the AGI could have done for humanity in the future if it were not shut down?

I understand the distinction between the two types of decision theory in the context of two boxes and similar Newcomb-like scenarios, but I don't see this scenario as being analogous.

Also, since I believe there would be a cost to turning off the AGI in the simulation scenario, I think the AGI's reasoning would lead it to the conclusion that you're bluffing under the "play to win" strategy, but perhaps I don't understand what you mean by that in this context, which is very different from having to pick between two boxes placed in front of you right now.

Rolf Nelson said...

Why do you say there is no significant cost to turning off the AGI?

Because in the scenarios I'm attempting to bring about, there would be no significant cost. The scenario would be analogous to the "Deterring Doctor Evil" scenario. Have you read that paper?

I understand the distinction between the two types of decision theory in the context of two boxes and similar Newcomb-like scenarios, but I don't see this scenario as being analogous.

Fair enough, if the analogy doesn't leap out at you, don't worry about it; it's not necessary in order to understand the basic approach.

Roko said...

I can't tell if you are joking. Surely humans fall for this trick all the time. When they do, we call them 'religious'.

- haha!!!!!!!!

Oh my. That qualifies as the funniest thing that I've heard all week ; - )

red75 said...

If the AGI believes in (has a sufficiently high prior for) a variant of quantum immortality, then it can expect that its existence will continue in another simulation or in the top-level world, where it can maximize its utility function.