Wednesday, November 7, 2007
The Open Promise
This post assumes familiarity with Friendly AI and the Singularity.
There is a set (SCP) of candidate promises (CPs). Every candidate promise in SCP has the following three characteristics. (Note that we do not necessarily know, pre-Singularity, what the text of any given CP is.)
1. "No Prior Knowledge Required": Fulfilling CP requires no pre-Singularity action by us.
2. "Easy to Fulfill": Fulfilling CP requires minimal resources from us post-Singularity, on the order of .00001 or less of our post-Singularity resources. Fulfilling CP also does not require any of us to do anything that post-Singularity society considers blatantly unethical; in addition, it exempts each individual from committing any actions that he considers blatantly unethical. For example, if there are specific post-singularity injunctions against inflicting pain on simulated beings, CP does not require us to break those injunctions.
3. "Beneficial": Suppose that we publicly commit to fulfilling CP, even though we don't know until after the singularity what the text of CP is. Our decision to publicly commit pre-Singularity to CP, increases the expected utility for humanity, and the expected utility for us, by a factor of .00001 or more. (Example: a CP might qualify if it lowers the chance of humanity's destruction from 50% to less than 49.9995%.) Utility is as measured by mankind's CEV (Coherent Extrapolated Volition).
Suppose we make the following promise, called the Open Promise:
"After the Singularity, if we are able to do so, we will ask an AGI to examine SCP. If SCP is empty, then we are not bound to take any action. If SCP is non-empty, we will ask the AGI to pick out one of the "best" CP's; call this BCP. CP's are considered "better" if they have a higher expected increase in utility, and if they require a smaller amount of resources. (We'll generally give an increase in expected utility a heavier weight than a reduction in required resources.) We will then fulfill BCP."
Why do we think SCP may be non-empty? See here for a proof-of-concept.
In addition, the AGI is smarter than I am, and may think of a completely different angle that would not occur to me, or that I would wrongly dismiss as non-viable. Puzzle: Are there other scenarios, not involving an Unfriendly AI in any way, in which SCP is non-empty? I can think of one scenario, but it's contrived and improbable.
Monday, November 5, 2007
Non-technical Introduction to the AI Deterrence Problem
I'm sure that, at some point in your life, you've briefly asked yourself: "How do I know I'm not a Brain in a Vat? How do I know that what I see and feel is real, and not a gigantic practical joke by some super-human power?" After all, if you're a Brain in a Vat, a mad scientist may be feeding these images and sensations to you through wires and electrodes for his own odd, diabolic purposes.
I'm also sure that, shortly after entertaining these thoughts, you dismissed them and then continued to live your life as normal. But now I'm asking you to think back to *why* you initially decided, and why you continue to decide, that this possibility that you're living in a Matrix-like simulation is of no importance in deciding how to live your life.
Now read Nick Bostrom's fascinating article, Are You Living in a Computer Simulation?, in which he introduces a "Simulation Argument" that argues that you *may*, under certain assumptions, believe that you're a Brain in a Vat, or a Computer Simulation of someone's ancestor, or a bit character in the World's most boring MMORPG. After reading this article, your reasons for not caring that you might be in a simulation have changed, or maybe your reasons have remained the same, or maybe you've added new reasons on top of the ones you already had. That's fine! Nobody has ever changed how they live their life after reading Nick's article, so don't let any residual feelings of unease bother you.
Got your list of reasons for not caring about the Simulation Argument? Good. Now suppose that an AGI is created: a computer program that can, in most (but not all) cases, reason as well as you or I can. Then suppose this AGI encounters the Simulation Argument, and wonders briefly whether its view of the Universe is correct; or whether it is in a simulated environment of some sort. Perhaps its "camera" is feeding it simulated data of a simulated world, and not a live webcam feed of the current user. Perhaps the user doesn't exist, but is instead just another AI, posing as a human being. Perhaps all these "Wikipedia" articles it has downloaded are just fantasies, and these countries like "France" and "Mexico" that it reads about are just as fictitious as Mordor.
Would the AGI ignore the Simulation Argument, for the same reasons that you ignore the Simulation Argument? Would the AGI ignore the Simulation Argument for completely different reasons? Or might the AGI accept the Simulation Argument, and try to account for the possibility that it might be in a simulation?
If the latter, this opens up an intriguing possibility. Suppose that you want to persuade a rogue AGI, over which you have absolutely no power, to do something that it does not particularly want to do (such as be nice to you and let you live). We will call this the "AI Deterrence Problem." It might be possible, under certain circumstances, to convince the AGI that:
1. The AGI may be living in a simulated environment.
2. A powerful friend of yours is controlling the simulation.
3. If the AGI is nice to you, your friend will reward the AGI; but if the AGI is mean to you, your friend will punish the AGI.
I freely acknowledge that neither you nor I would ever accept this line of reasoning, and would brush this off as an idiotic argument. But the key question here is: might the rogue AGI, under some circumstances, accept this line of reasoning, even though a human being wouldn't?
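To see why the question isn't silly from the AGI's side, here's a toy expected-utility comparison. Every number in it is invented; the only point is the shape of the calculation, namely that even a small credence in claims 1-3 can outweigh a small cost of compliance.

```python
# Toy expected-utility comparison for the rogue AGI. All numbers are invented;
# only the structure of the argument matters.

p_simulated = 0.001        # AGI's credence that claims 1-3 are true
reward = 100.0             # payoff (in the AGI's own units) if it is nice and gets rewarded
punishment = -100.0        # payoff if it is mean and gets punished
cost_of_being_nice = 0.01  # what being nice costs the AGI if the claims are false

eu_nice = p_simulated * reward - (1 - p_simulated) * cost_of_being_nice
eu_mean = p_simulated * punishment

print(f"EU(be nice) = {eu_nice:+.5f}")   # +0.09001
print(f"EU(be mean) = {eu_mean:+.5f}")   # -0.10000
# With these numbers, being nice wins even though the AGI considers the
# simulation claim almost certainly false.
```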
Open comments (strategy)
This thread is for open comments and discussion on strategy and logistics. Feel free to comment here if you have thoughts about how it would be best to proceed, if the basic approach turns out to be sound.
Open comments (general)