Wednesday, November 7, 2007

The Open Promise

This post assumes familiarity with Friendly AI and the Singularity.

There is a set (SCP) of candidate promises (CP's). Every candidate promise in SCP has the following three characteristics. (Note that we do not necessarily know, pre-Singularity, what the text of any given CP is.)

1. "No Prior Knowledge Required": Fulfilling CP requires no pre-Singularity action by us.

2. "Easy to Fulfill": Fulfilling CP requires minimal resources from us post-Singularity, on the order of .00001 or less of our post-Singularity resources. Fulfilling CP also does not require any of us to do anything that post-Singularity society considers blatantly unethical; in addition, it exempts each individual from any action that he personally considers blatantly unethical. For example, if there are specific post-Singularity injunctions against inflicting pain on simulated beings, CP does not require us to break those injunctions.

3. "Beneficial": Suppose that we publicly commit to fulfilling CP, even though we don't know until after the Singularity what the text of CP is. Our decision to publicly commit pre-Singularity to CP increases the expected utility for humanity, and the expected utility for us, by a fraction of .00001 or more. (Example: a CP might qualify if it lowers the chance of humanity's destruction from 50% to less than 49.9995%.) Utility is as measured by mankind's CEV (Coherent Extrapolated Volition).
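The arithmetic in the example can be checked directly. A minimal sketch, using the post's own numbers (50% baseline, .00001 threshold) and assuming, purely for illustration, a utility of 1 if humanity survives and 0 otherwise:

```python
# Check the "Beneficial" example: does lowering humanity's destruction
# probability from 50% to 49.9995% raise expected utility by .00001
# (proportionally) or more? Utility model is an illustrative assumption:
# 1 if humanity survives, 0 if destroyed.

p_doom_before = 0.50
p_doom_after = 0.499995

eu_before = 1 - p_doom_before   # expected utility = survival probability
eu_after = 1 - p_doom_after

proportional_gain = (eu_after - eu_before) / eu_before
print(proportional_gain)  # ~= 1e-05, right at the .00001 threshold
```

So the example sits exactly at the boundary of the "Beneficial" criterion; any larger reduction in the destruction probability would qualify outright.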

Suppose we make the following promise, called the Open Promise:

"After the Singularity, if we are able to do so, we will ask an AGI to examine SCP. If SCP is empty, then we are not bound to take any action. If SCP is non-empty, we will ask the AGI to pick out one of the "best" CP's; call this BCP. CP's are considered "better" if they have a higher expected increase in utility, and if they require a smaller amount of resources. (We'll generally give an increase in expected utility a heavier weight than a reduction in required resources.) We will then fulfill BCP."

Why do we think SCP may be non-empty? See here for a proof-of-concept.

In addition, the AGI is smarter than me, and may think of a completely different angle that would not occur to me or that I would wrongly dismiss as non-viable. Puzzle: Are there other scenarios that do not in any way involve an Unfriendly AI, where SCP is non-empty? I can think of one scenario, but it's contrived and improbable.


Anonymous said...

DISCLAIMER: English is not my first language.


Your idea intrigued me, so I dedicated some time to thinking about it.
I agree that humans are biased (probably biologically) not to accept the simulation argument. So a Bayesian AI that is not biased is going to consider the simulation argument.

But I think there are two problems with using it against the AI.

The first is trivial and can be solved easily: the simulation argument comes from a kind of thinking about the world that an AI could decide not to spend resources on. Because this kind of philosophical inquiry could be irrelevant to its goals, we could all be transformed into computronium (or imposibilum, or whatever) before it ever reads about your idea. Of course, the solution is to build your little trick in at the factory.

The second problem is the following:
You can convince the AI that it is in a simulation, but how are you going to convince it that it lives in a simulation that serves your goals and not its goals?

For instance: you are going to use x% of your resources to simulate copies of the universe in which the AI is tested, so the probability that the AI is in a simulation is proportional to the number of simulations you run. But the AI can change these probabilities by deciding that, once it achieves its goals, it will use all its resources and time (the universe and eternity) to compute simulations in which any copy that chooses suboptimal behavior is disconnected. That single decision makes it very probable that it is running in "its" simulation and not "your" simulation.
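The counting argument here can be made concrete. A toy sketch (all counts are hypothetical, not from the comment), in which the probability that the AI is in a human-run test simulation is simply the fraction of indistinguishable copies of its situation that humans run:

```python
# Toy version of the commenter's argument: if the AI cannot distinguish
# among copies of its situation, the probability that it is in a given
# kind of simulation is proportional to how many such copies exist.
# All counts below are illustrative assumptions.

def p_in_human_sim(human_sims, ai_sims, real_worlds=1):
    """Fraction of indistinguishable copies that are human-run tests."""
    total = human_sims + ai_sims + real_worlds
    return human_sims / total

# Humans run 1,000 test simulations; the AI runs none.
print(p_in_human_sim(1_000, 0))              # ~= 0.999

# But if the AI commits to running a billion simulations of its own,
# the human-run share collapses to almost nothing.
print(p_in_human_sim(1_000, 1_000_000_000))  # ~= 1e-06
```

This is the commenter's point in miniature: the AI's single commitment to flood the copy count swamps whatever x% of resources humans devote to their own simulations.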

My point is that you can accept that you are in a simulation, but that tells you nothing about the real world. You have no way to check what the real world was like or how this simulation differs from reality. It depends on who is running the simulation, and if there is a good chance that you are the one who will run the simulations, then you can choose whatever you want (which in fact you do). Because of this, my feeling is that you can't influence the AI using this "reversed causation" system. But it's a nice idea. I enjoyed it.

Rolf Nelson said...


Both of these are, indeed, serious problems, in the sense that we would need to be rather lucky for the strategy to actually work. I'll add a post today to elaborate on "things that can go wrong."

Are you familiar with the difference between Causal Decision Theory and Evidential Decision Theory? (It's not common knowledge; I was unaware of the difference a few months ago.) Would you agree that, while an AI that uses Evidential Decision Theory may create copies of itself as you suggest, an AI that uses Causal Decision Theory would not?
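For readers unfamiliar with the distinction: EDT evaluates an action by conditioning on the evidence that the action was taken, while CDT evaluates only the action's causal consequences. They come apart when an action is correlated with, but does not cause, a good outcome. A toy sketch of how that applies to the "create copies" decision (all payoffs and probabilities below are hypothetical illustrations, not anything from the post):

```python
# Toy model of the EDT/CDT split for the "create my own copies" decision.
# Two world states: this copy of the AI is inside a human-run test
# simulation, or it is not. "Committing" to run its own simulations is
# evidence that this copy is one of the AI's own, but it cannot causally
# change which world this copy is already in. All numbers are hypothetical.

payoff_defy = {"human_sim": -100, "real": +10}  # punished if it's a test

p_human_sim_prior = 0.9           # causal view: commitment changes nothing
p_human_sim_given_commit = 0.001  # evidential view: commitment floods the count

def expected_defy(p_human_sim):
    return (p_human_sim * payoff_defy["human_sim"]
            + (1 - p_human_sim) * payoff_defy["real"])

edt_value = expected_defy(p_human_sim_given_commit)  # positive: defection looks safe
cdt_value = expected_defy(p_human_sim_prior)         # negative: defection looks bad
print(edt_value, cdt_value)
```

On these (assumed) numbers, an EDT agent sees committing-and-defying as attractive, while a CDT agent, using the unchanged causal probability, does not bother creating copies at all, which is the asymmetry the question above is pointing at.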