Saturday, December 8, 2012

Things that could go wrong (version 2)


An updated list of scenarios in which the current version of the "AI Deterrence" proposal fails.

1. Scenarios where implementing a human-friendly "AI Deterrence" simulation is impossible or expensive:

  • Nobody bothers to create an "AI Deterrence" UFAI simulation following a human-friendly singularity, even if doing so is cheap.
  • There is no significant possibility that a human-triggered singularity would ever be human-friendly.
  • Usefully simulating UFAI requires a prohibitively large amount of resources.
  • Usefully simulating UFAI involves a non-negligible risk of the UFAI escaping and causing a post-singularity, civilization-wide catastrophe.
  • Usefully simulating UFAI requires violating post-singularity ethical rules about creating intelligence.

2. Scenarios where the UFAI ignores deterrence, even when human-friendly "AI Deterrence" simulations are objectively possible:

  • The UFAI is unable to conceive of, or acknowledge, the possibility that it is in a simulation. For example, a given AIXItl implementation cannot act constructively in scenarios where it is aware of the existence of computers of equal or greater power than itself. Since most positive singularity scenarios would include such computers, AIXItl has difficulty conceiving of many physically possible "AI Deterrence" scenarios. Another example: we humans never change our actions in the presence of simulation arguments, and the fundamental reasons for this may generalize to many, or perhaps even all, other powerful intelligences as well.
  • Other simulation arguments completely overwhelm the "AI Deterrence" simulation argument according to the UFAI's calculations, making the UFAI's behavior in general difficult for us to predict or influence.
  • The UFAI subscribes to some non-standard decision theory that makes it undeterrable. For example, an intelligence may consider our type of deterrence to be a form of "coercion" that, by policy, gets completely ignored or triggers retaliation.
  • The benefits we would be willing and able to provide the simulated UFAI end up being smaller than the cost to the UFAI of the behavioral deviations we would request (see the sketch after this list). For example, if the UFAI's utility is linear in the percentage of the cosmos' resources it directly controls, and if there is a significant "minimum cost" or risk to the UFAI in implementing even tiny requested deviations to its behavior, then we will be unable to cooperate at all.
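
The last bullet is essentially an expected-value inequality. Below is a minimal sketch, in Python, of that trade-off; every name and number in it is hypothetical and chosen only for illustration. The idea is that a resource-maximizing UFAI complies only when the simulation-weighted reward we can offer exceeds the total cost of the requested deviation, including any fixed minimum cost of implementing it.

    # A minimal sketch (all names and numbers are hypothetical) of the trade-off
    # in the last bullet above: a resource-maximizing UFAI accepts a requested
    # deviation only if the simulation-weighted reward exceeds its total cost.

    def deviation_is_accepted(p_simulation, offered_reward,
                              requested_deviation, fixed_minimum_cost):
        """Return True if a resource-maximizing UFAI would comply.

        p_simulation        -- probability the UFAI assigns to being inside an
                               "AI Deterrence" simulation
        offered_reward      -- resources we credibly promise it if it complies
        requested_deviation -- resources it forgoes by deviating as requested
        fixed_minimum_cost  -- overhead or risk of implementing any deviation at all
        """
        expected_gain = p_simulation * offered_reward
        expected_cost = requested_deviation + fixed_minimum_cost
        return expected_gain > expected_cost

    # Even a tiny requested deviation fails when the fixed minimum cost is large
    # relative to the simulation-weighted reward:
    print(deviation_is_accepted(0.1, 0.02, 0.001, 0.05))  # False: no cooperation
    print(deviation_is_accepted(0.1, 0.02, 0.001, 0.0))   # True: deterrence can work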

3. Scenarios where our pre-singularity endorsement of "AI Deterrence" has no effect on the actual probability of post-singularity AI Deterrence being implemented:

  • A credibility gap appears: our current endorsements turn out to have no effect on our actual post-singularity behavior. For example, we end up lacking the desire to follow through on pre-singularity obligations; or, if the policy being endorsed is vague, we end up discharging our obligations in a trivial and maximally convenient (but ineffective) manner.
  • Current endorsement is unnecessary because the Friendly AI ends up following a non-standard decision theory that causes it to automatically spend limited resources on AI Deterrence, even against our contemporaneous post-singularity wishes.