Ruminations on matters of some importance. Also programming and AI.

Thursday, November 10, 2011

Guerilla - my attempt to build a strong AI

Having given the topic of artificial general intelligence (or AGI) a fair amount of thought over the last decade, a plausible path forward, based on something like an encyclopedia for algorithms, has slowly congealed in my head. I need to write it down before I start coding. Preferably in the form of a blog post, to get feedback. This is that post.

Background

When I write about strong AI or AGI, I mean an algorithm with general problem-solving skills. Not necessarily a mind inhabiting a robot, running around talking to people and passing the Turing test (though eventually a successful AGI could be taken in that direction), but rather something that can be applied to a wide variety of problems. For example: playing chess and poker, picking stocks, solving puzzles, proving math theorems, analyzing and writing computer code, speech and image recognition, improving itself through learning and self-modification, etc.


There is an alluring approach to AGI where you begin with a "simple" seed program that learns, self-improves and eventually evolves to human intelligence and beyond. The problem, I think, is that in practice it has taken lots of people lots of time to invent all those algorithms that are useful in general problem solving. Humans are pretty good at general problem solving - certainly much better than the best software/hardware combination we have today. Constructing algorithms to solve specific problems is in itself a kind of problem solving, and we humans have invented many different algorithms for a wide variety of purposes. One might suspect that the computational depth of inventing/discovering algorithms is very large. So unless the seed program is actually very advanced - more like a full-grown Sequoia than a seed - it might take too much time for it to invent all those algorithms and heuristics that we, as a civilization, already have.


Topics that are potentially useful to understand include:


  • Statistics (Bayes' rule, distributions, Markov chains, running an experiment, etc).
  • Algorithms for optimizing parameters (genetic algorithms, simulated annealing, steepest descent, linear programming, random testing and purely analytical methods). In some situations it could take days or more to try a single parameter configuration. In other situations, evaluating the fitness of a parameter configuration is just a couple of CPU instructions. Approaching these different tasks requires a variety of methods.
  • Logic
  • Basic mathematics (calculus, algebra, geometry, etc)
  • Code analysis (lambda calculus, etc)
  • Formal proof methods (knowledge of the methods listed here: http://en.wikipedia.org/wiki/Mathematical_proof) and formal reasoning
  • Tree and graph searching (depth-first, breadth-first, A*, beam, minimax, alpha-beta, Dijkstra)
  • Bayesian belief networks
  • Pattern recognition
  • Compression
  • Monte Carlo method
  • Clustering and classification
  • Fourier transforms, wavelets
  • Function approximation (analytical or with neural networks or genetic programming)
  • Inverting functions (in other words, given a program function and its output, tell me what the input was - this turns out to be a very general way of posing questions; see the sketch after this list)

...and of course many more.
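To make the last item concrete, here is a rough Python sketch of function inversion posed as a question. The names and the brute-force search are only an illustration - a real system would of course have to search far more cleverly:

    # Function inversion: "which input makes f produce this output?"
    # Hypothetical illustration, not code from the Guerilla repository.
    def invert(f, wanted_output, candidate_inputs):
        """Return some x such that f(x) == wanted_output, or None."""
        for x in candidate_inputs:
            if f(x) == wanted_output:
                return x
        return None

    # "Which number squared gives 1369?"
    print(invert(lambda x: x * x, 1369, range(100)))   # -> 37

Posing a question then amounts to writing the forward function; the hard part is inverting it with something smarter than exhaustive search.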

I consider statistics and parameter optimization to be the most important areas for intelligence, since you need them to learn. Pattern recognition (perhaps implemented with statistics and optimization) and various forms of tree searching are also vital.

An encyclopedia of algorithms

My approach is based on implementing an encyclopedia of useful algorithms that:

  1. Know to which tasks they can be applied
  2. Can give a rough estimate - initially often a ridiculously rough one - of the probability that they will solve the task within a certain time or, in the case of an open-ended task such as optimization, of how well the task will be solved after a certain time.
  3. Can continuously update the estimate as the task is solved

It is important to stress that it is not enough to just implement a library of algorithms that can work on the same data structures. You also need metadata describing when each algorithm can be used and its algorithmic complexity in time and memory. Over time, the system should automatically build up more knowledge of the algorithms, gradually improving the time and success estimates as well as its picture of which algorithms are suitable in which situations.
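To make the metadata idea a bit more concrete, here is a rough Python sketch of what a single encyclopedia entry might look like. The class, fields and numbers are placeholders, not a finished design:

    # One encyclopedia entry: an algorithm plus metadata about when it applies
    # and how long it tends to take. Purely illustrative names and defaults.
    class AlgorithmEntry:
        def __init__(self, name, applicable_to, run):
            self.name = name
            self.applicable_to = applicable_to   # set of task types it can attempt
            self.run = run                       # the actual implementation
            self.expected_seconds = 1.0          # pure guess until runs are recorded
            self.p_success = 0.5                 # ditto
            self.observed_runs = []              # (seconds, succeeded) pairs

        def can_attempt(self, task):
            return task["type"] in self.applicable_to

        def estimate(self, task):
            """Ridiculously rough at first; sharpened later from observed runs."""
            return {"expected_seconds": self.expected_seconds,
                    "p_success": self.p_success}

        def record(self, seconds, succeeded):
            self.observed_runs.append((seconds, succeeded))

The estimates start out as little more than guesses; the recorded runs are what later lets the system sharpen them (see Self improvement below).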

The algorithms should be broken up into as many natural subtasks as possible, so that when new algorithms are added to the system, they can try to solve these subtasks as well, thus creating new hybrid algorithms.

A task is basically a function call together with its arguments. An algorithm that can solve a task implements the corresponding function and a time/success estimator. Similar to function overloading in C++, the function header might state specializations (additional properties of the function arguments) that must be true for the algorithm to be a contender to solve it. It is important that the Scheduler (see below) immediately knows which algorithms are suitable for a certain task, so these additional argument properties must be immediately available. If a certain property requires work to find out - "is the list in the first argument sorted?" - and an algorithm still needs it, a new algorithm can be constructed that first checks whether the list is sorted and then either fails or re-poses the task with the new property set. This new algorithm would have a higher estimated running time and a lower estimated chance of success than the original.
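In sketch form, the "is the list sorted?" example could look something like this. For simplicity the wrapper calls the wrapped algorithm directly instead of re-posing the task to the pool, and the task layout is a placeholder:

    import bisect

    # Wrap an algorithm that needs its first argument sorted: check the
    # property, then either fail cheaply or run the original algorithm.
    def requires_sorted(algorithm):
        def wrapped(task):
            lst = task["args"][0]
            if all(a <= b for a, b in zip(lst, lst[1:])):
                task["properties"].add("first_arg_sorted")
                return algorithm(task)    # property now known to hold
            return None                   # fail: precondition does not hold
        return wrapped

    def binary_search(task):
        lst, target = task["args"]
        i = bisect.bisect_left(lst, target)
        return i if i < len(lst) and lst[i] == target else None

    checked_search = requires_sorted(binary_search)
    print(checked_search({"args": ([1, 3, 5, 9], 5), "properties": set()}))   # -> 2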

The Scheduler

When a task is added to the task pool, it always has a price attached to it. The Scheduler runs the algorithms that currently promise the best expected price per time unit. Algorithms that need subtasks solved have to assign a price to those too before adding them to the task pool. That price should reasonably reflect how much of the overall time the subtask is expected to take. If it turns out that a subtask consistently takes a smaller or larger fraction of the estimated total time, there should be algorithms that adjust the price for these subtasks and, correspondingly, the total time estimate (also, see Self improvement below).

Open-ended tasks where something should be optimized cannot have just one value attached to them. Instead they need to have a function from achieved performance to price, or at least a rough mapping from some performance values to price. This mapping stops the system from optimizing for too long on a relatively unimportant subsubsubtask somewhere.

Algorithms that can either fail or succeed on a task need a similar mapping, where they give probability of success as a function of time.
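Roughly, these two kinds of estimates could look something like the sketch below. The piecewise-linear price mapping and the saturating success curve are only illustrative shapes, not the ones the system will necessarily use:

    import math

    def price_for_performance(anchor_points, achieved):
        """Piecewise-linear mapping from achieved performance to price."""
        pts = sorted(anchor_points)                    # ordered by performance value
        if achieved <= pts[0][0]:
            return pts[0][1]
        if achieved >= pts[-1][0]:
            return pts[-1][1]
        for (p0, v0), (p1, v1) in zip(pts, pts[1:]):
            if p0 <= achieved <= p1:
                return v0 + (achieved - p0) / (p1 - p0) * (v1 - v0)

    # "Reaching 80% on this subtask is worth 10, reaching 95% is worth 50."
    anchors = [(0.0, 0.0), (0.80, 10.0), (0.95, 50.0)]
    print(price_for_performance(anchors, 0.90))        # somewhere between 10 and 50

    def p_success(seconds_spent, expected_seconds):
        """A crude guess: probability of success saturating toward 1 over time."""
        return 1.0 - math.exp(-seconds_spent / expected_seconds)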

One can also imagine that the algorithms could give a confidence interval or standard deviation to tell the Scheduler how sure they are of their estimates, but I am not quite sure how this should be used, so for now they won't.

For my first try, the Scheduler will use a simple heuristic. The algorithm that claims the best price/time ratio for any task currently in the task pool gets to run it. For one thread this will be optimal in some sense. It gets more complex when you have many algorithms running in parallel on multiple cores or even clusters. For example, you want to slightly punish two algorithms trying to complete the same task in parallel, since the first one to succeed makes the other's work moot. On the other hand, it sometimes makes sense to attack an important problem from several angles, so you don't want to forbid it entirely either.
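In sketch form, the single-threaded heuristic is just a maximization over (task, algorithm) pairs, reusing the hypothetical estimate() from the encyclopedia sketch above:

    # Pick the (task, algorithm) pair with the best expected price per second.
    # Assumes AlgorithmEntry-style objects from the earlier sketch; illustrative only.
    def pick_next(task_pool, algorithms):
        best, best_rate = None, 0.0
        for task in task_pool:
            for alg in algorithms:
                if not alg.can_attempt(task):
                    continue
                est = alg.estimate(task)
                rate = task["price"] * est["p_success"] / max(est["expected_seconds"], 1e-9)
                if rate > best_rate:
                    best, best_rate = (task, alg), rate
        return best    # None if no algorithm applies to any task in the pool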

In a later design, the Scheduler should be able to use the task pool to think about how it should schedule. Obviously this must not end in an infinite scheduling loop or general inefficiency, since normally the Scheduler must work very quickly.

The Scheduler's work, and indeed that of the whole system, will not be especially interesting when there are only a few algorithms implemented. The first interesting moment will be when new hybrid algorithms emerge, where subtasks are sometimes handled by unanticipated algorithms. I am not sure how many algorithms need to be implemented for the system to show interesting emergent behaviour. Probably more than ten, but less than a hundred, depending, of course, on which algorithms and on what you count as an individual algorithm.

Self improvement

From the above, you can see that the system will not be self-improving at first. However, by adding self-improvement tasks, it will start doing things like improving the time estimates of the algorithms, learning to what degree one algorithm's failure to solve a task should also reflect on the estimates of other algorithms, and learning which situations are suitable for which algorithms - for example, which algorithms perform well on the subtasks posted by a certain algorithm. It can also have an algorithm that constructs new lower-priced training tasks from real tasks, for example generalizations or specializations of a problem, just out of "curiosity".
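The very first such self-improvement task could be a small update rule that nudges an entry's estimates toward what was actually observed. The exponential moving average and the field names follow the earlier sketch and are just an illustration:

    # Blend an entry's current estimates with its recent observed runs.
    # Assumes the AlgorithmEntry fields from the earlier sketch.
    def improve_estimates(entry, alpha=0.2):
        if not entry.observed_runs:
            return
        recent = entry.observed_runs[-20:]                      # (seconds, succeeded)
        mean_time = sum(t for t, _ in recent) / len(recent)
        success_rate = sum(ok for _, ok in recent) / len(recent)
        entry.expected_seconds = (1 - alpha) * entry.expected_seconds + alpha * mean_time
        entry.p_success = (1 - alpha) * entry.p_success + alpha * success_rate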

Producing new/improved code and algorithms, either for self improvement or as the solution for a puzzle or some other task, is among the most advanced tasks the system can try. It will not be able to do much of interest in this area until it is really strong, but it could start out by trying simple modifications of existing algorithms or trying them on similar tasks, a bit like in genetic programming.

The system is also inherently self-improving through a sort of network effect: each algorithm added potentially makes the existing algorithms better.

What now?

When I have implemented the base system, I will start by applying the AGI to function inversion. Trivial stuff at first, of course, but I hope to eventually make it solve real puzzles like Towers of Hanoi by a combination of searching and deduction. Also, it would be fun to try some games and an NP-complete problem like 3-SAT.
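As a taste of what function inversion on a real puzzle could mean, here is a toy Python sketch that poses Towers of Hanoi as inverting a forward simulator: find a move sequence that maps the start state to the solved state. A plain breadth-first search stands in for whatever combination of searching and deduction the system ends up using:

    from collections import deque

    def apply_moves(moves, n_disks=3):
        """Forward simulator: play the moves from the start, or None if a move is illegal."""
        pegs = [list(range(n_disks, 0, -1)), [], []]
        for src, dst in moves:
            if not pegs[src] or (pegs[dst] and pegs[dst][-1] < pegs[src][-1]):
                return None
            pegs[dst].append(pegs[src].pop())
        return tuple(tuple(p) for p in pegs)

    def invert_to_solved(n_disks=3):
        """Find a move sequence whose image under apply_moves is the solved state."""
        goal = ((), (), tuple(range(n_disks, 0, -1)))
        all_moves = [(a, b) for a in range(3) for b in range(3) if a != b]
        queue = deque([[]])
        while queue:
            moves = queue.popleft()
            state = apply_moves(moves, n_disks)
            if state == goal:
                return moves
            if state is not None:
                queue.extend(moves + [m] for m in all_moves)
        return None

    print(invert_to_solved(3))    # a shortest (7-move) solution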


It would be beautiful if the algorithms were written in the same simplified, purely functional (and thus easier to analyze) LISP that I plan to write the problem definitions in. Alas, good AI needs to be fast, and a 100x slowdown just because the algorithms run in my own immature, poorly interpreted language instead of C is not so fun. However, a good JIT compiler is a very good test for an AGI: you continuously have to weigh optimization time against running what you have. If the AGI in some distant future JITs its own code, effectively running and optimizing itself, I will consider the entire project a grand success :).

I forget why I called the project Guerilla. It was probably terribly clever. Nevertheless, here is the link to the Github repository: https://github.com/gurgeh/Guerilla. It does not contain much yet.