Subject: Thinking as evolution on behaviours
To: pcp-discuss@listserv.lanl.gov

Hi Cybernetics people,

I am taking the liberty of sending, with this email, a brief description of my design ideas for an AI robot brain. My purpose is to ask all of you, if you can, to point me to existing work that overlaps with this, and above all to people and groups who might at this moment be working on these kinds of things. Thanks very much for any pointers you can give me.

-----

* Goal :
  ~~~~

To create a computer program (called ''Brain Program'') that makes a PC operate like the brain of an intelligent robot. The PC is connected to

  1. a set of robot arms (''Motors'') controlled by the Brain Program, and
  2. a set of Sensors that provide the Brain with data about its environment.

The assembly { Brain + Motors + Sensors } therefore acts like an intelligent robot.

* Further analysis of the goal :
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Brain Program, in my opinion, is the essential part of the intelligent robot. It should be constructed in such a way that it automatically learns to make use of its Sensor input data and learns to operate its Motors, even when, e.g., new Sensors or Motors are plugged into the robot in the course of its life. That is: the Brain Program compensates for the idiosyncrasies of the particular Sensors and Motors it is connected to.

I see the above problem in evolutionary terms: the Brain Program should control the robot ( = { Brain + Motors + Sensors } ) in such a way that the robot survives. The ''intelligence'' of the robot consists in the fact that it learns autonomously how to survive optimally. For the time being, I only consider the survival of the individual robot itself. My goal is to make a robot that learns selfishly useful behaviour, where ''useful'' is to be read as a synonym for behaviour that favours the survival of the robot.
* General design :
  ~~~~~~~~~~~~~~

I've created a minimally simple Brain Program that basically consists of a simple neural network. It is constructed as follows (source code and some documentation at http://www.rubinghscience.org/aiclub/toc.html):

The data maintained by the Brain Program is a list of items, each of which is (arbitrarily) called a ''Brain Cell''. The contents of each Brain Cell are as follows:

               Sensor values                Motor  Priority
                                            value    value
     s1  s2  s3  s4  s5             sN        m        p
    +---+---+---+---+---+--- ... --+---+   +----+    +---+
    | a | b | c | d | e |          | z |   | m1 |    | p |
    +---+---+---+---+---+--- ... --+---+   +----+    +---+

which means: if the Sensors are in states (a,b,c,d,e,...,z), then actuate Motor m1. This presupposes that the robot has N Sensors (labelled s1...sN) and M Motors (labelled m1...mM). The Sensor states are discrete, and might e.g. be binary values (e.g. 1 = light sensor receives light, 0 = it doesn't). Each Brain Cell must be unique.

The Brain Program operates in discrete, successive moves, each of which consists of:

  1. Receive the states of the Sensors.
  2. From all Brain Cells, choose the one whose Sensor values are most ''like'' the input Sensor states. The likelihood with which a given Brain Cell is selected *also* depends on its ''Priority'' value p: a higher p means a higher likelihood of being selected.
  3. Get the Motor value from the chosen Brain Cell, and send that Motor a signal that makes it go active for a (short) time.
  4. Get the pain/pleasure feedback signal that results from the physical action of the activated Motor, and change the Priority values of the Brain Cells on the basis of the value of that pain/pleasure signal. (More on this below.)
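The Brain Cell record and steps 1-3 of a move can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not the program at the URL above: the names BrainCell, likeness, select_cell and move are hypothetical, ''likeness'' is taken to be the number of Sensor positions that agree, and similarity and Priority are combined into a selection weight p * exp(beta * likeness), where beta is a hypothetical sharpness parameter. That combination is only one of many possible choices.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(eq=False)  # eq=False: cells are compared by identity
class BrainCell:
    """If the Sensors read `sensors`, actuate Motor `motor`."""
    sensors: tuple[int, ...]  # discrete Sensor states s1..sN, e.g. 0/1
    motor: int                # Motor index (m1..mM mapped to 0..M-1)
    priority: float = 1.0     # non-negative Priority value p


def likeness(cell: BrainCell, reading: tuple[int, ...]) -> int:
    """Hypothetical similarity measure: number of Sensor positions that agree."""
    return sum(a == b for a, b in zip(cell.sensors, reading))


def select_cell(cells: list[BrainCell], reading: tuple[int, ...],
                beta: float = 2.0, rng: random.Random | None = None) -> BrainCell:
    """Step 2: pick a cell with probability proportional to p * exp(beta * likeness).

    A large beta concentrates the choice on the most similar cells;
    among equally similar cells, a higher p is still more likely.
    """
    rng = rng or random.Random()
    scores = [likeness(c, reading) for c in cells]
    best = max(scores)  # subtract the best score to keep exp() from overflowing
    weights = [c.priority * math.exp(beta * (s - best)) for c, s in zip(cells, scores)]
    if sum(weights) == 0:  # all priorities are zero: fall back to a uniform choice
        return rng.choice(cells)
    return rng.choices(cells, weights=weights, k=1)[0]


def move(cells: list[BrainCell], reading: tuple[int, ...],
         actuate: Callable[[int], None], beta: float = 2.0,
         rng: random.Random | None = None) -> BrainCell:
    """Steps 1-3: take the Sensor reading, select a cell, actuate its Motor."""
    cell = select_cell(cells, reading, beta=beta, rng=rng)
    actuate(cell.motor)  # e.g. make Motor `cell.motor` go active briefly
    return cell


if __name__ == "__main__":
    cells = [
        BrainCell((1, 0, 0), motor=0),
        BrainCell((1, 0, 0), motor=1, priority=5.0),
        BrainCell((0, 1, 1), motor=0),
    ]
    move(cells, (1, 0, 0), actuate=lambda m: print("actuate motor", m),
         rng=random.Random(0))
```

In the usage example, the two cells that match the reading exactly are favoured over the third. Between those two, the one with Priority 5.0 is about five times as likely to be chosen as the one with Priority 1.0.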
In step 2, if the degree of alikeness between the input Sensor states and the chosen Brain Cell is below a certain threshold, the Sensor input is considered a ''new'' thing, and a different action is executed instead: a set of new Brain Cells is inserted, one for each possible Motor value m, all having that ''new'' set of Sensor input values as their Sensor states.

Each Brain Cell is a PROGRAM that encodes the action the robot executes when its Sensors see the pattern (a,b,c,...,z). That is: all behaviours that the robot can potentially execute are stored in the Brain Cells. In this minimally complex design, the Sensor input values fairly directly control the robot's actions. The set of all Brain Cells in the robot's brain is a population of competing programs. (I suspect that this may overlap with some of Koza's ideas/designs.) The Priority value of a Brain Cell is a non-negative real value, and represents the population size of the program (= behaviour) coded in that Brain Cell.

In addition to the input Sensors, the robot is also equipped with sensors that sense its pain/pleasure state. Pain might mean that the robot has bumped into a wall and thereby damaged itself, or that its fuel level is becoming uncomfortably low. Pleasure might mean that its fuel level has just increased. All pain/pleasure sensor inputs are summed (pain with negative weights, pleasure with positive weights) in a fixed (probably hard-wired) function, and the resulting signal of that function is fed to the Brain.

The set of Brain Cells is a *list*, in which the Brain Cell that has just been executed is always removed from its old place and moved to the top of the list. The Brain Cells near the top of the list are thus always the ones that have been used most recently.
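The novelty test, the insertion of new cells, the move-to-front list and the summed pain/pleasure signal could be sketched as follows. Again, this is only an illustration under assumptions. Brain, ensure_known, mark_executed and feedback_signal are hypothetical names. Alikeness is measured here as the fraction of matching Sensor positions, and the threshold is compared against the best match over all cells rather than against the chosen cell only, so that the check can run before step 2. New cells are appended at the bottom of the list, and the pain/pleasure sensor names and weights in the test are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: cells are compared by identity
class BrainCell:
    sensors: tuple[int, ...]  # discrete Sensor states s1..sN
    motor: int                # Motor index m
    priority: float = 1.0     # Priority p (population size)


class Brain:
    """List of Brain Cells, most recently executed first (hypothetical sketch)."""

    def __init__(self, n_motors: int, threshold: float, new_priority: float = 1.0):
        self.cells: list[BrainCell] = []
        self.n_motors = n_motors          # M: number of Motors
        self.threshold = threshold        # minimum fraction of matching Sensors
        self.new_priority = new_priority  # initial Priority of newly created cells

    @staticmethod
    def likeness(cell: BrainCell, reading: tuple[int, ...]) -> float:
        """Fraction of Sensor positions that agree (assumed measure)."""
        return sum(a == b for a, b in zip(cell.sensors, reading)) / len(reading)

    def ensure_known(self, reading: tuple[int, ...]) -> bool:
        """Insert one new cell per Motor value if no cell is alike enough.

        Returns True if new cells were inserted. With threshold <= 1 an
        exact repeat of a known reading never creates duplicate cells.
        """
        best = max((self.likeness(c, reading) for c in self.cells), default=0.0)
        if best >= self.threshold:
            return False
        for m in range(self.n_motors):
            # Appended at the bottom: the new cells have not been executed yet.
            self.cells.append(BrainCell(tuple(reading), m, self.new_priority))
        return True

    def mark_executed(self, cell: BrainCell) -> None:
        """Move the cell that was just executed to the top of the list."""
        self.cells.remove(cell)
        self.cells.insert(0, cell)


def feedback_signal(readings: dict[str, float], weights: dict[str, float]) -> float:
    """Fixed weighted sum of pain/pleasure readings (pain weights are negative)."""
    return sum(weights[name] * value for name, value in readings.items())
```

Keeping the list in most-recently-executed-first order is what lets a later pain/pleasure signal be credited to the last few actions simply by looking at the top of the list.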
When a pain/pleasure signal is received, the Priority values of the Brain Cells near the top of the list are multiplied by a non-negative factor that depends on the value v of the pain/pleasure signal, e.g. C * exp( v ), or a similar function. With e.g. C = 1, this factor is greater than 1 for v > 0 and less than 1 for v < 0. The effect is that a net-positive feedback signal (meaning ''Pleasure'') rewards the most recently executed Brain Cells by increasing their Priority value (population size); analogously, a net-negative feedback signal (meaning ''Pain'') decreases the Priority value (population size) of those Brain Cells. Good behaviour, i.e. behaviour that results in Pleasure signals, therefore reaches ever higher Priority and is more and more likely to be re-executed. Bad behaviour, i.e. behaviour that results in Pain signals, gets lower and lower Priority and is thereby executed less and less.

* Ideas for extending the above simple design
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In my perception, the key idea in the above is that the Brain Cells are programs in which *behaviour* is encoded, and that these programs compete with each other via an evolutionary process. These evolutionary processes are what make the robot adapt and learn, and what make it ''intelligent'' and also *autonomous*.

The very simple, minimal version of the Brain Program that I have running right now can effectively control a simple simulated robot that learns to avoid walls and to ''seek out'' food pellets in the simulated environment it walks around in. It doesn't yet do genetic mutation or crossover of the contents of Brain Cells; this would IMO be the next interesting thing to extend the above design with. Crossover could IMO be included as an extra step in the cyclical operation of the Brain, e.g.
in each cycle, select two high-Priority Brain Cells, let them mate, and insert the resulting offspring behaviour pattern as a new Brain Cell with a certain (small) population size (= Priority). In this way, it would seem possible to create, within the operation of the Brain, an infrastructure through which successful sub-patterns in Brain Cells can be communicated between Brain Cells.

One of my prominent longer-term goals is to extend the above simple neural-network design into a Brain in which the programs (which are the entities that undergo evolution) consist of ''memes'', in the sense of pieces of data/information that the robot communicates to and from other intelligent entities in its environment. I mean that in such a meme-communicating robot, the ''*thinking*'' going on in the Brain consists of an evolutionary process on these ''memes''; that is: ''thinking'' is nothing other than a dumb evolutionary process on these ''memes'' (which are themselves only dumb pieces of data).

The meme-communicating robot would have one or more buttons providing the pain/pleasure signals to the Brain, which the entities the robot communicates with can press when they are pleased or disgusted with what the robot says to them. The result would be that the robot learns to communicate in a way that these parties find ''pleasant''. In other words, the robot learns that the behaviour that is then optimal for its survival is to communicate in an interesting and pleasant way with the people empowered to press its buttons.

An even more interesting variant would be to eliminate the buttons and replace them with a pain/pleasure signal derived from the amount of new data that the robot learns. This would result in a robot that, independently of whether it ''pleases'' persons in its surroundings, autonomously seeks to maximize its own knowledge.
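The learning rule (the multiplicative Priority update C * exp( v )) and the proposed crossover extension could look roughly like the sketch below. It is not the author's implementation. reinforce, crossover_step, top_k, pool_size and child_priority are hypothetical names and parameters, and ''near the top of the list'' is interpreted as the top_k most recently executed cells. The mating step is a standard one-point crossover over the Sensor pattern followed by the Motor value, which is just one possible way to ''let two cells mate''.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: cells are compared by identity
class BrainCell:
    sensors: tuple[int, ...]  # discrete Sensor states s1..sN (N >= 1)
    motor: int                # Motor index m
    priority: float = 1.0     # Priority p (population size), non-negative


def reinforce(cells: list[BrainCell], v: float,
              top_k: int = 3, c: float = 1.0) -> None:
    """Multiply the Priority of the top_k most recent cells by c * exp(v).

    Assumes `cells` is ordered most-recently-executed first. With c = 1,
    pleasure (v > 0) raises these priorities, pain (v < 0) lowers them,
    and v = 0 leaves them unchanged.
    """
    factor = c * math.exp(v)  # positive for c > 0, so priorities stay non-negative
    for cell in cells[:top_k]:
        cell.priority *= factor


def crossover_step(cells: list[BrainCell], rng: random.Random,
                   pool_size: int = 4,
                   child_priority: float = 0.1) -> BrainCell | None:
    """Proposed extension: mate two high-Priority cells by one-point crossover.

    Returns the new cell, or None if no valid (unique) offspring was produced.
    """
    if len(cells) < 2:
        return None
    # Parents: two distinct cells drawn from the highest-Priority cells.
    pool = sorted(cells, key=lambda cell: cell.priority, reverse=True)[:max(pool_size, 2)]
    a, b = rng.sample(pool, 2)
    # Genome = Sensor pattern followed by the Motor value.
    genome_a = a.sensors + (a.motor,)
    genome_b = b.sensors + (b.motor,)
    cut = rng.randint(1, len(genome_a) - 1)  # take at least one gene from each parent
    child = genome_a[:cut] + genome_b[cut:]
    sensors, motor = child[:-1], child[-1]
    if any(cell.sensors == sensors and cell.motor == motor for cell in cells):
        return None  # Brain Cells must stay unique
    offspring = BrainCell(sensors, motor, child_priority)
    # Appended at the bottom, so it is not treated as a recently executed cell.
    cells.append(offspring)
    return offspring
```

A small child_priority corresponds to the ''certain (small) population size'' mentioned above. Whether offspring should also be mutated is left open here.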
(A further thesis of mine is of course that scientists are already robots of that last, knowledge-maximizing kind :-).)

---

The above is a fairly minimal description of my ideas (but enough for now, I think, given the medium of a mailing list). Again, if anyone can point me to people who are already working on things like this, please let me know. At the moment I have the feeling that I cannot proceed easily without interacting with others who have overlapping ideas. It IMO just has to be the case that there are research groups doing things that overlap with the above, but I find it a very strenuous task to find them.

Thanks very much!

---

Best regards,

Menno <xxxxx@xxxx.xx>

Ir. Menno Rubingh,
Scientific programmer, Software designer, & Software documentation writer
Doelenstraat 62, 2611 NV Delft, Netherlands
phone xxxxxxxxxxxxxxx (answering machine backup)
email xxxxxxxx@xxxx.xx
http://www.rubinghscience.org/