doc_cellc.txt   MR 28/12/99

Some summary documentation about the 'cellc' and 'res3' programs
----------------------------------------------------------------

Copyright (C) 1999 Menno Rubingh

'cellc' = the Brain Program (BP)
'res3'  = the Robot Environment Shell Program (RES)

Note: The text below consists of parts of some emails I posted on the
``AI Research Club'' mailing list.  At some points, the text below may
describe earlier versions of these programs, instead of the latest
versions 'cellc' and 'res3' as included in this directory.
Nevertheless, the text below should, I think, be sufficient to explain
the basics of these programs.

---------

Right now I'm creating a mutated :-) version of my ''meme'' program
using (instead of those hierarchical sentence trees) associations of
arbitrary symbols (which it gathers in sets).  I intend this to be
used as a robot brain in a simulated robot walking around in an
8 x 8 room:

   (top view)

   **********
   *........*
   *.&......*
   *........*
   *....&...*     > = robot (facing right)
   *........*
   *.>......*     * = wall
   *........*     . = air
   *....&...*     & = food pellet
   **********

The robot has a vision sensor that (in each unit of time) sends one of
three symbols: AIR, WALL, FOOD, depending on what's in the square in
front of the robot.  The robot has two motors: one that executes one
step forward when the brain sends a FWD symbol, and one that makes a
90 degree turn right when the brain sends a TURN symbol.  The brain
receives a 'Bad' signal (== pain) when the robot bumps into a wall,
and a 'Good' signal when it steps on a food pellet (which is then
removed -- and maybe the human observer can put it back in).

The robot has N ''brain cells'' which accumulate and copy around sets
of these symbols, in some way compatible with our designs so far.
(Maybe any method for operation of those brain cells in any way like
our designs so far will do the trick.(?))  The brain cells and the
sensor and motors continuously exchange information (= those symbols).
If a brain cell which happens to contain (WALL FWD) ''sees'' the
symbol WALL sent by the vision sensor, it absorbs the WALL symbol and
sends the FWD symbol, which is then picked up by the motor, which then
(tries to) execute(s) a step forward -- in this case resulting in the
robot bumping into the wall and in a ''pain'' signal being sent,
causing the (WALL FWD) brain cell to lower its ''priority'' in some
way.

I.e.: The brain cells contain PATTERNS or associations between
symbols, and at the same time contain PROGRAMS which the robot
executes.  (Behaviour of the robot.)  Learning the ''correct''
associations is identical to the robot learning the ''correct''
behaviour.  Those sets/(behaviour)patterns/programs are the things
undergoing an evolutionary process.

I want the robot to start out with a completely blank brain and learn
to avoid walls and to ''search'' for those food pellets.  The whole
robot is an aggregate of basically autonomous ''cells'' (brain cells,
motors, sensors), communicating ''chaotically''.  Each cell
(initially) knows only its own 'symbol'.  I want to demonstrate that
such a non-centralized aggregate of SIMPLE components can become an
''intelligent'' whole capable of learning successful behaviour.

The picture of the 8x8 room and the precise 5 symbols used above are
arbitrary.  The brain program does not know any meaning of the
symbols.  (It does know the meaning of the 'Bad' and 'Good' signals.)
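For concreteness, here is a rough C sketch of how such a room and
vision sensor could be represented.  (This is NOT the actual 'res3'
source; all the names and the layout here are illustrative only.)

   /* Minimal sketch (NOT the actual 'res3' source; names are
    * illustrative) of the room and the vision sensor. */
   #include <stdio.h>

   #define SIZE 10   /* 8 x 8 interior plus the surrounding wall */

   static char room[SIZE][SIZE+1] = {
       "**********",
       "*........*",
       "*.&......*",
       "*........*",
       "*....&...*",
       "*........*",
       "*.>......*",
       "*........*",
       "*....&...*",
       "**********"
   };

   /* Robot position and heading; dx/dy give the square 'in front'. */
   static int rx = 2, ry = 6;   /* column, row of the '>' above */
   static int dx = 1, dy = 0;   /* facing right */

   /* The vision sensor: sends one of the three symbols, depending on
    * what is in the square immediately in front of the robot. */
   static const char *vision(void)
   {
       char c = room[ry + dy][rx + dx];
       if (c == '*') return "WALL";
       if (c == '&') return "FOOD";
       return "AIR";
   }

   int main(void)
   {
       printf("vision sensor sends: %s\n", vision());
       return 0;
   }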
The most important thing is that I want to make the brain program
completely independent of the testing environment, and also to make
the brain design extensible and up-scalable.  (The brain should be
able to function and to learn when plugged into an arbitrary robot in
an arbitrary environment.)  The 8x8 room environment is just intended
as a simple, objective environment to test out how the brain performs.

---------

I have the same 8 x 8 chess square as you use, and the same avatar
facing 4 possible directions (but with different ''sensors'' or vision
cameras).  I have made a program that simulates the environment the
robot/avatar walks around in, which acts as a ''shell'' around the
''brain program''.  The robot environment shell ("RES") calls the
brain program ("BP") just as if it were a subroutine; or perhaps it is
better to view it as a client/server connection: the RES makes
requests to the BP server, e.g. every 'turn' it sends to the BP the
current sensor inputs the robot receives; the BP then replies with the
motor actions the brain instigates as a response to those sensor
inputs.  (Sorry about these abbreviations, I'm lazy. :-))

(I implement this setup on UNIX in C by fiddling around with some
pipes in such a way that stdin and stdout of a child process -- which
the RES spawns and which then execs the BP program -- are captured and
readable/writable from the RES parent process as if they were just any
UNIX file ''stream''.)

The way I implement pain/pleasure is: as a separate command which the
RES sends to the BP after the RES has executed the move the BP tells
the RES to make.  E.g.: BP says: "move one step forward" -- RES
executes the move and finds that the robot/avatar bumps into the wall
-- RES sends message "pain" to the BP.  The "pain" message therefore
arrives at the BP *after* the BP has sent out the action that is
'responsible' for the "pain" (or "happiness") message; but the "pain"
message always comes *before* the RES sends the next bunch of sensor
input (the states of the sensors in the new avatar position after the
move) to the BP.

I *do* have things hard-wired in, but these are only the "pain" and
"pleasure" keywords themselves, which the BP recognizes and processes
in a special way.  But I do not give the BP any advance knowledge
about which situations cause or are associated with these
"pain"/"pleasure" signals.  The only thing that sends or originates or
has any hard-wired ''knowledge'' about which positions of the robot
correspond with which kind of "pain"/"pleasure" signals, is the RES
environment.  The brain (the BP) should make these associations
itself, I think.  That's the whole purpose of the exercise, isn't it ?

---------

At present, my brain program (BP) design has degenerated into a kind
of simplistic neural-network type thing.  I had more grandiose
designs, but failed to get these anywhere near approaching working
within a reasonable time, so I just reverted (a few times) to
successively simpler designs.  The whole thing operates in ''turns''
or ''moves'' (as in a chess game), as explained above.
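The pipe plumbing mentioned above would look roughly like this.  (A
sketch only, NOT the actual RES source; the BP executable path
"./cellc" and all names are assumptions on my part.)

   /* Sketch of the UNIX pipe setup: the RES spawns a child,
    * redirects the child's stdin and stdout into two pipes, and
    * execs the BP in the child. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/types.h>
   #include <unistd.h>

   int main(void)
   {
       int to_bp[2], from_bp[2];   /* [0] = read end, [1] = write end */

       if (pipe(to_bp) < 0 || pipe(from_bp) < 0) { perror("pipe"); exit(1); }

       pid_t pid = fork();
       if (pid < 0) { perror("fork"); exit(1); }

       if (pid == 0) {             /* child: becomes the BP */
           dup2(to_bp[0], STDIN_FILENO);    /* BP reads RES requests on stdin */
           dup2(from_bp[1], STDOUT_FILENO); /* BP replies on stdout */
           close(to_bp[1]); close(from_bp[0]);
           execl("./cellc", "cellc", (char *)NULL);
           perror("execl"); _exit(1);
       }

       /* parent: the RES; wrap the pipe ends as stdio streams */
       close(to_bp[0]); close(from_bp[1]);
       FILE *bp_in  = fdopen(to_bp[1], "w");    /* RES -> BP */
       FILE *bp_out = fdopen(from_bp[0], "r");  /* BP -> RES */

       fprintf(bp_in, "a wall\n");   /* e.g. send one sensor message */
       fflush(bp_in);
       /* ... read MOTOR_xxx replies from bp_out with fgets(), etc. ... */
       return 0;
   }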
--- basic layout of design (1): inputs and outputs ---

At one side of the ''brain'' there are the sensor inputs coming in.
The BP does not know what the sensor signals mean; it only receives
messages like:

   a wall     (meaning: sensor 'wall' is now in the ON state)
   x see      (meaning: sensor 'see' is now in the OFF state)

The BP only recognizes my (not very descriptive) one-letter keywords
'a' and 'x', and it doesn't have any prior knowledge of the names
('wall', 'see') of the sensors.  If it receives an 'a' or 'x' message
naming a sensor it hasn't seen input from before, then that's
perfectly legal.  All sensors, however, MUST have two states: ON and
OFF.  The 'meaning' of a sensor is contained fully inside the RES
environment, which sends these 'a wall' etc. messages to the BP.

At present I use 3 sensors in my RES:

   wall  : ON if the square immediately in front of the robot is a
           wall
   see   : ON if any of the 3 squares in front of the robot contains
           a food pellet (OFF otherwise)
   smell : ON if a square horizontally or vertically neighbouring the
           robot contains a food pellet (OFF otherwise).

(I.e., if I want to change or add sensors, I only have to change the
RES programming, not the BP.)

To tell the BP what kind of motor actions it should initiate (should
send out signals for), I have a keyword 'm'; e.g., the RES sending the
following message to the BP:

   m fwd

means: Register the motor called 'fwd' as an ''output device'' which
wants to receive input.  The BP outputs messages of the following
form:

   MOTOR_fwd

for every motor (like 'fwd') that it ''knows of''.  All motors
''known'' by the BP are stored in an array similar to the sensor
array.  At present I use 2 motors in the RES:

   fwd  : Execute one step forward
   turn : Execute one 90 degree turn right.

The BP does not know what each motor does.  The BP can only send and
receive messages sequentially, one by one.  The BP only experiences
that the robot position has changed from the new, different sensor
inputs being received after it has sent out a MOTOR_xxx message.

--- basic layout of design (2): Brain Cells ---

Those are the inputs and the outputs of the brain program.  Now here
is how the inputs are translated into the outputs:

- The current sensor state of each known sensor is remembered in an
  array of sensor 'cells'.

- There is an array of ''Brain Cells'' (sorry, I cannot refrain from
  using these ''gnostic'' names), each of which has:

  * N receiver symbols, where N = the number of different sensors in
    the sensor array.  The 'symbol' here is just the name of the
    sensor.
  * one sender symbol, which is the name of one of the motors.
  * a real number >= 0.0, called the 'priority' of the Brain Cell
    (BC).

The N receivers are each 0 or 1, meaning that they trigger on either
an OFF or ON state, respectively, of the sensor.  E.g. a BC like
this:

   wall  see  smell   sender-symbol   priority
    0     1     0         fwd          2.335

says: If wall==0, see==1 and smell==0, then (with a probability
determined, among other things, by 'priority') send output message
MOTOR_fwd.

All BCs sit in a linked list, with the BC most recently active always
on top.
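In C, a Brain Cell of this kind might look roughly as follows.  (A
sketch; the field and type names are illustrative only and not
necessarily those used in 'cellc'.)

   /* Sketch of the Brain Cell data structure described above. */
   #define NAMELEN 16

   struct braincell {
       unsigned receivers;     /* bit i = required state (0=OFF, 1=ON)
                                  of sensor i in the sensor array */
       char sender[NAMELEN];   /* name of one motor, e.g. "fwd" */
       double priority;        /* >= 0.0; selection weight */
       struct braincell *next; /* linked list, most recently active
                                  BC at the head */
   };

   /* The example BC from the table above: wall==0, see==1, smell==0.
    * With sensor order (bit 0 = wall, bit 1 = see, bit 2 = smell),
    * the receiver bit mask is binary 010 = 0x2. */
   struct braincell example = { 0x2, "fwd", 2.335, 0 };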
--- generation of new BCs ---

When sensor input comes in, or when an "m xxx" message is received,
the BP immediately checks if it already knows the sensor or motor
(i.e. if it is present already in the sensor array or motor array,
respectively).  If not, then the BP inserts the new sensor or motor
cell into those arrays.

When the BP sees a combination of sensor values (0s and 1s in the bit
mask) that doesn't yet occur in the BC list, it automatically inserts
into the BC list a range of BCs with this bit mask as receiver
combination and with, as sender symbol, each of the motors present in
the motor array.

When a new motor is registered (with "m xxx"), the BP, after
inserting the new motor symbol into the motor array, also inserts a
range of new BCs into the BC list: namely, one new BC with that new
motor as the sender symbol for each receiver combination that is
already 'known' in the BC list.

When a new sensor is seen (that is, a sensor symbol previously not
present in the sensor array), the BP, after inserting the new sensor
symbol into the sensor array, provides each of the BCs in the BC list
with one more sensor symbol (= one more bit in the receiver bit
mask), namely the sensor symbol of the new sensor.  This new receiver
bit is 0 if the value of the new sensor, at the moment when it was
registered, is OFF, and 1 if the sensor is ON.

Each new BC inserted into the BC list starts out with a 'prio' of
1.0.

This BC generation strategy ensures that BCs are generated only for
combinations of receivers (0s and 1s in the receiver bit mask) which
really OCCUR in practice as sensor input.  This means that no BC gets
inserted into the BC linked list that is never actually used.

--- translation of inputs to outputs by the BCs ---

On reception of the "go" command (sent by the RES), the BP starts
processing until it outputs a MOTOR_xxx message.  (Aside note: I
could also insert a dummy motor to implement a ''stay where you are''
action.)

On the "go" command, the BP checks the sensor-cell array, and gathers
all present sensor states into the same kind of bit mask I also use
for the receivers in the BCs.  Then, the BP selects one random BC
from all BCs in the linked list which fit the sensor input, but this
random choice is weighted according to the priority values of the
BCs.  The priority of a BC is proportional to the preference with
which it is selected over its rival BCs which also fit the sensor
input.  (Add up the priorities of all fitting BCs, generate a random
real number between 0 and that sum, and this random number indicates
the BC selected.)

The selected BC is then excised from the linked list, and re-inserted
at the top of the list.  Finally, the sender symbol of the selected
BC is output as a MOTOR_xxx message to the RES.  That's all.  (Could
it get any simpler ?)
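The weighted random choice described above (essentially
roulette-wheel selection over the fitting BCs) could be coded along
these lines, using the 'struct braincell' sketched earlier.  (Again a
sketch with names of my own, not the actual 'cellc' source.)

   /* Sketch of the priority-weighted random BC selection. */
   #include <stdlib.h>

   /* A BC 'fits' if its receiver bit mask equals the sensor mask. */
   static int fits(const struct braincell *bc, unsigned sensormask)
   {
       return bc->receivers == sensormask;
   }

   struct braincell *select_bc(struct braincell *head, unsigned sensormask)
   {
       double sum = 0.0, r;
       struct braincell *bc;

       for (bc = head; bc; bc = bc->next)   /* add up fitting priorities */
           if (fits(bc, sensormask))
               sum += bc->priority;
       if (sum <= 0.0)
           return 0;                        /* no fitting BC */

       r = ((double)rand() / RAND_MAX) * sum;  /* random real in [0, sum] */
       for (bc = head; bc; bc = bc->next) {
           if (!fits(bc, sensormask))
               continue;
           r -= bc->priority;
           if (r <= 0.0)
               return bc;                   /* this one is selected */
       }
       return 0;                            /* not reached */
   }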
--- Processing of "pain" and "pleasure" signals ---

On reception of a "pain" or a "pleasure" (Bad/Good) signal, the
priority numbers of the BCs in the linked list are affected, and this
is how the ''brain'' *learns*.

On reception of a "pleasure" (Good) signal, the priority of the 1st
BC in the list is multiplied by 2.0, that of the 2nd by 1.5, that of
the 3rd by 1.25, and so on, the multiplication factor rapidly
approaching 1.0.  On reception of a "pain" signal, the same process
is executed, but then starting out with factor 0.5 and again
successively approaching 1.0.
( factor_{i+1} := 0.5 (1.0 + factor_i). )

I affect not only the BC most recently active but also the few BCs
below it in the list, because these are the BCs most recently active
before that.  (The BCs sit in the list exactly ordered by increasing
time since last being selected.)  (I suspect that in this way, I
implement implicitly what you implement explicitly with time
sequences.)  By affecting more than only the one most recent BC, the
behaviour patterns the robot used LEADING UP TO the last action being
rewarded or punished by the "pain"/"pleasure" signal are thereby also
rewarded or punished a little.

In that way, the robot can, in a VERY crude way, perhaps be said to
learn ''sequences'' of successive successful actions -- though in my
design as it is now this is really only a statistical effect of
individual actions (= BCs) being selected with greater probability in
the ''right'' order, rather than any kind of learning or remembering
of such sequences or scenarios of successful successive actions.

The 'priority' number of a BC can (I think) be viewed as the
''population size'' of the ''meme'' (or program, or behaviour
pattern) encoded in that BC.  In evolutionary terms, the successful
behaviour patterns are rewarded by getting a greater probability to
be selected and therefore to be executed.  (There is, alas, as yet NO
other element of evolution in my program, such as various kinds of
mixing of old BCs resulting in the generation of offspring BCs
containing a mix of the qualities of the parent BCs.)

At present, I have a set of versions of the above, each coded very
uglily, that *seem* to work.  (I.e., the robot does learn to move
towards those food pellets when it can see them, and even learns a
predilection for turning when a food pellet is immediately to the
right or left of, or behind (but not in front of), it.)

--- future enhancements (now already incorporated into 'res3') ---

The next change I want to make is to see whether learning becomes
faster when those "pain"/"pleasure" signals are changed slightly into
something a little more realistic.  I've observed that the robot
doesn't mind sometimes expending a lot of time uselessly pivoting
around in the same square.  Of course, this is because a "turn"
action NEVER gets punished: only a "fwd" action can get the robot
bumped into a wall, and therefore only the BCs with sender=="fwd"
ever get punished by "pain" signals.

I want to introduce a ''weighted'' feedback signal of the form

   # n

where n is a real number > 0.0.  The old "pain" becomes "#0.5", i.e.,
multiply the priority of the BC responsible for this by 0.5; the old
"pleasure" becomes "#2.0"; and I program the RES to send a "#0.99"
signal when the robot has just made a move that did not result in
anything shocking.  The 0.01 (= 1 - 0.99) in there is an estimate of
the number of food pellets the robot would need as fuel and energy to
execute one motor action.
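Applying such a "#n" signal down the BC list, with the factor decaying
towards 1.0 as described above, might look like this.  (A sketch with
names of my own, using the 'struct braincell' sketched earlier; it
also covers the old "pain"/"pleasure" as the cases n=0.5 and n=2.0.)

   /* Sketch of applying a "#n" feedback signal down the BC list:
    * the head BC (most recently active) gets the full factor n,
    * and the factor then decays towards 1.0 via
    *    factor_{i+1} = 0.5 * (1.0 + factor_i).
    * With n = 2.0 this gives 2.0, 1.5, 1.25, ...;
    * with n = 0.5 it gives 0.5, 0.75, 0.875, ... */
   void apply_feedback(struct braincell *head, double n)
   {
       double factor = n;
       struct braincell *bc;

       for (bc = head; bc; bc = bc->next) {
           bc->priority *= factor;
           factor = 0.5 * (1.0 + factor);   /* approach 1.0 */
       }
   }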