doc_cellc.txt   MR 28/12/99

Some summary documentation about the 'cellc' and 'res3' programs
----------------------------------------------------------------

Copyright (C) 1999 Menno Rubingh

'cellc' = the Brain Program (BP)
'res3'  = the Robot Environment Shell Program (RES)

Note: The text below consists of parts of some emails I posted on the
``AI Research Club'' mailing list.  At some points, the text below may
describe earlier versions of these programs, instead of the latest
versions 'cellc' and 'res3' as included in this directory.
Nevertheless, the text below should, I think, be sufficient to explain
the basics of these programs.

---------

Right now I'm creating a mutated :-) version of my ''meme'' program
using (instead of those hierarchical sentence trees) associations of
arbitrary symbols (which it gathers in sets).  I intend this to be
used as a robot brain in a simulated robot walking around in an
8 x 8 room:

   (top view)

   **********
   *........*
   *.&......*
   *........*
   *....&...*     > = robot (facing right)
   *........*
   *.>......*     * = wall
   *........*     . = air
   *....&...*     & = food pellet
   **********

The robot has a vision sensor that (in each unit of time) sends one of
three symbols: AIR, WALL, FOOD, depending on what's in the square in
front of the robot.  The robot has two motors: one that executes one
step forward when the brain sends a FWD symbol, and one that makes a
90 degree turn right when the brain sends a TURN symbol.  The brain
receives a 'Bad' signal (== pain) when the robot bumps into a wall,
and a 'Good' signal when it steps on a food pellet (which is then
removed -- and maybe the human observer can put it back in).

The robot has N ''brain cells'' which accumulate and copy around sets
of these symbols, in some way compatible with our designs so far.
(Maybe any method for operation of those brain cells in any way like
our designs so far will do the trick.(?))  The brain cells and the
sensor and motors continuously exchange information (= those symbols).
If a brain cell which happens to contain (WALL FWD) ''sees'' the
symbol WALL sent by the vision sensor, it absorbs the WALL symbol and
sends the FWD symbol, which is then picked up by the motor, which then
(tries to) execute(s) a step forward -- in this case resulting in the
robot bumping into the wall and in a ''pain'' signal being sent,
causing the (WALL FWD) brain cell to lower its ''priority'' in some
way.

I.e.: The brain cells contain PATTERNS or associations between
symbols, and at the same time contain PROGRAMS which the robot
executes.  (Behaviour of the robot.)  Learning the ''correct''
associations is identical to the robot learning the ''correct''
behaviour.  Those sets/(behaviour)patterns/programs are the things
undergoing an evolutionary process.

I want the robot to start out with a completely blank brain and learn
to avoid walls and to ''search'' for those food pellets.  The whole
robot is an aggregate of basically autonomous ''cells'' (brain cells,
motors, sensors), communicating ''chaotically''.  Each cell
(initially) knows only its own 'symbol'.  I want to demonstrate that
such a non-centralized aggregate of SIMPLE components can become an
''intelligent'' whole capable of learning successful behaviour.

The picture of the 8x8 room and the precise 5 symbols used above are
arbitrary.  The brain program does not know any meaning of the
symbols.  (It does know the meaning of the 'Bad' and 'Good' signals.)
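For concreteness, here is a rough C sketch of how such a room and
vision sensor could be represented.  (This is NOT the actual 'res3'
source; all the names and the layout here are illustrative only.)

   /* Minimal sketch (NOT the actual 'res3' source; names are
    * illustrative) of the room and the vision sensor. */
   #include <stdio.h>

   #define SIZE 10   /* 8 x 8 interior plus the surrounding wall */

   static char room[SIZE][SIZE+1] = {
       "**********",
       "*........*",
       "*.&......*",
       "*........*",
       "*....&...*",
       "*........*",
       "*.>......*",
       "*........*",
       "*....&...*",
       "**********"
   };

   /* Robot position and heading; dx/dy give the square 'in front'. */
   static int rx = 2, ry = 6;   /* column, row of the '>' above */
   static int dx = 1, dy = 0;   /* facing right */

   /* The vision sensor: sends one of the three symbols, depending on
    * what is in the square immediately in front of the robot. */
   static const char *vision(void)
   {
       char c = room[ry + dy][rx + dx];
       if (c == '*') return "WALL";
       if (c == '&') return "FOOD";
       return "AIR";
   }

   int main(void)
   {
       printf("vision sensor sends: %s\n", vision());
       return 0;
   }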
The most important thing is that I want to make the brain program
completely independent of the testing environment, and also to make
the brain design extensible and up-scalable.  (The brain should be
able to function and to learn when plugged into an arbitrary robot in
an arbitrary environment.)  The 8x8 room environment is just intended
as a simple, objective environment to test out how the brain performs.

---------

I have the same 8 x 8 chess square as you use, and the same avatar
facing 4 possible directions (but with different ''sensors'' or vision
cameras).  I have made a program that simulates the environment the
robot/avatar walks around in, which acts as a ''shell'' around the
''brain program''.  The robot environment shell ("RES") calls the
brain program ("BP") just as if it were a subroutine; or perhaps it is
better to view it as a client/server connection: the RES makes
requests to the BP server, e.g. every 'turn' it sends to the BP the
current sensor inputs the robot receives; the BP then replies with the
motor actions the brain instigates as a response to those sensor
inputs.  (Sorry about these abbreviations, I'm lazy. :-))

(I implement this setup on UNIX in C by fiddling around with some
pipes in such a way that stdin and stdout of a child process -- which
the RES spawns and which then execs the BP program -- are captured and
readable/writable from the RES parent process as if they were just any
UNIX file ''stream''.)

The way I implement pain/pleasure is: as a separate command which the
RES sends to the BP after the RES has executed the move the BP tells
the RES to make.  E.g.: BP says: "move one step forward" -- RES
executes the move and finds that the robot/avatar bumps into the wall
-- RES sends message "pain" to the BP.  The "pain" message therefore
arrives at the BP *after* the BP has sent out the action that is
'responsible' for the "pain" (or "happiness") message; but the "pain"
message always comes *before* the RES sends the next bunch of sensor
input (the states of the sensors in the new avatar position after the
move) to the BP.

I *do* have things hard-wired in, but these are only the "pain" and
"pleasure" keywords themselves, which the BP recognizes and processes
in a special way.  But I do not give the BP any advance knowledge
about which situations cause or are associated with these
"pain"/"pleasure" signals.  The only thing that sends or originates or
has any hard-wired ''knowledge'' about which positions of the robot
correspond with which kind of "pain"/"pleasure" signals, is the RES
environment.  The brain (the BP) should make these associations
itself, I think.  That's the whole purpose of the exercise, isn't it ?

---------

At present, my brain program (BP) design has degenerated into a kind
of simplistic neural-network type thing.  I had more grandiose
designs, but failed to get these anywhere near approaching working
within a reasonable time, so I just reverted (a few times) to
successively simpler designs.  The whole thing operates in ''turns''
or ''moves'' (as in a chess game), as explained above.
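The pipe plumbing mentioned above would look roughly like this.  (A
sketch only, NOT the actual RES source; the BP executable path
"./cellc" and all names are assumptions on my part.)

   /* Sketch of the UNIX pipe setup: the RES spawns a child,
    * redirects the child's stdin and stdout into two pipes, and
    * execs the BP in the child. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/types.h>
   #include <unistd.h>

   int main(void)
   {
       int to_bp[2], from_bp[2];   /* [0] = read end, [1] = write end */

       if (pipe(to_bp) < 0 || pipe(from_bp) < 0) { perror("pipe"); exit(1); }

       pid_t pid = fork();
       if (pid < 0) { perror("fork"); exit(1); }

       if (pid == 0) {             /* child: becomes the BP */
           dup2(to_bp[0], STDIN_FILENO);    /* BP reads RES requests on stdin */
           dup2(from_bp[1], STDOUT_FILENO); /* BP replies on stdout */
           close(to_bp[1]); close(from_bp[0]);
           execl("./cellc", "cellc", (char *)NULL);
           perror("execl"); _exit(1);
       }

       /* parent: the RES; wrap the pipe ends as stdio streams */
       close(to_bp[0]); close(from_bp[1]);
       FILE *bp_in  = fdopen(to_bp[1], "w");    /* RES -> BP */
       FILE *bp_out = fdopen(from_bp[0], "r");  /* BP -> RES */

       fprintf(bp_in, "a wall\n");   /* e.g. send one sensor message */
       fflush(bp_in);
       /* ... read MOTOR_xxx replies from bp_out with fgets(), etc. ... */
       return 0;
   }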
--- basic layout of design (1): inputs and outputs ---

At one side of the ''brain'' there are the sensor inputs coming in.
The BP does not know what the sensor signals mean; it only receives
messages like:

   a wall     (meaning: sensor 'wall' is now in the ON state)
   x see      (meaning: sensor 'see' is now in the OFF state)

The BP only recognizes my (not very descriptive) one-letter keywords
'a' and 'x', and it doesn't have any prior knowledge of the names
('wall', 'see') of the sensors.  If it receives an 'a' or 'x' message
naming a sensor it hasn't seen input from before, then that's
perfectly legal.  All sensors, however, MUST have two states: ON and
OFF.  The 'meaning' of a sensor is contained fully inside the RES
environment, which sends these 'a wall' etc. messages to the BP.

At present I use 3 sensors in my RES:

   wall  : ON if the square immediately in front of the robot is a
           wall
   see   : ON if any of the 3 squares in front of the robot contains
           a food pellet (OFF otherwise)
   smell : ON if a square horizontally or vertically neighbouring the
           robot contains a food pellet (OFF otherwise).

(I.e., if I want to change or add sensors, I only have to change the
RES programming, not the BP.)

To tell the BP what kind of motor actions it should initiate (should
send out signals for), I have a keyword 'm'; e.g., the RES sending the
following message to the BP:

   m fwd

means: Register the motor called 'fwd' as an ''output device'' which
wants to receive input.  The BP outputs messages of the following
form:

   MOTOR_fwd

for every motor (like 'fwd') that it ''knows of''.  All motors
''known'' by the BP are stored in an array similar to the sensor
array.  At present I use 2 motors in the RES:

   fwd  : Execute one step forward
   turn : Execute one 90 degree turn right.

The BP does not know what each motor does.  The BP can only send and
receive messages sequentially, one by one.  The BP only experiences
that the robot position has changed from the new, different sensor
inputs being received after it has sent out a MOTOR_xxx message.

--- basic layout of design (2): Brain Cells ---

Those are the inputs and the outputs of the brain program.  Now here
is how the inputs are translated into the outputs:

- The current sensor state of each known sensor is remembered in an
  array of sensor 'cells'.

- There is an array of ''Brain Cells'' (sorry, I cannot refrain from
  using these ''gnostic'' names), each of which has:

  * N receiver symbols, where N = the number of different sensors in
    the sensor array.  The 'symbol' here is just the name of the
    sensor.
  * one sender symbol, which is the name of one of the motors.
  * a real number >= 0.0, called the 'priority' of the Brain Cell
    (BC).

The N receivers are each 0 or 1, meaning that they trigger on either
an OFF or ON state, respectively, of the sensor.  E.g. a BC like
this:

   wall  see  smell   sender-symbol   priority
    0     1     0         fwd          2.335

says: If wall==0, see==1 and smell==0, then (with a probability
determined, among other things, by 'priority') send output message
MOTOR_fwd.

All BCs sit in a linked list, with the BC most recently active always
on top.
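In C, a Brain Cell of this kind might look roughly as follows.  (A
sketch; the field and type names are illustrative only and not
necessarily those used in 'cellc'.)

   /* Sketch of the Brain Cell data structure described above. */
   #define NAMELEN 16

   struct braincell {
       unsigned receivers;     /* bit i = required state (0=OFF, 1=ON)
                                  of sensor i in the sensor array */
       char sender[NAMELEN];   /* name of one motor, e.g. "fwd" */
       double priority;        /* >= 0.0; selection weight */
       struct braincell *next; /* linked list, most recently active
                                  BC at the head */
   };

   /* The example BC from the table above: wall==0, see==1, smell==0.
    * With sensor order (bit 0 = wall, bit 1 = see, bit 2 = smell),
    * the receiver bit mask is binary 010 = 0x2. */
   struct braincell example = { 0x2, "fwd", 2.335, 0 };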
--- generation of new BCs ---

When sensor input comes in, or when an "m xxx" message is received,
the BP immediately checks if it already knows the sensor or motor
(i.e. if it is present already in the sensor array or motor array,
respectively).  If not, then the BP inserts the new sensor or motor
cell into those arrays.

When the BP sees a combination of sensor values (0s and 1s in the bit
mask) that doesn't yet occur in the BC list, it automatically inserts
into the BC list a range of BCs with this bit mask as receiver
combination and with, as sender symbol, each of the motors present in
the motor array.

When a new motor is registered (with "m xxx"), the BP, after
inserting the new motor symbol into the motor array, also inserts a
range of new BCs into the BC list: namely, one new BC with that new
motor as the sender symbol for each receiver combination that is
already 'known' in the BC list.

When a new sensor is seen (that is, a sensor symbol previously not
present in the sensor array), the BP, after inserting the new sensor
symbol into the sensor array, provides each of the BCs in the BC list
with one more sensor symbol (= one more bit in the receiver bit
mask), namely the sensor symbol of the new sensor.  This new receiver
bit is 0 if the value of the new sensor, at the moment when it was
registered, is OFF, and 1 if the sensor is ON.

Each new BC inserted into the BC list starts out with a 'prio' of
1.0.

This BC generation strategy ensures that BCs are generated only for
combinations of receivers (0s and 1s in the receiver bit mask) which
really OCCUR in practice as sensor input.  This means that no BC gets
inserted into the BC linked list that is never actually used.

--- translation of inputs to outputs by the BCs ---

On reception of the "go" command (sent by the RES), the BP starts
processing until it outputs a MOTOR_xxx message.  (Aside note: I
could also insert a dummy motor to implement a ''stay where you are''
action.)

On the "go" command, the BP checks the sensor-cell array, and gathers
all present sensor states into the same kind of bit mask I also use
for the receivers in the BCs.  Then, the BP selects one random BC
from all BCs in the linked list which fit the sensor input, but this
random choice is weighted according to the priority values of the
BCs.  The priority of a BC is proportional to the preference with
which it is selected over its rival BCs which also fit the sensor
input.  (Add up the priorities of all fitting BCs, generate a random
real number between 0 and that sum, and this random number indicates
the BC selected.)

The selected BC is then excised from the linked list, and re-inserted
at the top of the list.  Finally, the sender symbol of the selected
BC is output as a MOTOR_xxx message to the RES.  That's all.  (Could
it get any simpler ?)
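The weighted random choice described above (essentially
roulette-wheel selection over the fitting BCs) could be coded along
these lines, using the 'struct braincell' sketched earlier.  (Again a
sketch with names of my own, not the actual 'cellc' source.)

   /* Sketch of the priority-weighted random BC selection. */
   #include <stdlib.h>

   /* A BC 'fits' if its receiver bit mask equals the sensor mask. */
   static int fits(const struct braincell *bc, unsigned sensormask)
   {
       return bc->receivers == sensormask;
   }

   struct braincell *select_bc(struct braincell *head, unsigned sensormask)
   {
       double sum = 0.0, r;
       struct braincell *bc;

       for (bc = head; bc; bc = bc->next)   /* add up fitting priorities */
           if (fits(bc, sensormask))
               sum += bc->priority;
       if (sum <= 0.0)
           return 0;                        /* no fitting BC */

       r = ((double)rand() / RAND_MAX) * sum;  /* random real in [0, sum] */
       for (bc = head; bc; bc = bc->next) {
           if (!fits(bc, sensormask))
               continue;
           r -= bc->priority;
           if (r <= 0.0)
               return bc;                   /* this one is selected */
       }
       return 0;                            /* not reached */
   }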
--- Processing of "pain" and "pleasure" signals ---

On reception of a "pain" or a "pleasure" (Bad/Good) signal, the
priority numbers of the BCs in the linked list are affected, and this
is how the ''brain'' *learns*.

On reception of a "pleasure" (Good) signal, the priority of the 1st
BC in the list is multiplied by 2.0, that of the 2nd by 1.5, that of
the 3rd by 1.25, and so on, the multiplication factor rapidly
approaching 1.0.  On reception of a "pain" signal, the same process
is executed, but then starting out with factor 0.5 and again
successively approaching 1.0.
( factor_{i+1} := 0.5 (1.0 + factor_i). )

I affect not only the BC most recently active but also the few BCs
below it in the list, because these are the BCs most recently active
before that.  (The BCs sit in the list exactly ordered by increasing
time since last being selected.)  (I suspect that in this way, I
implement implicitly what you implement explicitly with time
sequences.)  By affecting more than only the one most recent BC, the
behaviour patterns the robot used LEADING UP TO the last action being
rewarded or punished by the "pain"/"pleasure" signal are thereby also
rewarded or punished a little.

In that way, the robot can, in a VERY crude way, perhaps be said to
learn ''sequences'' of successive successful actions -- though in my
design as it is now this is really only a statistical effect of
individual actions (= BCs) being selected with greater probability in
the ''right'' order, rather than any kind of learning or remembering
of such sequences or scenarios of successful successive actions.

The 'priority' number of a BC can (I think) be viewed as the
''population size'' of the ''meme'' (or program, or behaviour
pattern) encoded in that BC.  In evolutionary terms, the successful
behaviour patterns are rewarded by getting a greater probability to
be selected and therefore to be executed.  (There is, alas, as yet NO
other element of evolution in my program, such as various kinds of
mixing of old BCs resulting in the generation of offspring BCs
containing a mix of the qualities of the parent BCs.)

At present, I have a set of versions of the above, each coded very
uglily, that *seem* to work.  (I.e., the robot does learn to move
towards those food pellets when it can see them, and even learns a
predilection for turning when a food pellet is immediately to the
right or left of, or behind (but not in front of), it.)

--- future enhancements (now already incorporated into 'res3') ---

The next change I want to make is to see whether learning becomes
faster when those "pain"/"pleasure" signals are changed slightly into
something a little more realistic.  I've observed that the robot
doesn't mind sometimes expending a lot of time uselessly pivoting
around in the same square.  Of course, this is because a "turn"
action NEVER gets punished: only a "fwd" action can get the robot
bumped into a wall, and therefore only the BCs with sender=="fwd"
ever get punished by "pain" signals.

I want to introduce a ''weighted'' feedback signal of the form

   # n

where n is a real number > 0.0.  The old "pain" becomes "#0.5", i.e.,
multiply the priority of the BC responsible for this by 0.5; the old
"pleasure" becomes "#2.0"; and I program the RES to send a "#0.99"
signal when the robot has just made a move that did not result in
anything shocking.  The 0.01 (= 1 - 0.99) in there is an estimate of
the number of food pellets the robot would need as fuel and energy to
execute one motor action.
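Applying such a "#n" signal down the BC list, with the factor decaying
towards 1.0 as described above, might look like this.  (A sketch with
names of my own, using the 'struct braincell' sketched earlier; it
also covers the old "pain"/"pleasure" as the cases n=0.5 and n=2.0.)

   /* Sketch of applying a "#n" feedback signal down the BC list:
    * the head BC (most recently active) gets the full factor n,
    * and the factor then decays towards 1.0 via
    *    factor_{i+1} = 0.5 * (1.0 + factor_i).
    * With n = 2.0 this gives 2.0, 1.5, 1.25, ...;
    * with n = 0.5 it gives 0.5, 0.75, 0.875, ... */
   void apply_feedback(struct braincell *head, double n)
   {
       double factor = n;
       struct braincell *bc;

       for (bc = head; bc; bc = bc->next) {
           bc->priority *= factor;
           factor = 0.5 * (1.0 + factor);   /* approach 1.0 */
       }
   }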