Technical Aspects of Distributed Genetic Objects

Bill Bereza
Computer Science and Information Systems
Grand Valley State University
Allendale, MI 49401

Faculty Advisor: Carl Erickson

Introduction

Genetic algorithms

Genetic algorithms use techniques of natural evolution to solve problems. The algorithm usually starts with a random solution and keeps modifying it in some way until an optimal solution is found. This is more than just random searching. If the GA is properly set up, it will be constantly building upon previous, sub-optimal solutions. The goal of the genetic algorithm is to take the best parts of a solution, and use those pieces to make a better solution. The term Genetic Algorithm is sometimes strictly confined to solutions which can be represented entirely as bit-strings. A more general term would be evolutionary programming. I will use the term GA in the broader sense. In any GA, the solution of a problem is defined by a gene or set of genes. In this report I will use the terms "solution" and "gene" interchangeably. In a broad definition of GA genes can be represented as any computer manipulatable data type.

Modifying genes

Figure 1: Recombination

A genetic algorithm creates new solutions by modifying the previous ones. It can make these new genes using methods of recombination and mutation. Recombination is the process of mixing two or more solutions together. To understand the process of recombination, think of the genes as a piece of string. If you have two parents, you'll have two pieces of string. To recombine the parents into new children, cut the strings into equal lengths. Then reassemble the strings, making sure to swap every-other piece, and you'll have totally new genes from old genes.

Mutation in nature occurs when a mistake is made during the reproduction process. In a GA, a mutation is simply a random change to one or more parts of a gene. It can occur either during recombination, or it can be placed into the genetic data at some other time.

Fitness

Mixing together old solutions to make new ones is useless without the ability to know which is better. That's why calculating fitness is the most important process in a GA. A genetic algorithm must be able to incrementally move toward better solutions. This requires that partial fitness be rewarded. A sub-optimal solution can't be thrown out, because there may be parts of it that are better than other solutions. Recombination of solutions should hopefully bring out the best parts of the antecedent solutions while removing most of the unfit genes.

Genetic algorithms are based entirely on judgement calls to know if the GA is improving, or how well the recombination method is working. A poorly designed fitness function could have your genetic algorithm going around in circles with bad solutions. Eventually you'll want to end the GA search, so you need to know how close you came to succeeding.

Artificial Life

Artificial life forms are virtual entities which behave in some way like real life. They do now have to correspond to any known life. The environment and structure of Alife can be completely artificial. Alife is a broad term which can include "blobs" which just move, eat and mate, or it can be a simulated eye which behaves like the real thing. You can create life in any form or environment you want.

Artificial life does not always evolve. It can use genetic algorithms, but GA is not a required part of Alife. Artificial life forms can just be simulations of natural organisms. An example of non-evolving Alife would be a simulated school of fish that are only designed to move like fish, but not to exhibit other social behaviors of fish, like eating and mating.

Distributed GA

Distributed GA is the extension of genetic algorithms to multiple computers looking for solutions to the same problem. There are two basic ways of sharing genetic data between the separate machines, the Island Model and the Shared Pool Model.

Island model

Figure 2: Island Model of DGA

In the Island Model of DGA, the multiple machines working almost totally independent of each other. A computer will occasionally share genetic information with one or more computers.

Shared Pool Model

Figure 3: Shared Pool Model of DGA

Every N generations in a Shared Pool Model, a machine will put it's best genes into the pool and then take some genes out for a new generation of solutions. The pool will only store the best genes from all the machines working on the problem.

CLASS LIBRARY

The purpose of the library was to have a generic class hierarchy which would allow us to experiment with different aspects of Genetic Algorithms using the same root design. The classes were designed so that any form of genetic replication could be implemented, and many different genotypes were possible. The root of is the Genetic Protocol, a set of rules which described what actions a genetic object would have to be able to perform. The methods in the protocol are the minimum required to implement an evolving block of data. There are two sets of actions specified in the protocol, methods of procreation, and access to the genetic data. Procreation is performed through either cloning (asexual), or two-parent sexual reproduction. Cloning can be performed with a variable mutation rate.

Figure 4: Two-Point Crossover

With sexual reproduction a mutation rate and a split point can be specified. The protocol doesn't set rules about what kind of recombination is performed. For example, although a split point can be set in calls to the sexual reproduction methods, the protocol doesn't require that a specific kind of crossover recombination be used. It could be two-point crossover with the first cut at the split point, or it could be an N-point crossover where the split point is the first cut. These details are up to the class which implements the Genetic Protocol. The other method which the protocol requires is for accessing the genetic data. It just says that a genetic object must be able to return a pointer, (char *), to a block of data which is the genetic information of this object.

Genetic Protocol

The figure below shows that there are two classes in the Genetic Library which implement the Genetic Protocol, Genes and GenesNXData. Genes implements the protocol from scratch while GenesNXData inherits data storage capabilities from the NextStep NXData class. Both of these are abstract classes and should not be instantiated directly. Both of these classes implement the protocol methods for reproduction. The only difference between them is that Genes does not do any allocation of the genetic data, while GeneNXData automatically picks up this ability from the NXData class. The code for reproduction in these two classes is the same, and they should inherit it from a mutal parent class rather than reimplement the same methods.

Figure 5: Genetic class hierarchy

Genes and GenesNXData

There are two classes in the library which implement the Genetic protocol. They're both very similar, but the differences between them are important. The Genes class was designed first. It is a true abstract class, so if you subclass it, you would have to implement the methods for allocating and accessing the genetic data. GenesNXData inherits byte storage from it's parent class NXData. Because GenesNXData subclasses NXData, it is specific to the NextStep operating system. Genes uses only standard Objective-C, so it could be used on any system with an Objective-C compiler. The reason GenesNXData was designed was to take some of the work from inheriting classes. Having a super-class allocate data took one more possible problem away from sub-classes. The NXData class also handles transporting data to distributed objects, which will be useful later.

Genetic Representation

The genetic information in the Genetic Protocol is represented as a block of bytes. These bytes can be remapped to anything required for your solution. The Genes and GenesNXData class will perform recombination with a byte as the smallest part of a gene.

Figure 6: Structure of Schedule Genes

An example of what can be done with byte-oriented genetic data would be a course schedule.One course is represented by a tuple of three bytes. The entire gene is a list of N courses. Each byte in the course tuple represents a number. The first number in the course tuple is for a Professor, the second is a room, and the third is the time slot.

Figure 7: Structure of PlantGenes

But each part of your genetic data doesn't have to fit within byte-size boundaries. A representation of genes for an artificial plant is shown at the right. Each part of the gene may be larger than a single byte.

Reproduction

Two different methods of reproduction are defined in the Genetic Protocol. Sexual and asexual reproduction are the two processes by which natural organisms duplicate themselves, and Genetic classes also have these methods available.

The Objective-C methods for reproduction in the Genetic Protocol are described below as they are implemented in the Genes and GenesNXData classes. If you implement your own Genetic Protocol conforming class, you could have the methods behave differently.

-clone;
-cloneWithMutation:(long)proba;

These two methods handle asexual reproduction. cloneWithMutation: returns a new Genetic object. proba is a number between 0 and (231)-1 which is the range of the random() function. For every byte of genetic data cloned, the method will get a random number and compare it to proba. If the number is below proba, the byte will be copied, and if the random number is greater than proba, then a random number will be copied instead. clone just calls cloneWithMutation: with a proba value of 0.

-reproduceWith:(id <Genetic>)other withMutation:(long)proba splitAt:(unsigned int)split newSize:(unsigned int)newSize onto:(id <Genetic>)another;

There are five different reproduceWith: methods, but they all eventually call this method. Documentation for each specific method can be found in the Appendix.

The five arguments to the reproduceWith: method are other, proba, split, newSize and another. other is the Genetic object that this object will reproduce with. It is the other parent. proba is the same as in cloneWithMutation:. split defines the first split point in an N-point crossover. The Genes and GenesNXData classes use two-point crossover. newSize is the size in bytes that the returned child will be. The Genes and GenesNXData classes require that this value be the minimum of the sizes of both parents. another is an object onto which the new recombined genes will be copied. This is the child, and it must be a previously allocated Genetic object.


-(unsigned long)leftGen;
-(unsigned long)rightGen;
-setLeftGen:(unsigned long)toGen;
-setRightGen:(unsigned long)toGen;

These methods retrieve and set the number of generations that this object has descended from the first object. Since the Genetic Protocol defines two-parent sexual reproduction, leftGen and rightGen correspond to the generations of descent through the "mother" and "father" of this gene. Mother and father are nebulous terms in GA, so left and right are used instead.


-(unsigned int)geneSize;
-(char *)geneticCode;

geneSize returns the number of bytes in the genetic data, while geneticCode returns a pointer to the beginning of the genetic data.


-adjustCode;

adjustCode is a method which should go through the genetic data and make sure the genes fit the problem. Genetic data is initialized as random bytes. Depending on how the genes are interpreted, the random data may need to be set within limits. From the course schedule example above, you saw that each byte of data was a number identifying a professor, room or time slot. A byte has 256 different possible values, but there may not be that many professors, so every byte of genetic data for the professor number will have to be set within the correct range of values.

Future Work

There are a few things that could be done to enhance the Genetic Protocol. The first would be to add a reproduceWith: method to handle true N-point crossover, rather than just two-point crossover. Also the Genetic Protocol only supports asexuality or bisexuality.These could be considered special cases of multi-sexuality. The protocol should support reproduction with N parents. This could be done by adding two methods, reproduceWithList:, where the object will recombine it's genes with every Genetic object in a list, and nGen:N, which would return the number of generation through parent N.

Using the Genetic Protocol

The Genetic Protocol can be used either by subclassing a Genetic class, or by implementing a class which conforms to the Genetic Protocol. There are three main steps to using the Genetic Protocol. First, represent the problem as a genotype. Secondly you would create a class which conforms to the Genetic protocol. You can do this either by sub-classing one of the provided classes, or by designing your own. Third, you would actually use the class.

Representing the problem

You need to be able to represent possible solutions in some computer manipulatable, block oriented way. A good way to represent your genetic data is to declare a C struct which represents your solution, or a piece of it. The length of the genes would be the size of the structure, and a block of bytes of that size could be type cast to the declared structure. After every initialization or reproduction, the Genetic classes will call the adjustCode method which will give you the oppurtunity to make sure that any random data will fit within your solution type. You have to be able to compare one solution to another and know which is better.

Calculating Fitness

The fitness function is the most influental part of the genetic algorithm. It is the fitness algorithm that judges which genes will form the next generation and which will be dumped. The Genetic Protocol does not declare any methods for calculating fitness. That is entirely up to the user of the Genetic class. The Genetic classes were designed so that they could be used for things as diverse as artificial life or mathematical equations. An artificial bug could use the genetic classes without having a defined fitness function. How long the bug lived and how often it procreated would specify it's fitness.

There are a few guidelines to calculating fitness.The range of fitness values should be as large as possible. If the fitness granularity is too course, e.g. only 12 or 256 possible fitness values, then it's possible that there may be many genes with the same fitness score, and some of those may not be as good as other genes. The two requirements of any fitness function is that genes with the same fitness really are equals, and a gene with a higher fitness is always better than a gene with a lower fitness.

Creating a Genetic class

The easiest way to make a Genetic class is to subclass one of the provided abstract Genetic classes, Genes and GenesNXData. All of the information given up to this point would be enough for you to create your own Genetic class directly from the Genetic Protocol. If you use one of the abstract classes, there are a few things that need to be done.

Some methods are fully implemented in the abstract classes, but they should be over-ridden for a Genetic sub-class specifically for your problem.Both of the abstract classes have initialization methods. By adding initialization methods, problem specific additions to the Genetic class can be initialized for every new object. A basic init method should allocate any data it needs to, randomize the data, and call the method adjustCode. Both Genes and GenesNXData do this in a method called initWithSize:(unsigned int)length. You should create your own init method that would first call this method with length set to the number of bytes in your problems genetic representation.

The other method in the Genes and GenesNXData classes which should be over-ridden is adjustCode. This method goes through the random genetic data and adjusts it to fit into the type of solution you're looking for. It is called after every initialization and reproduction.

There are also some methods you'll need to implement only if you're subclassing the Genes class. The methods geneSize, geneticCode, leftGen, rightGen, setLeftGen:, and setRightGen: are described above, and are already implemented by the GenesNXData class.

Using a Genetic class

The Genetic classes can be used in any imaginable way, but to use them in a traditional genetic algorithm the following sequence could be performed.

  1. Initialize a group of genetic objects.
  2. Until the best possible solution is found,
    1. calculate the fitness of the objects.
    2. Select the best, or kill the worst.
    3. Use the remaining objects for the next generation.
  3. Repeat.

Distributed Genetic Protocol

The Distributed Genetic protocol inherits all of the functionality of the Genetic Protocol

inherits the Genetic Protocol

adds the following methods

- fitness

every gene now carries around it's own fitness value

This allows genes to be manipulated anonymously

- setFitness

explicitly adjust the fitness

- reFit

tell the gene to recalculate it's own fitness if it can

this method should be sent any time a genes code is change from outside itself

- resetWithGenes

copy a block of bytes into the object's genetic data and recalculates the fitness

- copyOnto

tells a genetic object to copy it's genes onto someone else

Class Heirarchy

Genetic Protocol

|GenesNXData

|Distributed Genetic Protocol

DistGenesNX

<another figure, of course>

how to use - methods to implement, subclass, override

subclassing the top-level Distributed Genetic classes

clarify the difference between "implement" and "override" - and above too

implement methods

- fitness

the most important thing to override.Calculate the fitness any way you choose

to improve performance, you should cache the fitness value, and only change it after

reFit was called

- setFitness

DistGenesNX implements this, but not if it were an abstract class.

- reFit

should set some flag so that the next time [fitness] is asked for, it will be

recalculated. This method should _not_ actually calculate fitness at this time

override methods

resetWithGenes

implemented by DistGenesNX

copyOnto

implemented by DistGenesNX

how to use bottom-level Genetic classes - needs a better name

By themselves, these Distributed Genetic Objects work just like normal (genetic?) objects

except that they carry and calculate their own fitness, and they can copy onto each other

PoolServing and MachineClientel

these classes do the actual sharing of genetic objects

they implement and island model where genes are shared in a communal pool

Example problems

Class Scheduling

Contents of DGO/RCS

BruceRawInfo,v A raw info file that matchesBruce's schedule

DGO.h,v <unused>

DGO_outline.rtf,v outline for this paper

DiffNumber,v a very big number

FullCopyList.h,v subclass of List which implements encodeUsing:

FullCopyList.m,v and decodeUsing: methods

MachineClient.h,v client side class which communicates with the pool

MachineClient.m,v

MachineClientStats.h,v client statistics tracked by PoolServer

MachineClientStats.m,v

MachineClientel.h,v A protocol which MachineClients must follow

MachineSpeed.h,v This class just tests the speed of mating. <garbage>

MachineSpeed.m,v <garbage>

Makefile,v Makefile

Makefile.malloc,v Makefile which compiles in libMallocDebug

Makefile.prof,v <don't trust this Makefile or Makefile.malloc>

PoolServer.h,v Pool Server

PoolServer.m,v

PoolServing.h,v protocol which must be followed by PoolServer

ScheduleRawInfo,v The Main rawinfo file as used by the Makefile

SecureVendor.h,v Does some type of authentification b/w

SecureVendor.m,v MachineClient and PoolServer

TheNumber,v A big number

accelderivout.sh,v* This script is used by pool_deriv.gnuplot.sh

bruce.out,v An empty file.

bruce.sched,v A schedule file in PRT format.

bruces,v Bruce's own schedule.

checkfitness.m,v A program which checks the fitness of schedules.

client_size.sh,v* An experiment script based on experiment.sh

contable.pl,v* Converts raw constraints to a table for genes +more

deriv.sh,v* Used for massaging the poold log files

deriv_gen.sh,v* Similar to above.

experiment.sh,v* Runs experiments automatically using the evil rsh.

f95a6.txt,v Bruce's schedule again.

gens_trans.sh,v* poold log massaging

gettop.m,v Communicates with a poold process to check status.

machine.m,v Runs the MachineClient and connects to poold.

oldderv.sh,v* <junk>

parsederivout.sh,v* More poold log massaging.

pool.gnuplot,v ""

pool.gnuplot.sh,v* ""

pool_deriv.gnuplot,v ""

pool_deriv.gnuplot.sh,v* ""

pool_gens.gnuplot,v ""

pool_gens.gnuplot.sh,v* ""

pool_size.sh,v* Version of experiment.sh

poold.m,v The pool daemon.

reform.pl,v* <???>

sched.csv,v Bruce's schedule?

sched.txt,v ""

speed.m,v Speed tests using MachineSpeed

start.sh,v* Used by experiment.sh

starta.sh,v* ""

startall.sh,v* ""

startclient.sh,v ""

startpool.sh,v* ""

startsched.sh,v* ""

Contents of GenLib/RCS

DistGenesNS.h,v <unused>

DistGenesNS.m,v <unused>

DistGenesNX.h,v Subclasses GenesNXData and adds DO encoding

DistGenesNX.m,v

DistributedGenetic.h,v Protocol for distributed genes

Genes.h,v Genes class <unused?>

Genes.m,v

GenesNSData.h,v <unused>

GenesNSData.m,v <unused>

GenesNXData.h,v The most useful genetic class

GenesNXData.m,v

Genetic.h,v Genetic Protocol

Makefile,v

Contents of SchedGA/RCS

Constrainer.h,v

Constrainer.m,v

Constraining.h,v

Makefile,v

SchedGenes.h,v <unused>

SchedGenes.m,v <unused>

SchedNSGenes.h,v <unused>

SchedNSGenes.m,v <unused>

SchedNXGenes.h,v

SchedNXGenes.m,v