Resource Scheduling with Distributed Genetic Objects

William I. Bereza
bereza@pobox.com

Computer Science and Information Systems
Grand Valley State University
Allendale MI 49401

Faculty Advisor:
Dr. Carl Erickson

erickson@oak.csis.gvsu.edu
Searching large state spaces with genetic algorithms combines the use of a best-fit search along with the possibility of searching the entire state space. The problem of scheduling resource use has a large number of possible solutions. By using a set of best-fit solutions, and allowing random changes, genetic algorithms can decrease the amount of time required to search the state space. Implementing genetic algorithms with object-oriented technology promotes development of generic, reusable classes and a natural representation for genetic entities. In our research we developed an Objective C class library for genetic algorithm experimentation. Then we extended traditional genetic algorithms to work in a distributed environment of networked workstations for increased performance. We analyzed the problem of scheduling classes and described it in terms amenable to genetic algorithms. A problem consisting of 94 classes, 21 professors, 15 rooms and 79 time slots was solved using our distributed genetic technique. This problem represents a state space of 10413 possible schedules. The results from the experiment were compared to an actual human-developed schedule for the same semester. This research will be discussed in terms of its application for finding solutions to traditionally intractable problems.

Classroom Scheduling Problem

At Grand Valley State University the courses for departments are scheduled by the department chair. The process for creating the schedule by hand involves making a list of courses to be scheduled and assigning a professor, room and time to each course. When each course is scheduled, it must be checked to make sure that there are no conflicts between any of the assignments made.

There are two problems which can arise because the department chair makes the schedules by hand. Creating the schedule is a time-consuming, mundane task. To save time, schedules from previous years are reused and modified for the new semester. Because of this schedules are pretty static. Changes are only made when necessary, and they're usually difficult to make because it involves rechecking every part of the schedule. If a schedule is not good one semester, it will very likely be just as bad the next year. Our goal in solving this scheduling problem would be to alleviate the two faults of time and quality, and to find the best parameters for the distributed GA which will solve the problem the fastest.

Genetic Algorithms

Genetic algorithms use techniques of natural evolution to solve problems. The algorithm usually starts with a random solution and keeps modifying it in some way until an optimal solution is found. This is more than just random searching. If the GA is properly set up, it will be constantly building upon previous, sub-optimal solutions. The goal of the genetic algorithm is to take the best parts of a solution, and use those pieces to make a better solution. The term Genetic Algorithm is sometimes strictly confined to solutions which can be represented entirely as bit-strings. A more general term would be evolutionary programming. We use the term GA in the broader sense. In any GA, the solution of a problem is defined by a gene or set of genes. The entire set of genes which make up a solution is called the genotype. In a broad definition of GA, genes can be represented as any computer manipulatable data type.

Modifying Genes

A genetic algorithm creates new solutions by modifying the previous ones. It can make these new genes using methods of recombination and mutation. Recombination is the process of mixing two or more solutions together. Crossover recombination involves combining separate parts of the genes to form a new gene. To understand the process of crossover recombination, think of the genes as pieces of string. If you have two parents, you'll have two pieces of string. To recombine the parents into new children, cut the strings into equal length segments at the same relative points. Reassemble the strings, making sure to swap every other piece, and you'll have a totally new solution from old genes. The number of cut points in the genes can be varied as 1-point, 2-point or n-point crossover. 1-point crossover is just a special case of 2-point crossover where the first cut point is at the very beginning of the genes. 2-point crossover is just a case of n-point crossover where n equals two.

Figure 1: Recombination

Mutation in nature occurs when a mistake is made during the reproduction process. In a GA, a mutation is simply a random change to one or more parts of a gene. It can occur either during recombination, or it can be placed into the genetic data at some other time.

Fitness

Mixing together old solutions to make new ones is useless without the ability to know which is better. That's why calculating fitness is the most important step in a GA. A genetic algorithm must be able to incrementally move toward better solutions. This requires that partial fitness be rewarded. A sub-optimal solution can't be thrown out, because there may be parts of it that are better than other solutions. Recombination of solutions should hopefully bring out the best parts of the antecedent solutions while removing most of the unfit genes.

Genetic algorithms are based entirely on judgement calls to know if the GA is improving, or how well the recombination method is working. A poorly designed fitness function could have your genetic algorithm going around in circles with bad solutions. Since a fitness must be calculated for every single individual in every generation, the speed of the function is very important. Eventually you'll want to end the GA search, so you need to know how close you came to succeeding.

Distrubuted GA

Distributed GA is the extension of genetic algorithms to multiple computers looking for solutions to the same problem. There are two basic ways of sharing genetic data between the separate machines, the Island Model and the Shared Pool Model.

Island Model

In the Island Model of DGA, represented in Figure 2, the multiple machines work almost totally independently of each other. A computer will occasionally share genetic information with one or more computers.

Figure 2: Island Model of DGAFigure 3: Shared Pool Model of DGA

Shared Pool Model

Every N cycles in a Shared Pool Model, shown in Figure 3, a machine will put it's best genes into the pool and then take some genes out for a new cycle of solutions. The pool stores the best genes from all the machines working on the problem. A cycle is not the same as a generation, because a client may go through many cycles while the genes it's using may be genes from an earlier generation which have been in the pool for a long time.

Scheduling Genetic Algorithm

The genetic algorithm for finding teaching schedules would be implemented as a class based on a distributed genetic class library, created by us, called DGO (Distributed Genetic Objects). We use the Shared Pool Model of distributed genetic algorithms. The first step in this task was to find a way to represent a classroom schedule in a form suitable for genetic manipulation. The schedule representation needed two things. It had to be able to be split up and recombined and still form a valid schedule, and we had to be able to quickly calculate a numeric fitness of a schedule.

Genotype

First we specifically defined what a schedule was, and what were the parts that made up a schedule. A minimal schedule would be a single class. A schedule for a single class is a tuple of the form { Person, Room, Time }, so a complete schedule would be a set of class tuples. To represent the schedule for the GA, we defined a schedule to be an array of structures, one for each class to be scheduled. Each structure in this array would represent the class tuple, with fields for Person, Room and Time. Every person, room and time that would be part of the schedule was assigned a unique numeric id. This would allow a schedule to be simply a bunch of numbers which can be easily randomized, split and recombined by a GA.

Fitness and the Constraints Model

For a scheduling genetic algorithm to work, we needed some way to judge the fitness of a schedule. Obviously there wasn't a perfect schedule to compare it to, but we could, in a way, compare it to the worst possible schedule. By defining a set of constraints which a good schedule should meet, we could deduct from the fitness of a schedule for every constraint it failed. Since a schedule is based on the elements Person, Room and Time, the basic constraints for any schedule should also involve these parts. We recognized eleven basic constraints. These constraints are split into two groups; constraints involving any single class, and constraints over all classes in a schedule. For each constraint we assigned a numeric value which would be subtracted from the fitness of a schedule if it failed to meet the constraint. Calculating the fitness is just a matter of checking a schedule against every constraint.

The eleven constraints are:

  1. Person constraints for each person which state whether that person is able to teach a specific class or how much they want to teach a class.
  2. Room constraints for each room which state if a class can be taught in that room.
  3. Time constraints for each time state if a class can be taught at that time.
  4. Person & Room constraints which state whether or not someone would be able to teach in that room.
  5. Person & Time constraints which lists a persons preferences or requirements for teaching at specific times.
  6. Room & Time constraints which state when certain rooms can be used.
  7. Person & Room & Time constraints which state whether a person can teach in a certain room at a certain time.
  8. Person constraints over all classes which specifies the normal course load for a teacher.
  9. Time constraints over all classes which states whether some classes should be taught at the same time.
  10. Person & Time constraints which say that a person can only teach one class at a time.
  11. Room & Time constraints over all classes which state that multiple classes can't be taught in the same room at the same time.

A constraints file was designed so that there was a table for every one of the constraints listed above. For example, for the Person & Time constraint, a table would consist of a list of Persons. With each Person would be a list of times, and a positive or negative fitness value which would be added to the schedule's overall fitness if these constraints were met.

Experiments

We designed four sets of tests to evaluate the performance of the Distributed Genetic Objects system in solving the scheduling problem. We chose four variables in the DGO which could be modified independent of each other. They were client size, pool size, cycles between transfer, and mutation rate. There were others that could have been chosen, such as the number of crossover points used in genetic recombination. Each of the tests was run on a lab of 23 Hewlett-Packard 712/60 NEXTSTEP workstations and also on a single Intel Pentium 100 NEXTSTEP system to compare the differences between an individual computer and a group of machines working in parallel on the problem.

A benchmark was performed to see how long it took for the two different systems to process a certain number of generations. The Intel system was found to be approximately two times faster than the HPs in performing this benchmark. One of the goals we had in designing the Scheduling GA was to make sure the fitness could be quickly calculated. This is one area which could still be improved. The Intel system could generate and check about 66 schedules per second, while the HP systems could do 33 schedules per second. The overall performance of the system could be improved by optimizing the fitness function and by fine-tuning the constraints tables.

Client population size

The client size experiment involved varying the number of schedules that each client processed per cycle. A large number of schedules per cycle would mean that each client would process a larger group of schedules making it more likely to find better schedules. A smaller number of schedules would mean that each cycle would be performed faster. A client could more quickly share it's results with the pool.

In the experiment eight different client sizes were tested ranging from 80 to 14,000 schedules. All of the different client sizes resulted in equally good schedules, but median sizes found the better schedules faster. With multiple clients 400 schedules per cycle was the fastest while on a single client where speed in sharing didn't matter, 2000 schedules per cycle resulted in the quickest solutions.

Pool size

The pool size experiment involved varying the number of schedules that are stored in the central pool in the Shared Pool model. These schedules in the pool represent only the fittest schedules from all the clients. A larger pool size would allow for more less-fit schedules to be kept and shared with other clients. A smaller pool size would cause a more restrictive set of schedules to be shared.

Tests were run varying the pool size from 4 to 250 schedules. The results showed that there was no disadvantage to using small pool sizes, but that larger pool sizes had a very detrimental effect both on speed and quality of the results. Figure 4 shows a portion of the results of three different pool sizes, and how large pools are more detrimental. The best pool size we found was 16 schedules. We theorize that large pool sizes are bad because retaining many genotypes in the pool dilutes the effect of having good genes spread quickly through the population. Another disadvantage to large pool sizes is the cost of communication between the pool and client. For every cycle performed by the client it sends and receives as many schedules as there are in the pool.

Figure 4 Figure 5
Figure 4: Varying Pool SizeFigure 5: Varying Mutation Rate

Cycles between transfers

This experiment studied the effect of having the clients work independently for shorter or longer periods of time (cycles of mating/evaluating) before transferring their best genes to the pool server. A generation in our multi machine system is the number of times that a single gene has been improved by mating. This is not comparable to the number of cycles of mating/evaluation a client does.

On an individual machine this change had nearly no effect, while on multiple machines, increasing the number of cycles per transfer resulted in a slight decrease in the speed of finding a good schedule. Increasing the cycles per transfer causes a client to wait longer before sharing it's genes with the pool and the rest of the clients. More sharing results in quicker solutions.

Mutation rate

The mutation value in the genetic algorithm is defined as a number between 0 and 231-1. Every time a byte is copied from a parent to a child, the mutation value would be compared to a random number within the same range. If the random number is smaller, then a random byte would be placed in the child, otherwise an actual copy would be perform. Therefore the probability of a byte being mutated, Pr{byte}, would be the mutation value divided by 231-1.

In the scheduling genotype, each class in the schedule is represented by three bytes. The probability that a class is mutated, Pr{class}, equals the probability that at least one byte in the class changed. This is represented by Equation 1. The probability that a schedule changes, Pr{sched}, equals the probability that at least one class in a schedule was changed as represented by Equation 2. The expected number of classes mutated for each reproduction, E[mutations], is shown in Equation 3. The number of classes in equations 2 and 3 is C.

Equation(1)
Equation(2)
Equation(3)

The mutation experiments tested eight mutation values from 0 to 1 billion. For comparing the results, we converted this value to the probability of at least one class being mutated during the reproduction of a schedule. A probability of 0.73 was found to be the optimum mutation rate for our experiments. With this mutation rate the expected number of classes mutated was 1.30. Very low and very high mutation rates gave terribly poor results. Figure 5 shows the results of three different mutation rates. The next lowest and hight mutation probabilities we tested were 0.12 and 0.99 with expected class mutations of 0.13 and 12.53 respectively.

Conclusion

In the introduction we described two problems we hoped to overcome by using a genetic algorithm to create classroom schedules. The first was the amount of time and menial labor that a person had to spend on each schedule. The second problem was the fact that schedules are often just modified from last semester, so it may not always match what people want. We also wanted to find the best parameters for the distributed scheduling genetic algorithm.

Our tests found that an optimum schedule based on the constraints could be found in a few hours, and most of this time would be computer time, freeing up valuable human time. Although some effort must be put into first creating the constraints database, modifying the constraints for a new semester is a much easier task than trying to change an existing schedule. The human schedule actually had a lower fitness value than the DGA schedule, but this just means that the DGA schedule more closely met the constraints we made up. We couldn't know what constraints were used for the actual schedule, because no one had ever listed all the constraints with their levels of importance.

Because a genetic algorithm works by starting with random solutions and building upon them to find the best solution, the schedule from one semester to the next will not always be so similar. The schedule will represent the best match to the constraints instead of just being a reworked version of last semester.

Our experiments have given us a set of parameters which work best for the schedule GA, and we can use these values to predict new parameters for different sets of constraints.

The next step in this project would be to gather all of the information necessary to design an accurate constraints file that would match the needs of each class and professor, and then to gather opinions about the quality of the found schedule and find out if everyone involved would be satisfied with the results.