|
IntroductionHexapawn is a simple 3-by-3 board game played with chess pawns. The game was originally invented by Martin Gardiner in his book "The Unexpected Hanging and other Mathematical Diversions: A Classical Collection of Puzzles and Games from Scientific American". Gardiner describes a strategy that utilizes 25 match boxes and 116 jelly beans which would allow a robotic device to "learn" appropriate strategies for beating human opponents. This article describes how to build this device without the match boxes or jelly beans, but using a simple strategy of teaching the machine to learn by allowing it to make random moves against an opponent. Over the course of multiple games, the machine builds a collection of possible games and applies punishments/rewards depending on the outcome of its early games against a human user. This a diversion from material that describes rather complex "artificial intelligence" systems. It is also indicative of simple non-rules-based systems that can be built within their own business applications. This is particularly useful when systems are built without a complete understanding of the business rules that define system requirements. An example is a reporting tool that identifies trends and abnormalities without having specific logic to look for those phenomena. Bottom-up programming???The bulk of software written today probably falls into a "top down" style of design where a programmer encodes all logical branches that might resemble a set of knowledge. If rules or strategies are required, they are written in logical statements that are fed into a program. This article delves into a style of program design that starts up without pre-ordained knowledge and allows the program to train itself using simple, straightforward techniques. If you have delved into artificial intelligence program design, you will seem some similarities, which make this approach a rudimentary neural network. We will start with a chess-like game that is coded in Java with very minimal "rules of the road." The object is to design an application that allows the computer to play intelligently without hard-coding winning strategies. It does this largely by a process of learning from its mistakes, remembering board positions and keeping a record of winning and losing moves in a game. A very simple game...Hexapawn is a 3-by-3 square board game played with chess pawns. The moves are exactly as in the game of chess and the object of the game is to advance to the last row or to prevent the opposing player from moving. The pawn advances forward and captures diagonally, just as in chess. There are a few additional rules: The game was originally invented by Martin Gardiner in his book "The Unexpected Hanging and other Mathematical Diversions: A Classical Collection of Puzzles and Games from Scientific American". Gardiner describes a strategy that utilizes 25 match boxes and 116 jelly beans which would allow a robotic device to "learn" appropriate strategies for beating human opponents. This article describes how to build this device without the match boxes or jelly beans, but using a simple strategy of teaching the machine to learn by allowing it to make random moves against an opponent. Over the course of multiple games, the machine builds a collection of possible games and applies punishments/rewards depending on the outcome of its early games against a human user. Jeffrey Satinover's "Quantum Brain" describes this as a Hexapawn Educable Robot -- a "device" that remember a series of 25 board states that are possible with this game and how a punishment-only scheme works better for the Hexapawn device. However, for the purpose of training the computer in as few games as possible, it is better to reward victories as well as losses and that is exactly what we do in this version. Critical to both our design and the Satinover/Gardiner design is that we separate the rules of the game from game stategy. Basically, the actual rules of play need to be encoded. Strategy should not be. Admittedly, this game is simple enough that a strategy layer could be very brief, but if more squares were added -- or if the games were as complex as , say, a Reversi (Othello) style game with 64 squares, this would not hold true. Game design...As Satinover points out, there needs to be a sharp distinction between basic game rules and strategy. To this end, this version of Hexapawn is designed to separate those responsibilities. There are three classes representing the brains: Hexapawn in action...The intelligence behind the game is the MoveAdvisor. While its role is to suggest possibilities, it only works when the MoveManager registers each move of a game. It does this when the MoveManager calls the getComputerMove() method with parameters that indicate the current state of the board and all possible moves for the computer within that state.
public MoveState getComputerMove(BoardState b, List moves){
List smartMoves = (List) boardStates.get(b.toString());
MoveState move;
if (smartMoves == null){ //if we haven't seen this boardState before
boardStates.put(b.toString(), moves);
return null;
} else {
move = (MoveState) smartMoves.get(0);
gameMoves.add(move);
return move ;
}
}
If the advisor has seen this BoardState previously, it will pull out the list of moves associated with it and return the first move in the list. The smartMoves list has been previously sorted by each MoveState's weight so that the best move is at index zero. However, if the advisor has never seen this BoardState before, it registers it and its moves, but has no intelligent suggestions for a move, so it returns null. In this situation, the MoveManager picks a random move from the moves available to it. It then calls MoveAdvisor's registerMoves() to track the move within the game. It is essential that the MoveState tracked is also referenced by the MoveAdvisor's internal boardStates' HashTable. In Jeffrey Satinover's design, each move represents a jelly bean (MoveState) that may or may not be returned to the BoardState (the matchbox described above...). In the Java version, you need to make sure that the MoveState reference within the games List and the boardState table are one and the same. However, no moves (jelly beans) are removed from the boardStates HashTable, because a losing move in one game may exhibit winning capabilities in another sequence. When the games eventually ends, the games List is emptied out for a new game. Before it is, the MoveStates within will have their weights adjusted, depending on whether the computer won or lost.
public void registerLoss(){
for (int i = 0; i < gameMoves.size(); i++){
((MoveState) gameMoves.get(i)).decrementWeight(1);
}
gameMoves = new ArrayList();// start another collection for a new game.
this.sortMoves();
this.displayMoveStates();
}
In addition, a call to the private sortMoves() is required to get the better weighted MoveStates near the top of the list.
public void sortMoves(){
Iterator it = boardStates.keySet().iterator();
while (it.hasNext()){
List moves = (List) boardStates.get(it.next());
//keep the better moves at the top
Collections.sort(moves,weightComparator);
}
}
As the game proceeds, the poorer moves remain well down in the list. Next steps in intelligent design...The MoveAdvisor as created in this version provides little intelligence other than remembering raw moves and weighting them according to their success or failure. When there is no weight available, it has no suggestion for a move. By adding a true intelligence layer, the game could always provide a "smart" move to fulfill the MoveManager requests. As currently designed, the MoveAdvisor chooses the first move in the Move collection. However, the collection may contain several moves with the same weight. In these cases, a MoveStrategy layer -- essentially a layer design to spot patterns in moves -- might select something other than the first move. If you study the game a little, you will see that an excellent strategy is to always capture a piece if you can. A strategy layer would discover this by observing that the most successful moves ie, 0-->4 , 2-->4, 1--3>, 1-->5 have an even numbered difference between the fromMove minus the toMove. However, to learn this a Strategy layer would require many more training games than would be practical for a human trainer. The current design could also be enhanced with the addition of extra "sensors". The MoveAdvisor is currently aware only of its move and whether these moves win or lose games. It does not have an awareness of its opponents moves and is unable to advise of a strategy where a move should be selected if it prevents its opponent from moving. Candid observations...If you have been used to the type of programming where you have to feed rules into an engine, you may find the training a little haphazard. It does take some time to train it sufficiently and,even then, it is possible for the computer to make some rather unintelligent moves. It can seem that it is more like a statistical intelligence, where it will succeed most of the time, but not always. For the simple game of Hexapawn, it does suffice however. Copyright (c)2004 Gervase Gallant gervasegallant@yahoo.com | |||||||||||||||||||