Intelligent code with Hexapawn

February 2004

download source

Board layout
0 1 2
3 4 5
6 7 8
Move notation
As you can see from the grid, the move notation resembles Java array index references. The MoveState.toString() method displays 0-->3, indicating a move from array index 0 to 3. In this article, moves are written as B0-->3 for a computer (black) move and W7-->4 for a human (white) move.
BoardState notation
The article and Java code rely on each move being linked to a BoardState object. To keep overhead low, the BoardState's data is held in a char array -- "BBBNNNWWW" for the initial position -- indexed as in the grid above.
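The full BoardState class ships with the downloadable source; the following is only a minimal sketch of the idea, with the constructor and method names assumed:

    // Minimal sketch of the BoardState idea: one char per square, indexed as
    // in the grid above ('B' = black, 'W' = white, 'N' = none).
    public class BoardStateSketch {

        private final char[] squares = "BBBNNNWWW".toCharArray(); // initial position

        // Apply a move by shifting the piece from one index to another.
        public void applyMove(int from, int to) {
            squares[to] = squares[from];
            squares[from] = 'N';
        }

        public String toString() {
            return new String(squares);
        }
    }

After applyMove(7, 4), toString() returns "BBBNWNWNW" -- the kind of key the advisor will later use to file the move lists for a position.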
Training
To get an indication of how the computer learns, you can watch its progress without spending hours training it.

You can achieve this by starting every game with the same move. For example, W7-->4 leaves the computer with 4 possible replies. Because the available moves are accessed in stack order rather than at random, the computer will initially respond with B0-->3. It will keep responding this way until it discovers that this is a losing move; then it will choose from the other 3 possible moves.

Once the computer begins winning games, it will always respond in the same way. For example, a good response to W7-->4 is B0-->4. As wins accumulate, this move's weight keeps it at the top of the stack, so it is always chosen.

Introduction

Hexapawn is a simple 3-by-3 board game played with chess pawns. The game was originally invented by Martin Gardner and described in his book "The Unexpected Hanging and Other Mathematical Diversions: A Classical Collection of Puzzles and Games from Scientific American". Gardner describes a strategy using 25 match boxes and 116 jelly beans that allows a robotic device to "learn" appropriate strategies for beating human opponents.

This article describes how to build this device without the match boxes or jelly beans, using a simple strategy: teach the machine by letting it make random moves against an opponent. Over the course of multiple games, the machine builds a collection of possible games and applies punishments and rewards depending on the outcome of its early games against a human user.

This is a diversion from material that describes rather complex "artificial intelligence" systems. It also illustrates the kind of simple, non-rules-based system that developers can build into their own business applications. Such systems are particularly useful when requirements cannot be fully captured as business rules up front. An example is a reporting tool that identifies trends and abnormalities without having specific logic to look for those phenomena.

Bottom-up programming???

The bulk of software written today probably falls into a "top down" style of design, in which a programmer encodes every logical branch needed to capture a body of knowledge. If rules or strategies are required, they are written as logical statements that are fed into the program.

This article explores a style of program design that starts without pre-ordained knowledge and lets the program train itself using simple, straightforward techniques. If you have delved into artificial intelligence program design, you will see some similarities; the approach amounts to a rudimentary neural network.

We will start with a chess-like game that is coded in Java with very minimal "rules of the road." The object is to design an application that allows the computer to play intelligently without hard-coding winning strategies. It does this largely by a process of learning from its mistakes, remembering board positions and keeping a record of winning and losing moves in a game.

A very simple game...

Hexapawn is played on a 3-by-3 board with chess pawns. Pawns move exactly as in chess -- advancing straight ahead and capturing diagonally -- and the object is to advance a pawn to the last row or to leave the opponent unable to move. There are a few additional rules (a sketch of how they might be encoded follows the list):

  • there is no "en passant" move
  • white moves first
  • the computer plays the black pieces
  • the game is won when one side advances to the last row or when the opposition has nowhere to go.
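The rules layer itself is not listed in the article; the sketch below is a hypothetical encoding of these rules for the black side, with the helper name and the int[]{from, to} move format assumed (the real code works with MoveState objects):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical move generator for black, using the board string and the
    // grid indexing described earlier.
    public class RulesSketch {

        public static List legalBlackMoves(char[] board) {
            List moves = new ArrayList(); // holds int[]{from, to} pairs
            for (int i = 0; i < 9; i++) {
                if (board[i] != 'B') continue; // only black pawns move here
                if (i + 3 > 8) continue;       // already on the last row
                int col = i % 3;
                if (board[i + 3] == 'N')             // advance into an empty square
                    moves.add(new int[] { i, i + 3 });
                if (col > 0 && board[i + 2] == 'W')  // capture down-left
                    moves.add(new int[] { i, i + 2 });
                if (col < 2 && board[i + 4] == 'W')  // capture down-right
                    moves.add(new int[] { i, i + 4 });
            }
            return moves;
        }
    }

Applied to "BBBNWNWNW" (the position after W7-->4), this yields the four replies mentioned in the Training section above: 0-->3, 0-->4, 2-->5 and 2-->4.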


Jeffrey Satinover's "The Quantum Brain" describes this as a Hexapawn Educable Robot -- a "device" that remembers the 25 board states possible in this game -- and notes that a punishment-only scheme works better for the Hexapawn device. However, to train the computer in as few games as possible, it is better to reward victories as well as punish losses, and that is exactly what this version does.

Critical to both our design and the Satinover/Gardner design is that we separate the rules of the game from game strategy. Basically, the actual rules of play need to be encoded; strategy should not be. Admittedly, this game is simple enough that a strategy layer could be very brief, but if more squares were added -- or if the game were as complex as, say, a Reversi (Othello) style game with 64 squares -- this would not hold true.

Game design...

As Satinover points out, there needs to be a sharp distinction between basic game rules and strategy. To this end, this version of Hexapawn separates those responsibilities. Three classes represent the brains:

  • GameManager acts as a controller for the GUI. It sets up the pieces initially and knows how to tell the computer to start thinking once a human (white) move has been registered.
  • MoveManager acts as a "rules of the road" layer. Its responsibility is generating all possible moves, whether for the human or the computer. It requests "suggestions" from another layer, but when these are not forthcoming, it responds with a random move.
  • MoveAdvisor houses the computer intelligence layer. Actually, there is not much intelligence in it, but it does have a long memory. The idea is that it should remember every game state it encounters, along with the moves available to the computer in that board state. MoveAdvisor also remembers the current game as it is played. At the end of a game, the MoveManager notifies it of the victory or loss, and it responds by incrementing or decrementing a value within each remembered game move.

Two lightweight classes carry the data:

  • BoardState, which encodes the various board positions in a simple string. The key is B=Black, W=White, N=None for each position on the nine-square board. A string like "BBBNWNWNW" is sufficient to represent the board after white's opening move 7-->4. The structure of BoardState may resemble the DNA-like state encodings of genetic programming, but the design is purely for convenience: you need a class lightweight enough to hold many more than 25 possibilities, should Hexapawn ever evolve into a 64-square board with 32 pieces, like chess.
  • MoveState, a simple structure holding a fromMove and a toMove index. Additionally, it holds "weight" data, which is what each victory or loss adjusts. (A sketch of this class follows the list.)
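Neither data class is listed in the article, but a MoveState sketch consistent with the behavior described is short; decrementWeight() appears later in registerLoss(), while incrementWeight() is assumed here as its reward counterpart:

    // Sketch of MoveState: a from/to index pair plus the weight that the
    // win/loss feedback adjusts.
    public class MoveState {

        private final int fromMove;
        private final int toMove;
        private int weight; // raised on wins, lowered on losses

        public MoveState(int fromMove, int toMove) {
            this.fromMove = fromMove;
            this.toMove = toMove;
        }

        public int getWeight() { return weight; }

        public void incrementWeight(int amount) { weight += amount; }

        public void decrementWeight(int amount) { weight -= amount; }

        public String toString() {
            return fromMove + "-->" + toMove; // the 0-->3 notation used here
        }
    }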

Hexapawn in action...

The intelligence behind the game is the MoveAdvisor. While its role is to suggest possibilities, it can only do so if it sees every move of a game, so the MoveManager calls its getComputerMove() method with parameters indicating the current state of the board and all possible moves for the computer in that state.

    public MoveState getComputerMove(BoardState b, List moves){

        // Look up the list of moves we have on file for this board position.
        List smartMoves = (List) boardStates.get(b.toString());
        MoveState move;
        if (smartMoves == null){ // we haven't seen this boardState before
            boardStates.put(b.toString(), moves);
            return null;
        } else {
            // The list is kept sorted by weight, so index 0 is the best move.
            move = (MoveState) smartMoves.get(0);
            gameMoves.add(move); // remember the move as part of the current game
            return move;
        }
    }

If the advisor has seen this BoardState before, it pulls out the list of moves associated with it and returns the first move in the list. The smartMoves list has already been sorted by each MoveState's weight, so the best move is at index zero.

However, if the advisor has never seen this BoardState before, it registers the state and its moves; having no intelligent suggestion for a move, it returns null.

In this situation, the MoveManager picks a random move from those available to it. It then calls MoveAdvisor's registerMoves() to track the move within the game. It is essential that the MoveState being tracked is also the one referenced by the MoveAdvisor's internal boardStates Hashtable. In Jeffrey Satinover's design, each move is a jelly bean (MoveState) that may or may not be returned to its board state (the matchbox described above). In the Java version, you need to make sure that the MoveState reference in the gameMoves List and the one in the boardStates table are one and the same. However, no moves (jelly beans) are ever removed from the boardStates Hashtable, because a losing move in one game may prove to be a winning move in another sequence.
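The MoveManager's side of this exchange is not listed in the article; here is a rough sketch, assuming registerMoves() takes the chosen MoveState and that a method like the one below is what gets called on the computer's turn:

    import java.util.List;
    import java.util.Random;

    // Hypothetical sketch of the MoveManager fallback described above.
    public class MoveManagerSketch {

        private final MoveAdvisor advisor = new MoveAdvisor();
        private final Random random = new Random();

        public MoveState chooseComputerMove(BoardState board, List possibleMoves) {
            MoveState move = advisor.getComputerMove(board, possibleMoves);
            if (move == null) {
                // Unseen board state: fall back to a random legal move. The
                // pick comes from the same list the advisor just stored, so
                // the tracked MoveState and the one in the boardStates table
                // remain one and the same object -- the shared "jelly bean".
                move = (MoveState) possibleMoves.get(random.nextInt(possibleMoves.size()));
                advisor.registerMoves(move);
            }
            return move;
        }
    }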

When the game eventually ends, the gameMoves List is emptied out for a new game. Before it is, the MoveStates within have their weights adjusted, depending on whether the computer won or lost.

    public void registerLoss(){

        // Punish every move the computer made in the lost game.
        for (int i = 0; i < gameMoves.size(); i++){
            ((MoveState) gameMoves.get(i)).decrementWeight(1);
        }

        gameMoves = new ArrayList(); // start another collection for a new game
        this.sortMoves();            // re-rank the moves under the new weights
        this.displayMoveStates();
    }

In addition, a call to sortMoves() is required to move the better-weighted MoveStates toward the top of each list.

    public void sortMoves(){

        Iterator it = boardStates.keySet().iterator();
        while (it.hasNext()){
            List moves = (List) boardStates.get(it.next());
            // keep the better moves at the top
            Collections.sort(moves, weightComparator);
        }
    }

As the game proceeds, the poorer moves remain well down in the list.
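The article lists only the loss path. Because this version rewards victories as well, MoveAdvisor presumably carries a symmetric method, and sortMoves() needs its weightComparator; both bodies below are assumptions reconstructed from the described behavior:

    // Assumed reward counterpart to registerLoss(): the same walk over the
    // game's moves, with weights going up instead of down.
    public void registerWin(){

        for (int i = 0; i < gameMoves.size(); i++){
            ((MoveState) gameMoves.get(i)).incrementWeight(1);
        }

        gameMoves = new ArrayList(); // start another collection for a new game
        this.sortMoves();
        this.displayMoveStates();
    }

    // Assumed comparator behind sortMoves(): descending weight, so the
    // best-scoring move lands at index zero.
    private final Comparator weightComparator = new Comparator(){
        public int compare(Object a, Object b){
            return ((MoveState) b).getWeight() - ((MoveState) a).getWeight();
        }
    };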

Next steps in intelligent design...

The MoveAdvisor as written provides little intelligence beyond remembering raw moves and weighting them according to their success or failure. When no weights are available, it has no suggestion to make. By adding a true intelligence layer, the game could always provide a "smart" move to fulfill the MoveManager's requests.

As currently designed, the MoveAdvisor chooses the first move in the move collection. However, the collection may contain several moves with the same weight. In these cases, a MoveStrategy layer -- essentially a layer designed to spot patterns in moves -- might select something other than the first move. If you study the game a little, you will see that an excellent strategy is to always capture a piece if you can. A strategy layer could discover this by observing that the most successful moves (i.e., 0-->4, 2-->4, 1-->3, 1-->5) have an even difference between the toMove and the fromMove. However, learning this would require many more training games than would be practical for a human trainer.
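That observation is mechanical enough to test in code; here is a hypothetical helper a strategy layer might converge on (not part of the article's source):

    // On a 3-wide board, a black diagonal capture shifts the index by +2 or
    // +4, while a straight advance shifts it by +3. An even difference
    // therefore marks a capture.
    public static boolean isCapture(int fromMove, int toMove) {
        return (toMove - fromMove) % 2 == 0;
    }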

The current design could also be enhanced with extra "sensors". The MoveAdvisor is currently aware only of its own moves and whether they win or lose games. It has no awareness of its opponent's moves, so it cannot learn to prefer a move simply because it leaves the opponent with nowhere to go.

Candid observations...

If you are used to the kind of programming where you feed rules into an engine, you may find the training a little haphazard. It takes some time to train the computer sufficiently and, even then, it can still make some rather unintelligent moves. The result is more like a statistical intelligence: it succeeds most of the time, but not always. For the simple game of Hexapawn, however, it suffices.


Copyright (c)2004 Gervase Gallant gervasegallant@yahoo.com