Copyright (c) 2013 John L. Jerz

A Proposed Heuristic (Chapter 3 - A Specification for an Evaluation Function)


Return to Previous Chapter: Chapter 2 - An Explanation of the Proposed Heuristic

Continue to Next Chapter: Chapter 4 - A Refinement of the Search Process

Everything is vague to a degree you do not realize till you have tried to make it precise. - Bertrand Russell
 
The goal is to transform data into information, and information into insight - Carly Fiorina, American business executive, best known as former CEO (1999–2005) and Chairman of the Board (2000–2005) of Hewlett-Packard 
 
Everything that can be counted doesn't necessarily count; everything that counts can't necessarily be counted. - Albert Einstein

The indispensable first step to getting the things you want out of life is this: decide what you want. - Ben Stein

It's not hard to make decisions when you know what your values are.- Roy Disney

Planning is bringing the future into the present so that you can do something about it now. - Alan Lakein

It is far better to foresee even without certainty than not to foresee at all. - Henri Poincare

All human situations have their inconveniences. We feel those of the present but neither see nor feel those of the future; and hence we often make troublesome changes without amendment, and frequently for the worse. - Benjamin Franklin

Whether your objectives have been given to you by your boss or you're creating objectives, make sure they are SMART—specific, measurable, achievable, realistic, and time bound.

Until you can measure something and express it in numbers, you have only the beginning of understanding. - Lord Kelvin

If you want it, measure it. If you can't measure it, forget it. - Peter Drucker

3. A Specification for an Evaluation Function which Uses this Heuristic

"The main task of an intelligent system is to intelligently determine what options will be looked at... the building blocks for this are grouping, focusing attention, and searching" - van Wezel, Jorna, Meystel, Planning in Intelligent Systems, p.529
 
"the key to building practical intelligent systems lies in understanding how to focus attention on what is important and ignore what is irrelevant. Attention is a mechanism for allocating sensors and focusing computational resources on particular regions of time and space. Attention allows computational resources to be dedicated to processing input that is most relevant to behavioral goals. Focusing of attention can be accomplished by directing sensors toward regions in space classified as important and by masking, windowing, and filtering out input from sensors or regions in space classified as unimportant." - James Albus, Engineering of Mind, p.80
 
"The brain uses vast amounts of memory to create a model of the world. Everything you know and have learned is stored in this model. The brain uses this memory-based model to make continuous predictions of future events. It is the ability to make predictions about the future that is the crux of intelligence." - Jeff Hawkins, On Intelligence, p.6
 
"Yet it is my contention that the two, abstract planning and concrete calculation, are really closely tied together; that in one as in the other the most important thing is the ability to focus one's attention on the few relevant moves, and dispose of the rest quickly and confidently." - Julio Kaplan, The ABC's of chess, Chess Life and Review, July 1978
 
What does the evaluation function look like for the proposed heuristic? The following specification is as detailed as I can determine at this point in time:

Version 1.4, 30 August 2007

1. For each terminal position we evaluate in our search tree, we generate unrestricted "future mobility" maps for each piece, showing the moves it can make 1, 2, and 3 moves into the future. We consider moving pieces out of the way, or capturing pieces, in order to calculate this "future mobility" accurately (see Appendix E, Example 1, where we work out the contents of the mobility tables for three pieces in a sample middle-game position).

2. We award each piece a bonus based on how much territory it covers on the board (and how much damage it can inflict or defensive support it can provide) within 3 moves. This bonus could stand in lieu of fixed material values such as 300 for a bishop or 500 for a rook. The bonus for mobility in itself will be small - it is the mobility that leads to positional pressure on another piece that we are primarily interested in.

3. We give a piece an offensive score based on the number and type of enemy pieces it can attack in 3 moves, and a defensive score based on how many of our own pieces it can move to defend in 3 moves. Each bonus is reduced for every move it takes to reach the piece being attacked or defended. This information, too, is derived from the "future mobility" move maps we just calculated. Extra points can be given for weak or undefended pieces that we can threaten.

4. The proposed heuristic also determines king safety from these "future mobility" move maps. We penalize our king if our opponent can move pieces into the 9-square template around our king within a 3-move window. The penalty is larger if the piece can arrive in 1 or 2 moves, or if the piece is a queen or rook, and we add a further penalty when multiple enemy pieces can attack the same square near our king. Our king is free to move to the center of the board - as long as the enemy cannot mount an attack. The incentive to castle will not be a fixed value, such as a quarter pawn, but rather the reduction obtained in the enemy's ability to move pieces near our king (the rook involved in the castling maneuver will likely see increased mobility after castling as well). The king will come out of hiding "naturally" when the number of pieces on the board is reduced and the enemy no longer has the potential to move its remaining pieces near our king. We are likewise free to advance the pawns protecting our king, again as long as the enemy cannot mount an attack on the monarch. The opponent's potential ability to mount an attack on our king is the heuristic we use as the basis for king safety. Optionally, we can consider the realistic restrictions that our own pieces place on our opponent's ability to move pieces near our king, as discussed in point 6 below. For "king mobility" in the middle game, we can reward our king for the potential to move to a safer square: for example, if our king can castle in the future, we would give it a bonus based on its ability to move to a safer square, and on how safe that square actually is.

5. Pawns are rewarded based on their chance to reach the last rank, and on what they can do (pieces attacked and defended in 3 moves, and whether or not they are blocked or movable). The piece mobility tables we generate should help us identify pawns that cannot be defended by other pawns or by other pieces - it is this weakness that we should penalize. Doubled or isolated pawns that cannot potentially be attacked or blockaded by our opponent should not be penalized. A pawn can also be awarded a bonus based on the future mobility and offensive/defensive potential of the queen that would result if it reached the last rank, with this bonus reduced for each move it would take the pawn to get there.

6. The diagonal squares in front of enemy pawns are unlikely locations for our pieces (because the pawn could capture them), and so we should reduce the mobility bonus for pieces that trace their mobility through stopping points on these particular squares. [In fact, a more ambitious approach would reduce the mobility bonus for all pieces whenever they trace mobility through squares attacked by lesser-valued pieces. This concept would result in a more positional style of play.]
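The per-square scoring described in points 2, 3, and 6 above can be sketched in code. The weights used here (a base of 12 points, cut to one quarter through pawn-attacked squares) are illustrative assumptions only; as noted later, the actual values would have to be determined by testing:

```c
/* Sketch of per-square future-mobility scoring (points 2, 3 and 6 above).
   map[sq] holds the number of moves (1..3) the piece needs to reach sq,
   or 0 if unreachable; pawn_attack[sq] is nonzero when an enemy pawn
   attacks sq. The weights 12 and /4 are illustrative assumptions. */
int future_mobility_score(const int map[64], const int pawn_attack[64])
{
    int sq, score = 0;
    for (sq = 0; sq < 64; sq++) {
        int bonus;
        if (map[sq] == 0)
            continue;              /* square unreachable within 3 moves */
        bonus = 12 / map[sq];      /* smaller bonus the farther away */
        if (pawn_attack[sq])
            bonus /= 4;            /* point 6: a limiting factor reduces it */
        score += bonus;
    }
    return score;
}
```

A real implementation would also weight the bonus by what sits on the target square (point 3) and near the kings (point 4); this sketch shows only the distance and limiting-factor reductions.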

Consider now the starting position where our king-side knight has moved to square f3. Let's look at the piece mobility tables - an 'x' represents a potential move.

nf3.jpg
Figure 3.1

1-move map

nf3map1.jpg
Figure 3.2

2-move map

nf3move2b.jpg
Figure 3.3

3-move map

nf3map3a.jpg
Figure 3.4

As mentioned previously, before we trace mobility through a square occupied by a friendly piece, we must first check that the friendly piece can move, and then spend 1 move moving it out of the way.

The squares in Figure 3.4 above marked with a circle represent potential moves where our knight must first stop on a square attacked by an enemy pawn. The enemy pawn is a limiting factor on our piece as it attempts to accomplish its objectives: at present, the knight is unlikely to be able to move to such a square without being captured. But the knight might reach the square in question later, if our opponent comes under pressure and has to move a piece or pieces out of the way. We therefore give our knight a reduced bonus for future mobility to these kinds of squares, even though it is unlikely at present that the piece can move there directly (due to limiting factors). The positional pressure that our knight exerts on these squares (and on any pieces located on them) is real but small in quantity, and should be accounted for in our evaluation function.
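The 1-, 2-, and 3-move maps in Figures 3.2 through 3.4 can be built with a simple breadth-first expansion. The sketch below assumes an otherwise empty board (unrestricted mobility, before any limiting factors are applied) and 0-63 square indexing with a1 = 0:

```c
#include <string.h>

/* Unrestricted 3-move mobility map for a knight on an otherwise empty
   board (a sketch; blockers and limiting factors are ignored here).
   On return, map[sq] = fewest knight moves (1..3) needed to reach sq,
   or 0 if sq is unreachable within 3 moves. Squares run a1=0 .. h8=63. */
void knight_mobility_map(int from, int map[64])
{
    static const int dr[8] = { -2, -2, -1, -1, 1, 1, 2, 2 };
    static const int dc[8] = { -1, 1, -2, 2, -2, 2, -1, 1 };
    int frontier[64], next[64];
    int nf = 1, nn, i, d, ply;

    memset(map, 0, 64 * sizeof map[0]);
    frontier[0] = from;
    for (ply = 1; ply <= 3; ply++) {
        nn = 0;
        for (i = 0; i < nf; i++) {
            int r = frontier[i] / 8, c = frontier[i] % 8;
            for (d = 0; d < 8; d++) {
                int nr = r + dr[d], nc = c + dc[d];
                if (nr < 0 || nr > 7 || nc < 0 || nc > 7)
                    continue;
                if (nr * 8 + nc == from || map[nr * 8 + nc])
                    continue;    /* already reached in fewer moves */
                map[nr * 8 + nc] = ply;
                next[nn++] = nr * 8 + nc;
            }
        }
        memcpy(frontier, next, nn * sizeof next[0]);
        nf = nn;
    }
}
```

For the knight on f3 (square 21 in this indexing), the squares marked 1 reproduce Figure 3.2, the squares marked 2 reproduce Figure 3.3, and so on. A full program would run the same expansion for sliding pieces, with the added bookkeeping for moving friendly pieces out of the way described above.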

However we decide to model (and therefore estimate) the positional pressure of our pieces, we follow a two-step (or possibly a three-step) process:

1. we calculate the unrestricted future mobility of each chess piece 3 moves into the future (by updating each piece mobility database from the previous position), then

2. we estimate the realistic operating range/level of engagement by determining the limiting factors that bound the unrestricted mobility [see Systems Engineering and Analysis, 4th Edition, by Blanchard and Fabrycky, p.162]. We reduce the bonus for accomplishing objectives if the required moves can only be traced through squares that are likely to result in the piece being captured before it can accomplish its objective (such as attacking an enemy piece or defending a friendly piece).

[3. additionally, we may wish to target the limiting factors themselves (the pieces which restrict the range of our pieces) either directly or indirectly, for example by overloading them.]

At a minimum, we reduce the bonus given to pieces for future mobility traced through squares attacked by enemy pawns, and through squares attacked by lower-valued enemy pieces. We may use another scheme for determining mobility reduction for piece movement through squares attacked both by friendly and enemy pieces (see Appendix G: Tracing Mobility through Squares Attacked by both Friendly and Enemy Pieces for a discussion of this subject).

We might be bold enough to use Botvinnik's notation and label the black pieces under the circles in Figure 3.4 as "type IV" pieces - the white knight can theoretically trace mobility to these squares, but limiting factors prevent it from moving there without being captured. The 7 black pawns under direct attack from the knight in Figure 3.3 (marked with an 'x') we might classify as "type III" pieces, since the white knight can realistically trace mobility to these squares (no limiting factors present). Any black pieces directly capturable by the white knight would be "type II" pieces (they would sit on the squares labeled with an 'x' in Figure 3.2). Pieces already captured and removed from the board are classified as type I pieces. Pieces therefore enter category IV as they fall into distant range of the enemy pieces, become category III as the enemy can directly target them without opposition, become category II as they become directly capturable, and end life as category I when they sit in the box after capture. The positional evaluation function classifies pieces into these 4 categories and monitors them as they move back and forth between categories as the game progresses.
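The four categories above can be encoded directly. The exact predicate for "safe reachability" is left open by the text, so the arguments below (how many moves the attacker needs, and whether the path avoids limiting factors) are assumptions about how a program might supply the facts:

```c
/* Botvinnik-style categories for an enemy piece, as described above:
   I   = captured and off the board,
   II  = directly capturable (on the attacker's 1-move map),
   III = reachable in 2-3 moves with no limiting factors,
   IV  = reachable only through squares where capture is likely. */
typedef enum { TYPE_I = 1, TYPE_II, TYPE_III, TYPE_IV } piece_category;

piece_category classify_piece(int captured, int moves_to_reach,
                              int path_is_safe)
{
    if (captured)
        return TYPE_I;
    if (!path_is_safe)
        return TYPE_IV;
    return (moves_to_reach == 1) ? TYPE_II : TYPE_III;
}
```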

Notice that we did NOT give any piece a bonus for *sitting on a square*, as a traditional chess program does. We reward each piece for its predicted ability to accomplish strategic objectives, exert positional pressure, and restrict the mobility of enemy pieces, based on the set of pieces on the chess board at the time we call our evaluation function. This means that many of the pieces will have moved from their current positions and will have a different set of pieces in their line of sight.

Here is the bonus that the traditional chess program "crafty" (by respected chess program author and academic Dr. Robert (Bob) Hyatt) gives rooks and queens for "sitting on squares". Imagine a chessboard superimposed over these tables. Note that this bonus is awarded regardless of the location of other pieces on the board. There is nothing wrong with this approach - many successful chess programs (including Dr. Hyatt's) have been built using this method - it is just a different way to estimate the positional pressure produced by the chess pieces:

int rval_w[64] = {
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3,
    -3, -3,  1,  3,  3,  1, -3, -3
};

int qval_w[64] = {
    -10, -8,  0,  0,  0,  0, -8,-10,
    -10,  2,  8,  8,  8,  8,  2,-10,
    -10,-10,  4, 10, 10,  4,-10,-10,
    -10,  2, 10, 12, 12, 10,  2,-10,
      0,  2, 10, 12, 12, 10,  2,  0,
      0,  2,  4, 10, 10,  4,  2,  0,
    -15,  0,  4,  5,  5,  4,  0,-15,
      0,  0,  0,  0,  0,  0,  0,  0
};

Continued Explanation of the Proposed Heuristic

If we do things right, the information present in the 1st-, 2nd- and 3rd-order mobility maps we have generated allows us to estimate the positional pressure produced by the chess pieces. This pressure is measured by taking into account the presence and abilities of the other pieces in the current position on the board. We create an agile model of the future by calculating the "realistic mobility" of the pieces (unrestricted mobility modified by the restrictions placed by enemy pieces) and then measuring the ability of the pieces to realistically pressure the opponent or support friendly pieces. In this way we bring the future into the present (a definition of planning) so we can 1. fill our search tree with positions that matter and 2. reward pieces for the positional pressure they create at our endpoint evaluation.

From these calculations we can make a reasonably accurate estimate of the "winnability" of a position, or determine whether compensation exists for sacrificing a piece. This evaluation 'score' also helps "steer" the search process: the positional score is also a measure of how "interesting" the position is, and helps us determine which positions we would like to search first. I would point the interested reader to the description of the evaluation function for the 1970s-era Northwestern University chess program Chess 4.5, found on pages 93-100 of Peter W. Frey's Chess Skill in Man and Machine, for comparison.

The technical difficulty in the proposed heuristic lies in the "counting" exercise that goes into keeping the "future mobility" move maps up to date. To save time, we can update our mobility database from the previous position as we move a piece. Most of the 'unrestricted mobility' information for each piece will remain unchanged, or can be moved around with pointers as we take into account the new potential moves that result when a piece moves on the chessboard. This process, outlined in Example 1 in the Appendix, is not simple to implement. The proposed evaluation function will spend much of its time updating the piece mobility database, and then counting the 'cues' present in the position that determine how desirable or interesting the position is (and whether it is worth further searching). The evaluation function will return a score that represents the estimated winnability of the position, based on material, positional pressure, king safety, mobility, offensive and defensive potential, and other factors that contribute positively to measured performance.

Testing and Calibration

We can measure this performance by setting up automated tournaments of games with other chess programs, such as a tournament of a few hundred games with a time control of 1 minute per game plus 1 second per move. This fast and convenient method is a suggested starting point for test and evaluation. Extensive testing by the CEGT group (Chess Engines Grand Tournament) at much longer time controls (see CEGT web page http://www.husvankempen.de/nunn/) indicates that tournaments using longer time controls do not usually produce performance numbers that differ significantly.

Another Method for Testing

Alternatively, we can use another method. If we base our evaluation function on a winning percentage rather than the traditional "centipawns" used by most programs, we can calibrate our evaluation function against known positions from opening theory. If the evaluation must be performed in centipawns, we can convert to a winning percentage using this table (posted on rybkaforum.net by V. Rajlich, modified to represent pawns as 1.00):

% to win    advantage in pawns
------------------------------
  50%            0.00
  55%            0.22
  60%            0.43
  65%            0.64
  70%            0.86
  75%            1.07
  80%            1.29
  85%            1.51
  90%            1.89
  95%            2.41

Table 3.1
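The interpolation used below can be made mechanical. This sketch converts a winning percentage to an advantage in pawns by linear interpolation between the rows of Table 3.1, clamping outside the table's 50-95 percent range:

```c
/* Convert a winning percentage to an advantage in pawns by linear
   interpolation over Table 3.1 (clamped to the table's 50%-95% range). */
double win_pct_to_pawns(double pct)
{
    static const double win[10] = { 50, 55, 60, 65, 70,
                                    75, 80, 85, 90, 95 };
    static const double adv[10] = { 0.00, 0.22, 0.43, 0.64, 0.86,
                                    1.07, 1.29, 1.51, 1.89, 2.41 };
    int i;

    if (pct <= win[0]) return adv[0];
    if (pct >= win[9]) return adv[9];
    for (i = 1; i < 10; i++)
        if (pct <= win[i])
            return adv[i - 1] + (adv[i] - adv[i - 1]) *
                   (pct - win[i - 1]) / (win[i] - win[i - 1]);
    return adv[9];  /* not reached */
}
```

Applied to the 60.9 percent figure worked out below for Position A, this returns roughly 0.468 pawns, matching the interpolated value in the text.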

For example, let's look at the following position from opening theory, reached after the move sequence 1. d4 Nf6 2. c4 c5 3. d5 b5 4. cxb5 a6 5. bxa6 g6 6. Nc3 Bxa6 7. Nf3 d6 8. e4 Bxf1 9. Kxf1 Bg7

Position A
BG9bg7.jpg
White to play, after 9...Bg7

If we look at a database of current master and grandmaster games that have reached this position, such as the Rybka II opening book (the opening explorer at chessgames.com, or a similar tool at chesslab.com, can also be used), we can see that white has won 47.3 percent of the time, drawn 27.3 percent of the time, and lost 25.4 percent of the time. This equals a winning percentage of 60.9 percent. Our "evaluation" of this position should therefore be 60.9 percent, or close to it. Using Table 3.1 above and interpolating, this winning percentage corresponds to a score of 0.468 pawns. [Chesslab reports these numbers as 39 percent white wins, 34 percent draws, and 27 percent black wins, for a winning percentage of 56 percent. Keep in mind that new games are always being added to both databases, so these numbers might change.] If we select 200 positions from opening theory that are generally empty of tactical possibilities, we can store the expected winning percentage from current practice and compare it with our program's estimated winning percentage (the output of its evaluation function). By the way, the Rybka 2.3.2a evaluation of the position in the above diagram (Position A) is +0.49 after 26-ply analysis, with the continuation 10.g3 0-0 11.Kg2 Nbd7 12.a4 Qa5 13.Bd2 Qa6 14.Re1 c4.

Using this procedure, we can calibrate our program's evaluation function with results from current master-level practice. For each position from opening theory we evaluate, we subtract our program's evaluation from the actual winning chances from master level play and generate an error function. From this information we can calculate a mean and a standard deviation for this error function. We can look at positions that for some reason do not score well (perhaps there are deeply hidden tactical opportunities) and eliminate these positions from consideration. We can then make changes to our evaluation function and immediately judge the accuracy of those changes. We can even determine the weighting of the various parameters in our evaluation function by randomly bumping up or down the various parameters and seeing if we have a more accurate estimate of the winning chances of the various test positions we have obtained from opening theory.

When we are confident that our evaluation function correlates with the winning chances of master-level games from opening theory, we can construct automated tournaments against chess programs of known playing strength to evaluate how well our program does under simulated game conditions, especially where the slowness of our evaluation function will hurt the tactical side of our program's performance. Knowledge should be added to our evaluation function only when the tournament performance of our program under game conditions improves.

Continuing the explanation

If a chess expert decides to sacrifice a pawn in a chess game, he or she hopes that the present and future positional pressure of the sacrificed pawn is exceeded by the increase in positional pressure of his or her remaining pieces. It is the positional pressure of our pieces, and the potential ability to attack or defend strategic targets, that compensates for the loss of “material”. We are using a heuristic to estimate this positional pressure, much the same way a human chess player does.

Now that we have precisely defined (by means of our proposed rule-of-thumb called a heuristic) what kind of positions we are looking for, the search portion of our chess program can try out different moves. The engine "knows" how to evaluate the positions, and will focus along lines that are interesting, and perhaps might even play a game of chess that looks reasonably human.

The proposed heuristic (ideally) obtains a more realistic evaluation of the winning chances (of a given position) by extracting information from the distance and future mobility of the pieces on the board instead of looking up numbers in tables. The machine is spending time searching for ways to increase the real and measured positional pressure on the opponent, not for ways to move its pieces to the center of the board. There is a difference - perhaps insignificant when playing a beginner, but significant enough when playing someone capable of playing a strong positional game of chess. Note that the bonus we give our knight to move to square f3 (from the starting position at g1) is based on the positional pressure it is capable of creating in the present position - offensive and defensive.

Note also that there are many ways to reward pieces based on the distant pressure they are capable of creating - one would have to build a chess program and run tests to determine which method has the strongest performance.

In summary, we have created a model of positional pressure in our evaluation function. The proposed method is certainly not the only way to do this. There is no doubt in my mind that someone will find a better way to model this. The question we should ask is: do we find the model useful for focusing the attention of our search efforts? If not, how might we change the model? We might begin to think of an upper and lower bound for positional pressure. An upper bound might consider what the pieces can accomplish using unrestricted mobility 3 moves into the future. A lower bound might consider what the pieces can accomplish considering only mobility through squares not attacked by an enemy piece. We also might consider a middle ground, somewhere in between these two extremes, where we consider placing realistic restrictions on the unrestricted piece mobility before awarding points for things the pieces can (potentially) accomplish. There exist objectives and limiting factors, as described by Systems Engineering and Analysis, 4th Edition, by Blanchard and Fabrycky, 1999, p.162-163. Additionally, the decision protocol of Orasanu and Connolly (1993) includes the identification of the constraints, resources, and goals facing the decision maker.

Michalewicz and Fogel in How to Solve It: Modern Heuristics have this to say about using models to solve problems:

p.31: "Whenever you solve a real-world problem, you have to create a model of that problem first. It is critical to make the distinction that the model that you work with isn't the same as the problem. Every model leaves something out. It has to - otherwise it would be as complicated and unwieldy as the real-world itself. We always work with simplifications of how things really are. We have to accept that. Every solution we create is, to be precise, a solution only to the model that we postulate as being a useful representation of some real-world setting that we want to capture. The trouble with models is that every one of them has an associated set of assumptions. The most common mistake when dealing with a model is to forget the assumptions."
 
Starfield, Smith and Bleloch in How to Model It emphasize that problem solving and thinking revolve around the model we have created of the process under study:
 
p.21: "The point we want to make is that thinking consciously and explicitly about models is a crucial part of problem solving in general."
 
The model we have created for positional pressure in our evaluation function allows our machine to intelligently focus its search efforts on likely moves and, by proper ordering of our moves, to produce faster cut-offs in our alpha-beta pruning. We desire a proper balance between an anticipatory and a reactive planning strategy. From van Wezel, Planning in Intelligent Systems:
 
p.65: "At the symbolic level, anticipation is elaborating a representation of the future (forecasting)... expert process operators have more planning ability than beginners because they anticipate more"
 
p.96: "Anticipation is costly and limited by time constraints. It can be a handicap when it goes beyond actual competency, so that reactive strategies are necessary. The focus on planning strategies should not let us forget the adaptive power of reactive strategies, especially when time constraints are high or process knowledge is weak. The main problem to solve is finding an efficient compromise between anticipative and reactive strategies to reach a satisfying performance level."
 
We might actually use two (or more) evaluation functions. Function 1 might be a general-purpose algorithm that is faster in operation but not completely accurate. If the score for the position in question (using this faster method) is somewhat close to being the best, we can re-score the position with a more accurate evaluation function, function 2. Perhaps we use the unrestricted mobility of the chess pieces as function 1, and modify this algorithm as proposed in point 6 above for function 2. We might use yet other evaluation functions to resolve situations where safe piece mobility through certain squares cannot be accurately determined (see Appendix G), or use search extensions to handle these cases.
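This two-function idea resembles the "lazy evaluation" scheme used by many programs: run the fast function first, and pay for the accurate one only when the fast score lands close enough to the alpha-beta window to matter. The margin below is an assumed tuning parameter, not a value from the text:

```c
/* Lazy-evaluation gate (a sketch). Returns nonzero when the fast, rough
   score (function 1) is within `margin` of the alpha-beta window, so the
   slower, more accurate evaluation (function 2) should be run. */
int needs_accurate_eval(int fast_score, int alpha, int beta, int margin)
{
    return fast_score + margin > alpha && fast_score - margin < beta;
}
```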
 
Now that we have told our machine how to score each position, we need to search for the moves that lead to the best playable continuation.

Continue to next section

Chapter 4 - A Refinement of the Search Process
