Alpha-Beta Pruning


Alpha-Beta Pruning is a method for speeding up the Minimax Algorithm. It works by pruning away the branches of the search tree that cannot possibly influence the final decision. Alpha-Beta Pruning has no effect on the outcome of the Minimax Algorithm: it returns the same decision that plain Minimax would.
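For reference, the plain Minimax search that Alpha-Beta Pruning speeds up can be sketched as follows. This is a minimal sketch, not code from the course: it assumes a game tree written as nested Python lists, where a bare number is a leaf utility and a list is an internal node. The function name minimax and the tree values are hypothetical.

```python
def minimax(node, maximizing):
    """Return the minimax value of a tree given as nested lists (assumed encoding)."""
    # A bare number is a leaf: its utility is its own value.
    if not isinstance(node, list):
        return node
    # Every child is evaluated; plain Minimax never skips a branch.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-level tree: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))  # prints 3
```

All nine leaves are examined here. Alpha-Beta Pruning aims to reach the same value while examining fewer of them.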

 

More specifically, pruning works as follows: for any node n, if a player has a better choice at the parent of n or at any choice point further up, then n will never be reached in actual play. So once enough information about n has been determined by examining its descendants, it is pruned.

 

Alpha-Beta Pruning gets its name from the following two terms:

 

alpha (α) = the value of the best choice found so far (the highest value) for MAX at any choice point along the path

beta (β) = the value of the best choice found so far (the lowest value) for MIN at any choice point along the path

 

These terms are passed along in the code as a means of bookkeeping.

 

Example 1:

As an example, consider the following tree:

 

The triangles pointing up represent maximizing nodes. The triangles pointing down represent minimizing nodes. In the tree, A is the maximizing root node, B, C, and D are its minimizing children, and E through M are the leaf nodes (three under each of B, C, and D). The [-inf, +inf] are placeholders for the alpha-beta bookkeeping. Alpha is initialized to negative infinity (-inf). Beta is initialized to positive infinity (+inf).

When Alpha-Beta Pruning is applied to the Minimax Algorithm on this tree, the following steps are performed:

Step 1: Node E will be explored. From node E, it can be seen that the worst that the minimizing node B can do is 8, so the path through B is currently worth at most 8 to the maximizing node A.

Step 2: The minimax algorithm will continue to the next node, F. The minimizing node B will prefer 2 over 8. Thus, the path through B is now worth at most 2 to the maximizing node A.

Step 3: In the next step, node G is revealed to have a value of 1. This is the last child of B, and its best option. Therefore, both bookkeeping numbers at B have changed to 1. The maximizing node A is now guaranteed at least the minimizing node B’s best choice, 1.

Step 4: As the minimax algorithm continues, node H is explored. As H has a value of 7, which is greater than the 1 that A can already get through B, nothing can be pruned. Furthermore, 7 becomes C’s worst possible choice, and the most that the path through C can be worth to A.

Step 5:  Node I is explored to reveal a better option for node C.

 

Step 6: Node J is revealed to hold an option that C will not take. Although node J is not a better option for the minimizing node C, it could not be pruned, because nodes H and I both held better options for the maximizing node A than what the minimizing node B provided.

 

Step 7: Node K is revealed to hold a value of 2. This value is smaller than the value of node C, 6. This means that nodes L and M cannot possibly influence node A and are subsequently pruned. This is because D is a minimizing node, so L and M would not change D’s value unless they were smaller than 2. But D could only influence A if its value were larger than 6, which is no longer possible now that K is 2.
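The walk-through above can be replayed in code. The following is a sketch under stated assumptions, not the course's implementation. The leaf values for E, F, G, H, and K are taken from the steps above, and I = 6 is inferred from steps 5 through 7. The values for J, L, and M are not given in the text and are placeholders chosen only so that the example runs (J is set to a value that C will not take, as step 6 describes). All function and variable names are hypothetical.

```python
import math

# Leaf values. E, F, G, H and K come from the walk-through; I = 6 is inferred
# from steps 5-7. J, L and M are NOT given in the text: placeholder values.
LEAVES = {"E": 8, "F": 2, "G": 1, "H": 7, "I": 6, "J": 9, "K": 2, "L": 5, "M": 3}
# A is a maximizing node; B, C and D are minimizing nodes.
CHILDREN = {"A": ["B", "C", "D"], "B": ["E", "F", "G"],
            "C": ["H", "I", "J"], "D": ["K", "L", "M"]}

visited = []  # leaves in the order they are evaluated


def alpha_beta(node, alpha, beta, maximizing):
    if node in LEAVES:
        visited.append(node)
        return LEAVES[node]
    v = -math.inf if maximizing else math.inf
    for child in CHILDREN[node]:
        child_value = alpha_beta(child, alpha, beta, not maximizing)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                return v  # MIN higher up already has an option at least this good
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                return v  # MAX higher up already has an option at least this good
            beta = min(beta, v)
    return v


root_value = alpha_beta("A", -math.inf, math.inf, True)
pruned = [leaf for leaf in LEAVES if leaf not in visited]
print(root_value)  # 6
print(visited)     # ['E', 'F', 'G', 'H', 'I', 'J', 'K']
print(pruned)      # ['L', 'M']
```

Whatever placeholder values are chosen for L and M, they are never evaluated, which matches step 7.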

 

Example 2:

Consider the following example:

Step 1: Explore Node E

Step 2: Continue to explore node E. This allows for the worst case of 2 for Nodes B and A.

Step 3: Explore Node F.  Because the first part of Node F is greater than 2, there is no need to continue exploring.  This is because no matter what the next part of the node is, the minimizing Node B will always pick E before it will pick F.

Step 4: Explore the first part of G and prune the second. The reasons for pruning the second part of G are the same as the reasons the second part of F was pruned in step 3.

Step 5: Explore node H. The alpha of 2 is passed along because 2 is currently the best that A can do. If C finds a value equal to or less than 2, then it can stop exploring, because it will not be able to find any other value that will affect A.

Step 6:  Continue to explore node H. 10 becomes the best that H can do, and the worst that C can do.

Step 7: Continue exploring. As Node H contained values that influence A to choose a path other than B, C continues to explore.

Step 8: Continue exploring to Node J. Node C is still trying to find something better than 10, that is, a smaller value.

Step 9: Continue exploring.  The worst choice that A now has will have a utility of 5.

Step 10: Continue exploring. As the value of 5 has become the worst that A can choose, the alpha value changes to 5. 5 is also now the value to match or beat. If D, a minimizing node, can find a value that is equal to or less than 5, then it can stop exploring.

Step 11:  Because K is a maximizing node, it will continue to explore after it has found the 2.  Once its children have been explored, and the best that it can do is to choose a value of 3, all other nodes can be pruned.  This is because no matter what the values of the other nodes are, the minimizing node is not going to pick a value that is greater than 3.  And in order for the maximizing node A to pick a path other than picking Node C, D would have to pick a value greater than 3.

 

Pseudocode

One of the great advantages of Alpha-Beta Pruning is its ease of implementation. Alpha-Beta Pruning can be added to a Minimax agent with only a few extra lines of code. Please see the following pseudocode, also found on page 170 of Artificial Intelligence: A Modern Approach (Second Edition) by Stuart Russell and Peter Norvig, Prentice Hall, 2002.

 

 

function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, -inf, +inf)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -inf
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +inf
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
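The pseudocode above translates almost line for line into Python. The sketch below is one possible rendering, not the textbook's code. It assumes a hypothetical game object with three methods: successors(state), which yields (action, state) pairs, terminal_test(state), and utility(state). It also remembers the best action while scanning the root's successors, rather than looking up the action with value v afterwards. TreeGame is a toy stand-in, assumed only to make the example runnable.

```python
import math


def alpha_beta_search(game, state):
    """Return the root action with the highest backed-up value."""
    best_action, alpha = None, -math.inf
    for action, successor in game.successors(state):
        v = min_value(game, successor, alpha, math.inf)
        if best_action is None or v > alpha:
            best_action, alpha = action, v
    return best_action


def max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for _action, successor in game.successors(state):
        v = max(v, min_value(game, successor, alpha, beta))
        if v >= beta:
            return v  # cutoff: MIN has a better alternative elsewhere
        alpha = max(alpha, v)
    return v


def min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for _action, successor in game.successors(state):
        v = min(v, max_value(game, successor, alpha, beta))
        if v <= alpha:
            return v  # cutoff: MAX has a better alternative elsewhere
        beta = min(beta, v)
    return v


class TreeGame:
    """Toy game (assumed for illustration): states are nested lists, leaves are utilities."""

    def successors(self, state):
        # An action is simply the index of the child that is chosen.
        return list(enumerate(state))

    def terminal_test(self, state):
        return not isinstance(state, list)

    def utility(self, state):
        return state


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta_search(TreeGame(), tree))  # prints 0
```

Passing the game object explicitly keeps the three functions free of global state, which is a design choice of this sketch rather than something the pseudocode prescribes.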

 

 

 

Effectiveness

The effectiveness of Alpha-Beta Pruning depends upon the order in which nodes are explored. An example of this can be seen in step 11 of the second example presented here. If Nodes K and L were switched, the algorithm would not have been able to prune as much as it did. With optimal ordering, Alpha-Beta Pruning reduces the running time of the Minimax Algorithm from O(b^m) to O(b^(m/2)), where b is the branching factor and m is the maximum depth of the tree.
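The effect of ordering can be checked by counting leaf evaluations. The sketch below is an illustration under assumptions, not a measurement from the course. The same hypothetical two-level tree is searched twice: once with the best options listed first, and once with the best options listed last. The tree values and function names are made up for the example.

```python
import math


def count_leaves(node, alpha, beta, maximizing, counter):
    """Alpha-beta over nested lists; counter[0] counts the leaves evaluated."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = count_leaves(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                break  # remaining children are pruned
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                break  # remaining children are pruned
            beta = min(beta, v)
    return v


def search(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    counter = [0]
    value = count_leaves(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


# Same three MIN subtrees with the same leaves, listed in different orders.
good_order = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]  # best subtree first, smallest leaf first
bad_order = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]   # best subtree last, smallest leaf last

print(search(good_order))  # (3, 5)
print(search(bad_order))   # (3, 9)
```

Both searches return the same value, but the well-ordered tree needs 5 leaf evaluations while the badly ordered one needs all 9.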

 

More Information

For more information, please see Chapter 6, Section 2 of Artificial Intelligence: A Modern Approach (Second Edition) by Stuart Russell and Peter Norvig, Prentice Hall, 2002, or slides 18 through 22 of the class lectures from day 6: http://www.cs.utah.edu/~hal/courses/2009S_AI/cs5300-day06-adversarial-search.pdf