Tablut AI

A funny challenge!

Tablut is an ancient Lappish board game and its very well defined rules make it a perfect test field for a search algorithm.

As part of my Master’s degree exam in Fundamental of AI, my team and I developed the Nega-Max algorithm with an alpha-beta pruning, which is able to autonomously play Tablut game. We also designed a Genetic Algorithm to find the best evaluation function .

Let’s have a look at the ancient Tablut game and at the structure of our algorithm, with a short overview of the main theoretical concepts.


The game

Tablut is a two-players game which is played on a 9×9 board. The two players, black and white, have two different goals:

  • the white player has to bring his king in one of the escape squares;
  • the black player, who has no king, tries to capture the opponent king.
Tablut9x9
Tablut initial configuration.
(Wilhelm meis / Public domain)

The initial configuration has the white king in the center of the board with 8 white soldiers to protect him. The 16 black soldiers are placed in the camp squares

The white player starts the game. Each piece on the board can move vertically or horizontally by any given number of vacant squares. To capture an opponent pawn, the player has to catch it between two of her/his pawns set in a row or column.

The game ends as soon as the white king escapes or is captured.


A bit of theory

Tablut is a so called zero-sum game: each player’s gain or loss is perfectly balanced by the loss or gain of the other player, therefore by adding all the participants gains and subtracting all their losses, we get exactly zero. In Tablut this concept can be formulated in terms of advantage given by the current board configuration which can be encoded in a designed objective function. The two players have two complementary objective functions.

Tablut is also a perfect information game, which means that the opponents have a full knowledge about the current game configuration and there is no hidden information (an example of games with hidden information is card games).

Thanks to its characteristic, a Tablut game can be seen as the generation of a search tree with the starting configuration as root and the final configurations as leaves (fail, success or draw leaves).

A very common algorithm used to traverse the search tree of a game like Tablut is the Mini-Max Algorithm. This algorithm is exploited in fields such as AI, decision theory and statistics to minimize the possible loss in a worst case scenario.

Let’s call the two players maximizer (MAX) and minimizer (min). Given a configuration, the algorithm provides the best next move for MAX assuming that min plays at its best. The focus of Mini-Max is not on the entire path to the success leaf, but only on the next move.

The Mini-Max algorithm has several disadvantages which make it useless for complex games: its time complexity grows exponentially with the maximum depth of the search tree.

To overcome these problems, there exist several improvements of Mini-Max algorithm and we decided to implement the so called Nega-Max algorithm with alpha-beta pruning technique and transposition tables.


The algorithm: Nega-Max with alpha-beta pruning

As a first step, we built the game structure by developing the rules and the winning conditions.

Then we started thinking about the possible development of the algorithm and we decided to implement:

  • the Nega-Max algorithm which exploits the zero-sum property of Tablut;
  • the alpha-beta pruning techinque which reduces the number of evaluated nodes during the search;
  • the Trasposition Table which avoids re-evaluating an already seen configuration.

Here is the pseudocode of the algorithm taken from Wikipedia (the implementation is on Github).

function negamax(node, depth, α, β, color) is
    alphaOrig := α

    (* Transposition Table Lookup; node is the lookup key for ttEntry *)
    ttEntry := transpositionTableLookup(node)
    if ttEntry is valid and ttEntry.depth ≥ depth then
        if ttEntry.flag = EXACT then
            return ttEntry.value
        else if ttEntry.flag = LOWERBOUND then
            α := max(α, ttEntry.value)
        else if ttEntry.flag = UPPERBOUND then
            β := min(β, ttEntry.value)

        if α ≥ β then
            return ttEntry.value

    if depth = 0 or node is a terminal node then
        return color × the heuristic value of node

    childNodes := generateMoves(node)
    childNodes := orderMoves(childNodes)
    value := −∞
    for each child in childNodes do
        value := max(value, −negamax(child, depth − 1, −β, −α, −color))
        α := max(α, value)
        if α ≥ β then
            break

    (* Transposition Table Store; node is the lookup key for ttEntry *)
    ttEntry.value := value
    if value ≤ alphaOrig then
        ttEntry.flag := UPPERBOUND
    else if value ≥ β then
        ttEntry.flag := LOWERBOUND
    else
        ttEntry.flag := EXACT
    ttEntry.depth := depth	
    transpositionTableStore(node, ttEntry)

    return value

Evaluation Function and Genetic Algorithm

A crucial element of any Mini-Max variant is the evaluation function that estimates the heuristic value of a non-terminal node. The evaluation function has to be carefully designed in order to find a balance between the algorithm efficiency and its optimality.

We designed our evaluation function looking at:

  • the number of black and white pieces;
  • the king position.

The king position is evaluated according to the number of reachable escapes, the number and color of the closest soldiers, and the physical position on the board.

Each element of the evaluation function is weighted, but how can we tune the weights?

We decided to use a very well known population-based method: the Genetic Algorithm (GA), which is inspired by the process of natural selection.

Our GA has a population of 200 agents (100 white players and 100 black players). Each player has a random weights configuration and at each epoch we select the best individuals using a tournament selection with the number of winning as fitness function. The fitness function also includes a time-out parameter to fix a maximum game duration.

The offsprings are generated through a two-point crossover with a low mutation probability. The new population for each color is composed by the best 50 old agents and the best 50 new agents (truncation selection).

At the end we obtain the best white and black weights configuration according to the fitness function.