A*

Pathfinding is the process of finding a path from one point to another. A* (a-star) is an algorithm that efficiently finds a shortest path in a graph. In many game-programming situations this is the best choice for pathfinding. A* also has significant applications beyond finding paths across terrain; it can 'pathfind' in abstract problem spaces like sets of actions to undertake in order to accomplish a goal just as easily, making it a powerful AI tool.

Prerequisites

To be able to perform A* pathfinding you need two things:

A graph of your 'world' or problem space (this does not have to be a separate data-structure, a grid-based world can be approached as a graph for example)
A method to estimate distances (more generally, costs to reach) between points; easily done when pathfinding within a coordinate system

In some worlds the first element may be slightly problematic. For example in a world that consists purely of 3d meshes, you might need to generate a separate pathfinding graph to be able to do proper pathfinding. That is beyond the scope of this article though.

Basic Method

Finding a path in itself is trivial to do: Just try all possible paths, and see which one is the shortest. This method has one important drawback... it takes time. In fact if you have a complicated graph/world and you want to do it often it takes so much time that you can forget about real-time performance. This is why more sophisticated pathfinding algorithms were invented - if you do it clever, you can still get the best path without looking at every possible path.

An important concept for efficient pathfinding is sorting paths while you are building them. While you can't really know whether a path will turn out to be any good before you finish it, you can save yourself a lot of time by not starting with the path that goes the opposite direction from where you want to go. A* uses some method (usually the distance between the endpoint of a path and the target point) to approximate how 'good' the path is. If you keep growing the current 'best' path, reevaluating which is the current best at each step, you'll arrive at the target without having to evaluate all the really bad paths out there.

Algorithm

The A* algorithm uses a few main concepts:

The open set: This contains the nodes that currently need to be considered as possible starts for further extensions of the path. You start by placing the start node into this set.

The closed set: This contains nodes that have had all their neighbours added to the open set, the paths leading to them are 'done' (not entirely, but I'll get back to that).

The G score: The (weighted) distance from the starting point. Every node in the open or closed sets has a G score. G increases as you travel farther from the start.

The H score: The estimated distance to the goal. The H Heuristic score resembles the G score, except it increases as you travel farther from the goal. Because the path is not complete yet you can not really know this score, and that is where the estimation method mentioned earlier comes in. For A* to produce optimal paths, the estimate should never be larger than the actual distance. The closer the estimate is to the true cost, the faster A* will run.

So you start with an empty closed set and just the starting point in your open set. For every node in these sets you'll want to store its G score and the node that was used to arrive at this node (to backtrack the path).

Now we have to extend the path until it reaches the endpoint. This goes as follows:

Pick the node in the open set for which the sum of the G and H scores is the lowest. The G scores are stored in the nodes of the open set, and the H scores can be calculated for them. (However, if H is expensive to calculate, then calculate it once and store it in the node.) If the open set is empty there is no path from the start point to the end point. We'll call the point picked P.

For every point that is adjacent to P but not in the closed set: If it is not yet in the open set, add it. The 'previous' node for these new nodes is P, and their G score is the G score of P plus the distance between the new node and P. If it was already in the open set, check it's current G score, and if the new G score would be less than the current one update the G score and 'previous' pointer, otherwise leave it alone. Adjacency in a graph means there is a connection between the points, in a grid it means a point is next to P (whether you include points that are diagonally next to P is up to you). If the new point happens to be the destination point, you've found your path.

Move P to the closed set, and start over.

If you use a grid that contains 'impassable' terrain, just ignore nodes that are not passable in the second step, they are not part of the pathing graph. The distance between nodes on a grid is rather straightforward. If you go up, down, left or right you use the size of a grid unit, if you go diagonally you use sqrt(2) times the size of a grid unit. If some terrain is 'difficult' to pass, you can increase the weight for paths on that terrain.

To find an optimal (shortest) path, you have to make sure that your distance estimation function underestimates the distance between points. For the distance estimation method on a grid where you only travel horizontal or vertical you could just take sum of the vertical and horizontal differences between the node and the endpoint. You can experiment a little with what method you use. If your method overestimates the distance the algorithm may run faster, but it will not always find the shortest path.

Optimizations

The open and closed sets are often called lists, because lists are a common implementation. The best choice for the sets will depend on how many insertions there are and how many removals there are. Unsorted linked lists have fast insertion and slow (sorted) removal; sorted lists or arrays have slow insertion and fast removal; trees such are partially sorted and have medium insertion and medium removal. If your game map leads to lots of insertions but few removals, then you may favor the unsorted data structures. A commonly used data structure is the binary heap, which is a form of tree that can be stored efficiently in an array (no memory allocations).

The closed set will have to be searched each time a new node is looked at — if it is already in closed set it has to be ignored. A bit vector or hash table are reasonable choices.

If you only want to search a limited area of a grid for a path (or the grid is just small) you can gain a tremendous speed advantage by just allocating all the nodes in the grid right away in a big array. This way you won't have to dynamically allocate any storage during execution, and you can very easily locate nodes in the 'closed list' — instead of a list this will just be a 'closed flag' that is set on nodes that are closed. For the open list you keep a binary heap with pointers into this node array. I'll leave the details up to you.

Other uses

The structure created by an A* Pathfinder algorithm has some other potential uses for AI, basicly the target of the algorithm is to return the shortest path, the possibility of different movement costs adds a couple of alternatives. We spoke of movement costs for the case of being walking over hills or mountains to give the second a higher weight in order to avoid it if possible, and a road would likely get an inferior value to normal plain terrain so our actors would prefer road to clear terrain.

This avoidance or preference can be considered on a factor of danger, if an AI knows that there is an enemy bunker in a certain spot, movement cost in its vecinity for a non military unit should have higher values so it would naturally try to avoid it. This uses an already developed system to implement a basic more realistic and smarter behavior without much trouble or overhead, only requires some tampering with the heuristic formulae.

In general lines, A* provides the foundations for a raw basic enviroment recognition and interpretation sistem rather than just pathfinding, in the Age of Empires series they used this very sistem for most AI map considerations, like where to build, where to harvest resources and so on.

Links

A* tutorial (with pictures!)
AI-depot on pathfinding
2D Dev Learn contains simple FreeBASIC code
Production A* implementation in LPC
Wikipedia article

A*

Contents