AVL Trees - CMPSCI 187

We have worked through all of the operations for an implementation of an ordered list as a binary search tree. We expect to gain efficiency with this implementation, reducing the cost of lookup, insert, and delete from O(n) of a linear linked list to O(logn). This is an enormous speedup.

We have seen, though, that for a given ordered list, there are many different binary search trees that represent that ordered list. This wasn't the case for a linear linked list - there is just one correct representation in that implementation. So, we need to deal with the fact that there are many possible binary search trees, and that some are more efficient than others. Consider a binary search tree that is as poorly balanced as possible. For example, imagine that each node has no left subtree. This is really just a linear linked list using .right fields instead of .next fields. Surely changing the name of a field doesn't improve efficiency, so it must be that access (lookup, insert, delete) cost is O(n) in that case. For us to achieve the O(logn) cost, we need a tree that is reasonably well balanced. To the extent that the tree is unbalanced, worst-case performance degrades. So, our task now is to modify our operations so that the binary search tree always maintains a reasonably well balanced shape.
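
To see the degenerate case concretely, here is a minimal sketch. It assumes that keytype is char and that list is a pointer to node (those declarations are not shown in these notes), it uses plain malloc in place of the course's MALLOC_ONE macro, and the helpers levels and build are made up for illustration. Inserting keys in sorted order leaves every node with an empty left subtree, so the tree has as many levels as it has keys.

```c
#include <stdlib.h>

/* Assumed declarations; the notes do not show them at this point. */
typedef char keytype;
typedef struct t1 { struct t1 *left, *right; keytype key; } node;
typedef node *list;

/* Plain binary search tree insert, with no rebalancing. */
void plain_insert(list *tree, keytype k)
{ if (*tree == NULL)
    { node *tmp = (node *) malloc(sizeof(node));
      if (tmp == NULL)
        abort();                 /* illustrative error policy only */
      tmp->key = k;
      tmp->left = NULL;
      tmp->right = NULL;
      *tree = tmp;
    }
  else if (k < (*tree)->key)
    plain_insert(&((*tree)->left), k);
  else
    plain_insert(&((*tree)->right), k);
}

/* Number of nodes on the longest path from the root down to a leaf. */
int levels(node *t)
{ int l, r;
  if (t == NULL)
    return 0;
  l = levels(t->left);
  r = levels(t->right);
  return 1 + (l > r ? l : r);
}

/* Hypothetical helper: insert the characters of s in the order given. */
list build(const char *s)
{ list t = NULL;
  for (; *s != '\0'; s++)
    plain_insert(&t, *s);
  return t;
}
```

With these definitions, build("abcdefgh") produces a chain 8 levels deep, while build("dbfacegh") holds the same eight keys in only 4 levels.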

Since there are many possible binary search trees for a given set of keys, we need to be able to transform one to another. A method for doing this was devised by Adelson-Velski and Landis, and binary search trees maintained in a reasonably well balanced manner by this method are called AVL trees. So, an AVL tree is a binary search tree that also possesses the AVL property at every node. Before we talk further of properties, let's look at the method, which consists of a set of tree transformations, called rotations, and rules for when to apply which.

Let's look at one of the transformations. We'll use simpler key names too - just single letters.

               y                       w
              / \                     / \
             /   \                   /   \
            /     \                 /     \
           w       z    ===>       v       y
          / \                             / \
         /   \                           /   \
        v     x                         x     z
This transformation is called a SingleRight rotation. The node `w' rose, the node `y' sank, and the node `x' was reattached. Notice that the relationship `w' < `x' < `y' is preserved. The height relationships have changed, however. If the tree were larger, say with more nodes below `v', this could help the balance by raising that subtree with respect to the rest. We'll see this in more detail, but this is why we're looking at this rotation. What would a SingleLeft rotation look like? [wait] Yes, it is the inverse of SingleRight.
               y                       w
              / \                     / \
             /   \                   /   \
            /     \                 /     \
           w       z    <===       v       y
          / \                             / \
         /   \                           /   \
        v     x                         x     z

Let's try SingleRight on an unbalanced tree, just to see that this kind of rotation can be helpful, and to see it applied.

               g
              / \
             /   \
            /     \
           d       h
          / \
         /   \
        b     f
       / \   /
      a   c e
What do we get? [wait] Right, the `g' comes down, the `d' goes up, and the `f' is reattached, giving:
               d
              / \
             /   \
            /     \
           /       \
          b         g
         / \       / \
        /   \     /   \
       a     c   f     h
                /
               e

Let's talk about what it means to be balanced, or even reasonably well balanced. We can invent a variety of metrics, but since we're interested in efficiency, let's think about the average number of key comparisons needed to locate a node. Notice that we're thinking about average rather than worst case. We can compute the average for these simple trees by counting the number of key comparisons needed for each of the eight nodes (`a' through `h', in that order), and then dividing the sum by 8. For the first tree, this is (4 + 3 + 4 + 2 + 4 + 3 + 1 + 2)/8 = 23/8. For the second tree, it is (3 + 2 + 3 + 1 + 4 + 3 + 2 + 3)/8 = 21/8. For such small trees, this probably looks like a pitifully small gain, but our starting tree was not so unbalanced to begin with. The gains compared to very poorly balanced trees are much larger. What would the average cost be for a completely unbalanced tree of these same eight nodes? [wait] Yes, (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8)/8 = 36/8.
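
These sums can be checked mechanically. The sketch below assumes keytype is char; the helper mk and the functions first_tree and second_tree are hypothetical names used only to build the two example trees by hand. Locating a node costs one comparison per level, so the total is the sum of the node depths with the root counted as depth 1.

```c
#include <stdlib.h>

typedef char keytype;   /* assumed; not shown in the notes */
typedef struct t1 { struct t1 *left, *right; keytype key; } node;

/* Hypothetical helper: allocate a node with the given children. */
node *mk(keytype k, node *l, node *r)
{ node *n = (node *) malloc(sizeof(node));
  if (n == NULL)
    abort();            /* illustrative error policy only */
  n->key = k;
  n->left = l;
  n->right = r;
  return n;
}

/* Sum over all nodes of the comparisons needed to reach each one.
   Call with depth = 1 for the root. */
int total_comparisons(node *t, int depth)
{ if (t == NULL)
    return 0;
  return depth + total_comparisons(t->left, depth+1)
               + total_comparisons(t->right, depth+1);
}

/* The example tree before the SingleRight rotation. */
node *first_tree(void)
{ return mk('g', mk('d', mk('b', mk('a',NULL,NULL), mk('c',NULL,NULL)),
                         mk('f', mk('e',NULL,NULL), NULL)),
                 mk('h',NULL,NULL));
}

/* The example tree after the SingleRight rotation. */
node *second_tree(void)
{ return mk('d', mk('b', mk('a',NULL,NULL), mk('c',NULL,NULL)),
                 mk('g', mk('f', mk('e',NULL,NULL), NULL),
                         mk('h',NULL,NULL)));
}
```

Here total_comparisons(first_tree(), 1) is 23 and total_comparisons(second_tree(), 1) is 21, which are the numerators of the two averages.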

Let's define the notion of `height' of a node. We'll say that it is equal to the number of levels above the deepest node in the tree below it. So, let's compute the height of each node in the tree:

               g
              / \
             /   \
            /     \
           d       h
          / \
         /   \
        b     f
       / \   /
      a   c e
What is the height of node `a'? [wait] Yes, it has height 0 because it is 0 levels above the deepest node in its subtree. All leaf nodes have height 0. What is the height of `b'? [wait] Yes, 1. How about `f'? [wait] Yes, it is 1. How about node `g'? [wait] Yes, good, it has height 3. Who can give us a recursive definition of height? [wait] Okay, it is:
  height(t) = 0 if t is a leaf
            = 1 + max(height(t->left),height(t->right)) otherwise
This is the obvious one that follows from our discussion, but it wouldn't quite work if we implemented it this way because we haven't accounted for an empty tree. There are no nodes in an empty tree, so we would like height to be undefined, but it will be convenient if we define the height of an empty tree to be -1, giving:
  height(t) = -1 if t is NULL
            =  1 + max(height(t->left),height(t->right)) otherwise

With the ability to compute the height of a node, we can check the balance of a tree at any given node. We can say that a tree is reasonably well balanced if, at every node, the heights of its two children differ by at most 1. This is the AVL property that we mentioned earlier. If the binary search tree has the AVL property at every node, then the tree is an AVL tree.
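
As a rough sketch, the recursive height definition and the AVL test can be written directly from the text. This assumes keytype is char, and has_avl_property is a made-up name. It recomputes heights from scratch at every node, which is fine for checking small examples by machine but is not how an implementation would maintain balance.

```c
#include <stddef.h>

typedef char keytype;   /* assumed; not shown in the notes */
typedef struct t1 { struct t1 *left, *right; keytype key; } node;

/* Height by the recursive definition; an empty tree has height -1. */
int height(node *t)
{ int hl, hr;
  if (t == NULL)
    return -1;
  hl = height(t->left);
  hr = height(t->right);
  return 1 + (hl > hr ? hl : hr);
}

/* Returns 1 if the children's heights differ by at most 1 at every
   node, and 0 otherwise.  Only the balance is checked here, not the
   ordering of the keys. */
int has_avl_property(node *t)
{ int diff;
  if (t == NULL)
    return 1;
  diff = height(t->left) - height(t->right);
  if (diff < -1 || diff > 1)
    return 0;
  return has_avl_property(t->left) && has_avl_property(t->right);
}
```

Applied to the eight-node example rooted at `g', has_avl_property returns 0, because the children of `g' have heights 2 and 0.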

Let's revisit our example, noting the heights. We start with a tree that is not AVL (why? [wait]):

               g3
              / \
             /   \
            /     \
           d2      h0           (heights differ by 2)
          / \
         /   \
        b1    f1
       / \    /
      a0  c0 e0
and transform it to a tree that is AVL:
               d3
              / \
             /   \
            /     \
           /       \
          b1        g2         (heights differ by 1)
         / \       / \
        /   \     /   \
       a0    c0  f1    h0
                /
               e0

Well, even though we don't know quite when to rotate what, we do know that we'll be wanting to make such decisions based on heights of nodes, so let's revise our implementation so that the height of each node is maintained (kept up to date) in each node throughout the operations. Well, we'll do Insert, and I'll leave the others to you.

First, let's recall our typedef for a node. We'll want to add a .height field so that we can store the height of a node in the node itself. This gives:

typedef
  struct t1
    { struct t1 *left,*right;
      int height;
      keytype key;
    } node;

Now we can define two functions that will be useful. The first is quite simple. It returns -1 if a tree is NULL, and the value in the .height field otherwise. This will save us the chore of writing this conditional when we need to lookup the height of a node. The function is:

int get_node_height(node *tree)
{ if (tree == NULL)
    return(-1);
  else
    return(tree->height);
}
By the way, C has a construction called a conditional expression. It would be legal to rewrite the above as:
int get_node_height(node *tree)
{ return(tree == NULL ? -1 : tree->height);
}
If the first expression (before the `?') is non-zero (TRUE), then the conditional expression takes the value following the '?'. Otherwise it takes the value following the `:'. It is a matter of style whether to use a conditional expression in place of a conditional statement. No clear winner - it is usually a matter of what one is accustomed to seeing.

Now we can write a procedure (function of type void) to set the .height field of a node, assuming that the .height fields of the subtrees are set correctly. We'll see in a moment that this will be quite useful for Insert. As the recursion for the Insert unwinds, we'll reset the heights of the nodes just traversed. Let's call this one `set_node_height'. Let's start it off as:

void set_node_height(node *tree)
{ 
}
Now what? [wait] Well, almost, never neglect to handle the empty tree case. So, it would be:
void set_node_height(node *tree)
{ if (tree != NULL)
    tree->height = 1 + maximum(get_node_height(tree->left),
                               get_node_height(tree->right));
}
How would you write the function maximum? [wait] Yes, and using a conditional expression, one often sees:
int maximum(int x, int y)
{ return( x>y ? x : y);
}

Now let's recall our definition of Insert.

void insert(list *tree, keytype k)
{ node *tmp;

  if (*tree == NULL)
    { tmp = MALLOC_ONE(node);
      tmp->key = k;
      tmp->left = NULL;
      tmp->right = NULL;
      *tree = tmp;
    }
  else

  if (k < (*tree)->key)
    insert(&((*tree)->left),k);
  else
    insert(&((*tree)->right),k);
}
How can we change this so that the .height field of every node is correct after the insertion? One can assume that they were already correct prior to the insertion. [wait] Well, what about the base case? [wait] Yes, a leaf node has height 0, so we can set .height to 0 for that case, giving:
void insert(list *tree, keytype k)
{ node *tmp;

  if (*tree == NULL)
    { tmp = MALLOC_ONE(node);
      tmp->key = k;
      tmp->left = NULL;
      tmp->right = NULL;
      tmp->height = 0;
      *tree = tmp;
    }
  else

  if (k < (*tree)->key)
    insert(&((*tree)->left),k);
  else
    insert(&((*tree)->right),k);
}

Now, what about those recursive cases? [wait] Yes, after the insertion, it is possible that the height of the subtree has changed, so we should reset the .height of the root node *after* doing the recursive insert. Fortunately, we have set_node_height ready to go, giving:

void insert(list *tree, keytype k)
{ node *tmp;

  if (*tree == NULL)
    { tmp = MALLOC_ONE(node);
      tmp->key = k;
      tmp->left = NULL;
      tmp->right = NULL;
      tmp->height = 0;
      *tree = tmp;
    }
  else

  if (k < (*tree)->key)
    { insert(&((*tree)->left),k);
      set_node_height(*tree);
    }
  else
    { insert(&((*tree)->right),k);
      set_node_height(*tree);
    }
}
This is beautiful. When you go to do delete, it won't be quite as easy. Why not? [wait] Right, for the left subtree, when one removes the node with the largest key, all the nodes that were traversed to find the largest key may need new .height values. You'll need to think about this. Hint: can you write a recursive function that finds the largest key of the left subtree, and revises the left subtree when it finds it?
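
One possible shape for the helper that the hint describes is sketched below. It is only a sketch under assumptions: keytype is taken to be char, list is taken to be a pointer to node, and remove_largest is a made-up name. It unlinks the node with the largest key and resets heights as the recursion unwinds, in the same way that insert does. What the caller does with the returned node (copy its key into the node being deleted, free it, and so on) is left open.

```c
#include <stddef.h>

typedef char keytype;   /* assumed */
typedef struct t1 { struct t1 *left, *right; int height; keytype key; } node;
typedef node *list;     /* assumed */

int get_node_height(node *tree)
{ return(tree == NULL ? -1 : tree->height);
}

void set_node_height(node *tree)
{ int hl, hr;
  if (tree != NULL)
    { hl = get_node_height(tree->left);
      hr = get_node_height(tree->right);
      tree->height = 1 + (hl > hr ? hl : hr);
    }
}

/* Unlink the node with the largest key from *tree and return it.
   Returns NULL if the tree is empty. */
node *remove_largest(list *tree)
{ node *largest;

  if (*tree == NULL)
    return(NULL);

  if ((*tree)->right == NULL)
    { /* no larger key below: this node is the largest */
      largest = *tree;
      *tree = largest->left;      /* its left subtree takes its place */
      largest->left = NULL;
      largest->height = 0;
      return(largest);
    }

  largest = remove_largest(&((*tree)->right));
  set_node_height(*tree);         /* this node was on the search path */
  return(largest);
}
```

A delete built on this would call remove_largest(&((*tree)->left)) for a node with two children, and would then need to reset the height of that node as well.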

We're almost ready to take on the problem of figuring out when to apply which rotation, but let's write the code for one of the rotations first. This will give some increased familiarity with the mechanics, and we can also take care of writing it in a way that maintains the .height fields.

Let's write the SingleRight rotation. Here's the first step:

void single_right(list *tree)
{
}
We've written a few functions that have `list *tree' as a parameter, and it has been a bit of a chore to use (*tree) throughout. This time, let's do it a little differently. I often do it this way for clarity. We'll set a local to *tree, and when we're done, we'll copy the value to *tree. So, let's write some code.

Recall, that we like to set some local variables to point to the nodes/subtrees we want, and then we use those local variables to relink the pieces. Let's set some pointers to the tree, the old left subtree, and the middle subtree that needs to be reattached. This gives:

void single_right(list *tree)
{ node *old_root,*old_left,*middle;

  /* attach to pieces of interest */
  old_root = *tree;
  old_left = old_root->left;
  middle = old_left->right;

  /* rearrange */
}
Using our example, we now have, in effect:
                        g   old_root
                         \
                          \
                           \
 old_left  d                h
          /  
         /   
        b         f  middle
       / \       /
      a   c     e

Now let's hook them together in the new arrangement. What code will do that? [wait] Yes, giving:

void single_right(list *tree)
{ node *old_root,*old_left,*middle;

  /* attach to pieces of interest */
  old_root = *tree;
  old_left = old_root->left;
  middle = old_left->right;

  /* rearrange */
  old_left->right = old_root;
  old_root->left = middle;
  *tree = old_left;
}
It's quite nice that six assignments accomplish the rotation. Let's now augment this code so that the .height fields are also maintained properly. For these three subtrees, which might have new .heights? [wait] Yes, old_root and old_left. So, how do we set these properly? [wait] Yep, we call set_node_height for each one, giving:
void single_right(list *tree)
{ node *old_root,*old_left,*middle;

  /* attach to pieces of interest */
  old_root = *tree;
  old_left = old_root->left;
  middle = old_left->right;

  /* rearrange */
  old_left->right = old_root;
  old_root->left = middle;
  *tree = old_left;

  /* update heights */
  set_node_height(old_root);
  set_node_height(old_left);
}
Is it important that we set these .heights in this order? [wait] You bet, the height of old_left depends on the height of old_root.

You can do SingleLeft on your own. We're now ready to think about when to do which rotation. In general, if we have an AVL tree, and either insert one new key or delete one old key, no subtree anywhere should be out of balance by more than two levels. More specifically, the heights of the two children should not differ by more than two. So, the main idea is to keep the tree reasonably well balanced as we go.
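
For comparison with your own attempt, here is one way SingleLeft might look, written as the mirror image of single_right. The typedefs for keytype and list are assumptions, and the assert is an added precondition check that is not part of the notes.

```c
#include <assert.h>
#include <stddef.h>

typedef char keytype;   /* assumed */
typedef struct t1 { struct t1 *left, *right; int height; keytype key; } node;
typedef node *list;     /* assumed */

int get_node_height(node *tree)
{ return(tree == NULL ? -1 : tree->height);
}

void set_node_height(node *tree)
{ int hl, hr;
  if (tree != NULL)
    { hl = get_node_height(tree->left);
      hr = get_node_height(tree->right);
      tree->height = 1 + (hl > hr ? hl : hr);
    }
}

void single_left(list *tree)
{ node *old_root,*old_right,*middle;

  /* a left rotation needs a right child to raise */
  assert(*tree != NULL && (*tree)->right != NULL);

  /* attach to pieces of interest */
  old_root = *tree;
  old_right = old_root->right;
  middle = old_right->left;

  /* rearrange */
  old_right->left = old_root;
  old_root->right = middle;
  *tree = old_right;

  /* update heights, lower node first */
  set_node_height(old_root);
  set_node_height(old_right);
}
```

Applied to the right-hand tree of the first rotation picture (root `w'), this gives back the left-hand tree (root `y').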

Restoring the AVL property of the tree should be possible with relatively few rotations. We have been working with an example in which such balance is restored. Let's look at another tree, and check whether a single rotation will fix it. Suppose we have the following tree that needs its balance restored:

                       h
                      / \
                     /   \
                    /     \
                   d       j
                  / \
                 /   \
                a     f
                     / \
                    e   g
It is too deep to the left, so it is natural to want to try a SingleRight rotation. Let's try it - what do we get? [wait...] Yes,
                       d
                      / \
                     /   \
                    /     \
                   a       h
                          / \
                         /   \
                        f     j
                       / \
                      e   g
Ack! It didn't work. The problem is that the `middle' subtree that gets reattached is too deep. Notice that `f' is the same distance below the root both before and after the rotation. Reattaching a too-deep subtree at the same level doesn't help. This brings us to our first conclusion about when to do a rotation. If the tree is too deep to the left-outside, a SingleRight rotation will restore the balance. By symmetry, if the tree is too deep to the right-outside, a SingleLeft will restore balance. If the tree is too deep to the left-inside or the right-inside, no single rotation will restore the balance.

We're part way there. How can we handle a tree that is too deep to the inside? [wait...] Can we turn it into a problem for which a single rotation will work? [wait...] Let's take our example, and apply a SingleLeft rotation to the left subtree. What do we get? [wait] Yes, we get:

                       h
                      / \
                     /   \
                    /     \
                   f       j
                  / \
                 /   \
                d     g
               / \     
              a   e    
This doesn't look so great at the moment, but what kind of unbalanced condition do we now have? [wait] Yes, exactly, it's too deep to the left outside, so how can we now repair it? [wait] Yes, do a SingleRight at the root. What does that produce? [wait] Yep,
                       f
                      / \
                     /   \
                    /     \
                   d       h
                  / \     / \
                 a   e   g   j
What do you think about that? Not bad. Applying two single rotations in this way is how one restores balance to a tree that is too deep to the inside. It is such a useful sequence that it has its own name, in this case DoubleRight. By symmetry, there is a useful double rotation called DoubleLeft.

Let's write the code for DoubleRight.

void double_right(list *tree)
{ single_left(&((*tree)->left));
  single_right(tree);
}
That's quite brief. We almost don't need it. What shall we do to maintain the .height fields through a double rotation? [wait] Good, absolutely nothing, and why is that? [wait] If single_left maintains .height and single_right maintains .height, then the single rotations do all that is necessary.
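
By the same symmetry, DoubleLeft could be sketched as a SingleRight on the right subtree followed by a SingleLeft at the root. The block below repeats compact versions of the two single rotations so that it stands alone; the typedefs for keytype and list are assumptions.

```c
#include <stddef.h>

typedef char keytype;   /* assumed */
typedef struct t1 { struct t1 *left, *right; int height; keytype key; } node;
typedef node *list;     /* assumed */

int get_node_height(node *tree)
{ return(tree == NULL ? -1 : tree->height);
}

void set_node_height(node *tree)
{ int hl, hr;
  if (tree != NULL)
    { hl = get_node_height(tree->left);
      hr = get_node_height(tree->right);
      tree->height = 1 + (hl > hr ? hl : hr);
    }
}

void single_right(list *tree)
{ node *old_root = *tree;
  node *old_left = old_root->left;

  old_root->left = old_left->right;   /* reattach the middle subtree */
  old_left->right = old_root;
  *tree = old_left;
  set_node_height(old_root);
  set_node_height(old_left);
}

void single_left(list *tree)
{ node *old_root = *tree;
  node *old_right = old_root->right;

  old_root->right = old_right->left;  /* reattach the middle subtree */
  old_right->left = old_root;
  *tree = old_right;
  set_node_height(old_root);
  set_node_height(old_right);
}

/* For a tree that is too deep to the right-inside. */
void double_left(list *tree)
{ single_right(&((*tree)->right));
  single_left(tree);
}
```

Note the parentheses in &((*tree)->right): the -> operator binds more tightly than the unary *, so (*tree) must be grouped before the field is selected.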

It will be convenient to write a procedure balance_tree that will apply any rotation that may be needed at the root to restore the AVL property at the root. Then it will be only a matter of adding some `balance_tree' calls to our operations in order to finish off the AVL-tree implementation. Let's do balance_tree here, but I'll leave fixing up the data type operations (insert and delete) to call it to you.

First, let's get the heights of the two children.

void balance_tree(list *tree)
{ int h_left,h_right;
  node *subtree;

  /* get heights of children */
  h_left = get_node_height((*tree)->left);
  h_right = get_node_height((*tree)->right);
}
Now we can compare these values to see whether the tree is too deep to one side.
void balance_tree(list *tree)
{ int h_left,h_right;
  node *subtree;

  /* get heights of children */
  h_left = get_node_height((*tree)->left);
  h_right = get_node_height((*tree)->right);

  if (h_left > h_right+1)
    { /* too deep to the left */
    }
  else

  if (h_left+1 < h_right)
    { /* too deep to the right */
    }
}
Notice that if the tree is neither too deep to the left nor too deep to the right, nothing is changed. Now we can handle each case. We'll just do the left case. You can do the other one by symmetry.

Now that we know we're too deep to the left, we can proceed to figure out whether we're too deep to the outside, or too deep to the inside. This gives:

void balance_tree(list *tree)
{ int h_left,h_right;
  node *subtree;

  /* get heights of children */
  h_left = get_node_height((*tree)->left);
  h_right = get_node_height((*tree)->right);

  if (h_left > h_right+1)
    { /* too deep to the left */
      subtree = (*tree)->left;

      h_left = get_node_height(subtree->left);
      h_right = get_node_height(subtree->right);

      if (h_left >= h_right)
        { /* too deep to the outside */
        }
      else
        { /* too deep to the inside */
        }
    }
  else

  if (h_left+1 < h_right)
    { /* too deep to the right */
    }
}
How do we finish off the `left' case? [wait] Yes, we call the appropriate rotation. So, what do I write? [wait] Yes, as we said above, giving:
void balance_tree(list *tree)
{ int h_left,h_right;
  node *subtree;

  /* get heights of children */
  h_left = get_node_height((*tree)->left);
  h_right = get_node_height((*tree)->right);

  if (h_left > h_right+1)
    { /* too deep to the left */
      subtree = (*tree)->left;

      h_left = get_node_height(subtree->left);
      h_right = get_node_height(subtree->right);

      if (h_left >= h_right)
        { /* too deep to the outside */
          single_right(tree);
        }
      else
        { /* too deep to the inside */
          double_right(tree);
        }
    }
  else

  if (h_left+1 < h_right)
    { /* too deep to the right */
    }
}
By the way, when the two heights are identical, one wants to do a single rotation, hence the `>=' instead of `>' above.

That's basically it for the mechanics of maintaining reasonable balance in a binary search tree (keeping it AVL). As I mentioned, you'll need to figure out where to call balance_tree.
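
To check your own answer against something, here is one hedged sketch of how the pieces might fit together for Insert: the right-hand case of balance_tree is filled in by symmetry, and insert calls balance_tree after resetting the height as the recursion unwinds. The typedefs for keytype and list are assumptions, plain malloc stands in for MALLOC_ONE, and the rotations are written compactly. Other placements of the call may work equally well.

```c
#include <stdlib.h>

typedef char keytype;   /* assumed */
typedef struct t1 { struct t1 *left, *right; int height; keytype key; } node;
typedef node *list;     /* assumed */

int get_node_height(node *tree)
{ return(tree == NULL ? -1 : tree->height);
}

void set_node_height(node *tree)
{ int hl, hr;
  if (tree != NULL)
    { hl = get_node_height(tree->left);
      hr = get_node_height(tree->right);
      tree->height = 1 + (hl > hr ? hl : hr);
    }
}

void single_right(list *tree)
{ node *old_root = *tree, *old_left = old_root->left;
  old_root->left = old_left->right;
  old_left->right = old_root;
  *tree = old_left;
  set_node_height(old_root);
  set_node_height(old_left);
}

void single_left(list *tree)
{ node *old_root = *tree, *old_right = old_root->right;
  old_root->right = old_right->left;
  old_right->left = old_root;
  *tree = old_right;
  set_node_height(old_root);
  set_node_height(old_right);
}

void double_right(list *tree)
{ single_left(&((*tree)->left));
  single_right(tree);
}

void double_left(list *tree)
{ single_right(&((*tree)->right));
  single_left(tree);
}

void balance_tree(list *tree)
{ int h_left, h_right;
  node *subtree;

  h_left = get_node_height((*tree)->left);
  h_right = get_node_height((*tree)->right);

  if (h_left > h_right+1)
    { /* too deep to the left */
      subtree = (*tree)->left;
      if (get_node_height(subtree->left) >= get_node_height(subtree->right))
        single_right(tree);           /* outside */
      else
        double_right(tree);           /* inside */
    }
  else if (h_left+1 < h_right)
    { /* too deep to the right: mirror image of the case above */
      subtree = (*tree)->right;
      if (get_node_height(subtree->right) >= get_node_height(subtree->left))
        single_left(tree);            /* outside */
      else
        double_left(tree);            /* inside */
    }
}

void insert(list *tree, keytype k)
{ node *tmp;

  if (*tree == NULL)
    { tmp = (node *) malloc(sizeof(node));
      if (tmp == NULL)
        abort();                      /* illustrative error policy only */
      tmp->key = k;
      tmp->left = NULL;
      tmp->right = NULL;
      tmp->height = 0;
      *tree = tmp;
    }
  else
    { if (k < (*tree)->key)
        insert(&((*tree)->left),k);
      else
        insert(&((*tree)->right),k);
      set_node_height(*tree);         /* height first, then balance */
      balance_tree(tree);
    }
}
```

With this version, inserting the keys `a' through `h' in sorted order, which would give a chain of height 7 without rebalancing, produces a tree of height 3 with `d' at the root.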

Let's do one example that involves deleting a node. Suppose that we have the AVL tree:

                                 f
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           d           l
                          / \         / \
                         /   \       /   \
                        b     e     j     n
                         \         / \   /
                          c       h   k m
                                 / \
                                g   i
Now let's delete `f'. From this we get:
                                 e
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           d           l
                          /           / \
                         /           /   \
                        b           j     n
                         \         / \   /
                          c       h   k m
                                 / \
                                g   i
which is not AVL at `d'. Which kind of imbalance do we have, too deep to the inside, or too deep to the outside? [wait] Right, it is too deep to the inside, so we need a DoubleRight rotation. The SingleLeft at `b' gives us:
                                 e
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           d           l
                          /           / \
                         /           /   \
                        c           j     n
                       /           / \   /
                      b           h   k m
                                 / \
                                g   i
and the SingleRight at `d' gives us:
                                 e
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           c           l
                          / \         / \
                         /   \       /   \
                        b     d     j     n
                                   / \   /
                                  h   k m
                                 / \
                                g   i
Now the tree is not AVL at `e'. What must happen now? [wait] Yes, it's too deep to the right-inside, so we need to do a DoubleLeft. We first do the SingleRight at `l', giving:
                                 e
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           c           j
                          / \         / \
                         /   \       /   \
                        b     d     h     l
                                   / \   / \
                                  g   i k   n
                                           /
                                          m
Now the SingleLeft at `e' makes:
                                 j
                                / \
                               /   \
                              /     \
                             /       \
                            /         \
                           e           l
                          / \         / \
                         /   \       /   \
                        c     h     k     n
                       / \   / \         / 
                      b   d g   i       m  
That's it for our discussion of AVL trees. The O(logn) Insert and Delete costs are the best that one can achieve for keeping an ordered list, when using a serial processor and when the data structure is memory resident (not on disk).


Last Updated: November 29, 1996
Paul Utgoff: utgoff@cs.umass.edu
© Copyright 1996, All Rights Reserved, Paul Utgoff, University of Massachusetts