In computer science, a redâÂÂblack tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.
When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently.
The (re-)balancing is not perfect, but guarantees searching in time, where is the number of entries in the tree. The insert and delete operations, along with tree rearrangement and recoloring, also execute in time.
Tracking the color of each node requires only one bit of information per node because there are only two colors (due to memory alignment present in some programming languages, the real memory consumption may differ). The tree does not contain any other data specific to it being a redâÂÂblack tree, so its memory footprint is almost identical to that of a classic (uncolored) binary search tree. In some cases, the added bit of information can be stored at no added memory cost.
In 1972, Rudolf Bayer invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2âÂÂ3âÂÂ4 trees or even 2âÂÂ3 trees.
In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the redâÂÂblack tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.
In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete operations.
In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only four unbalanced cases and one default balanced case.
The original algorithm used eight unbalanced cases, but reduced that to six unbalanced cases. Sedgewick showed that the insert operation can be implemented in just 46 lines of Java. In 2008, Sedgewick proposed the left-leaning redâÂÂblack tree, leveraging AnderssonâÂÂs idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2âÂÂ3âÂÂ4 trees, but later this restriction was added, making new trees more like 2âÂÂ3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.
The black depth of a node is defined as the number of black nodes from the root to that node (i.e. the number of black ancestors). The black height of a redâÂÂblack tree is the number of black nodes in any path from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the black depth of any leaf node). The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a null node shall be set to 0, because its subtree is empty as suggested by the example figure, and its tree height is also 0.
In addition to the requirements imposed on a binary search tree the following must be satisfied by a
Some authors, e.g. Cormen & al., claim "the root is black" as fifth requirement; but not Mehlhorn & Sanders or Sedgewick & Wayne. Since the root can always be changed from red to black, this rule has little effect on analysis. This article also omits it, because it slightly disturbs the recursive algorithms and proofs.
As an example, every perfect binary tree that consists only of black nodes is a redâÂÂblack tree.
The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast, the modifying operations insert and easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.
The requirements enforce a critical property of redâÂÂblack trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height of the tree, this upper bound on the height allows redâÂÂblack trees to be efficient in the worst case, namely logarithmic in the number of entries, i.e. (a property which is shared by all self-balancing trees, e.g., AVL tree or B-tree, but not the ordinary binary search trees). For a mathematical proof see section Proof of bounds.
RedâÂÂblack trees, like all binary search trees, allow quite efficient sequential access (e.g. in-order traversal, that is: in the order LeftâÂÂRootâÂÂRight) of their elements. But they support also asymptotically optimal direct access via a traversal from root to leaf, resulting in search time.
RedâÂÂblack trees are similar in structure to 2âÂÂ3âÂÂ4 trees, which are B-trees of order 4. In 2âÂÂ3âÂÂ4 trees, each node can contain between 1 and 3 values and have between 2 and 4 children. These 2âÂÂ3âÂÂ4 nodes correspond to black node â red children groups in red-black trees, as shown in figure 1. It is not a 1-to-1 correspondence, because 3-nodes have two equivalent representations: the red child may lie either to the left or right. The left-leaning red-black tree variant makes this relationship exactly 1-to-1, by only allowing the left child representation. Since every 2âÂÂ3âÂÂ4 node has a corresponding black node, invariant 4 of red-black trees is equivalent to saying that the leaves of a 2âÂÂ3âÂÂ4 tree all lie at the same level.
Despite structural similarities, operations on redâÂÂblack trees are more economical than B-trees. B-trees require management of vectors of variable length, whereas red-black trees are simply binary trees.
RedâÂÂblack trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry are based on redâÂÂblack trees, and the Completely Fair Scheduler and epoll system call of the Linux kernel use redâÂÂblack trees. The AVL tree is another structure supporting search, insertion, and removal. AVL trees can be colored redâÂÂblack, and thus are a subset of red-black trees. The worst-case height of AVL is 0.720 times the worst-case height of red-black trees, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL to RB ratios between 0.677 and 1.077, median at 0.947, and geometric mean 0.910. The performance of WAVL trees lies between AVL trees and red-black trees.
RedâÂÂblack trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of redâÂÂblack trees requires space for each insertion or deletion, in addition to time.
For every 2âÂÂ3âÂÂ4 tree, there are corresponding redâÂÂblack trees with data elements in the same order. The insertion and deletion operations on 2âÂÂ3âÂÂ4 trees are also equivalent to color-flipping and rotations in redâÂÂblack trees. This makes 2âÂÂ3âÂÂ4 trees an important tool for understanding the logic behind redâÂÂblack trees, and this is why many introductory algorithm texts introduce 2âÂÂ3âÂÂ4 trees just before redâÂÂblack trees, even though 2âÂÂ3âÂÂ4 trees are not often used in practice.
In 2008, Sedgewick introduced a simpler version of the redâÂÂblack tree called the left-leaning redâÂÂblack tree by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. RedâÂÂblack trees can be made isometric to either 2âÂÂ3 trees, or 2âÂÂ3âÂÂ4 trees, for any sequence of operations. The 2âÂÂ3âÂÂ4 tree isometry was described in 1978 by Sedgewick. With 2âÂÂ3âÂÂ4 trees, the isometry is resolved by a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and moves to the parent node.
The original description of the tango tree, a type of tree optimised for fast searches, specifically uses redâÂÂblack trees as part of its data structure.
As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different elements with colliding hashcodes, a redâÂÂblack tree is used. This results in the improvement of time complexity of searching such an element from to where is the number of elements with colliding hashes.
The read-only operations, such as search or tree traversal, on a redâÂÂblack tree require no modification from those used for binary search trees, because every redâÂÂblack tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a redâÂÂblack tree, the restoration of which is called rebalancing so that redâÂÂblack trees become self-balancing. Rebalancing (i.e. color changes) has a worst-case time complexity of and average of , though these are very quick in practice. Additionally, rebalancing takes no more than three tree rotations (two for insertion).
This is an example implementation of insert and remove in C. Below are the data structures and the <code>rotate_subtree</code> helper function used in the insert and remove examples.
The proposal breaks down both insertion and removal (not mentioning some very simple cases) into six constellations of nodes, edges, and colors, which are called cases. The proposal contains, for both insertion and removal, exactly one case that advances one black level closer to the root and loops, the other five cases rebalance the tree of their own. The more complicated cases are pictured in a diagram.
Insertion begins by placing the new (non-NULL) node, say N, at the position in the binary search tree of a NULL node whose in-order predecessorâÂÂs key compares less than the new nodeâÂÂs key, which in turn compares less than the key of its in-order successor. (Frequently, this positioning is the result of a search within the tree immediately preceding the insert operation and consists of a node <code>P</code> together with a direction <code>dir</code> with The newly inserted node is temporarily colored red so that all paths contain the same number of black nodes as before. But if its parent, say P, is also red then this action introduces a red-violation.
The rebalancing loop of the insert operation has the following invariants:
The current nodeâÂÂs parent P is black, so requirement 3 holds. Requirement 4 holds also according to the loop invariant.
If both the parent P and the uncle U are red, then both of them can be repainted black and the grandparent G becomes red for maintaining requirement 4. Since any path through the parent or uncle must pass through the grandparent, the number of black nodes on these paths has not changed. However, the grandparent G may now violate requirement 3, if it has a red parent. After relabeling G to N the loop invariant is fulfilled so that the rebalancing can be iterated on one black level (= 2 tree levels) higher.
Insert case 2 has been executed for times and the total height of the tree has increased by 1, now being  . The current node N is the (red) root of the tree, and all RB-properties are satisfied.
The parent P is red and the root. Because N is also red, requirement 3 is violated. But after switching PâÂÂs color the tree is in RB-shape. The black height of the tree increases by 1.
The parent P is red but the uncle U is black. The ultimate goal is to rotate the parent node P to the grandparent position, but this will not work if N is an "inner" grandchild of G (i.e., if N is the left child of the right child of G or the right child of the left child of G). A at P switches the roles of the current node N and its parent P. The rotation adds paths through N (those in the subtree labeled 2, see diagram) and removes paths through P (those in the subtree labeled 4). But both P and N are red, so requirement 4 is preserved. Requirement 3 is restored in case 6.
The current node N is now certain to be an "outer" grandchild of G (left of left child or right of right child). Now at G, putting P in place of G and making P the parent of N and G. G is black and its former child P is red, since requirement 3 was violated. After switching the colors of P and G the resulting tree satisfies requirement 3. Requirement 4 also remains satisfied, since all paths that went through the black G now go through the black P.
Because the algorithm transforms the input without using an auxiliary data structure and using only a small amount of extra storage space for auxiliary variables it is in-place.
The complex case is when N is not the root, colored black and has no proper child (â only NULL children). In the first iteration, N is replaced by NULL.
The rebalancing loop of the delete operation has the following invariant:
The current node N is the new root. One black node has been removed from every path, so the RB-properties are preserved. The black height of the tree decreases by 1.
P, S, and SâÂÂs children are black. After painting S red all paths passing through S, which are precisely those paths not passing through N, have one less black node. Now all paths in the subtree rooted by P have the same number of black nodes, but one fewer than the paths that do not pass through P, so requirement 4 may still be violated. After relabeling P to N the loop invariant is fulfilled so that the rebalancing can be iterated on one black level (= 1 tree level) higher.
The sibling S is red, so P and the nephews C and D have to be black. A at P turns S into NâÂÂs grandparent. Then after reversing the colors of P and S, the path through N is still short one black node. But N now has a red parent P and after the reassignment a black sibling S, so the transformations in cases 4, 5, or 6 are able to restore the RB-shape.
The sibling S and SâÂÂs children are black, but P is red. Exchanging the colors of S and P does not affect the number of black nodes on paths going through S, but it does add one to the number of black nodes on paths going through N, making up for the deleted black node on those paths.
The sibling S is black, SâÂÂs close child C is red, and SâÂÂs distant child D is black. After a at S the nephew C becomes SâÂÂs parent and NâÂÂs new sibling. The colors of S and C are exchanged. All paths still have the same number of black nodes, but now N has a black sibling whose distant child is red, so the constellation is fit for case D6. Neither N nor its parent P are affected by this transformation, and P may be red or black ( in the diagram).
The sibling S is black, SâÂÂs distant child D is red. After a at P the sibling S becomes the parent of P and SâÂÂs distant child D. The colors of P and S are exchanged, and D is made black. The whole subtree still has the same color at its root S, namely either red or black ( in the diagram), which refers to the same color both before and after the transformation. This way requirement 3 is preserved. The paths in the subtree not passing through N (i.o.w. passing through D and node 3 in the diagram) pass through the same number of black nodes as before, but N now has one additional black ancestor: either P has become black, or it was black and S was added as a black grandparent. Thus, the paths passing through N pass through one additional black node, so that requirement 4 is restored and the total tree is in RB-shape.
Because the algorithm transforms the input without using an auxiliary data structure and using only a small amount of extra storage space for auxiliary variables it is in-place.
For there is a redâÂÂblack tree of height with
nodes ( is the floor function) and there is no redâÂÂblack tree of this tree height with fewer nodesâÂÂtherefore it is minimal.<br/>Its black height is     (with black root) or for odd (then with a red root) also  
For a redâÂÂblack tree of a certain height to have minimal number of nodes, it must have exactly one longest path with maximal number of red nodes, to achieve a maximal tree height with a minimal black height. Besides this path all other nodes have to be black. If a node is taken off this tree it either loses height or some RB property.
The RB tree of height with red root is minimal. This is in agreement with
A minimal RB tree (RB<sub>h</sub> in figure 2) of height has a root whose two child subtrees are of different height. The higher child subtree is also a minimal RB tree, containing also a longest path that defines its height it has nodes and the black height The other subtree is a perfect binary tree of (black) height having black nodesâÂÂand no red node. Then the number of nodes is by induction
The graph of the function is convex and piecewise linear with breakpoints at where The function has been tabulated as A027383(hâÂÂ1) for
The inequality leads to , which for odd leads to
So in both, the even and the odd case, is in the interval
with being the number of nodes.
A redâÂÂblack tree with nodes (keys) has tree height
In addition to the single-element insert, delete and lookup operations, several set operations have been defined on union, intersection and set difference. Then fast bulk operations on insertions or deletions can be implemented based on these set functions. These set operations rely on two helper operations, Split and Join. With the new operations, the implementation of redâÂÂblack trees can be more efficient and highly-parallelizable. In order to achieve its time complexities this implementation requires that the root is allowed to be either red or black, and that every node stores its own black height.
The join algorithm is as follows:
function joinRightRB(T<sub>L</sub>, k, T<sub>R</sub>): if (T<sub>L</sub>.color=black) and (T<sub>L</sub>.blackHeight=T<sub>R</sub>.blackHeight): return Node(T<sub>L</sub>,â¨k,redâ©,T<sub>R</sub>) T'=Node(T<sub>L</sub>.left,â¨T<sub>L</sub>.key,T<sub>L</sub>.colorâ©,joinRightRB(T<sub>L</sub>.right,k,T<sub>R</sub>)) if (T<sub>L</sub>.color=black) and (T'.right.color=T'.right.right.color=red): T'.right.right.color=black; return rotateLeft(T') return T' /* T[recte T'] */
function joinLeftRB(T<sub>L</sub>, k, T<sub>R</sub>): /* symmetric to joinRightRB */
function join(T<sub>L</sub>, k, T<sub>R</sub>): if T<sub>L</sub>.blackHeight>T<sub>R</sub>.blackHeight: T'=joinRightRB(T<sub>L</sub>,k,T<sub>R</sub>) if (T'.color=red) and (T'.right.color=red): T'.color=black return T' if T<sub>R</sub>.blackHeight>T<sub>L</sub>.blackHeight: /* symmetric */ if (T<sub>L</sub>.color=black) and (T<sub>R</sub>.color=black): return Node(T<sub>L</sub>,â¨k,redâ©,T<sub>R</sub>) return Node(T<sub>L</sub>,â¨k,blackâ©,T<sub>R</sub>)
The split algorithm is as follows:
function split(T, k): if (T = NULL) return (NULL, false, NULL) if (k = T.key) return (T.left, true, T.right) if (k < T.key): (L',b,R') = split(T.left, k) return (L',b,join(R',T.key,T.right)) (L',b,R') = split(T.right, k) return (join(T.left,T.key,L'),b,R')
The union of two redâÂÂblack trees and representing sets and , is a redâÂÂblack tree that represents . The following recursive function computes this union:
function union(t<sub>1</sub>, t<sub>2</sub>): if t<sub>1</sub> = NULL return t<sub>2</sub> if t<sub>2</sub> = NULL return t<sub>1</sub> (L<sub>1</sub>,b,R<sub>1</sub>)=split(t<sub>1</sub>,t<sub>2</sub>.key) proc1=start: T<sub>L</sub>=union(L<sub>1</sub>,t<sub>2</sub>.left) proc2=start: T<sub>R</sub>=union(R<sub>1</sub>,t<sub>2</sub>.right) wait all proc1,proc2 return join(T<sub>L</sub>, t<sub>2</sub>.key, T<sub>R</sub>)
Here, split is presumed to return two trees: one holding the keys less its input key, one holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists also.)
The algorithm for intersection or difference is similar, but requires the Join2 helper routine that is the same as Join but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the redâÂÂblack tree. Since Split calls Join but does not deal with the balancing criteria of redâÂÂblack trees directly, such an implementation is usually called the "join-based" implementation.
The complexity of each of union, intersection and difference is for two redâÂÂblack trees of sizes and . This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth . When , the join-based implementation has the same computational directed acyclic graph (DAG) as single-element insertion and deletion if the root of the larger tree is used to split the smaller tree.
Parallel algorithms for constructing redâÂÂblack trees from sorted lists of items can run in constant time or time, depending on the computer model, if the number of processors available is asymptotically proportional to the number of items where . Fast search, insertion, and deletion parallel algorithms are also known.
The join-based algorithms for redâÂÂblack trees are parallel for bulk operations, including union, intersection, construction, filter, map-reduce, and so on.
Basic operations like insertion, removal or update can be parallelised by defining operations that process bulks of multiple elements. It is also possible to process bulks with several basic operations, for example bulks may contain elements to insert and also elements to remove from the tree.
The algorithms for bulk operations aren't just applicable to the redâÂÂblack tree, but can be adapted to other sorted sequence data structures also, like the 2âÂÂ3 tree, 2âÂÂ3âÂÂ4 tree and (a,b)-tree. In the following different algorithms for bulk insert will be explained, but the same algorithms can also be applied to removal and update. Bulk insert is an operation that inserts each element of a sequence into a tree .
This approach can be applied to every sorted sequence data structure that supports efficient join- and split-operations. The general idea is to split and in multiple parts and perform the insertions on these parts in parallel.
Note that in Step 3 the constraints for splitting assure that in Step 5 the trees can be joined again and the resulting sequence is sorted.
The pseudo code shows a simple divide-and-conquer implementation of the join-based algorithm for bulk-insert. Both recursive calls can be executed in parallel. The join operation used here differs from the version explained in this article, instead join2 is used, which misses the second parameter k.
bulkInsert(T, I, k): I.sort() bulklInsertRec(T, I, k)
bulkInsertRec(T, I, k): if k = 1: forall e in I: T.insert(e) else m := âÂÂsize(I) / 2â (T<sub>1</sub>, _, T<sub>2</sub>) := split(T, I[m]) bulkInsertRec(T<sub>1</sub>, I[0 .. m], âÂÂk / 2âÂÂ) || bulkInsertRec(T<sub>2</sub>, I[m + 1 .. size(I) - 1], âÂÂk / 2âÂÂ) T â join2(T<sub>1</sub>, T<sub>2</sub>)
Sorting is not considered in this analysis.
This can be improved by using parallel algorithms for splitting and joining. In this case the execution time is .
Another method of parallelizing bulk operations is to use a pipelining approach. This can be done by breaking the task of processing a basic operation up into a sequence of subtasks. For multiple basic operations the subtasks can be processed in parallel by assigning each subtask to a separate processor.
Sorting is not considered in this analysis. Also, is assumed to be smaller than , otherwise it would be more efficient to construct the resulting tree from scratch.