The Unicode Bidirectional Algorithm (UBA), formally defined in Unicode Standard Annex #9 (UAX #9), is a specification developed by the Unicode Consortium that determines how text containing a mixture of left-to-right and right-to-left scripts is displayed. It is a normative part of the Unicode Standard and is required for conformance wherever characters from right-to-left scripts such as Arabic or Hebrew are rendered.
Most writing systems display text from left to right, but several scripts, including Arabic, Hebrew, Thaana, and Syriac, are written from right to left. When text from both directions appears in the same document, the result is known as bidirectional text (or bidi text). Without a clear specification, ambiguities arise in determining the correct display order of characters.
The Unicode Standard prescribes a logical order for storing characters in memory, regardless of their visual direction. The UBA translates this logical order into a correct visual display order.
The UBA defines several categories of special control characters used to influence text direction:
Lightweight, zero-width characters that act as directional anchors without affecting display:
Signal that a piece of text is to be treated as embedded in a given direction:
Force characters to be treated as strongly directional, overriding their implicit types:
Introduced in Unicode 6.3, isolates prevent the enclosed text from affecting the surrounding text's ordering:
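As an illustration of the four groups above, the following Python sketch lists the directional formatting characters with their code points. The code points and abbreviations are those given in UAX #9; the table name, the grouping labels, and the isolate helper are hypothetical names chosen for this example. PDF is listed with the embeddings here, although it terminates overrides as well.

```python
import unicodedata

# (abbreviation, character, group) for the directional formatting characters.
# The group labels are illustrative, not official property values.
FORMATTING_CHARACTERS = [
    ("LRM", "\u200E", "mark"),
    ("RLM", "\u200F", "mark"),
    ("ALM", "\u061C", "mark"),
    ("LRE", "\u202A", "embedding"),
    ("RLE", "\u202B", "embedding"),
    ("PDF", "\u202C", "embedding"),  # also terminates an override
    ("LRO", "\u202D", "override"),
    ("RLO", "\u202E", "override"),
    ("LRI", "\u2066", "isolate"),
    ("RLI", "\u2067", "isolate"),
    ("FSI", "\u2068", "isolate"),
    ("PDI", "\u2069", "isolate"),
]

for abbr, ch, group in FORMATTING_CHARACTERS:
    print(f"U+{ord(ch):04X} {abbr:<3} {group:<9} {unicodedata.name(ch)}")


def isolate(text):
    """Wrap text in FSI ... PDI so that it does not affect its surroundings."""
    return "\u2068" + text + "\u2069"
```

A helper such as isolate is typically applied to text of unknown direction (a user name, a file name) before it is inserted into a longer sentence.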
The UBA processes text in four main phases:
Text is split into paragraphs at paragraph separator characters (type B). Each paragraph is processed independently.
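The paragraph split can be sketched in Python using the bidirectional class reported by the standard unicodedata module. The function name split_paragraphs is hypothetical, and treating a CR LF pair as a single separator is an assumption of this sketch. Each separator is kept at the end of the paragraph it closes.

```python
import unicodedata


def split_paragraphs(text):
    """Split text after each character whose bidi class is B."""
    paragraphs = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        current.append(ch)
        if unicodedata.bidirectional(ch) == "B":
            # Assumption of this sketch: keep CR LF together as one separator.
            if ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                current.append("\n")
                i += 1
            paragraphs.append("".join(current))
            current = []
        i += 1
    if current:
        # Trailing text without a closing separator is a final paragraph.
        paragraphs.append("".join(current))
    return paragraphs
```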
Each character is assigned a bidirectional character type (e.g., L, R, AL, EN, AN) from the Unicode Character Database. An embedding level list is also initialized.
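Python exposes this property through unicodedata.bidirectional, so the type assignment can be illustrated directly. The helper name bidi_types and the sample string are choices made for this example only.

```python
import unicodedata


def bidi_types(text):
    """Return the bidirectional character type of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin a, Hebrew alef, Arabic alef, European digit 1,
# Arabic-Indic digit one, space
sample = "a\u05D0\u06271\u0661 "
print(bidi_types(sample))  # ['L', 'R', 'AL', 'EN', 'AN', 'WS']
```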
A series of rules resolves the embedding level of each character:
The maximum embedding depth is 125 levels, a value guaranteed not to change in future versions of the standard.
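Two small parts of level resolution can be sketched in Python: finding the paragraph embedding level from the first strong character (rules P2 and P3 of UAX #9), and computing the level that an explicit embedding would push (rules X2 and X3) subject to the depth limit. This is a partial sketch, not a full implementation of the resolution rules; the function names are hypothetical.

```python
import unicodedata

MAX_DEPTH = 125
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def paragraph_level(text):
    """Return 1 if the first strong character is R or AL, otherwise 0.

    Characters between an isolate initiator and its matching PDI
    (or the end of the paragraph) are skipped.
    """
    depth = 0
    for ch in text:
        bidi_type = unicodedata.bidirectional(ch)
        if bidi_type in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_type == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and bidi_type in ("L", "R", "AL"):
            return 0 if bidi_type == "L" else 1
    return 0  # no strong character found: default to left-to-right


def next_level(current, rtl):
    """Return the next odd (rtl) or even (ltr) level above current.

    Returns None when the result would exceed MAX_DEPTH, in which case
    the embedding is treated as an overflow and the level is unchanged.
    """
    candidate = (current + 1) | 1 if rtl else (current + 2) & ~1
    return candidate if candidate <= MAX_DEPTH else None
```

Even levels are left-to-right and odd levels are right-to-left, which is why the two directions round up differently.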
Rules L1–L4 reorder characters on each line for display:
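Of these, rule L2 performs the actual reversal: from the highest level on the line down to the lowest odd level, every contiguous run of characters at that level or higher is reversed. The Python sketch below implements only L2 and takes the resolved levels as input; the function name reorder_line is hypothetical, and the other three rules are omitted.

```python
def reorder_line(chars, levels):
    """Apply rule L2 to one line and return the text in visual order."""
    chars = list(chars)
    levels = list(levels)
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return "".join(chars)  # purely left-to-right: nothing to reverse
    for level in range(max(levels), min(odd_levels) - 1, -1):
        start = 0
        while start < len(chars):
            if levels[start] < level:
                start += 1
                continue
            # Find the end of the run at this level or higher.
            end = start
            while end < len(chars) and levels[end] >= level:
                end += 1
            chars[start:end] = chars[start:end][::-1]
            levels[start:end] = levels[start:end][::-1]
            start = end
    return "".join(chars)


# Uppercase letters stand in for right-to-left characters,
# a convention used in the examples of UAX #9.
print(reorder_line("car means CAR.", [0] * 10 + [1] * 3 + [0]))
# car means RAC.
```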
Characters are classified into the following categories:
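The grouping of bidirectional types into categories can be expressed as a small lookup table. The Python sketch below follows the grouping given in UAX #9 (strong, weak, neutral, explicit formatting); the names CATEGORY_TYPES and bidi_category are hypothetical.

```python
import unicodedata

CATEGORY_TYPES = {
    "strong": ("L", "R", "AL"),
    "weak": ("EN", "ES", "ET", "AN", "CS", "NSM", "BN"),
    "neutral": ("B", "S", "WS", "ON"),
    "explicit": ("LRE", "LRO", "RLE", "RLO", "PDF",
                 "LRI", "RLI", "FSI", "PDI"),
}

# Invert the table: bidi type -> category name.
CATEGORY_BY_TYPE = {
    bidi_type: category
    for category, types in CATEGORY_TYPES.items()
    for bidi_type in types
}


def bidi_category(ch):
    """Return the category of a character, or 'unknown' if it has no type."""
    return CATEGORY_BY_TYPE.get(unicodedata.bidirectional(ch), "unknown")
```

For example, bidi_category("a") yields "strong", a digit yields "weak", and a space yields "neutral".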
A conforming implementation must:
The UBA permits six higher-level protocol overrides (HL1–HL6), including:
On web pages, Unicode directional formatting characters can be replaced by HTML5 and CSS3 markup:
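As a sketch of the markup approach, the Python helpers below generate HTML that plays roughly the role of the formatting characters: a bdi element in place of an FSI ... PDI pair, and a dir attribute in place of an explicit direction. The helper names are hypothetical, and the correspondence is approximate rather than exact.

```python
from html import escape


def bdi(text):
    """Isolate text of unknown direction using the HTML bdi element."""
    return f"<bdi>{escape(text)}</bdi>"


def with_direction(text, direction):
    """Wrap text in a span with an explicit dir attribute."""
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError("direction must be 'ltr', 'rtl' or 'auto'")
    return f'<span dir="{direction}">{escape(text)}</span>'


print(bdi("\u05E9\u05DC\u05D5\u05DD"))
print(with_direction("abc", "ltr"))
```

Using markup keeps the direction information in the document structure, where it survives copying and editing better than invisible characters do.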
The misuse of bidirectional formatting characters poses significant security risks, as they can be used to make malicious code or text appear benign. This is documented in Unicode Technical Report #36 (UTR #36). Directional overrides (LRO, RLO) are particularly dangerous and should be avoided where possible.
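One common mitigation is to scan untrusted text or source code for these characters before displaying or accepting it. The Python sketch below reports the position of each embedding, override, and isolate control; the names BIDI_CONTROLS and find_bidi_controls are hypothetical, and the decision to leave out the directional marks is an assumption of this example.

```python
# Embedding, override, and isolate controls, keyed by character.
BIDI_CONTROLS = {
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def find_bidi_controls(text):
    """Return (index, abbreviation) for each bidi control found in text."""
    return [
        (index, BIDI_CONTROLS[ch])
        for index, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


print(find_bidi_controls("ab\u202Ecd\u202C"))  # [(2, 'RLO'), (5, 'PDF')]
```

A caller can reject the input, strip the characters, or render them visibly, depending on how much legitimate bidirectional text it expects.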