
Equivalence & Normalization

(Work in progress)

Some characters in Unicode can be written as more than one sequence of code points, and each sequence has a different binary representation.
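As a small illustration of "different binary representations", the sketch below encodes two spellings of the same visible character as UTF-8 and prints the bytes. The variable names and the choice of UTF-8 are assumptions made for the example, not something this page prescribes.

```python
# Two spellings of the same visible character, "A with macron".
precomposed = "\u0100"   # one code point: LATIN CAPITAL LETTER A WITH MACRON
decomposed = "A\u0304"   # two code points: "A" + COMBINING MACRON

# Encode both as UTF-8 (an assumption; any encoding would show a difference).
utf8_precomposed = precomposed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

print(utf8_precomposed.hex())     # c480
print(utf8_decomposed.hex())      # 41cc84
print(precomposed == decomposed)  # False: a plain comparison sees different code points
```

Both strings should render identically, yet they differ in length, in bytes, and under a plain `==` comparison.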

Unicode® Standard Annex #15 refers to two forms of equivalence, canonical and compatibility (emphasis mine):

Canonical equivalence is a fundamental equivalency between characters or sequences of characters which represent the same abstract character, and which when correctly displayed should always have the same visual appearance and behavior.

Source

For example, the following are canonically equivalent:

Escaped (A)    A    B    Escaped (B)
\u0100         Ā    Ā    A\u0304
\u01de         Ǟ    Ǟ    A\u0308\u0304
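One way to check the pairs in the table programmatically is to normalize both sides to the same form and compare the results. The sketch below uses `unicodedata.normalize` from Python's standard library; the helper name `canonically_equivalent` is hypothetical, and NFD is an arbitrary choice here (NFC would work equally well for this comparison).

```python
import unicodedata

# The (A, B) pairs from the table, written with escapes.
pairs = [
    ("\u0100", "A\u0304"),
    ("\u01de", "A\u0308\u0304"),
]

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

for a, b in pairs:
    # The raw comparison fails, the normalized comparison succeeds.
    print(ascii(a), ascii(b), a == b, canonically_equivalent(a, b))
```

For both pairs the raw `==` comparison is false while the normalized comparison is true, which is the practical meaning of canonical equivalence for code that compares strings.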