Unicode Codepoint Inspector — Confusables & Zero-Width

Unicode Codepoint Inspector

Inspect every codepoint with hex, Unicode block, and security flags for zero-width and confusable characters.

Codepoints: 30 · ASCII: 25 · Non-ASCII: 5 · Suspicious: 1

# | Char | Hex | Block / category | Flags
1 | H | U+0048 | Basic Latin·Letter |
2 | e | U+0065 | Basic Latin·Letter |
3 | l | U+006C | Basic Latin·Letter |
4 | l | U+006C | Basic Latin·Letter |
5 | o | U+006F | Basic Latin·Letter |
6 | , | U+002C | Basic Latin·Symbol |
7 | SPACE | U+0020 | Basic Latin·Whitespace |
8 | 世 | U+4E16 | CJK Unified Ideographs·Letter |
9 | 界 | U+754C | CJK Unified Ideographs·Letter |
10 | SPACE | U+0020 | Basic Latin·Whitespace |
11 | 🌍 | U+1F30D | Emoji·Symbol |
12 | SPACE | U+0020 | Basic Latin·Whitespace |
13 | — | U+2014 | General Punctuation·Punctuation |
14 | SPACE | U+0020 | Basic Latin·Whitespace |
15 | р | U+0440 | Cyrillic·Letter | confusable
16 | a | U+0061 | Basic Latin·Letter |
17 | y | U+0079 | Basic Latin·Letter |
18 | p | U+0070 | Basic Latin·Letter |
19 | a | U+0061 | Basic Latin·Letter |
20 | l | U+006C | Basic Latin·Letter |
21 | SPACE | U+0020 | Basic Latin·Whitespace |
22 | v | U+0076 | Basic Latin·Letter |
23 | s | U+0073 | Basic Latin·Letter |
24 | SPACE | U+0020 | Basic Latin·Whitespace |
25 | p | U+0070 | Basic Latin·Letter |
26 | a | U+0061 | Basic Latin·Letter |
27 | y | U+0079 | Basic Latin·Letter |
28 | p | U+0070 | Basic Latin·Letter |
29 | a | U+0061 | Basic Latin·Letter |
30 | l | U+006C | Basic Latin·Letter |

About Unicode Codepoint Inspector

Inspect every codepoint in a string with its hex value, Unicode block, category, and security flags. Surfaces zero-width characters, visually confusable Cyrillic/Greek lookalikes, bidi override characters, and byte-order marks — the hidden characters used in phishing, Trojan Source attacks, and document fingerprinting.
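A per-codepoint listing like the one this tool shows can be approximated in a few lines. The sketch below is not this tool's implementation: the function name `inspect_codepoints` and the row layout are made up for illustration, and because Python's standard library has no Unicode block lookup, it reports the general category and character name instead of the block.

```python
import unicodedata


def inspect_codepoints(text):
    """Return one row per codepoint: position, char, U+ hex, category, name."""
    rows = []
    for index, ch in enumerate(text, start=1):
        rows.append({
            "index": index,
            "char": ch,
            "hex": f"U+{ord(ch):04X}",
            # Two-letter general category, e.g. "Lu", "Zs", "So".
            "category": unicodedata.category(ch),
            # Some codepoints (e.g. controls) have no name; use a placeholder.
            "name": unicodedata.name(ch, "<unnamed>"),
            "ascii": ord(ch) < 0x80,
        })
    return rows


rows = inspect_codepoints("Hi \U0001F30D")
for row in rows:
    print(row["index"], row["hex"], row["category"], row["name"])
```

Iterating over a Python `str` yields whole codepoints, so the emoji counts as one row rather than two UTF-16 code units.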

Security use cases

  • Phishing detection — spot Cyrillic or Greek lookalikes in domain names or usernames.
  • Trojan Source — find bidi override characters hidden in source code.
  • Document fingerprinting — detect zero-width characters used to uniquely identify a document copy.
  • Data validation — find unexpected control characters in user input or API responses.
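For the data validation case, a minimal sketch of a control-character check might look like this. It assumes that tab, line feed, and carriage return are acceptable in the input; the allow-list and the function name `find_control_chars` are illustrative choices, not part of this tool.

```python
import unicodedata

# Assumption: ordinary whitespace controls are fine, everything else is suspect.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_control_chars(text):
    """Return (index, U+ hex) for each control character not on the allow-list."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # "Cc" is the Unicode general category for control characters.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
    ]


hits = find_control_chars("name\x00=ok\n\x1b[31m")
print(hits)  # [(4, 'U+0000'), (9, 'U+001B')]
```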

Frequently asked

What is a Unicode codepoint?
A codepoint is a number assigned to a character in the Unicode standard. It is written as U+XXXX (e.g. U+0041 for "A", U+1F600 for 😀). A codepoint is not the same as a byte — characters above U+007F require multiple bytes in UTF-8. JavaScript strings are sequences of UTF-16 code units, so emoji and other characters above U+FFFF are represented as surrogate pairs.
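The difference between codepoints, UTF-8 bytes, and UTF-16 code units can be checked directly. The following Python sketch (the helper names `sizes` and `surrogate_pair` are hypothetical) counts each for a few sample characters and computes the surrogate pair that a UTF-16 environment such as JavaScript would store for U+1F600.

```python
def sizes(ch):
    """Count codepoints, UTF-8 bytes, and UTF-16 code units for a string."""
    return {
        "codepoints": len(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        # Each UTF-16 code unit is two bytes; "-le" avoids a prepended BOM.
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }


def surrogate_pair(codepoint):
    """Split a codepoint above U+FFFF into its UTF-16 high/low surrogates."""
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for sample in ("A", "\u00e9", "\u4e16", "\U0001F600"):
    print(f"U+{ord(sample):04X}", sizes(sample))

high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

"A" takes one byte, "é" two, "世" three, and "😀" four in UTF-8; only the emoji needs two UTF-16 code units.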
What are zero-width characters?
Zero-width characters are codepoints that have no visible glyph but affect text rendering or processing. Examples: U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner, used in emoji sequences), U+FEFF (byte-order mark / zero-width no-break space). They are sometimes used to hide text in phishing attacks or to fingerprint documents.
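As a rough illustration, the four codepoints listed above can be located with a simple lookup. This is a sketch rather than this tool's detection logic: the table covers only those four examples, and the function names are invented here.

```python
# Only the four examples named in the answer above; real tools check more.
ZERO_WIDTH = {
    0x200B: "ZERO WIDTH SPACE",
    0x200C: "ZERO WIDTH NON-JOINER",
    0x200D: "ZERO WIDTH JOINER",
    0xFEFF: "ZERO WIDTH NO-BREAK SPACE",
}


def find_zero_width(text):
    """Return (index, U+ hex, name) for each zero-width character found."""
    return [
        (i, f"U+{ord(ch):04X}", ZERO_WIDTH[ord(ch)])
        for i, ch in enumerate(text)
        if ord(ch) in ZERO_WIDTH
    ]


def strip_zero_width(text):
    """Remove the listed characters; note this also breaks emoji ZWJ sequences."""
    return "".join(ch for ch in text if ord(ch) not in ZERO_WIDTH)


suspect = "pay\u200bpal"
print(find_zero_width(suspect))   # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(strip_zero_width(suspect))  # paypal
```

Stripping is only safe where joiners carry no meaning: removing U+200D from a family emoji splits it into separate people, and U+200C is meaningful in some scripts.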
What are confusable characters?
Confusable characters are codepoints from non-Latin scripts that look visually identical or very similar to ASCII characters. For example, Cyrillic "а" (U+0430) looks like Latin "a" (U+0061), and Cyrillic "о" (U+043E) looks like Latin "o" (U+006F). This is the basis of IDN homograph attacks — registering a domain like "pаypal.com" with a Cyrillic "а" that looks identical to the real domain.
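One cheap heuristic for spotting this kind of spoof is to flag words that mix scripts. The sketch below guesses the script from the prefix of each character's Unicode name, which is an assumption made for brevity: it only distinguishes Latin, Cyrillic, and Greek, and it will not catch a word written entirely in lookalike Cyrillic letters. A thorough check would use the confusables data published with Unicode Technical Standard #39.

```python
import unicodedata

KNOWN_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def script_of(ch):
    """Guess a letter's script from its Unicode name; None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in KNOWN_SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def mixed_script_words(text):
    """Return (word, scripts) for whitespace-separated words mixing scripts."""
    flagged = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word} - {None}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged


# The first word starts with Cyrillic U+0440, the second is plain ASCII.
flagged = mixed_script_words("\u0440aypal vs paypal")
print(flagged)
```

Here only the first word is flagged, with the scripts `['CYRILLIC', 'LATIN']`.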
What is a bidi override character?
Bidirectional control characters change the direction in which text is rendered. U+202A–U+202E cover the embeddings, the two overrides (U+202D and U+202E), and the pop character that ends them; U+2066–U+2069 are the related isolates. They can be used to make malicious code look like a comment, or to reverse the apparent order of characters in a filename. The "Trojan Source" attack uses these characters to hide malicious code in source files so that it looks harmless to reviewers.
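A source file can be scanned for these characters line by line. The following is a minimal sketch with a hypothetical function name, `find_bidi_controls`. Besides U+202A–U+202E it also checks the isolate controls U+2066–U+2069, which the Trojan Source research uses alongside the overrides.

```python
# Abbreviations follow the Unicode names, e.g. RLO = RIGHT-TO-LEFT OVERRIDE.
BIDI_CONTROLS = {
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}


def find_bidi_controls(source):
    """Return (line, column, abbreviation) for each bidi control, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ord(ch)]))
    return hits


sample = "x = 1\nif a: # \u202e check\n    pass"
hits = find_bidi_controls(sample)
print(hits)  # [(2, 9, 'RLO')]
```

A check like this fits well in a pre-commit hook or CI step, where any hit in source code is worth a manual look.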
Why does my string have more codepoints than characters?
Some visible "characters" are composed of multiple codepoints, typically 2–7 for emoji. An emoji with a skin tone is a base emoji followed by a modifier codepoint, a flag is a pair of regional indicator symbols, and a family emoji is several person emoji joined by zero-width joiners (U+200D). Accented characters can be either a single precomposed codepoint (NFC) or a base character plus a combining diacritic (NFD). This tool counts codepoints, not grapheme clusters.
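The gap between what you see and what is counted is easy to reproduce. This Python sketch uses the standard `unicodedata.normalize` function to switch between the precomposed and decomposed forms of "é", then counts the codepoints in three emoji that each render as a single symbol; the variable names are only for illustration.

```python
import unicodedata

composed = "\u00e9"  # "é" as one precomposed codepoint (NFC form)
decomposed = unicodedata.normalize("NFD", composed)  # "e" + U+0301

print(len(composed), len(decomposed))  # 1 2
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']

# Normalizing back to NFC recombines the pair into the single codepoint.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True

# Each of these renders as one symbol but is several codepoints.
thumbs_up_medium = "\U0001F44D\U0001F3FD"            # base + skin tone modifier
flag_japan = "\U0001F1EF\U0001F1F5"                  # two regional indicators
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # three people, two ZWJs

print(len(thumbs_up_medium), len(flag_japan), len(family))  # 2 2 5
```

Comparing strings after normalizing both to the same form avoids treating the two spellings of "é" as different text.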