Unicode Codepoint Inspector
Inspect every codepoint with hex, Unicode block, and security flags for zero-width and confusable characters.
Codepoints: 30 · ASCII: 25 · Non-ASCII: 5 · Suspicious: 1
| # | Char | Hex | Block / category | Flags |
|---|---|---|---|---|
| 1 | H | U+0048 | Basic Latin·Letter | |
| 2 | e | U+0065 | Basic Latin·Letter | |
| 3 | l | U+006C | Basic Latin·Letter | |
| 4 | l | U+006C | Basic Latin·Letter | |
| 5 | o | U+006F | Basic Latin·Letter | |
| 6 | , | U+002C | Basic Latin·Symbol | |
| 7 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 8 | 世 | U+4E16 | CJK Unified Ideographs·Letter | |
| 9 | 界 | U+754C | CJK Unified Ideographs·Letter | |
| 10 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 11 | 🌍 | U+1F30D | Emoji·Symbol | |
| 12 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 13 | — | U+2014 | General Punctuation·Punctuation | |
| 14 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 15 | р | U+0440 | Cyrillic·Letter | confusable |
| 16 | a | U+0061 | Basic Latin·Letter | |
| 17 | y | U+0079 | Basic Latin·Letter | |
| 18 | p | U+0070 | Basic Latin·Letter | |
| 19 | a | U+0061 | Basic Latin·Letter | |
| 20 | l | U+006C | Basic Latin·Letter | |
| 21 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 22 | v | U+0076 | Basic Latin·Letter | |
| 23 | s | U+0073 | Basic Latin·Letter | |
| 24 | SPACE | U+0020 | Basic Latin·Whitespace | |
| 25 | p | U+0070 | Basic Latin·Letter | |
| 26 | a | U+0061 | Basic Latin·Letter | |
| 27 | y | U+0079 | Basic Latin·Letter | |
| 28 | p | U+0070 | Basic Latin·Letter | |
| 29 | a | U+0061 | Basic Latin·Letter | |
| 30 | l | U+006C | Basic Latin·Letter | |
About Unicode Codepoint Inspector
Inspect every codepoint in a string with its hex value, Unicode block, category, and security flags. Surfaces zero-width characters, visually confusable Cyrillic/Greek lookalikes, bidi override characters, and byte-order marks — the hidden characters used in phishing, Trojan Source attacks, and document fingerprinting.
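For readers who want to reproduce a similar per-codepoint listing outside the tool, here is a minimal Python sketch. It is not the tool's implementation: the `inspect` function and its record fields are hypothetical names chosen for illustration. It also reports the Unicode character name and two-letter general category from the standard `unicodedata` module rather than the block, because the Python standard library does not expose block names.

```python
import unicodedata

def inspect(text):
    """Return one record per codepoint: 1-based index, char, hex label, name, category."""
    rows = []
    # Iterating a Python 3 str yields codepoints, not bytes or UTF-16 code units.
    for index, char in enumerate(text, start=1):
        rows.append({
            "index": index,
            "char": char,
            "hex": f"U+{ord(char):04X}",
            # Some codepoints (e.g. most control characters) have no name,
            # so a fallback value is supplied instead of raising ValueError.
            "name": unicodedata.name(char, "<unnamed>"),
            # Two-letter general category, e.g. "Lu" (uppercase letter).
            "category": unicodedata.category(char),
        })
    return rows

rows = inspect("Hello, 世界 🌍")
for row in rows:
    print(row["index"], row["hex"], row["category"], row["name"])
```

Mapping codepoints to block names as the table above does would need an extra data source, such as the `Blocks.txt` file from the Unicode Character Database.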
Security use cases
- Phishing detection — spot Cyrillic or Greek lookalikes in domain names or usernames.
- Trojan Source — find bidi override characters hidden in source code.
- Document fingerprinting — detect zero-width characters used to uniquely identify a document copy.
- Data validation — find unexpected control characters in user input or API responses.
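The checks behind these use cases can be approximated in a few lines of Python. The sketch below is an illustration under stated assumptions, not the tool's actual logic: the function names, the flag labels, and the confusable heuristic are all hypothetical. In particular, it treats a Cyrillic or Greek letter as "confusable" only when it shares a whitespace-delimited token with Latin letters, and it guesses the script from the first word of the Unicode character name. A production check would more likely rely on the confusables data published with Unicode Technical Standard #39. The bidi set covers U+202A–U+202E plus the isolate controls U+2066–U+2069.

```python
import re
import unicodedata

# Assumed flag sets for this sketch.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
LOOKALIKE_SCRIPTS = {"CYRILLIC", "GREEK"}

def script_of(char):
    """Rough script guess: first word of the character name, letters only."""
    if not char.isalpha():
        return None
    name = unicodedata.name(char, "")
    return name.split(" ")[0] if name else None

def flag_codepoints(text):
    """Return (1-based index, hex label, flag) for each suspicious codepoint."""
    findings = []
    # Tokens are runs of non-whitespace, so tabs and newlines are never flagged.
    for match in re.finditer(r"\S+", text):
        token = match.group()
        mixed_with_latin = "LATIN" in {script_of(c) for c in token}
        for offset, char in enumerate(token):
            cp = ord(char)
            label = f"U+{cp:04X}"
            index = match.start() + offset + 1
            if cp in ZERO_WIDTH:
                findings.append((index, label, "zero-width"))
            elif cp in BIDI_CONTROLS:
                findings.append((index, label, "bidi"))
            elif unicodedata.category(char) == "Cc":
                findings.append((index, label, "control"))
            elif mixed_with_latin and script_of(char) in LOOKALIKE_SCRIPTS:
                findings.append((index, label, "confusable"))
    return findings

print(flag_codepoints("Hello, 世界 🌍 — рaypal vs paypal"))
```

One consequence of this heuristic is that a word written entirely in Cyrillic or Greek is not flagged, which avoids false positives on ordinary non-English text but misses whole-script spoofs.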
Pipeline
- String Escape / Unescape — escape suspicious characters to their \uXXXX form.
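As a rough illustration of what such an escaping step might do, the Python sketch below rewrites every non-ASCII codepoint in `\uXXXX` form. The function name `escape_non_ascii` is hypothetical, and two behaviours are assumptions rather than facts about the linked tool: all non-ASCII characters are escaped (not only flagged ones), and codepoints above U+FFFF are written as a UTF-16 surrogate pair, following the JavaScript string convention.

```python
def escape_non_ascii(text):
    """Escape every non-ASCII codepoint as \\uXXXX, leaving ASCII untouched."""
    out = []
    for char in text:
        cp = ord(char)
        if cp < 0x80:
            out.append(char)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Above U+FFFF: split into a UTF-16 high/low surrogate pair.
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)

# The first "p" here is written as Cyrillic U+0440 via an escape sequence.
print(escape_non_ascii("\u0440aypal"))
```

Once escaped, a lookalike character is visible in plain ASCII, which makes it easier to spot in logs, diffs, or code review.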
Frequently asked
- What is a Unicode codepoint?
- A codepoint is a number assigned to a character in the Unicode standard. It is written as U+XXXX (e.g. U+0041 for "A", U+1F600 for 😀). A codepoint is not the same as a byte — characters above U+007F require multiple bytes in UTF-8. JavaScript strings are sequences of UTF-16 code units, so emoji and other characters above U+FFFF are represented as surrogate pairs.
- What are zero-width characters?
- Zero-width characters are codepoints that have no visible glyph but affect text rendering or processing. Examples: U+200B (zero-width space), U+200C (zero-width non-joiner), U+200D (zero-width joiner, used in emoji sequences), U+FEFF (byte-order mark / zero-width no-break space). They are sometimes used to hide text in phishing attacks or to fingerprint documents.
- What are confusable characters?
- Confusable characters are codepoints from non-Latin scripts that look visually identical or very similar to ASCII characters. For example, Cyrillic "а" (U+0430) looks like Latin "a" (U+0061), and Cyrillic "о" (U+043E) looks like Latin "o" (U+006F). This is the basis of IDN homograph attacks — registering a domain like "pаypal.com" with a Cyrillic "а" that looks identical to the real domain.
- What is a bidi override character?
- Bidirectional (bidi) control characters change the direction in which text is rendered. They include the embedding and override characters (U+202A–U+202E) and the isolate characters (U+2066–U+2069). They can be used to make malicious code look like a comment, or to reverse the apparent order of characters in a filename. The "Trojan Source" attack uses these characters so that the code a reviewer sees on screen differs from the code the compiler actually processes.
- Why does my string have more codepoints than characters?
- Some visible "characters" are composed of multiple codepoints. An emoji with a skin tone modifier is a base emoji followed by a modifier codepoint, a flag emoji is a pair of regional indicator symbols, and a family emoji is several emoji joined by zero-width joiners (U+200D). Accented characters can be either a single precomposed codepoint (the NFC form) or a base character plus a combining diacritic (the NFD form). This tool counts codepoints, not grapheme clusters.
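The differences between codepoints, bytes, UTF-16 code units, and normalization forms described in the answers above can be checked with a short Python sketch. The helper name `count_units` is hypothetical. The UTF-16 count is what JavaScript's `String.length` would report for the same text.

```python
import unicodedata

def count_units(text):
    """Count codepoints, UTF-8 bytes, and UTF-16 code units for a string."""
    return {
        "codepoints": len(text),                            # Python str length is in codepoints
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2 bytes per code unit
    }

print(count_units("A"))            # one codepoint, one byte, one code unit
print(count_units("\U0001F600"))   # one codepoint, four bytes, a surrogate pair

# A family emoji: three emoji joined by two zero-width joiners (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(count_units(family))

# The same accented letter in precomposed (NFC) and decomposed (NFD) form.
print(len(unicodedata.normalize("NFC", "e\u0301")))  # 1 codepoint
print(len(unicodedata.normalize("NFD", "\u00e9")))   # 2 codepoints
```

Counting grapheme clusters (what a reader perceives as one character) is not covered here, since the Python standard library has no built-in grapheme segmentation.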