URLs are fundamental to the web, but they have strict rules about which characters they can contain. URL encoding (also called percent-encoding) is the mechanism that allows arbitrary data to be included in URLs safely.
Understanding URL encoding is essential for building secure, reliable web applications that handle user input correctly.
What Is URL Encoding?
URL encoding converts characters into a format that can be carried safely in a URL. It replaces each byte of an unsafe character with a % followed by two hexadecimal digits giving that byte's value; for ASCII characters, this is simply the ASCII code.
Example
| Original | Encoded |
|---|---|
| hello world | hello%20world |
| price: $50 | price%3A%20%2450 |
| a/b/c | a%2Fb%2Fc |
Why URL Encoding Is Necessary
URL Character Restrictions
URLs are designed to use a limited set of ASCII characters:
Reserved Characters (have special meaning):
- : (scheme separator)
- / (path separator)
- ? (query start)
- & (query parameter separator)
- = (parameter value separator)
- # (fragment identifier)
- Others: @ ! $ ' ( ) * + , ; [ ]
Unreserved Characters (safe to use):
- Alphanumeric: A-Z, a-z, 0-9
- Special: - _ . ~
Unsafe Characters (must be encoded):
- Space and " % < > { } | \ ^ `
- Non-ASCII characters (Unicode)
- Control characters
The Problem
If you include reserved or unsafe characters in a URL without encoding, they may:
- Break the URL structure
- Be misinterpreted by servers
- Cause security vulnerabilities
- Result in data corruption
How URL Encoding Works
The Encoding Process
- Take the character's byte value (for non-ASCII characters, each byte of its UTF-8 encoding)
- Convert each byte to two hexadecimal digits
- Prefix each pair with %
Examples
| Character | ASCII Value | Hex | Encoded |
|---|---|---|---|
| Space | 32 | 20 | %20 |
| ! | 33 | 21 | %21 |
| " | 34 | 22 | %22 |
| # | 35 | 23 | %23 |
| $ | 36 | 24 | %24 |
| % | 37 | 25 | %25 |
| & | 38 | 26 | %26 |
| ' | 39 | 27 | %27 |
| ( | 40 | 28 | %28 |
| ) | 41 | 29 | %29 |
| * | 42 | 2A | %2A |
| + | 43 | 2B | %2B |
| , | 44 | 2C | %2C |
| / | 47 | 2F | %2F |
| : | 58 | 3A | %3A |
| ; | 59 | 3B | %3B |
| = | 61 | 3D | %3D |
| ? | 63 | 3F | %3F |
| @ | 64 | 40 | %40 |
| [ | 91 | 5B | %5B |
| ] | 93 | 5D | %5D |
Encoding Unicode Characters
Unicode characters are encoded as UTF-8 bytes, then each byte is percent-encoded:
| Character | UTF-8 Bytes | Encoded |
|---|---|---|
| © | C2 A9 | %C2%A9 |
| ® | C2 AE | %C2%AE |
| 你 | E4 BD A0 | %E4%BD%A0 |
| 😊 | F0 9F 98 8A | %F0%9F%98%8A |
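To check rows like these yourself, a short Python sketch can print the UTF-8 bytes next to the percent-encoded form. The helper name `utf8_hex` is made up for this example; `quote` is the standard-library function from `urllib.parse`.

```python
from urllib.parse import quote


def utf8_hex(char: str) -> str:
    """Return the UTF-8 bytes of `char` as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in char.encode("utf-8"))


for char in ["©", "®", "你", "😊"]:
    # safe="" asks quote to encode everything except unreserved characters
    print(char, utf8_hex(char), quote(char, safe=""))
```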
URL Components and Encoding
Different URL components have different encoding requirements:
Scheme and Authority
https://user:password@example.com:8080
- Scheme (https): No encoding needed
- User/Password: May need encoding for special characters
- Host: Uses Punycode for non-ASCII domains
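As a rough sketch of how these rules might be applied, the Python below percent-encodes the user and password and converts a non-ASCII host with the standard library's `idna` codec. The credentials and the host `münchen.example` are invented for illustration, and the built-in codec implements the older IDNA 2003 rules, so treat this as a demonstration rather than production handling of international domains.

```python
from urllib.parse import quote, urlsplit

# Hypothetical credentials containing characters that are reserved in the authority
user = quote("alice", safe="")
password = quote("p@ss:w/rd", safe="")  # -> 'p%40ss%3Aw%2Frd'

# Non-ASCII host labels are converted to Punycode ("xn--" prefix)
host = "münchen.example".encode("idna").decode("ascii")

authority = f"{user}:{password}@{host}:8080"
print(authority)

# The encoded authority parses back into the expected host and port
parts = urlsplit(f"https://{authority}")
print(parts.hostname, parts.port)
```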
Path
/path/to/resource
- / is a separator (not encoded)
- Other reserved characters should be encoded
Query String
?key1=value1&key2=value2
- ?, &, and = are separators (not encoded when used as structure)
- Parameter names and values should be encoded
Fragment
#section-id
- # is the fragment identifier
- Fragment content should be encoded
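One way to apply the per-component rules is to encode each piece separately and only then join the pieces with their separators. The following Python sketch assumes an https URL; `build_url` and its parameters are hypothetical names chosen for this example.

```python
from urllib.parse import quote, urlencode, urlunsplit


def build_url(host, segments, params, fragment=""):
    """Assemble an https URL, encoding each component on its own."""
    # Encode each path segment fully, then join with literal "/" separators
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    # urlencode joins pairs with & and =; quote_via=quote yields %20 instead of +
    query = urlencode(params, quote_via=quote)
    return urlunsplit(("https", host, path, query, quote(fragment, safe="")))


url = build_url(
    "example.com",
    ["files", "a/b report.pdf"],  # the "/" inside this segment is data
    {"q": "x&y=1"},               # the "&" and "=" inside this value are data
    "section 2",
)
print(url)
# https://example.com/files/a%2Fb%20report.pdf?q=x%26y%3D1#section%202
```

Encoding before joining is what keeps a slash or ampersand inside the data from being mistaken for structure.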
URL Encoding in Practice
JavaScript
// Encode a single component (path segment or query value)
encodeURIComponent('hello world');
// 'hello%20world'
// Encode full URL
encodeURI('https://example.com/path with spaces');
// 'https://example.com/path%20with%20spaces'
// Decode
decodeURIComponent('hello%20world');
// 'hello world'
Key Difference: encodeURI vs encodeURIComponent
| Function | Encodes | Doesn't Encode |
|---|---|---|
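| encodeURI | Spaces, non-ASCII characters, and most unsafe chars (including [ and ]) | :/?#@!$&'()*+,;= and -_.~ |
| encodeURIComponent | Everything encodeURI encodes, plus ; / ? : @ & = + $ , # | -_.!~*'() |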
When to use which:
- encodeURI: For complete URLs
- encodeURIComponent: For individual URL components (query parameters, path segments)
Python
from urllib.parse import quote, quote_plus, unquote
# Standard encoding
quote('hello world') # 'hello%20world'
# Plus encoding (spaces become +)
quote_plus('hello world') # 'hello+world'
# Decode
unquote('hello%20world') # 'hello world'
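A detail that is easy to miss: `quote` treats `/` as safe by default, which suits whole paths but not individual components. The sketch below shows how the `safe` parameter changes the result; the sample strings are invented for illustration.

```python
from urllib.parse import quote, quote_plus

# Default safe="/" keeps slashes, which suits a whole path
path = quote("/docs/my file.txt")

# safe="" also encodes "/", which suits a single segment or query value
segment = quote("a/b c", safe="")

# quote_plus encodes a literal "+" so it cannot be confused with a space
form_value = quote_plus("a+b c")

print(path)        # /docs/my%20file.txt
print(segment)     # a%2Fb%20c
print(form_value)  # a%2Bb+c
```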
PHP
// Standard encoding
urlencode('hello world'); // 'hello+world'
// RFC 3986 encoding
rawurlencode('hello world'); // 'hello%20world'
// Decode
urldecode('hello+world'); // 'hello world'
Common Pitfalls
1. Double Encoding
Encoding already-encoded data causes corruption:
// Wrong
const doubleEncoded = encodeURIComponent(encodeURIComponent('a b'));
// Result: 'a%2520b' (the % of %20 was itself encoded as %25)
// Correct
const encoded = encodeURIComponent('a b');
// Result: 'a%20b'
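The same effect is easy to reproduce in Python, which also shows why it matters on the receiving side: a single decode of double-encoded data does not return the original value. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

once = quote("a b", safe="")   # 'a%20b'
twice = quote(once, safe="")   # 'a%2520b': the "%" became "%25"

# A receiver that decodes once gets the encoded form, not the original text
print(unquote(twice))  # a%20b
print(unquote(once))   # a b
```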
2. Encoding Complete URLs with encodeURIComponent
// Wrong - breaks URL structure
encodeURIComponent('https://example.com/path?q=test');
// 'https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dtest'
// Correct - use encodeURI for full URLs
encodeURI('https://example.com/path?q=test');
// 'https://example.com/path?q=test'
3. Inconsistent Space Handling
Different systems handle spaces differently:
| Method | Space Becomes |
|---|---|
| encodeURIComponent | %20 |
| encodeURI | %20 |
| urlencode (PHP) | + |
| quote_plus (Python) | + |
| Form data | + |
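Best practice: Use %20 everywhere in URLs. A + means a space only in application/x-www-form-urlencoded data (form bodies and query strings); in a path it is a literal plus sign.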
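The difference shows up most clearly when decoding. In this Python sketch, `unquote` follows the plain percent-encoding rules while `unquote_plus` follows the form-data convention; which one is correct depends on where the string came from.

```python
from urllib.parse import unquote, unquote_plus

# Plain percent-decoding leaves "+" alone
print(unquote("a+b%20c"))       # a+b c

# Form-style decoding turns "+" into a space
print(unquote_plus("a+b%20c"))  # a b c

# A literal plus in form data must therefore travel as %2B
print(unquote_plus("a%2Bb"))    # a+b
```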
4. Not Encoding User Input
// Dangerous - user input directly in URL
const unsafeUrl = `/search?q=${userInput}`;
// Safe - properly encode user input
const safeUrl = `/search?q=${encodeURIComponent(userInput)}`;
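To see what can go wrong, consider input that contains an ampersand. The Python sketch below uses a made-up value, `cats&admin=true`, and parses both URLs the way a server might; only the encoded version keeps the input inside the `q` parameter.

```python
from urllib.parse import parse_qs, quote, urlsplit

user_input = "cats&admin=true"  # hypothetical hostile input

unsafe_url = f"/search?q={user_input}"
safe_url = f"/search?q={quote(user_input, safe='')}"

# Unencoded: the "&" is read as a separator and a second parameter appears
print(parse_qs(urlsplit(unsafe_url).query))
# {'q': ['cats'], 'admin': ['true']}

# Encoded: the whole input stays inside the q parameter
print(parse_qs(urlsplit(safe_url).query))
# {'q': ['cats&admin=true']}
```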
Security Considerations
URL Injection Attacks
Unencoded user input can enable attacks:
// Vulnerable
const vulnerableUrl = `/redirect?target=${userInput}`;
// User input: "https://evil.com?original="
// Result: /redirect?target=https://evil.com?original=
// Mitigation
const encodedUrl = `/redirect?target=${encodeURIComponent(userInput)}`;
// Result: /redirect?target=https%3A%2F%2Fevil.com%3Foriginal%3D
Open Redirect Vulnerabilities
Always validate and encode redirect URLs:
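// Dangerous
app.get('/redirect', (req, res) => {
  res.redirect(req.query.url);
});
// Safer
app.get('/redirect', (req, res) => {
  // Express has already percent-decoded req.query, so do not decode again
  const url = req.query.url;
  // isValidInternalUrl is an application-defined allowlist check (not shown)
  if (typeof url === 'string' && isValidInternalUrl(url)) {
    res.redirect(url);
  } else {
    res.status(400).send('Invalid redirect');
  }
});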
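The example above leaves `isValidInternalUrl` undefined. One possible shape for such a check, sketched here in Python as `is_valid_internal_url`, is to accept only same-site absolute paths. The rules below are assumptions for illustration, not a complete defence; a strict allowlist of known paths is stronger where it is practical.

```python
from urllib.parse import urlsplit


def is_valid_internal_url(target: str) -> bool:
    """Accept only same-site absolute paths such as '/account?tab=1'."""
    # Must be an absolute path; "//host" is a protocol-relative external URL
    if not target.startswith("/") or target.startswith("//"):
        return False
    # Some browsers treat "\" like "/", and control characters may be stripped
    if "\\" in target or any(ord(ch) < 0x20 for ch in target):
        return False
    parts = urlsplit(target)
    # Nothing that parses as having a scheme or a host is allowed through
    return parts.scheme == "" and parts.netloc == ""


print(is_valid_internal_url("/dashboard?tab=1"))  # True
print(is_valid_internal_url("https://evil.com"))  # False
print(is_valid_internal_url("//evil.com/login"))  # False
```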
URL Encoding Reference Table
Reserved Characters
| Char | Encoded | Purpose |
|---|---|---|
| ! | %21 | Exclamation |
| # | %23 | Fragment identifier |
| $ | %24 | Dollar sign |
| & | %26 | Query separator |
| ' | %27 | Single quote |
| ( | %28 | Opening parenthesis |
| ) | %29 | Closing parenthesis |
| * | %2A | Asterisk |
| + | %2B | Plus sign |
| , | %2C | Comma |
| / | %2F | Path separator |
| : | %3A | Scheme/port separator |
| ; | %3B | Semicolon |
| = | %3D | Parameter value separator |
| ? | %3F | Query start |
| @ | %40 | At sign |
| [ | %5B | Opening bracket |
| ] | %5D | Closing bracket |
Common Encoded Characters
| Char | Encoded | Description |
|---|---|---|
| Space | %20 | Space |
| " | %22 | Double quote |
| % | %25 | Percent sign |
| < | %3C | Less than |
| > | %3E | Greater than |
| \ | %5C | Backslash |
| ^ | %5E | Caret |
| ` | %60 | Backtick |
| { | %7B | Opening brace |
| \| | %7C | Vertical bar |
| } | %7D | Closing brace |
Frequently Asked Questions
What is the difference between URL encoding and Base64 encoding?
URL encoding converts special characters to percent-encoded format (%XX) for safe URL transmission. Base64 encoding converts binary data to ASCII characters for safe text transmission. URL encoding leaves unreserved characters unchanged and turns every other byte into three characters, so it is compact for mostly-ASCII text but inefficient for binary data. Base64 increases size by about 33% regardless of content and can encode any binary data, not just text.
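The size trade-off can be measured directly. This Python sketch compares the two encodings on a short ASCII string and on all 256 possible byte values; `sizes` is a helper name invented for this example.

```python
import base64
from urllib.parse import quote


def sizes(data: bytes):
    """Return (raw, percent-encoded, Base64) lengths for `data`."""
    return len(data), len(quote(data, safe="")), len(base64.b64encode(data))


# Unreserved ASCII text: percent-encoding adds nothing, Base64 still grows
print(sizes(b"hello-world"))     # (11, 11, 16)

# Every possible byte value: most bytes triple in size when percent-encoded
print(sizes(bytes(range(256))))  # (256, 636, 344)
```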
Why do spaces sometimes become + and sometimes %20?
The + encoding for spaces comes from the application/x-www-form-urlencoded media type used for form submissions. In URLs proper, spaces should be encoded as %20. However, many URL encoding functions (like PHP's urlencode) use + by default for backward compatibility. Use rawurlencode in PHP or encodeURIComponent in JavaScript for proper %20 encoding.
Do I need to encode URLs in HTML attributes?
Yes, URLs in HTML attributes (like href and src) should be properly encoded. However, modern browsers are forgiving and will often handle unencoded URLs. For security and correctness, always encode URLs, especially when they contain user input or dynamic values.
How do I handle URL encoding in APIs?
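APIs should accept properly encoded URLs and URL parameters. When constructing API URLs, encode each parameter value with encodeURIComponent. When receiving parameters, remember that most server frameworks decode query parameters for you; call decodeURIComponent only when you are parsing a raw query string yourself, and never decode the same value twice. Always validate decoded values before using them in your application logic.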
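A round trip makes the encode-once, decode-once rule concrete. In this Python sketch the parameter names and values are invented; `urlencode` plays the client side and `parse_qsl` the server side.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Hypothetical parameters a client wants to send
params = {"name": "Zoë & co", "path": "a/b"}

# Client side: encode every name and value exactly once
query = urlencode(params, quote_via=quote)
print(query)  # name=Zo%C3%AB%20%26%20co&path=a%2Fb

# Server side: decode exactly once and the original values come back
received = dict(parse_qsl(query))
print(received == params)  # True
```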
Conclusion
URL encoding is essential for building robust web applications. Understanding when and how to encode URLs prevents bugs, security vulnerabilities, and data corruption.
Key takeaways:
- Encode reserved and unsafe characters in URLs
- Use encodeURIComponent for query parameters
- Use encodeURI for complete URLs
- Never double-encode
- Always encode user input before including in URLs
- Validate decoded input for security
Further reading: RFC 3986, MDN encodeURI, WHATWG URL Standard
Sources: IETF RFC 3986, Mozilla Developer Network, WHATWG Living Standard