
URL Encoding: Complete Guide to Percent-Encoding for Web Developers



URLs are fundamental to the web, but they have strict rules about which characters they can contain. URL encoding (also called percent-encoding) is the mechanism that allows arbitrary data to be included in URLs safely.

Understanding URL encoding is essential for building secure, reliable web applications that handle user input correctly.

What Is URL Encoding?

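URL encoding converts characters that are not allowed in a URL, or that have special meaning there, into a form that can be transmitted safely. Each such character is replaced by a % followed by two hexadecimal digits for every byte of its UTF-8 encoding; for ASCII characters, that single byte is the ASCII code.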

Example

Original      Encoded
hello world   hello%20world
price: $50    price%3A%20%2450
a/b/c         a%2Fb%2Fc
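
The rows above can be reproduced with Python's standard library. This is a minimal sketch; passing safe="" is a choice made here so that / is encoded as well, because quote leaves / untouched by default.

```python
from urllib.parse import quote

# safe="" asks quote() to encode "/" too; by default "/" is left as-is.
examples = ["hello world", "price: $50", "a/b/c"]
encoded = [quote(text, safe="") for text in examples]

for original, result in zip(examples, encoded):
    print(f"{original!r} -> {result!r}")
```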

Why URL Encoding Is Necessary

URL Character Restrictions

URLs are designed to use a limited set of ASCII characters:

Reserved Characters (have special meaning):

  • : (scheme separator)
  • / (path separator)
  • ? (query start)
  • & (query parameter separator)
  • = (parameter value separator)
  • # (fragment identifier)
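  • @ (userinfo separator)
  • [, ] (IPv6 address literals in the host)
  • !, $, ', (, ), *, +, ,, ; (sub-delimiters)

Unreserved Characters (safe to use):

  • Alphanumeric: A-Z, a-z, 0-9
  • Special: -, _, ., ~

Unsafe Characters (must be encoded):

  • Space, ", <, >, {, }, |, \, ^, `
  • % when it is meant literally (encode as %25)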
  • Non-ASCII characters (Unicode)
  • Control characters
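
As a rough illustration of these classes, the sketch below sorts a character into one of the three groups using the reserved and unreserved sets from RFC 3986. The classify helper is a hypothetical name introduced for this example, not a library function.

```python
import string

# Character classes from RFC 3986, sections 2.2 (reserved) and 2.3 (unreserved).
RESERVED = set(":/?#[]@!$&'()*+,;=")
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch: str) -> str:
    """Return the class of a single character (hypothetical helper)."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "must be encoded"


for ch in ["a", "~", "?", "[", " ", "<", "é"]:
    print(repr(ch), classify(ch))
```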

The Problem

If you include reserved or unsafe characters in a URL without encoding, they may:

  • Break the URL structure
  • Be misinterpreted by servers
  • Cause security vulnerabilities
  • Result in data corruption

How URL Encoding Works

The Encoding Process

  1. Encode the character as UTF-8 to get its byte value(s)
  2. Convert each byte to two hexadecimal digits
  3. Prefix each pair of digits with %
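
The steps above can be sketched by hand in a few lines. This is an illustration only, assuming that every byte outside the unreserved set should be encoded; in real code, prefer urllib.parse.quote. The percent_encode name is made up for this example.

```python
from urllib.parse import quote

# Bytes that never need encoding (the unreserved set).
UNRESERVED = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"


def percent_encode(text: str) -> str:
    """Percent-encode every byte that is not unreserved (illustrative sketch)."""
    out = []
    for byte in text.encode("utf-8"):       # step 1: byte values
        if byte in UNRESERVED:
            out.append(chr(byte))           # unreserved bytes pass through
        else:
            out.append(f"%{byte:02X}")      # steps 2 and 3: hex digits, % prefix
    return "".join(out)


print(percent_encode("a b&c"))
# The standard library should agree when asked to encode everything:
print(percent_encode("a b&c") == quote("a b&c", safe=""))
```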

Examples

Character ASCII Value Hex Encoded
Space 32 20 %20
! 33 21 %21
" 34 22 %22
# 35 23 %23
$ 36 24 %24
% 37 25 %25
& 38 26 %26
' 39 27 %27
( 40 28 %28
) 41 29 %29
* 42 2A %2A
+ 43 2B %2B
, 44 2C %2C
/ 47 2F %2F
: 58 3A %3A
; 59 3B %3B
= 61 3D %3D
? 63 3F %3F
@ 64 40 %40
[ 91 5B %5B
] 93 5D %5D

Encoding Unicode Characters

Unicode characters are encoded as UTF-8 bytes, then each byte is percent-encoded:

Character   UTF-8 Bytes   Encoded
©           C2 A9         %C2%A9
®           C2 AE         %C2%AE
你          E4 BD A0      %E4%BD%A0
😊          F0 9F 98 8A   %F0%9F%98%8A
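
To check rows like these yourself, the following sketch prints the UTF-8 bytes and the percent-encoded form of a few characters. The utf8_hex helper is a hypothetical name used only here.

```python
from urllib.parse import quote, unquote


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


for ch in ["©", "你", "😊"]:
    print(ch, utf8_hex(ch), quote(ch))

# Decoding reverses the process: bytes are collected, then read as UTF-8.
print(unquote("%C2%A9"))
```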

URL Components and Encoding

Different URL components have different encoding requirements:

Scheme and Authority

https://user:password@example.com:8080
  • Scheme (https): No encoding needed
  • User/Password: May need encoding for special characters
  • Host: Uses Punycode for non-ASCII domains
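
A small sketch of how the authority part might be assembled. The user name, password, and host below are invented for illustration, and Python's built-in idna codec implements the older IDNA 2003 rules, so treat the host conversion as an approximation of what a browser does.

```python
from urllib.parse import quote

# Hypothetical credentials; "@" and ":" would otherwise be read as separators.
user = quote("alice", safe="")
password = quote("p@ss:word", safe="")

# Non-ASCII host names are converted to Punycode rather than percent-encoded.
host = "bücher.example".encode("idna").decode("ascii")

authority = f"{user}:{password}@{host}:8080"
print(authority)
```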

Path

/path/to/resource
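  • / separates path segments and is not encoded
  • A / that is part of a segment's data must be encoded as %2F, and other reserved characters inside a segment should be encoded too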

Query String

?key1=value1&key2=value2
  • ?, &, = are separators (not encoded in structure)
  • Parameter names and values should be encoded

Fragment

#section-id
  • # is the fragment identifier
  • Fragment content should be encoded
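
One way to respect these per-component rules is to encode each piece separately and only then join them with the separators. The sketch below assumes an https URL and uses a hypothetical build_url helper; the host and values are examples.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit


def build_url(host, segments, params, fragment=""):
    """Assemble an https URL, encoding each component on its own (sketch)."""
    # Encode every path segment fully, then join with the "/" separator.
    path = "/" + "/".join(quote(seg, safe="") for seg in segments)
    # quote_via=quote makes spaces %20 instead of the default "+".
    query = urlencode(params, quote_via=quote)
    return urlunsplit(("https", host, path, query, quote(fragment, safe="")))


url = build_url(
    "example.com",
    ["docs", "a/b", "my file.txt"],
    {"q": "fish & chips", "lang": "en"},
    "section 2",
)
print(url)
print(parse_qs(urlsplit(url).query))  # values come back unchanged
```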

URL Encoding in Practice

JavaScript

// Encode a single URL component (path segment or query value)
encodeURIComponent('hello world');
// 'hello%20world'

// Encode full URL
encodeURI('https://example.com/path with spaces');
// 'https://example.com/path%20with%20spaces'

// Decode
decodeURIComponent('hello%20world');
// 'hello world'

Key Difference: encodeURI vs encodeURIComponent

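Function             Encodes                                                  Doesn't Encode
encodeURI            Everything else, including space, %, [ ], and non-ASCII   A-Z a-z 0-9 - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , #
encodeURIComponent   Everything else, including ; / ? : @ & = + $ , #          A-Z a-z 0-9 - _ . ! ~ * ' ( )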

When to use which:

  • encodeURI: For complete URLs
  • encodeURIComponent: For individual URL components (query parameters, path segments)
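
For readers working in Python, the two behaviours can be approximated with quote and different safe sets. This is a sketch, not an exact port: the function names are invented here, and edge cases such as lone surrogates behave differently in JavaScript.

```python
from urllib.parse import quote

# Characters the JavaScript functions leave alone, beyond letters, digits, - _ . ~
COMPONENT_SAFE = "!*'()"
URI_SAFE = COMPONENT_SAFE + ";/?:@&=+$,#"


def encode_uri_component(value: str) -> str:
    """Approximate encodeURIComponent: for one component."""
    return quote(value, safe=COMPONENT_SAFE)


def encode_uri(url: str) -> str:
    """Approximate encodeURI: keeps URL separators intact."""
    return quote(url, safe=URI_SAFE)


print(encode_uri("https://example.com/path with spaces"))
print(encode_uri_component("https://example.com/path?q=test"))
```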

Python

from urllib.parse import quote, quote_plus, unquote

# Standard encoding
quote('hello world')  # 'hello%20world'

# Plus encoding (spaces become +)
quote_plus('hello world')  # 'hello+world'

# Decode
unquote('hello%20world')  # 'hello world'
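
One detail worth knowing about quote is its safe parameter, which defaults to "/". The short sketch below shows the difference; the paths are made-up examples.

```python
from urllib.parse import quote

# Default: "/" is treated as a separator and left alone.
path = quote("/files/my report.pdf")

# safe="": "/" is data inside one segment, so it is encoded as well.
segment = quote("reports/2024", safe="")

print(path)
print(segment)
```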

PHP

// Standard encoding
urlencode('hello world');  // 'hello+world'

// RFC 3986 encoding
rawurlencode('hello world');  // 'hello%20world'

// Decode
urldecode('hello+world');  // 'hello world'

Common Pitfalls

1. Double Encoding

Encoding already-encoded data causes corruption:

// Wrong
const doubleEncoded = encodeURIComponent(encodeURIComponent('a b'));
// Result: 'a%2520b' (the % of %20 was itself encoded as %25)

// Correct
const encoded = encodeURIComponent('a b');
// Result: 'a%20b'
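
The same effect can be shown in Python. The sketch below also illustrates why a receiver that decodes once gets the wrong value back from double-encoded data.

```python
from urllib.parse import quote, unquote

once = quote("a b", safe="")    # encoded a single time
twice = quote(once, safe="")    # encoded again by mistake

print(once, twice)
# A single decode of the double-encoded value still contains an escape.
print(unquote(twice))
```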

2. Encoding Complete URLs with encodeURIComponent

// Wrong - breaks URL structure
encodeURIComponent('https://example.com/path?q=test');
// 'https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dtest'

// Correct - use encodeURI for full URLs
encodeURI('https://example.com/path?q=test');
// 'https://example.com/path?q=test'

3. Inconsistent Space Handling

Different systems handle spaces differently:

Method                Space Becomes
encodeURIComponent    %20
encodeURI             %20
urlencode (PHP)       +
quote_plus (Python)   +
Form data             +

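Best practice: Use %20 in URLs. A + means a space only in application/x-www-form-urlencoded data (form bodies and query strings parsed as form data); in a path, + is a literal plus sign.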
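
A short sketch of what goes wrong when the two conventions are mixed. The sample value is arbitrary; the point is that the decoder has to match the encoder.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

form_value = quote_plus("C++ tips")       # form style: space becomes "+"
url_value = quote("C++ tips", safe="")    # URL style: space becomes "%20"
print(form_value, url_value)

# Matching decoder: the original value comes back.
print(unquote_plus(form_value))
# Mismatched decoder: the "+" that stood for a space is kept as a plus sign.
print(unquote(form_value))
```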

4. Not Encoding User Input

// Dangerous - user input directly in URL
const unsafeUrl = `/search?q=${userInput}`;

// Safe - properly encode user input
const safeUrl = `/search?q=${encodeURIComponent(userInput)}`;
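
To see what the unencoded version allows, the sketch below parses both query strings the way a server might. The input value is a made-up example of a user smuggling in an extra parameter.

```python
from urllib.parse import parse_qs, quote

user_input = "cats&admin=true"  # hypothetical hostile input

unsafe_query = f"q={user_input}"
safe_query = f"q={quote(user_input, safe='')}"

# Unencoded: the "&" splits the value and a second parameter appears.
print(parse_qs(unsafe_query))
# Encoded: the whole input stays inside the q parameter.
print(parse_qs(safe_query))
```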

Security Considerations

URL Injection Attacks

Unencoded user input can enable attacks:

// Vulnerable
const url = `/redirect?target=${userInput}`;
// User input: "https://evil.com?original="
// Result: /redirect?target=https://evil.com?original=

// Mitigation
const url = `/redirect?target=${encodeURIComponent(userInput)}`;

Open Redirect Vulnerabilities

Always validate and encode redirect URLs:

// Dangerous
app.get('/redirect', (req, res) => {
  res.redirect(req.query.url);
});

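// Safer
app.get('/redirect', (req, res) => {
  // Express has already percent-decoded req.query.url; decoding it again
  // would double-decode and can throw on a stray '%'.
  const url = req.query.url;
  // isValidInternalUrl is an application-defined allowlist check.
  if (typeof url === 'string' && isValidInternalUrl(url)) {
    res.redirect(url);
  } else {
    res.status(400).send('Invalid redirect');
  }
});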
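
The validation helper is left undefined above. One possible shape for such a check, sketched in Python under the assumption that only same-site absolute paths should be accepted, is shown below. The function name mirrors the example and is hypothetical; a real application may need a stricter or different policy.

```python
from urllib.parse import urlsplit


def is_valid_internal_url(url: str) -> bool:
    """Accept only same-site absolute paths such as "/account" (a sketch)."""
    # Backslashes and control characters are normalized inconsistently by browsers.
    if "\\" in url or any(ord(ch) < 0x20 for ch in url):
        return False
    # "//host" is a protocol-relative URL that points at another site.
    if not url.startswith("/") or url.startswith("//"):
        return False
    parts = urlsplit(url)
    return parts.scheme == "" and parts.netloc == ""


for candidate in ["/account/settings?tab=1", "https://evil.com", "//evil.com"]:
    print(candidate, is_valid_internal_url(candidate))
```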

URL Encoding Reference Table

Reserved Characters

Char Encoded Purpose
! %21 Exclamation
# %23 Fragment identifier
$ %24 Dollar sign
& %26 Query separator
' %27 Single quote
( %28 Opening parenthesis
) %29 Closing parenthesis
* %2A Asterisk
+ %2B Plus sign
, %2C Comma
/ %2F Path separator
: %3A Scheme/port separator
; %3B Semicolon
= %3D Parameter value separator
? %3F Query start
@ %40 At sign
[ %5B Opening bracket
] %5D Closing bracket

Common Encoded Characters

Char Encoded Description
Space %20 Space
" %22 Double quote
% %25 Percent sign
< %3C Less than
> %3E Greater than
\ %5C Backslash
^ %5E Caret
` %60 Backtick
{ %7B Opening brace
| %7C Vertical bar
} %7D Closing brace

Frequently Asked Questions

What is the difference between URL encoding and Base64 encoding?

URL encoding converts special characters to percent-encoded format (%XX) for safe URL transmission. Base64 encoding converts binary data to ASCII characters for safe text transmission. URL encoding leaves unreserved characters unchanged and expands every other byte to three characters, so its overhead depends on the input. Base64 increases size by about 33% regardless of content and is designed for arbitrary binary data rather than mostly readable text.
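
The size difference is easy to measure. The sketch below compares both encodings on a short ASCII string and on three arbitrary non-ASCII bytes; the sample inputs are chosen only for illustration.

```python
import base64
from urllib.parse import quote

text = b"hello"                      # all unreserved characters
binary = bytes([0xFF, 0xFE, 0xFD])   # no unreserved characters

for data in (text, binary):
    percent = quote(data, safe="")
    b64 = base64.b64encode(data).decode("ascii")
    print(len(data), "bytes ->", len(percent), "percent-encoded,", len(b64), "Base64")
```

Percent-encoding costs nothing for plain text and triples the size of binary data, while Base64 grows both by roughly a third (plus padding).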

Why do spaces sometimes become + and sometimes %20?

The + encoding for spaces comes from the application/x-www-form-urlencoded media type used for form submissions. In URLs proper, spaces should be encoded as %20. However, many URL encoding functions (like PHP's urlencode) use + by default for backward compatibility. Use rawurlencode in PHP or encodeURIComponent in JavaScript for proper %20 encoding.

Do I need to encode URLs in HTML attributes?

Yes, URLs in HTML attributes (like href and src) should be properly encoded. However, modern browsers are forgiving and will often handle unencoded URLs. For security and correctness, always encode URLs, especially when they contain user input or dynamic values.

How do I handle URL encoding in APIs?

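APIs should accept properly encoded URLs and URL parameters. When constructing API URLs, encode each parameter value with encodeURIComponent. On the receiving side, most web frameworks decode query parameters for you; call decodeURIComponent only on values that are still encoded, since decoding twice corrupts data that contains a literal %. Always validate decoded values before using them in your application logic.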

Conclusion

URL encoding is essential for building robust web applications. Understanding when and how to encode URLs prevents bugs, security vulnerabilities, and data corruption.

Key takeaways:

  • Encode reserved and unsafe characters in URLs
  • Use encodeURIComponent for query parameters
  • Use encodeURI for complete URLs
  • Never double-encode
  • Always encode user input before including in URLs
  • Validate decoded input for security



Further reading: RFC 3986, MDN encodeURI, WHATWG URL Standard

Sources: IETF RFC 3986, Mozilla Developer Network, WHATWG Living Standard