URL encoder in Editor

Visitor ✭ ✭ ✭
# 1

Hi all,

I've noticed that a URL with spaces or special characters is automatically encoded in the Editor. The spaces are encoded correctly, but I don't know what happens to the special characters: the encoding doesn't seem to work at all, and the result is different from what other online encoding/decoding tools produce.

Do you know if I can change or fix this behaviour somehow?

Thanks a lot in advance


Re: URL encoder in Editor

Google Employee
# 2

The exact rules are somewhat complicated:

1) The domain name part is never escaped (in other words, international domain names are fully supported)

2) In the rest of the URL, space, left angle bracket and right angle bracket are always escaped (as %20, %3C and %3E, respectively). Percent sign is escaped as %25 unless followed by two hexadecimal digits (this way, existing %hexhex escapes are left intact).

3) If the URL contains only ASCII and Latin-1 characters (Unicode codepoints up to U+00FF, basically the characters used in most Western European languages), then each non-ASCII character is represented by a single escape sequence. For example, Á (capital A with acute, U+00C1) is escaped as %C1.

4) If the URL contains any characters beyond U+00FF (pretty much every non-Latin character, and certain less common Latin ones), then each non-ASCII character is represented in UTF-8, and every byte of this representation is independently escaped. For example, Ω (Greek capital letter Omega, U+03A9) is escaped as %CE%A9.
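
To make the four rules concrete, here is a minimal Python sketch of them as I read them from the description above. It is not Editor's actual code, and `hybrid_encode` / `_split_domain` are hypothetical names. Assumptions: the domain part ends at the first `/`, `?` or `#` after the scheme; ASCII characters other than space, `<`, `>` and `%` pass through unchanged; and the Latin-1 vs. UTF-8 decision in rules 3/4 looks at every character of the URL.

```python
import re

# A '%' that already starts a valid %hexhex escape is left intact (rule 2).
_EXISTING_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def _split_domain(url: str):
    """Split off scheme + domain, assumed to end at the first '/', '?' or '#'."""
    scheme_end = url.find("://")
    start = 0 if scheme_end == -1 else scheme_end + 3
    end = len(url)
    for sep in "/?#":
        pos = url.find(sep, start)
        if pos != -1:
            end = min(end, pos)
    return url[:end], url[end:]


def hybrid_encode(url: str) -> str:
    """Hypothetical re-implementation of the four rules; not Editor's real code."""
    domain, rest = _split_domain(url)  # rule 1: the domain is never escaped
    # Rules 3/4: one Latin-1 byte per character only if *every* character of
    # the URL is <= U+00FF, otherwise UTF-8 bytes (assumed: whole URL checked).
    encoding = "latin-1" if all(ord(ch) <= 0xFF for ch in url) else "utf-8"
    out = []
    for i, ch in enumerate(rest):
        if ch == "%":
            # Rule 2: keep existing escapes, escape a stray '%' as %25.
            out.append("%" if _EXISTING_ESCAPE.match(rest, i) else "%25")
        elif ch in " <>":
            out.append("%{:02X}".format(ord(ch)))  # rule 2: %20, %3C, %3E
        elif ord(ch) < 0x80:
            out.append(ch)  # assumption: other ASCII passes through unchanged
        else:
            # Escape every byte of the chosen encoding independently.
            out.append("".join("%{:02X}".format(b) for b in ch.encode(encoding)))
    return domain + "".join(out)


print(hybrid_encode("http://example.com/Á"))   # http://example.com/%C1
print(hybrid_encode("http://example.com/ÁΩ"))  # http://example.com/%C3%81%CE%A9
```

The two calls show the effect described in rule 3: adding Ω to the URL switches Á from %C1 to %C3%81.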

Rule #3 here is somewhat unusual, and is probably the one causing problems for you. It means that the same character may be encoded differently, depending on what other characters are present in the URL. Thus, the aforementioned Á is encoded as %C1 if all other characters are ASCII or Latin-1, but as %C3%81 if there are any characters beyond Latin-1. Those other tools you mention probably use UTF-8 (rule #4) in all cases, and that's why they produce different results.
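
If you want to see the two forms side by side, Python's standard `urllib.parse.quote` can produce both just by switching its `encoding` parameter. This is only an illustration of the Latin-1 vs. UTF-8 difference, not a model of Editor itself:

```python
from urllib.parse import quote, unquote

# Rule #3 (URL is pure ASCII/Latin-1): Á (U+00C1) becomes a single byte.
latin1_form = quote("Á", encoding="latin-1")
# Rule #4 (URL also has a character beyond U+00FF), and typical online tools: UTF-8.
utf8_form = quote("Á")

print(latin1_form)  # %C1
print(utf8_form)    # %C3%81
# A server that decodes as UTF-8 cannot read the Latin-1 form: 0xC1 is not
# valid UTF-8, so unquote() substitutes U+FFFD (its default errors="replace").
print(unquote(latin1_form) == "\ufffd")  # True
```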

Back at the time rule #3 was introduced, most web servers in Europe were configured to use this Latin-1 based encoding, while servers in the rest of the world were generally using UTF-8. So we invented our "hybrid" encoding strategy in order to try and be helpful for most customers. I'm not sure what the situation is now, though I note that we receive very few complaints about our encoding scheme, so there are reasons to believe that it continues to balance the needs of the two communities well.

In any case, if Editor's encoding scheme doesn't suit your purposes, the workaround is to encode your URLs in the desired way first, and then enter already-encoded URLs into Editor. As I mentioned, Editor preserves existing escape sequences, so these pre-encoded URLs should remain unchanged. That's the best suggestion I can offer, I'm afraid.
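
One way to do that pre-encoding is a small script built on Python's `urllib.parse`. This is a sketch under a few assumptions: you want UTF-8 everywhere, the domain should be left as typed (matching rule 1), and reserved characters such as `/ ? & =` should stay literal. `pre_encode` is a hypothetical helper name, not part of any Google tool.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Reserved characters that give the URL its structure stay literal; '%' is
# included so escapes already present in the input are not double-encoded.
_SAFE = "/:@!$&'()*+,;=?%"


def pre_encode(url: str) -> str:
    """Percent-encode path, query and fragment as UTF-8; leave the domain alone."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(
        path=quote(parts.path, safe=_SAFE),
        query=quote(parts.query, safe=_SAFE),
        fragment=quote(parts.fragment, safe=_SAFE),
    ))


print(pre_encode("http://example.com/café menu?q=Á"))
# http://example.com/caf%C3%A9%20menu?q=%C3%81
```

The output is plain ASCII with no spaces or angle brackets, and its percent signs all start valid escapes, so by the rules above Editor should leave it as it is. A stray `%` in the input is kept literally by this sketch, and Editor would then turn it into %25, which is also what a UTF-8 tool would produce.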