
Fixing / Detecting double-encoded utf-8

Visitor ✭ ✭ ✭
# 1

We're getting this error on some of our products:
https://monosnap.com/file/a1vNZTYhV6MEehxWoS3bIaIOFWbQxc.png

"Encoding problem (double UTF8 encoding) in attribute: description"

 

How is Google detecting if something is double-encoded? Are there certain characters that commonly show up when double-encoding happens? How can we fix that problem on our end? (We're using Ruby)

 

This problem isn't mentioned here: https://support.google.com/merchants/answer/160079. We are sending valid UTF-8; it just happens that some of that valid UTF-8 contains characters that probably aren't right.


Re: Fixing / Detecting double-encoded utf-8

Top Contributor
# 2

double-encoding usually indicates that the original site/database text used
a different encoding, or a mix of encodings, such as latin1/cp1251/etc --
especially for special symbols, proprietary characters, diacritics, or similar --
and was then later encoded as utf-8 again; that is not allowed.

generally, standard utf-8 should be used for all the original text
and then simply transferred to google without any (re)encoding.

otherwise, simply use only u.s.-ascii characters, in
the range of 0x30 - 0x7a, with standard (english)
capitalization and standard (english) punctuation.

e.g.

Val d'Orcia Caffee Pano
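
for example, a rough ruby sketch (ruby was mentioned by the original poster) of
checking that a feed value is already valid utf-8, or falling back to plain
ascii -- the helper names here are only illustrative, not merchant-center code:

    # rough sketch; feed values are assumed to be plain ruby strings.
    def ensure_plain_utf8(value)
      raise ArgumentError, "not valid utf-8: #{value.inspect}" unless value.valid_encoding?
      value
    end

    def ascii_fallback(value)
      # replace anything outside us-ascii with "?" rather than submitting it
      value.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "?")
    end

    puts ensure_plain_utf8("Val d'Orcia Caffee Pano")
    puts ascii_fallback("Val d'Orcia Caffè Pano")   # => Val d'Orcia Caff? Pano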


otherwise, forum-members cannot look into the original-text or how the data
was encoded or transmitted to google -- but google may be contacted directly.

 

as an aside, html, css, javascript, etc, -- are also not allowed;

all data submitted should be plain text without any formatting.

 

Re: Fixing / Detecting double-encoded utf-8

Visitor ✭ ✭ ✭
# 3
That's what we are doing, but unfortunately the data given to us was
encoded like this.

What's the best way to contact Google?

Re: Fixing / Detecting double-encoded utf-8

Top Contributor
# 4

Re: Fixing / Detecting double-encoded utf-8

Rising Star
# 5
I don't think Google will be able to help you. This is an issue on the scripting side that generates your data. Even when you think it is correct, there are several procedures you need to perform to ensure it is clean.

For example: entity decoding, special-character decoding, removing hidden whitespace, and removing certain special elements.

With my coding I have at least 9 steps to create a cleaned-up text, then a final step to convert it all back.
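
For example, roughly along these lines in Ruby (since that is what you are using) -- just a sketch of the idea, not my actual steps:

    require "cgi"

    # hypothetical cleanup pipeline along the lines described above
    def clean_for_feed(text)
      text = CGI.unescapeHTML(text)              # decode entities such as &amp;
      text = text.gsub(/<[^>]+>/, " ")           # strip leftover html tags
      text = text.tr("\u00A0\u200B\uFEFF", " ")  # replace non-breaking / zero-width spaces
      text = text.gsub(/\s+/, " ").strip         # collapse runs of whitespace
      text.scrub("")                             # drop any remaining invalid utf-8 bytes
    end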

I suggest, if you are knowledgeable, going to Stack Overflow and asking the question there, as that is a better place to ask how to create a clean XML file.

hope it helps.

Re: Fixing / Detecting double-encoded utf-8

Explorer ✭ ✭ ☆
# 6

@Emmanuel F is spot on; this is not an issue that Google support can really help you with. Most of the time, double UTF-8 encoding errors come from special characters like &, fractions, and accents getting turned into garbled sequences. You can set up rules within Merchant Center to find and replace those special characters; you just have to map them all (we actually use a custom Excel formula that accesses a database of these errors to clean up our feeds). It's definitely better to fix the extraction process, but if that proves to be too big a job, or you just need to buy time until you can finish it, setting up Merchant Center rules is a relatively easy quick fix.
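
For example, a small sample of that kind of mapping in Ruby (the original poster's language); these are just a few common artifacts, not the full list we map:

    # a few typical double-encoding artifacts and the intended characters
    MOJIBAKE = {
      "â€™" => "’",   # right single quotation mark
      "Ã©"  => "é",
      "Ã¨"  => "è",
      "Ã»"  => "û"
    }.freeze

    def replace_known_mojibake(text)
      MOJIBAKE.reduce(text) { |out, (bad, good)| out.gsub(bad, good) }
    end

    puts replace_known_mojibake("CrÃ¨me brÃ»lÃ©e")  # => Crème brûlée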

Re: Fixing / Detecting double-encoded utf-8

Visitor ✭ ✭ ✭
# 7
This question is clearly not answered. Let me answer it correctly:

"How is Google detecting if something is double-encoded?"

Google is in a position to help because they are the ones detecting the errors in the first place. However, Google does not specify how it detects double-encoded characters, either because it has chosen not to or through simple neglect. It is up to us to guess. To the naked eye, double-encoding artifacts are usually quite clear. If someone spent a few weeks on this and came up with the common word contexts and bad character sequences that result from two or three encoding translations, along with a way to push the errors Google detects into a database, published it as a website and as open source, and provided a few tools for exploring what certain extended characters look like when mis-encoded through various translation paths, the community could detect these issues earlier, and we (including Google) would all prosper.
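
As a guess (not Google's documented method), a check like this rough Ruby heuristic would catch most cases: if the text still comes out as valid UTF-8 after being encoded back to Windows-1252 and reinterpreted as UTF-8, it was very likely double-encoded:

    # plausible heuristic only -- Google has not documented its actual check
    def probably_double_encoded?(text)
      candidate = text.encode("Windows-1252").force_encoding("UTF-8")
      candidate.valid_encoding? && candidate != text
    rescue Encoding::UndefinedConversionError
      false  # contains characters with no Windows-1252 equivalent
    end

    probably_double_encoded?("CaffÃ¨ Pano")  # => true
    probably_double_encoded?("Caffè Pano")   # => false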

Re: Fixing / Detecting double-encoded utf-8

Visitor ✭ ✭ ✭
# 8
Hi Mark C,

I found http://www.i18nqa.com/debug/utf8-debug.html, which seems to correspond to what Google does. But yes, it would be nice to have something more standard.
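
For what it's worth, assuming the damage was a single UTF-8 -> cp1252 -> UTF-8 round trip, the repair that chart implies is a one-liner in Ruby:

    # undo one round of utf-8 text that was mis-read as windows-1252
    fixed = "CaffÃ¨ Pano".encode("Windows-1252").force_encoding("UTF-8")
    puts fixed  # => Caffè Pano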

Joe

Re: Fixing / Detecting double-encoded utf-8

Visitor ✭ ✭ ✭
# 9
I'm finding that the most common mixed-encoding issues derive from combinations of UTF-8, cp1252, and ISO-8859-1, so that's a great find. It would also be great to have a tool where you give it what you expect a word should be and how it was actually received, and it runs through many combinations of dropped/replaced characters as well as mixed encodings to find out what could have produced that outcome. That way you can at least try to repair the text in a somewhat automatic, faster way.
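
Something like this rough Ruby sketch would be a start (the encoding list and the output format are just my guesses):

    ENCODINGS = %w[UTF-8 Windows-1252 ISO-8859-1].freeze

    # try each "written as X, read back as Y" pair and report the ones
    # that reproduce the observed garbled text
    def find_mangling(expected, observed)
      ENCODINGS.product(ENCODINGS).filter_map do |wrote_as, read_as|
        next if wrote_as == read_as
        begin
          relabeled = expected.encode(wrote_as).force_encoding(read_as)
          next unless relabeled.valid_encoding?
          "written as #{wrote_as}, read as #{read_as}" if relabeled.encode("UTF-8") == observed
        rescue Encoding::UndefinedConversionError
          nil
        end
      end
    end

    p find_mangling("Caffè", "CaffÃ¨")
    # => ["written as UTF-8, read as Windows-1252", "written as UTF-8, read as ISO-8859-1"]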

If anyone else knows of other resources or tools to detect and possibly resolve mixed-encoding issues after the fact, please post them here! :)