Spiral Scripts Support Forum :: Virtuemart Extensions
Subject :GoogleBaseXML bug on xml and txt..
04-11-2010 20:58:52
webmastergreg
Fresher
Joined: 04-11-2010 19:50:45
Posts: 5
Hello
Congratulations on this excellent and simple component.
We have one issue with the feed (and therefore with the txt file too): the content inside the feed is not enclosed in CDATA tags, and HTML entities are not parsed correctly.
I can send you two feeds for comparison: one that works (generated by another extension) and yours.
Thanks
Subject :Re:GoogleBaseXML bug on xml and txt..
06-11-2010 08:03:06
boggler
Spiral Scripts Support
Joined: 18-08-2009 10:14:13
Posts: 211
Hi, I don't believe you are correct that the information should be enclosed in CDATA tags; see the Google data feed specification at http://www.google.com/support/merchants/bin/answer.py?answer=188494
Could you give me a specific example where you are having problems? It sounds as if there is a problem somewhere with the entity encoding, which I would think can be fixed.
Can you give me the URL of your product feed? That might help.
Subject :Re:GoogleBaseXML bug on xml and txt..
06-11-2010 13:41:19
boggler
To expand upon my previous answer, I am sure that it would be incorrect to enclose the data in cdata tags. The cdata tag is basically an xml version of the html comment tag <!-- a comment -->
- it means that any text inside the tags will be ignored.
It is true that if you enclose any data that contains html in cdata tags then it will prevent validation errors - but this is not how you want to handle html tags in the data.
The correct way to do this is to use html entity encoding, which is what we do. For example, < will become &lt;
This is the method that Google recommend in their documentation, and is the standard for RSS.
You need to remember what the product feed is for - to generate a product listing on Google product search. If you enclose the data in cdata tags, then Google will see this as an empty listing, because that is what the cdata tag implies.
I have to say that if you have an example of a data feed where the data is enclosed in cdata tags then that is incorrectly formatted, not ours.
If you think there is a problem with the product feed generated by our component, though, do let me know the URL; I will then be able to check whether something is going wrong with the entity encoding.
Last Edited On: 06-11-2010 13:43:09 By boggler
Subject :Re:GoogleBaseXML bug on xml and txt..
07-11-2010 18:00:19
webmastergreg
Hello, I've managed to work around this issue as follows.
In the file:
com_googlebasexml/models/googlebasexml.php
I replaced (line 573):
Code:
$items[$i]->description = htmlspecialchars($desc);
With:
Code:
$items[$i]->description = $desc;
However, this is not a good fix, because some special characters are then left unencoded.
The real issue here is that the ampersand is encoded twice, so the & of each HTML entity is wrongly output as &amp;
I'll let you look into this more deeply.
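To make the effect concrete, here is a minimal sketch of what htmlspecialchars does to a description that already contains HTML entities. The sample string is invented for illustration and is not taken from the component. ENT_COMPAT is passed explicitly because that was PHP's default flag in 2010, whereas PHP 8.1 and later default to ENT_QUOTES.

```php
<?php
// Hypothetical product description that already contains HTML entities.
$desc = 'Caf&eacute; &amp; bar <b>new</b>';

// Encode it again, as the original line 573 does.
$encoded = htmlspecialchars($desc, ENT_COMPAT, 'UTF-8');

echo $encoded, "\n";
// Caf&amp;eacute; &amp;amp; bar &lt;b&gt;new&lt;/b&gt;
```

Every & that starts an existing entity is itself encoded, which is the double encoding described above.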
Last Edited On: 07-11-2010 19:52:55 By webmastergreg
Subject :Re:GoogleBaseXML bug on xml and txt..
07-11-2010 20:35:41
webmastergreg
OK,
in the end I just added this:
Code:
$items[$i]->description = str_replace('&amp;','&',$desc);
(the first argument is &amp;, which the forum decodes when it displays the post)
This replaces all the doubly encoded ampersands, and now the feed is OK.
I will test it on Google Merchant.
So htmlspecialchars is perhaps not the ideal way to retrieve the description.
A preg_replace with arrays could work too, in my opinion.
Just tell me what you think.
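For comparison, here is a hedged sketch of the workaround above next to one alternative that the thread does not mention: the fourth parameter of htmlspecialchars ($double_encode, available since PHP 5.2.3) can be set to false so that existing entities are left alone. The sample description is invented, and whether this would suit the component is an assumption, not something confirmed here.

```php
<?php
// Hypothetical description that already contains entities plus raw markup.
$desc = 'Caf&eacute; &amp; bar <b>';

// The workaround from the post: turn &amp; back into a bare ampersand.
$workaround = str_replace('&amp;', '&', $desc);

// Alternative: encode special characters but skip existing entities.
$alternative = htmlspecialchars($desc, ENT_QUOTES, 'UTF-8', false);

echo $workaround, "\n";   // Caf&eacute; & bar <b>
echo $alternative, "\n";  // Caf&eacute; &amp; bar &lt;b&gt;
```

Note that the str_replace version leaves the raw < and > untouched, while the second call still encodes them.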
Subject :Re:GoogleBaseXML bug on xml and txt..
08-11-2010 10:32:38
boggler
I know it seems weird, but the double encoding of the '&' is actually correct! Remember that when the feed is eventually displayed as a product listing it will be displayed as HTML, so there will be one round of decoding:
&amp;amp;
will become
&amp;
which will display correctly as an ampersand character when viewed as HTML.
So there really is no reason to worry about this, although I can well understand why you are confused!
However, since you have drawn my attention to it, I have noticed that there is in fact a bug in the character encoding:
$items[$i]->description = htmlspecialchars($desc);
should be
$items[$i]->description = htmlspecialchars($desc, ENT_QUOTES);
to ensure that the single quotation mark is correctly encoded.
I will make a new release which fixes this problem.
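A minimal sketch of both points, using an invented sample string: one round of decoding turns the doubly encoded ampersand into a singly encoded one, and ENT_QUOTES additionally encodes the single quote. ENT_COMPAT is written out for the old call because it was PHP's default flag at the time; from PHP 8.1 onwards the default already includes ENT_QUOTES.

```php
<?php
// One round of decoding: &amp;amp; becomes &amp;
$once = htmlspecialchars_decode('&amp;amp;');

// Hypothetical description containing an ampersand and a single quote.
$desc = "Tom & Jerry's";

// Old behaviour (single quote left as-is) versus the fixed call.
$compat = htmlspecialchars($desc, ENT_COMPAT, 'UTF-8');
$quotes = htmlspecialchars($desc, ENT_QUOTES, 'UTF-8');

echo $once, "\n";    // &amp;
echo $compat, "\n";  // Tom &amp; Jerry's
echo $quotes, "\n";  // Tom &amp; Jerry&#039;s
```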
Subject :Re:GoogleBaseXML bug on xml and txt..
08-11-2010 14:14:53
boggler
There is a new release, version 1.0.3, which deals with the htmlspecialchars ENT_QUOTES problem mentioned above.
You can update by downloading again using your existing download link, then uploading and installing with the Joomla installer; there is no need to uninstall first.
Glad you like the component, we've worked hard to make this a useful product.
Last Edited On: 08-11-2010 14:16:24 By boggler
Subject :Re:GoogleBaseXML bug on xml and txt..
08-11-2010 17:12:17
webmastergreg
Hi,
yes, you were right: the double encoding of & is OK. I only noticed that after the submission was accepted by Google Merchant.
I was fixated on how Firefox rendered the feed (I have had so many issues with feeds in the past).
So that's fine; I will test the new version and get back to you.
I should say that so far (with the first version) 21 of my 74 products were rejected for bad characters.
I'll tell you more about that after my tests.
Thanks a lot
Subject :Re:GoogleBaseXML bug on xml and txt..
09-11-2010 11:19:02
boggler
The bad characters may arise if you have pasted product descriptions from a word processor and they contain special characters; these can result in invalid HTML.
Usually they will display OK on a web page, as modern web browsers are very forgiving of validation problems, but they will cause problems in XML, where the validation rules must be strictly followed. If you are having these problems, you could try running the offending items through an HTML validator.