Ticket #5136 (closed defect: worksforme)

Opened 2 years ago

Last modified 19 months ago

msim doesn't convert to UTF-8

Reported by: Jaywalker
Owned by: jeff
Milestone:
Component: MySpace
Version: 2.4.0
Keywords:
Cc:

Description

I noticed Tom had a new status message, so in hopes of him saying something other than merely "woo!", I messaged him. Nothing was displayed in the chat, but here's the output of the debug log. I'll see if I can't figure out what's wrong...

(03:41:09) msim: msim_markup_to_html: couldn't parse <root><p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p></root> as XML, returning raw: <p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p>

Attachments

Freestyle.jpg (356.8 kB) - added by civas555 23 months ago.
i am civas555

Change History

  Changed 2 years ago by jeff

Looks like fairly normal markup to me. Any hints on what is wrong earlier in the debug log?

  Changed 2 years ago by Jaywalker

Not from what I can tell.. But I did notice that with the "Last Seen" plugin enabled it fried my blist.xml. The message was saved in that file with everything except the <root> bit and had some weird characters where the escape for ' is supposed to be. I don't think that's related to the problem but I may be wrong. Here's the full debug info from the time the message is being received from Tom in case I missed something...

(06:09:05) msim: dynamic buffer at 0 (max 30720), reading up to 15359
(06:09:05) msim: msim_input_cb: going to null terminate at n=283
(06:09:05) msim: msim_input_cb: read=283
(06:09:05) msim: msim_parse: got <\bm\1\f\6221\cv\697\msg\<p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</1f></1p>>
(06:09:05) msim: msim_preprocess_incoming: tagging with _username=tom
(06:09:05) msim: msim_markup_to_html: couldn't parse <root><p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p></root> as XML, returning raw: <p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p>

  Changed 2 years ago by jeff

The <root> is just added by msimprpl so that xmlnode_from_str can parse it. For some reason xmlnode_from_str is failing. Maybe it has to do with the mix of double- and single-quotes? The place to investigate would be libpurple/xmlnode.c:xmlnode_from_str.

  Changed 2 years ago by Jaywalker

Yeah.. I'm sure it's in libpurple's XML parsing since both the blist xml and msim xml parsing are broken. I'll see what I can hack up tonight ;)

  Changed 2 years ago by rekkanoryo

  • owner jeff deleted
  • component changed from MySpace to libpurple
  • summary changed from "msim couldn't parse message in msim_markup_to_html" to "xmlnode parsing broken?"

Making these changes at the request of Jaywalker

  Changed 2 years ago by nosnilmot

Maybe it has something to do with the unescaped/unencoded 0x92 bytes in it?
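One way to test this guess is to scan the message for the first byte that is not well-formed UTF-8. The following is a minimal sketch in plain C; first_invalid_utf8 is a hypothetical helper written for illustration and is not part of libpurple or msimprpl. In real libpurple code, GLib's g_utf8_validate would be the more natural choice.

```c
#include <stddef.h>

/* Hypothetical helper (not a libpurple function).
 * Returns the offset of the first byte that does not start or belong to a
 * well-formed UTF-8 sequence, or -1 if the whole buffer is valid UTF-8. */
long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need; /* number of continuation bytes expected after c */

        if (c < 0x80) {          /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
        } else {
            /* Stray continuation byte (0x80-0xBF, which includes 0x92)
             * or a lead byte that can never appear in UTF-8. */
            return (long)i;
        }

        if (len - i - 1 < need)
            return (long)i;      /* sequence is cut off by end of buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;  /* expected a continuation byte */
        }

        /* Reject overlong forms, UTF-16 surrogates, and values > U+10FFFF. */
        if ((c == 0xE0 && s[i + 1] < 0xA0) ||
            (c == 0xED && s[i + 1] > 0x9F) ||
            (c == 0xF0 && s[i + 1] < 0x90) ||
            (c == 0xF4 && s[i + 1] > 0x8F))
            return (long)i;

        i += need + 1;
    }
    return -1;
}
```

Run over the message body from the debug log, a check like this would stop at the first raw 0x92 byte, since 0x92 on its own can only be a continuation byte in UTF-8.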

  Changed 2 years ago by rekkanoryo

Replying to nosnilmot:

Maybe it has something to do with the unescaped/unencoded 0x92 bytes in it?

Looks like it to me.

Ideally, 0x92 should be stripped from the message in the prpl. It's commonly used as a "single smart quote" or apostrophe on Windows, and since a standard apostrophe or quote appears to already be present beside 0x92 it's not needed. Alternatively, a simple escape/encode/whatever should be sufficient (g_markup_escape_text can't be trusted to do this, particularly in older glib versions).
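The stripping approach could be sketched roughly as follows. This is an illustration only: strip_redundant_0x92 is a hypothetical name, not an existing msimprpl function, and it assumes the buffer has already been found not to be valid UTF-8 (in valid UTF-8, 0x92 can legitimately occur as a continuation byte and must not be removed).

```c
#include <stddef.h>

/* Hypothetical helper (not part of msimprpl).
 * Removes, in place, every 0x92 byte that is immediately followed by an
 * ASCII apostrophe, and returns the new length of the buffer.
 * Assumption: the caller only uses this on text that is NOT valid UTF-8. */
size_t strip_redundant_0x92(unsigned char *buf, size_t len)
{
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        /* Drop 0x92 only when the plain apostrophe next to it makes it
         * redundant; a lone 0x92 is kept so no information is lost. */
        if (buf[in] == 0x92 && in + 1 < len && buf[in + 1] == '\'')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

The function works on a length rather than a NUL terminator, so a caller holding a C string would need to write the terminator at the returned offset itself.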

  Changed 2 years ago by nosnilmot

  • owner set to jeff
  • component changed from libpurple to MySpace
  • summary changed from "xmlnode parsing broken?" to "msim doesn't convert to UTF-8"

Oh, if it's a "valid" character in some other encoding, then the problem is that the msim prpl is not converting everything to UTF-8 before sending it off to libpurple.

  Changed 2 years ago by jeff

Need to find out the character encoding that MySpaceIM uses. U+0092 is PRIVATE USE TWO in Unicode and 8859-2. In Windows-1250 it is a Right Single Quotation Mark. There is a 1250 to Unicode mapping here: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt

Interesting that there is a standard quote beside the 0x92. I wonder why this is happening. If there is always an ASCII ' character by the 0x92, stripping the 0x92 would be fine, but it probably conveys useful information. I'd rather strip the ASCII ' and use the Unicode equivalent of 0x92, if possible.
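A rough sketch of that last idea, assuming the incoming bytes really are Windows-1250 (which this ticket has not confirmed): each 0x92 is replaced by the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK (bytes E2 80 99), and an ASCII apostrophe directly after it is dropped. The name quote_0x92_to_utf8 is hypothetical, and the function handles only this one byte. Other bytes at or above 0x80 pass through unchanged, so a full conversion would still need a complete mapping table or something like GLib's g_convert.

```c
#include <stddef.h>

/* Hypothetical helper (not part of msimprpl or libpurple).
 * Copies in[0..len) to out, replacing each 0x92 with U+2019 encoded as
 * UTF-8 and skipping an ASCII apostrophe that directly follows it.
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 * Assumption: the input is Windows-1250 text, where 0x92 is a right
 * single quotation mark. */
size_t quote_0x92_to_utf8(const unsigned char *in, size_t len,
                          unsigned char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x92) {
            if (cap - o < 3)
                return (size_t)-1;
            out[o++] = 0xE2;     /* U+2019 in UTF-8: E2 80 99 */
            out[o++] = 0x80;
            out[o++] = 0x99;
            if (i + 1 < len && in[i + 1] == '\'')
                i++;             /* skip the redundant ASCII apostrophe */
        } else {
            if (cap - o < 1)
                return (size_t)-1;
            out[o++] = in[i];
        }
    }
    return o;
}
```

An output buffer of 3 * len bytes is always large enough, since each input byte produces at most three output bytes.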

  Changed 19 months ago by Jaywalker

  • status changed from new to closed
  • resolution set to worksforme

It would seem that this was a bot error... I just tested with the current official client and single quotes don't seem to be causing an issue anymore.
