Ticket #5136 (closed defect: worksforme)

Opened 2 years ago

Last modified 2 years ago

msim doesn't convert to UTF-8

Reported by: Jaywalker
Owned by: jeff
Milestone:
Component: MySpace
Version: 2.4.0
Keywords:
Cc:

Description

I noticed Tom had a new status message, so in hopes of him saying something other than merely "woo!", I messaged him. Nothing was displayed in the chat, but here's the output of the debug log. I'll see if I can't figure out what's wrong...

(03:41:09) msim: msim_markup_to_html: couldn't parse <root><p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p></root> as XML, returning raw: <p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p>

Change History

  Changed 2 years ago by jeff

Looks like fairly normal markup to me. Any hints on what is wrong earlier in the debug log?

  Changed 2 years ago by Jaywalker

Not from what I can tell.. But I did notice that with the "Last Seen" plugin enabled it fried my blist.xml. The message was saved in that file with everything except the <root> bit and had some weird characters where the escape for ' is supposed to be. I don't think that's related to the problem but I may be wrong. Here's the full debug info from the time the message is being received from Tom in case I missed something...

(06:09:05) msim: dynamic buffer at 0 (max 30720), reading up to 15359
(06:09:05) msim: msim_input_cb: going to null terminate at n=283
(06:09:05) msim: msim_input_cb: read=283
(06:09:05) msim: msim_parse: got <\bm\1\f\6221\cv\697\msg\<p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</1f></1p>>
(06:09:05) msim: msim_preprocess_incoming: tagging with _username=tom
(06:09:05) msim: msim_markup_to_html: couldn't parse <root><p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p></root> as XML, returning raw: <p><f n="Arial" h="12">I’'m away from my desk, out of the office, and out of the country!  I’'m going to Kuwait for Operation MySpace. Check it out at <a h='http://www.myspace.com/operationmyspace' />, and make sure you tune in on March 10th.</f></p>

  Changed 2 years ago by jeff

The <root> is just added by msimprpl so that xmlnode_from_str can parse it. For some reason xmlnode_from_str is failing. Maybe it has to do with the mix of double- and single-quotes? The place to investigate would be libpurple/xmlnode.c:xmlnode_from_str.
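The mixed-quote theory can be checked in isolation. The following is a minimal sketch in Python rather than C, using the standard library's expat-based parser as a stand-in for xmlnode_from_str (an assumption; libpurple's own parser may differ in details). The fragment is a shortened, ASCII-only version of the logged message, and `wrap_and_parse` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def wrap_and_parse(fragment):
    """Wrap a markup fragment in a single <root> element and parse it."""
    return ET.fromstring("<root>" + fragment + "</root>")

# Shortened, ASCII-only version of the logged message: double-quoted
# attributes on <f>, single-quoted attribute on <a>.
fragment = (
    '<p><f n="Arial" h="12">I\'m away from my desk. Check it out at '
    "<a h='http://www.myspace.com/operationmyspace' />.</f></p>"
)

root = wrap_and_parse(fragment)
link = root.find(".//a")
print(root[0].tag, link.get("h"))
```

If this parses, mixing quote styles across attributes is ordinary well-formed XML, which suggests looking elsewhere in the message for the cause.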

  Changed 2 years ago by Jaywalker

Yeah.. I'm sure it's in libpurple's XML parsing, since both the blist xml and msim xml parsing are broken. I'll see what I can hack up tonight ;)

  Changed 2 years ago by rekkanoryo

  • owner jeff deleted
  • component changed from MySpace to libpurple
  • summary changed from msim couldn't parse message in msim_markup_to_html to xmlnode parsing broken?

Making these changes at the request of Jaywalker

  Changed 2 years ago by nosnilmot

Maybe it has something to do with the unescaped/unencoded 0x92 bytes in it?
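A quick way to test this suspicion outside libpurple is to feed a parser the same markup with and without the stray byte. This is a rough sketch in Python using the standard library's expat-based parser, not xmlnode_from_str itself; the shortened message and the helper name `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they might arrive: 0x92 followed by an ASCII apostrophe.
raw = b"<root><p>I\x92'm away from my desk</p></root>"

def parses(data):
    """Return True if data is well-formed XML for the expat-based parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(parses(raw))                        # with the lone 0x92 byte
print(parses(raw.replace(b"\x92", b"")))  # same markup without the byte
```

With no encoding declaration the parser assumes UTF-8, and a lone 0x92 is a continuation byte with no lead byte, so it is not a valid UTF-8 sequence.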

  Changed 2 years ago by rekkanoryo

Replying to nosnilmot:

Maybe it has something to do with the unescaped/unencoded 0x92 bytes in it?

Looks like it to me.

Ideally, 0x92 should be stripped from the message in the prpl. It's commonly used as a "single smart quote" or apostrophe on Windows, and since a standard apostrophe or quote appears to already be present beside 0x92 it's not needed. Alternatively, a simple escape/encode/whatever should be sufficient (g_markup_escape_text can't be trusted to do this, particularly in older glib versions).

  Changed 2 years ago by nosnilmot

  • owner set to jeff
  • component changed from libpurple to MySpace
  • summary changed from xmlnode parsing broken? to msim doesn't convert to UTF-8

oh, if it's a "valid" character in some other encoding then the problem is the msim prpl is not converting everything to UTF-8 before sending it off to libpurple.
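As a rough illustration of "convert everything to UTF-8 first", here is a minimal sketch in Python. It assumes the incoming bytes are either already valid UTF-8 or in a Windows code page; the default of cp1252 is a guess, since the ticket does not establish which encoding MySpaceIM uses, and `to_utf8` is a hypothetical name. In the prpl itself, GLib's g_utf8_validate and g_convert would presumably play the same roles.

```python
def to_utf8(raw, fallback="cp1252"):
    """Return raw as UTF-8 bytes, transcoding from fallback if needed."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: pass through untouched
        return raw
    except UnicodeDecodeError:
        # errors="replace" turns bytes the fallback leaves undefined into
        # U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace").encode("utf-8")

print(to_utf8(b"I\x92'm away"))
```

Validating before transcoding matters: text that is already UTF-8 would otherwise be decoded a second time and turned into mojibake.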

  Changed 2 years ago by jeff

We need to find out which character encoding MySpaceIM uses. In Unicode, and in ISO 8859-2 as commonly mapped, 0x92 is the C1 control PRIVATE USE TWO (U+0092). In Windows-1250 it is a Right Single Quotation Mark (U+2019). There is a Windows-1250 to Unicode mapping here: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt

Interesting that there is a standard quote beside the 0x92. I wonder why this is happening. If there is always an ASCII ' character next to the 0x92, stripping the 0x92 would be fine, but it probably conveys useful information. I'd rather strip the ASCII ' and use the Unicode equivalent of 0x92, if possible.
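The two points above can be sketched in Python: first, what byte 0x92 becomes under a few single-byte encodings, and second, the "keep the Unicode quote, drop the ASCII one" idea. Windows-1250 as the source encoding is still only a hypothesis here, and `normalize_apostrophes` is a hypothetical helper, not anything in msimprpl.

```python
import unicodedata

BYTE = b"\x92"

# How the same byte decodes under three single-byte encodings.
for codec in ("latin-1", "cp1250", "cp1252"):
    ch = BYTE.decode(codec)
    # C1 control characters have no formal Unicode name, hence the default.
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>"))

def normalize_apostrophes(raw, encoding="cp1250"):
    """Decode raw and drop an ASCII apostrophe that directly follows U+2019."""
    text = raw.decode(encoding, errors="replace")
    return text.replace("\u2019'", "\u2019")

print(normalize_apostrophes(b"I\x92'm away from my desk"))
```

Only an apostrophe immediately after the converted quote is dropped, so ordinary ASCII apostrophes elsewhere in a message are left alone.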

  Changed 2 years ago by Jaywalker

  • status changed from new to closed
  • resolution set to worksforme

It would seem that this was a bot error... I just tested with the current official client and single quotes don't seem to be causing an issue anymore.
