Opened 8 years ago

Closed 7 years ago

#13413 closed patch (fixed)

IM HTML->XHTML conversion chokes on font families with quotes

Reported by: darkrain42 Owned by: nwalp
Milestone: 3.0.0 Component: libpurple
Version: 2.7.10 Keywords:
Cc: markdoliner


I think this is an issue in purple_markup_html_to_xhtml, but it may be an issue in Pidgin's handling of incoming pastes (this was the revision history for 0.2 quoted from

What the XMPP prpl ends up seeing is:

<html xmlns=''><body xmlns=''><p><span style='color: #000000;'><span style='font-family: Verdana, Arial, Helvetica, Geneva, sans-serif;'><span style='font-size: medium;'>The status of this specification has been changed to Retracted since it has been superseded by </span></span><span style='font-size: medium;'><span style='font-weight: bold;'><span style='font-family: 'Times New Roman';'><a href="">XMPP Core</a></span></span><span style='font-family: 'Times New Roman';'> </span></span></span><span style='font-size: medium;'>[<a href="">1</a>]. (psa)</span></p></body></html>

which contains <span style='font-family: 'Times New Roman';'> .

util.c:1717 probably needs to escape quotes, or otherwise handle this. (if it's not something further up in the chain)

Attachments (3)

13413.patch (1.5 KB) - added by loic 8 years ago.
quotes.patch (12.8 KB) - added by loic 8 years ago.
quotes.2.patch (14.1 KB) - added by loic 8 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 8 years ago by darkrain42

It seems like the most straightforward way to correct this would be to s{'}{"} while processing the inner string.

But HTML parsing hurts my brain.

comment:2 Changed 8 years ago by MarkDoliner

  • Cc markdoliner added

Changed 8 years ago by loic

comment:3 Changed 8 years ago by loic

The proposed patch attachment:13413.patch is minimal and it comes with a test that shows it does the job. I looked into the sample tests files of tidy for inspiration on how to prevent that kind of problem instead of just this specific problem. But the purple_markup_html_to_xhtml does not seem to be developed with genericity in mind. If you have suggestions on how to improve this patch, I will work on it.

comment:4 Changed 8 years ago by darkrain42

  • Milestone set to Patches Needing Review
  • Type changed from defect to patch

loic: I think this looks good, but need to stare at it a bit more, and also mull it over.

Part of my concern is that this is actually (unfortunately) indiciative of a larger issue in that function (the same issue occurs in at least the 'back', and 'color' code as well, and probably a lot more places where we do string interpolation of "input we were passed" into "string where the input is surrounded by single quotes".

Still, I do think this is on the right path.

Changed 8 years ago by loic


comment:5 Changed 8 years ago by loic

attachment:quotes.patch is another patch that tries to address all quoting issues of this kind, for font, alt and href. It also contains extensive tests for the rest of the function: I found them useful when trying to understand specific parts of the code. I ran into a related bug, reported separately in #13485 . I also suspect some unreachable code in a macro but did not report this because it does not harm: it looks like } else if(*c == '\\') { cannot happen at /pidgin-mtn/libpurple/util.c:1363 because it is within an if(!g_ascii_strncasecmp(c, "<" x " ", strlen("<" x " "))

Changed 8 years ago by loic


comment:6 Changed 8 years ago by loic

The patch has been modified to take into account remarks from darkrain review:

  • isolate the tests documenting suspicous behavior so that they are not confused with tests establishing good behavior
  • do not output escaped characters in plain text for alt attribute in img or href attribute in a

comment:7 Changed 7 years ago by nwalp

  • Owner set to nwalp

comment:8 Changed 7 years ago by nwalp

  • Milestone changed from Patches Needing Review to 3.0.0
  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.
All information, including names and email addresses, entered onto this website or sent to mailing lists affiliated with this website will be public. Do not post confidential information, especially passwords!