Opened 10 years ago

Last modified 10 years ago

#8465 new defect

BIDI character appended to received URIs

Reported by: n0nick
Owned by: deryni
Milestone:
Component: pidgin (gtk)
Version: 2.5.2
Keywords:
Cc:

Description

When receiving a URI, I noticed that a string is always appended to its end, often producing a malformed link. The string is "%E2%80%AC", which I found to be the percent-encoded UTF-8 form of the Unicode character 'POP DIRECTIONAL FORMATTING' (U+202C): http://is.gd/k6zA.

For example, a user sent me the URL:

http://example.com/page.html?arg=1

It displays correctly, but when I click on it, the browser loads:

http://example.com/page.html?arg=1%E2%80%AC

I'm using Pidgin 2.5.2 under Ubuntu 8.10.
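The appended string is just U+202C encoded as UTF-8 (the three bytes E2 80 AC) and then percent-escaped, roughly as a browser does with non-ASCII characters in a URL. A minimal sketch, assuming the link target is the displayed URL with an invisible U+202C still attached:

```python
from urllib.parse import quote

# U+202C POP DIRECTIONAL FORMATTING: renders as nothing on screen
PDF = "\u202c"

# Assumption for illustration: the link target keeps the trailing PDF
target = "http://example.com/page.html?arg=1" + PDF

# The UTF-8 encoding of U+202C is three bytes
print(PDF.encode("utf-8").hex(" ").upper())  # E2 80 AC

# Percent-escaping the non-ASCII character gives the URL seen in the browser;
# "safe" keeps the URL's own delimiters unescaped
print(quote(target, safe=":/?=&"))  # http://example.com/page.html?arg=1%E2%80%AC
```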

Change History (13)

comment:1 Changed 10 years ago by deryni

  • Status changed from new to pending

What protocol(s) does this happen on? Does this happen with all buddies or just some? What client(s) are the buddies that this happens with using (if it isn't all your buddies)? What plugins do you have loaded in pidgin? Can you get the Help->Debug Window output from when this happens?

comment:2 Changed 10 years ago by n0nick

  • Status changed from pending to new

Verified on received and sent messages on XMPP, MSN, and Yahoo; I don't have active buddies on other protocols to test with. It happens with all buddies.

However, based on the behavior I've seen, my guess is that Google Talk wraps everything in these BIDI characters (at least in my Hebrew conversations), so the URI arrives with this character appended. I copied the URI to test on other protocols and did reproduce the problem, but then I checked in vim and found that I was pasting the string with <202c> attached.

So I guess one solution would be to ignore such characters in link targets.

Relevant debug rows are only: (19:01:54) msn: S: SB 003: MSG pin%BUDDY_EMAIL%tmail.com racheli 96 (19:01:55) msn: S: SB 003: MSG pin%BUDDY_EMAIL%tmail.com racheli 224

Active plugins: History 2.5.2, Log Reader 2.5.2, Message Timestamp Formats 2.5.2, Nautilus Integration 0.8, Pidgin GTK+ Theme Control 2.5.2, Psychic Mode 2.5.2
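The idea of ignoring such characters in link targets can be sketched as a filter applied to the href just before it is handed to the browser. This is a minimal sketch, not Pidgin code: `clean_link_target` is a hypothetical helper, and it removes every character that Unicode gives the Bidi_Control property, wherever it occurs in the target. For what it's worth, RFC 3987 (section 4.1) says IRIs must not contain bidirectional formatting characters such as LRE, RLE, and PDF.

```python
# Characters with the Unicode Bidi_Control property
BIDI_CONTROLS = frozenset(
    "\u061c"                          # ARABIC LETTER MARK
    "\u200e\u200f"                    # LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def clean_link_target(href: str) -> str:
    """Hypothetical helper: drop every bidi control character from a link target."""
    return "".join(ch for ch in href if ch not in BIDI_CONTROLS)


# A URI wrapped in RLE ... PDF, as in the Hebrew conversations described above
shown = "\u202bhttp://example.com/page.html?arg=1\u202c"
print(clean_link_target(shown))  # http://example.com/page.html?arg=1
```

Only the target is cleaned; the displayed text can keep its formatting characters, since they are what make it render correctly in a right-to-left context.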

comment:3 Changed 10 years ago by n0nick

Sorry for the ugly text; I didn't realize WikiFormatting doesn't preserve newlines.
The debug lines are, of course:
(19:01:54) msn: S: SB 003: MSG pin%BUDDY_EMAIL%tmail.com racheli 96
(19:01:55) msn: S: SB 003: MSG pin%BUDDY_EMAIL%tmail.com racheli 224

comment:4 follow-up: Changed 10 years ago by deryni

  • Status changed from new to pending

You should see in the debug window the literal XML traffic that comes in to your Google Talk account; those messages will be the most helpful here.

comment:5 in reply to: ↑ 4 Changed 10 years ago by n0nick

  • Status changed from pending to new

Replying to deryni:

You should see in the debug window the literal XML traffic that comes in to your Google Talk account; those messages will be the most helpful here.

OK! After a few tries, I've narrowed it down to cases where the sender is using http://mail.google.com/ with a Hebrew interface.

Debug messages:

(19:39:33) util: Writing file prefs.xml to directory /home/sagiem/.purple
(19:39:33) util: Writing file /home/sagiem/.purple/prefs.xml
(19:39:40) msim: dynamic buffer at 0 (max 15360), reading up to 15359
(19:39:40) msim: msim_input_cb: going to null terminate at n=12
(19:39:40) msim: msim_input_cb: read=12
(19:39:40) msim: msim_parse: got <\ka\0>
(19:39:43) msn: C: NS 000: PNG
(19:39:44) jabber: Sending (ssl): <iq type='get' id='purple5fde4984'><ping xmlns='urn:xmpp:ping'/></iq>
(19:39:44) jabber: Recv (ssl)(558): <message to="sagiem@gmail.com/Home89D2E568" type="chat" id="C0829D831AE9DC86_2" iconset="classic" from="sagiem.pop3@gmail.com/gmail.8CFEBFC9"><body>‫http://www.bestweekever.tv/2009/02/18/either-its-lion-prom-night-or-this-mans-done-lots-his-damn-mind/‬</body><met:google-mail-signature xmlns:met="google:metadata">04eab6ad9194cd41</met:google-mail-signature><cha:active xmlns:cha="http://jabber.org/protocol/chatstates"/><nos:x value="disabled" xmlns:nos="google:nosave"/><arc:record otr="false" xmlns:arc="http://jabber.org/protocol/archive"/></message>
(19:39:46) jabber: Recv (ssl)(74): <iq to="sagiem@gmail.com/Home89D2E568" id="purple5fde4984" type="result"/>

The URI I sent myself was:

http://www.bestweekever.tv/2009/02/18/either-its-lion-prom-night-or-this-mans-done-lots-his-damn-mind/

But clicking on it in the message window opened a browser with:

http://www.bestweekever.tv/2009/02/18/either-its-lion-prom-night-or-this-mans-done-lots-his-damn-mind/%E2%80%AC
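The <body> in the Jabber log above appears to start with an invisible U+202B (RIGHT-TO-LEFT EMBEDDING) and end with U+202C, which is easy to miss when reading or copy-pasting the log. A small sketch for making such characters visible: it lists every character in Unicode's "Cf" (format) category with its position and name. The `body` string is a shortened stand-in for the logged message, assumed to have the same RLE ... PDF wrapping.

```python
import unicodedata


def invisible_controls(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for every format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Assumption: shortened body with the wrapping seen from Gmail's Hebrew interface
body = "\u202bhttp://www.bestweekever.tv/2009/02/18/\u202c"
for entry in invisible_controls(body):
    print(entry)
# (0, 'U+202B', 'RIGHT-TO-LEFT EMBEDDING')
# (39, 'U+202C', 'POP DIRECTIONAL FORMATTING')
```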

comment:6 Changed 10 years ago by n0nick

Any word? This is a real annoyance.

Could someone point me to the piece of code I should inspect on my own?

comment:7 Changed 10 years ago by deryni

Sorry for the delay. Do those characters show up in your logs (I'm going to guess that they do)? I think this has to be a Google Talk bug since I doubt they have any reason to actually be sending that character at the end of a URL. (And I can't imagine pidgin is adding them, though I suppose that is possible.)

Does normal text send that character (I'm not sure how you'd tell other than by checking your log, assuming they show up there)?

I don't think we can just ignore those characters in URLs because they are perfectly legal there. I guess a plugin could be written to look for them at the end of URLs and strip them when they come from a Google Talk user (or, more simply, from any XMPP user), though.
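The plugin idea can be sketched as follows: find URLs in an incoming message and, only when the message came in over XMPP, drop bidi control characters that trail the URL when computing the link target, leaving characters inside the URL alone. This is a hypothetical sketch, not the libpurple plugin API: the `"xmpp"` protocol label, the `link_targets` function, and the deliberately simple URL regex are all illustrative assumptions, and hooking it into Pidgin's message handling is not shown.

```python
import re

# Characters with the Unicode Bidi_Control property
BIDI_CONTROLS = "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# Simplified URL matcher for illustration; real linkification is more careful
URL_RE = re.compile(r"https?://\S+")


def link_targets(message: str, protocol: str) -> list[str]:
    """Hypothetical filter: return the target to open for each URL in a message.

    For XMPP messages only, bidi controls at the end of a URL are dropped from
    the target; the displayed message text itself is not modified.
    """
    targets = []
    for m in URL_RE.finditer(message):
        url = m.group(0)
        if protocol == "xmpp":
            url = url.rstrip(BIDI_CONTROLS)
        targets.append(url)
    return targets


body = "\u202bhttp://www.bestweekever.tv/2009/02/18/\u202c"
print(link_targets(body, "xmpp"))  # ['http://www.bestweekever.tv/2009/02/18/']
print(link_targets(body, "msn"))   # trailing U+202C is kept for other protocols
```

Stripping only at the end keeps any bidi character that legitimately appears in the middle of a URL.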

comment:8 Changed 10 years ago by deryni

  • Owner set to deryni

comment:9 Changed 10 years ago by bernmeister

Is this still an issue in 2.5.8?

comment:10 Changed 10 years ago by darkrain42

  • Status changed from new to pending

Could you please save a copy of the log and attach it? I wonder if the character is being stripped from the debug log when you cut-n-paste it. You can edit the log if you'd like (strip it down to just the relevant lines).

comment:11 Changed 10 years ago by trac-robot

  • Status changed from pending to closed

This ticket was closed automatically by the system. It was previously set to a Pending status and hasn't been updated within 14 days.

comment:12 Changed 10 years ago by n0nick

Sorry for not answering; I've been on vacation. I can't seem to reopen this bug, though.

When you say log, do you mean the user chat history? When I click the link again in the chat log, it still has the same characters. I can't find where the log files are stored to check their source.

Excerpt from the chat log (שלום means "hello" and היי means "hi"):

(02:07:32 PM) sagiemao@mail.tau.ac.il: ‫שלום‬
(02:08:08 PM) sagiem@gmail.com/Home: היי
(02:08:11 PM) sagiemao@mail.tau.ac.il: ‫http://www.bestweekever.tv/2009-2-18/either-its-lion-prom-night-or-this-mans-done-lots-his-damn-mind/‬

The characters are definitely there after copy-paste: when I press Backspace once at the end of the URL, no visible character is removed and the string's direction seems to change.

comment:13 Changed 10 years ago by darkrain42

  • Status changed from closed to new