Opened 9 years ago

Closed 8 years ago

Last modified 6 years ago

#12387 closed defect (fixed)

Pidgin crashes if MSN direct connections are enabled.

Reported by: superyo
Owned by:
Milestone: 2.7.10
Component: libpurple
Version: 2.7.4
Keywords:
Cc: MarkDoliner, fqueze, pva, salinasv, Max, luizgpa

Description

Pidgin crashes constantly if MSN direct connections are enabled and UPnP is enabled on the router. If either of these options is disabled it works flawlessly, so I think it has something to do with opening the ports for the DC. UPnP is working for other applications (uTorrent, Skype), so I guess the router (TP-Link TL-WR841N) is working OK.

Maybe it's related to ticket #12072

Tested on Pidgin 2.7.2 on Arch Linux

Attachments (5)

pidgin-backtrace.log (2.7 KB) - added by superyo 9 years ago.
Backtrace of crash
pidgin-2.7.2-msn-dc-upnp-crash.patch (411 bytes) - added by liuyubao 9 years ago.
pidgin-backtrace.txt (1.2 KB) - added by liuyubao 9 years ago.
pidgin-backtrace.2.log (4.9 KB) - added by dodys 8 years ago.
pidgin 2.7.7-1
adium_upnp.wireshark (198.2 KB) - added by Robby 8 years ago.
Wireshark log posted to #a14650


Change History (45)

Changed 9 years ago by superyo

Backtrace of crash

comment:1 Changed 9 years ago by QuLogic

I don't suppose you could run that in valgrind? Though if it's not easy to reproduce, the slowdown might be totally unbearable.

comment:2 Changed 9 years ago by liuyubao

Debian squeeze, pidgin 2.7.2-1 crashes very frequently.

Here is a patch against pidgin-2.7.2-1; it seems to have fixed the problem (I'm not sure, because there are many cancel/destroy calls that don't set the pointers to NULL in libpurple/protocols/msn/directconn.c).

I checked official pidgin-2.7.3; it should have this bug too.

I guess the cause is that purple_network_listen_cancel() is called twice, once by msn_dc_destroy() and once by msn_dc_incoming_connection_timeout_cb().

Changed 9 years ago by liuyubao

Changed 9 years ago by liuyubao

comment:3 Changed 9 years ago by QuLogic

  • Cc MarkDoliner added; liuyubao removed

That doesn't really fix anything. Of course, once msn_dc_destroy has been called, then all of dc is invalid, so setting one element of it to NULL is just going to crash somewhere else. The real question is why msn_dc_incoming_connection_timeout_cb is called after msn_dc_destroy since it should have been cancelled.
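To illustrate the point about cancelling the timeout, here is a minimal sketch in plain C. Every name in it (timeout_add, timeout_remove, struct direct_conn, dc_destroy, and so on) is hypothetical and only stands in for the real event loop and MSN code; in libpurple the cancel step would presumably be a purple_timeout_remove() call on the stored handle. The idea is that destroy removes the pending callback before freeing the object, so the callback can never run against freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical timer table standing in for an event loop. */
typedef void (*timeout_fn)(void *data);

struct timeout_slot {
    timeout_fn fn;
    void *data;
    bool active;
};

#define MAX_TIMEOUTS 8
static struct timeout_slot timeouts[MAX_TIMEOUTS];

/* Register a callback; returns a handle (index + 1), or 0 if the table is full. */
unsigned timeout_add(timeout_fn fn, void *data)
{
    for (unsigned i = 0; i < MAX_TIMEOUTS; i++) {
        if (!timeouts[i].active) {
            timeouts[i] = (struct timeout_slot){ fn, data, true };
            return i + 1;
        }
    }
    return 0;
}

/* Cancel a pending callback; a handle of 0 is ignored. */
void timeout_remove(unsigned handle)
{
    if (handle != 0 && handle <= MAX_TIMEOUTS)
        timeouts[handle - 1].active = false;
}

/* Fire every pending timeout once; returns how many callbacks ran. */
int timeouts_fire_all(void)
{
    int fired = 0;
    for (unsigned i = 0; i < MAX_TIMEOUTS; i++) {
        if (timeouts[i].active) {
            timeouts[i].active = false;
            timeouts[i].fn(timeouts[i].data);
            fired++;
        }
    }
    return fired;
}

/* Hypothetical direct-connection object. */
struct direct_conn {
    unsigned connect_timeout_handle;
    int listen_fd;
};

void dc_timeout_cb(void *data)
{
    struct direct_conn *dc = data;
    dc->connect_timeout_handle = 0;
    dc->listen_fd = -1;  /* would write to freed memory if dc were already gone */
}

struct direct_conn *dc_new(void)
{
    struct direct_conn *dc = calloc(1, sizeof *dc);
    if (dc != NULL)
        dc->connect_timeout_handle = timeout_add(dc_timeout_cb, dc);
    return dc;
}

void dc_destroy(struct direct_conn *dc)
{
    assert(dc != NULL);
    /* Cancel the pending callback first, so it can never see the freed dc. */
    timeout_remove(dc->connect_timeout_handle);
    free(dc);
}
```

With this ordering, firing the timers after dc_destroy() runs no callback at all, which is the behaviour the comment above expects from the real code.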

Mark, could this have been fixed by the upnp fix you made just before release?

comment:4 Changed 9 years ago by MarkDoliner

QuLogic: Possibly? But ticket #12483 has a similar backtrace and that guy is using 2.7.3, so maybe not.

comment:6 Changed 9 years ago by QuLogic

Ticket #12483 has been marked as a duplicate of this ticket.

comment:7 Changed 9 years ago by QuLogic

  • Component changed from MSN to libpurple
  • Owner QuLogic deleted

Finally managed to get something out of valgrind:

==14468== Invalid read of size 4
==14468==    at 0x3CAEA6E850: g_int_hash (gutils.c:3247)
==14468==    by 0x3CAEA2E2B5: g_hash_table_remove_internal (ghash.c:309)
==14468==    by 0x4CA7B99: purple_network_remove_port_mapping (network.c:1080)
==14468==    by 0xE8FD62E: msn_dc_destroy (directconn.c:212)
==14468==    by 0xE91635C: msn_slplink_remove_slpcall (slplink.c:224)
==14468==    by 0xE914F4E: msn_slpcall_destroy (slpcall.c:115)
==14468==    by 0xE912EA4: msn_p2p_msg (slp.c:1114)
==14468==    by 0xE8F7D22: msn_cmdproc_process_msg (cmdproc.c:312)
==14468==    by 0xE91904E: msg_cmd_post (switchboard.c:811)
==14468==    by 0xE91150E: msn_servconn_process_data (servconn.c:487)
==14468==    by 0xE911680: read_cb (servconn.c:443)
==14468==    by 0x4696ED: pidgin_io_invoke (gtkeventloop.c:73)
==14468==  Address 0x131e21a0 is 0 bytes inside a block of size 4 free'd
==14468==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==14468==    by 0x3CAEA2D2E4: g_hash_table_remove_node (ghash.c:449)
==14468==    by 0x3CAEA2E34F: g_hash_table_remove_internal (ghash.c:1095)
==14468==    by 0x4CA7B8A: purple_network_remove_port_mapping (network.c:1055)
==14468==    by 0xE8FD62E: msn_dc_destroy (directconn.c:212)
==14468==    by 0xE91635C: msn_slplink_remove_slpcall (slplink.c:224)
==14468==    by 0xE914F4E: msn_slpcall_destroy (slpcall.c:115)
==14468==    by 0xE912EA4: msn_p2p_msg (slp.c:1114)
==14468==    by 0xE8F7D22: msn_cmdproc_process_msg (cmdproc.c:312)
==14468==    by 0xE91904E: msg_cmd_post (switchboard.c:811)
==14468==    by 0xE91150E: msn_servconn_process_data (servconn.c:487)
==14468==    by 0xE911680: read_cb (servconn.c:443)
==14468== 

I don't believe this is MSN specific. It should have existed since 2.6.0 but I think the only other user is XMPP SI file transfers. I guess nobody uses them. :P

comment:8 Changed 9 years ago by qulogic@…

  • Milestone set to 2.7.4
  • Resolution set to fixed
  • Status changed from new to closed

(In c7f2cce48dfe465889b7803de44156e7e89e41e0):
protocol is the *value*, not the *key*. Thus, we really shouldn't be attempting to remove it from the hash table. Especially because we just removed the corresponding key, thus invalidating this pointer.

Fixes #12387.
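The key/value mix-up described in this commit message can be sketched in plain C as follows. This is not the network.c code: the owning map and all of its function names are hypothetical, and a linked list stands in for the GLib hash table. It assumes, as the valgrind output suggests, that removing an entry frees both the key and the value, so the looked-up value pointer dangles afterwards and must be neither dereferenced nor passed back in as a key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical owning map from port number to protocol id. Both the key
 * and the value are heap-allocated and freed when the entry is removed. */
struct mapping {
    int *port;      /* key */
    int *protocol;  /* value */
    struct mapping *next;
};

static struct mapping *mappings;

/* Returns 1 on success, 0 if allocation failed. */
int map_insert(int port, int protocol)
{
    struct mapping *m = malloc(sizeof *m);
    int *k = malloc(sizeof *k);
    int *v = malloc(sizeof *v);
    if (m == NULL || k == NULL || v == NULL) {
        free(m);
        free(k);
        free(v);
        return 0;
    }
    *k = port;
    *v = protocol;
    m->port = k;
    m->protocol = v;
    m->next = mappings;
    mappings = m;
    return 1;
}

/* Returns a pointer owned by the map; it is invalid once the entry is removed. */
int *map_lookup(int port)
{
    for (struct mapping *m = mappings; m != NULL; m = m->next)
        if (*m->port == port)
            return m->protocol;
    return NULL;
}

/* Removes the entry for 'port' and frees its key and value. */
int map_remove(int port)
{
    for (struct mapping **pp = &mappings; *pp != NULL; pp = &(*pp)->next) {
        if (*(*pp)->port == port) {
            struct mapping *dead = *pp;
            *pp = dead->next;
            free(dead->port);
            free(dead->protocol);
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Remove a port mapping and report which protocol it used, or -1 if unknown. */
int remove_port_mapping(int port)
{
    int *protocol = map_lookup(port);
    if (protocol == NULL)
        return -1;
    int saved = *protocol;           /* copy before the map frees the value */
    int removed = map_remove(port);  /* remove by key, exactly once */
    assert(removed == 1);
    (void)removed;
    /* 'protocol' now dangles: it must not be read or used as a key. */
    return saved;
}
```

The design choice is simply to copy whatever is needed out of the value before the removal, and to treat the removal by key as the only removal.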

comment:9 Changed 9 years ago by datallah

Ticket #12606 has been marked as a duplicate of this ticket.

comment:10 Changed 9 years ago by datallah

Ticket #12613 has been marked as a duplicate of this ticket.

comment:11 Changed 9 years ago by qulogic@…

(In bdb5fe6e0e06c8f8275aff942d31b2d4359487bb):
I should probably add a ChangeLog entry for this one since it's a bit annoying.

Refs #12387.

comment:12 Changed 8 years ago by datallah

Ticket #12671 has been marked as a duplicate of this ticket.

comment:13 Changed 8 years ago by QuLogic

Ticket #12743 has been marked as a duplicate of this ticket.

comment:14 follow-up: Changed 8 years ago by dodys

Pidgin is still crashing on 2.7.4 for me; see ticket #12743, which has been marked as a duplicate of this one.

comment:15 in reply to: ↑ 14 Changed 8 years ago by datallah

Replying to dodys:

Pidgin is still crashing on 2.7.4 for me; see ticket #12743, which has been marked as a duplicate of this one.

dodys, please file a new ticket with a crash report from 2.7.4

comment:16 Changed 8 years ago by datallah

  • Milestone 2.7.4 deleted
  • Resolution fixed deleted
  • Status changed from closed to new
  • Version changed from 2.7.2 to 2.7.4

This is still happening in 2.7.4 (see #12776).

comment:17 follow-up: Changed 8 years ago by datallah

Ticket #12776 has been marked as a duplicate of this ticket.

comment:18 in reply to: ↑ 17 Changed 8 years ago by dodys

Replying to datallah:

Ticket #12776 has been marked as a duplicate of this ticket.

Yeah I have this same issue, only works if invisible.

datallah do you want me to attach the file here or create another ticket?

comment:19 Changed 8 years ago by datallah

dodys, you can attach it here. Please also provide a debug log.

comment:20 Changed 8 years ago by pva

Reported in Gentoo. Here is pidgin log with backtrace at the end: https://bugs.gentoo.org/attachment.cgi?id=252539

comment:21 Changed 8 years ago by darkrain42

Ticket #12832 has been marked as a duplicate of this ticket.

comment:22 Changed 8 years ago by Robby

Reported for Adium 1.4.1 (libpurple 2.7.5): http://pastebin.com/SC2AkjqU.

comment:23 Changed 8 years ago by darkrain42

  • Cc salinasv added

comment:24 Changed 8 years ago by Robby

This one is really biting us after the release of 1.4.1.

xnyhps said this in #adium-devl: "Those crashes are weird, seems to happen when cleaning up SSL connections of a UPnP request... is UPnP even usable over SSL?"

comment:25 Changed 8 years ago by Robby

#a14584, we'll try to retrieve some debug logging.

comment:26 Changed 8 years ago by darkrain42

The Adium ticket got updated with a relevant debug log (the last crash/debug log of the three). Looking at the code, I have at least an idea about what's going on here.

There are various calls to cb in the upnp code that really should be asynchronous (at least one has a comment as such). One possibility (though I didn't see a code path that looked like it would cause this):

  • Port mapping fails, which calls purple_network_set_upnp_port_mapping_cb
  • purple_network_set_upnp_port_mapping_cb calls purple_upnp_remove_port_mapping (in the if (!success) branch) and assigns the result to listen_data->mapping_data.
  • purple_upnp_remove_port_mapping *synchronously* calls the specified callback, which is run before listen_data->mapping_data is assigned (so listen_data->mapping_data becomes a stale pointer).

Unfortunately, my brain hurts now.
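The ordering problem in the list above can be reproduced in a small, self-contained C sketch. None of this is the libpurple code: struct request, remove_mapping_sync, mapping_done and the static pool are all hypothetical, and the pool's alive flag stands in for malloc/free so that the stale handle can be inspected without touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*done_cb)(bool success, void *data);

/* Hypothetical request object; 'alive' models whether it has been freed. */
struct request {
    bool alive;
    done_cb cb;
    void *data;
};

static struct request pool[4];

struct listen_data {
    struct request *mapping_data;
};

struct request *request_new(done_cb cb, void *data)
{
    for (size_t i = 0; i < 4; i++) {
        if (!pool[i].alive) {
            pool[i] = (struct request){ true, cb, data };
            return &pool[i];
        }
    }
    return NULL;
}

/* Problem shape: completes synchronously, "frees" the request, then still
 * returns it to the caller. */
struct request *remove_mapping_sync(done_cb cb, void *data)
{
    assert(cb != NULL);
    struct request *req = request_new(cb, data);
    if (req == NULL)
        return NULL;
    req->cb(false, req->data);  /* runs before the caller stores the result */
    req->alive = false;         /* stands in for freeing the request */
    return req;                 /* the caller receives a dead handle */
}

void mapping_done(bool success, void *data)
{
    struct listen_data *ld = data;
    (void)success;
    ld->mapping_data = NULL;    /* too early: the assignment has not happened yet */
}

/* Returns true if the caller ends up holding a handle to a dead request. */
bool holds_stale_handle(void)
{
    struct listen_data ld = { NULL };
    ld.mapping_data = remove_mapping_sync(mapping_done, &ld);
    return ld.mapping_data != NULL && !ld.mapping_data->alive;
}
```

In this sketch holds_stale_handle() returns true: the callback clears the field first, and the assignment then overwrites it with a pointer to a request that is already gone.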

comment:27 Changed 8 years ago by Robby

Okay, I've set up #a14650 since the previous ticket was a bit of a mess.

comment:28 Changed 8 years ago by QuLogic

Ticket #12996 has been marked as a duplicate of this ticket.

Changed 8 years ago by dodys

pidgin 2.7.7-1

comment:29 Changed 8 years ago by dodys

I've attached a new log because I still have problems in pidgin 2.7.7-1. Just as a reminder, MSN crashes all the time; it now takes a while to crash, but it still crashes. I'm running it on Arch Linux. If you have any questions, just ask =]

comment:30 Changed 8 years ago by MarkDoliner

Anyone know how to reproduce this bug? Does it happen at sign on? Or after sending/receiving a file? Or after closing an IM window? Or is it totally random? Is it 100% reproducible, or does it happen only sometimes?

I think it would be helpful to see the values of some variables from the backtrace. If anyone sees this crash again and is able to load the core file in gdb...

  1. Type "frame N" where N is the number of the frame for the purple_network_listen_cancel function. In dodys's most recent backtrace this is frame 1.
  2. print *listen_data
  3. print *(listen_data->mapping_data)

Thanks!

comment:31 Changed 8 years ago by Robby

The people that have reported this crash on the Adium forums said this:

"Well, since I updated Adium to the newest 1.4.1 version it'll automatically close after 10 seconds. It does connect all my accounts but 10 or 15 seconds after that it crashes"

"It has been crashing for me too since I updated to 10.6.5 and adium 1.4.1 usually within a few mins."

Last edited 6 years ago by Robby

comment:32 Changed 8 years ago by Ext3h

@MarkDoliner It seems completely random, even when you are idle. But I noticed in the logs that the crash only occurs if UPnP has failed before. (UPnP also seems to fail at random.)

The last lines in the debug log before every crash are:

(01:14:28) msn: got_ok: listening socket created
(01:14:28) msn: msn_slplink_process_msg: slpmsg complete
(01:14:28) msn: msn_slplink_process_msg: send ACK

And right after that multiple lines like:

msn: switchboard send msg..
msn: C: SB 005: MSG 6 D 142
msn: S: SB 005: ACK 6

If UPnP was successful, Pidgin will run fine, but when UPnP failed, Pidgin will crash within a second after creating the socket. If UPnP failed but no new socket was created, Pidgin runs just fine.

Last edited 6 years ago by Robby

comment:33 Changed 8 years ago by Robby

QuLogic said a wireshark of the nat/upnp device reply would be useful. Please add the output to this ticket if you manage to obtain it.

comment:34 Changed 8 years ago by Max

Something similar on Kubuntu 10.10. Crash upon (cancelled?) file transfer using MSN messenger.

Starting program: /usr/bin/pidgin 
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffe42e1700 (LWP 6748)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4ba8b84 in purple_util_fetch_url_cancel () from /usr/lib/libpurple.so.0
(gdb) bt
#0  0x00007ffff4ba8b84 in purple_util_fetch_url_cancel () from /usr/lib/libpurple.so.0
#1  0x00007ffff4ba5412 in purple_upnp_cancel_port_mapping () from /usr/lib/libpurple.so.0
#2  0x00007ffff4b817b2 in purple_network_listen_cancel () from /usr/lib/libpurple.so.0
#3  0x00007fffe876ef57 in ?? () from /usr/lib/purple-2/libmsn.so
#4  0x00007ffff4e6cb1b in ?? () from /lib/libglib-2.0.so.0
#5  0x00007ffff4e6c342 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
#6  0x00007ffff4e702a8 in ?? () from /lib/libglib-2.0.so.0
#7  0x00007ffff4e707b5 in g_main_loop_run () from /lib/libglib-2.0.so.0
#8  0x00007ffff62493e7 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#9  0x0000000000482204 in main ()
(gdb) bt full
#0  0x00007ffff4ba8b84 in purple_util_fetch_url_cancel () from /usr/lib/libpurple.so.0
No symbol table info available.
#1  0x00007ffff4ba5412 in purple_upnp_cancel_port_mapping () from /usr/lib/libpurple.so.0
No symbol table info available.
#2  0x00007ffff4b817b2 in purple_network_listen_cancel () from /usr/lib/libpurple.so.0
No symbol table info available.
#3  0x00007fffe876ef57 in ?? () from /usr/lib/purple-2/libmsn.so
No symbol table info available.
#4  0x00007ffff4e6cb1b in ?? () from /lib/libglib-2.0.so.0
No symbol table info available.
#5  0x00007ffff4e6c342 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
No symbol table info available.
#6  0x00007ffff4e702a8 in ?? () from /lib/libglib-2.0.so.0
No symbol table info available.
#7  0x00007ffff4e707b5 in g_main_loop_run () from /lib/libglib-2.0.so.0
No symbol table info available.
#8  0x00007ffff62493e7 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
No symbol table info available.
#9  0x0000000000482204 in main ()
No symbol table info available.



max@lynx:~$ apt-cache show pidgin
Package: pidgin
Priority: optional
Section: net
Installed-Size: 1788
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Ari Pollak <ari@debian.org>
Architecture: amd64
Version: 1:2.7.3-1ubuntu3

Changed 8 years ago by Robby

Wireshark log posted to #a14650

comment:35 Changed 8 years ago by Robby

I've attached a Wireshark log posted to #a14650.

comment:36 Changed 8 years ago by Robby

The log was accompanied by this information:

"Attached a uPnP trace above from what appears to be the same crash. If there's something else that should be filtered in, please let me know and I'll try to oblige - although this bug doesn't much like to occur on demand.

With regard to the environment in which this was captured:

  • The uPnP device is a wireless router of random Chinese brand (TP-Link TL-WR740N). It cost pennies and has about the level of performance and features you'd expect given that.
  • The said router is furthermore actually located in China, behind a (or more than one?) network-scale NAT.
  • A port mapping doesn't actually seem to occur - the trace appears to show that uPnP is enumerated, then a NAT-PMP mapping is attempted, but the above router doesn't know anything about NAT-PMP. Its admin screen doesn't show anything mapped for Adium."

comment:37 Changed 8 years ago by darkrain42@…

(In 8febed9408d870efdef757d67f9a3631e1d6d494):
upnp: Asynch-ronize the callbacks from UPnP to calling code. Refs #12387

I have no idea if this will resolve the crashes, but with the help of the packet capture, I /think/ these are correct.

Short summary: it's possible for the callback to fire (and ar be freed) before the top-level function (purple_upnp_cancel_port_mapping) returns, even though cancel_port_mapping returns the now-invalid ar (which may lead to a subsequent use-after-free).

At least one call path through the code that I think leads to this (backed up by one of the debug logs I looked at):

purple_upnp_cancel_port_mapping(...)
	do_port_mapping_cb (has_control_mapping == TRUE, ar->add == FALSE)
		purple_upnp_generate_action_message_and_send(..., done_port_mapping_cb, ar)
			/* We fail to parse the URL (see some debug logs) */
			done_port_mapping_cb
				ar->cb(FALSE, cbdata)
				return;
			return;
		return;
	return ar;

...and something which calls:

do_port_mapping_cb(has_control_mapping == TRUE, ar->add == TRUE)
	ar->cb(FALSE, cbdata)
	g_free(ar)
	return;
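As a rough illustration of what making the callbacks asynchronous buys, here is a plain C sketch. It is not the code from this commit: the deferred queue, finish_later, run_deferred and cancel_mapping are hypothetical names, and the queue merely stands in for scheduling the callback on a later event loop iteration. The point is that the top-level function returns a handle that is still valid, and the callback (plus the free) only happens once control is back in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*done_cb)(bool success, void *data);

struct request {
    done_cb cb;
    void *data;
    bool success;
    struct request *next;
};

/* Hypothetical stand-in for the event loop's queue of deferred work. */
static struct request *deferred_head;
static struct request **deferred_tail = &deferred_head;

/* Queue the result instead of invoking the callback on the caller's stack. */
void finish_later(struct request *req, bool success)
{
    assert(req != NULL);
    req->success = success;
    req->next = NULL;
    *deferred_tail = req;
    deferred_tail = &req->next;
}

/* One loop iteration: fire and free every queued request; returns the count. */
int run_deferred(void)
{
    int fired = 0;
    struct request *req = deferred_head;
    deferred_head = NULL;
    deferred_tail = &deferred_head;
    while (req != NULL) {
        struct request *next = req->next;
        req->cb(req->success, req->data);
        free(req);
        fired++;
        req = next;
    }
    return fired;
}

/* Fails immediately, but the returned handle stays valid until the loop runs. */
struct request *cancel_mapping(done_cb cb, void *data)
{
    struct request *req = calloc(1, sizeof *req);
    if (req == NULL)
        return NULL;
    req->cb = cb;
    req->data = data;
    finish_later(req, false);
    return req;
}

struct listen_data {
    struct request *mapping_data;
    int callbacks_seen;
};

void mapping_done(bool success, void *data)
{
    struct listen_data *ld = data;
    (void)success;
    ld->mapping_data = NULL;  /* the handle is about to be freed: drop it */
    ld->callbacks_seen++;
}
```

Here the caller can safely store the result of cancel_mapping() in mapping_data; the callback clears that field later, just before the request is freed, so no stale pointer is left behind.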

comment:38 Changed 8 years ago by rekkanoryo

Ticket #13138 has been marked as a duplicate of this ticket.

comment:39 Changed 8 years ago by rekkanoryo

Ticket #13186 has been marked as a duplicate of this ticket.

comment:40 Changed 8 years ago by darkrain42

  • Milestone set to 2.7.10
  • Resolution set to fixed
  • Status changed from new to closed

I'm going to mark this as fixed by my previous commit. I'm informed that MSN FT may otherwise be broken, but I think this specific crash is fixed.

We'll re-open it if I turn out to be incorrect.
