Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1102 closed defect (worksforme)

Pidgin breaks blist.xml

Reported by: czarny
Owned by: nwalp
Milestone: 2.1.0
Component: libpurple
Version: 2.0
Keywords: blist
Cc:

Description

Hello

I know this is not exactly considered a bug, but I think it seriously destroys the user experience.

The problem is with blist.xml: when Pidgin (gaim did this too) is interrupted by a hard restart, such as a hard reboot of the system via the button on the case, it often destroys blist.xml, putting garbage into it. Quite often it destroys its backup as well, leaving the user with no buddy list.

This usually isn't a nightmare when you've got all the contacts exported to the server, but all the IRC channels disappear and all the combined contacts are split up (say you've got one buddy whom you can reach through both Jabber and Gadu-Gadu), so every user from every source gets its own buddy in the list again, and that IS pretty frustrating.

Change History (22)

comment:1 Changed 12 years ago by elb

Pidgin should never put garbage into blist.xml; it sounds like maybe what is happening is that your operating system has not synced the contents of blist.xml by the time you push the hard reset button, and so the file is corrupt.

When Pidgin saves its buddy list, it saves the blist to an entirely new file (blist.xml.save), and then only moves that file to blist.xml if the entire write was successful and the file was closed successfully. There should be no way for Pidgin to produce a corrupt blist.xml due to any sort of race, unless the operating system is misbehaving.

(Quite possibly, if you are using Windows, it makes no coherent guarantees about such operations.)

comment:2 Changed 12 years ago by czarny

I use Linux:

[czarny@kacper build]$ uname -a
Linux kacper 2.6.20.4_laptop-0.2 #1 PREEMPT Sat May 12 20:08:36 CEST 2007 i686 Intel(R)_Pentium(R)_M_processor_1.70GHz PLD Linux

Well - as I said, it isn't a normal bug, it's just a thing I've spotted working with gaim/pidgin. The blist.xml.save after the reset is corrupted as well.

It's just an annoying thing, because I do some development, which sometimes leads to freezes and I have to reboot, which will break my buddy list ;/

comment:3 Changed 12 years ago by elb

What filesystem are you using? Are you working on the block layer?

This is certainly not something which Pidgin can fix -- we're doing everything we can to ensure consistency. The problem here is in the kernel.

To narrow the window for possible synchronization errors, you can change the timeout in purple_blist_schedule_save (blist.c:368, in my source) to something larger than 5 seconds. Reducing the frequency of blist saves should shrink the window in which corruption can occur, at the cost of a more out-of-date blist on disk if you do change something (an alias, add a buddy, change a contact, etc.).

You can also consider changing the mount options for your /home filesystem, if this is happening often. E.g., -o sync will cause filesystem writes to be synchronous on ext2/3, degrading write performance but increasing the probability of coherent files on restart.

comment:4 Changed 12 years ago by czarny

Hm - I see...so the buddy list is synced time-based, not event-based? Like every 3 secs it gets synced?? Not when I add a buddy??

Well - I use xfs and I'm quite happy about it.

I'll try to tweak the libpurple stuff.

comment:5 Changed 12 years ago by elb

Not exactly -- the blist gets synced once it has changed and there have been no further changes in the past 5 seconds. This prevents several changes in rapid succession from each triggering a save.
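The save rule described in this comment can be sketched in a few lines of C. This is only a model of the rule as stated here (save once there are unsaved changes and 5 seconds have passed without a further change), not libpurple's actual code, which schedules the save through its event loop. The names save_state, note_change, save_is_due, mark_saved and SAVE_DELAY_SECONDS are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant: quiet period required before a save. */
#define SAVE_DELAY_SECONDS 5

struct save_state {
    bool   dirty;        /* true while there are unsaved changes */
    time_t last_change;  /* time of the most recent change */
};

/* Record a change (alias edit, buddy added, ...) and restart the quiet period. */
void note_change(struct save_state *s, time_t now)
{
    assert(s != NULL);
    s->dirty = true;
    s->last_change = now;
}

/* A save is due only if something changed and nothing has changed since
 * for at least SAVE_DELAY_SECONDS. */
bool save_is_due(const struct save_state *s, time_t now)
{
    assert(s != NULL);
    return s->dirty && difftime(now, s->last_change) >= SAVE_DELAY_SECONDS;
}

/* Call after the list has been written out successfully. */
void mark_saved(struct save_state *s)
{
    assert(s != NULL);
    s->dirty = false;
}
```

With this rule, a burst of changes at t=100 and t=103 produces a single save at t=108 rather than one save per change. Raising SAVE_DELAY_SECONDS makes saves rarer, at the cost of a staler file on disk.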

I would consider being less happy about XFS, myself, if I found out it was corrupting my files. :-P

comment:6 Changed 12 years ago by lschiere

  • Component changed from pidgin (gtk) to libpurple
  • Milestone set to 2.0.3
  • Owner set to nwalp

comment:7 Changed 12 years ago by czarny

Well - it is the fastest fs out there.

Nevertheless the same thing happened on reiserfs, reiser4, ext2 and ext3 so....

comment:8 Changed 12 years ago by elb

I really don't see how that is possible. Look at purple_util_write_data_to_file in util.c. We write the entire contents of blist.xml to blist.xml.save, close the file, check to see that the entire file wrote, then *stat* the file to make sure it's the appropriate size, and only if all of that succeeds do we rename() it to blist.xml.
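For readers who want to see the shape of the sequence described above (write to a .save file, close it, check its size with stat, then rename it into place), here is a minimal sketch in C, assuming a POSIX system. It is not the code from util.c; the function name write_file_atomically and the fixed-size path buffer are hypothetical. The fsync() call is an addition that the thread does not mention, and it is marked as such in the comments.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `data` to "<path>.save",
 * verify the result, then rename it over `path`.
 * Returns 0 on success, -1 on failure (an existing `path` is left as-is). */
int write_file_atomically(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    struct stat st;
    size_t off = 0;
    int n, fd;

    assert(path != NULL && data != NULL);

    n = snprintf(tmp, sizeof tmp, "%s.save", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                      /* path too long for the buffer */

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop until done. */
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* Assumption beyond the thread: ask the kernel to flush the data to
     * stable storage before the rename makes the new file visible. */
    if (fsync(fd) != 0 || close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Check that the temporary file has the expected size. */
    if (stat(tmp, &st) != 0 || (size_t)st.st_size != len) {
        unlink(tmp);
        return -1;
    }

    /* Only now replace the real file; rename() is atomic on POSIX. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A caller would use it along the lines of write_file_atomically("blist.xml", xml, strlen(xml)). If any step fails, the previous blist.xml is never touched, which is the property the comment above relies on.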

I challenge your assertion that you have actually seen this behavior on multiple filesystems, unless you have some more fundamental error.

comment:9 Changed 12 years ago by czarny

I use PLD Linux with xfs. My g/f uses Ubuntu with ext3 and it happens there as well sometimes. On another machine I've used reiserfs, now reiser4 and it happened there too.

Then again -- you don't believe me, so there is no point in discussing it further -- or is there?

comment:10 Changed 12 years ago by nwalp

Given the lengths to which we've gone to avoid this, it is hard for us to believe. Do either (both?) of these machines have any plugins loaded? Where did your g/f get Pidgin from? Are you running the same binaries, or did she get Ubuntu packages compiled by someone else?

Clearly, *something* is corrupting these files, and to the best of our knowledge, it's not the main code path in Pidgin. I certainly don't want to brush this report aside, but it does (at this point) seem to be isolated to you and other machines you use.

comment:11 Changed 12 years ago by czarny

I've made pidgin for PLD Linux. No patches - pure building.

Where my g/f got it from, I have no idea - the Ubuntu repos, I'd guess.

Well - as it hasn't happened for some time now and I'm not willing to tinker with it on my own, I can't provide any more details. But if it happens again, I'm going to investigate it and see what could have happened.

comment:12 Changed 12 years ago by elb

Please do. If you can find a way to repeat it, we would love to know.

comment:13 Changed 12 years ago by nwalp

So, if you haven't seen this in a while, and we can't reproduce it, can we close this bug? You can of course re-open it if you do manage to reproduce it.

comment:14 Changed 12 years ago by lschiere

  • Resolution set to worksforme
  • Status changed from new to closed

comment:15 Changed 12 years ago by seanegan

  • Milestone changed from 2.0.3 to 2.1.0

Milestone 2.0.3 deleted

comment:16 follow-up: Changed 12 years ago by Valdar

I am using Windows XP and have had this same issue happen twice. After a hard reset, my blist.xml and blist.xml~ files are empty. The same goes for accounts.xml and accounts.xml~.

comment:17 in reply to: ↑ 16 Changed 12 years ago by elb

Replying to Valdar:

I am using Windows XP and have had this same issue happen twice. After a hard reset, my blist.xml and blist.xml~ files are empty. The same goes for accounts.xml and accounts.xml~.

This is quite obviously a Windows bug; there is no way for this to happen on a working POSIX filesystem. (See libpurple/util.c, function purple_util_write_data_to_file if you would like to verify this for yourself.) I don't know why you are seeing it, but my best suggestion is not to hard reboot the box. If Windows is too unstable to prevent that, I suggest installing an operating system.

comment:18 follow-up: Changed 12 years ago by czarny

I'm disgusted with the way bugs are treated here. 'It's not a Pidgin error, it's your filesystem's error', 'It's not Pidgin's error, it's obviously your OS's error'.

A formal bug reply: "Change FS" or "Change OS"?

So when I submit a bug concerning memory allocation, I'll get a reply that Pidgin's code is superb, but libc is wrongly implemented?

If you won't reply to Windows bugs, because "Windows is too unstable to prevent that" and you "suggest installing an operating system", why the fuck do you support that OS? Either don't support it at all (including removing the Windows binaries from the main site), or live with what the system's got and start working around the instabilities of the OS you support.

The same goes for changing the FS. Either state on the main site that you support only a specific configuration of a Linux box (and I'm sure I'll get something like 'Install a decent distro: Debian/Ubuntu?' in that note), or start supporting all FS-es and be flexible enough to provide the users with satisfying software.

The "We're doing everything right, it's the world that is wrong" attitude is unacceptable!

comment:19 follow-up: Changed 12 years ago by Valdar

My point of view is that it's acceptable for them because I'm not paying them for the software. "It's a windows problem" means "stop using gaim, install Miranda-IM". They lose one user/advocate. Miranda is installed and I'm happily hard rebooting 200 times a day with no data loss now. :-)

However, if I paid money for gaim, then I would be pretty pissed.

This highlights a problem with open source in general and contributes to why it will never overtake commercial software, so I think you have a valid point in that argument.

comment:20 in reply to: ↑ 18 Changed 12 years ago by elb

Replying to czarny:

I'm disgusted with the way bugs are treated here. 'It's not a Pidgin error, it's your filesystem's error', 'It's not Pidgin's error, it's obviously your OS's error'.

Please explain to me what you do not understand about this, and I will try to clarify it for you. I pointed you to the section of code which is concerned, and it is (to the best of our ability to discern) correct. We even go so far as to stat the file after closing it, to make sure it is of the correct size. I have NO IDEA what we can do to make this better. At some point, short of writing your own disk drivers and doing direct hardware access, you HAVE to trust the operating system to handle certain things correctly; in this case, it is obvious that Windows is not (for whatever reason) doing so.

If you can show me a solution that fixes this problem, I will gladly apply it. However, it is clear that the operating system is violating agreements (POSIX filesystem semantics, specifically) here, and there is very little we can do in the face of that.

If you do not understand this problem, please do not be rude and insulting; explain what you do not understand, and we will attempt to clarify.

A formal bug reply: "Change FS" or "Change OS"?

So when I submit a bug concerning memory allocation, I'll get a reply that Pidgin's code is superb, but libc is wrongly implemented?

If, for example, malloc() returns the same memory area twice, then YES. And that is the sort of bug which is happening here. There is some measure of correctness which an application MUST be able to trust in the underlying operating system.

If you won't reply to Windows bugs, because "Windows is too unstable to prevent that" and you "suggest installing an operating system", why the fuck do you support that OS? Either don't support it at all (including removing the Windows binaries from the main site), or live with what the system's got and start working around the instabilities of the OS you support.

I would love to drop Windows support entirely, but it is not up to me.

The same goes for changing the FS. Either state on the main site that you support only a specific configuration of a Linux box (and I'm sure I'll get something like 'Install a decent distro: Debian/Ubuntu?' in that note), or start supporting all FS-es and be flexible enough to provide the users with satisfying software.

If there are similarly buggy filesystems on Linux, we would certainly give the same reply. (I am not aware of any, but they may exist.)

The "We're doing everything right, it's the world that is wrong" attitude is unacceptable!

Do you, or do you not, understand the code that I directed you to? If so, please tell me how it is wrong, and we will fix it. If not, please do not assert that we are at fault -- because as best we can tell, we are not.

Ethan

comment:21 in reply to: ↑ 19 Changed 12 years ago by elb

Replying to Valdar:

My point of view is that it's acceptable for them because I'm not paying them for the software. "It's a windows problem" means "stop using gaim, install Miranda-IM". They lose one user/advocate. Miranda is installed and I'm happily hard rebooting 200 times a day with no data loss now. :-)

Note that Miranda almost certainly does NOT depend on the Windows POSIX layer; it is entirely possible (probable?) that the native Win32 filesystem calls do not suffer from this bug. There isn't a lot we can do about that; however, if someone were to donate a Win32-native and correct purple_util_write_data_to_file that built in our build environment (or one with moderate changes), and it fixed this problem, we would certainly apply it.

However, if I paid money for gaim, then I would be pretty pissed.

This highlights a problem with open source in general and contributes to why it will never overtake commercial software, so I think you have a valid point in that argument.

If you mean that there are many users who demand results, fling insults, and generally behave childishly without giving anything back, we are in complete agreement. (This is not directed toward your comment, which, if somewhat confused here at the end, is polite and reasonable.)

comment:22 Changed 12 years ago by elb

czarny:

After your vitriolic ranting, you may find this interesting:

https://launchpad.net/ubuntu/+bug/37435
