Opened 11 years ago

Last modified 11 years ago

#6140 new patch

XML logger

Reported by: samuel Owned by: rlaager
Milestone: Patches Needing Improvement Component: libpurple
Version: 2.4.2 Keywords: xml logger history
Cc:

Description

I have begun writing an XML logging mechanism for Pidgin, to replace the old logging methods and provide better log functionality.

Reasons to replace current logging mechanisms:

When we log a conversation, we should log its content, not its appearance. We should be able to determine from the log the sender of a message, the receiver, the time the message was sent, and many other useful things. The ability to retrieve this information from the log allows us to provide better functionality to the user (for example, advanced searching). It is the foundation for all further log improvements.

The current state is that we have two loggers, a text logger and an HTML logger. Both of them record only how the message appeared; it is not possible to determine the information mentioned above from the log. We just see how the message appeared in the IM window during the conversation. For example, when we use the HTML logger, we are not able to change the colors of the sender and receiver names, etc. in the log. This is really ugly, because one can change the colors in the chat window, but not in the log. It is then really confusing when you are, for example, used to a green color for the sender, and in the log it appears red.

The next question is: what is the reason for having two different loggers? I do not see any; it is just one more useless option in the Pidgin preferences. All logs should be written the same way; there is no reason to sometimes use logger A, then logger B, logger C, etc. It is messy. If we want to configure the way logs are displayed, OK, do it. But don't change the way they are logged. When the user changes his/her mind and wants logs displayed another way, that would be possible with XML logs, but it is not possible with the HTML and text loggers.

Proposal for the XML logger

I have written a proposal for the XML log structure; you can find it attached as format.xml.

I think the whole log content should be stored in the PurpleLog structure, and read from or written to a file when needed. I have added a few members to the PurpleLog structure to allow this information to be stored.

The current version of the XML logger is only able to write logs, not to read them. Also, it is not able to log all the information proposed in the format.xml file. To make the XML logger really work, it is necessary to change the interface of some functions and rewrite the affected modules accordingly. This is too big a step to take without your agreement, so please tell me your opinion about all this :-)

Attachments (1)

xml-log.tar (30.0 KB) - added by samuel 11 years ago.

Change History (8)

comment:1 follow-up: Changed 11 years ago by datallah

  • Type changed from enhancement to patch

I haven't looked at the code yet, but there've been discussions several times about why xml isn't a suitable format for logging (among other reasons because you can't reasonably keep the file well-formed all the time).

This can also likely be implemented as a plugin.

comment:2 follow-up: Changed 11 years ago by deryni

What changes to pidgin do you feel are necessary for an xml logger to be written? What about the current logging system doesn't allow you to just create an xml logger plugin? You are aware that the current logging system allows for plugins to add log types, correct?

Why do you want to store the account and buddy alias in the log? I can't imagine most people will want to see old aliases when looking at old logs, will they? Assuming we want the alias stored in the log, why would we want to store the alias again with each message? Why would we need to store both to and from for each message? (For IMs only one side is needed and for chats only one side will generally exist.) Why would the message itself not be the cdata of the <message/> tag?

Also two small details. Firstly, your diffs were taken backwards (they show your changes as removals not additions) and secondly the word is "standard" not "standart".

comment:3 in reply to: ↑ 1 Changed 11 years ago by samuel

Replying to datallah:

there've been discussions several times about why xml isn't a suitable format for logging (among other reasons because you can't reasonably keep the file well-formed all the time).

Yes, you are right, this is a problem. But the current html logger has the same problem, the html-log is not well-formed during the conversation. Currently I am dealing with this problem by completely re-saving the file after each message. Yes, it looks a little ugly, but I think this is not a real problem, because the logs are always small files (I do not have a log greater than 4 kB). Perhaps there is a better way to log a conversation than with an xml logger, and if you have an idea, please propose it, but I do not think that the current pidgin loggers are suitable.

This can also likely be implemented as a plugin.

I do not think so. I want a logging mechanism which logs all important information about the conversation, and I want it to replace the current loggers, which do not do so. I do not see a reason to have many different loggers; one good logger is enough and I want such a logger. I am not saying that it must be an XML logger; there could be a better format for logging. I just want a logger that logs all important information about the conversation, and no current logger does this.

Changed 11 years ago by samuel

comment:4 in reply to: ↑ 2 Changed 11 years ago by samuel

Replying to deryni:

What changes to pidgin do you feel are necessary for an xml logger to be written?

The current logger write function gets parameters like the sender alias, but not important parameters like the sender/receiver id, and so on.

The logger read function just reads the log and returns text. But a logger should not return just text; it should return metainformation like sender, receiver, time, flags, etc., as described in the post. And this is not possible without a rewrite of finch and pidgin.

I think that the log information should be stored in the PurpleLog structure, and the write/read functions should just store/read the content of this structure to/from a file. And the log viewer should work directly with this structure, not with plain or formatted text.

What about the current logging system doesn't allow you to just create an xml logger plugin?

I think there should be one good logger logging all important information. I do not want many different loggers. I want one which will replace the old loggers, and this should not be done by a plugin.

You are aware that the current logging system allows for plugins to add log types, correct?

Yes, I know about this.

Why do you want to store the account and buddy alias in the log? I can't imagine most people will want to see old aliases when looking at old logs, will they?

Perhaps you are right, there is no big reason to store aliases in the log. But I think there is no big reason not to, either. The log viewer does not have to show these old aliases, but perhaps sometime in the future somebody will change their mind and find a way to use this information. Not all the information stored in the log must be used. If there is a chance that some information could be used in the future, store it. It does not matter if it is never used; nobody is hurt because unimportant information is stored somewhere. Nobody reads plain logs; they use a log viewer, which can filter what is relevant and what is not.

Assuming we want the alias stored in the log, why would we want to store the alias again with each message? Why would we need to store both to and from for each message? (For IMs only one side is needed and for chats only one side will generally exist.)

In chats there are different senders of messages, so it makes sense to store who sent each message. And, similarly, I can send a message to different people, so the receiver should be stored.

Why would the message itself not be the cdata of the <message/> tag?

It can be the cdata of the <message/> tag, if you think that is a better idea.

Also two small details. Firstly, your diffs were taken backwards (they show your changes as removals not additions) and secondly the word is "standard" not "standart".

I am sorry for that. I just corrected it, now it should be ok.

Thank you for your interest.

comment:5 follow-up: Changed 11 years ago by deryni

Replying to samuel:

Yes, you are right, this is a problem. But the current html logger has the same problem, the html-log is not well-formed during the conversation.

Yes, the html logger has this defect as well but html doesn't have the same well-formedness requirements that XML does. That is, html processing entities are allowed to be forgiving of well-formedness errors (or at least have a history of doing so and are unlikely to change that). XML processing entities are (more-or-less) explicitly not allowed to be similarly forgiving, thus causing significantly greater annoyance if logs are malformed.

Currently I am dealing with this problem by completely re-saving the file after each message. Yes, it looks a little ugly, but I think this is not a real problem, because the logs are always small files (I do not have a log greater than 4 kB).

That is a problem because it is an enormous amount of unnecessary churning of the disk. At least as far as I'm concerned. This will definitely cause a number of complaints from people (especially those who have home directories mounted over things like NFS).

I do not see a reason to have many different loggers; one good logger is enough and I want such a logger.

One good logger might be fine, but it is also likely to be much more than many people care about if it needs to cater to what some small set of people want (which is exactly what plugin-addable loggers are good for). Personally, I use, and have used for years, the text logger and am more than satisfied with it.

I just want a logger that logs all important information about the conversation, and no current logger does this.

What information specifically are the current loggers not logging that you want included? Can that information not be added to the logs as-is without requiring a new logging type (or even with a new logging type) but without invasive changes to the logging system itself? If not, why not?

comment:6 in reply to: ↑ 5 Changed 11 years ago by samuel

Replying to deryni:

Yes, the html logger has this defect as well but html doesn't have the same well-formedness requirements that XML does. That is, html processing entities are allowed to be forgiving of well-formedness errors (or at least have a history of doing so and are unlikely to change that). XML processing entities are (more-or-less) explicitly not allowed to be similarly forgiving, thus causing significantly greater annoyance if logs are malformed.

I think this is just a question of implementation; a logger can be written so that it ignores malformations in the xml file. But I agree that more suitable file formats exist. I just chose XML because it was the easiest to implement.

That is a problem because it is an enormous amount of unnecessary churning of the disk. At least as far as I'm concerned. This will definitely cause a number of complaints from people (especially those who have home directories mounted over things like NFS).

Ok, I agree.

What information specifically are the current loggers not logging that you want included? Can that information not be added to the logs as-is without requiring a new logging type (or even with a new logging type) but without invasive changes to the logging system itself? If not, why not?

Current loggers just do something like a screenshot of the conversation window. They log appearance, not content, and that is what I think should be changed. It should be possible to determine exactly what the message is, who its sender is, when it was sent, and what its content is. When I see something like "<3:34> Michael: I like chocolate cake" I will assume that this was a message from Michael sent at 3:34, and probably I will be right. (A problem occurs if the body of the message was itself "<3:34> Michael: etc...") But it is not unambiguous, and it cannot be determined by a machine. And this rules out all possible improvements to the current logging, such as advanced searching, or changing the color of the sender in the log window, and many others (just look at the history feature of some other IM client if you want to know how this metainformation can be used).

This is also the answer to why it is not possible to do this without invasive changes. It is impossible because the philosophy is completely different. I want to log content, but right now only appearance is logged.

One more example to explain. Assume you are writing a document in some sort of word processor. Would you save it as a .png by taking a snapshot of the screen? No, because you would lose all the metainformation. And it is the same with the logger.
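The ambiguity of the "<3:34> Michael: ..." example can be shown with a small C sketch. The `struct msg` type, the `render_text_log` function, and the second participant "Anna" are all hypothetical and only illustrate the point; the line layout is taken from the example in this comment, not from the actual text logger. Two different conversations (two separate messages, versus one message whose body contains a line break followed by something that looks like a log line) render to exactly the same text, so a reader of the file cannot tell them apart.

```c
#include <stdio.h>

/* Hypothetical structured message: what was said, by whom, and when. */
struct msg {
    const char *time;
    const char *sender;
    const char *body;
};

/* Render messages as "<time> sender: body\n", one after another.
 * Returns the number of bytes written (excluding the NUL), or 0 if
 * the output buffer is too small. */
size_t render_text_log(const struct msg *msgs, size_t n,
                       char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, out_size - used, "<%s> %s: %s\n",
                         msgs[i].time, msgs[i].sender, msgs[i].body);
        if (w < 0 || (size_t)w >= out_size - used)
            return 0;   /* encoding error or buffer too small */
        used += (size_t)w;
    }
    return used;
}
```

Rendering the two messages `{"3:34", "Michael", "I like chocolate cake"}` and `{"3:35", "Anna", "Me too"}` gives the same bytes as rendering the single message `{"3:34", "Michael", "I like chocolate cake\n<3:35> Anna: Me too"}`. A format that stores sender, time, and body as separate fields does not have this problem.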

comment:7 Changed 11 years ago by rlaager

  • Milestone set to Patches Needing Improvement
  • Owner set to rlaager

Keeping the file in-sync can be done as follows: Write the file as normal on the first message. Then, on subsequent messages, seek backwards over the closing tag and write out the next message and closing tag.
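A minimal C sketch of that seek-back approach follows. The `<log>` root element, the `xml_log_append` name, and the assumption that the caller passes an already-escaped XML fragment are illustrative only; none of them come from the attached patch or from libpurple. Each call leaves a well-formed file behind and flushes before closing.

```c
#include <stdio.h>
#include <string.h>

#define LOG_OPEN  "<log>\n"     /* hypothetical root element */
#define LOG_CLOSE "</log>\n"

/* Append one already-escaped XML fragment to the log at `path`.
 * Returns 0 on success, -1 on failure. */
int xml_log_append(const char *path, const char *fragment)
{
    const long close_len = (long)strlen(LOG_CLOSE);
    char tail[sizeof LOG_CLOSE];
    int ok;
    FILE *fp = fopen(path, "r+b");

    if (fp == NULL) {
        /* First message: create the file and write the opening tag. */
        fp = fopen(path, "w+b");
        if (fp == NULL)
            return -1;
        ok = fputs(LOG_OPEN, fp) != EOF;
    } else {
        /* Later messages: check that the file ends with the closing tag,
         * then seek backwards over it so that it gets overwritten. */
        ok = fseek(fp, -close_len, SEEK_END) == 0
          && fread(tail, 1, (size_t)close_len, fp) == (size_t)close_len
          && memcmp(tail, LOG_CLOSE, (size_t)close_len) == 0
          && fseek(fp, -close_len, SEEK_END) == 0;
    }

    /* Write the new message followed by the closing tag, then flush. */
    ok = ok
      && fputs(fragment, fp) != EOF
      && fputs(LOG_CLOSE, fp) != EOF
      && fflush(fp) != EOF;

    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `xml_log_append("chat.xml", "<message>hi</message>\n")` twice produces a file with one `<log>` element containing two messages. Only the new message and the short closing tag are written per call, which avoids re-saving the whole file after each message.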

Also, "write" can be defined here as fwrite() + fflush(), just as in the existing code. This is probably "good enough". If you want more robustness, use a SAX-style parser rather than a DOM-style, so you get partial data back if the end of the file is missing. If you really want hardcore interoperability, when you encounter that error, rewrite the file with the data you were able to retrieve and the closing tag.
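The recovery step (keep what could be retrieved, then add the closing tag) could look roughly like the C sketch below. To be clear, this is not a SAX parser: it is a much simpler string scan standing in for one, and the `<log>` and `<message>` element names and the `repair_truncated_log` function are hypothetical. It also assumes the literal text `</message>` never appears inside a message body (for example inside a CDATA section), which a real parser would handle properly.

```c
#include <stddef.h>
#include <string.h>

/* Given the contents of a possibly truncated log in `buf`, copy everything
 * up to the last complete </message> element into `out` and append the
 * closing root tag. Returns 0 on success, -1 if no complete message was
 * found or `out` is too small. */
int repair_truncated_log(const char *buf, char *out, size_t out_size)
{
    static const char end_tag[] = "</message>";
    static const char closing[] = "\n</log>\n";
    const char *last = NULL;
    const char *p = buf;
    size_t keep;

    /* Find the end of the last complete message element. */
    while ((p = strstr(p, end_tag)) != NULL) {
        p += sizeof end_tag - 1;
        last = p;
    }
    if (last == NULL)
        return -1;

    keep = (size_t)(last - buf);
    if (keep + sizeof closing > out_size)
        return -1;

    memcpy(out, buf, keep);
    memcpy(out + keep, closing, sizeof closing);   /* copies the NUL too */
    return 0;
}
```

For the truncated input `"<log>\n<message>a</message>\n<message>b</mess"` this yields `"<log>\n<message>a</message>\n</log>\n"`: the first message survives and the half-written second one is dropped.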

I'm not opposed to this idea in principle, but the big issue is that XML buys us absolutely nothing. You're just going to convert the conversation markup into XML and then convert it back again. You'll write a whole bunch of code on both ends, for what? If you're going to define a common logging format, then it might have some merit. I would recommend you look at this page (as well as Adium's implementation, in case that page is out of date): http://trac.adiumx.com/wiki/XMLLogFormat

If you implemented something compatible with Adium's log format, that would be useful for at least some people (and close a feature request I saw the other day).

Also, in case this wasn't already mentioned... Use diff -ur (or manually concatenate your diffs) rather than making a .tar file of separate diff files.

For your dates, use the ISO date format. Of course, if you go the Adium-compatible route, you'd have to do that anyway.
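As a small illustration, an ISO 8601 timestamp can be produced with the standard C library alone. The `format_iso8601_utc` name is hypothetical, and using UTC with a trailing `Z` is an assumption made here for simplicity; an Adium-compatible log may need a different variant (for example local time with a numeric offset), so check that format before relying on this.

```c
#include <time.h>

/* Format `t` as an ISO 8601 UTC timestamp such as 1970-01-01T00:00:00Z.
 * Returns 0 on success, -1 if the conversion fails or `out` is too small. */
int format_iso8601_utc(time_t t, char *out, size_t out_size)
{
    struct tm *tm_utc = gmtime(&t);

    if (tm_utc == NULL)
        return -1;
    /* strftime returns 0 when the result does not fit in the buffer. */
    return strftime(out, out_size, "%Y-%m-%dT%H:%M:%SZ", tm_utc) > 0 ? 0 : -1;
}
```

Because the fields run from most to least significant and have fixed widths, timestamps in this form sort correctly as plain strings and do not depend on the user's locale.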

If I were you, I'd start this project by generating a few sample Adium logs and writing the read-only parser first. That way, if you get that done but stop then, you've still got something useful to people.
