Hooray for standards.
For all the lip service Mozilla pays to being standards compliant, they sure don’t apply the same set of rules to their bookmarks format. Now, I’m sure that it dates back to the original Netscape code and they’ve found a good reason to keep it the same way all these years, but for someone who’s trying to read the poorly formed HTML4 document without access to a giant HTML parser, it’s a real pain in the ass.
Mini-XML chokes on it. That’s a big problem. But let’s say for the sake of argument that I rewrote the code to use libxml2. I’d find, as I just did, that even though it can parse the file without crashing, the nodes end up in a nonsensical hierarchy that depends mostly on what order they were added in. What’s that, you want proof? See this formatted copy of a fake bookmarks.html file. I’ve added indentation to show you the hierarchy that libxml2 creates.
Even better, Firefox cannot handle libxml2 making the links in the file standards compliant. If, for example, I have a URL with an ampersand in it, libxml2 will naturally convert that to &amp;, which Firefox will leave as-is instead of interpreting it as &. This breaks a lot of links, as you can imagine. As Daniel Veillard doesn’t want to encourage creating broken code, my only solution there seems to be a whole lot of post processing.
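To give an idea of what that post processing might look like, here is a rough sketch in Python (chosen purely for illustration; it is not the language of my program). It assumes HREF values are always wrapped in double quotes, and the function name and sample URL are made up for the example. It only touches ampersands inside HREF values and leaves link titles alone, since I haven’t checked how Firefox treats those.

```python
import re

# Matches an HREF attribute and its double-quoted value.
# Assumption: the file always double-quotes HREF values.
HREF_RE = re.compile(r'(HREF=")([^"]*)(")', re.IGNORECASE)


def unescape_href_ampersands(html_text):
    """Turn &amp; back into a bare & inside HREF values only."""
    def fix(match):
        url = match.group(2).replace("&amp;", "&")
        return match.group(1) + url + match.group(3)
    return HREF_RE.sub(fix, html_text)


# Hypothetical line as a standards-compliant serializer might write it.
line = '<DT><A HREF="http://example.com/?a=1&amp;b=2">A &amp; B</A>'
print(unescape_href_ampersands(line))
```

Running the substitution over the whole serialized file before writing it to disk would be one way to apply it; doing it per attribute keeps the rest of the markup untouched.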
I’m sorry if I misunderstood something or I’ve angered any of the Mozilla developers who I will wholly admit are much smarter than me, but this puts my program in a state of flux that it really cannot be in right now. Arr.
July 13th, 2006 at 11:48 am
I’m pretty sure the Mozilla bookmarks format can be parsed line-by-line using text, instead of using a full-blown HTML/SGML parser… it might be a little easier.
July 13th, 2006 at 12:15 pm
Yeah, I considered doing it that way, though I had the concern of what happens if the file gets mangled and you have multiple tags on single lines. Still, you’re right in that it’s much easier than dealing with a parser and since Firefox is the only other application that uses the file, it should be safe from that happening. I think line-by-line is the road I’m going to end up taking.
Thanks.
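For anyone curious, the line-by-line approach discussed above could be sketched roughly like this. It is only an illustration in Python, not the code from my program, and it leans on assumptions about the file layout: one tag per line, folders written as <DT><H3>, links as <DT><A HREF="...">, and <DL>/</DL> lines opening and closing each folder. The function name, the dictionary layout, and the sample bookmarks are all invented for the example. A mangled file with several tags on one line would defeat it, which is exactly the concern raised in the comments.

```python
import re

# Assumed layout: one element per line.
FOLDER_RE = re.compile(r'<DT><H3[^>]*>(.*?)</H3>', re.IGNORECASE)
LINK_RE = re.compile(r'<DT><A\s[^>]*?HREF="([^"]*)"[^>]*>(.*?)</A>',
                     re.IGNORECASE)


def parse_bookmarks(lines):
    """Build a nested dict of folders and links from bookmark lines."""
    root = {"title": None, "children": []}
    stack = [root]   # folders currently open, innermost last
    pending = None   # folder seen on <H3>, waiting for its <DL>
    for raw in lines:
        line = raw.strip()
        upper = line.upper()
        folder = FOLDER_RE.match(line)
        link = LINK_RE.match(line)
        if folder:
            pending = {"title": folder.group(1), "children": []}
            stack[-1]["children"].append(pending)
        elif link:
            stack[-1]["children"].append(
                {"title": link.group(2), "url": link.group(1)})
        elif upper.startswith("<DL>"):
            # Enter the pending folder; a <DL> with no preceding <H3>
            # (the top-level list) re-enters the current folder so that
            # every <DL> still pairs with one </DL>.
            stack.append(pending if pending is not None else stack[-1])
            pending = None
        elif upper.startswith("</DL>"):
            if len(stack) > 1:
                stack.pop()
    return root


# Made-up sample in the assumed layout.
sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><A HREF="http://example.com/">Example</A>
    <DT><H3>News</H3>
    <DL><p>
        <DT><A HREF="http://example.org/?a=1&b=2">Example Org</A>
    </DL><p>
</DL><p>
"""

tree = parse_bookmarks(sample.splitlines())
for child in tree["children"]:
    print(child)
```

Because the URL is read straight from the text, the bare ampersand survives untouched, which sidesteps the escaping problem entirely.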