ECMAScript disharmony

posted on 2008-08-14

In a previous post about ES4, I complained a bit about how things were getting political. In the past months, things have gotten even worse. The ECMA committee really split in two, with one group working on ES3.1 - which is a minor Javascript language update - and the other working on ES4 aka “Javascript 2”.

I was not very surprised to read today on the ES4 mailing list a mail from Brendan Eich entitled ECMAScript Harmony, which states that the committee has decided to :

1. Focus work on ES3.1 with full collaboration of all parties, and
target two interoperable implementations by early next year.

2. Collaborate on the next step beyond ES3.1, which will include
syntactic extensions but which will be more modest than ES4 in both
semantic and syntactic innovation.

Sounds like ES4 as we know it right now is purely and simply dropped !

What does it mean ? It means first that we are still very far from seeing a major upgrade in the way Javascript runs in browsers (for the reasons why some people definitely don’t want that to happen, read my previous post on ES4).

It also means that the whole “AS3 is standard” thing that Adobe has been “selling” to developers is no longer true. But this promise was hard to keep from the beginning anyway : since AS3 was released before the ES4 draft was completed, it was never going to be compatible with any standard (if you want to know more about ES4/AS3, Hank Williams has a complete analysis about this, thanks to Juan for the link).

It also means that the approach of haXe is successful : instead of waiting for a particular technology to eventually be deployed, you can start right now using a high-level programming language to develop for the client side (Flash and Javascript) and for the server side (PHP and Neko). And when a new technology comes up, haXe will just be able to adapt to it as well : that’s the haXe Judo, as Blackdog is calling it ;)

Some people might not be happy about how things are going, but as for me I think it’s good that ES4 does not become the next JS. While I agree with several aspects of the language design, I think that it’s not a good idea to have such a complex language as the main tool for web developers.

If we look at the success of the Web, I can see two main reasons why people created a lot of content for it :

1 - it is free : no royalties, no IDE to buy, you can simply write HTML (+ Perl at first, and later PHP) and start building your own website

2 - it is easy : learning HTML is easy, doing some JS to open alerts as well, PHP is very easy to get started with, etc…

I think it’s something very important to keep in mind if we want the web to continue to evolve in a sane manner. We are web professionals and we have our own problems. As professionals, we need more powerful tools, of course, because we always tend to push our tools to their limits ! But don’t forget that most web developers don’t care about that. They are just very happy with the way things are right now, because it’s easy.

I won’t say it’s easy to develop, because there are all these nasty bugs in either JS or PHP that keep driving people crazy, but it’s easy to get started, and you can actually build self-confidence, little by little, in your ability to make websites. It’s called “learning how to program”, and that’s a very looong path, and nobody knows exactly if and how it ends.

Anyway, if we want to improve our tools, we can’t think only about our own problems. If someone wants to get started programming a website, he doesn’t want to learn about OOP, classes, structural subtyping, or closures. He doesn’t even want to HEAR about it. And he’s right : things should be easy at the beginning.

So, the 1 million Euro question : can you build a language that can satisfy at the same time the beginners and the advanced developers, the proponents of static typing and the proponents of dynamic typing, the people who prefer functional programming and the ones who prefer imperative style ? Can you really do that ? I don’t think anybody can. There is no such universal language ! Each language has its own needs, and even if I am very proud of haXe for instance, I would maybe not recommend it to someone who has just started programming, simply because it has not been built for this.

So what would be best for the web, if there can’t be a single language that satisfies everybody ? The answer is : a common runtime ! Given enough openness to various languages, a common runtime should be able to run many different kinds of languages on top of it : statically and dynamically typed ones, functional and imperative ones… even, why not - horror - Visual Basic !

We definitely can’t use an existing runtime : JVM has been written to run Java and .Net has been written to run C#. NekoVM has been built with this multi-language support in mind, but would need to be further abstracted and improved to reach this goal.

And even if it was built, who would distribute such a runtime ? Microsoft ? I’m afraid not. Adobe ? They will definitely concentrate on the Flash Player. Mozilla ? I don’t even know if they would be interested since they seem to focus a lot on Javascript…

We’re back again to politics (or “real world”, as you wish). The Web is open - in theory - but the web client is controlled by a few players, without any way to change that…

Quite depressing, but let’s rejoice, it’s already nice that we have haXe to play with ! :)

Working with File Formats

posted on 2008-08-03

Working with file formats is either great fun (when it works) or frustrating (when it doesn’t). The last format I worked on was PDF. I had several PDF files and I wanted to extract some textual data from them.

The first step was to look at it with a text editor : do I need to understand the way the file is stored, or is there a way to directly extract the data I need without even understanding the whole thing ? No luck with this, PDF is a binary file format that usually looks like garbage when opened with a text editor.

Second step is then to look for the file format documentation. A bit of google and wikipedia and you can find the following link :

You’ll be able to find here a PDF File Format Reference… in PDF of course. After skipping the introductory blabla, you can go directly to the interesting section, which describes the Syntax and the File Structure. Great ! It’s actually not very often that a file format is officially documented, so it’s always nice to find some complete documentation, although additional C source code would be helpful as well.

Anyway, after a bit of coding, I was able to parse my first PDF. It consists of a list of “objects” which are referenced by an ID and which, depending on their type, might contain either text, graphics or font data. I was of course interested in the text data, but it was compressed with a so-called “filter” which is basically ZLib compression.
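To give an idea of what undoing such a filter looks like, here is a minimal Python sketch. It assumes the raw bytes of a stream have already been isolated by the parser and that the filter is the one the PDF Reference calls FlateDecode; the function name and the sample bytes are made up for illustration.

```python
import zlib


def try_inflate(raw):
    """Return the decompressed stream body, or None if it is not valid zlib data."""
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Typical symptom when the stream is not plain zlib data,
        # for instance because it was encrypted before being stored.
        return None


# Round trip on made-up content (not taken from a real PDF).
sample = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(sample)
restored = try_inflate(compressed)
```

When a stream is stored as plain zlib data this returns the original bytes. Returning None instead of raising is only a convenience for experimenting with many streams at once.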

But after trying several zlib parameters to decompress the data, I failed to unzip the text sections. Looking back at the file format, searching for answers, I found that PDF supports something called “encryption”.

The encryption used is the RC4 algorithm, with a key built from different pieces of information present in the PDF, plus a user password that is empty by default. The encryption data also contains bit flags that tell which operations people can do with this PDF, like saving, printing, editing… This is one of the most stupid security schemes I have ever seen !
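For reference, RC4 itself is only a few lines long. The sketch below is a plain textbook implementation in Python, not the code from my PDF reader, and it deliberately leaves out the PDF-specific part (how the key is derived from the document).

```python
def rc4(key, data):
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # Key-scheduling: shuffle the 256-entry state table according to the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: XOR each input byte with the next keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


# Applying the cipher twice with the same key gives back the input.
secret = rc4(b"some key", b"some text")
roundtrip = rc4(b"some key", secret)
```

Since encryption and decryption are the same operation, the only hard part of reading an “encrypted” PDF is computing the right key.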

In fact, since people are able to open the PDF without entering a password, it means that the PDF can be decrypted without a password (aka with the empty password). So it should be possible to very easily remove “encryption” from such a PDF in an automated manner, including modifying the “user rights” on it.

Back when this “security” was added, it was surely “security through obscurity” : since nobody knew how to obtain the RC4 key from a given PDF, nobody could read such a PDF. But the way the RC4 key is computed is also documented as part of the PDF reference. And this actually is very funny (look at page 125 of the PDF Reference).

Well, since I was not very lucky this time, it turned out the PDFs I was trying to read were “encrypted”. So I had to implement this whole nonsense security algorithm… It actually took me almost half a day, because one value in my PDF reader was wrongly parsed (I forgot to handle the minus sign in front of the number) and thus was giving me an invalid RC4 key…
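To illustrate this kind of bug, here is a small Python sketch. It assumes the value was a signed integer like the permission flags, which (as far as I read the reference) go into the key computation as an unsigned 4-byte value, low-order byte first. The function names and the value -44 are only examples.

```python
import re
import struct


def parse_pdf_int(token):
    """Parse an integer token, keeping an optional leading sign."""
    if re.fullmatch(rb"[+-]?\d+", token) is None:
        raise ValueError("not an integer token: %r" % token)
    return int(token.decode("ascii"))


def to_le_uint32(value):
    """Reinterpret a signed 32-bit value as 4 unsigned bytes, low-order byte first."""
    return struct.pack("<I", value & 0xFFFFFFFF)


good = to_le_uint32(parse_pdf_int(b"-44"))   # sign handled
bad = to_le_uint32(parse_pdf_int(b"44"))     # what you get if the sign is dropped
```

The two results differ in every byte, so any hash computed from them, and therefore the key, ends up completely different.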

After spending hours trying to make things work, I downloaded a Python PDF library which supported PDF decryption, then ran it and added some traces to display the key it was computing. Since it was different from mine, I was able to track down my bug by comparing the key computation at each step.

And finally, after a day of hardcore coding, it worked ! The text section of the PDF contains some Postscript-like data, but I didn’t need to parse this one, so I instead used some specific regular expressions to extract things that I needed.
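As an example of this quick-and-dirty approach, here is a Python sketch that pulls literal strings out of a decompressed text section. It only handles the simple `(string) Tj` form; the pattern, the function name and the sample data are illustrative, not the actual expressions I used.

```python
import re

# A literal string in parentheses (with backslash escapes) followed by the Tj operator.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_strings(content):
    """Return the raw (still escaped) strings shown with the Tj operator."""
    return [match.group(1) for match in TJ_PATTERN.finditer(content)]


# Made-up content stream, not taken from a real PDF.
content = b"BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td (World \\(2008\\)) Tj ET"
strings = extract_strings(content)
```

This ignores hexadecimal strings, string arrays and unescaped nested parentheses, which is fine as long as the files all come from the same generator and look alike.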

Looking at the different formats I have been working on in haXe over the past years, and with the recent addition in haXe 2.0 of crossplatform and haxe.Int32, which are the most used classes when working with file formats, I decided to group all of these formats together into one single library : hxFormat.

It currently supports FLV and AMF (taken from haxeVideo), ZIP, TAR and GZ (taken from package) and PDF. I’m planning to work on PE and DMG support as well at some point, since it would be nice to be able to create DMG in a crossplatform way (see my previous post about OSX and its comments). I’m also accepting contributions from other people, so I hope the library will grow with support for more file formats !

Anyway it’s always nice to see a library that parses a binary file and makes some sense of the garbage that is stored as bytes. It looks like some kind of magic to me. And working with file formats is also a very good way to learn (more) about programming. A file structure, when it’s well designed, gives a lot of information about the architecture of the program that reads/writes it. It’s rare to see a good file format with bad software, and in general good software has good file formats as well.

Few reasons I hate OSX

posted on 2008-08-02

I hate OSX. Really.

Don’t take it personally if you’re a Mac fan, I don’t have anything against Mac (as hardware - except the French keyboard layout maybe) but OSX is another story.

I’m an open source software developer and supporter of “available everywhere” software. I think that everybody has the right to choose the tool that suits him the most, and that - especially for OS - you should use the one you’re comfortable with. That’s why I want to provide proper releases of my software (haXe, Neko, …) for the most popular OS : Windows, Linux, and - sadly - OSX.

Since I don’t work on a Mac on a daily basis, I’m trying to automate things as much as possible in order not to have to switch computers for each release I want to build (remember : release early, release often !). And that’s where the nightmare starts…

For Windows/Linux, things are pretty easy. After building the small xCross Neko Library, I am able to open a small message window that displays the log messages. It was quite easy to do on Windows, since there is a lot of documentation available. On Linux, I used GTK2 and although I didn’t know it at all, I could make it work. But on OSX, it took me almost one full day to get a small window with auto-scrolling text to work ! Thanks to the very poor Apple OSX C API documentation… And trust me on this one, I’m usually quite able to understand how things work without documentation, but this was really hard.

Ok, back to the subject. Once the crossplatform xCross neko library was written, it was then possible to write my software directly in haXe. Since I wanted to automate things as much as possible, I wanted to be able to directly build binaries on Windows for Windows, OSX and Linux, without having to switch OS ! This is pretty easy to do with xCross : since it’s a Neko runtime with statically linked standard libraries, all you have to do is to append the neko bytecode that the haXe compiler outputs to the original binary, plus a small header to tell the VM that it should start the bytecode. That’s exactly what the nekotools boot command is doing, so I reused this behavior but this time with the xCross runtime.
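The idea of gluing bytecode onto a runtime can be sketched in a few lines of Python. Be careful : the layout below (a made-up “BOOT” marker followed by the size of the original runtime, stored at the very end of the file) is a hypothetical one chosen for illustration, not the actual format used by nekotools boot or xCross.

```python
import struct

MARKER = b"BOOT"  # hypothetical marker, not the real one


def make_bootable(runtime, bytecode):
    """Append the bytecode to the runtime, then a record telling where it starts."""
    return runtime + bytecode + MARKER + struct.pack("<I", len(runtime))


def extract_bytecode(image):
    """What the runtime would do at startup: look for embedded bytecode."""
    if len(image) < 8 or image[-8:-4] != MARKER:
        return None  # plain runtime, nothing embedded
    (start,) = struct.unpack("<I", image[-4:])
    return image[start:-8]


# Made-up bytes standing in for a real runtime binary and real bytecode.
image = make_bootable(b"RUNTIME-BINARY", b"\x01\x02\x03")
embedded = extract_bytecode(image)
```

Since this is just bytes appended to an existing file, the same runtime binary for each OS can be reused for every application, and no compiler or linker for the target platform is needed.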

Now, with one single command I could compile my haXe code, then produce 3 binaries for Windows, Linux and OSX. Everything crossplatform, perfect ! But the last step was the end of my success…

Once the application is built, you have to distribute it (so people can easily download + install it). On Windows, you can either directly distribute the exe, or zip it first. On Linux, gzipping the binary is quite common and users know how to handle it anyway. On OSX, there are several ways to package applications, but first you have to put your binary inside a .app directory (with some additional XML) so it’s recognized as an “application” and can receive user input : running a binary from the commandline will NOT enable you to get ANY kind of interactivity : the window will not even gain focus ! And once this is done, the most “easy” way to package your .app directory is to use a DMG.
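The nice thing is that a .app directory is only files and folders, so it can be created from any OS. Here is a minimal Python sketch of the usual layout (Contents/MacOS for the binary, Contents/Info.plist for the XML). The application name, the identifier and the fake binary are placeholders, and a real application will probably want more keys (icon, version…).

```python
import os
import plistlib
import shutil
import tempfile


def make_app_bundle(binary_path, app_name, dest_dir):
    """Create dest_dir/app_name.app around an existing binary and return its path."""
    app_dir = os.path.join(dest_dir, app_name + ".app")
    macos_dir = os.path.join(app_dir, "Contents", "MacOS")
    os.makedirs(macos_dir, exist_ok=True)

    # The executable lives in Contents/MacOS and must be marked executable.
    target = os.path.join(macos_dir, app_name)
    shutil.copyfile(binary_path, target)
    os.chmod(target, 0o755)

    # The "additional XML": a property list describing the application.
    info = {
        "CFBundleName": app_name,
        "CFBundleExecutable": app_name,
        "CFBundleIdentifier": "org.example." + app_name,  # placeholder identifier
        "CFBundlePackageType": "APPL",
    }
    with open(os.path.join(app_dir, "Contents", "Info.plist"), "wb") as f:
        plistlib.dump(info, f)  # written as XML by default
    return app_dir


# Try it in a temporary directory with a fake binary.
with tempfile.TemporaryDirectory() as tmp:
    fake_binary = os.path.join(tmp, "mytool.bin")
    with open(fake_binary, "wb") as f:
        f.write(b"not a real executable")
    app = make_app_bundle(fake_binary, "MyTool", tmp)
    with open(os.path.join(app, "Contents", "Info.plist"), "rb") as f:
        written_info = plistlib.load(f)
    has_binary = os.path.isfile(os.path.join(app, "Contents", "MacOS", "MyTool"))
```

Note that the executable flag set here is lost if the bundle is then zipped with a tool that does not store Unix permissions.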

For non-Mac users who might not know, DMG is a proprietary image format similar to ISO. There are two DMG2ISO tools available : one in Perl - which doesn’t seem to work with the latest DMG format - and the other is the hdiutil software which is only available on Mac. But I didn’t care about DMG2ISO : all I wanted was the ability to create a DMG on either Windows or Linux. The answer is : you can’t ! The only tool I found to create a DMG is on Mac. And worse : it doesn’t seem to be runnable from the commandline ! So for every release, you’ll have to select the directory, select your options blablabla, click click click then finally you get your DMG…

That’s not as automated as I wanted it to be, but back when I built the haXe Installer, I chose to stop there. My setup since then has been to use the MacBook (on which I have dual boot OSX/Ubuntu).

A few days ago, something caught my attention : VirtualBox is a virtual machine that can run Ubuntu and Windows very nicely on your PC. And even better, it’s GPL software, and it’s only a 20MB installer on Windows ! (as a side note, I always tend to prefer small software since it often means it’s not bloated)

With VirtualBox, I was able to install a Virtual Ubuntu on my Windows machine (note : I need Windows for daily Flash work), and compile haXe+Neko on it : perfect for testing / building releases ! Of course my first thought was to be able to run OSX as well ! That would be soooo nice, and I don’t mind buying an additional OSX license for it since it would save so much trouble switching computers. Actually, I would even put my Virtual Ubuntu and Virtual OSX (and why not one additional WinXP) on one of these 16GB USB keys to carry my fully configured OSes everywhere with me ! That would be cool !

Sadly, VirtualBox does not support OSX so far. Well, there are some issues with it, like SSE2/SSE3 emulation I guess, but more importantly : the OSX EULA doesn’t allow you to virtualize it ! It MUST run on Mac hardware ! There’s something wrong with this EULA, and not only from a moral point of view… I’m not a lawyer, but I’m pretty sure that this kind of thing is illegal in Europe, since it forces you to buy two distinct products as a whole (OSX and Mac hardware). Anyway.

Giving up on VirtualBox for now, my next step is to try installing + running OSX on the free VMWare Server, since it seems some people (Google for osx86) managed to do it. I really hope that VirtualBox (which is owned by Sun BTW) will be successful in supporting OSX in the future, since it’s a really good piece of software.

As a conclusion, it looks to me that OSX is an even more closed platform than Windows is ! Or maybe that’s not it. Maybe it’s simply that OpenSource programmers are making their software available on Windows because it’s the mainstream OS, while Mac is filled with companies selling you text editors for almost $50 ! And that will be the last thing I say I hate about OSX… at least for now.