Xml and Types

posted on 2005-06-10

Today web applications use a lot of dynamically typed features such as Xml, databases, and HTTP GET/POST parameters. While HTTP parameters can most of the time be secured and type-checked at compile time, using closures for example, Xml and databases are still dynamically typed most of the time.

Of course you can still use Xml and databases in strictly typed programs, but you’ll use them in an unsafe manner (through hashtables, DOM, or SQL strings) and you’ll have to check a lot of types at runtime in order to secure your program logic against unexpected input. Some technologies have been created to automate the process: a DTD, for example, ensures the correctness of the structure of an Xml document, and Schema goes even further by also ensuring the types of some attributes. However, even after being checked against a DTD or a Schema, an Xml remains an Xml and an SQL resultset remains an SQL resultset. That means the programmer will still have to access it in an unsafe manner that only relies on the previously checked structure.

Let’s take an example: load an Xml file <person name="Nicolas"/> and validate the structure of that Xml after parsing it. In all Xml APIs you can still try to access a “nome” attribute of this Xml, although you obviously made a typo, and that error could and should be caught at compile time if your language were able to understand the Schema/DTD you’ve just been using.
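To make the “nome” problem concrete, here is a minimal sketch in Python, using the standard xml.etree.ElementTree module as a stand-in for any dynamically typed Xml API (the choice of Python is only for illustration). The misspelled attribute lookup is accepted without complaint and the mistake only shows up later, at runtime, as a missing value:

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person name="Nicolas"/>')

# Correct attribute name: the value is found.
name = person.get("name")

# Misspelled attribute name: nothing is reported, the lookup simply
# yields None and the error surfaces wherever the value is used next.
nome = person.get("nome")

print(name, nome)
```

Nothing in the program text distinguishes the valid access from the invalid one, which is exactly what a type derived from the Schema/DTD would fix.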

There are two groups with different reactions to this problem:

  • the dynamic group, which deals with dynamically typed input by using dynamically typed languages such as PHP, Python, or Ruby: avoiding the difficulty of giving a type to dynamically typed things, they just access them as easily as possible, without any heavyweight syntax. It works if you know what you’re doing, but it can be very easy to break, especially if your Xml/database structure changes during your application’s lifetime, since you’ll then have to review all the code (or run it and check all cases) to ensure that nothing accesses a no-longer-available attribute or result field.
  • the offline generation group, especially Java people: in order to ensure type safety at compile time, they use offline code generators that translate the Xml/database structure into source code with all the needed wrappers. Once converted to a class, your database table is abstracted and you can manipulate it as a Java object, sometimes with some heavyweight syntax (calling get/set methods to access a field, since this can lead to side effects, …).

The generators assume that it’s possible to correctly represent the Xml or database structure in the target language’s type system, and vice versa, which is not always the case! Let’s take the (Mother, Father, Child) example. Mother and Father can both have a Child, or rather a Child will always have exactly one Mother and one Father. But Mothers and Fathers can have several children. How do you represent that relational structure in Xml?

The answer is easy: look at databases. When you deal with such cases, or more generally with mutually recursive structures, you have to assign ids to each node and make nodes reference each other through those ids. DTD and Schema do not let you fully check the correctness of such structures (a DTD IDREF, for example, only guarantees that the id exists somewhere in the document, not that it points to the right kind of node), so there exist some types or database tables that can be represented in Xml but that can’t be checked using DTD/Schema. That’s a problem…
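As an illustration of the id-based encoding, here is a small Python sketch. The element names, the id/mother/father attributes and the broken_references helper are all assumptions made up for this example, not an existing standard. It shows the kind of check that has to be written by hand: every reference must point to a node of the right kind.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: each node carries an "id", and a child refers
# to its parents through "mother" and "father" attributes.
DOC = """
<family>
  <mother id="m1" name="Alice"/>
  <father id="f1" name="Bob"/>
  <child id="c1" name="Carol" mother="m1" father="f1"/>
  <child id="c2" name="Dave" mother="m1" father="m1"/>
</family>
"""

def broken_references(root):
    """Return (child id, role, referenced id) for every bad reference."""
    # Map each id to the kind (tag) of the node that declares it.
    kind_by_id = {n.get("id"): n.tag for n in root.iter()
                  if n.get("id") is not None}
    broken = []
    for child in root.iter("child"):
        for role in ("mother", "father"):
            ref = child.get(role)
            # The reference must exist AND point to a node of that role.
            if kind_by_id.get(ref) != role:
                broken.append((child.get("id"), role, ref))
    return broken

root = ET.fromstring(DOC)
broken = broken_references(root)
print(broken)
```

Here the second child names a mother as its father: the id exists, so a check on id existence alone would accept the document, while the check above reports it.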

The idea is the following: if the goal is to be able to represent your Xml or database in your object type system, why not describe the structure in your language? And why not automate the checking then? All you need is an additional “prove” keyword that checks a structure against a type at runtime and returns an instance of that type if it is correct:

MyClass a = prove(xml);

This will prove that the xml is structurally equivalent to your type MyClass and will return a MyClass instance if it is, or raise an exception otherwise. Of course in that case MyClass is a “pure” data structure with no methods. The same technique can be used for database result sets and can even be optimized to be performed only the first time (assuming that you don’t modify the database structure while you’re running the application).
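A rough approximation of “prove” can be written as a library function today. The sketch below is in Python and rests on several assumptions of mine: the type is a dataclass, each field maps to an Xml attribute of the same name, the element name is the lowercased class name, and the Person type, its age field and ProveError are invented for the example. It is not the keyword itself, only the runtime check it would perform.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

class ProveError(Exception):
    """Raised when the Xml does not match the expected type."""

@dataclass
class Person:
    # A "pure" data structure: fields only, no methods.
    name: str
    age: int

def prove(cls, node):
    """Check `node` against dataclass `cls`; return an instance or raise."""
    tag = cls.__name__.lower()
    if node.tag != tag:
        raise ProveError(f"expected <{tag}>, got <{node.tag}>")
    expected = {f.name: f.type for f in fields(cls)}
    unknown = set(node.attrib) - set(expected)
    if unknown:
        raise ProveError(f"unexpected attributes: {sorted(unknown)}")
    values = {}
    for name, typ in expected.items():
        if name not in node.attrib:
            raise ProveError(f"missing attribute: {name}")
        try:
            # Convert the raw string to the declared field type.
            values[name] = typ(node.attrib[name])
        except ValueError as exc:
            raise ProveError(f"attribute {name} is not a valid {typ.__name__}") from exc
    return cls(**values)

person = prove(Person, ET.fromstring('<person name="Nicolas" age="30"/>'))

try:
    prove(Person, ET.fromstring('<person nome="Nicolas" age="30"/>'))
    rejected = False
except ProveError:
    rejected = True
```

After the call, person is an ordinary typed value and the rest of the program never touches the Xml again; the misspelled document is rejected once, at the boundary.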

There is still a lot of work to do:
- define a standard for relations (“ids”) between objects
- fully define how an Xml maps to your language’s data structures
- be able to convert one of your language’s data structures back to Xml
- transparently handle the mutations of fields that might trigger side effects in the case of a database
- and even more…

Flash Open Source

posted on 2005-06-05

Recently John Dowdell showed up on the OSFlash mailing list and asked for comments on the GPLFlash announcement and on open sourcing Flash in general.

I answered with the following mail:

Let me state my point of view about the current state of OSS and Flash.

A few years ago, Macromedia Flash had two strengths: its Flash Player, which was well distributed and installed on a high percentage of computers, and its Flash IDE, which was popular with designers. Both were very important and made the success we know.

Now with the whole RIA thing, developers are starting to evaluate platforms based not on the IDE but on programming language features and available tools. Let’s see what Macromedia Flash has:

- a loosely typed language (AS2) which cannot compete with Java / C#
- a very slow virtual machine implementation
- a buggy and slow compiler
- an IDE that nobody wants to use for coding big applications
- very few dev tools: only a debugger, no profiling, no runtime error checking, …
- no external (commandline) tools that enable users to reuse their favorite IDE
- a closed source platform

Flash can hardly compete as a developer platform, so the only big strength left is the Flash Player’s ubiquity: I’m not sure that’s enough to convince more developers to join. MTASC for instance was born from these frustrations: at Motion-Twin we no longer use AS2 but we have our in-house programming language for writing games, our own compiler and a good set of tools (swf linker, xml dtd checking, obfuscator…) that made us 10 times more productive than before. If we had a choice of platform we would have switched a long time ago!

Macromedia is too slow to adapt to its new audience: developers have different needs than designers. In particular, they already have their favorite IDE and want good and flexible technology. Stability and reusability are two important points, and full integration is no longer relevant.

What about GPLFlash? There have been several projects like that in the past. Writing a SWF player is quite an easy thing to do with the OSS libraries currently available - and I know what I’m talking about. Macromedia should realize that and hurry up to cooperate with such a project. It’s more dangerous to let an OSS player get out of control than to endorse the project, give away some MM Player sources and technology tips to help build a better player that would in the end replace the official player. Look at Real and their Helix thing.

But I’m not sure that the marketing/business people who are driving Macromedia will understand these quite new concepts, such as OSS joint projects, and believe in them enough to give them a try. That lack of openness, and the long time needed to realize it, might eventually kill the Flash Platform idea in the near future.

NekoML

posted on 2005-06-01

Now that I have a Neko compiler and VM, the next step is to bootstrap the language. For that, a typed language on top of Neko is needed. I call it NekoML for now; it’s like OCaml with a more Neko-like syntax (easier to learn for people coming from C/Java), and it adds some interesting features such as polymorphic number operations (+ works for int, float and string).

The language is working but it still needs exceptions, the module system with separate compilation, and the check for pattern matching completeness. Once everything is ok, I will add a generator from NekoML to Neko so it can be used as a higher-level statically typed programming language. After that, the goal is to rewrite the Neko and the NekoML compilers in NekoML to get self-compilation and get rid of OCaml.