Nicolas Cannasse WebLog » Blog Archive

Xml and Types

posted on 2005-06-10

Today’s web applications rely on a lot of dynamically typed data sources such as Xml, databases, and HTTP GET/POST parameters. While HTTP parameters can most of the time be secured and type-checked at compile time, using closures for example, Xml and databases are still dynamically typed most of the time.

Of course you can still use Xml and databases in strictly typed programs, but you’ll use them in an unsafe manner (through hashtables, DOM, or SQL strings) and you’ll have to check a lot of types at runtime in order to secure your program logic against unexpected input. Some technologies have been created to automate the process: a DTD, for example, ensures the correctness of the structure of an Xml document, and Schema goes even further by also ensuring the types of some attributes. However, even after being checked against a DTD or a Schema, an Xml document remains an Xml document and an SQL resultset remains an SQL resultset. That means the programmer still has to access it in an unsafe manner, relying only on the structure that was checked earlier.

Let’s take an example: load an Xml file <person name="Nicolas"/> and check the structure of that Xml after parsing it. In all Xml apis you can still try to access a “nome” attribute of this Xml. You obviously made a typo, and that error could and should be caught at compile time if your language were able to understand the Schema/DTD you’ve just been using.
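To make the problem concrete, here is a minimal sketch using Python’s standard xml.etree.ElementTree module (my choice for illustration, the point holds for any DOM-like api). The misspelled attribute is accepted silently:

```python
import xml.etree.ElementTree as ET

# Parse the one-element document from the example.
person = ET.fromstring('<person name="Nicolas"/>')

# Correct attribute name: returns the value.
name = person.get("name")

# Misspelled attribute name: nothing complains, neither when the
# program is loaded nor when the Xml is parsed. Element.get simply
# returns its default value, None.
nome = person.get("nome")

print(name)  # Nicolas
print(nome)  # None
```

The typo only shows up later, wherever the None value ends up being used.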

There are two groups, with different reactions to this problem:

- the dynamic group, which deals with dynamically typed input by using dynamically typed languages such as PHP, Python, or Ruby: avoiding the difficulty of giving a type to dynamically typed things, they just access the data as easily as possible, without any heavyweight syntax. It works if you know what you’re doing, but it can be very easy to break, especially if your Xml/database structure changes during the lifetime of your application, since you’ll then have to review all the code (or run it and check all cases) to ensure that no access is made to some no-longer-available attribute or result field.
- the offline generation group, especially Java people: in order to ensure type safety at compile time, they use offline code generators that translate the Xml/database structure into source code with all the needed wrappers. Once converted to a class, your database table is abstracted and you can manipulate it as a Java object, sometimes with some heavyweight syntax (calling get/set methods to access a field, since this can lead to side effects, …).

The generators assume that it’s possible to correctly represent the Xml or database structure in the target language’s type system, and vice versa, which is not always the case! Let’s take the (Mother, Father, Child) example. A Mother and a Father can both have a Child, or rather a Child will always have exactly one Mother and one Father. But Mothers and Fathers can have several children. How do you represent that relational structure in Xml?

The answer is easy: look at databases. When you deal with such cases, or more generally with mutually recursive structures, you have to assign an id to each node and have nodes reference each other through these ids. DTD and Schema do not let you fully check the correctness of such structures, so there are some types or database tables that can be represented in Xml but that can’t be checked using DTD/Schema. That’s a problem…
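As an illustration, here is a small sketch of the id approach. The element names, the id values and the dangling_references helper are all made up for this example, they don’t come from any standard. Each child points to its parents by id, and the check that every reference resolves has to be written by hand:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: parents carry ids, children reference them.
FAMILY_XML = """
<family>
  <mother id="m1" name="Alice"/>
  <father id="f1" name="Bob"/>
  <child id="c1" name="Carol" mother="m1" father="f1"/>
  <child id="c2" name="Dave" mother="m1" father="f9"/>
</family>
"""

def dangling_references(root):
    """Return (child id, role, missing id) for each reference that does not resolve."""
    ids = {
        "mother": {m.get("id") for m in root.findall("mother")},
        "father": {f.get("id") for f in root.findall("father")},
    }
    errors = []
    for child in root.findall("child"):
        for role in ("mother", "father"):
            ref = child.get(role)
            # A reference is valid only if it names a node of the right kind.
            if ref not in ids[role]:
                errors.append((child.get("id"), role, ref))
    return errors

root = ET.fromstring(FAMILY_XML)
errors = dangling_references(root)
print(errors)  # [('c2', 'father', 'f9')]
```

Here one mother has two children, and the second child references a father that does not exist, which only this hand-written runtime check detects.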

The idea is the following: if the goal is to be able to represent your Xml or database in your object type system, why not describe the structure in your language? And why not automate the checking, then? All you need is an additional “prove” keyword that checks a structure against a type at runtime and returns an instance of that type if it is correct:

MyClass a = prove(xml);

This proves that the Xml is structurally equivalent to your type MyClass and returns a MyClass instance if it is, or raises an exception if an error occurs. Of course in that case MyClass is a “pure” data structure with no methods. The same technique can be used for database result sets and can even be optimized to be performed only the first time (assuming that you don’t modify the database structure while you’re running the application).
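Python has no such keyword, but the idea can be approximated with a plain function. The sketch below is only one possible reading of “prove”: it assumes a flat element whose attributes map one to one onto the fields of a dataclass, and the Person type, its age field and the ProveError exception are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

class ProveError(Exception):
    """Raised when the Xml does not match the target type (hypothetical name)."""

@dataclass
class Person:
    # A "pure" data structure: fields only, no methods.
    name: str
    age: int  # assumed extra field, to show a typed attribute

def prove(cls, element):
    """Check `element` against the dataclass `cls` and build an instance."""
    tag = cls.__name__.lower()
    if element.tag != tag:
        raise ProveError(f"expected <{tag}>, got <{element.tag}>")
    expected = {f.name: f.type for f in fields(cls)}
    unknown = set(element.attrib) - set(expected)
    if unknown:
        raise ProveError(f"unexpected attributes: {sorted(unknown)}")
    values = {}
    for attr, typ in expected.items():
        raw = element.get(attr)
        if raw is None:
            raise ProveError(f"missing attribute: {attr}")
        try:
            values[attr] = typ(raw)  # e.g. int("30")
        except ValueError as exc:
            raise ProveError(f"attribute {attr!r} is not a valid {typ.__name__}") from exc
    return cls(**values)

person = prove(Person, ET.fromstring('<person name="Nicolas" age="30"/>'))
print(person)  # Person(name='Nicolas', age=30)

try:
    prove(Person, ET.fromstring('<person nome="Nicolas" age="30"/>'))
except ProveError as err:
    print(err)  # unexpected attributes: ['nome']
```

Once the instance exists, the structure check has been done once, at the boundary, and a misspelled person.nome can be reported by a static checker such as mypy instead of failing somewhere deep in the program.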

There is still a lot of work to do:
- define a standard for relations (“ids”) between objects
- fully define how an Xml document maps to your language’s data structures
- be able to convert one of your language’s data structures back to Xml
- transparently handle mutations of fields that might trigger side effects in the case of a database
- and even more…
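For the simplest case, converting a data structure back to Xml could look like the sketch below. It makes the same assumptions as a flat attribute-only mapping (one element per object, one attribute per field), and the Person type and the to_xml helper are hypothetical names for this example. Relations and nested structures are left open, as in the list above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Person:
    name: str
    age: int  # assumed field, for illustration only

def to_xml(obj):
    """Build an element named after the class, with one attribute per field."""
    element = ET.Element(type(obj).__name__.lower())
    for f in fields(obj):
        # Xml attributes are always strings, so every value is stringified.
        element.set(f.name, str(getattr(obj, f.name)))
    return element

xml_text = ET.tostring(to_xml(Person("Nicolas", 30)), encoding="unicode")
print(xml_text)
```

Note that the type information is lost on the way out (30 becomes "30"), which is exactly why the way back in needs a type to check against.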
