What’s next for Neko

posted on 2005-08-20

For people interested in the future of Neko, here are the things I’m planning to work on in the upcoming weeks:

NekoML is an ML-style language that compiles to Neko. It is close to OCaml but with what I think is a better syntax. Currently the NekoML compiler is written in OCaml, but once NekoML is ready I’ll rewrite first the Neko compiler and then the NekoML compiler in NekoML, to complete the bootstrapping of Neko.

Two important libraries are needed to implement the compilers: a Lexer and a Parser. Designing and writing these two, as well as efficiently compiling pattern matching from NekoML to Neko, are the most difficult parts.

Once that is done, things will start to get really interesting: being able to use the compiler as a Neko library opens the door to interactive programming (a console or “toplevel”). A JIT compiler written in NekoML is also expected.

NekoVM 1.0 Release

posted on 2005-08-17

Today the Neko website is online. Please visit it here: http://nekovm.org.

GCC Register Allocation

posted on 2005-08-11

Today I went very low level, reading through the x86 assembly code generated by GCC. I wanted to check whether the compiler was correctly allocating registers for the most important NekoVM variables. In the main loop there were three variables declared as “register”: the accumulator (the VM register), the VM stack pointer and the VM code pointer. These three are really important since you’re basically manipulating them all the time.

Looking at the code generated by gcc -O3 -S interp.c, I found that none of them was actually allocated to a register. Looking at the OCaml VM sources, I noticed that the interpreter binds these variables to specific registers using a GCC asm extension such as:

    register int *sp asm("%edi");

Doing that with NekoVM, I first got a weird, unreadable GCC error saying that it couldn’t allocate enough registers. I commented out and tested parts of the code to find what was not working. In several cases in the VM (adding a string and an integer, adding two strings, and resizing the stack) I was using some local variables that, for some reason, prevented the registers from being allocated.

I fixed it by moving these parts of the code into separate functions, which is not so bad because the call overhead is very low compared to the cost of the operations performed. I then somehow managed to get my registers allocated properly. But when running… segfault.

I had to read more documentation about manual register allocation to understand what was going wrong. According to the calling conventions, only some registers (on x86: %ebx, %ebp, %esi and %edi) are preserved across calls. One of my allocated registers was %ecx; it was being modified inside a call, which crashed the VM on return. I then moved the VM stack pointer to %edi and the VM code pointer to %esi.
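To make the idea concrete, here is a minimal sketch of an interpreter loop whose stack pointer and code pointer are pinned to %edi and %esi. It is not the NekoVM source: the SP_REG and PC_REG macros, the toy opcodes and the run function are all made up for illustration, and the pinning is only enabled when compiling with GCC for 32-bit x86 (elsewhere the macros expand to nothing and the compiler chooses freely).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros (not taken from the NekoVM sources): pin variables to
   callee-saved registers only under GCC on 32-bit x86, nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
#  define SP_REG __asm__("%edi")
#  define PC_REG __asm__("%esi")
#else
#  define SP_REG
#  define PC_REG
#endif

/* Toy opcode set, for illustration only. */
enum { OP_LOAD, OP_PUSH, OP_ADD, OP_HALT };

/* Runs `code` until OP_HALT and returns the accumulator.
   `stack` must be large enough for the program being run. */
int run(const int *code, int *stack)
{
    register int *sp SP_REG = stack;       /* VM stack pointer */
    register const int *pc PC_REG = code;  /* VM code pointer  */
    int acc = 0;                           /* accumulator, left to the compiler here */

    assert(code != NULL && stack != NULL);

    for (;;) {
        switch (*pc++) {
        case OP_LOAD: acc = *pc++;  break;  /* acc = immediate operand   */
        case OP_PUSH: *sp++ = acc;  break;  /* push acc on the VM stack  */
        case OP_ADD:  acc += *--sp; break;  /* pop and add to acc        */
        case OP_HALT: return acc;
        default:      return -1;            /* unknown opcode            */
        }
    }
}
```

Because %edi and %esi are preserved across calls, any helper function called from inside the loop leaves both pointers intact, which is exactly what went wrong with %ecx.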

For the accumulator, I could have used %ebx, but that would have left GCC unable to use one of the preserved registers, which might hurt performance. Since “acc” is most of the time assigned when returning from a call, I only added save & restore statements at a few opcodes, and bound it to %eax, which is the processor accumulator.

With only a few lines changed in the interpreter code, and some GCC-specific statements added here and there (macros enabled when __GNUC__ is defined), I got a 2x speedup on my favorite fibonacci(35) sample. That’s pretty good news! Tomorrow I will check how the Microsoft compiler allocates registers and whether I can deal with it the way I did with GCC. I also need to collect some statistics about which opcodes are used most often on the Dinoparc website, to see if I can specialize some of them a little to get an additional speedup. Right now the VM only has 54 opcodes, so there are more slots available.
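Collecting the opcode statistics could look roughly like the sketch below. It assumes a trace of executed opcodes has been recorded as one byte per opcode; the trace format and the count_opcodes and hottest_opcode names are hypothetical, only the figure of 54 opcodes comes from the VM.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OPCODES 54  /* number of opcodes currently in the VM */

/* Tally how often each opcode appears in a recorded trace.
   `counts` must have NUM_OPCODES entries; out-of-range bytes are skipped. */
void count_opcodes(const unsigned char *trace, size_t len,
                   unsigned long counts[NUM_OPCODES])
{
    size_t i;

    assert(trace != NULL || len == 0);

    for (i = 0; i < NUM_OPCODES; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        if (trace[i] < NUM_OPCODES)
            counts[trace[i]]++;
}

/* Return the most frequent opcode (the lowest index wins ties). */
int hottest_opcode(const unsigned long counts[NUM_OPCODES])
{
    int best = 0, op;

    for (op = 1; op < NUM_OPCODES; op++)
        if (counts[op] > counts[best])
            best = op;
    return best;
}
```

The opcodes that come out on top are the natural candidates for specialized variants in the free slots.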

MTASC OSCON Talk final slides

posted on 2005-08-06

Here are the final slides for the MTASC talk that I gave at OSCON yesterday.
They include some information that was not available in the previous slides.

mtasc_oscon.pdf

Thank you to all the people who came to the talk, and hello to all the people I met at OSCON!

Neko pystone

posted on 2005-08-03

Yesterday I went to the IronPython tutorial at OSCON. It was pretty interesting; it reminded me of the work I did on OCamlOLE (OLE/COM bindings for OCaml), which involved generating code for accessing OLE components from a TypeLibrary.

From a performance point of view, IronPython claims a 1.8x speedup over Python 2.4 on the pystone benchmark. This morning I wrote a Neko version of the pystone benchmark and ran it on my laptop. I got a 3x speedup over Python 2.4, which confirms previous benchmarks using Fibonacci numbers. And there is no JIT yet…

(Please note that integers in Python are objects and are automatically converted to bignums when an operation overflows, so a lot more checks and operations are done compared to Neko integers.)