This page is targeted at people interested in knowing how the transformer works in general, rather than the specifics of the transformations supported by this transformer.
The transformations are executed by checking and applying rules - created in a fixed order by a rule builder - against a tokenized version of the input statement. As the transformation needs to retrieve code from earlier in the statement, the rule builder allows rules to create scope-aware pointers that look back. I named these pointers anchors.
The first step in the language transformation is to chop the input statement into tokens that the transformation engine can work with. The tokenizer recognizes four kinds of tokens:
Each opening "(" and closing ")" bracket is a separate token.
While each language will need its own tokenizer, any language split into tokens in this way can be handled by the transformer.
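As a rough illustration of the bracket rule above, here is a minimal Python sketch (the real tokenizer is part of the PHP code and is not reproduced here). The only behaviour taken from the text is that every "(" and ")" becomes its own token. The name `tokenize`, the regular expression, and the decision to treat every other run of non-space characters as one generic token are assumptions made for the example; the other token kinds are not modelled.

```python
import re

# Assumption: brackets are single tokens; any other run of characters that
# contains no whitespace and no bracket is treated as one generic token.
TOKEN_PATTERN = re.compile(r"[()]|[^\s()]+")


def tokenize(statement: str) -> list[str]:
    """Split a statement into bracket tokens and generic word tokens."""
    return TOKEN_PATTERN.findall(statement)


tokens = tokenize("SELECT count(*) FROM (SELECT id FROM t)")
print(tokens)
```

Keeping brackets as separate tokens means a later stage can detect the start and end of a nested scope by looking at a single token, without re-parsing the text.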
My first, object-oriented approach to building rules was a dismal failure. The rule objects could only be executed one after the other, the approach was very memory intensive, and, most importantly, the objects caused side effects that proved hard to debug.
In the end I chose a more functional approach, using two "little languages": one language to check whether a rule should apply, and another to apply the change. Parsing little languages at runtime is slow, so the Rule Builder generates PHP code from them; that code is loaded at runtime and executes quickly. To give an impression of what happens: about 100 lines of little language statements generate about 1300 lines of code containing 78 function definitions for the transformer.
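The idea of generating code from a little language can be sketched as follows. This is a Python analogy, not the Rule Builder itself, which emits PHP. The two-word check syntax ("equals select", "endswith *") and the names `compile_check`, `check_select` and `check_star` are invented for the example; the actual little languages are not shown in this article.

```python
def compile_check(name: str, rule: str) -> str:
    """Translate one hypothetical check statement into Python source code."""
    keyword, _, operand = rule.partition(" ")
    if keyword == "equals":
        # Case-insensitive keyword comparison (an assumption of this sketch).
        body = f"token.upper() == {operand.upper()!r}"
    elif keyword == "endswith":
        body = f"token.endswith({operand!r})"
    else:
        raise ValueError(f"unknown check: {keyword}")
    return f"def {name}(token):\n    return {body}\n"


# Generate the source once, then load it so later calls are plain function calls.
source = (
    compile_check("check_select", "equals select")
    + compile_check("check_star", "endswith *")
)
namespace = {}
exec(source, namespace)

print(source)
print(namespace["check_select"]("Select"))
print(namespace["check_star"]("t.*"))
```

The little-language statements are parsed only once, at build time; at transformation time only the generated functions run.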
The check and change functions are called when required by the transformer, resulting in a step by step transformation of the original code. These checks and transformations work on two entities: code tokens and anchors. Code tokens consist of either tokens in the original code or new code defined in the change functions. Anchors are scope-related variables that point to code - either earlier in the transformed code or as a separate sequence of code tokens. Anchors are removed when a scope is exited, thus keeping memory usage at a minimum.
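A minimal sketch of scope-bound anchors, again in Python rather than the PHP of the actual transformer. The class name `AnchorScopes`, its method names, and the rule that a lookup searches from the innermost scope outwards are assumptions for illustration. The only behaviour taken from the text is that anchors point at earlier code and disappear when their scope is exited.

```python
class AnchorScopes:
    """Stack of scopes; each scope maps an anchor name to a token position."""

    def __init__(self):
        self._scopes = [{}]  # the outermost (statement-level) scope

    def enter_scope(self):
        self._scopes.append({})

    def exit_scope(self):
        if len(self._scopes) == 1:
            raise RuntimeError("cannot exit the outermost scope")
        self._scopes.pop()  # every anchor created in this scope is dropped

    def set_anchor(self, name, position):
        self._scopes[-1][name] = position

    def find_anchor(self, name):
        # Assumption: look back from the innermost scope to the outermost.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


tokens = ["SELECT", "a", "FROM", "(", "SELECT", "b", "FROM", "t", ")", "x"]
anchors = AnchorScopes()
for position, token in enumerate(tokens):
    if token == "(":
        anchors.enter_scope()
    elif token == ")":
        anchors.exit_scope()
    elif token == "SELECT":
        anchors.set_anchor("select", position)

# After the subquery has closed, only the outer SELECT anchor remains.
print(anchors.find_anchor("select"))
```

Because the inner scope is popped as soon as its closing bracket is seen, memory use is bounded by the nesting depth rather than by the length of the statement.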
Rules become context and scope aware by creating anchors on certain keywords and checking for these anchors in the rule. This allows dynamic, scoped use of the rules without dynamically updating the rule set.
Not everything could be implemented nicely in the little languages, so the code allows direct calls to PHP functions when necessary. One example is the code for checking whether a token ends with "*" or contains a dot. The core Else Logic transformations are also very complex and language dependent, and are therefore coded in PHP as well.
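To give an idea of the size of such a direct call function, the token check mentioned above could look like this in Python. The function names are hypothetical, and splitting the check into two helpers is a choice made for this sketch; the original PHP functions are not shown here.

```python
def ends_with_star(token: str) -> bool:
    """True for tokens such as "*" or "t.*"."""
    return token.endswith("*")


def contains_dot(token: str) -> bool:
    """True for qualified names such as "t.id"."""
    return "." in token


for token in ["*", "t.*", "t.id", "id"]:
    print(token, ends_with_star(token), contains_dot(token))
```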
The language definition and the transformations are defined by the SQL Transformer Builder. This is about 200 lines of code that form the SQL-specific part of the process. The first 100 lines define both the SQL keywords and the general rules for transformation. The other 100 lines define the direct call functions that I either could not implement using the little languages or that would have become too complex in them.
The Anchored Transformer is a generic transformer. It is instantiated for a particular language by the code generated by the Rule Builder and stores the transformation functions and the language definitions. It also contains functions that make the low-level transformations easy to code, both for the Rule Builder and for the direct call functions.
I would like to thank Eelco Visser and his coworker Lennart Kats for introducing me to the science behind programming language transformations and for their excellent work on the Stratego/XT program transformation project. Though I did not use Stratego/XT, their papers on programming language transformation helped me every time I got stuck while writing the transformation engine.