R Language | CodeForNothing

Building a grammar for the R Language

One of the lesser-known features of GeneXproTools is that users can add support for new programming languages. Out of the box, GeneXproTools can export the models it creates to fifteen or sixteen programming languages, depending on the type of problem. The “secret” to this flexibility is that, internally, GeneXproTools uses Karva notation, a very compact language that is easily converted into a tree. This tree can then be cross-compiled into almost any programming language. As in a traditional compiler, this component of GeneXproTools is divided into a frontend and a backend. The frontend processes the Karva notation as described above, whereas the backend, assisted by a resource, or grammar, recreates the model in the programming language of choice. Adding a new programming language is as simple as creating a new grammar.
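To make the frontend/backend split more concrete, here is a minimal sketch of how a Karva expression could be decoded into a tree and then emitted as JavaScript. It is not GeneXproTools' actual implementation: the ARITY table, the function names and the one-character-per-symbol encoding are assumptions made for the illustration.

```javascript
// Hypothetical arity table for a handful of symbols (Q stands for square root).
const ARITY = { "+": 2, "-": 2, "*": 2, "/": 2, Q: 1 };

// Frontend: decode a K-expression into a tree.
// Karva notation lists the nodes level by level, left to right.
function karvaToTree(kexpr) {
  const symbols = kexpr.split("");
  const root = { symbol: symbols[0], children: [] };
  const queue = [root];
  let next = 1; // index of the next unread symbol
  while (queue.length > 0) {
    const node = queue.shift();
    const arity = ARITY[node.symbol] || 0; // symbols not in the table are terminals
    for (let i = 0; i < arity; i++) {
      const child = { symbol: symbols[next++], children: [] };
      node.children.push(child);
      queue.push(child);
    }
  }
  return root;
}

// Backend: walk the tree and emit infix JavaScript.
function toJavaScript(node) {
  const args = node.children.map(toJavaScript);
  if (args.length === 0) return node.symbol;
  if (node.symbol === "Q") return "Math.sqrt(" + args[0] + ")";
  return "(" + args[0] + node.symbol + args[1] + ")";
}

console.log(toJavaScript(karvaToTree("Q*-+abcd")));
// prints: Math.sqrt(((a-b)*(c+d)))
```

Only toJavaScript knows anything about the target language, which is the part a grammar file takes over.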

GeneXproTools’ Grammar Concepts

A grammar is simply an XML file that tells GeneXproTools how each function is represented in the language of that grammar. Taking the JavaScript grammar as an example, the power function is described as:

Let’s ignore the first attribute for the moment. The second one, terminals, defines the arity of the function (the number of inputs). The third and last attributes are fixed values used internally by GeneXproTools and cannot be changed. Finally, the inner text of the node is the description of the power function as it will appear in the code. Note that the inputs must always have the format x0, x1, …, xn-1, where n is the arity of the function. As of this writing each grammar defines 279 functions, so to save time it is a good idea to start the new grammar by copying one whose language resembles the new one.
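The x0, x1… placeholders can be pictured as slots that get filled with the code of each input. The sketch below shows one way this substitution could work; fillTemplate is a hypothetical name, and the template text Math.pow(x0,x1) is an assumption based on the JavaScript model shown later in this post.

```javascript
// Fill a grammar template such as "Math.pow(x0,x1)" with argument expressions.
function fillTemplate(template, args) {
  // Replace every xN token in a single pass. The regex takes the whole run of
  // digits, so x10 is never mistaken for x1, and substituted text is not rescanned.
  return template.replace(/x(\d+)/g, (match, index) => {
    const arg = args[Number(index)];
    return arg === undefined ? match : arg;
  });
}

console.log(fillTemplate("Math.pow(x0,x1)", ["d[3]", "3"]));
// prints: Math.pow(d[3],3)
```

A scheme like this also suggests why a literal x needs special care inside a template: anything that looks like x followed by digits is treated as an input.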

Some functions are special and require that the first attribute, uniontype, be defined. These are the functions used to link the various genes in the model. At the moment there are four such functions per grammar; here is the JavaScript addition function as an example:

It is quite simple. The uniontype contents match the beginning of each line of code in a model. A typical model in JavaScript would be:

function gepModel(d)
{
    var vTemp = 0.0;

    vTemp = (d[0]*((Math.sqrt(d[3])+d[2])+(Math.sqrt(d[1])/d[2])));
    vTemp += ((d[2]*d[3])*Math.sqrt((Math.pow(d[3],3)*(d[1]/d[2]))));
    vTemp += (d[2]*(((d[3]*d[3])*(d[1]-d[3]))+((d[3]/d[1])/d[1])));

    return vTemp;
}

The uniontype produces the beginning of each of those lines, such as vTemp +=. When we translate the same model to MATLAB, the result is:

function result = gepModel(d)

varTemp = 0.0;

varTemp = (d(1)*((sqrt(d(4))+d(3))+(sqrt(d(2))/d(3))));
varTemp = varTemp + ((d(3)*d(4))*sqrt(((d(4)^3)*(d(2)/d(3)))));
varTemp = varTemp + (d(3)*(((d(4)*d(4))*(d(2)-d(4)))+((d(4)/d(2))/d(2))));

result = varTemp;

In this case the corresponding uniontype is defined as such:

So tempvarname is the variable that accumulates the model’s results, symbol is the operator used to link the genes, and member represents the body of the gene. Again, if you choose a grammar that resembles the new language as a starting point, you can pretty much leave these attributes unchanged.
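As a rough illustration of how these three placeholders combine into a line of code, here is a small sketch. The expand function is hypothetical, and the MATLAB-style uniontype string is an assumption inferred from the generated MATLAB code above, not a copy of the real grammar entry.

```javascript
// Expand {name} placeholders in a template; unknown names are left untouched.
function expand(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match);
}

// Assumed MATLAB-style uniontype (inferred from the model output, not from the grammar file).
const uniontype = "{tempvarname} = {tempvarname} {symbol} {member}";

const line = expand(uniontype, {
  tempvarname: "varTemp",  // accumulator variable
  symbol: "+",             // linking operator
  member: "(d(3)*d(4))",   // body of one gene
});

console.log(line + ";");
// prints: varTemp = varTemp + (d(3)*d(4));
```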

Another important aspect of the grammars is the “helper functions”. These are functions that require a special implementation for that language. For example, Visual Basic does not have a native Mod function, so we have to define it as a callable or helper function. In this case the function is defined as:

<function uniontype="" terminals="2" symbol="Mod" idx="4">gepMod(x0,x1)</function>

And the function gepMod is defined in the helpers section of the grammar as such:

<helper replaces="Mod">Function gepMod(ByVal x As Double, ByVal y As Double) As Double{CRLF}{TAB}gepMod = ((x / y) - Fix(x / y)) * y{CRLF}End Function{CRLF}</helper>

As you can see, there are special tokens to help with formatting the body of the function. The x character is also reserved and should be replaced with {CHARX}.
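To see what these tokens amount to, the following sketch expands them in the helper text above. The mapping is my assumption based on the token names ({CRLF} as a carriage return plus line feed, {TAB} as a tab, {CHARX} as a literal x), and expandLayout is a hypothetical function, not part of GeneXproTools.

```javascript
// Assumed meaning of the layout tokens mentioned in the post.
const TOKENS = { "{CRLF}": "\r\n", "{TAB}": "\t", "{CHARX}": "x" };

// Replace every known layout token with the text it stands for.
function expandLayout(text) {
  return text.replace(/\{(CRLF|TAB|CHARX)\}/g, (token) => TOKENS[token]);
}

const helper =
  "Function gepMod(ByVal x As Double, ByVal y As Double) As Double{CRLF}" +
  "{TAB}gepMod = ((x / y) - Fix(x / y)) * y{CRLF}End Function{CRLF}";

// Prints the helper as three lines of Visual Basic, the middle one indented.
console.log(expandLayout(helper));
```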

There are several other aspects to the grammars that I am not going to cover in this blog entry, but if you get stuck, contact me either here at this blog or through Gepsoft’s support.

Grammar Functions

The first thing to do is to choose an existing grammar as a starting point. R is similar to both JavaScript and Visual Basic (at least from a grammar-building point of view), and I ended up selecting the latter because R’s power operator matches Visual Basic’s. The GeneXproTools grammars live in the folder C:\Program Files (x86)\GeneXproTools 43\grammars\ and there are two types: the Math grammars and the Boolean grammars. The Boolean grammars are used to generate code for Logic Synthesis models, whereas the Math grammars are used for all other model types. In this post I am not covering the Boolean grammars (although they are basically the same), so I started by duplicating the file vb.Math.00.default.grm.xml and renaming the copy to r.Math.00.default.grm.xml. The second step is to open the file in a text editor such as Notepad (I suggest Notepad2 because it colorizes the contents nicely and validates the XML while you write) and change the grammar’s first node to:

If you now start GeneXproTools, open a run and go to the Model Panel, you will find that the R Language entry was added to the Languages list:

[Screenshot: R Language in GeneXproTools]

The next step is quite labour intensive and entails translating all the functions from Visual Basic to R. We start by translating the uniontypes. The only difference here is the equals sign, which becomes the <- assignment operator in R, so R’s uniontypes will have this format:

uniontype="{tempvarname} &lt;- {tempvarname} {symbol} {member}"

Note that the less-than symbol must be encoded as &lt;, otherwise the grammar would not be a valid XML file. With this done we jump into the list of functions. Many functions can be left untouched, but others must be translated to their R equivalents. Most of the time it is a matter of small differences (for example, “Log” translates to “log”), but some are more complex, such as the 3Rt function, which requires a helper function in Visual Basic but can be expressed inline with a conditional expression in R. This process is error prone, and it is a good idea to open the grammar every now and then in a browser that validates the XML, such as Internet Explorer. Some judicious search and replace can also greatly reduce the burden of hand editing each function.
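If you generate or batch-edit grammar entries with a script, the encoding can be automated. Below is a minimal sketch of such an escaping step; escapeXml is a hypothetical helper written for this post, not something that ships with GeneXproTools.

```javascript
// Escape the characters that are special in XML attribute values and text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")  // must run first so later entities are not escaped twice
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const rUniontype = "{tempvarname} <- {tempvarname} {symbol} {member}";
console.log('uniontype="' + escapeXml(rUniontype) + '"');
// prints: uniontype="{tempvarname} &lt;- {tempvarname} {symbol} {member}"
```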

Another major part of the work involves translating the helper functions from Visual Basic to R. These functions are under the helpers node and are mostly one- or two-liners. The only odd bit is the layout rules: whenever you need to insert a tab you must add {TAB}, and to add a new line you have to use {CRLF}. In most cases you do not need to worry about the layout of the helpers unless you want to prettify the code. Interestingly, the R grammar is probably the grammar with the fewest helpers of all.
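When translating a helper it pays to check what the original actually computes before swapping in a native operator. As an illustration, here is a JavaScript port of the Visual Basic gepMod helper shown earlier, next to a floor-based remainder, which is the convention R’s %% operator follows. Whether this difference matters for a given grammar is an assumption you would need to verify with your own models; the function floorMod is mine.

```javascript
// Port of the Visual Basic helper: Fix() truncates toward zero, like Math.trunc().
function gepMod(x, y) {
  return ((x / y) - Math.trunc(x / y)) * y;
}

// Floor-based remainder: the result takes the sign of the divisor.
function floorMod(x, y) {
  return x - Math.floor(x / y) * y;
}

// For a negative dividend the two conventions disagree:
// gepMod gives roughly -1 (up to floating-point rounding), floorMod gives 2.
console.log(gepMod(-7, 3), floorMod(-7, 3));
```

For positive inputs both agree, so a test set containing only positive values would not reveal the difference.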

Finalizing the Grammar

After all the functions and helpers have been translated we are left with a few loose ends to fix. Firstly let’s look at the “headers” node. The header corresponds to the model’s function declaration, for example:

<header type="default" replace="no">Function gepModel(ByRef d() As Double) As Double{CRLF}</header>

Which must be changed to:

<header type="default" replace="no">gepModel &lt;- function(d){CRLF}{</header>

There are two entries in the headers: the first one is the generic case and the second one is specific to Classification runs that require the declaration of a variable called ROUNDING_THRESHOLD. Again, it is a simple case of changing the bits that are different in R while maintaining the same semantics of the code.

The next node that needs a bit of tweaking is the randomconstants node. This node encodes the declaration of the random constants used in the model. They are declared as constants (Const) in Visual Basic, but in R they are simple variables.

<randomconst type="default" replace="no">{TAB}Const {labelname} As Double = {labelindex}{CRLF}</randomconst>

Changes to:

<randomconst type="default" replace="no">{TAB}{labelname} &lt;- {labelindex}{CRLF}</randomconst>

The constants node is very similar to the previous one and only requires a small adjustment. Here are both versions:

<randomconst type="default" replace="no">{TAB}Const {labelname} As Double = {labelindex}{CRLF}</randomconst>

Changes to:

<randomconst type="default" replace="no">{TAB}{labelname} &lt;- {labelindex}{CRLF}</randomconst>

Similar changes must be applied to the footers node.

The last nodes needed to complete the grammar are parenstype, where a value of 1 means square brackets and 0 means normal brackets; commentmark, which must be set to # for the R Language; and startindex, which is the lower bound of a list or an array (1 in this case).
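To show how parenstype and startindex interact, here is a small sketch that renders a reference to an input variable under different settings. The function renderInput and the variable name d are illustrative assumptions (d is simply the name used in the models above); this is not how GeneXproTools is implemented internally.

```javascript
// Render a reference to input variable number i (counted from zero).
// parenstype: 1 = square brackets, 0 = normal brackets.
// startindex: lower bound of arrays in the target language.
function renderInput(i, parenstype, startindex) {
  const [open, close] = parenstype === 1 ? ["[", "]"] : ["(", ")"];
  return "d" + open + (i + startindex) + close;
}

console.log(renderInput(3, 1, 0)); // JavaScript style: d[3]
console.log(renderInput(3, 0, 1)); // MATLAB style:     d(4)
console.log(renderInput(3, 1, 1)); // R style:          d[4]
```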

These adjustments bring the grammar very close to completion and are enough for it to work correctly. The next step is testing the grammar, a rather more involved process that entails creating models with all the functions and testing them with different sets of data to ensure that the results of the grammar-generated code are as close as possible to the native processing of GeneXproTools.
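A simple way to picture this kind of test is to run two translations of the same model on the same rows and compare the outputs within a tolerance. The sketch below does this with the JavaScript model from earlier in this post and a hand port of its MATLAB counterpart. The input rows are made up for illustration, and matlabPort and closeEnough are my own helper names; a real test would compare against the values computed by GeneXproTools itself.

```javascript
// The JavaScript model from earlier in the post.
function gepModel(d) {
  var vTemp = 0.0;
  vTemp = (d[0]*((Math.sqrt(d[3])+d[2])+(Math.sqrt(d[1])/d[2])));
  vTemp += ((d[2]*d[3])*Math.sqrt((Math.pow(d[3],3)*(d[1]/d[2]))));
  vTemp += (d[2]*(((d[3]*d[3])*(d[1]-d[3]))+((d[3]/d[1])/d[1])));
  return vTemp;
}

// Hand port of the MATLAB version; m(k) mimics MATLAB's 1-based indexing.
function matlabPort(d) {
  const m = (k) => d[k - 1];
  let v = (m(1)*((Math.sqrt(m(4))+m(3))+(Math.sqrt(m(2))/m(3))));
  v = v + ((m(3)*m(4))*Math.sqrt((Math.pow(m(4),3)*(m(2)/m(3)))));
  v = v + (m(3)*(((m(4)*m(4))*(m(2)-m(4)))+((m(4)/m(2))/m(2))));
  return v;
}

// Relative comparison so that large and small outputs are treated alike.
function closeEnough(a, b, tol) {
  return Math.abs(a - b) <= tol * Math.max(1, Math.abs(a), Math.abs(b));
}

// Made-up input rows, for illustration only.
const rows = [[1, 2, 3, 4], [0.5, 1.5, 2.5, 3.5], [2, 4, 1, 9]];
const mismatches = rows.filter((r) => !closeEnough(gepModel(r), matlabPort(r), 1e-12));
console.log(mismatches.length === 0 ? "all rows agree" : mismatches);
```

A relative tolerance is used because different languages can round intermediate results slightly differently, so exact equality is often too strict for this kind of comparison.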

Finally, you can download the grammar described in this post from here and copy it into the grammars folder under your installation of GeneXproTools. I hope you found this post useful, and if you have any questions just post them in the comments below.
