Nasal Meta-Programming

From FlightGear wiki


Recently I created a module full of different hacks, since I was a hobby-ish programmer who had nothing else left to invent in my spare time. In the interest of a tutorial, I would like to explain how it works, so that maybe you can become a Nasal hacker too!

I called my module "gen.nas" and though it started with "generators", it really covers anything I thought to create, so the name is now more of a convenience (I don't like typing) and is meant to imply a fair amount of ambiguity. I based it off of two different methods to load modules: the driver.nas import function and FlightGear's security-free io.load_nasal function. For those who don't know, Andy Ross (the creator of Nasal) made a repository on GitHub (see [1]) that has some helpful Nasal libraries. One of them (driver.nas) provides an import() function, which duplicates the global namespace so that modules loaded by import() do not get write access to it, though they can still use extension functions like find(). I knew that I wanted to have access to the global namespace, which would preclude use of import(), but I also liked the idea of an EXPORTS vector to control (or at least pretend to control) what could be used outside of my module, as well as allowing for some good example functions to make use of it. So in the end, it needs to be loaded using a mixture of the two ;).

The whole file can be viewed here (updated 05/2013), but I will copy each section over to here when I explain it. At the top there are some comments which I covered already:

# gen.nas -- namespace "gen"

# Generators and mostly utilities using namespace hacking &c
# Quickly grew overboard ;)

# Note: the fundamental assertion that _the_globals is *the* globals
# could potentially cause problems depending on the loading method
# (driver.nas's import would not work, but FlightGear's io.load_nasal
# would work; which is funny, given that I am using EXPORTS :D).

Just a disclaimer here: I was writing this module for fun, and much of it is untested, but I have caught a few errors or improvements that I will point out as we are touring it.

If you look at the top of the file, I define a minimal EXPORTS vector. What happened to my other functions??

var EXPORTS = ["_the_globals", "_global_func",
               "public", "namespace", "global",
               "bind_to_caller", "bind_to_namespace",
               "bind_to_namespaces"];

Well I decided to make a fun hack so I did not have to manually enter every function I wanted to make public. So I made a public() function!

# For each symbol created by the function <fn> or
# for each symbol in <fn> (if it is either a hash
# or vector), add the name of the symbol to the
# caller's EXPORTS vector. Returns a vector of the
# added symbols and adds the symbols to the caller's
# local namespace if possible (i.e. when <fn> is not
# a vector).
#
# The anonymous function argument is so that you can
# use exactly the same syntax, versus having to
# convert it to or write in hash-style syntax (after
# all, Nasal just splits off another codegen to handle
# func{}s...)
var public = func(fn) {
   var c = caller(1)[0];
   var names = []; var hash = {};
   if (typeof(fn) == 'func') {
      call(fn, nil, nil, hash);
      var names = keys(hash);
   } elsif (typeof(fn) == 'hash') {
      var names = keys(hash = fn);
   } elsif (typeof(fn) == 'vector') {
      var names = fn;
   } else die("invalid/unrecognized argument to public()");
   foreach(var sym; keys(hash)) {
      c[sym] = hash[sym];
   }
   if (!contains(c, "EXPORTS"))
      c["EXPORTS"] = [];
   return foreach (var sym; names) {
      append(c["EXPORTS"], sym);
   };
   # In case the behavior changes (they are equivalent):
   #foreach (var sym; names) {
   #   append(c["EXPORTS"], sym);
   #};
   #return names;
};

So I try to split whatever argument we receive into names and hash, where hash holds each variable name together with its value, whereas names only holds the names in a vector. It's fairly condensed code, but it should be understandable to a reader who knows Nasal. Note the funny return of a foreach loop, though! It turns out that this loop (and forindex, which is equivalent) leaves the vector on the stack, aka it "returns" that value. If this wacky behavior changes (I think a comment in the code mentioned taking the vector off of the stack), then uncomment the alternative code. As an exercise for the reader: given a manual return of the names vector (i.e. no return foreach(){} hack), what is a really easy optimization to make instead of a foreach/append() loop?
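The core trick in public() is that call() accepts a hash to use as the function's local namespace, so whatever the function declares with var ends up in that hash. Here is a minimal standalone sketch of just that step (the names captured, answer and greet are made up for illustration and are not part of gen.nas):

```nasal
# Run an anonymous function with our own hash as its local namespace.
var captured = {};
call(func {
    var answer = 42;
    var greet = func(name) { return "hello " ~ name; };
}, nil, nil, captured);

# Everything declared with 'var' inside the function is now a key of
# the hash, which is what public() then copies into the caller.
var has_answer = contains(captured, "answer");
var has_greet  = contains(captured, "greet");
var value      = captured["answer"];   # the 42 assigned above
```

The function never has to know that it is being "captured"; it is written exactly as if it were ordinary top-level code.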

Next I have two more functions that really are not that well thought through, but should work for simple use cases:

# Basically the same. FIXME: should we use bind() instead?
var global = func(fn) {
   var c = _the_globals;
   var names = []; var hash = {};
   if (typeof(fn) == 'func') {
      call(fn, nil, nil, hash);
      var names = keys(hash);
   } elsif (typeof(fn) == 'hash') {
      var names = keys(hash = fn);
   } else die("invalid/unrecognized argument to global()");
   foreach(var sym; keys(hash)) {
      c[sym] = hash[sym];
   }
   return names;
};

# Runs the function in the namespace, like public().
# Essentially says that the function "describes"
# that namespace (after it runs, of course).
# I could write down a dozen analogies for this...
# Usage:
#   gen.namespace("foo", func {
#       ... #your code here, just write normally 
#   });
# Which roughly translates into C++ as:
#   namespace foo
#   {
#       ... //your code here
#   }
var namespace = func(namespc, fn) {
   if (typeof(namespc) == 'scalar')
      var namespc = _the_globals[namespc];
   bind(fn, _the_globals);
   call(fn, nil, nil, namespc);
};

Note that these rely on the assumption that we can access and modify the global namespace. Let me backtrack and cover a different part of the file, where we try and capture the global namespace and put it under a variable called _the_globals.

var _level = 0;
while (closure(caller(0)[1], _level) != nil) _level += 1;
var _the_globals = closure(caller(0)[1], _level-=1);
var _global_func = bind(func{}, _the_globals);
bind = (func{
	var _bind = bind;
	func(fn, namespace, enclosure=nil) {
		if (fn != _global_func)
			return _bind(fn, namespace, enclosure);
		#protect it from getting rebound by returning an equivalent but duplicate function:
		return _bind(_bind(func{}, _the_globals), namespace, enclosure);
	}
})();

It is very short: it checks all the namespaces above this namespace (caller(0)[1] returns the function that is currently running, aka the one that is creating this namespace, and using closure() on that returns the namespace above it (level=0) or above that (level=1), etc.) until closure() returns nil, at which point it goes back down one level and caches the assumed "global" namespace. Then we declare an empty function that is bound to the global namespace. This turns out to be very useful later on, with advanced namespace assignment. Finally, bind() is replaced with a wrapper that hands out an equivalent but duplicate function whenever someone tries to rebind _global_func, so that the original stays bound to the globals.

One bothersome thing about namespaces in Nasal is that they are fundamentally tied to functions, not the hashes that make up the namespace's variables. It turns out that it is easy to give a function both an outer namespace and a namespace to run in (using bind and call respectively), but it is really hard to give it an outer-outer namespace, due to the chain of functions that needs to be created and the fact that you don't know where you would find the correct function. I got really frustrated one day with this, but the next week I hit upon the solution: make the function! What the cautious programmer has to do is manually create a chain of functions that each link to the next one and represent the correct namespace.

Besides the function itself, the bind function takes two arguments: a namespace and an enclosure. It must be emphasized that there are three namespace-related parts of a function. The first namespace is stored with the function and is the outer namespace, or the first level of recursion for looking up variables. This is the second argument to bind(). The next namespace is not stored with the function and is only set when the function is called. This is the namespace that the function runs in and is the fourth argument to call, or a new hash otherwise. This is where variables set using the var keyword go. The last attribute, which is also stored with the function, is the enclosure (the third argument to bind()): a function from which to retrieve both another namespace and another function in the chain (or nil if there is none). It turns out that we can manually create a new function to put into this chain, which is what I do with these functions:
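To make the outer namespace and the run-in namespace concrete, here is a small sketch that can be run on its own. The hashes outer and locals and the symbol greeting are invented for the example, and I use the return value of bind() so that the sketch does not depend on whether bind() modifies its argument in place:

```nasal
var outer  = { greeting: "hello" };   # will become the outer namespace
var locals = {};                      # will become the namespace it runs in

# The outer namespace is the second argument to bind(). No enclosure
# is given here, so the lookup chain stops after 'outer'.
var fn = bind(func { var msg = greeting ~ " world"; return msg; }, outer);

# The namespace the function runs in is the fourth argument to call().
var result = call(fn, nil, nil, locals);

# 'greeting' was found in outer, while 'msg' was created in locals.
# result is "hello world", locals["msg"] holds the same string, and
# outer did not gain a "msg" key.
```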

# Lexically bind the function to the caller
var bind_to_caller = func(fn, level=1) {
   if (level < 1) return;
   bind(fn, caller(level)[0], caller(level)[1]);
};
# Bind the function to the namespace and then globals
var bind_to_namespace = func(fn, namespace) {
   if (typeof(namespace) == 'scalar')
      var namespace = _the_globals[namespace];
   bind(fn, namespace, _global_func);
};
# Bind the function to each namespace in turn (the
# first is the top-level one, after globals). Each
# item can be a scalar (name of the sub-namespace)
# or a hash (the namespace itself). If create is
# true, then any names that are not present in a
# namespace are created as a new hash; else this
# returns nil.
var bind_to_namespaces = func(fn, namespaces, create=1) {
	if (typeof(namespaces) == 'scalar')
		var namespaces = split(".", namespaces);
	var namespace = _the_globals;
	var save = pop(namespaces);
	var _fn = _global_func;
	foreach (var i; namespaces) {
		if (typeof(i) == 'scalar') {
			if (!contains(namespace, i))
				if (create)
					namespace[i] = {};
				else return;
			var i = namespace[i];
		}
		var _fn = bind(func{}, var namespace = i, _fn);
	}
	if (typeof(save) == 'scalar') {
		if (!contains(namespace, save))
			if (create)
				namespace[save] = {};
			else return;
		var save = namespace[save];
	}
	bind(fn, save, _fn);
};

The first one is really easy: we do what happens when the Nasal VM sees a func{} expression, though in this case we are receiving a naFunc instead of a naCode. The next one is not complicated either: we bind it to a namespace and then globals (or the assumed globals). The third one is the trickiest. The namespaces argument is either a list of names or hashes, or a single string to be split at each dot character. We then save a namespace from the end of the list and process the rest.

We define a temporary variable called _fn that starts out as the _global_func, since we of course want our function to ultimately recurse into the global namespace. Then we reassign it to be a new function (new functions are created by each func{} expression) that is bound to the previous function. This builds the chain of namespaces that we want. One very important thing to note is that using bind(_fn, namespace, _fn) would be wrong, wrong, wrong! It would not create a new function but instead bind _fn to itself, which creates an infinite recursion onto itself which would throw Nasal into an infinite loop as soon as the function tried to use a non-local variable. Big mistake! (that I actually made – oops) Notice how we have to start at the top level and work our way down; this is because each function chains "upward" in its namespaces, so that the "upward" must exist at the time of binding. Also note how we have to save one namespace, this is because the last step requires actually binding the function that we were ultimately trying to bind.
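Stripped of the name lookups, the chain building boils down to the following sketch. The three hashes and their keys are invented for the example, and the chain deliberately stops at top instead of continuing on to the real globals:

```nasal
var top = { a: 1 };   # outermost namespace in the chain
var mid = { b: 2 };
var low = { c: 3 };   # innermost, searched right after the locals

# Build the chain from the top down. Every func{} expression yields a
# fresh function, so no link is ever bound to itself.
var link = bind(func {}, top);
link = bind(func {}, mid, link);

# Finally bind the function we actually care about.
var fn = bind(func { return a + b + c; }, low, link);
var total = call(fn, nil, nil, {});   # 1 + 2 + 3
```

Each symbol is resolved one level further up: c in low, b in mid via the first link, and a in top via the second.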

Onto some other utilities:

var _defined = func(sym) {
   # We must first check the frame->locals hash/namespace
   # (since closure(fn, 0) returns the namespace/closure
   # above it, i.e. PTR(frame->func).func->namespace vs
   # PTR(PTR(frame->func).func->next).func->namespace).
   if(contains(caller(1)[0], sym)) return 1;
   var fn = caller(1)[1]; var l = 0;
   while((var frame = closure(fn, l)) != nil) {
      if(contains(frame, sym)) return 1;
      l += 1;
   }
   return 0;
};
var _ldefined = func(sym) {
   return contains(caller(1)[0], sym);
};
var _fix_rest = func(sym) {
    var val = caller(1)[0][sym];
    if (typeof(val) == 'vector' and
        size(val) == 1 and
        typeof(val[0]) == 'vector')
        caller(1)[0][sym] = val[0];
};

Here I have "defined", "locally defined", and "fix up the rest vector" functions. I just made them private because they will probably be defined by a library (like globals.nas does for FlightGear) and/or are one-liners that can be embedded. Note, however, that the defined function in globals.nas is not correct! It purely checks caller entries instead of closure entries, the latter of which are what actually represent the inheritance of namespaces. This is the correct version: note the use of caller(1)[1] to get the function that called us and from which we can access the chain of namespaces via closure(). Also note the check of caller(1)[0] to see if the symbol was defined in the local namespace (like by using the var keyword). The last function takes the name of the rest vector (usually arg...) and checks if it consists of a single vector. If so, it replaces the whole vector with just the first element. This allows functions to call other functions and specify the rest argument as a vector, instead of having to use call() to get the correct arguments.
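The rest-vector fixup is easiest to see when it is embedded directly. In this sketch the function sum and its parameter nums are hypothetical, and the check is written inline instead of going through _fix_rest:

```nasal
var sum = func(nums...) {
    # Unwrap a rest vector that consists of exactly one vector.
    if (size(nums) == 1 and typeof(nums[0]) == 'vector') {
        nums = nums[0];
    }
    var total = 0;
    foreach (var n; nums) {
        total += n;
    }
    return total;
};

var a = sum(1, 2, 3);     # arguments passed individually
var b = sum([1, 2, 3]);   # same arguments passed as one vector
# a and b hold the same total
```

The obvious cost is ambiguity: a caller who really wants to pass a single vector as the only rest argument gets it unwrapped as well.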

Now I will move on to higher-level hacking and lesser utilities, all of which are declared using the public() helper so that they automatically get added to the namespace and the EXPORTS vector. Note that I won't cover all of them here! Let's start with four of the fun ones:

   # Create a new hash from the symbols of the caller
   # if they are not listed in the ignore vector.
   var new_hash = func(ignore...) {
      var c = caller(1)[0];
      var m = {};
      foreach (SYM; var sym; keys(c)) {
         foreach (var s; ignore) {
            if (sym == s) continue SYM;
         }
         m[sym] = c[sym];
      }
      return m;
   };

   # Create a new object instance (similar to above,
   # but uses the 'me' symbol for the parents vector
   # and ignores the arg and me symbols)
   var new_obj = func(ignore...) {
      var c = caller(1)[0];
      var m = { parents: [c.me] };
      foreach (SYM; var sym; keys(c)) {
         if (sym == "me" or sym == "arg") continue SYM;
         foreach (var s; ignore) {
            if (sym == s) continue SYM;
         }
         m[sym] = c[sym];
      }
      return m;
   };

   #ifdef globals.props.Node:
   if (contains(_the_globals, "props") and contains(_the_globals.props, "Node")) {
   # Same as new_hash but returns a props.Node object using setValues()
   var new_prop = func(ignore...) {
      var c = caller(1)[0];
      var m = {};
      foreach (SYM; var sym; keys(c)) {
         foreach (var s; ignore) {
            if (sym == s) continue SYM;
         }
         m[sym] = c[sym];
      }
      return props.Node.new(m);
   };
   } #endif

   # The opposite of new_hash, this takes a hash and expands the key/values
   # contained in it into the caller (overwriting any possible duplicates)
   var expand_hash = func(hash, ignore...) {
      var c = caller(1)[0];
      foreach (SYM; var sym; keys(hash)) {
         foreach (var s; ignore) {
            if (sym == s) continue SYM;
         }
         c[sym] = hash[sym];
      }
      return c;
   };

These basically allow working with members of an object as local variables. This is particularly useful when you have a gazillion arguments to a constructor function (indeed, they probably all have default values!) and they are all named the same as the corresponding member, so there would be a lot of lines of the form m.foo = foo;. These helper functions turn constructors like that into one-liners. If you have a temporary variable that should not be copied as a member, just include its name as an argument:

var Warper = {
    # Create a class to warp an input, giving
    # it an initial position of <pos>.
    new : func(pos, power, offset) {
        var tmp = pos+offset;
        var curr = math.pow(tmp, power); #current position
        var m = gen.new_obj("tmp", "pos");
        return m;
    }
};

Please note that new_obj bases its parents vector off of the "me" variable of the caller! For most use cases, this works perfectly well (e.g. calling Warper.new() would base it off of Warper), but it also allows instances of objects based off of instances (e.g. Warper.new().new() would have a parents vector of the first Warper.new()) and will not work using brackets (e.g. Warper["new"]() would result in an error, "undefined symbol "new" on line 188 of gen.nas").

This is a function that automatically associates a list of keys with a list of values:

   # Associate respective keys with values stored in the second vector
   # and return the resulting hash. It is recursive, so something like
   # this works as syntactic sugar (the first index specifies the name):
   #   var clamp_template = ["property", ["range", "min", "max"]];
   #   var aileron = ["/controls/flight/aileron", [-1, 1]];
   #   vec2hash(clamp_template, aileron) == {
   #      "property": "/controls/flight/aileron",
   #      range: { "min": -1, "max": 1 } }
   var vec2hash = func(_keys, list) {
      var result = {};
      forindex (var i; _keys) {
         if (typeof(_keys[i]) == 'vector') {
            result[_keys[i][0]] = vec2hash(_keys[i][1:], list[i]);
         } elsif (typeof(_keys[i]) == 'scalar') {
            result[_keys[i]] = list[i];
         }
      };
      return result;
   };

It was inspired by the way C extension functions in Nasal are initialized: a simple list is specified (like a vector in Nasal) though each receives a name to be accessed by depending on its index in the list. This extension also allows hashes within hashes, if the list of keys has a vector in itself in which case the first index specifies where to put the sub-hash inside the outer hash, while the other items specify the keys inside of the sub-hash.

Here's another one:

   # Make an extension in the namespace, inside any objects
   # or sub-namespaces specified in objs, with the name
   # of fname, and where fn is written like it was in the file
   # (i.e. no prefixing of the namespace before every variable).
   # It only defines it if the namespace exists and a variable
   # with the name does not exist or is nil.
   var provide_extension = func(namespc, fname, fn, objs...) {
      # Accept either the name of a namespace or the namespace itself
      if (typeof(namespc) == 'scalar')
         namespc = _the_globals[namespc];
      var _n = namespc;
      foreach (var name; objs) {
         if (_n == nil) return;
         _n = _n[name];
      }
      if (_n == nil) return; #the namespace or object does not exist
      if (_n[fname] != nil) return; #only define it if it does not exist
      if (typeof(fn) == 'scalar') fn = compile(fn);
      _n[fname] = bind(fn, namespc, _global_func);
   };

Like the comment says, it makes a function inside a namespace and any objects only if the namespace exists but the variable doesn't. Notice that I only bind the function to the namespace and globals. Why is that?? To take an example, let's consider props.nas, which defines a Node class:

# $FG_ROOT/Nasal/props.nas
var Node = {
    getNode        : func wrap(_getNode       (me._g, arg)),
    #...
};
#...
Node.getValues = func {
    #...
};
#...

Consider where Node.getNode and Node.getValues are bound to. Are they bound to the same namespace? Or different ones? Is one bound "inside" the class? Well, it turns out that the Nasal VM binds them both to the same place, which is simply inside the props namespace. This is why I don't have to use bind_to_namespaces but simply have to follow the objects and bind the function to one namespace. (Note that I originally came up with this function for the purpose of adding a custom extension to props.nas from outside of it!) Also note that this means that bind_to_namespaces is only intended for namespaces within namespaces, not objects/classes within namespaces. This is where a line must be drawn between the two: even though they are all hashes internally, they have different use cases and I draw a distinction with my jargon.
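You can check the "same place" claim yourself with closure(). In this sketch Thing, getA and getB are hypothetical stand-ins for Node and its methods, one defined inside the hash literal and one attached afterwards:

```nasal
var Thing = {
    getA : func { return "a"; }    # defined inside the hash literal
};
Thing.getB = func { return "b"; }; # attached afterwards

# closure(fn, 0) returns the outer namespace stored with a function.
var ns_a = closure(Thing.getA, 0);
var ns_b = closure(Thing.getB, 0);

# Comparing two hashes with == checks whether they are the same object.
var same = (ns_a == ns_b);
```

Neither function is bound to Thing itself; the object is only reachable from inside the methods through me.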

Other features available in the module but not tutorialized yet:

  • mutable functions
  • "macro" functions
  • consolidation of namespaces (actually named accumulate right now): make a one-time expense to reduce timing of hash lookups. Probably not recommended, though.
  • two harebrained schemes for overloading that are rather inflexible and actually only satisfy two use-cases per function I made.
  • duplicate, (recursive) equals, and econtains (extended contains) utilities
  • a couple classes at the end: Hash, Func, and Class.