AST Macros
Version | 1 |
Created | 2013-11-10 |
Status | Draft |
Last modified | 2013-11-10 |
Author | Jacob Carlborg |
Abstract
The basic concept of AST (Abstract Syntax Tree) macros, or just syntax macros, is fairly simple. A macro is just like any other function or method except that it will only run at compile time. When a macro is called, instead of evaluating its argument and then calling the function, an AST is created for each argument passed to the function. The macro will then return a new AST which is injected and type checked at the call site. This means that the call to the macro will be replaced with the AST returned by the macro.
Example
macro myAssert (Context context, Ast!(bool) val, Ast!(string) str = null)
{
auto message = str ? "Assertion failure: " ~ str.eval : val.toString();
auto msgExpr = literal(constant(message));
return <[
if (!$val)
throw new AssertError($msgExpr);
]>;
}
void main ()
{
myAssert(1 + 2 == 4);
}
Compiling and running the above program would result in the following assert error:
core.exception.AssertError@main(13): Assertion failure: 1 + 2 == 4
The interesting part here is that the assert message contains the actual expression that failed.
Rationale
AST macros can be used extend the language with new semantics without
changing the actual language. Instead of adding new features to the
language AST macros can be a general solution to implement language
changes in library code. Many existing language features could have been
implemented with AST macros, like scope
, foreach
and similar
language constructs.
Formal Definition
Declaring a Macro
A macro is always declared with the macro
keyword followed by its name
and a parameter list. The first parameter of macro is always of the type
Context
, therefore the parameter list cannot be empty. The rest of the
parameters are always of the type Ast
. A macro always need to return
either void
or a value of the type Ast
.
macro foo (Context context, Ast!(string) str)
{
return str;
}
Calling a Macro
A macro is called just like any other function. The first parameter,
which is of the type Context
, is passed implicitly by the compiler.
The rest of the arguments are passed like in a regular function call.
Although you won’t pass arguments of the type Ast
, regular values are
passed instead and the compiler creates an AST of the arguments and pass
them as Ast
arguments to the macro.
The Context Parameter
The first parameter of a macro declaration is always of the type
Context
. This parameter is mostly passed implicitly by the compiler.
It’s also possible to pass the context parameter manually. This is
useful when having helper functions for a macro and need to retain the
context given to the original macro.
The context parameter contains information about the surrounding context where the macro was called. This can be information like the surrounding method and class from which the macro was called.
Bonus
This context parameter also contain information about the complete compilation environment, like:
- The arguments used when the compiler was invoked
- Functions for emitting messages of various verbosity level, like error,
warning and info
- Functions for querying various types of settings/options, like which versions
are defined, is “debug” or “release” defined and so on
- In general providing as much as possible of what the compiler knows about the
compile run
- The context should have an associative array with references to all scoped variables at initiation point.
This has the benefit of enabling a macro to check variables that are “passed” to it as well as modify it. The this keyword value should be available to the macro if it is from either a class or struct.
Semantics
Since macros can only be called at compile time the compiler will strip out all macros before the code generating phase.
Quasi-Quoting
Quasi-quoting is basically a form of syntax sugar for creating syntax tree. It could be considered as AST literals. In all examples in this text the following syntax is used for quasi-quoting:
<[ writeln("asd"); ]>
Splicing
Splicing is a syntax used for dynamically inserting a piece of an AST in
quasi-quotes. In all examples in this text a dollar sign, $
, is used
for splicing.
<[ writeln($expr); ]>
The syntax for quasi-quoting and splicing is just an abstraction of what the syntax would actually look like. Regardless of what syntax is used there’s always the option to manually create syntax trees using an API.
The AST Macro
The ast
macro is an option to implement quasi-quoting in a library
macro. The ast
macro takes an arbitrary expression and transform it to
an AST. It also supports splicing. The ast
macro has a couple of
overloads and their declarations look as follows:
macro ast (T) (Context context, Ast!(T) expr)
{
// ...
}
macro ast (Context context, Ast!(void delegate ()) block)
{
// ...
}
The first overload takes an arbitrary expression and converts it into an AST. The second overload takes a delegate, this is to be able to convert a whole block of code to an AST.
Bonus
Calling a Macro
Macros are extend to be callable from anywhere it’s possible to use a mixin.
Statement Macros
A statement macro is a macro that takes a Statement
as its last
parameter. The difference compared to regular macros is the calling
syntax. Statement macros are called with the same syntax used for
statements, like the example below:
macro foo (Context context, Statement block)
{
return block;
}
macro bar (Context context, Ast!(int) arg, Statement block)
{
return block;
}
void main ()
{
foo
{
writeln("foo");
writeln("foo again");
}
foo
writeln("foo2");
bar(3) {
writeln("bar");
writeln("bar again");
}
bar(3)
writeln("bar2");
}
Just like many of the built-in statements the braces are optional when there’s only a single expression in the statement. Since the statement is always the last parameter in the macro declaration and it’s always passed outside the regular argument list it’s legal to have parameters with default arguments or a variadic parameter list before the statement parameter.
macro foo (Context context, Ast!(string)[] arg ..., Statement block)
{
return block;
}
macro bar (Context context, Ast!(string) fmt = null, Statement block)
{
return block;
}
Declaration macros
A declaration macro is a macro that acts like a user defined attribute. It can be applied to any declaration. When a declaration macro is used, the macro is called and the AST of the declaration is passed as the last parameter to the macro. The declaration is replaced with whatever syntax tree the macro returns.
A declaration macro always take a Declaration
as its last parameter.
The same rules about default arguments and variadic parameter that apply
to statement macros apply to declaration macros as well.
macro attr (Context context, Declaration decl)
{
auto attrName = decl.name;
auto type = decl.type;
return <[
private $decl.type _$decl.name;
$decl.type $decl.name ()
{
return _$decl.name;
}
$decl.type $decl.name ($decl.type value)
{
return _$decl.name = value;
}
]>;
}
class Foo
{
@attr int bar;
}
Use cases
Examples of usage of AST macros that would be useful for extending the language.
Linq
Linq is a .net library that encorperates searching and manipulation of data. A c# example is:
using System;
using System.Linq;
class Program
{
static void Main()
{
int[] array = { 1, 2, 3, 6, 7, 8 };
var elements = from element in array
where element > 5
select element;
foreach (var element in elements)
{
}
}
}
This could be implemented by an end user as:
import linq;
import std.stdio;
void main() {
int[] array = [1, 2, 3, 6, 7, 8];
int[] data;
query {
from element in array
where element > 2
add element to data
}
}
That code would be converted to:
import linq;
void main() {
int[] array = [1, 2, 3, 6, 7, 8];
int[] data;
foreach (element; array) {
if (element > 5) data ~= element;
}
}
C#’s ability of specifying the variable to be set to is not required at least for this example. However it should be able to be specified e.g.
query {
int data
from element in array
where element > 5
select element
}
This would be closer to c#’s.
That code would be converted to:
import linq;
void main() {
int[] array = [1, 2, 3, 6, 7, 8];
int[] data;
foreach (element; array) {
if (element > 5) data ~= element;
}
}
For improvements of this it would be suggested that the ability to be able to get the current variables declared within scope. This will enable the ability to check for if variables defined e.g. the array. If it is not it will be possible give a good compiler error. It would enable the ability to instead of specifying the type of an array value it could determine if based upon the array given.
Reflection
class Person {
macro where (Context context, Statement statement) {
// ...
}
}
auto foo = "John";
auto result = Person.where(e => e.name == foo);
// is replaced by
auto foo = "John";
auto result = Person.query("select * from person where person.name = " ~
sqlQuote(foo) ~ ";");
Calculation
Given a simple macro example that will add two numbers together and then return it, the values requested must be available. Using scoped variables passed by reference on the context this is possible.
func(1, 2); // example args
void func(int i, int i2) {
foo {
output
i, i2
}
}
macro foo (Context context, Ast!(string) str)
{
string outputVariable = // get return through str
string name1 = // get i through str
string name2 = // get i2 through str
return outputVariable = "auto " ~ outputVariable ~ text(context.scopeVariables!int(name1) + context.scopeVariables!int(name2)) ~ ";";
}
When unrolled it will become:
void func(int i, int i2) {
auto output = 3;
}
This essentially emulates pure functions however as stated in Linq example that it would enable checking of variables and types as required.
C++ Namespaces (issue 7961)
Bugzilla issue
7961 talks about
adding support for C++ namespaces. This should be possible to solve with
library code, especially since we already have pragma(mangle)
:
What we have today, declaration of a C++ function, without namespace:
extern (C++) void x ();
Namespaces in C++ is all about mangling of symbols. Since we already
have pragma(mangle)
one could think that it would be possible to solve
with library code. Unfortunately this causes some problems:
string namespace (string namespace) { // mangle the namespace ... }
pragma(mangle, namespace("foo::bar") extern (C++) void x ();
In the about example the namespace is properly mangled but we’re missing the mangling of “x”. That’s not something we want to do manually. Next try:
string namespace (string namespace, alias func) () { // mangle the namespace ... }
pragma(mangle, namespace!("foo::bar", x) extern (C++) void x ();
The above doesn’t work either because of forward references of “x”. Next try:
string namespace (string namespace, T, string name) () { // mangle the namespace ... }
pragma(mangle, namespace!("foo::bar", void function (), "x") extern (C++) void x ();
The above would mostly likely work. But now we’re duplicating the signature and the name of “x”. This is error prone and we will lead hard to find bugs or irritating linker errors. Not something we want to do.
Instead we can solve it with AST macros:
string mangle_cpp (string namespace, T, string name) () { // mangle the declaration ... }
macro namespace (Context context, Ast!(string) namespace, Declaration declaration)
{
auto name = declaration.name;
auto type = declaration.type;
auto mangledName = mangle_cpp(namespace.eval(), type.eval(), name.eval());
auto mangeldNameAst = literal(constant(mangledName));
return <|
pragma(mangle, $mangledName) $declaration;
]>;
}
Usage:
@namespace("foo::bar") extern (C++) void x ();
This can also be used to look more like a real namespace in C++:
@namespace("foo::bar") extern (C++)
{
void x ();
void y ();
}
Attribute inference
Currently attributes are inferred automatically for template functions. This shows an example of automatically infer attributes for a non-template function based the attributes of another symbol 1.
macro inferAttributes (Context context, Ast!(Symbol) symbol, Declaration decl)
{
foreach (attr ; symbol.attributes)
decl.attributes ~= attr;
return decl;
}
Usage:
class Foo (T)
{
@inferAttributes(T.foo) void thisIsSoPolymorphic () { }
}