Random Generation of AST
When we create a population of individuals, we do not want to create every individual by hand. We would rather generate them by some random process. This chapter explains how this can be achieved.
In short generating an Abstract Syntax Tree comes down to picking an alternative at each level of the grammar. This will become clear in examples.
Nadezhda
The Nadezhda grammar is implemented assert
1 pub enum Program {
2 Forward(Box<Program>),
3 Backward(Box<Program>),
4 Stop,
5 }
We would like to generate a program with the following line of code:
1 let program = rand::Random();
For this to work we need to implement rand::Rand for Program. The basic
strategy is to generate a number from 0 up to 3 and depending on the number
create one of the three alternatives.
1 impl rand::Rand for Program {
2 fn rand<R: rand::Rng>(rng: &mut R) -> Self {
3 let random_number = rng.gen_range(0, 3);
4 match random_number {
5 0 => {
6 let nested_program: Program = Program::rand(rng);
7 Program::Forward(Box::new(nested_program))
8 },
9 1 => {
10 let nested_program: Program = Program::rand(rng);
11 Program::Backward(Box::new(nested_program))
12 },
13 _ => Program::Stop,
14 }
15 }
16 }
Note that for the Forward and Backward alternative we need to also create a
nested program that will be wrapped.
Generating more complex grammars
In generating random programs one quickly encounters a problem. The problem manifest itself in occasional stack overflows. In order to examine the problem we are taking a closer look at the heart of the problem.
Take a look at the following grammar
1 Tree -> leaf
2 | node Tree Tree
It expresses a tree structure. If one would write corresponding abstract syntax
trees and generators for this grammar, the same problem would occur. If there
are equal chances to generate a leaf or a node out of a Tree, we are very
likely to end up with a very deep tree.
The solution is to bias the generation in favor of leafs. This will short
circuit the infinite descent and makes stack overflows unlikely.
Mathematically inclined readers can follow along to investigate the issue
further. Lets assume that there is a chance \(p\) to generate a node and a
chance \(q = 1 - p\) to generate a leaf. We would like to express the
expected number of nodes and leafs \(E\) in our generated trees. By
linearity of expected values we have
Because for each node there is 1 extra node plus for the left and right
branches two times the expected number of nodes and leafs, and for each leaf
there is one extra leaf. Using the relation \(q = 1 - p\) and solving for
\(E\) we get
If there are is an equal change to generate nodes and leafs this expression
diverges. I.e. the expected number of nodes and leafs will be infinite.
Exercises
- What is the expected length of a randomly generated Nadezhda program.
- When writing generators it gets tedious to constantly write the same boilerplate code. Alleviate this nuisance by writing a macro.