In the compiler `continuation' means a continuation that is a lambda node.
Non-lambda continuation arguments, such as the argument to a RETURN, are
not referred to as continuations (the argument isn't a continuation, it
is a variable that is bound to a continuation).


Every node has the following fields:

   variant      ; one of LITERAL, REFERENCE, LAMBDA, or CALL
   parent       ; parent node
   index        ; index of this node in parent, if parent is a call node
   simplified?  ; true if it has already been simplified; if this is #F
                ; then all of this node's ancestors must also be unsimplified
   flag         ; useful flag; all users must leave this as #F


Literal nodes:

   value  ; the value
   type   ; the type of the value (important for statically typed languages,
          ; not so useful for Scheme)

Reference nodes:

   variable  ; the referenced variable; the binder of the variable must be
             ; an ancestor of the reference node

Call nodes:

   primop  ; the primitive being called
   args    ; vector of argument nodes
   exits   ; the number of arguments that are continuations; the continuation
           ; arguments come before the non-continuation ones
   source  ; source info; used for error messages

Primops are either trivial or nontrivial.  Trivial primops only return a value
and have no side effects.  Calls to trivial primops never have continuation
arguments and are always arguments to other calls.  Calls to nontrivial primops
may or may not have continuations and are always the body of a lambda node.
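
As a rough illustration of the fields listed above, the following sketch
models a node as an R7RS record and builds the trivial call (< n_3 '2).
The record layout, the DATA slot holding the variant-specific fields, and
the MAKE-CALL helper are all assumptions made for this example; the
compiler's own definitions may well differ.

```scheme
(import (scheme base))

;; Hypothetical record: the five common fields plus one slot for
;; whatever the variant needs (value, variable, primop and args, ...).
(define-record-type <node>
  (make-node variant parent index simplified? flag data)
  node?
  (variant     node-variant)
  (parent      node-parent      set-node-parent!)
  (index       node-index       set-node-index!)
  (simplified? node-simplified?)
  (flag        node-flag        set-node-flag!)
  (data        node-data))

;; Hypothetical helper: build a call node and record, in each argument,
;; its parent and its index in the parent's argument vector.
(define (make-call primop . args)
  (let* ((vec  (list->vector args))
         (call (make-node 'call #f #f #f #f (cons primop vec))))
    (let loop ((i 0))
      (when (< i (vector-length vec))
        (set-node-parent! (vector-ref vec i) call)
        (set-node-index!  (vector-ref vec i) i)
        (loop (+ i 1))))
    call))

(define (call-primop call) (car (node-data call)))
(define (call-args call)   (cdr (node-data call)))

;; The trivial call (< n_3 '2): a reference and a literal as arguments.
(define n-ref (make-node 'reference #f #f #f #f 'n_3))
(define two   (make-node 'literal   #f #f #f #f 2))
(define lt    (make-call '< n-ref two))
```

Here (node-index two) is 1 and (node-parent two) is the call node LT, which
is the relationship the PARENT and INDEX fields are described as recording.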

Lambda nodes:

   type       ; one of PROC, CONT, or JUMP (and maybe THROW at some point)
   name       ; symbol (for debugging)
   id         ; unique integer (for debugging)
   body       ; the call-node that is the body of the lambda
   variables  ; a list of variable records, with #Fs for ignored positions
   source     ; source info; used for error messages
   protocol   ; calling protocol from the source language
   block      ; for use during code generation
   env        ; for use when adding explicit environments

PROC's are general procedures.  The first variable of a PROC will be bound
to the PROC's continuation.

CONT's are continuation arguments to calls.

JUMP's are continuations bound by LET or LETREC, whose calling points are
known, and which are created and called within a single PROC.

Variables:

   name      ; source code name for variable (used for debugging only)
   id        ; unique numeric identifier    (used for debugging only)
   type      ; type of variable's value
   binder    ; LAMBDA node which binds this variable (or #F if none)
   refs      ; list of reference nodes n for which (REFERENCE-VARIABLE n)
             ; = this variable
   flag      ; useful slot, used by shapes, COPY-NODE, NODE->VECTOR, etc.
             ; all users must leave this as #F
   flags     ; list of various annotations, e.g. IGNORABLE
   generate  ; for whatever code generation wants

----------------------------------------------------------------
The node tree has a very regular lexical structure:

  The body of every lambda node is a non-trivial call.
  The parent of every non-trivial call is a lambda node.
  Every CONT lambda is a continuation of a non-trivial call.
  Every JUMP lambda is an argument to either the LET or the LETREC
   primops (described below).
  The lambda node that binds a variable is an ancestor of every reference
   to that variable.

If you start from any leaf node and follow the parent pointers up through the
node tree, you first go through some number, possibly zero, of trivial calls
until a non-trivial call is reached.  From that point on non-trivial calls
alternate with CONT nodes until a PROC or JUMP lambda is reached.  Going up
from a PROC lambda is the same as going up from a leaf, while JUMP lambdas
are always arguments to LET or LETREC, both of which are non-trivial.
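
The walk described above can be sketched as follows.  Nodes are reduced
here to a kind tag and a parent pointer; that representation, the kind
names, and PATH-TO-ROOT are assumptions made for the example, not the
compiler's own procedures.

```scheme
(import (scheme base))

;; Hypothetical minimal node: a kind tag and a parent pointer.
(define (make-node kind parent) (vector kind parent))
(define (node-kind node)   (vector-ref node 0))
(define (node-parent node) (vector-ref node 1))

;; Climb from NODE to the nearest enclosing PROC or JUMP lambda and
;; return the kinds met on the way, starting with NODE itself.
(define (path-to-root node)
  (let loop ((node node) (kinds '()))
    (let ((kinds (cons (node-kind node) kinds)))
      (if (or (memq (node-kind node) '(proc jump))
              (not (node-parent node)))
          (reverse kinds)
          (loop (node-parent node) kinds)))))

;; A reference inside a trivial call that is an argument to a
;; non-trivial call, which is the body of a CONT under another call.
(define root   (make-node 'proc #f))
(define test   (make-node 'call root))
(define cont   (make-node 'cont test))
(define tail   (make-node 'call cont))
(define minus  (make-node 'trivial-call tail))
(define n-ref  (make-node 'reference minus))

(path-to-root n-ref)
;; => (reference trivial-call call cont call proc)
```

The result shows the pattern from the paragraph: a leaf, then trivial
calls, then non-trivial calls alternating with CONT lambdas up to the PROC.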

A basic block appears as a sequence of non-trivial calls with a single
continuation apiece.  The block begins with a PROC or JUMP lambda, or
with a CONT lambda that is an argument to a call with two or more
continuations, and ends with a call that has either no continuations,
or two or more.
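
A minimal sketch of finding the extent of a basic block, under a
deliberately simplified representation: each call is written as a list
whose head is the primop name and whose tail holds the bodies of its
continuation lambdas.  This encoding and the name BASIC-BLOCK are
assumptions for illustration only.

```scheme
(import (scheme base))

;; Hypothetical encoding: (primop-name cont-body ...), where every
;; cont-body is the call that forms the body of a continuation lambda.
(define (call-name call)  (car call))
(define (call-conts call) (cdr call))

;; Collect the primop names of the block starting at CALL: keep going
;; while the current call has exactly one continuation.
(define (basic-block call)
  (let loop ((call call) (names '()))
    (let ((names (cons (call-name call) names))
          (conts (call-conts call)))
      (if (and (pair? conts) (null? (cdr conts)))
          (loop (car conts) names)
          (reverse names)))))

;; Body of FACT from the example later in this document.
(basic-block '(letrec1 (letrec2 (unknown-tail-call))))
;; => (letrec1 letrec2 unknown-tail-call)

;; A call with two continuations ends its block immediately.
(basic-block '(test (unknown-return) (unknown-tail-call)))
;; => (test)
```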

Basic blocks are grouped into trees.  The root of every tree is either
a PROC or JUMP lambda, the branch points are calls with two or more
continuations, and the leaves are jumps or returns.  Within a tree
the control flow follows the lexical structure of the program from
parent to child (if we ignore calls to other PROCs).

Every JUMP lambda is called from within only one PROC lambda, so a PROC
can be considered to consist of a set of trees, the leaves of which either
return from that PROC or jump to the top of another tree in the set.

----------------------------------------------------------------

Primops:

   id                  ; unique symbol identifying this primop
   trivial?            ; #t if this primop does not accept a continuation
   side-effects        ; one of #F, READ, WRITE, ALLOCATE, or IO
   simplify-call-proc  ; simplify method
   primop-cost-proc    ; cost of executing this operation
                       ; (in some undisclosed metric)
   return-type-proc    ; the type of the value returned (for trivial primops only)
   proc-data           ; more data for the procedure primops
   cond-data           ; more data for conditional primops
   code-data           ; code generation data

`procedure' primops are those that call one of their values.
`conditional' primops are those that have more than one continuation.

Below is a list of the standard primops.  All but the last two are non-trivial.

For the following five primops, the lambda node being called, jumped to,
or whatever has been identified by the compiler, and the number of variables
that the lambda node has matches the number of arguments.

  (CALL <cont> <proc> . <args>)
  (TAIL-CALL <cont-var> <proc> . <args>)
  (RETURN <cont-var> . <args>)
  (JUMP <jump-var> . <args>)
  ; (THROW <throw-var> . <args>)  not yet implemented

These are the same as the above except that the procedure has not been
identified by the compiler.  There is no UNKNOWN-JUMP because all calls
to JUMP lambdas must be known.

  (UNKNOWN-CALL <cont> <proc> . <args>)
  (UNKNOWN-TAIL-CALL <cont> <proc> . <args>)
  (UNKNOWN-RETURN <cont-var> . <args>)

PROC lambdas are called with either CALL or TAIL-CALL if all of their call
sites have been identified, or with UNKNOWN-CALL or UNKNOWN-TAIL-CALL if not.
JUMP lambdas are called using JUMP.


LET binds random values, such as lambda nodes or the results of trivial
calls, to variables.  This primop only exists because of the requirement
that every call have a primop; all it does is apply <cont> to <args>
(it is called LET instead of APPLY because LET forms in the source code
become calls to this primop).

  (LET <cont> . <args>)
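
To make "apply <cont> to <args>" concrete, here is a small sketch in
ordinary Scheme.  LET-PRIMOP is a hypothetical stand-in written for this
example; in the node tree LET is a primop of a call node, not a procedure,
and the body of the continuation would itself be a non-trivial call.

```scheme
(import (scheme base))

;; Hypothetical stand-in for the LET primop: apply CONT to ARGS.
(define (let-primop cont . args)
  (apply cont args))

;; Source form:   (let ((x 1) (y 2)) (+ x y))
;; As a LET call: the continuation's variables play the role of the
;; bound names, and the arguments are the values bound to them.
(define result
  (let-primop (lambda (x y) (+ x y)) 1 2))   ; RESULT is 3
```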


Recursive binding:

  (LETREC1 <cont>)
  (LETREC2 <cont> <id-var> <lambda1> <lambda2> ...)

These are always used together, with the body of the continuation to LETREC1
being a call to LETREC2.  The two calls together look like:

  (LETREC1 (lambda (<id-var> <var1> ... <varN>)
             (LETREC2 <cont> <id-var> <lambda1> ... <lambdaN>)))

which the CPS pretty-printer prints as:

  (let* (...
         ((id-var var1 ... varN) (letrec1))
         (() (letrec2 id-var lambda1 ... lambdaN))
         ...)
    ...)

The end result is to bind <varI> to <lambdaI>.  The point of the exercise
is that lambdas occur within the scope of the variables.
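
A source-level analogy may help here; it is only an analogy, and the
procedure names are invented for the example.  The first definition uses
LETREC directly.  The second splits the binding into two steps, first
introducing the variables and then supplying the lambdas inside their
scope, which loosely mirrors the LETREC1 / LETREC2 pair (the node tree
does not use SET! for this).

```scheme
(import (scheme base))

;; Source-level view: both lambdas may refer to EV? and OD?.
(define (even-with-letrec? n)
  (letrec ((ev? (lambda (n) (if (= n 0) #t (od? (- n 1)))))
           (od? (lambda (n) (if (= n 0) #f (ev? (- n 1))))))
    (ev? n)))

;; Two-step view: the variables exist first (compare LETREC1), then
;; the lambdas are supplied within their scope (compare LETREC2).
(define (even-two-step? n)
  (let ((ev? #f) (od? #f))
    (set! ev? (lambda (n) (if (= n 0) #t (od? (- n 1)))))
    (set! od? (lambda (n) (if (= n 0) #f (ev? (- n 1)))))
    (ev? n)))
```

Both procedures give the same answers because in each case the lambdas
are created where EV? and OD? are already in scope.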


Undefined effect.  This takes a continuation variable as an argument only
so that the continuation variable is always reached.

  (UNDEFINED-EFFECT <cont-var> ...)


Accessing and mutating the store.
Cells are used to implement SET! on lexically bound variables.  GLOBAL-SET!
and GLOBAL-REF are used for module variables that may be set.

  (CELL-SET! <cont> <cell> <value>)
  (GLOBAL-SET! <cont> <global-var> <value>)

  (CELL-REF <cell>)          ; trivial
  (GLOBAL-REF <global-var>)  ; trivial
----------------------------------------------------------------
Printing out the node tree.

The following procedure:

(define (fact n)
  (let loop ((n n) (r 1))
    (if (< n 2)
        r
        (loop (- n 1) (* n r)))))

when converted into nodes is:

(LAMBDAp (c_6 n_1)
  (letrec1 (LAMBDAc (x_13 loop_2)
             (letrec2 (LAMBDAc ()
                        (unknown-tail-call c_6 loop_2 n_1 '1))
                      x_13
                      (LAMBDAp (c_8 n_3 r_4)
                        (test
                          (LAMBDAc ()
                            (unknown-return c_8 r_4))
                          (LAMBDAc ()
                            (unknown-tail-call c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
                          (< n_3 '2)))))))

where LAMBDAp is a PROC lambda and LAMBDAc is a CONT lambda.  Lexically bound
variables are printed as <name>_<id> and constants as '<value>.  This is not
very readable, and larger procedures are much worse.  The first step in making
it more comprehensible is to print each lambda node separately with a marker
to indicate where it appears in the tree.

(LAMBDAp fact_7 (c_6 n_1)
  (letrec1 1 ^c_14))

 (LAMBDAc c_14 (x_13 loop_2)
   (letrec2 1 ^c_12 x_13 ^loop_9))

  (LAMBDAc c_12 ()
    (unknown-tail-call 0 c_6 loop_2 n_1 '1))

  (LAMBDAp loop_9 (c_8 n_3 r_4)
    (test 2 ^g_10 ^g_11 (< n_3 '2)))

   (LAMBDAc g_10 ()
     (unknown-return 0 c_8 r_4))

   (LAMBDAc g_11 ()
     (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))

The labels used are the names and id's of the lambda nodes, with a ^ in front
to distinguish them from variables.  The code for each lambda is indented
slightly more than the lambda in which it actually occurs.  To make the
distinction between continuation and non-continuation lambdas clearer the
number of continuation arguments to a call is printed just after the primop
(for example the first two arguments to TEST are continuations).
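
The naming conventions used in these listings (<name>_<id> for variables,
a leading ^ for lambda labels, a leading ' for constants) can be sketched
with a few string helpers.  These procedures are hypothetical and are not
taken from the compiler's printer; the constant helper handles numbers only.

```scheme
(import (scheme base))

;; <name>_<id>, e.g. n_3
(define (variable-label name id)
  (string-append (symbol->string name) "_" (number->string id)))

;; ^<name>_<id>, e.g. ^loop_9, marking where a lambda node occurs
(define (lambda-label name id)
  (string-append "^" (variable-label name id)))

;; '<value>, e.g. '2 (numeric constants only in this sketch)
(define (constant-label value)
  (string-append "'" (number->string value)))

(variable-label 'n 3)     ; => "n_3"
(lambda-label 'loop 9)    ; => "^loop_9"
(constant-label 2)        ; => "'2"
```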

The first three calls form a basic block because the first two calls have
exactly one continuation apiece.  To make this more easily seen these
calls can be printed using a more condensed notation:

(LAMBDAp fact_7 (c_6 n_1)
  (LET* (((x_13 loop_2) (letrec1))
         (() (letrec2 x_13 ^loop_9)))
    (unknown-tail-call 0 c_6 loop_2 n_1 '1)))

The continuations are not printed as arguments but instead their variables
are printed to the left of the call in a parody of Scheme's LET*.  The results
of the LETREC1 are bound to the variables X_13 and LOOP_2 as would happen with
the real LET* (if it allowed calls to return multiple values).

Finally, here is the way the code for FACT is actually printed:

 7 (P fact_7 (c_6 n_1)
14   (LET* (((x_13 loop_2)
              (letrec1))
12          (() (letrec2 x_13 ^loop_9)))
       (unknown-tail-call 0 c_6 loop_2 n_1 '1)))

 9  (P loop_9 (c_8 n_3 r_4)
      (test 2 ^g_10 ^g_11 (< n_3 '2)))

10   (C g_10 ()
       (unknown-return 0 c_8 r_4))

11   (C g_11 ()
       (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))

The ID number of every lambda node is printed out at the beginning of the
line on which the code for the lambda appears.  This is redundant for the
lambdas that are not printed as part of a LET*.  The word `LAMBDA' is not
printed.  The (letrec1) call appears on a new line because the printer
indents the calls in LET* a fixed amount.

The reason for printing the ID numbers is so that the actual nodes can be
obtained.  Once a lambda has been printed (either by the pretty printer or
by the regular printer), (NODE-UNHASH <id>) will return it:

scheme-compiler> (node-unhash 9)
'#{Node lambda loop 9}
scheme-compiler> ,inspect ##
'#{Node lambda loop 9}

 [0: variant] 'lambda
 [1: parent] '#{Node call letrec2}
 [2: index] 2
 [3: simplified?] #t
 [4: flag] #f
 [5: stuff-0] '#{Node call test}
 [6: stuff-1] '(#{Variable n 3} #{Variable r 4})
 [7: stuff-2] '(#{Name #} (n r) (if # r #))
 [8: stuff-3] '#{Lambda-data}

----------------------------------------------------------------
Simplification.

The listing above shows the factorial procedure as originally translated
into a node tree.  The next step in compilation is to simplify the tree,
doing constant folding, identifying call points, and so on.  The simplified
version of FACT is:

 7 (P fact_7 (c_6 n_1)
14   (LET* (((x_13 loop_2)
              (letrec1))
12          (() (letrec2 x_13 ^loop_9)))
       (jump 0 loop_2 n_1 '1)))

 9  (J loop_9 (n_3 r_4)
      (test 2 ^g_10 ^g_11 (< n_3 '2)))

10   (C g_10 ()
       (unknown-return 0 c_6 r_4))

11   (C g_11 ()
       (jump 0 loop_2 (+ '-1 n_3) (* n_3 r_4)))

The main change is that the loop has been turned into a JUMP lambda.  It no
longer takes a continuation variable, so the calls to it are now JUMPs and
G_10 returns directly to C_6.  The listing also shows (- n_3 '1) rewritten
as (+ '-1 n_3).
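
Comparing the two listings, (- n_3 '1) has become (+ '-1 n_3) in the
simplified tree.  The sketch below shows one way such a rewrite could be
expressed over plain S-expressions.  It is a guess at the shape of the
rule inferred from the listings, not the simplifier's actual code, and
NORMALIZE-SUBTRACTION is a hypothetical name.

```scheme
(import (scheme base))

;; Rewrite (- x <constant>) as (+ <negated constant> x); leave every
;; other expression unchanged.  Constants are plain numbers here,
;; without the ' that the node printer shows.
(define (normalize-subtraction exp)
  (if (and (list? exp)
           (= (length exp) 3)
           (eq? (car exp) '-)
           (number? (list-ref exp 2)))
      (list '+ (- (list-ref exp 2)) (list-ref exp 1))
      exp))

(normalize-subtraction '(- n_3 1))     ; => (+ -1 n_3)
(normalize-subtraction '(* n_3 r_4))   ; => (* n_3 r_4)
```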

----------------------------------------------------------------
Still to describe:
  protocol determination
  simplifier moving stuff down, duplicating, later passes move values back up