This section gives an overview of Scheme’s semantics. A detailed informal semantics is the subject of chapters 3 through 6. For reference purposes, section 7.2 provides a formal semantics of Scheme.
Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.
Scheme is a dynamically typed language. Types are associated with values (also called objects) rather than with variables. Statically typed languages, by contrast, associate types with variables and expressions as well as with values.
All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent. No Scheme object is ever destroyed. The reason that implementations of Scheme do not (usually!) run out of storage is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation. Implementations of Scheme are required to be properly tail-recursive. This allows the execution of an iterative computation in constant space, even if the iterative computation is described by a syntactically recursive procedure. Thus with a properly tail-recursive implementation, iteration can be expressed using the ordinary procedure-call mechanics, so that special iteration constructs are useful only as syntactic sugar. See section 3.5.
Scheme procedures are objects in their own right. Procedures can be created dynamically, stored in data structures, returned as results of procedures, and so on. One distinguishing feature of Scheme is that continuations, which in most other languages only operate behind the scenes, also have “first-class” status. Continuations are useful for implementing a wide variety of advanced control constructs, including non-local exits, backtracking, and coroutines. See section 6.10.
Arguments to Scheme procedures are always passed by value, which means that the actual argument expressions are evaluated before the procedure gains control, regardless of whether the procedure needs the result of the evaluation. Scheme’s model of arithmetic is designed to remain as independent as possible of the particular ways in which numbers are represented within a computer. In Scheme, every integer is a rational number, every rational is a real, and every real is a complex number. Thus the distinction between integer and real arithmetic, so important to many programming languages, does not appear in Scheme. In its place is a distinction between exact arithmetic, which corresponds to the mathematical ideal, and inexact arithmetic on approximations. Exact arithmetic is not limited to integers.
Scheme, like most dialects of Lisp, employs a fully parenthesized prefix notation for programs and other data; the grammar of Scheme generates a sublanguage of the language used for data. An important consequence of this simple, uniform representation is that Scheme programs and data can easily be treated uniformly by other Scheme programs. For example, the eval procedure evaluates a Scheme program expressed as data.
The read procedure performs syntactic as well as lexical decomposition of the data it reads. The read procedure parses its input as data (section 7.1.2), not as program.
The formal syntax of Scheme is described in section 7.1.
Every identifier defined in this report appears in one or more of several libraries. Identifiers defined in the base library are not marked specially in the body of the report. This library includes the core syntax of Scheme and generally useful procedures that manipulate data. For example, the variable abs is bound to a procedure of one argument that computes the absolute value of a number, and the variable + is bound to a procedure that computes sums. The full list all the standard libraries and the identifiers they export is given in Appendix A.
All implementations of Scheme:
Must provide the base library and all the identifiers exported from it.
May provide or omit the other libraries given in this report, but each library must either be provided in its entirety, exporting no additional identifiers, or else omitted altogether.
May provide other libraries not described in this report.
May also extend the function of any identifier in this report, provided the extensions are not in conflict with the language reported here.
Must support portable code by providing a mode of operation in which the lexical syntax does not conflict with the lexical syntax described in this report.
When speaking of an error situation, this report uses the phrase “an error is signaled” to indicate that implementations must detect and report the error. An error is signaled by raising a non-continuable exception, as if by the procedure raise as described in section 6.11. The object raised is implementation-dependent and need not be distinct from objects previously used for the same purpose. In addition to errors signaled in situations described in this report, programmers can signal their own errors and handle signaled errors.
The phrase “an error that satisfies predicate is signaled” means that an error is signaled as above. Furthermore, if the object that is signaled is passed to the specified predicate (such as file-error? or read-error?), the predicate returns #t.
If such wording does not appear in the discussion of an error, then implementations are not required to detect or report the error, though they are encouraged to do so. Such a situation is sometimes, but not always, referred to with the phrase “an error.” In such a situation, an implementation may or may not signal an error; if it does signal an error, the object that is signaled may or may not satisfy the predicates error-object?, file-error?, or read-error?. Alternatively, implementations may provide non-portable extensions.
For example, it is an error for a procedure to be passed an argument of a type that the procedure is not explicitly specified to handle, even though such domain errors are seldom mentioned in this report. Implementations may signal an error, extend a procedure’s domain of definition to include such arguments, or fail catastrophically.
This report uses the phrase “may report a violation of an implementation restriction” to indicate circumstances under which an implementation is permitted to report that it is unable to continue execution of a correct program because of some restriction imposed by the implementation. Implementation restrictions are discouraged, but implementations are encouraged to report violations of implementation restrictions.
For example, an implementation may report a violation of an implementation restriction if it does not have enough storage to run a program, or if an arithmetic operation would produce an exact number that is too large for the implementation to represent.
If the value of an expression is said to be “unspecified,” then the expression must evaluate to some object without signaling an error, but the value depends on the implementation; this report explicitly does not say what value is returned.
Finally, the words and phrases “must,” “must not,” “shall,” “shall not,” “should,” “should not,” “may,” “required,” “recommended,” and “optional,” although not capitalized in this report, are to be interpreted as described in RFC 2119 [3]. They are used only with reference to implementer or implementation behavior, not with reference to programmer or program behavior.
Chapters 4 and 6 are organized into entries. Each entry describes one language feature or a group of related features, where a feature is either a syntactic construct or a procedure. An entry begins with one or more header lines of the form
category:
template
for identifiers in the base library, or
name library
category:
template
where
name is the short name of a library as defined in Appendix A.
If
category is “syntax,” the entry describes an expression type, and the template gives the syntax of the expression type. Components of expressions are designated by syntactic variables, which are written using angle brackets, for example <expression> and <variable>. Syntactic variables are intended to denote segments of program text; for example, <expression> stands for any string of characters which is a syntactically valid expression. The notation
<thing1> …
indicates zero or more occurrences of a <thing>, and
<thing1> <thing2> …
indicates one or more occurrences of a <thing>.
If
category is “auxiliary syntax,” then the entry describes a syntax binding that occurs only as part of specific surrounding expressions. Any use as an independent syntactic construct or variable is an error.
If
category is “procedure,” then the entry describes a procedure, and the header line gives a template for a call to the procedure. Argument names in the template are
italicized. Thus the header line
vector
k)
indicates that the procedure bound to the vector-ref variable takes two arguments, a vector
vector and an exact non-negative integer
k (see below). The header lines
k)
k
fill)
indicate that the make-vector procedure must be defined to take either one or two arguments.
It is an error for a procedure to be presented with an argument that it is not specified to handle. For succinctness, we follow the convention that if an argument name is also the name of a type listed in section 3.2, then it is an error if that argument is not of the named type. For example, the header line for vector-ref given above dictates that the first argument to vector-ref is a vector. The following naming conventions also imply type restrictions:
|
The names
start and
end are used as indexes into strings, vectors, and bytevectors. Their use implies the following:
It is an error if
start is greater than
end.
It is an error if
end is greater than the length of the string, vector, or bytevector.
If
start is omitted, it is assumed to be zero.
If
end is omitted, it assumed to be the length of the string, vector, or bytevector.
The index
start is always inclusive and the index
end is always exclusive. As an example, consider a string. If
start and
end are the same, an empty substring is referred to, and if
start is zero and
end is the length of
string, then the entire string is referred to.
The symbol “⟹” used in program examples is read “evaluates to.” For example,
(* 5 8) ⟹ 40 means that the expression (* 5 8) evaluates to the object 40. Or, more precisely: the expression given by the sequence of characters “(* 5 8)” evaluates, in an environment containing the base library, to an object that can be represented externally by the sequence of characters “ 40.” See section 3.3 for a discussion of external representations of objects.
By convention, ? is the final character of the names of procedures that always return a boolean value. Such procedures are called predicates. Predicates are generally understood to be side-effect free, except that they may raise an exception when passed the wrong type of argument.
Similarly, ! is the final character of the names of procedures that store values into previously allocated locations (see section 3.4). Such procedures are called mutation procedures. The value returned by a mutation procedure is unspecified.
By convention, “->” appears within the names of procedures that take an object of one type and return an analogous object of another type. For example, list->vector takes a list and returns a vector whose elements are the same as those of the list.
A command is a procedure that does not return useful values to its continuation.
A thunk is a procedure that does not accept arguments.