Chapter 2

Lexical conventions

This section gives an informal account of some of the lexical conventions used in writing Scheme programs. For a formal syntax of Scheme, see section 7.1.

2.1 Identifiers

An identifier is any sequence of letters, digits, and “extended identifier characters” provided that it does not have a prefix which is a valid number. However, the . token (a single period) used in the list syntax is not an identifier.

All implementations of Scheme must support the following extended identifier characters:

! $ % & * + - . / : < = > ? @ ^ _ ~

Alternatively, an identifier can be represented by a sequence of zero or more characters enclosed within vertical lines (|), analogous to string literals. Any character, including whitespace characters, but excluding the backslash and vertical line characters, can appear verbatim in such an identifier. In addition, characters can be specified using either an <inline hex escape> or the same escapes available in strings.

For example, the identifier |H\x65;llo| is the same identifier as Hello, and in an implementation that supports the appropriate Unicode character the identifier |\x3BB;| is the same as the identifier λ. What is more, |\t\t| and |\x9;\x9;| are the same. Note that || is a valid identifier that is different from any other identifier.
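
The equivalences above can be checked directly, since two symbols with the same spelling are the same object. The following is a minimal sketch, assuming an implementation that conforms to this report; the expected values are shown as comments.

```scheme
(import (scheme base))

;; An inline hex escape denotes the character itself.
(eq? '|H\x65;llo| 'Hello)                ; ⟹ #t

;; Whitespace can appear verbatim or as an escape.
(eq? '|two words| '|two\x20;words|)      ; ⟹ #t
(symbol->string '|two words|)            ; ⟹ "two words"

;; The string escape \t and the hex escape \x9; both denote a tab.
(eq? '|\t\t| '|\x9;\x9;|)                ; ⟹ #t

;; || is the identifier whose name is the empty string.
(string-length (symbol->string '||))     ; ⟹ 0
```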

Here are some examples of identifiers:

...                      +
+soup+                   <=?
->string                 a34kTMNs
lambda                   list->vector
q                        V17a
|two words|              |two\x20;words|
the-word-recursion-has-many-meanings

See section 7.1.1 for the formal syntax of identifiers.

Identifiers have two uses within Scheme programs:

Any identifier can be used as a variable or as a syntactic keyword (see sections 3.1 and 4.3).
When an identifier appears as a literal or within a literal (see section 4.1.2), it is being used to denote a symbol (see section 6.5).
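
The two uses can be contrasted with a short sketch. The variable name soup is chosen here only for illustration.

```scheme
(import (scheme base))

(define soup 3)                 ; soup used as a variable
soup                            ; ⟹ 3

'soup                           ; ⟹ soup  (a symbol; the variable is not consulted)
(symbol? 'soup)                 ; ⟹ #t
(symbol? (car '(soup spoon)))   ; ⟹ #t   (an identifier within a literal)

;; An identifier that is bound as a syntactic keyword still denotes
;; an ordinary symbol when it appears as a literal.
(symbol? 'if)                   ; ⟹ #t
```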

In contrast with earlier revisions of the report [20], the syntax distinguishes between upper and lower case in identifiers and in characters specified using their names. However, it does not distinguish between upper and lower case in numbers, nor in <inline hex escapes> used in the syntax of identifiers, characters, or strings. None of the identifiers defined in this report contain upper-case characters, even when they appear to do so as a result of the English-language convention of capitalizing the first word of a sentence.
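
A brief sketch of these rules, assuming the default (non-folding) reader mode:

```scheme
(import (scheme base))

;; Case is significant in identifiers.
(eq? 'abc 'ABC)               ; ⟹ #f

;; Case is not significant in numbers.
(= #x1A #X1a)                 ; ⟹ #t
(= 1e3 1E3)                   ; ⟹ #t

;; Case is not significant in the digits of an inline hex escape.
(eq? '|\x4a;| '|\x4A;|)       ; ⟹ #t
(eq? '|\x4a;| 'J)             ; ⟹ #t
```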

The following directives give explicit control over case folding.

#!fold-case
#!no-fold-case

These directives can appear anywhere comments are permitted (see section 2.2) but must be followed by a delimiter. They are treated as comments, except that they affect the reading of subsequent data from the same port. The #!fold-case directive causes subsequent identifiers and character names to be case-folded as if by string-foldcase (see section 6.7). It has no effect on character literals. The #!no-fold-case directive causes a return to the default, non-folding behavior.
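
Because the directives affect subsequent data read from the same port, their effect can be observed by reading several data from one string port. This is a sketch only; the names port, d1, d2, and d3 are illustrative, and it assumes that the implementation's read procedure honors the directives as described above.

```scheme
(import (scheme base) (scheme read))

(define port
  (open-input-string "Abc #!fold-case Abc #!no-fold-case Abc"))

(define d1 (read port))   ; read before any directive
(define d2 (read port))   ; read after #!fold-case
(define d3 (read port))   ; read after #!no-fold-case

(map symbol->string (list d1 d2 d3))   ; ⟹ ("Abc" "abc" "Abc")
```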

2.2 Whitespace and comments

Whitespace characters include the space, tab, and newline characters. (Implementations may provide additional whitespace characters such as page break.) Whitespace is used for improved readability and as necessary to separate tokens from each other, a token being an indivisible lexical unit such as an identifier or number, but is otherwise insignificant. Whitespace can occur between any two tokens, but not within a token. Whitespace occurring inside a string or inside a symbol delimited by vertical lines is significant.

The lexical syntax includes several comment forms. Comments are treated exactly like whitespace.

A semicolon (;) indicates the start of a line comment. The comment continues to the end of the line on which the semicolon appears. Another way to indicate a comment is to prefix a <datum> (cf. section 7.1.2) with #; and optional <whitespace>. The comment consists of the comment prefix #;, the space, and the <datum> together. This notation is useful for “commenting out” sections of code.

Block comments are indicated with properly nested #| and |# pairs.

#|
   The FACT procedure computes the factorial
   of a non-negative integer.
|#
(define fact
  (lambda (n)
    (if (= n 0)
        #;(= n 1)
        1        ;Base case: return 1
        (* n (fact (- n 1))))))
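
As a further sketch, the following shows a nested block comment and datum comments with and without intervening whitespace. The name xs is illustrative.

```scheme
(import (scheme base))

#| outer comment
   #| nested comment |#
   still inside the outer comment
|#
(define xs
  (list 1 #;(an ignored datum) 2 #; 3 4))   ; 3 is commented out

xs   ; ⟹ (1 2 4)
```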

2.3 Other notations

For a description of the notations used for numbers, see section 6.2.


. + -: These are used in numbers, and can also occur anywhere in an identifier. A delimited plus or minus sign by itself is also an identifier. A delimited period (not occurring within a number or identifier) is used in the notation for pairs (section 6.4), and to indicate a rest-parameter in a formal parameter list (section 4.1.4). Note that a sequence of two or more periods is an identifier.
( ): Parentheses are used for grouping and to notate lists (section 6.4).
': The apostrophe (single quote) character is used to indicate literal data (section 4.1.2).
`: The grave accent (backquote) character is used to indicate partly constant data (section 4.2.8).
, ,@: The character comma and the sequence comma at-sign are used in conjunction with quasiquotation (section 4.2.8).
": The quotation mark character is used to delimit strings (section 6.7).
\: Backslash is used in the syntax for character constants (section 6.6) and as an escape character within string constants (section 6.7) and identifiers (section 7.1.1).
[ ] { }: Left and right square and curly brackets (braces) are reserved for possible future extensions to the language.
#: The number sign is used for a variety of purposes depending on the character that immediately follows it:
  #t #f: These are the boolean constants (section 6.3), along with the alternatives #true and #false.
  #\: This introduces a character constant (section 6.6).
  #(: This introduces a vector constant (section 6.8). Vector constants are terminated by ).
  #u8(: This introduces a bytevector constant (section 6.9). Bytevector constants are terminated by ).
  #e #i #b #o #d #x: These are used in the notation for numbers (section 6.2.5).
  #<n>= #<n>#: These are used for labeling and referencing other literal data (section 2.4).
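
Several of these notations are exercised in the following sketch. The procedure tail is hypothetical and serves only to show a rest-parameter.

```scheme
(import (scheme base))

;; A delimited period in pair notation and in a formal parameter list.
(cdr '(a . b))                       ; ⟹ b
(define (tail first . rest) rest)    ; hypothetical helper
(tail 1 2 3)                         ; ⟹ (2 3)

;; Backquote, comma, and comma at-sign.
`(1 ,(+ 1 1) ,@(list 3 4))           ; ⟹ (1 2 3 4)

;; Notations introduced by the number sign.
(eq? #t #true)                       ; ⟹ #t
(char->integer #\A)                  ; ⟹ 65
(vector-length #(a b c))             ; ⟹ 3
(bytevector-u8-ref #u8(0 255) 1)     ; ⟹ 255
(= #b101 #o5 #d5 #x5)                ; ⟹ #t
(exact? #e1.0)                       ; ⟹ #t

;; Backslash as an escape character within a string.
(string-length "a\\b")               ; ⟹ 3
```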

2.4 Datum labels

lexical syntax: #<n>=<datum>

lexical syntax: #<n>#

The lexical syntax #<n>=<datum> reads the same as <datum>, but also results in <datum> being labelled by <n>. It is an error if <n> is not a sequence of digits.

The lexical syntax #<n># serves as a reference to some object labelled by #<n>=; the result is the same object as the #<n>= (see section 6.1). Together, these syntaxes permit the notation of structures with shared or circular substructure.

(let ((x (list 'a 'b 'c)))
  (set-cdr! (cddr x) x)
  x)                       ⟹ #0=(a b c . #0#)

The scope of a datum label is the portion of the outermost datum in which it appears that is to the right of the label. Consequently, a reference #<n># can occur only after a label #<n>=; it is an error to attempt a forward reference. In addition, it is an error if the reference appears as the labelled object itself (as in #<n>= #<n>#), because the object labelled by #<n>= is not well defined in this case.
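
Datum labels can also be written directly in literals. The sketch below shows one shared and one circular structure; the names shared and ring are illustrative, and the example assumes an implementation that accepts datum labels in program text as described in this section.

```scheme
(import (scheme base))

;; Shared substructure: both elements are the same object.
(define shared '(#0=(a b) #0#))
(eq? (car shared) (cadr shared))   ; ⟹ #t

;; Circular structure: the tail after c is the list itself.
(define ring '#1=(a b c . #1#))
(eq? ring (cdddr ring))            ; ⟹ #t
(list-ref ring 4)                  ; ⟹ b
```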

It is an error for a <program> or <library> to include circular references except in literals. In particular, it is an error for quasiquote (section 4.2.8) to contain them.

#1=(begin (display #\x) #1#)
⟹ error
