
Re: [Scheme-reports] Scheme r7rs syntax described by ABNF



Hello again.

I have an updated draft of the formal syntax of Scheme r7rs (based on r7rs-draft-8.pdf), written in ABNF.  This draft includes all sections except quasiquotations.  The datum section has undergone some QuickCheck-style unit testing.  I'd appreciate any review and feedback.

I also have one *minor* suggestion.  The definition for <library name part> seems inconsistent with the style used for <label>.

library-name-part     = identifier / 1*digit10

The above definition would follow the same style as the one used for <label>.

regards,

Joe N.

;;;
;;; Formal syntax for Scheme r7rs described by the following ABNF
;;; [RFC5234].  Although [RFC5234] refers to octets, the syntax
;;; described in this document is defined over sequences of
;;; character numbers (code points) taken from Unicode.  The
;;; terminals in the ABNF productions are in terms of characters
;;; rather than bytes.
;;;
;;; A minimal number of delimiters (i.e., "DELIMITER") has been
;;; inserted to ensure the rules herein are parseable AS-IS by the
;;; "read" procedure.


;; r7rs Helper tokens

tab            = %x09                          ; \t
newline        = %x0A                          ; \n
return         = %x0D                          ; \r
space          = %x20                          ; \s
double-quote   = %x22                          ; "
number-sign    = %x23                          ; #
backslash      = %x5C                          ; \
vertical-line  = %x7C                          ; |

alarm-name     = %x61.6C.61.72.6D              ; alarm
backspace-name = %x62.61.63.6B.73.70.61.63.65  ; backspace
delete-name    = %x64.65.6C.65.74.65           ; delete
escape-name    = %x65.73.63.61.70.65           ; escape
newline-name   = %x6E.65.77.6C.69.6E.65        ; newline
null-name      = %x6E.75.6C.6C                 ; null
return-name    = %x72.65.74.75.72.6E           ; return
space-name     = %x73.70.61.63.65              ; space
tab-name       = %x74.61.62                    ; tab
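Writing the names as %x sequences makes them case-sensitive, because ABNF quoted strings are case-insensitive (RFC 5234, section 2.3) while R7RS treats character names as case-sensitive. Hand-typed hex is easy to get wrong, so a quick sanity check (a sketch, not part of the grammar; the helper name is hypothetical) decodes each sequence and compares it with the name in its comment:

```python
# Hex sequences copied from the *-name rules above.
CHARACTER_NAMES = {
    "alarm": "61.6C.61.72.6D",
    "backspace": "62.61.63.6B.73.70.61.63.65",
    "delete": "64.65.6C.65.74.65",
    "escape": "65.73.63.61.70.65",
    "newline": "6E.65.77.6C.69.6E.65",
    "null": "6E.75.6C.6C",
    "return": "72.65.74.75.72.6E",
    "space": "73.70.61.63.65",
    "tab": "74.61.62",
}


def decode_abnf_hex(seq: str) -> str:
    """Decode an ABNF %x concatenation such as '61.62' into 'ab'."""
    return "".join(chr(int(part, 16)) for part in seq.split("."))


for name, seq in CHARACTER_NAMES.items():
    decoded = decode_abnf_hex(seq)
    assert decoded == name, (name, decoded)
print("all character names match")
```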

unichar-low    = %x0000-D7FF
unichar-high   = %xE000-10FFFF
unichar        = unichar-low / unichar-high
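unichar leaves out the surrogate block %xD800-DFFF, so it matches exactly the Unicode scalar values. The same test over code points can be sketched as follows (the function name is illustrative only):

```python
UNICHAR_LOW = range(0x0000, 0xD7FF + 1)      # unichar-low
UNICHAR_HIGH = range(0xE000, 0x10FFFF + 1)   # unichar-high


def is_unichar(cp: int) -> bool:
    """True if code point cp matches the unichar rule."""
    return cp in UNICHAR_LOW or cp in UNICHAR_HIGH


# Each element of a Python str is one code point, which is the unit
# the grammar's terminals are defined over.
assert all(is_unichar(ord(c)) for c in "λ x → 😀")
assert not is_unichar(0xD800)    # surrogates are not scalar values
assert not is_unichar(0x110000)  # beyond the Unicode code space
```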

non-backslash-or-double-quote                  = %x00-21 / %x23-5B / %x5D-D7FF / unichar-high
non-line-ending                                = %x00-09 / %x0B-0C / %x0E-D7FF / unichar-high
non-vertical-line                              = %x00-7B / %x7D-D7FF / unichar-high
non-vertical-line-or-backslash                 = %x00-5B / %x5D-7B / %x7D-D7FF / unichar-high
non-vertical-line-or-backslash-or-double-quote = %x00-21 / %x23-5B / %x5D-7B / %x7D-D7FF / unichar-high
non-vertical-line-or-number-sign               = %x00-22 / %x24-7B / %x7D-D7FF / unichar-high
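Each non-... helper is meant to be unichar minus a few ASCII characters, and off-by-one mistakes in such ranges are easy to make. A brute-force check over the low range is cheap (unichar-high is shared unchanged by every rule). This is a sketch; the Python names are hypothetical and the mapping from rule to excluded characters is read off each rule's name:

```python
def ranges(*spans):
    """Build a set of code points from inclusive (lo, hi) pairs."""
    out = set()
    for lo, hi in spans:
        out.update(range(lo, hi + 1))
    return out


LOW = ranges((0x00, 0xD7FF))  # unichar-low

# rule name: (its ranges below %xE000, characters it should exclude)
RULES = {
    "non-backslash-or-double-quote":
        (ranges((0x00, 0x21), (0x23, 0x5B), (0x5D, 0xD7FF)), '\\"'),
    "non-line-ending":
        (ranges((0x00, 0x09), (0x0B, 0x0C), (0x0E, 0xD7FF)), "\n\r"),
    "non-vertical-line":
        (ranges((0x00, 0x7B), (0x7D, 0xD7FF)), "|"),
    "non-vertical-line-or-backslash":
        (ranges((0x00, 0x5B), (0x5D, 0x7B), (0x7D, 0xD7FF)), "|\\"),
    "non-vertical-line-or-backslash-or-double-quote":
        (ranges((0x00, 0x21), (0x23, 0x5B), (0x5D, 0x7B), (0x7D, 0xD7FF)), '|\\"'),
    "non-vertical-line-or-number-sign":
        (ranges((0x00, 0x22), (0x24, 0x7B), (0x7D, 0xD7FF)), "|#"),
}

for name, (allowed, excluded) in RULES.items():
    expected = LOW - {ord(c) for c in excluded}
    assert allowed == expected, name
print("all helper ranges match their names")
```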


;; r7rs Lexical structure

dot                   = "." DOT-DELIMITER

token                 = identifier / boolean / number
                      / character / string
                      / "(" / ")" / "#(" / "#u8(" / "'" / "`" / "," / ",@" / dot

DELIMITER             = whitespace intertoken-space
DOT-DELIMITER         = whitespace dot-intertoken-space
; delimiter           = whitespace / vertical-line
;                     / "(" / ")" / double-quote / ";"
; CAUTION: approximation for <delimiter> and <intertoken-space> with
; a special case for "."

intraline-whitespace  = space / tab

whitespace            = intraline-whitespace / line-ending

line-ending           = newline / return newline / return

comment               = ";" *non-line-ending line-ending
                      / nested-comment
                      / "#;" datum

dot-comment           = ";" *non-line-ending line-ending
                      / nested-comment

nested-comment        = "#|" comment-text *comment-cont "|#"

comment-text          = *non-vertical-line-or-number-sign
; CAUTION: approximation for <character sequence not containing #| or |#>

comment-cont          = nested-comment comment-text
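comment-text is marked as an approximation because the R7RS production, a character sequence not containing #| or |#, is awkward to state in ABNF. The intended nesting is simple to express procedurally with a depth counter. A minimal sketch, using a hypothetical helper that returns the index just past the nested comment starting at pos:

```python
def skip_nested_comment(text: str, pos: int) -> int:
    """Return the index just after the nested comment that starts at pos.

    Each "#|" opens a level and each "|#" closes one; raises ValueError
    if text[pos:] does not start with "#|" or the comment is unclosed.
    """
    if not text.startswith("#|", pos):
        raise ValueError("not at the start of a nested comment")
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("#|", i):
            depth += 1
            i += 2
        elif text.startswith("|#", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated nested comment")


src = "#| outer #| inner |# still outer |# (display 1)"
end = skip_nested_comment(src, 0)
print(repr(src[end:]))  # ' (display 1)'
```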

directive             = ("#!fold-case" / "#!no-fold-case")

atmosphere            = whitespace / comment / directive

dot-atmosphere        = whitespace / dot-comment / directive

intertoken-space      = *atmosphere

dot-intertoken-space  = *dot-atmosphere

identifier            = initial *subsequent DELIMITER
                      / vertical-line *symbol-element vertical-line
                      / peculiar-identifier DELIMITER

initial               = letter / special-initial / inline-hex-escape

letter                = %x61-7A / %x41-5A   ; a-z / A-Z

special-initial       = "!" / "$" / "%" / "&" / "*" / "/" / ":" / "<" / "="
                      / ">" / "?" / "^" / "_" / "~"

subsequent            = initial / digit / special-subsequent

digit                 = digit10   ; 0-9

hex-digit             = digit16   ; 0-9 / a-f / A-F

explicit-sign         = "+" / "-"

special-subsequent    = explicit-sign / "." / "@"

inline-hex-escape     = "\x" hex-scalar-value ";"

digit16-lt-d          = digit10 / %x61-63 / %x41-43   ; 0-9 / a-c / A-C

hex-scalar-value      = *"0" ( 1*3hex-digit                                ; %x000000-000FFF
                             / (digit16-lt-d 3hex-digit)                   ; %x000000-00CFFF
                             / (  "D" digit8 2hex-digit)                   ; %x00D000-00D7FF
                             / ( ("E" / "F") 3hex-digit)                   ; %x00E000-00FFFF
                             / ((%x31-39 / %x41-46 / %x61-66) 4hex-digit)  ; %x010000-0FFFFF
                             /         ("10" 4hex-digit) )                 ; %x100000-10FFFF
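hex-scalar-value is meant to accept exactly the hexadecimal spellings of Unicode scalar values (no surrogates, nothing above 10FFFF), with any number of leading zeros. When testing a transcription of the rule, a semantic oracle is handy. The sketch below states the intent directly; it is not a transcription of the ABNF, and the function name is hypothetical:

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def is_hex_scalar_value(s: str) -> bool:
    """True if s spells a Unicode scalar value in hex (leading zeros allowed)."""
    if not s or not set(s) <= HEX_DIGITS:
        return False
    value = int(s, 16)
    return value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF


assert is_hex_scalar_value("41")          # short forms
assert is_hex_scalar_value("1F600")       # five digits, supplementary plane
assert is_hex_scalar_value("0010FFFF")    # leading zeros
assert not is_hex_scalar_value("D800")    # surrogate
assert not is_hex_scalar_value("110000")  # out of range
```

Comparing such an oracle with a transcription of the ABNF over all inputs of one to six digits, with and without leading zeros, is one way to run the kind of QuickCheck-style test mentioned above.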

peculiar-identifier   = explicit-sign
                      / explicit-sign sign-subsequent *subsequent
                      / explicit-sign "." dot-subsequent *subsequent
                      / "." dot-subsequent *subsequent
; CAUTION: Note that "+i", "-i" and infnan are exceptions to the
; peculiar-identifier rule; they are parsed as numbers, not
; identifiers.
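The four peculiar-identifier alternatives translate directly into a regular expression. A sketch, assuming ASCII-only character classes (the inline-hex-escape alternative of initial is ignored) and hypothetical Python names; it also shows why the CAUTION note is needed, since "+i" and the infnan spellings match the identifier shape:

```python
import re

# ASCII-only approximations of the grammar's character classes.
INITIAL = "A-Za-z!$%&*/:<=>?^_~"
SUBSEQUENT = f"[{INITIAL}0-9+\\-.@]"      # initial / digit / special-subsequent
SIGN_SUBSEQUENT = f"[{INITIAL}+\\-@]"     # initial / explicit-sign / "@"
DOT_SUBSEQUENT = f"[{INITIAL}+\\-@.]"     # sign-subsequent / "."

PECULIAR_IDENTIFIER = re.compile(
    "[+-]"
    f"|[+-]{SIGN_SUBSEQUENT}{SUBSEQUENT}*"
    f"|[+-]\\.{DOT_SUBSEQUENT}{SUBSEQUENT}*"
    f"|\\.{DOT_SUBSEQUENT}{SUBSEQUENT}*"
)

for s in ["+", "-", "...", "->x", "+.a", "-@", ".+"]:
    assert PECULIAR_IDENTIFIER.fullmatch(s), s
for s in [".", "+1", ".5", "-.5", "a+"]:
    assert PECULIAR_IDENTIFIER.fullmatch(s) is None, s

# These match the shape too, but R7RS reads them as numbers.
for s in ["+i", "-i", "+inf.0", "-nan.0"]:
    assert PECULIAR_IDENTIFIER.fullmatch(s), s
```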

dot-subsequent        = sign-subsequent / "."

sign-subsequent       = initial / explicit-sign / "@"

symbol-element        = non-vertical-line-or-backslash
                      / symbolstring-element / double-quote / "\|"

boolean               = ("#t" / "#f" / "#true" / "#false") DELIMITER

character             = ("#\" (character-any / character-name / "x" hex-scalar-value)) DELIMITER

character-any         = unichar

character-name        = alarm-name / backspace-name / delete-name
                      / escape-name / newline-name / null-name
                      / return-name / space-name / tab-name

string                = double-quote *string-element double-quote

string-element        = non-backslash-or-double-quote
                      / "\a" / "\b" / "\t" / "\n" / "\r" / ("\" double-quote) / "\\"
                      / "\" *intraline-whitespace line-ending *intraline-whitespace
                      / inline-hex-escape
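The string and string-element rules map onto a single regular expression, which is convenient for spot-checking examples. A sketch under stated simplifications: the surrogate exclusion and the scalar-value range check on inline-hex-escape are omitted, and the escapes are matched case-sensitively in lowercase, as R7RS writes them:

```python
import re

STRING = re.compile(
    r'"(?:'
    r'[^\\"]'                           # non-backslash-or-double-quote
    r'|\\[abtnr"\\]'                    # \a \b \t \n \r \" \\
    r'|\\[ \t]*(?:\r\n|\n|\r)[ \t]*'    # backslash line continuation
    r'|\\x[0-9A-Fa-f]+;'                # inline-hex-escape (simplified)
    r')*"'
)

assert STRING.fullmatch('"hello"')
assert STRING.fullmatch('"say \\"hi\\""')         # "say \"hi\""
assert STRING.fullmatch('"\\x3bb; is lambda"')    # "\x3bb; is lambda"
assert STRING.fullmatch('"one \\\n    two"')      # continuation line
assert not STRING.fullmatch('"bad \\q escape"')   # unknown escape
assert not STRING.fullmatch('"unterminated')
```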

symbolstring-element  = non-vertical-line-or-backslash-or-double-quote
                      / "\a" / "\b" / "\t" / "\n" / "\r" / ("\" double-quote) / "\\"
                      / "\" *intraline-whitespace line-ending *intraline-whitespace
                      / inline-hex-escape

bytevector            = "#u8(" *byte ")"

byte                  = (%x30-39                            ; 0-9
                        / %x31-39 %x30-39                   ; 10-99
                        / %x31 %x30-39 %x30-39              ; 100-199
                        / %x32 %x30-34 %x30-39              ; 200-249
                        / %x32 %x35 %x30-35) DELIMITER      ; 250-255
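A brute-force comparison against every integer is a cheap way to validate decimal-range rules such as byte. The sketch below transcribes the range 0-255 as five regular-expression alternatives (0-9, 10-99, 100-199, 200-249, 250-255) and checks it against all strings of up to three digits; the same harness can be pointed at any other transcription of the rule:

```python
import re

# Decimal alternatives for 0-255, written without leading zeros.
BYTE = re.compile(
    r"[0-9]"           # 0-9
    r"|[1-9][0-9]"     # 10-99
    r"|1[0-9][0-9]"    # 100-199
    r"|2[0-4][0-9]"    # 200-249
    r"|25[0-5]"        # 250-255
)

accepted = {s for s in map(str, range(1000)) if BYTE.fullmatch(s)}
assert accepted == {str(n) for n in range(256)}
assert BYTE.fullmatch("007") is None  # leading zeros are not accepted
```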

number                = (num2 / num8 / num10 / num16) DELIMITER

num2                  = prefix2 complex2
num8                  = prefix8 complex8
num10                 = prefix10 complex10
num16                 = prefix16 complex16

complex2              = real2 / real2 "@" real2
                      / real2 "+" ureal2 "i" / real2 "-" ureal2 "i"
                      / real2 "+i" / real2 "-i" / real2 infnan "i"
                      / "+" ureal2 "i" / "-" ureal2 "i"
                      / infnan "i" / "+i" / "-i"
complex8              = real8 / real8 "@" real8
                      / real8 "+" ureal8 "i" / real8 "-" ureal8 "i"
                      / real8 "+i" / real8 "-i" / real8 infnan "i"
                      / "+" ureal8 "i" / "-" ureal8 "i"
                      / infnan "i" / "+i" / "-i"
complex10             = real10 / real10 "@" real10
                      / real10 "+" ureal10 "i" / real10 "-" ureal10 "i"
                      / real10 "+i" / real10 "-i" / real10 infnan "i"
                      / "+" ureal10 "i" / "-" ureal10 "i"
                      / infnan "i" / "+i" / "-i"
complex16             = real16 / real16 "@" real16
                      / real16 "+" ureal16 "i" / real16 "-" ureal16 "i"
                      / real16 "+i" / real16 "-i" / real16 infnan "i"
                      / "+" ureal16 "i" / "-" ureal16 "i"
                      / infnan "i" / "+i" / "-i"

real2                 = sign ureal2
                      / infnan
real8                 = sign ureal8
                      / infnan
real10                = sign ureal10
                      / infnan
real16                = sign ureal16
                      / infnan

ureal2                = uinteger2
                      / uinteger2 "/" uinteger2
ureal8                = uinteger8
                      / uinteger8 "/" uinteger8
ureal10               = uinteger10
                      / uinteger10 "/" uinteger10
                      / decimal10
ureal16               = uinteger16
                      / uinteger16 "/" uinteger16

decimal10             = uinteger10 suffix
                      / "." 1*digit10 suffix
                      / 1*digit10 "." *digit10 suffix

uinteger2             = 1*digit2
uinteger8             = 1*digit8
uinteger10            = 1*digit10
uinteger16            = 1*digit16

prefix2               = radix2 exactness
                      / exactness radix2
prefix8               = radix8 exactness
                      / exactness radix8
prefix10              = radix10 exactness
                      / exactness radix10
prefix16              = radix16 exactness
                      / exactness radix16

infnan                = "+inf.0" / "-inf.0" / "+nan.0" / "-nan.0"

suffix                = [exponent-marker sign 1*digit10]

exponent-marker       = "e" / "s" / "f" / "d" / "l"

sign                  = ["+" / "-"]

exactness             = ["#i" / "#e"]

radix2                = "#b"
radix8                = "#o"
radix10               = ["#d"]
radix16               = "#x"

digit2                = %x30-31   ; 0-1
digit8                = %x30-37   ; 0-7
digit10               = %x30-39   ; 0-9
digit16               = digit10 / %x61-66 / %x41-46   ; 0-9 / a-f / A-F
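decimal10 together with suffix condenses into a compact regular expression, which makes it easy to try edge cases such as "5." or ".5". A sketch covering only the unsigned decimal forms (no ratios, prefixes or infnan); re.IGNORECASE mirrors the fact that ABNF quoted strings such as "e" also match "E":

```python
import re

DECIMAL10 = re.compile(
    r"(?:[0-9]+"                  # uinteger10 suffix
    r"|\.[0-9]+"                  # "." 1*digit10 suffix
    r"|[0-9]+\.[0-9]*)"           # 1*digit10 "." *digit10 suffix
    r"(?:[esfdl][+-]?[0-9]+)?",   # suffix = [exponent-marker sign 1*digit10]
    re.IGNORECASE,
)

for text in ["42", ".5", "5.", "3.14", "6.02e23", "1E-10", "2.5d0"]:
    assert DECIMAL10.fullmatch(text), text
for text in [".", "e5", "1.2.3", "1e", "1e+"]:
    assert DECIMAL10.fullmatch(text) is None, text
```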


;; r7rs External representations

datum                 = simple-datum / compound-datum
                      / label "=" datum / label "#"

simple-datum          = boolean / number
                      / character / string
                      / symbol / bytevector

symbol                = identifier

compound-datum        = list / vector / abbreviation

list                  = "(" *datum ")"
                      / "(" 1*datum dot datum ")"

abbreviation          = abbrev-prefix datum

abbrev-prefix         = "'" / "`" / "," / ",@"

vector                = "#(" *datum ")"

label                 = "#" 1*digit10
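The datum, list, abbreviation and vector rules have an ordinary recursive-descent structure. A minimal recognizer sketch for a subset of the datum grammar, mainly to show how the 1*datum requirement before dot is enforced. Atoms are treated as opaque tokens, strings, characters, comments and labels are left out, and all names are hypothetical:

```python
import re

# "#(", ",@", single-character punctuation, or an opaque atom (a run of
# characters that are not whitespace, parens, quote characters, '"' or ';').
TOKEN = re.compile(r"""\s*(#\(|,@|[()'`,]|[^\s()'`,";]+)""")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def expect(tokens, i, tok):
    if i >= len(tokens) or tokens[i] != tok:
        raise SyntaxError(f"expected {tok!r}")
    return i + 1


def parse_datum(tokens, i):
    """Consume one datum starting at tokens[i]; return the next index."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok in ("'", "`", ",", ",@"):            # abbreviation
        return parse_datum(tokens, i + 1)
    if tok == "#(":                             # vector = "#(" *datum ")"
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            i = parse_datum(tokens, i)
        return expect(tokens, i, ")")
    if tok == "(":                              # list, possibly dotted
        i += 1
        count = 0
        while i < len(tokens) and tokens[i] not in (")", "."):
            i = parse_datum(tokens, i)
            count += 1
        if i < len(tokens) and tokens[i] == ".":
            if count == 0:                      # "(" 1*datum dot datum ")"
                raise SyntaxError("dot needs a datum before it")
            i = parse_datum(tokens, i + 1)
        return expect(tokens, i, ")")
    if tok in (")", "."):
        raise SyntaxError(f"unexpected {tok!r}")
    return i + 1                                # simple datum (opaque atom)


def is_datum(text):
    try:
        tokens = tokenize(text)
        return parse_datum(tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```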


;; r7rs Expressions

expression            = identifier
                      / literal
                      / procedure-call
                      / lambda-expression
                      / conditional
                      / assignment
                      / derived-expression
                      / macro-use
                      / macro-block
                      / includer

literal               = quotation / self-evaluating

self-evaluating       = boolean / number / vector
                      / character / string / bytevector

quotation             = "'" datum
                      / "(" "quote" DELIMITER datum ")"

procedure-call        = "(" operator *operand ")"

operator              = expression

operand               = expression

lambda-expression     = "(" "lambda" DELIMITER formals body ")"

formals               = "(" *identifier ")"
                      / identifier
                      / "(" 1*identifier dot identifier ")"

body                  = *definition sequence

sequence              = *command expression

command               = expression

conditional           = "(" "if" DELIMITER test consequent alternate ")"

test                  = expression

consequent            = expression

alternate             = [expression]

assignment            = "(" "set!" DELIMITER identifier expression ")"

derived-expression    = "(" "cond" DELIMITER 1*cond-clause ")"
                      / "(" "cond" DELIMITER *cond-clause "(" "else" DELIMITER sequence ")" ")"
                      / "(" "case" DELIMITER expression 1*case-clause ")"
                      / "(" "case" DELIMITER expression *case-clause "(" "else" DELIMITER sequence ")" ")"
                      / "(" "case" DELIMITER expression *case-clause "(" "else" DELIMITER "=>" DELIMITER recipient ")" ")"
                      / "(" "and" DELIMITER *test ")"
                      / "(" "or" DELIMITER *test ")"
                      / "(" "when" DELIMITER test sequence ")"
                      / "(" "unless" DELIMITER test sequence ")"
                      / "(" "let" DELIMITER "(" *binding-spec ")" body ")"
                      / "(" "let" DELIMITER identifier "(" *binding-spec ")" body ")"
                      / "(" "let*" DELIMITER "(" *binding-spec ")" body ")"
                      / "(" "letrec" DELIMITER "(" *binding-spec ")" body ")"
                      / "(" "letrec*" DELIMITER "(" *binding-spec ")" body ")"
                      / "(" "let-values" DELIMITER "(" *mv-binding-spec ")" body ")"
                      / "(" "let*-values" DELIMITER "(" *mv-binding-spec ")" body ")"
                      / "(" "begin" DELIMITER sequence ")"
                      / "(" "do" DELIMITER "(" *iteration-spec ")" "(" test do-result ")"
                            *command ")"
                      / "(" "delay" DELIMITER expression ")"
                      / "(" "delay-force" DELIMITER expression ")"
                      / "(" "parameterize" DELIMITER "(" *("(" expression expression ")") ")" body ")"
                      / "(" "guard" DELIMITER "(" identifier *cond-clause ")" body ")"
                      / quasiquotation
                      / "(" "case-lambda" DELIMITER *case-lambda-clause ")"

cond-clause           = "(" test sequence ")"
                      / "(" test ")"
                      / "(" test "=>" DELIMITER recipient ")"

recipient             = expression

case-clause           = "(" "(" *datum ")" sequence ")"
                      / "(" "(" *datum ")" "=>" DELIMITER recipient ")"

binding-spec          = "(" identifier expression ")"

mv-binding-spec       = "(" formals expression ")"

iteration-spec        = "(" identifier init step ")"
                      / "(" identifier init ")"

case-lambda-clause    = "(" formals body ")"

init                  = expression

step                  = expression

do-result             = [sequence]

macro-use             = "(" keyword *datum ")"

keyword               = identifier

macro-block           = "(" "let-syntax" DELIMITER "(" *syntax-spec ")" body ")"
                      / "(" "letrec-syntax" DELIMITER "(" *syntax-spec ")" body ")"

syntax-spec           = "(" keyword transformer-spec ")"

includer              = "(" "include" DELIMITER 1*string ")"
                      / "(" "include-ci" DELIMITER 1*string ")"


;; r7rs Quasiquotations (TBD)

quasiquotation        = "`" "|TBD|"
                      / "(" "quasiquote" "|TBD|" ")"


;; r7rs Transformers

transformer-spec      = "(" "syntax-rules" DELIMITER "(" *identifier ")" *syntax-rule ")"
                      / "(" "syntax-rules" DELIMITER identifier "(" *identifier ")"
                            *syntax-rule ")"

syntax-rule           = "(" pattern template ")"

pattern               = pattern-identifier
                      / underscore
                      / "(" *pattern ")"
                      / "(" 1*pattern dot pattern ")"
                      / "(" *pattern pattern ellipsis *pattern ")"
                      / "(" *pattern pattern ellipsis *pattern
                            dot pattern ")"
                      / "#(" *pattern ")"
                      / "#(" *pattern pattern ellipsis *pattern ")"
                      / pattern-datum

pattern-datum         = string
                      / character
                      / boolean
                      / number

template              = pattern-identifier
                      / "(" *template-element ")"
                      / "(" 1*template-element dot template ")"
                      / "#(" *template-element ")"
                      / template-datum

template-element      = template
                      / template ellipsis

template-datum        = pattern-datum

pattern-identifier    = initial *subsequent DELIMITER
                      / vertical-line *symbol-element vertical-line
                      / pattern-peculiar-identifier DELIMITER

ellipsis              = "..." DELIMITER

underscore            = "_" DELIMITER

pattern-peculiar-identifier = explicit-sign
                      / explicit-sign sign-subsequent *subsequent
                      / explicit-sign "." dot-subsequent *subsequent
                      / "." dot-subsequent *pattern-subsequent
; CAUTION: Note that "+i", "-i" and infnan are exceptions to the
; pattern-peculiar-identifier rule; they are parsed as numbers, not
; identifiers.

pattern-subsequent    = initial / digit / pattern-special-subsequent

pattern-special-subsequent = explicit-sign / "@"


;; r7rs Programs and definitions

program               = 1*import-declaration 1*command-or-definition

command-or-definition = command
                      / definition
                      / "(" "begin" DELIMITER 1*command-or-definition ")"

definition            = "(" "define" DELIMITER identifier expression ")"
                      / "(" "define" DELIMITER "(" identifier def-formals ")" body ")"
                      / syntax-definition
                      / "(" "define-values" DELIMITER formals body ")"
                      / "(" "define-record-type" DELIMITER identifier
                            constructor identifier *field-spec ")"
                      / "(" "begin" DELIMITER *definition ")"

def-formals           = *identifier
                      / *identifier dot identifier

constructor           = "(" identifier *field-name ")"

field-spec            = "(" field-name accessor ")"
                      / "(" field-name accessor mutator ")"

field-name            = identifier

accessor              = identifier

mutator               = identifier

syntax-definition     = "(" "define-syntax" DELIMITER keyword transformer-spec ")"


;; r7rs Libraries

library               = "(" "define-library" DELIMITER library-name
                            *library-declaration ")"

library-name          = "(" 1*library-name-part ")"

library-name-part     = identifier
                      / 1*digit10 DELIMITER
; CAUTION: need to confirm correction to r7rs spec for above <uinteger 10>

library-declaration   = "(" "export" DELIMITER *export-spec ")"
                      / import-declaration
                      / "(" "begin" DELIMITER *command-or-definition ")"
                      / includer
                      / "(" "cond-expand" DELIMITER 1*cond-expand-clause ")"
                      / "(" "cond-expand" DELIMITER 1*cond-expand-clause
                            "(" "else" DELIMITER *library-declaration ")" ")"

import-declaration    = "(" "import" DELIMITER 1*import-set ")"

export-spec           = identifier
                      / "(" "rename" DELIMITER identifier identifier ")"

import-set            = library-name
                      / "(" "only" DELIMITER import-set 1*identifier ")"
                      / "(" "except" DELIMITER import-set 1*identifier ")"
                      / "(" "prefix" DELIMITER import-set identifier ")"
                      / "(" "rename" DELIMITER import-set
                            1*("(" identifier identifier ")") ")"

cond-expand-clause    = "(" feature-requirement *library-declaration ")"

feature-requirement   = identifier
                      / "(" "library" DELIMITER library-name ")"
                      / "(" "and" DELIMITER *feature-requirement ")"
                      / "(" "or" DELIMITER *feature-requirement ")"
                      / "(" "not" DELIMITER feature-requirement ")"


On Dec 29, 2012, at 24:47 , Joseph Wayne Norton <norton@x> wrote:

> 
> Hello.
> 
> In the process of reviewing the r7rs draft, I decided to draft the formal syntax of Scheme r7rs written in ABNF.  This draft only covers tokens (including datum).  
> 
> This kind of specification would be helpful to me and possibly to others.  I'd appreciate any review and feedback.
> 
> I intend to draft the other sections (i.e. expressions, quasiquotations, transformers, programs and definitions, and libraries) as well.
> 
> thanks,
> 
> Joe N.
> 
> <scheme_r7rs_tokens.abnf>

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports