NETB131 Programming Project                                   

Programming language SNOBOL

An overview


Yoan Grass Batista, F42225

1. History and Special Features

SNOBOL, an acronym for StriNg Oriented and symBOlic Language, was designed in the early 1960s by three people at Bell Laboratories: D.J. Farber, R.E. Griswold, and I.P. Polonsky (Farber et al., 1964).

SNOBOL is a special-purpose language developed to provide a powerful means of manipulating character strings. Accordingly, it has a collection of powerful operations for string pattern matching. The most common early application of SNOBOL was writing text editors. Because of the dynamic nature of the language and its interpreted implementation, it is now considered too slow for such applications, and it is close to being completely unused.

 

SNOBOL4 was implemented using macros that define a virtual machine, so that the language could be moved to a variety of different machines by implementing those macros on each one.

 

SNOBOL4 is really a combination of two kinds of languages: a conventional language, with several data types and a simple but powerful control structure, and a pattern language, with a structure all its own. The conventional language is not block structured, and may appear old-fashioned. The pattern language, however, remains unsurpassed, and is unique to SNOBOL4.

Significant Language Features

*      String manipulation operations - SNOBOL4 has several operations that test the contents of a string and make replacements in it.

*      Pattern matching - examines a subject string for occurrences of specified patterns, the simplest of which are literal substrings.

*      Dynamically typed - SNOBOL4 has no type declarations and no restrictions on the data type of the value of any variable.

*      Interpretive language - the compiler translates the program into a notation that the interpreter can easily execute.

 

2. "Hello World" Program

Description

This program demonstrates the text output facility of the SNOBOL4 programming language by displaying the message "Hello World!".

Source Code

        OUTPUT = 'Hello World!'

END

Sample Run

Hello World!

Program Notes

This program was compiled and tested using Vanilla SNOBOL4.

 

3. Fundamental Data Types

The data types used by SNOBOL Language are as follows:

 

  Data Type                   Formal Identification

  string                      STRING
  integer                     INTEGER
  real number                 REAL
  pattern structure           PATTERN
  array                       ARRAY
  table                       TABLE
  created name                NAME
  unevaluated expression      EXPRESSION
  object code                 CODE
  programmer-defined          data type name
  external                    EXTERNAL

 

Integers, reals, strings, patterns, arrays, and tables are types of data objects that are built into the SNOBOL4 language. Facilities are provided in the language to permit a programmer to define additional types of data. This facilitates representation of structural relationships inherent in data.

Modular Units:

Many SNOBOL4 procedures are invoked by functions built into the system, called primitive functions. Operations that occur frequently are implemented as primitive functions for efficiency. In addition, facilities are available for programmers to define their own source-language functions. A programmer-defined function in SNOBOL4 requires:
i) a call to the primitive function DEFINE for each programmer-defined function.
ii) a procedure, written in SNOBOL4, for each function.
Many functions are conveniently defined recursively. For example, factorials may be defined as
fact(0) = 1
fact(n) = n*fact(n-1) for n>0
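
As a minimal sketch of these two requirements, the recursive definition above might be written as follows. The label FACT.END and the test value 5 are choices made for this illustration only; DEFINE, EQ, and the RETURN transfer are standard SNOBOL4. The unconditional transfer on the DEFINE line skips over the function body, so the body runs only when FACT is called.

```snobol
* Requirement (i): register the function with DEFINE, then jump past its body.
        DEFINE('FACT(N)')                        :(FACT.END)
* Requirement (ii): the procedure. The result is assigned to the function's name.
* fact(0) = 1
FACT    FACT = EQ(N,0) 1                         :S(RETURN)
* fact(n) = n*fact(n-1) for n>0
        FACT = N * FACT(N - 1)                   :(RETURN)
FACT.END
* Illustrative call: 5 is small enough for 16-bit integer implementations.
        OUTPUT = FACT(5)
END
```

Run as a complete program, this should display 120.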

Simple Data Type

SNOBOL4 has several different basic types, but has a mechanism to define hundreds more as aggregates of others. Initially, we'll discuss the two most basic: integers and strings.

Integers

An integer is a simple whole number, without a fractional part. In Vanilla SNOBOL4, the implementation used in this overview, its value can range from -32767 to +32767. It appears without quotation marks, and commas should not be used to group digits. Here are some acceptable integers:

    14    -234    0    0012    +12832    -9395    +0

These are incorrect in SNOBOL4:

    13.4             fractional part is not allowed
    49723            larger than 32767
    -                number must contain at least one digit
    3,076            comma is not allowed

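Use the CODE.SNO program (an interactive test program supplied with Vanilla SNOBOL4, which executes each statement as it is typed) to test different integer values. Try both legal and illegal values. Here are some sample test lines: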

    Enter SNOBOL4 statements:

    ?       OUTPUT = 42

    42

    ?       OUTPUT = -825

    -825

    ?       OUTPUT = 73768

    Compilation error: Erroneous integer, re-enter:

Reals

Vanilla SNOBOL4 does not include real numbers. They are available in SNOBOL4+, Catspaw's highly enhanced implementation of the SNOBOL4 programming language.

Strings

A string is an ordered sequence of characters. The order of the characters is important: the strings AB and BA are different. Characters are not restricted to printing characters; all of the 256 combinations possible in an 8-bit byte are allowed.

Normally, the maximum length of a string is 5,000 characters, although you can tell SNOBOL4 to accept longer strings. A string of length zero (no characters) is called the null string. At first, you may find the idea of an empty string disturbing: it's a string, but it has no characters. Its role in SNOBOL4 is similar to the role of zero in the natural number system.

Strings may appear literally in your program, or may be created during execution. To place a literal string in your program, enclose it in apostrophes (') or double quotation marks ("). Either may be used, but the beginning and ending marks must be the same. The string itself may contain one type of mark if the other is used to enclose the string. The null string is represented by two successive marks, with no intervening characters. Here are some samples to try with CODE.SNO:

    ?       OUTPUT = 'STRING LITERAL'

    STRING LITERAL

    ?       OUTPUT = "So is this"

    So is this

    ?       OUTPUT = ''

 

    ?       OUTPUT = 'WHO COINED THE WORD "BYTE"?'

    WHO COINED THE WORD "BYTE"?

    ?       OUTPUT = "WON'T"

    WON'T

Variables

A variable is a place to store an item of data. The number of variables you may have is unlimited, provided you give each one a unique name. Think of a variable as a box, marked on the outside with a permanent name, able to hold any data value or type. Many programming languages require that you formally declare what kind of entity the box will contain -- integer, real, string, etc. -- but SNOBOL4 is more flexible. A variable's contents may change repeatedly during program execution. The size of the box contracts or expands as necessary. One moment it might contain an integer, then a 2,000 character string, then the null string; in fact, any SNOBOL4 data type.

There are only a few rules about composing a variable's name when it appears in your program:

  1. The name must begin with an upper- or lower-case letter.
  2. If it is more than one character long, the remaining characters may be any combination of letters, numbers, or the characters period (.) and underscore (_).
  3. The name may not be longer than the maximum line length (120 characters).

Here are some correct SNOBOL4 names:

    WAGER     P23     VerbClause     SUM.OF.SQUARES     Buffer

Normally, SNOBOL4 performs "case-folding" on names. Lower-case alphabetic characters are changed to upper-case when they appear in names -- Buffer and BUFFER are equivalent. Naturally, case-folding does not occur within a string literal. Case-folding can be disabled by the command line option /C.

In some languages, the initial value of a new variable is undefined. SNOBOL4 guarantees that a new variable's initial value is the null string. However, except in very small programs, you should always initialize variables. This prevents unexpected results when a program is modified or a program segment is reexecuted.

You store something in a variable by making it the object of an assignment operation. You can retrieve its contents simply by using it wherever its value is needed. Using a variable's value is nondestructive; the value in the box remains unchanged. Try creating some variables using CODE.SNO:

    ?       ABC = 'EGG'

    ?       OUTPUT = ABC

    EGG

    ?       D = 'SHELL'

    ?       OUTPUT = abc d             (Same as ABC D)

    EGGSHELL

    ?       OUTPUT = NONESUCH          (New variable is null)

     

    ?       OUTPUT = ABC NULL D

    EGGSHELL

    ?       N1 = 43

    ?       D = 17

    ?       OUTPUT = N1 + D

    60

    ?       output = ABC D

    EGG17

 

OUTPUT is a variable with special properties; when a value is stored in its box, it is also displayed on your screen. There is a corresponding variable named INPUT, which reads data from your keyboard. Its box has no permanent contents. Whenever SNOBOL4 is asked to fetch its value, a complete line is read from the keyboard and used instead. If INPUT were used twice in one statement, two separate lines of input would be read. Try these examples:

 

    ?       OUTPUT = INPUT

    TYPE ANYTHING YOU DESIRE

    TYPE ANYTHING YOU DESIRE

    ?       TWO.LINES = INPUT '-AND-' INPUT

    FIRST LINE

    SECOND LINE

    ?       OUTPUT = TWO.LINES

    FIRST LINE-AND-SECOND LINE

SNOBOL4 variables are global in scope -- any variable may be referenced anywhere in the program.

 

4. Basic Control Flow and Functions

Success and Failure

Success and failure are as important in SNOBOL4 as they are in life. Success and failure are unmistakable signals; something either worked, or it didn't. Significant program conciseness is achieved by recognizing that data values and signals are fundamentally different entities.

The elements of a statement provide values and signals as computation proceeds. SNOBOL4 accumulates both, and stops executing a particular statement when it finds it cannot succeed. Program flow can be altered based upon this success or failure.

The success signal will have a value result associated with it. In situations in which the signal itself is the desired object, the result value may only be the null string. The failure signal has no associated value. (In some instances, it may be helpful to view failure as meaning "failure to produce a result.")

Previously, we introduced the variable INPUT, which reads a line from the keyboard. In general, INPUT can be made to read from any disk file. The line read may be any character string, including the null string if it is an empty line. If any string might appear, then there is no special value we can test for to detect End-of-File. Success and failure provide an elegant alternative to testing for special values.

When we retrieve a value from INPUT, we normally get a string and a success signal. But when End-of-File is encountered, we get a failure signal instead, and no value.

Since control-Z (or function key 6) allows you to enter an End-of-File from the keyboard, we can easily demonstrate this type of failure. As you've noticed, the CODE.SNO program reports the success or failure of each statement. So far, all examples have succeeded. Now try this one:

    ?       OUTPUT = INPUT

    ^Z

    Failure

Success and failure are control signals, and appear only during the execution of a statement. They cannot be stored in a variable, which holds values only.

There is much more which can be done with success and failure, but to understand their use, you'll need to know how SNOBOL4 statements are constructed.

A SNOBOL4 Statement

In general, a SNOBOL4 statement looks like this:

    Label   Statement body                                 :GOTO

The label is optional, and is omitted by placing a blank or tab in the first character position. The GOTO is also optional, and can be eliminated simply by omitting it and the colon. In fact, even the statement body is optional. You can have a program line consisting of just a label or a GOTO field.

The Label Field

SNOBOL4 normally executes the statements of a program in sequence. The ability to transfer control from one statement to another, perhaps conditionally, makes SNOBOL4 much more usable.

Labels provide names for statements. If present, they must begin in the first character position of a statement, and must start with a letter or digit. Additional characters may be anything but blank or tab. As with variable names, lower-case letters are equivalent to upper-case ones when case-folding is in effect (the default).

The GOTO Field

Transfer of control is made possible by the GOTO. It interrupts the normal sequential execution of statements by telling SNOBOL4 which statement to execute after the present one. The GOTO field appears at the end of the statement, preceded by a colon (:), and has one of these forms:

    :(label)

    :S(label)

    :F(label)

    :S(label1) F(label2)

White space is required before the colon. "Label" is the name given the target statement, and must be enclosed in parentheses. If the first form is used, execution resumes at the referenced statement, unconditionally. In the second and third forms, transfer occurs only if the statement has succeeded or failed, respectively. Otherwise, execution proceeds to the next statement in line. If the fourth form is used, transfer is made to label1 if the statement succeeded or to label2 if it failed. A statement with a label and a GOTO would look like this:

    COPY    OUTPUT = INPUT           :F(DONE)

Now let's write a short program which copies keyboard input to the screen, and reports the total number of lines. If you are an accurate typist, you can type it into SNOBOL4 directly. Otherwise, you should use your text editor to create a file containing the program text. First stop the CODE.SNO program by typing END:

    ?END

 

    B>SNOBOL4 CON

 

    Vanilla SNOBOL4      Version 2.14.

    (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.

    Enter program, terminate with "END"

    ?       N = 0

    ?COPY   OUTPUT = INPUT           :F(DONE)

    ?       N = N + 1                :(COPY)

    ?DONE   OUTPUT = 'THERE WERE ' N ' LINES'

    ?END

 

    No errors

 

    TYPE IN A TEST LINE
    TYPE IN A TEST LINE

 

    AND ANOTHER
    AND ANOTHER

 

    ^Z
    THERE WERE 2 LINES

 

    B>

We start the line count in variable N at 0. The next statement has a label, COPY, a statement body, and a GOTO field. It is an assignment statement, and begins execution by reading a line of input. If INPUT successfully obtains a line, the result is stored in OUTPUT. The GOTO field is only testing for failure, so SNOBOL4 proceeds to the next statement, where N is incremented, and the unconditional GOTO transfers back to statement COPY.

When an End-of-File is read, variable INPUT signals failure. Execution of this statement terminates immediately, without performing the assignment, and transfers to the statement labeled DONE. The number of lines is displayed, and control flows into the END statement, stopping the program.

5. Built-In Functions

A function is analogous to an operator; it operates on data to produce a result. The data objects are called the arguments of the function. The result returned -- the function of the arguments -- may have two components: the success or failure signal; and for success, a value. The value may be any data type.

A function is used by writing its name and a list of arguments enclosed by parentheses:

    FUNCTION_NAME(ARG1, ARG2, ..., ARGn)

It may appear in your program anywhere a constant is allowed -- in expressions, patterns, even as the argument of another function. If the function has more than one argument, they should be separated by commas. If trailing arguments are omitted, SNOBOL4 will supply the null string instead. Some functions, such as one that returns the current date, have no arguments at all.

SNOBOL4 provides a large number of predefined functions, and allows you to define your own. The large repertoire of built-in functions makes SNOBOL4 programming easier. Most functions are concerned with pattern matching, input/output, and advanced features of the language. Here we'll introduce a few simple conditional, numeric, and string functions to give you an idea of the variety. Try them interactively with CODE.SNO.

Conditional Functions

These functions fail or succeed depending upon their arguments. They are sometimes called predicate functions because the success of an expression using them is predicated upon their success. If they succeed, they return the null string as their value.

    Function               Succeeds if:

 

    IDENT(S,T)       S and T are identical.  S and T may be constants
                     or variables with any data type.  To
                     be identical, the arguments must have the
                     same data type and value.  Since omitted arguments
                     default to the null string, IDENT(S)
                     succeeds if S is the null string.

 

    DIFFER(S,T)      S and T are different.  DIFFER is the opposite
                     of IDENT.  DIFFER(S) succeeds if S is
                     not the null string.

 

    EQ(X,Y)          Integers X and Y are equal.  X and Y must be
                     integers, or strings which can be converted
                     to integers.

 

    NE(X,Y)          Integers X and Y are not equal.

 

    GE(X,Y)          Integer X is greater than or equal to Y.

 

    GT(X,Y)          Integer X is greater than Y.

 

    LE(X,Y)          Integer X is less than or equal to Y.

 

    LT(X,Y)          Integer X is less than Y.

 

    INTEGER(X)       X is an integer, or a string which can be
                     converted to an integer.

 

    LGT(S,T)         String S is lexically greater than string T
                     using a character-by-character comparison.

Leading blanks may be used in front of an argument for readability. Here are some exercises for CODE.SNO:

 

    ?       N = 3

    ?       EQ(N, 3)

    Success

    ?       IDENT(N, 3)

    Success

    ?       EQ(3, "3")

    Success

    ?       IDENT(3, "3")            (integer and string)

    Failure

    ?       EQ(N, 4)

    Failure

    ?       NE(N, 4)

    Success

    ?       INTEGER(N)

    Success

    ?       INTEGER('47')

    Success

    ?       DIFFER('ABC', 'abc')

    Success

    ?       IDENT('a' 'b' 'c', 'abc')

    Success

    ?       LGT('ABC', 'ABD')

    Failure

When any of these functions succeed, they return a null string. Since other statement elements are not altered when concatenated with the null string, this provides an easy way to interpose tests and construct loops. Suppose we execute the statement:

    N = LT(N,10) N + 1       :S(LOOP)

Function LT fails if N is 10 or greater. If the statement fails, the assignment is not performed, and execution continues with the next statement. However, if LT succeeds, its null string value is concatenated with the expression N + 1, and the result is assigned to N. This has the effect of increasing N by 1 and transferring to statement LOOP until N reaches 10.
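
A small complete program built around this statement might look like the sketch below. The limit of 3 and the closing message are chosen only to keep the illustration short; the loop statement itself has the same form as the one just described.

```snobol
* Count from 0 up to a limit, using LT as both the test and the loop control.
        N = 0
LOOP    OUTPUT = N
* While N is below 3, LT returns the null string, N + 1 is assigned,
* and the success transfer repeats the loop. When LT fails, the
* assignment is skipped and execution falls through.
        N = LT(N,3) N + 1                        :S(LOOP)
        OUTPUT = 'DONE'
END
```

This should display 0, 1, 2, and 3 on separate lines, followed by DONE.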

If we concatenated several conditional functions together, and they all succeeded, the result would still be the null string. If any function failed, the entire concatenation would fail. This gives us a simple way to produce a successful result if a number of conditions are all true. For example, the expression:

    INTEGER(N) GE(N,5) LE(N,100)

succeeds if N is an integer between 5 and 100, inclusive.
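
As a hedged sketch, the same test can steer program flow through the GOTO field. The value 42, the label BAD, and the message texts are assumptions made for this illustration.

```snobol
* Range check: all three predicates must succeed for the statement to succeed.
        N = 42
        INTEGER(N) GE(N,5) LE(N,100)             :F(BAD)
        OUTPUT = N ' IS IN RANGE'                :(END)
* Reached only when one of the predicates above fails.
BAD     OUTPUT = N ' IS OUT OF RANGE'
END
```

With N set to 42 this should display 42 IS IN RANGE; changing N to 200 should take the failure branch instead.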

Other Functions

These functions always succeed; all but REMDR and SIZE return a string result.

    DATE()              Return current date and time as a string.

 

    DUPL(S,N)           Duplicate string S, N times.

 

    REMDR(X,Y)          Produce the remainder (modulus) of X / Y.

 

    REPLACE(S1,S2,S3)   Return string S1 after performing the
                        character replacements specified by strings
                        S2 and S3.  S2 specifies which characters to
                        replace, and S3 specifies what to replace
                        them with.

 

    SIZE(S)             Return the number of characters in string S.

 

    TRIM(S)             Return string S with trailing blanks removed.

 

Exercises for CODE.SNO:

    ?       OUTPUT = 'THE DATE AND TIME ARE: ' DATE()
    THE DATE AND TIME ARE: 10-19-87 11:49:33.90
    ?       OUTPUT = DUPL('ABC', 20)
    ABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABC
    ?       OUTPUT = SIZE('ZIPPY')
    5
    ?       OUTPUT = SIZE('')
    0
    ?       OUTPUT = TRIM('TRAILING BLANKS  ') 'GONE'
    TRAILING BLANKSGONE
    ?       OUTPUT = REPLACE('spoon','po','PO')
    sPOOn
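
REPLACE is often used for whole-alphabet translations such as case conversion. The sketch below assumes nothing beyond the REPLACE function described above; the variable names LOWER and UPPER are chosen for this illustration, and the two alphabet strings must be the same length.

```snobol
* Convert a string to upper case by mapping each lower-case letter
* to the letter in the same position of the upper-case alphabet.
        LOWER = 'abcdefghijklmnopqrstuvwxyz'
        UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        OUTPUT = REPLACE('snobol4', LOWER, UPPER)
END
```

This should display SNOBOL4; the digit 4 is not listed in LOWER, so it passes through unchanged.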

 

6. Arrays

Array Concepts

Arrays in SNOBOL4 are similar to arrays in other programming languages. They allow a single variable name to specify more than one data element; integer subscripts distinguish the individual members of an array. Each array element may contain any data type, independent of the types in other array elements.

A one-dimensional array is a "vector;" it is simply a list of I items. A two-dimensional array is a "grid" composed of several adjacent vectors -- an I by J array has I rows and J columns. A three-dimensional array, I by J by K in size, is a rectangular solid consisting of K adjacent grids. There's no limit to the number of dimensions allowed, but such arrays become increasingly difficult to visualize.

In keeping with SNOBOL4's pliability, an array is defined during program execution, rather than at compilation time. Its size and shape are specified by a string. The definition of an array may be changed at any time, or the array may be deleted and its memory reused when it is no longer needed.

Array Creation

Arrays are created by the SNOBOL4 function ARRAY. A program calls this function with a "prototype string" which specifies the number of dimensions and their sizes. The function returns an "array pointer," which is stored in a variable; the array elements are referenced by applying subscripts to this variable. Here are two statements for use with CODE.SNO. They create one- and two-dimensional arrays named LIST and BOX respectively:

    ?       LIST = ARRAY('25')

    ?       BOX = ARRAY('12,3')

LIST points to a vector of 25 elements. BOX points to a grid, 12 rows high and 3 columns wide, containing 36 elements. The ARRAY function initializes all array elements to the null string.

Array Referencing

Array subscripts are integer valued, and are specified by angular or square brackets (<> or []). Subscript values range from 1 to the size of each dimension. If you attempt to use a subscript outside this range, the array reference will fail, and the failure may be detected in the GOTO portion of the statement. Try some array references with CODE.SNO:

    ?       LIST<3> = 'MAPLE'

    ?       BOX[10,2] = 3

    ?       LIST[33] = 4

    Failure

    ?       OUTPUT = LIST[3] LIST[4] BOX<10,2>

    MAPLE3

Angular and square brackets are interchangeable. The reference to LIST[33] failed because the largest subscript allowed for that array is 25. LIST[4] produced its initialized value, the null string, and had no effect on the concatenation. The array pointer in LIST can be assigned to another variable:

    ?       B = LIST

    ?       OUTPUT = B[3]

    MAPLE

    ?       B<3> = 'WILLOW'

    ?       OUTPUT = LIST<3>

    WILLOW

Assigning the pointer in LIST to B made both variables point to the same array. Since there's but one actual array, array references made using LIST or B are equivalent. The COPY function creates a duplicate copy of an entire array.
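
The difference between sharing a pointer and copying an array can be seen in a short sketch. The array size and the variable name C are assumptions made for this illustration.

```snobol
* B shares LIST's array; C receives an independent duplicate.
        LIST = ARRAY('3')
        LIST[1] = 'MAPLE'
        B = LIST
        C = COPY(LIST)
* Changing an element through B also changes what LIST sees ...
        B[1] = 'WILLOW'
        OUTPUT = LIST[1]
* ... but the copy made earlier still holds the old value.
        OUTPUT = C[1]
END
```

This should display WILLOW and then MAPLE.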

Array elements may be used anywhere a variable name is allowed -- expressions, patterns, function arguments, etc. The fact that an array reference fails if a subscript is out-of-bounds can be used in a simple and natural way when scanning an array. Rather than having to know an array's size, we simply loop until an array reference fails. A program segment to display the members of an array SCORE might look like this:

            I = 0

    PRINT   I = I + 1

            OUTPUT = SCORE[I]                    :S(PRINT)

            . . .
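
A complete version of this scan, with an array to work on, might look like the following sketch. The array size of 3 and the three score values are invented for this illustration.

```snobol
* Build a small array, then display its members without knowing its size.
        SCORE = ARRAY('3')
        SCORE[1] = 72
        SCORE[2] = 85
        SCORE[3] = 91
        I = 0
PRINT   I = I + 1
* The reference SCORE[4] fails, so the loop ends and control reaches END.
        OUTPUT = SCORE[I]                        :S(PRINT)
END
```

This should display 72, 85, and 91 on separate lines.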

Array Initialization

Arrays may be created with an initial value other than the null string. ARRAY accepts a second argument which specifies this initial value. We can create a three-dimensional array with all elements initialized to the string 'PA-18' as follows:

    ?       A = ARRAY('2,3,4','PA-18')

    ?       OUTPUT = A[1,2,3]

    PA-18

 

 

7. Compilers

*      Snobol4 Beta2

*      Snobol4 Vanilla

 

8. Projects and Software in SNOBOL

Areas of Application

SNOBOL4 is used primarily as a research tool rather than for commercial applications.

*      Analysis of Literature

*      Analysis of Music

*      Database programs written by computer specialists

9. References

*      SNOBOL.- http://en.wikipedia.org/wiki/SNOBOL

*      SNOBOL Tutorial.- http://burks.bton.ac.uk/burks/language/snobol/catspaw/tutorial/contents.htm

*      The SNOBOL Programming Language.- http://www.engin.umd.umich.edu/CIS/course.des/cis400/snobol/snobol.html

*      SNOBOL History.- http://www.snobol4.org/history.html

*      Introduction to SNOBOL.- http://www.whoishostingthis.com/resources/snobol/

Printed References

*      Griswold, Ralph E., J. F. Poage, and I. P. Polonsky. The SNOBOL 4 Programming Language. Englewood Cliffs, NJ: Prentice Hall, 1968.

*      Hockey, Susan M. SNOBOL Programming for the Humanities. New York: Clarendon Press; Oxford: Oxford University Press, 1985.