NETB131 Programming Project
Programming language SNOBOL
An overview
Yoan Grass Batista, F42225
1. History and Special Features
SNOBOL is a special-purpose language developed to provide a powerful means of character string manipulation. Accordingly, SNOBOL has a collection of powerful operations for string pattern matching. The most common early application of SNOBOL was writing text editors. Because of the dynamic nature of SNOBOL and its interpreter-based implementation, it is now considered too slow for such applications, and today SNOBOL is rarely used.
SNOBOL4 was implemented using string macros that define a virtual machine, so that it could be ported to a variety of different machines.
It is really a combination of two kinds of languages: a conventional language, with several data types and a simple but powerful control structure, and a pattern language, with a structure all its own. The conventional language is not block structured, and may appear old-fashioned. The pattern language, however, remains unsurpassed, and is unique to SNOBOL4.
String manipulation operations - SNOBOL4 has several operations that test a string's contents and make replacements in the string.
Pattern matching - involves examining a string for occurrences of specified substrings. The specifications being searched for are known as patterns.
Dynamically typed - SNOBOL4 has no type declarations and no restrictions on the data type of the value of any variable.
Interpretive language - The compiler translates the program into a notation that the interpreter can easily execute.
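As a minimal sketch of the first two features, the following program matches a literal pattern against a string and replaces the matched part. The variable name TEXT and the label NOPE are chosen here for illustration only.

```snobol
* Sketch: match the pattern 'QUICK' in TEXT and replace it.
* TEXT and NOPE are illustrative names, not part of the language.
        TEXT = 'THE QUICK BROWN FOX'
        TEXT 'QUICK' = 'SLOW'              :F(NOPE)
        OUTPUT = TEXT                      :(END)
NOPE    OUTPUT = 'PATTERN NOT FOUND'
END
```

If the match succeeds, the program displays THE SLOW BROWN FOX; if the pattern were absent, the statement would fail and control would pass to NOPE.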
2. "Hello World" Program
This program demonstrates the text output facility of the SNOBOL4 programming language by displaying the message "Hello World!". A statement without a label must begin with a blank or tab.
        OUTPUT = 'Hello World!'
END
Output:
Hello World!
This program was compiled and tested using Vanilla SNOBOL4.
3. Fundamental Data Types
The data types used by the SNOBOL4 language are as follows:
Data Type                 Formal Identification
string                    STRING
integer                   INTEGER
real number               REAL
pattern structure         PATTERN
array                     ARRAY
table                     TABLE
created name              NAME
unevaluated expression    EXPRESSION
object code               CODE
programmer-defined        data type name
external                  EXTERNAL
Integers, reals, strings, patterns, arrays, and tables are types of data objects that are built into the SNOBOL4 language. Facilities are provided in the language to permit a programmer to define additional types of data. This facilitates representation of structural relationships inherent in data.
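A programmer-defined data type might be sketched as follows. This assumes the standard SNOBOL4 primitive functions DATA and DATATYPE, which the text above does not name; the type POINT and its fields X and Y are hypothetical.

```snobol
* Sketch: define a data type POINT with two fields, X and Y.
* DATA creates the constructor POINT and the field functions X and Y.
        DATA('POINT(X,Y)')
        P = POINT(3,4)
        OUTPUT = X(P) + Y(P)
        OUTPUT = DATATYPE(P)
END
```

The first output line is the sum of the two fields, and the second is the formal identification of the new type.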
Modular Units:
Many SNOBOL4 procedures are invoked by functions built into the system, called primitive functions. Operations that occur frequently are implemented as primitive functions for efficiency. In addition, facilities are available for programmers to define their own source-language functions. A programmer-defined function in SNOBOL4 must include:
i) a call to the primitive function DEFINE for each programmer-defined function.
ii) a procedure, written in SNOBOL4, for each function.
Many functions are conveniently defined recursively. For example, factorials may be defined as
fact(0) = 1
fact(n) = n*fact(n-1) for n>0
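The recursive definition above could be written in SNOBOL4 roughly as follows. This is a sketch: the names FACT and FACT_END are chosen for illustration, and RETURN is the standard label used to return from a programmer-defined function.

```snobol
* Sketch: recursive factorial as a programmer-defined function.
* DEFINE registers FACT with one argument, N; the GOTO skips the body.
        DEFINE('FACT(N)')                  :(FACT_END)
* Base case: fact(0) = 1. EQ returns the null string on success.
FACT    FACT = EQ(N,0) 1                   :S(RETURN)
* Recursive case: fact(n) = n * fact(n-1).
        FACT = N * FACT(N - 1)             :(RETURN)
FACT_END
        OUTPUT = 'FACT(5) = ' FACT(5)
END
```

The value assigned to the variable FACT inside the procedure becomes the function's result. Arguments should be kept small: with the integer limit of 32767 described in this report, results beyond FACT(7) would overflow.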
SNOBOL4 has several different basic types, but has a mechanism to define hundreds more as aggregates of others. Initially, we'll discuss the two most basic: integers and strings.
An integer is a simple whole number, without a fractional part. In Vanilla SNOBOL4, its value can range from -32767 to +32767. It appears without quotation marks, and commas should not be used to group digits. Here are some acceptable integers:
14 -234 0 0012 +12832 -9395 +0
These are incorrect in SNOBOL4:
13.4      fractional part is not allowed
49723     larger than 32767
-         number must contain at least one digit
3,076     comma is not allowed
Use the CODE.SNO program to test different integer values. Try both legal and illegal values. Here are some sample test lines:
Enter SNOBOL4 statements:
? OUTPUT = 42
42
? OUTPUT = -825
-825
? OUTPUT = 73768
Compilation error: Erroneous integer, re-enter:
Vanilla SNOBOL4 does not include real numbers. They are available in SNOBOL4+, Catspaw's highly enhanced implementation of the SNOBOL4 programming language.
A string is an ordered sequence of characters. The order of the characters is important: the strings AB and BA are different. Characters are not restricted to printing characters; all of the 256 combinations possible in an 8-bit byte are allowed.
Normally, the maximum length of a string is 5,000 characters, although you can tell SNOBOL4 to accept longer strings. A string of length zero (no characters) is called the null string. At first, you may find the idea of an empty string disturbing: it's a string, but it has no characters. Its role in SNOBOL4 is similar to the role of zero in the natural number system.
Strings may appear literally in your program, or may be created during execution. To place a literal string in your program, enclose it in apostrophes (') or double quotation marks ("). Either may be used, but the beginning and ending marks must be the same. The string itself may contain one type of mark if the other is used to enclose the string. The null string is represented by two successive marks, with no intervening characters. Here are some samples to try with CODE.SNO:
? OUTPUT = 'STRING LITERAL'
STRING LITERAL
? OUTPUT = "So is this"
So is this
? OUTPUT = ''
? OUTPUT = 'WHO COINED THE WORD "BYTE"?'
WHO COINED THE WORD "BYTE"?
? OUTPUT = "WON'T"
WON'T
A variable is a place to store an item of data. The number of variables you may have is unlimited, provided you give each one a unique name. Think of a variable as a box, marked on the outside with a permanent name, able to hold any data value or type. Many programming languages require that you formally declare what kind of entity the box will contain -- integer, real, string, etc. -- but SNOBOL4 is more flexible. A variable's contents may change repeatedly during program execution. The size of the box contracts or expands as necessary. One moment it might contain an integer, then a 2,000 character string, then the null string; in fact, any SNOBOL4 data type.
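The changing contents of a variable can be observed with a short sketch. It assumes the standard primitive function DATATYPE, which returns the formal identification of its argument; the variable name BOX is illustrative.

```snobol
* Sketch: one variable holding values of different types in turn.
        BOX = 42
        OUTPUT = DATATYPE(BOX)
        BOX = 'FORTY-TWO'
        OUTPUT = DATATYPE(BOX)
END
```

The program should display INTEGER and then STRING, with no declaration needed for either assignment.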
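There are only a few rules about composing a variable's name when it appears in your program:
The name must begin with an upper- or lower-case letter.
If it is more than one character long, the remaining characters may be any combination of letters, digits, periods (.) and underscores (_).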
Here are some correct SNOBOL4 names:
WAGER P23 VerbClause SUM.OF.SQUARES Buffer
Normally, SNOBOL4 performs "case-folding" on names. Lower-case alphabetic characters are changed to upper-case when they appear in names -- Buffer and BUFFER are equivalent. Naturally, case-folding does not occur within a string literal. Case-folding can be disabled by the command line option /C.
In some languages, the initial value of a new variable is undefined. SNOBOL4 guarantees that a new variable's initial value is the null string. However, except in very small programs, you should always initialize variables. This prevents unexpected results when a program is modified or a program segment is reexecuted.
You store something in a variable by making it the object of an assignment operation. You can retrieve its contents simply by using it wherever its value is needed. Using a variable's value is nondestructive; the value in the box remains unchanged. Try creating some variables using CODE.SNO:
? ABC = 'EGG'
? OUTPUT = ABC
EGG
? D = 'SHELL'
? OUTPUT = abc d (Same as ABC D)
EGGSHELL
? OUTPUT = NONESUCH (New variable is null)
? OUTPUT = ABC NULL D
EGGSHELL
? N1 = 43
? D = 17
? OUTPUT = N1 + D
60
? output = ABC D
EGG17
OUTPUT is a variable with special properties; when a value is stored in its box, it is also displayed on your screen. There is a corresponding variable named INPUT, which reads data from your keyboard. Its box has no permanent contents. Whenever SNOBOL4 is asked to fetch its value, a complete line is read from the keyboard and used instead. If INPUT were used twice in one statement, two separate lines of input would be read. Try these examples:
? OUTPUT = INPUT
TYPE ANYTHING YOU DESIRE
TYPE ANYTHING YOU DESIRE
? TWO.LINES = INPUT '-AND-' INPUT
FIRST LINE
SECOND LINE
? OUTPUT = TWO.LINES
FIRST LINE-AND-SECOND LINE
SNOBOL4 variables are global in scope -- any variable may be referenced anywhere in the program.
Success and failure are as important in SNOBOL4 as they are in life. Success and failure are unmistakable signals; something either worked, or it didn't. Significant program conciseness is achieved by recognizing that data values and signals are fundamentally different entities.
The elements of a statement provide values and signals as computation proceeds. SNOBOL4 accumulates both, and stops executing a particular statement when it finds it cannot succeed. Program flow can be altered based upon this success or failure.
The success signal will have a value result associated with it. In situations in which the signal itself is the desired object, the result value may only be the null string. The failure signal has no associated value. (In some instances, it may be helpful to view failure as meaning "failure to produce a result.")
Previously, we introduced the variable INPUT, which reads a line from the keyboard. In general, INPUT can be made to read from any disk file. The line read may be any character string, including the null string if it is an empty line. If any string might appear, then there is no special value we can test for to detect End-of-File. Success and failure provide an elegant alternative to testing for special values.
When we retrieve a value from INPUT, we normally get a string and a success signal. But when End-of-File is encountered, we get a failure signal instead, and no value.
Since control-Z (or function key 6) allows you to enter an End-of-File from the keyboard, we can easily demonstrate this type of failure. As you've noticed, the CODE.SNO program reports the success or failure of each statement. So far, all examples have succeeded. Now try this one:
? OUTPUT = INPUT
^Z
Failure
Success and failure are control signals, and appear only during the execution of a statement. They cannot be stored in a variable, which holds values only.
There is much more which can be done with success and failure, but to understand their use, you'll need to know how SNOBOL4 statements are constructed.
In general, a SNOBOL4 statement looks like this:
Label Statement body :GOTO
The label is optional, and is omitted by placing a blank or tab in the first character position. The GOTO is also optional, and can be eliminated simply by omitting it and the colon. In fact, even the statement body is optional. You can have a program line consisting of just a label or a GOTO field.
SNOBOL4 normally executes the statements of a program in sequence. The ability to transfer control from one statement to another, perhaps conditionally, makes SNOBOL4 much more usable.
Labels provide names for statements. If present, they must begin in the first character position of a statement, and must start with a letter or number. Additional characters may be anything but blank or tab. As with variable names, lower-case letters are equivalent to upper-case when case-folding is in effect (the default).
Transfer of control is made possible by the GOTO. It interrupts the normal sequential execution of statements by telling SNOBOL4 which statement to execute after the present one. The GOTO field appears at the end of the statement, preceded by a colon (:), and has one of these forms:
:(label)
:S(label)
:F(label)
:S(label1) F(label2)
White space is required before the colon. "Label" is the name given the target statement, and must be enclosed in parentheses. If the first form is used, execution resumes at the referenced statement, unconditionally. In the second and third forms, transfer occurs only if the statement has succeeded or failed, respectively. Otherwise, execution proceeds to the next statement in line. If the fourth form is used, transfer is made to label1 if the statement succeeded or to label2 if it failed. A statement with a label and a GOTO would look like this:
COPY OUTPUT = INPUT :F(DONE)
Now let's write a short program which copies keyboard input to the screen, and reports the total number of lines. If you are an accurate typist, you can type it into SNOBOL4 directly. Otherwise, you should use your text editor to create a file containing the program text. First stop the CODE.SNO program by typing END:
?END
B>SNOBOL4 CON
Vanilla SNOBOL4 Version 2.14.
(c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
Enter program, terminate with "END"
? N = 0
?COPY OUTPUT = INPUT :F(DONE)
? N = N + 1 :(COPY)
?DONE OUTPUT = 'THERE WERE ' N ' LINES'
?END
No errors
TYPE IN A TEST LINE
TYPE IN A TEST LINE
AND ANOTHER
AND ANOTHER
^Z
THERE WERE 2 LINES
B>
We start the line count in variable N at 0. The next statement has a label, COPY, a statement body, and a GOTO field. It is an assignment statement, and begins execution by reading a line of input. If INPUT successfully obtains a line, the result is stored in OUTPUT. The GOTO field is only testing for failure, so SNOBOL4 proceeds to the next statement, where N is incremented, and the unconditional GOTO transfers back to statement COPY.
When an End-of-File is read, variable INPUT signals failure. Execution of this statement terminates immediately, without performing the assignment, and transfers to the statement labeled DONE. The number of lines is displayed, and control flows into the END statement, stopping the program.
A function is analogous to an operator; it operates on data to produce a result. The data objects are called the arguments of the function. The result returned -- the function of the arguments -- may have two components: the success or failure signal; and for success, a value. The value may be any data type.
A function is used by writing its name and a list of arguments enclosed by parentheses:
FUNCTION_NAME(ARG1, ARG2, ..., ARGn)
It may appear in your program anywhere a constant is allowed -- in expressions, patterns, even as the argument of another function. If the function has more than one argument, they should be separated by commas. If trailing arguments are omitted, SNOBOL4 will supply the null string instead. Some functions, such as one that returns the current date, have no arguments at all.
SNOBOL4 provides a large number of predefined functions, and allows you to define your own. The large repertoire of built-in functions makes SNOBOL4 programming easier. Most functions are concerned with pattern matching, input/output, and advanced features of the language. Here we'll introduce a few simple conditional, numeric, and string functions to give you an idea of the variety. Try them interactively with CODE.SNO.
These functions fail or succeed depending upon their arguments. They are sometimes called predicate functions because the success of an expression using them is predicated upon their success. If they succeed, they return the null string as their value.
Function       Succeeds if:
IDENT(S,T)     S and T are identical. S and T may be constants or
               variables with any data type. To be identical, the
               arguments must have the same data type and value.
               Since omitted arguments default to the null string,
               IDENT(S) succeeds if S is the null string.
DIFFER(S,T)    S and T are different. DIFFER is the opposite of
               IDENT. DIFFER(S) succeeds if S is not the null string.
EQ(X,Y)        Integers X and Y are equal. X and Y must be integers,
               or strings which can be converted to integers.
NE(X,Y)        Integers X and Y are not equal.
GE(X,Y)        Integer X is greater than or equal to Y.
GT(X,Y)        Integer X is greater than Y.
LE(X,Y)        Integer X is less than or equal to Y.
LT(X,Y)        Integer X is less than Y.
INTEGER(X)     X is an integer, or a string which can be converted
               to an integer.
LGT(S,T)       String S is lexically greater than string T using a
               character-by-character comparison.
Leading blanks may be used in front of an argument for readability. Here are some exercises for CODE.SNO:
? N = 3
? EQ(N, 3)
Success
? IDENT(N, 3)
Success
? EQ(3, "3")
Success
? IDENT(3, "3") (integer and string)
Failure
? EQ(N, 4)
Failure
? NE(N, 4)
Success
? INTEGER(N)
Success
? INTEGER('47')
Success
? DIFFER('ABC', 'abc')
Success
? IDENT('a' 'b' 'c', 'abc')
Success
? LGT('ABC', 'ABD')
Failure
When any of these functions succeeds, it returns the null string. Since other statement elements are not altered when concatenated with the null string, this provides an easy way to interpose tests and construct loops. Suppose we execute the statement:
        N = LT(N,10) N + 1 :S(LOOP)
Function LT fails if N is 10 or greater. If the statement fails, the assignment is not performed, and execution continues with the next statement. However, if LT succeeds, its null string value is concatenated with the expression N + 1, and the result is assigned to N. This has the effect of increasing N by 1 and transferring to statement LOOP.
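Placed in a complete program, this idiom gives a simple counting loop. The following is a sketch in which LOOP labels the statement that displays N.

```snobol
* Sketch: count from 0 to 10 using LT as the loop test.
        N = 0
LOOP    OUTPUT = N
* LT fails once N reaches 10, so the statement fails and no GOTO is taken.
        N = LT(N,10) N + 1                 :S(LOOP)
END
```

The program displays the integers 0 through 10 and then flows into the END statement.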
If we concatenated several conditional functions together, and they all succeeded, the result would still be the null string. If any function failed, the entire concatenation would fail. This gives us a simple way to produce a successful result if a number of conditions are all true. For example, the expression:
INTEGER(N) GE(N,5) LE(N,100)
succeeds if N is an integer between 5 and 100, inclusive.
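A range check of this kind could be used to validate keyboard input, as in the sketch below. The labels LOOP and BAD and the message texts are illustrative. Because evaluation of a statement stops at the first failure, GE and LE are never applied to a value that INTEGER has rejected.

```snobol
* Sketch: read lines until End-of-File and test each one.
LOOP    N = TRIM(INPUT)                    :F(END)
* The statement succeeds only if all three predicates succeed.
        INTEGER(N) GE(N,5) LE(N,100)       :F(BAD)
        OUTPUT = N ' IS IN RANGE'          :(LOOP)
BAD     OUTPUT = N ' IS NOT AN INTEGER FROM 5 TO 100'  :(LOOP)
END
```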
These functions always succeed; all but REMDR and SIZE return a string result.
DATE()             Return current date and time as a string.
DUPL(S,N)          Duplicate string S, N times.
REMDR(X,Y)         Produce the remainder (modulus) of X / Y.
REPLACE(S1,S2,S3)  Return string S1 after performing the character
                   replacements specified by strings S2 and S3. S2
                   specifies which characters to replace, and S3
                   specifies what to replace them with.
SIZE(S)            Return the number of characters in string S.
TRIM(S)            Return string S with trailing blanks removed.
Exercises for CODE.SNO:
? OUTPUT = 'THE DATE AND TIME ARE: ' DATE()
THE DATE AND TIME ARE: 10-19-87 11:49:33.90
? OUTPUT = DUPL('ABC', 20)
ABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABC
? OUTPUT = SIZE('ZIPPY')
5
? OUTPUT = SIZE('')
0
? OUTPUT = TRIM('TRAILING BLANKS ') 'GONE'
TRAILING BLANKSGONE
? OUTPUT = REPLACE('spoon','po','PO')
sPOOn
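These functions combine naturally. The sketch below trims a string, converts it to upper case with REPLACE, and underlines it with DUPL and SIZE. The variable names LOWER, UPPER and TITLE are illustrative, and the alphabets are spelled out as literals so that no implementation-specific keywords are assumed.

```snobol
* Sketch: upper-case a title and underline it.
        LOWER = 'abcdefghijklmnopqrstuvwxyz'
        UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        TITLE = TRIM('snobol4 overview   ')
* Each character of LOWER is replaced by the matching one in UPPER.
        TITLE = REPLACE(TITLE, LOWER, UPPER)
        OUTPUT = TITLE
        OUTPUT = DUPL('-', SIZE(TITLE))
END
```

The second and third arguments of REPLACE are the same length here, so each character in the second has a counterpart in the third.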
6. Arrays
Arrays in SNOBOL4 are similar to arrays in other programming languages. They allow a single variable name to specify more than one data element; integer subscripts distinguish the individual members of an array. Each array element may contain any data type, independent of the types in other array elements.
A one-dimensional array is a "vector;" it is simply a list of I items. A two-dimensional array is a "grid" composed of several adjacent vectors -- an I by J array has I rows and J columns. A three-dimensional array, I by J by K in size, is a rectangular solid consisting of K adjacent grids. There's no limit to the number of dimensions allowed, but such arrays become increasingly difficult to visualize.
In keeping with SNOBOL4's pliability, an array is defined during program execution, rather than at compilation time. Its size and shape are specified by a string. The definition of an array may be changed at any time, or the array may be deleted and its memory reused when it is no longer needed.
Arrays are created by the SNOBOL4 function ARRAY. A program calls this function with a "prototype string" which specifies the number of dimensions and their sizes. The function returns an "array pointer," which is stored in a variable; the array elements are referenced by applying subscripts to this variable. Here are two statements for use with CODE.SNO. They create one- and two-dimensional arrays named LIST and BOX respectively:
? LIST = ARRAY('25')
? BOX = ARRAY('12,3')
LIST points to a vector of 25 elements. BOX points to a grid, 12 rows high and 3 columns wide, containing 36 elements. The ARRAY function initializes all array elements to the null string.
Array subscripts are integer valued, and are specified by angular or square brackets (<> or []). Subscript values range from 1 to the size of each dimension. If you attempt to use a subscript outside this range, the array reference will fail, and the failure may be detected in the GOTO portion of the statement. Try some array references with CODE.SNO:
? LIST<3> = 'MAPLE'
? BOX[10,2] = 3
? LIST[33] = 4
Failure
? OUTPUT = LIST[3] LIST[4] BOX<10,2>
MAPLE3
Angular and square brackets are interchangeable. The reference to LIST[33] failed because the largest subscript allowed for that array is 25. LIST[4] produced its initialized value, the null string, and had no effect on the concatenation. The array pointer in LIST can be assigned to another variable:
? B = LIST
? OUTPUT = B[3]
MAPLE
? B<3> = 'WILLOW'
? OUTPUT = LIST<3>
WILLOW
Assigning the pointer in LIST to B made both variables point to the same array. Since there's but one actual array, array references made using LIST or B are equivalent. The COPY function creates a duplicate copy of an entire array.
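The difference between sharing a pointer and copying an array can be seen in a short sketch. It reuses the names LIST and B from the session above.

```snobol
* Sketch: COPY produces an independent duplicate of the array.
        LIST = ARRAY('25')
        LIST[3] = 'MAPLE'
        B = COPY(LIST)
* Changing the copy leaves the original untouched.
        B[3] = 'WILLOW'
        OUTPUT = LIST[3]
        OUTPUT = B[3]
END
```

The program displays MAPLE and then WILLOW, whereas after a plain assignment B = LIST both references would show WILLOW.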
Array elements may be used anywhere a variable name is allowed -- expressions, patterns, function arguments, etc. The fact that an array reference fails if a subscript is out-of-bounds can be used in a simple and natural way when scanning an array. Rather than having to know an array's size, we simply loop until an array reference fails. A program segment to display the members of an array SCORE might look like this:
        I = 0
PRINT   I = I + 1
        OUTPUT = SCORE[I] :S(PRINT)
        . . .
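A complete version of this scan, with a small array filled in first, might look like the following sketch. The array size and the score values are made up for illustration.

```snobol
* Sketch: display every element of SCORE without knowing its size.
        SCORE = ARRAY('3')
        SCORE[1] = 90
        SCORE[2] = 75
        SCORE[3] = 82
        I = 0
PRINT   I = I + 1
* The reference SCORE[4] fails, which ends the loop.
        OUTPUT = SCORE[I]                  :S(PRINT)
        OUTPUT = 'END OF ARRAY'
END
```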
Arrays may be created with an initial value other than the null string. ARRAY accepts a second argument which specifies this initial value. We can create a three-dimensional array with all elements initialized to the string 'PA-18' as follows:
? A = ARRAY('2,3,4','PA-18')
? OUTPUT = A[1,2,3]
PA-18
7. Compilers
8. Projects and Software in SNOBOL
SNOBOL4 is used primarily as a research tool rather than for commercial applications. Typical uses include:
Analysis of literature
Analysis of music
Database programs written by computer specialists
10. References
SNOBOL.- http://en.wikipedia.org/wiki/SNOBOL
SNOBOL Tutorial.- http://burks.bton.ac.uk/burks/language/snobol/catspaw/tutorial/contents.htm
The SNOBOL Programming Language.- http://www.engin.umd.umich.edu/CIS/course.des/cis400/snobol/snobol.html
SNOBOL History.- http://www.snobol4.org/history.html
Introduction to SNOBOL.- http://www.whoishostingthis.com/resources/snobol/
Griswold, Ralph E., J. F. Poage, and
Hockey, Susan M. SNOBOL Programming for the Humanities.