Friday, March 5, 2010

Grammatical Extensions to the Structured Query Language SQL



Jack Dikian

ABSTRACT

The SQL+sh is an interactive front-end to Unify’s Structured Query Language (SQL). It’s main purpose is to add Csh/tenex like functionality to a vanilla query interpreter in the way of SQL.

A query history stack, ability to recall and edit previous queries as well as an interactive RECORD and FIELD name recognition and completion mechanism are a sample of the sort of enhancements SQL+sh supports. This paper presents a brief background to SQL before discussing some of the features we added to this package. Working in an environment where a significant portion of a programmer’s time is spent writing and maintaining applications software around the Unify Relational Database; any facility that simplifies database interactions must be an advantage. This database is quickly approaching the 2 G-byte mark with over 300 Mb of supporting software. Like other large database users, the overhead of database related maintenance is a significant consideration. Improvements in database related utilities greatly increases productivity as well as reliability.

One of the most powerful facilities available to the maintenance programmer in our environment is Unify’s Structured Query Language SQL. This utility is often used to interrogate as well as patch the underlying database. Adhoc SQL queries are often generated to confirm the correctness of application modules as well as serving the more simple day to day user information requirements.


A Quick Look At SQL

SQL is an english keyword orientated query language of great flexibility. It is a language that is easy enough for non-programmers to learn, yet has enough power for data processing professionals. This product was originally defined by Chamberlin and others at the IBM Research Laboratory in San Jose, California, under the brand name System R. A family of IBM products based on the System R technology was developed. These products are now generally available and are known as DB2, SQL/DS and QMF [1].

A number of other vendors have also produced systems that support SQL. SQL’s data manipulation statements typically operate on entire sets of records. For example, the select and update clauses can retrieve and modify a set of values and tables. SQL, like all relational data manipulation languages is a set-level language. For this reason, SQL is often described as a non-procedural language. The user specifies "what" data they want and not so much "how" to get it.

It is up to SQL to decide on how best to execute any particular query. It needs for example to consider which tables are being referenced in any request; the size of the tables; what indexes exits; how selective those indexes are and of course, the form of the where clause. SQL queries consist of clauses, each of which is preceded by a keyword. Examples of keywords include; select, update, delete and insert. In fact, the previous four keywords all belong to that part of SQL which is commonly referred to as the DML or Data Manipulation Language. Other optional keywords are used to control, format and operate on the various queries. Some simple examples of queries are given below:-

> select Name, Phone
> from PERSONS
> where Age > 30/

The above example illustrates the selecting or retrieving of the specified fields Name and Phone from a specified table PERSON where some specified condition is true. It is important to note that the result of the query is another table.

> select PERSON.*, COMPANY.*
> from PERSON, COMPANY
> where PERSON.PName = COMPANY.CName/

This example demonstrates the retrieving of data from two tables namely PERSON and COMPANY. We are interested in all instances of the field PName in PERSON matching the field CName in the table COMPANY. This is commonly referred to as "Joining" two or more tables. The availability of the join operation is, almost more than anything else that distinguishes relational from non-relational systems.


The SQL+sh

Our main database currently supports over a 100 tables and close to a 1000 fields. Using SQL to interrogate and manipulate data in this environment almost always requires the programmer to first browse through the Database schema listing. This is not only due to the large number of different tables and fields but is also due to UNIFY’s record and field naming conventions. The maximum length of a record name is eight characters. It is therefore impossible to create two records with the names "PROGRAMMER" and "PROGRAMME". A compromise may lead to the names "PROGMR" and "PROGME" etc. It is easy to see why the schema listing may be required in such cases. Creating tables in Unify requires the user to nominate both a short and a long field name. Short field names must begin with a letter and can be up to eight characters long. The long field names begin with a letter and can be up to sixteen characters long. It is the long name that SQL requires for carrying out queries.

The schema is used to determine or look up this long name. The schema is also used to determine relationships between tables and their corresponding fields. Editing large queries are handled by - SQL writing the last query in/tmp. The edit facility invokes a standard editor such as vi with the last query loaded in the editor buffer. The user modifies and saves the changes before using the restart clause to re-execute the query. Although this facility is useful, it is however often tedious. This is especially true when a simple typo needs to be repaired. Because only the last query is effectively saved, access to previous queries are lost unless the user explicitly saves the editor buffer to a nominated file. Interestingly, we required in SQL a similar transformation in functionality as that provided by say csh and tcsh over the bourne shell. Where tcsh provides file name recognition and completion, we required record and field name recognition and completion.

Where csh provides a history and edit facility for commands, we required, a history and edit mechanism for queries. In implementing some of the ideas found in csh and tcsh, we were able to address both the above mentioned short-commings as well as provide a much more effective user interface. Not having access to SQL source, the only other alternative in implementing the above changes was to write our own parser sitting on top of SQL. This would simply read the input stream, decide if it needs to act upon, and manipulate the history stack, carry through edit commands, expand alias’ etc and then write to SQL via a pipe. The output of SQL is not and should not be altered.

SQL+sh reads a schema description file on startup. This file is typically generated by the systems administrator by running a specially written shell script. The description file describes the database tables, there respective fields and other information such as field type and length. The shell script uses SQL to dump the relevant table, field types and names. On startup, SQL+sh looks at the environment variable DBPATH and displays the the name and address of the working database. After this point, SQL+sh enters a for-ever loop waiting for queries, internal commands and or the end clause. A new prompt including the event number is displayed. An environment variable defines the maximum history size. An internal command has been added called "Mod On/Off" which enables and disables the availability of non-passive SQL clauses. For example, after entering the command "Mod Off", such clauses as delete, update, insert are disabled or ignored. This is useful in cases where support staff use SQL to answer quick telephone queries and should not update the database inadvertently. Unlike Unix commands which are newline terminated, SQL queries often span over many lines. In fact, users of SQL are encouraged to use good formatting procedures when making SQL queries. This is in part due to the fact that quite complex SQL scripts can be written and saved for regular use. These scripts are also used to feed data to Unify’s report generator RPT. The "/" character is used to indicate the end of a query. For this reason, SQL+sh supports a modified history substitution command in the way of "!event+". This signals SQL+sh to re-execute the query beginning with the event number "event" and continue to re-execute events forward in the stack until a "/" character is encountered. All other normal history substitution commands such as "!!", "!- number", "!number" as well as "!pattern" etc have been implemented. Where a query spans many lines, SQL+sh collects together the individual clauses to echo a single event in its history stack.

Editing previous queries are handled two ways. The standard SQL procedure is to invoke the system editor with the last query loaded into the editor buffer. The edit clause facilitates this procedure. This method is still available and is usually used for editing large query texts. This method allows only the last query to be modified and executed. SQL+sh introduces the csh like "!event s/patternl/patternl" and ^patternl^pattern2^ mechanisms. These are extremely convenient for repairing typos and or for substituting record or field names while leaving the general structure of the query untouched.

One of the most useful additions to SQL was the introduction of record and field name recognition and completion. The idea here was to provide a convenient way to avoid having to look up the record and field names before generating queries. Automatically displaying field types and length was considered useful. Other considerations included providing a means by which key strokes could be reduced and accurately associating relevant field names to their correct parent tables. This mechanism is used in conjunction with the database schema description file. It is no longer necessary to type a complete record or field name. Only a unique abbreviation is necessary. Typing the ESCAPE key after the abbreviation will complete the record or field name, echoing the full name. Unlike tcsh, where there is really only one type of file name completion, SQL+sh needs to consider context and determine whether a record, or field name is being sought. This is achieved by adding some of the SQL syntax rules into SQL+sh.

For example the following grammar extracts the syntax for the insert and select clauses:-

insert into RECORD [(FIELD .... )]:
from filenamel
l select/ select ["unique"] I * I RECORD.* I RECORD.FIELD I FIELD ....I * I RECORD.* I RECORD.FIELD I FIELD ....
from I RECORD [label] I .... where ["not"] I FIELD I RECORD.FIELD I constant ETC.


SQL+sh tries to carry out a search of either the appropriate record or field based on the position the ESCAPE key was pressed in the input stream. It is obvious from the above two syntax examples that it is not often possible to determine whether a RECORD or a FIELD needs expanding. In the select clause for example, it is possible to say "select record.field from ..." or " select field from...". Hitting the ESCAPE key just after the select token leaves SQL+sh with a choice of searching for appropriate records or fields. In fact, in this particular example, the system will first search through the record list and then the field list. In general, as each word is read, SQL+sh updates a flag indicating whether it is in a "RECORD" or "FIELD" state. This flag is initially set to a "NULL" state thus causing an alert when the ESCAPE key is pressed. A "BOTH" state causes SQL+sh to search records and then fields. This state is established by tracking entered words against various syntax rules defined in SQL+sh. We have also provided a means of commenting query text. Text found enclosed within the "{" and "}" braces are ignored. This facility was implemented in order to allow a clean method of displaying field types and length in-line. On Hitting ESCAPE in a "FIELD" state, the system will not only display a candidate field name but also place the relevant field type and length already commented.

Besides providing a recognition and completion mechanism, SQL+sh also provides a facility where fields belonging to a particular record can be scanned. For example, after having typed in the sub-clause

select * from
PERSON where " it is possible to Hit Ctrl-f to echo the first field belonging to the PERSON record.


Hitting Ctrl-f again will replace the first displayed field name with the next field. When the list of fields are exhausted, the process is repeated. This allows the user to carry out a query on a record even when they had no idea of the field names associated with the given record. The field type and length is once again displayed in comments. Some examples follow:-

select * from PE"
select * from PERSON

results in the cursor sits at the next column position waiting for the rest of the query.

select * from PERSON where
results in
select * from PERSON where PName {STRING 12}
Hitting
again results in
select * from PERSON where Paddress {STRING 45}


The user can now enter the rest of the query

select * from PERSON where Paddress {STRING 45} = ’Bag End*’
Hitting here results again
PAge {NUMERIC 3 }
PAge {NUMERIC 3} <= 111/

Now we can enter the rest of query Often there is the need to carry out repetitive queries involving tests against large text constants such as "0 081 12346789050" and "Speak Friend And Enter". An ability to implement a concept of macros was also considered a useful enhancement. The same query is often re-executed many times over in the event of a Database maintenance session. One or more parameters in the query may however vary. An ability to expand VMS like "Logical Variables" was added to SQL+sh. The same variable setting and expansion mechanism is used to set and unset simple and complex variables. There is no inherent differences between variable substitution and macro processing. The difference is operational. SQL+sh maintains a set of variables each of which has as a value a list of zero or more words. Each word in this list could be a simple constant or another variable. This value may be displayed and changed by using the internal commands show and clear. After the input line is parsed, and before each query is executed, variable substitution is performed. Variables are keyed by ’$’ character. The expansion can be prevented by preceding the ’$’ with a ’V except within ’"s. A Macro with a single argument can be seen as a variable containing another variable in its assignment string. The second variable has to be resolved before the macro can be executed. Newline characters found in the assignment list are ignored. Looping is prevented by checking that the same variable does not appear in the assignment list of that variable.

Examples of variables follow:-

[1] $new_name = "Bilbo Baggins"
[2] $my_update = " update PERSON
[3] s
et PName = Snew_name
[4] where PName = ’ *’/"
[5] Smy_update


We have been using this utility on a trial basis for the last few weeks. In general, the added convenience of query recall and edit far exceeds the cost of overhead. The ability to echo the field length and type results in much less references made to the schema listing. Record and field name completion means less typos in general.

References

[1] C.J. Date, "An Introduction To Database Systems", Addison-Wesley 1986.e, Australia; 13th- 15th September 1988.

No comments:

Post a Comment