INTRODUCTION

Design of the course:


In this course, you will learn to program through example and expand upon the examples given to produce your own COBOL code. We will start out by examining a simple COBOL program and add the more subtle elements of the language when needed to write our sample programs.

Structure of a COBOL Program:


The structure of the COBOL program makes COBOL a relatively easy language to learn and work with, a fact which in some way explains its popularity and longevity. The COBOL program is set up to allow for very flexible data handling and very organized processing and therefore fits in well with the concepts of STRUCTURED programming.

The COBOL program is made up of four DIVISIONS:
  1. IDENTIFICATION DIVISION
  2. ENVIRONMENT DIVISION
  3. DATA DIVISION
  4. PROCEDURE DIVISION
Within a DIVISION, there can be further breakdowns which are called SECTIONS.
Each of these four divisions has specific words or clauses that are classified as reserved words because they have a specific function. The reserved words have a specific meaning when the program is being compiled and it is important that they be used correctly. Frequently, in conjunction with reserved words, there are entries that the programmer must make which are specific to the program. In addition, the programmer must be careful to observe the margin or column structure of the program. We will look at this in more detail, but for now note that COBOL has two margins: margin A and margin B. Margin A starts in column 8 and goes through column 11 while margin B starts in column 12. Many COBOL entries are required to start in either margin A or margin B, and the programmer must be aware of the rules.
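
The margin rules can be pictured with a small sketch. It is only a fragment, not a complete program, and it assumes the usual fixed-format convention (not yet covered in this course) that an asterisk in column 7 marks a comment line. The division header and the paragraph name begin in margin A; the statement begins in margin B.

```cobol
      * Column 7 holds the asterisk that marks these comment lines.
      *A   B      <- A sits in column 8, B sits in column 12
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           STOP RUN.
```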

FIRST PROGRAM


The program that is being illustrated here reads a record from a disk and prints the information out on a printed report.
IDENTIFICATION DIVISION: On its simplest level, this DIVISION identifies the name of the program and its author. Other identifying information can be added for documentation purposes.

 IDENTIFICATION DIVISION.
 PROGRAM-ID.  SAMPLE1.
 AUTHOR. GROCER.
The words IDENTIFICATION DIVISION define the current division. On the next line, the word PROGRAM-ID specifies that the word entered here by the programmer will give the program a name. The rules of COBOL specify that the name the programmer enters must be made up of letters and numbers and be 8 characters or less. Finally, the word AUTHOR calls for the programmer to identify themselves as the author of the program. In looking at this program, note the use of the periods after the reserved words and after the programmer's entries. It should be noted that these three lines start in margin A, which for practical purposes means they start in column 8 (technically, they can start anywhere in margin A, which runs from column 8 through column 11).
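
As a sketch of the "other identifying information" mentioned above, the fragment below adds two more documentation paragraphs, INSTALLATION and DATE-WRITTEN. The text after each of them is a placeholder made up for illustration. Later COBOL standards treat these documentation paragraphs as obsolete, so check whether your compiler still accepts them.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID.  SAMPLE1.
       AUTHOR. GROCER.
      * The two entries below are illustrative placeholders only.
       INSTALLATION. NAME-OF-YOUR-SITE.
       DATE-WRITTEN. MM/DD/YYYY.
```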

ENVIRONMENT DIVISION: This DIVISION defines and identifies the environment in which the program will be run. At the beginning level, one of its primary functions is to provide the link between the physical file that is being read and/or written and the logical file name that is used internally within the program.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CUSTOMER-FILE
        ASSIGN TO "C:\MFCOBOL\SOURCEPG\C12FIRST.DAT".
    SELECT CUSTOMER-REPORT
        ASSIGN TO PRINTER.

This is the ENVIRONMENT DIVISION which contains an INPUT-OUTPUT SECTION which defines the files that are being read or written. Beneath the section statement you see the reserved clause FILE-CONTROL which indicates that the next lines will specifically define the files being used in the program. Each file is defined with a SELECT statement followed by the programmer's logical file name. The logical file name is the name the programmer will use in the program when referring to the file. It must follow the standard naming conventions: the name must be from 1 to 30 characters long, contain only letters, numbers and hyphens and contain no embedded spaces. The ASSIGN clause ties the logical file name to the physical file name and location of the file. In the case of the disk file, the physical file name includes the path name on my disk, the name of the file C12FIRST and the file extension .DAT. In the case of the print file, the output is to go directly to the printer so it is assigned to PRINTER. Notice that the SELECT clauses start in margin B and the second line of each is indented by custom to make it clear that it is part of the statement. The other clauses in the ENVIRONMENT DIVISION illustrated all start in margin A.
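
The naming conventions can be illustrated with a hypothetical third file. ORDER-FILE and ORDERS.DAT are made-up names that are not part of the sample program. The second SELECT is written as a comment because the embedded space makes the name unacceptable.

```cobol
      * Acceptable: letters and hyphens, no spaces, under 30 long.
           SELECT ORDER-FILE
               ASSIGN TO "ORDERS.DAT".
      * Not acceptable: the embedded space splits the name in two.
      *    SELECT ORDER FILE
      *        ASSIGN TO "ORDERS.DAT".
```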

DATA DIVISION: This division defines the data that is being used by the program, including data that is being read or written, and data that is being used in work areas.

The structure of the COBOL DATA DIVISION gives the programmer tremendous flexibility and power in the way data is defined and used. It is one of the most powerful features of the COBOL program and a lot of time should be spent understanding the wide range of possibilities it offers.

In our simple beginning program, we will see two SECTIONS within the DATA DIVISION: the FILE SECTION and the WORKING-STORAGE SECTION.

The FILE SECTION is used to define the files that will be read and written. All data that is being read from files to be processed or written to files (including the print file) must pass through the FILE SECTION.

DATA DIVISION.
FILE SECTION.
FD  CUSTOMER-FILE
    DATA RECORD IS CUSTOMER-RECORD.
01  CUSTOMER-RECORD.
    05	CUSTOMER-ID	PIC X(4).
    05	CUSTOMER-NAME	PIC X(20).
    05	CUSTOMER-STREET	PIC X(20).
    05	CUSTOMER-CITY	PIC X(15).
    05	CUSTOMER-STATE	PIC X(2).
    05	CUSTOMER-ZIP	PIC X(5).
    05	FILLER		PIC X(10).
The code above describes the layout of the input file that is being read by this program. First, we see the clause DATA DIVISION to tell us what division we are in followed by the clause FILE SECTION to tell us we are in the section of the data division that describes input and output files. (Note that both of these start in margin A.) The following line starts the specific definition - it too starts in margin A. The letters FD stand for File Description and are followed by two spaces to move us into margin B and then the name of the file. This is the logical file name that was used in the SELECT statement above. The next line (which starts in margin B) gives the name of the individual record in the file. The clause DATA RECORD IS is a reserved word clause for this function. The clause is followed by the name the programmer is assigning to each record on the file. (Note: this clause is not required but is here because it provides good documentation.) The record name must again follow the naming conventions for a COBOL data name. On the next line, starting in margin A, there is 01 followed by the record name defined in the DATA RECORD IS clause. When COBOL programmers lay out the file, they use an outline setup where 01 is the whole, in this case the whole record, and the numbers beneath break the whole record down into its parts which are frequently fields. The common convention, when breaking down the whole 01 level, is to use 05 levels for the parts or fields. The reason for this is to leave room so that at some later time the programmer can decide to group several 05s together into a group and use a number between 01 and 05 to designate this group (more about this later). Note that on this file, all records have the same layout (in more advanced programs, this will not always be true).

The first field on all the records in this file is the customer identification number which is a character or alphanumeric field that is four characters long. A character or alphanumeric field can contain anything: letters, numbers, special characters. The programmer starts the line by placing an 05 to designate the field in margin B. Then the programmer defines a name for the field using the COBOL data name conventions. In this case, I named the field CUSTOMER-ID. (Note the use of the hyphen where I might have wanted to put a space. Since spaces are not allowed in a COBOL dataname, the programmer frequently uses the hyphen instead of the space.) Next, the programmer needs to define the attributes of the field which in this case are that it is a 4 character, alphanumeric field. This is done using the picture clause. PIC or PICTURE are reserved words used to designate the attributes of the field. The clause X(4) that follows the reserved word PIC designates this field as an alphanumeric field (the X stands for alphanumeric) and the 4 in parentheses tells the length. Note that the picture clause fields line up under each other - this is convention and not a rule. Usually the programmer moves over to a column such as 32 and starts to code the picture clauses. After the picture clause has been coded, the entry is terminated with a period to indicate that the definition of this field is now complete.

Length can be shown using either the type followed by the length in parentheses or the type character repeated once for each character in the field: PIC X(4) and PIC XXXX both describe a 4 character alphanumeric field. Continuing on with the field layout, the second field on all the records is customer name which is a 20 character alphanumeric field. It is shown by:

		05  CUSTOMER-NAME              PIC X(20).      
Following the customer name, there is a 20 character customer street field, a 15 character customer city field, a 2 character customer state field and a five character customer zip code field. Finally in the layout of the file, the programmer was told that the last 10 characters of the record contained data that will not be used in this program. The programmer acknowledges the existence of this data by setting up a field with no name and a PIC of X(10). The field can either be set up by simply omitting a name or the reserved word FILLER can be used to indicate a field that will not be used in this program.
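
The two ways of coding the unused area can be sketched as follows. This is a fragment of a record layout, not a complete program. The second choice is shown as a comment so the 10 characters are not defined twice, and it assumes a compiler that allows the name to be left out.

```cobol
      * Choice 1: name the unused area with the reserved word FILLER.
           05  FILLER              PIC X(10).
      * Choice 2: leave the name out altogether.
      *    05                      PIC X(10).
```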

The field needed to be listed in the layout because the entire length of the record must be described when doing the layout. The record is 76 characters long and each of these characters must be accounted for in the record layout. You should note that COBOL record layouts are positional and accumulative. The first field was the customer identification number with a PIC X(4) which meant the data was stored in positions 1, 2, 3, and 4. The second field was the customer name with a PIC X(20) which means that the customer name started in position 5 and went for the next 20 characters ending in position 24. The third field was the customer street which started in position 25 and went for 20 characters ending in position 44. Let's say for example that the customer street started in position 30 and that characters 25, 26, 27, 28 and 29 contained data that was not being used in this program. In that case the layout of the file would have looked like this:
		05  CUSTOMER-ID			PIC X(4).
		05  CUSTOMER-NAME		PIC X(20).
		05  FILLER			PIC X(5).
		05  CUSTOMER-STREET		PIC X(20).
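
The positional, cumulative rule can be checked by adding up the lengths. The comment block below is a worked sketch for the original CUSTOMER-RECORD layout (the one without the extra 5 character FILLER). Each field starts one position after the previous field ends.

```cobol
      * Field             Length   Positions
      * CUSTOMER-ID          4      1 -  4
      * CUSTOMER-NAME       20      5 - 24
      * CUSTOMER-STREET     20     25 - 44
      * CUSTOMER-CITY       15     45 - 59
      * CUSTOMER-STATE       2     60 - 61
      * CUSTOMER-ZIP         5     62 - 66
      * FILLER              10     67 - 76
```

In the variation with the 5 character FILLER after the customer name, the FILLER would occupy positions 25 through 29 and the customer street would occupy positions 30 through 49.
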
Continuing the FILE SECTION with the second file layout:
	FD  CUSTOMER-REPORT
	    DATA RECORD IS PRINTZ.
	01  PRINTZ.
            05	FILLER			PIC X.
            05	CUSTOMER-ID-PR		PIC X(4).
            05	FILLER		        PIC X(2).
            05	CUSTOMER-NAME-PR	PIC X(20).
            05	FILLER			PIC X(2).
            05	CUSTOMER-STREET-PR	PIC X(20).
            05	FILLER			PIC X(2).
            05	CUSTOMER-CITY-PR	PIC X(15).
            05	FILLER			PIC X(2).
            05	CUSTOMER-STATE-PR	PIC X(2).
            05	FILLER			PIC X(2).
            05	CUSTOMER-ZIP-PR		PIC X(5).
            05	FILLER			PIC X(3).
The second FD starts the File Description of the second file described in the SELECT statements in the ENVIRONMENT DIVISION. Notice that the name used after the FD is the logical file name used in the SELECT statement. Again the File Description is followed by the DATA RECORD IS clause which gives the name PRINTZ to each line being printed on the printer. The line is then defined starting with 01 PRINTZ. Each field on the line has an 05 in front of it. Notice that every other field has the name FILLER. This is because it looks nice to leave a couple of blank characters between the fields on the printline, and the fillers serve this purpose. (Note that the word FILLER is not required on these lines.)

It is a rule in COBOL that all datanames must be unique. This is done so there will be no confusion. When the programmer refers to a name, the program will understand exactly what field the programmer is referring to. To establish uniqueness, the names on the printline must be different from the names on the input record. Since this program is simply going to take the data that is read and print it out, the programmer modified the input names by adding the -PR to the dataname when it was used on the printline. This is a convention that I frequently use. If a field is going to be moved to the printer and printed, I use the original name and append the -PR, making the dataname unique.

When setting up the printline, I knew that the printer I was working with supported 80 characters across, so my printline is 80 characters. The purpose of the program was to see the name and address information on the input record so I set up a field for each piece of data on the output print line. Between each piece of data I used the FILLER to leave some blank space. In addition, I used a FILLER at the beginning of the record and a FILLER at the end of the record. I do this because different versions of COBOL carry the carriage control character as either the first character on the line or the last character on the line. By leaving them both blank, I assure myself that the carriage control character will be correctly handled.
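
The 80 character length of PRINTZ can be confirmed by adding up the PIC lengths in the layout. The comment block below is a worked check, listing the lengths in the order the fields appear.

```cobol
      * Data fields:  4 + 20 + 20 + 15 + 2 + 5          = 66
      * FILLERs:      1 + 2 + 2 + 2 + 2 + 2 + 3         = 14
      * Print line:   66 + 14                           = 80
```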

The WORKING-STORAGE SECTION contains only one piece of information in this program, a field to indicate that the end of the file has been reached.
	WORKING-STORAGE SECTION.
	01  INDICATORS.
	    05	END-OF-FILE	PIC XXX		VALUE "NO ".
In the WORKING-STORAGE SECTION, there may be more than one indicator so the 01 designation for INDICATORS is where they will all be listed. Since this program only has one indicator, there is only one entry and that entry has been given the dataname: END-OF-FILE. It has been set up as a three character field in memory and the VALUE clause has been used to give it an initial value of NO followed by a space. The VALUE clause initializes the three characters called END-OF-FILE to whatever is programmed in the VALUE clause. Note that the word NO is enclosed in quotes. When giving a character or alphanumeric field an initial VALUE, it has to be enclosed in quotes.
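
As a sketch of how more than one indicator would be listed under the same 01, the fragment below adds a second entry. FIRST-RECORD-SW is a hypothetical indicator that is not used in the sample program. It is here only to show the pattern of one 05 entry per indicator, each with its own quoted initial VALUE.

```cobol
       WORKING-STORAGE SECTION.
       01  INDICATORS.
           05  END-OF-FILE         PIC XXX     VALUE "NO ".
      * Hypothetical second indicator, not part of SAMPLE1.
           05  FIRST-RECORD-SW     PIC XXX     VALUE "YES".
```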

PROCEDURE DIVISION: The PROCEDURE DIVISION is where the actual processing is done. The PROCEDURE DIVISION is broken up into PARAGRAPHS each of which contains instructions that will be executed when the program is run.

The PROCEDURE DIVISION for a structured program is set up with a main paragraph that controls all processing. This main paragraph performs other paragraphs where the work is actually done. There are a wide variety of styles that are used to number the paragraphs in a meaningful way. We will start out using one style and explore others as the course continues.

There are three major processing components in most programs: initialization, processing and termination (wrap-up). Each of these has a specific function within the program. The processing of these procedures is controlled in the main paragraph of the program. Please note that in the PROCEDURE DIVISION, paragraph names start in margin A and commands or instructions start in margin B.
	PROCEDURE DIVISION.
	MAIN-PROGRAM.
	    PERFORM A-100-INITIALIZATION.
	    PERFORM B-100-PROCESS-FILE.
	    PERFORM C-100-WRAP-UP.
	    STOP RUN.
Using this style, the main paragraph, which comes directly after the words PROCEDURE DIVISION, is identified by just a paragraph name (MAIN-PROGRAM). This is the control paragraph that directs the processing. This paragraph controls the execution of three other paragraphs: A-100-INITIALIZATION, B-100-PROCESS-FILE, and C-100-WRAP-UP. The style that we are using designates the paragraphs by using a letter to designate the type of processing being done in the paragraph (any paragraph used for initialization will start with an A, any paragraph used for processing will start with a B and any paragraph used for termination will start with a C). This letter is followed by a hyphen and then a number designating the paragraph's place in the hierarchy. The first paragraph in each type of processing will be given the number 100. Finally, there is another hyphen and then a name the programmer makes up that explains the intent of the paragraph.

Each of the paragraphs in the programming segment above must be executed and to do this the programmer uses the PERFORM verb. The PERFORM verb has many variations, but for now we are looking at the most straightforward version of the command which says to go out and do a particular paragraph and then return and move on to the next instruction. This format for the PERFORM verb is:
		PERFORM paragraph-name.
When COBOL encounters this command, the processing moves to the paragraph that is being performed and the instructions in that paragraph are executed. COBOL knows that the paragraph is over when it encounters another paragraph name or the end of the program. At this point, COBOL returns from the perform and executes the next sequential instruction.

In the programming segment illustrated, the paragraph named A-100-INITIALIZATION will be executed first. When the execution of that paragraph is complete, control will return to the MAIN-PROGRAM paragraph and drop through to the next instruction which says to perform the paragraph named B-100-PROCESS-FILE. That paragraph will now be executed. When the execution is done (note that processing frequently contains a lot to be done and takes a significant amount of time), control will return to the MAIN-PROGRAM paragraph and drop down and execute the paragraph C-100-WRAP-UP. After this paragraph has been executed, control again returns to the MAIN-PROGRAM paragraph where the instruction STOP RUN is encountered. The STOP RUN statement terminates processing. The format as we are using it is simply:
			STOP RUN.
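
The order in which control moves through simple PERFORM statements can be traced with a small stand-alone sketch. It is not part of the sample program: PERFDEMO and its paragraph names are made up, the DISPLAY verb (which writes a message to the screen) has not been introduced yet, and the sketch assumes a compiler that allows the ENVIRONMENT and DATA divisions to be left out when they are empty.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID.  PERFDEMO.
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
      * Each PERFORM runs one paragraph and then returns here.
           PERFORM A-100-FIRST.
           PERFORM B-100-SECOND.
           STOP RUN.
       A-100-FIRST.
           DISPLAY "IN A-100-FIRST".
       B-100-SECOND.
           DISPLAY "IN B-100-SECOND".
```

When run, the two messages appear in the order the PERFORM statements are written, and STOP RUN ends the program before control can fall into A-100-FIRST a second time.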
The initialization paragraph is where the files are opened so that processing can begin.
	A-100-INITIALIZATION.
	    OPEN INPUT CUSTOMER-FILE
	         OUTPUT CUSTOMER-REPORT.
In our simple example, the only command in this paragraph is the OPEN command. The file that will be read is opened as an INPUT file and the file that will be written is opened as an OUTPUT file. The file names that are used are the names that were originally defined in the SELECT statement in the ENVIRONMENT DIVISION and the FILE SECTION of the DATA DIVISION. The simple format for the OPEN verb, as used in this program, is:
			OPEN  (INPUT/OUTPUT) file-name.
Note that all files being used in the program can be opened with one OPEN statement.
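
As a sketch of that point, the fragment below shows the single OPEN statement from the sample program next to an equivalent pair of OPEN statements. The pair is written as comments so that the files are not opened twice.

```cobol
      * One OPEN statement for both files, as in the sample program:
           OPEN INPUT CUSTOMER-FILE
                OUTPUT CUSTOMER-REPORT.
      * The same work written as two separate OPEN statements:
      *    OPEN INPUT CUSTOMER-FILE.
      *    OPEN OUTPUT CUSTOMER-REPORT.
```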

The processing portion of the program is really the heart of the program. Usually it is a loop that is done repeatedly until all of the data has been processed.
	B-100-PROCESS-FILE.
	    READ CUSTOMER-FILE
	         AT END
		    MOVE "YES" TO END-OF-FILE.
	    PERFORM B-200-PROCESS-RECORD
		UNTIL END-OF-FILE = "YES".
The paragraph above is the first paragraph in the processing portion of the program. In this program, the first function of this paragraph is to read the first record in the file (called the INITIALIZING READ). The READ statement moves a record from the disk file into memory where it can be processed. The format of the READ statement is:
	READ file-name
	     AT END	
		processing to be done if there is no more data
The READ statement reads the file-name which was defined in the SELECT statement (the logical file name) and the FD portion of the FILE SECTION and then opened as an input file with the OPEN statement. The AT END clause tells what processing is to be done when there is no more data in the file. In the program being illustrated, when there is no more data the word YES is moved to the memory location defined in the WORKING-STORAGE SECTION as the END-OF-FILE indicator. Remember, this field was given an initial value of NO and now the program is changing the value to YES. The use of YES and NO is convenient, but the important thing is that the value of the indicator has been changed.

Following the READ statement is a PERFORM statement which sets up a loop that will be done over and over until there is no more data (shown by the value of END-OF-FILE being YES). The format of this version of the PERFORM statement is:
		PERFORM paragraph-name
		    UNTIL a condition has been met
The PERFORM statement in the sample program will perform a paragraph named B-200-PROCESS-RECORD over and over again until the condition END-OF-FILE indicator equals "YES" is met. The processing works this way:
  1. before the paragraph is executed the condition is checked
  2. if the condition is not true, then the paragraph is executed
  3. control then returns to the statement and the condition is checked again
  4. if the condition is not true, then the paragraph is executed
  5. if at any time the condition is true, the paragraph will not be executed and control will drop through to the next statement - in this case there is no next statement which means the paragraph is complete and control will return to the MAIN-PROGRAM paragraph where it will drop through and execute C-100-WRAP-UP.
The paragraph that is being executed by the PERFORM...UNTIL, is the B-200-PROCESS-RECORD paragraph shown below. The one critical thing that must be included in this paragraph is the code to change the answer to the condition UNTIL END-OF-FILE = "YES". If there is no way to change this condition so that the UNTIL condition will finally be met, the paragraph will be executed indefinitely and the program will be in a never ending loop.
	B-200-PROCESS-RECORD.
	    MOVE SPACES TO PRINTZ.
	    MOVE CUSTOMER-ID TO CUSTOMER-ID-PR.
	    MOVE CUSTOMER-NAME TO CUSTOMER-NAME-PR.
	    MOVE CUSTOMER-STREET TO CUSTOMER-STREET-PR.
	    MOVE CUSTOMER-CITY TO CUSTOMER-CITY-PR.
	    MOVE CUSTOMER-STATE TO CUSTOMER-STATE-PR.
	    MOVE CUSTOMER-ZIP TO CUSTOMER-ZIP-PR.
	    WRITE PRINTZ
		AFTER ADVANCING 1 LINES.
	    READ CUSTOMER-FILE
		AT END
		   MOVE "YES" TO END-OF-FILE.
The code in this paragraph is set up to process one record and read another. The first code that we see is the processing code. This program simply moves the information from the record that was read to the print line and then writes the line. Looking at the specifics of the code, the first statement says:
			MOVE SPACES TO PRINTZ.
The word SPACES is a reserved word that means fill the area with blanks. By moving SPACES TO PRINTZ, the programmer has cleaned out the whole print record that was defined in the FILE SECTION as 01 PRINTZ. After the area has been cleaned out, the programmer moves the fields on the input record to fields on the print line using the MOVE statement. The format of the MOVE statement is:
			MOVE field name TO field name.
In this program we are moving fields that occur on the input record to fields that occur on the output record. After all of the fields have been filled, it is time to WRITE the line. The format of the WRITE statement to write a line on the printer is:
     WRITE record-name
	 AFTER ADVANCING (the number of lines to move before writing) LINES.
The line that is to be written is the line that is defined in the 01 level of the FD for the print file. All data must pass through the FILE SECTION as it is read or written, so when a line is being written it must be the line that was defined as the record to be written in the FILE SECTION. Please note that in COBOL you READ files (defined in the FD) and you WRITE records (defined in the 01 level of the FD).

Specifically, the sample program wants to move to a new line before writing and it wants the report to be single spaced. Therefore the specific WRITE statement will say:
	WRITE PRINTZ
	    AFTER ADVANCING 1 LINES.
Again, it should be noted that the MOVE statements put the data on the line so that when it is written, the data will be printed.

After the record is written, it is time to read the next record. This is done with a READ statement that is identical to the READ statement that read the initial record in the B-100-PROCESS-FILE paragraph.
  
	READ CUSTOMER-FILE
	    AT END
	       MOVE "YES" TO END-OF-FILE.
Please pay special attention to the MOVE statement that happens when there is no more data. Remember that in the WORKING-STORAGE SECTION we set up the indicator END-OF-FILE with an initial VALUE of "NO ". Then in the PERFORM statement that controls the repetitive looping in the B-100-PROCESS-FILE, we said to PERFORM B-200-PROCESS-RECORD where the actual processing of the record is done until END-OF-FILE = "YES". Now we are providing the means to change that initial value to the "YES" that will terminate processing. If the READ statement encounters the end of the file, a MOVE statement will be executed that moves "YES" to the indicator, END-OF-FILE. When this happens the repetitive processing of B-200-PROCESS-RECORD will terminate.

The logic of the initializing READ statement and other reads being done at the bottom of the loop is very effective. First, before PERFORMing the loop, there is an INITIALIZING READ that will read the first record on the file. If there is no first record, the END-OF-FILE indicator will be immediately set to "YES" and the PERFORM of the loop will never be executed. Assuming there was a record, after the INITIALIZING READ the repetitive processing of the B-200-PROCESS-RECORD paragraph will begin. In this paragraph the data from the record that is currently in memory will be moved to fields on the line to be printed and then the line will be written. After the line has been written, the READ will be executed (Note that all records except the first record on the data file are read with this READ). If there is another record, the END-OF-FILE indicator will remain with a value of "NO". When the READ has been executed the B-200-PROCESS-RECORD paragraph is done and control will return to the PERFORM where the UNTIL clause will check the END-OF-FILE indicator to determine if the B-200-PROCESS-RECORD paragraph should be performed again. Since the value of the END-OF-FILE indicator is still "NO " the processing will continue. The data from the record that was just read will be moved to the print line, the line will be written and another record will be read. By positioning the READ as the last statement in the paragraph, everything moves smoothly. After the READ is executed, the control returns to the PERFORM which checks the results of the READ as it affects the END-OF-FILE indicator to determine if the paragraph should be executed again. The end result of this logic is clean processing that works!
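
The test-before-execute behavior of PERFORM ... UNTIL can be tried without any files. The stand-alone sketch below is not part of the sample program: LOOPDEMO and B-200-DEMO-STEP are made-up names, the DISPLAY verb (which writes a message to the screen) has not been introduced yet, and the MOVE inside the paragraph stands in for a READ that reaches the end of the file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID.  LOOPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INDICATORS.
           05  END-OF-FILE         PIC XXX     VALUE "NO ".
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
      * The condition is checked before each pass through the loop.
           PERFORM B-200-DEMO-STEP
               UNTIL END-OF-FILE = "YES".
           DISPLAY "LOOP FINISHED".
           STOP RUN.
       B-200-DEMO-STEP.
           DISPLAY "PROCESSING ONE RECORD".
      * Stand-in for a READ whose AT END clause is triggered.
           MOVE "YES" TO END-OF-FILE.
```

As written, the paragraph runs once and then the loop ends. If the initial VALUE is changed to "YES", which is what an empty file would cause in the sample program, the paragraph is never executed and only the final message appears.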

After the END-OF-FILE indicator gets changed to "YES", indicating that there is no more data, the B-200-PROCESS-RECORD paragraph will not be executed again and the PERFORM is done. Since the PERFORM was the last statement in the B-100-PROCESS-FILE paragraph, that paragraph too is complete and control returns to the MAIN-PROGRAM paragraph. At this point control drops through to the next PERFORM instruction which says PERFORM C-100-WRAP-UP.
	C-100-WRAP-UP.
	    CLOSE CUSTOMER-FILE
		  CUSTOMER-REPORT.
In this paragraph, the two files that were opened in the A-100-INITIALIZATION paragraph are closed. It should be noted that while you have to specify INPUT or OUTPUT when you are OPENing files, you do not have to specify this when you CLOSE the files. The file names that are CLOSEd here are the ones that appeared in the SELECT, were defined in the FD, were OPENed and, in the case of the input file, were READ.

When the C-100-WRAP-UP paragraph has been executed, control again returns to the MAIN-PROGRAM paragraph and drops through to the next instruction:
			STOP RUN.
This statement terminates the execution of the program.