The MVS to Unix migration HOWTO: The problems of migration

4. The problems of migration

This section describes just a few of the problems that one can expect to encounter during an MVS to Unix migration project.

A difficulty with a migration job is making oneself understood to people who have rarely or never used one of the systems involved. For this reason, I have included this small "jargon" section, which tries to explain briefly the similarities and differences between MVS and Unix system concepts.

Scripting languages and user interfaces in MVS

JCL and JES

JCL (Job Control Language) is a primitive job scripting language that runs on MVS systems. JCL is often extended with a command set called JES (Job Entry Subsystem). JES statements supply information to increase the efficiency of reading, scheduling, and printing jobs.

I do not know of an equivalent of JCL under Unix, so you will just have to imagine a scripting language consisting of lists of files to be used, programs to be executed, perhaps some in-line input for the programs, and perhaps a few IF/THEN/ELSE statements to do error handling (in later versions). There is also information about the account under which the job will run and how long it may run. Apparently most advanced IBM mainframe scripting is done in a language called REXX, and JCL is used only when the job at hand is very simple.

TSO and CLISTs

TSO (Time Sharing Option) is the command line interpreter used on MVS systems. Unlike on Unix, one cannot simply make a script by sticking a bunch of TSO commands in a file and executing the file. Instead, one uses the CLIST language to write such scripts. Most people around here rarely use TSO or CLISTs, choosing to use ISPF instead.

ISPF

ISPF (Interactive System Productivity Facility) is a system of menus and forms that allows one to do allocations, edit files, run programs, etc. Experienced users develop short-cuts for getting to frequently-used menus quickly, and the system works well if the network between the workstation and mainframe is fast. Unfortunately, like most window-based interfaces, this system does not encourage users to learn the scripting approach to common jobs that must be repeated on many files at the same time.

Scripting languages and user interfaces in Unix

A Unix "shell" is the equivalent of a combination of JCL, TSO, and REXX on MVS. Unix shells can be used as a command-line user interface to the operating system or as a language for scripting tasks, simply by creating a text file with the commands to be executed and then running the file (like an MS-DOS .BAT file). If one wants to schedule the job to run at a specific time, one uses the at(1), batch(1), or crontab(1) commands.
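As a rough illustration of the "text file with commands" idea, the sketch below writes a small script into a scratch directory and runs it. The script name count_lines.sh and the sample file are hypothetical, and mktemp(1) is assumed to be available, as it is on most current systems.

```shell
#!/bin/sh
# Sketch only: the file names below are hypothetical.
workdir=$(mktemp -d)
script="$workdir/count_lines.sh"

# A script is just a text file containing commands.
cat > "$script" <<'EOF'
#!/bin/sh
# Print the number of lines in each file named on the command line.
for f in "$@"; do
    printf '%s: %s lines\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
done
EOF

chmod +x "$script"    # lets the file be started by name, e.g. ./count_lines.sh

printf 'a\nb\nc\n' > "$workdir/sample.txt"
result=$(sh "$script" "$workdir/sample.txt")
echo "$result"

rm -rf "$workdir"
```

To run such a script at 02:30 every night, one could add a crontab(1) entry of the form `30 2 * * * /path/to/count_lines.sh`, where the path is again only a placeholder.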

Many different shells have been written over the last 30 years, and all of them have somewhat different command syntax. For this reason, there is no "pure" Unix scripting language, but a collection of many dialects, some of them quite strange.

sh(1) and bash(1)

For the sake of brevity, I will say that the original Bourne shell sh(1) is the canonical Unix shell language, and that the Bourne Again shell bash(1) is the best shell to use if one wishes to comfortably write scripts with the same basic syntax.

Scripts written for sh(1) will run under bash(1) without any modifications, but since bash(1) has many additional features, a script written with bash(1) might not work on sh(1) without some modification. Since bash(1) is freely available for any Unix, this should not be a problem for most people.

Perl

If one needs to write a long or complicated script with much string manipulation and pattern matching (a language translator, for example), perl(1) is probably the best choice of language. Perl is a highly flexible "duct-tape" language that has borrowed from the traditions of at least 20 different languages. At first sight the language mostly resembles C/C++ with elements of sh(1) and awk(1). It is possible to write working programs in Perl in almost any programming style one likes, from undisciplined BASIC spaghetti style, to cryptic APL idiom style, to crystalline Java OOP style. The choice is up to the programmer, but it is usually governed by the length of time available and the size of the task.

Some Unix culture

The choice of shells and scripting languages has been a cause of "holy wars" among certain long-time Unix users, so the above comments should be taken as being very subjective. I prefer bash(1) for short, one-page scripts and Perl for long scripts and programs.

MVS "datasets" and Unix "files"

For someone coming from the Unix (or DOS) world, it is a little difficult to grasp the MVS notions of "datasets", "sequential datasets", "partitioned datasets", and "members of partitioned datasets". (The first question one asks is "why use such a complicated and redundant system?" Perhaps the MVS people find the Unix notion of files organised into a hierarchy of directories confusing...)

MVS "datasets" are groups of related data or related files. The notion of "data" and "files" being on a similar level of organisation is foreign to Unix. In Unix, one finds data in files and files in directories.

There are two types of MVS datasets, sequential and partitioned.

MVS "sequential datasets" are like Unix files with a prefix that is equivalent to a directory structure. The prefix can contain a number of period-separated "levels", corresponding to Unix subdirectory names. An example: an MVS sequential dataset called LOSTPC.TRANSF.ZIP would be equivalent to a Unix file named /LOSTPC/TRANSF/ZIP.

MVS "partitioned datasets" (or PDS) are like Unix directories with a bunch of files in them. The MVS "partitioned dataset members" correspond to Unix files. An example: an MVS PDS called LO.ST.CRDX.ILOSTAT would be equivalent to a Unix directory called /LO/ST/CRDX/ILOSTAT. An MVS PDS member called LO.ST.CRDX.ILOSTAT(FORMATS) would be equivalent to a Unix file called /LO/ST/CRDX/ILOSTAT/FORMATS.
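The naming correspondence described above can be sketched as a small shell function built on sed(1). The function name mvs_to_path is hypothetical, and the sketch assumes dataset names are well formed, with an optional member name in parentheses at the end.

```shell
#!/bin/sh
# Hypothetical helper: turn an MVS dataset name, optionally followed by a
# PDS member in parentheses, into the equivalent Unix path.
mvs_to_path() {
    printf '%s\n' "$1" |
        sed -e 's/(\(.*\))$/.\1/' \
            -e 's|\.|/|g' \
            -e 's|^|/|'
    # 1st expression: "(MEMBER)" at the end becomes ".MEMBER"
    # 2nd expression: every period becomes a slash
    # 3rd expression: a leading slash makes the name absolute
}

mvs_to_path 'LOSTPC.TRANSF.ZIP'              # prints /LOSTPC/TRANSF/ZIP
mvs_to_path 'LO.ST.CRDX.ILOSTAT(FORMATS)'    # prints /LO/ST/CRDX/ILOSTAT/FORMATS
```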

In MVS, one must always refer to a dataset (or member) by its complete name with all of its prefixes. For this reason, it is unusual to have a dataset with more than four levels in it. In Unix, one may refer to a directory or file by its complete name, or by a name relative to the current working directory, or a combination of other relative addressing methods. For this reason, one uses as many subdirectories as one wants in order to place files in a logical hierarchical order.

This flexibility of Unix allows one to economise on typing at the expense of needing to know what directory one is referring to. (Just type pwd at the command line to find out.) An example: In MVS, one can only refer to a sequential dataset as LOSTPC.TRANSF.ZIP. In Unix, one could refer to a file as /LOSTPC/TRANSF/ZIP (an absolute name), or as ZIP (a relative name) if one's current working directory (CWD) was /LOSTPC/TRANSF, or as TRANSF/ZIP if one's CWD was /LOSTPC, or as ../TRANSF/ZIP if one's CWD was /LOSTPC/TEST.
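The sketch below reproduces this example in a scratch directory that stands in for the root directory "/". The file contents are made up, and mktemp(1) is assumed to be available. Each of the four names reaches the same file.

```shell
#!/bin/sh
# Sketch: build the layout from the text under a scratch directory
# (standing in for "/") and read the same file under four names.
root=$(mktemp -d)
mkdir -p "$root/LOSTPC/TRANSF" "$root/LOSTPC/TEST"
echo 'archive data' > "$root/LOSTPC/TRANSF/ZIP"

# Absolute name: works from any working directory.
abs=$(cat "$root/LOSTPC/TRANSF/ZIP")
# Relative names: each one depends on the current working directory.
rel1=$(cd "$root/LOSTPC/TRANSF" && cat ZIP)
rel2=$(cd "$root/LOSTPC" && cat TRANSF/ZIP)
rel3=$(cd "$root/LOSTPC/TEST" && cat ../TRANSF/ZIP)

printf '%s\n' "$abs" "$rel1" "$rel2" "$rel3"
rm -rf "$root"
```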

The significance of upper and lower case

In MVS the case of letters makes no difference in file names and command names. In Unix it makes a difference. For this reason, Unix users usually prefer filenames with mostly lower-case letters, so that there is no need to keep toggling the CAPS-LOCK key on and off. This difference in case handling often causes users some problems when they are beginning to learn Unix.

Letter case also causes problems when migrating scripts that have file names (dataset names) written inside of them. One must either decide to keep the old upper case MVS dataset names (preferably substituting "/" for ".") or one must change the file names to lower case everywhere that they appear in the old scripts. This task of search-and-change-case-in-many-files is best done on Unix systems with a script written in perl(1), sed(1), or awk(1).
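A minimal sketch of the second option, using awk(1) from a shell function, might look like this. The function name convert_names is hypothetical, and the pattern is a crude assumption: any run of two or more upper-case levels joined by periods is treated as a dataset name, so the output would still need checking by hand.

```shell
#!/bin/sh
# Hypothetical filter: rewrite upper-case MVS dataset names found on
# standard input as lower-case, slash-separated Unix names.
convert_names() {
    LC_ALL=C awk '
    {
        out = ""
        rest = $0
        # Assumed pattern: at least two upper-case levels joined by periods.
        while (match(rest, /[A-Z][A-Z0-9]*(\.[A-Z][A-Z0-9]*)+/)) {
            name = tolower(substr(rest, RSTART, RLENGTH))
            gsub(/\./, "/", name)
            out = out substr(rest, 1, RSTART - 1) name
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

# Words without periods, such as COPY and TO, are left alone.
printf 'COPY LOSTPC.TRANSF.ZIP TO LO.ST.CRDX\n' | convert_names
# prints: COPY lostpc/transf/zip TO lo/st/crdx
```

To process many files, one could loop over them and write each result to a new file, keeping the originals until the converted scripts have been checked.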

4.2 EBCDIC and ASCII

An additional level of complication comes from the fact that MVS uses IBM's own character set EBCDIC rather than the standard ASCII or ISO 8859-1 character sets to encode text files. In order to read text files from the MVS system on the Unix system, they must be converted with a character conversion utility such as tr(1) or recode(1) or the character conversion option found in archiving software such as PKZIP.

Unfortunately, MVS systems allow operators to customise the EBCDIC character set. This was done many years ago at the ICC to accommodate the many accented characters found in the languages of the member nations of the UN, before the ISO 8859-1 (latin1) standard existed. As a result, standard converters such as recode(1) are unable to properly convert text files unless one builds a custom character-to-character map.
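One way to picture such a custom map is a tr(1) command that lists EBCDIC byte values on one side and latin1 characters on the other. The sketch below maps only the space, the upper-case letters, and the digits, which sit at their usual EBCDIC positions. The single accented-character entry is an assumption made up for illustration and is not ICC's actual table; a real map would have to cover every byte value in use.

```shell
#!/bin/sh
# Sketch of a partial EBCDIC-to-latin1 map built with tr(1).
# Bytes that are not listed pass through unchanged.
ebcdic_to_latin1() {
    # First set: EBCDIC bytes in octal.  Second set: latin1 characters.
    #   \100 space, \301-\311 A-I, \321-\331 J-R, \342-\351 S-Z, \360-\371 0-9
    #   \121 -> \351 (e-acute): ASSUMED site-specific entry, for illustration
    LC_ALL=C tr '\100\301-\311\321-\331\342-\351\360-\371\121' \
                ' A-IJ-RS-Z0-9\351'
}

# The text "ICC 1" in EBCDIC, written as octal escapes.
printf '\311\303\303\100\361' | ebcdic_to_latin1
echo
```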

ICC has two utilities that aid in the conversion from their custom character set to latin1, PKZIP for text files and the TRANTAB procedure for SAS datasets. More will be said about these methods later.

4.3 Custom file formats

Some applications, such as SAS, use version- and platform-dependent file formats to store datasets. In SAS, one makes use of the conversion procedure CPORT to convert a SAS dataset into a transportable format. The dataset is then transported, either physically on a tape or electronically via an FTP session, and then CIMPORTed on the target platform.

If the character set of the original host has been customised, as it has at ICC, one needs to specify a modified translation table when using CPORT.
