This section describes just a few of the problems that one can expect to encounter during an MVS-to-Unix migration project.
A difficulty with a migration job is making oneself understood to people who have never or rarely used one of the systems involved. For this reason, I have included this small "jargon" section that tries to briefly explain the similarities and differences of MVS and Unix system concepts.
JCL (Job Control Language) is a primitive job scripting language that runs on MVS systems. JCL is often supplemented with control statements for JES (Job Entry Subsystem). JES statements supply information to increase the efficiency of reading, scheduling, and printing jobs.
I do not know of an equivalent of JCL under Unix, so you will just
have to imagine a scripting language consisting of lists of files to
be used, programs to be executed, perhaps some in-line input for the
programs, and perhaps a few THEN and GOTO statements to
do error handling (in later versions). There is also information
about whose account the job will run on and how long it may run.
Apparently most advanced IBM mainframe scripting is done in a language
called REXX and JCL is used only when the job at hand is very simple.
TSO (Time Sharing Option) is the command line interpreter used on MVS systems. Unlike Unix, one cannot simply make a script by sticking a bunch of TSO commands in a file and executing the file. Instead, one uses the CLIST language to write such scripts. Most people around here rarely use TSO or CLISTs, choosing to use ISPF instead.
ISPF (Interactive System Productivity Facility) is a system of menus and forms that allows one to do allocations, edit files, run programs, etc. Experienced users develop short-cuts for getting to frequently-used menus quickly, and the system works well if the network between the workstation and mainframe is fast. Unfortunately, like most window-based interfaces, this system does not encourage users to learn the scripting approach to common jobs that must be repeated on many files at the same time.
A Unix "shell" is the equivalent of a combination of JCL, TSO, and
REXX on MVS. Unix shells can be used as a command-line user interface
to the operating system or as a language for scripting tasks, simply
by creating a text file with the commands to be executed and then
running the file (like an MS-DOS .BAT file). If one wants to
schedule the job to run at a specific time, one uses the
at(1), batch(1), or crontab(1) commands.
Many different shells have been written over the last 30 years, and all of them have somewhat different command syntax. For this reason, there is no "pure" Unix scripting language, but a collection of many dialects, some of them quite strange.
sh(1) and bash(1)
For the sake of brevity, I will say that the original Bourne shell
sh(1) is the canonical Unix shell language, and that the Bourne Again
shell bash(1) is the best shell to use if one wishes to comfortably
write scripts with the same basic syntax. Scripts written for sh(1)
will run under bash(1) without any modifications, but since bash(1)
has many additional features, a script written with bash(1) might not
work on sh(1) without some modification. Since bash(1) is freely
available for any Unix, this should not be a problem for most people.
If one needs to write a long or complicated script with much string
manipulation and pattern matching (a language translator, for
example), perl(1) is probably the best choice of language. Perl is a
highly flexible "duct-tape" language that has borrowed from the
traditions of at least 20 different languages. At first sight the
language mostly resembles C/C++ with elements of sh(1) and awk(1). It
is possible to write working programs in Perl in almost any
programming style one likes, from undisciplined BASIC spaghetti style,
to cryptic APL idiom style, to crystalline Java OOP style. The choice
is up to the programmer, but it is usually governed by the length of
time available and the size of the task.
The choice of shells and scripting languages has been a cause of "holy
wars" among certain long-time Unix users, so the above comments should
be taken as being very subjective. I prefer bash(1) for short,
one-page scripts and Perl for long scripts and programs.
For someone coming from the Unix (or DOS) world, it is a little difficult to grasp the MVS notions of "datasets", "sequential datasets", "partitioned datasets", and "members of partitioned datasets". (The first question one asks is "why use such a complicated and redundant system?" Perhaps the MVS people find the Unix notion of files organised into a hierarchy of directories confusing...)
MVS "datasets" are groups of related data or related files. The notion of "data" and "files" being on a similar level of organisation is foreign to Unix. In Unix, one finds data in files and files in directories.
There are two types of MVS datasets, sequential and partitioned.
MVS "sequential datasets" are like Unix files with a prefix that is
equivalent to a directory structure. The prefix can contain a number
of period-separated "levels", corresponding to Unix subdirectory
names. An example: an MVS sequential dataset called LOSTPC.TRANSF.ZIP
would be equivalent to a Unix file named /LOSTPC/TRANSF/ZIP.
MVS "partitioned datasets" (or PDS) are like Unix directories with a
bunch of files in them. The MVS "partitioned dataset members"
correspond to Unix files. An example: an MVS PDS called
LO.ST.CRDX.ILOSTAT would be equivalent to a Unix directory called
/LO/ST/CRDX/ILOSTAT. An MVS PDS member called
LO.ST.CRDX.ILOSTAT(FORMATS) would be equivalent to a Unix file called
/LO/ST/CRDX/ILOSTAT/FORMATS.
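The naming analogy above can be expressed as a small conversion
routine. The following is only a sketch in Python (chosen here for
illustration); the function name dataset_to_path and its lower option
are hypothetical and do not belong to any MVS or Unix tool.

```python
def dataset_to_path(dsn, lower=False):
    """Map 'A.B.C' to '/A/B/C' and 'A.B.C(MEM)' to '/A/B/C/MEM'.

    Hypothetical helper: it only illustrates the dataset-to-file analogy.
    """
    # Split off an optional "(MEMBER)" suffix of a partitioned dataset.
    name, paren, rest = dsn.strip().partition("(")
    levels = name.split(".")
    if paren:
        if not rest.endswith(")"):
            raise ValueError(f"unbalanced parenthesis in {dsn!r}")
        levels.append(rest[:-1])
    # Reject empty levels such as 'A..B' or an empty member name.
    if not all(levels):
        raise ValueError(f"empty level in {dsn!r}")
    path = "/" + "/".join(levels)
    return path.lower() if lower else path


print(dataset_to_path("LOSTPC.TRANSF.ZIP"))
print(dataset_to_path("LO.ST.CRDX.ILOSTAT(FORMATS)"))
```

The lower option anticipates the common preference for lower-case
file names on the Unix side.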
In MVS, one must always refer to a dataset (or member) by its complete name with all of its prefixes. For this reason, it is unusual to have a dataset with more than four levels in it. In Unix, one may refer to a directory or file by its complete name, or by a name relative to the current working directory, or a combination of other relative addressing methods. For this reason, one uses as many subdirectories as one wants in order to place files in a logical hierarchical order.
This flexibility of Unix allows one to economise on typing at the
expense of needing to know which directory one is currently in. (Just
type pwd at the command line to find out.) An example: in MVS, one can
only refer to a sequential dataset as LOSTPC.TRANSF.ZIP. In Unix, one
could refer to a file as /LOSTPC/TRANSF/ZIP (an absolute name), or ZIP
(a relative name) if one's current working directory (CWD) was
/LOSTPC/TRANSF, or TRANSF/ZIP if one's CWD was /LOSTPC, or
../TRANSF/ZIP if one's CWD was /LOSTPC/TEST.
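For readers new to relative names, the following sketch checks that
the four ways of naming the file in the example all lead to the same
place. It uses Python's standard posixpath module; the helper name
resolve is made up for this illustration, and the resolution is purely
textual (it does not look at a real file system or follow symbolic
links).

```python
import posixpath


def resolve(cwd, name):
    """Return the absolute path that `name` refers to when typed in `cwd`."""
    # join() keeps `name` unchanged if it is already absolute;
    # normpath() then collapses any ".." components.
    return posixpath.normpath(posixpath.join(cwd, name))


# (current working directory, name as typed) pairs from the example above.
examples = [
    ("/", "/LOSTPC/TRANSF/ZIP"),
    ("/LOSTPC/TRANSF", "ZIP"),
    ("/LOSTPC", "TRANSF/ZIP"),
    ("/LOSTPC/TEST", "../TRANSF/ZIP"),
]
resolved = [resolve(cwd, name) for cwd, name in examples]

for (cwd, name), path in zip(examples, resolved):
    print(f"CWD {cwd}: {name} refers to {path}")
```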
In MVS the case of letters makes no difference in file names and command names. In Unix it makes a difference. For this reason, Unix users usually prefer filenames with mostly lower-case letters so that there is no need to continually toggle the CAPS-LOCK key on and off. This difference in case handling often causes users some problems when they are beginning to learn Unix.
Letter case also causes problems when migrating scripts that have file
names (dataset names) written inside of them. One must either decide
to keep the old upper case MVS dataset names (preferably substituting
"/" for ".") or one must change the file names to lower case
everywhere that they appear in the old scripts. This task of
search-and-change-case-in-many-files is best done on Unix systems with
a script written in perl(1), sed(1), or awk(1).
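As a rough illustration of such a search-and-change-case script, here
is a sketch in Python rather than in one of the tools named above. The
pattern is an assumption about what dataset names look like in the old
scripts (two or more upper-case levels of up to eight characters,
optionally followed by a member name in parentheses); the names
DSN_PATTERN, to_unix_name and convert_script_text are invented for
this example.

```python
import re

# Assumed shape of a dataset name: LEVEL.LEVEL[.LEVEL...][(MEMBER)]
DSN_PATTERN = re.compile(
    r"\b[A-Z][A-Z0-9]{0,7}(?:\.[A-Z][A-Z0-9]{0,7})+(?:\([A-Z][A-Z0-9]{0,7}\))?"
)


def to_unix_name(match):
    """Turn one matched dataset name into a lower-case Unix path."""
    # Treat "(MEMBER)" as one more level, then swap "." for "/".
    name = match.group(0).replace("(", ".").rstrip(")")
    return "/" + name.replace(".", "/").lower()


def convert_script_text(text):
    """Rewrite every dataset-like name found in `text`."""
    return DSN_PATTERN.sub(to_unix_name, text)


old_line = "COPY LO.ST.CRDX.ILOSTAT(FORMATS) TO LOSTPC.TRANSF.ZIP"
print(convert_script_text(old_line))
```

A pattern this simple will also match other dotted upper-case tokens
that are not dataset names, so the output of such a script should be
reviewed before the old files are replaced.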
An additional level of complication comes from the fact that MVS uses
IBM's own character set EBCDIC rather than the standard ASCII or ISO
8859-1 character sets to encode text files. In order to read text
files from the MVS system on the Unix system, they must be converted
with a character conversion utility such as tr(1) or recode(1) or the
character conversion option found in archiving software such as PKZIP.
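To show what such a conversion involves, here is a minimal sketch
using Python's built-in codecs instead of the utilities named above.
It assumes the text was written in the standard EBCDIC code page 037
(Python codec "cp037"); that choice is an assumption, since the code
page actually in use depends on the site, and the function name
ebcdic_to_latin1 is made up for this example.

```python
def ebcdic_to_latin1(data, codepage="cp037"):
    """Convert EBCDIC bytes to ISO 8859-1 (latin1) bytes.

    `codepage` is an assumption; a site may use another EBCDIC
    variant such as "cp500".
    """
    # Decode the EBCDIC bytes to text, then re-encode the text as latin1.
    return data.decode(codepage).encode("latin-1")


# The word HELLO as it would be stored in EBCDIC.
ebcdic_bytes = b"\xc8\xc5\xd3\xd3\xd6"
print(ebcdic_to_latin1(ebcdic_bytes))
```

This only applies to plain text. Files containing binary or packed
numeric fields cannot be converted byte for byte in this way.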
Unfortunately, MVS systems allow operators to customise the EBCDIC
character set. This was done many years ago at the ICC to accommodate
the many accented characters found in the languages of the member
nations of the UN, before the ISO-8859-1 (latin1) standard existed. As
a result, standard converters such as recode(1) are unable to properly
convert text files unless one builds a custom character-to-character
map.
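A custom character-to-character map is simply a 256-entry table
giving the output byte for each input byte. The sketch below builds
such a table in Python by starting from a standard code page and then
overriding individual positions. The override shown (EBCDIC byte 0x4A
mapped to "é") is invented purely for illustration and is not taken
from the ICC's actual table; the function name build_table and the
choice of "cp037" as the starting point are likewise assumptions.

```python
def build_table(base_codepage="cp037", overrides=None):
    """Return a 256-byte table mapping EBCDIC bytes to latin1 bytes.

    `overrides` maps an EBCDIC byte value to the character that the
    site's customised character set stores at that position.
    """
    # Start from the standard code page: table[i] is the latin1 byte
    # for EBCDIC byte i ("?" where no latin1 equivalent exists).
    standard = bytes(range(256)).decode(base_codepage)
    table = bytearray(standard.encode("latin-1", errors="replace"))
    # Apply the site-specific differences on top of the standard map.
    for ebcdic_byte, char in (overrides or {}).items():
        table[ebcdic_byte] = char.encode("latin-1")[0]
    return bytes(table)


# Hypothetical customisation: position 0x4A holds "é" at this site.
custom_table = build_table(overrides={0x4A: "é"})

# bytes.translate() applies the table to every byte of the input.
converted = b"\xc8\xc5\xd3\xd3\xd6\x4a".translate(custom_table)
print(converted.decode("latin-1"))
```

Keeping the site-specific differences in a small overrides dictionary
makes it easy to see exactly where the customised character set
departs from the standard one.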
ICC has two utilities that aid in the conversion from their custom character set to latin1, PKZIP for text files and the TRANTAB procedure for SAS datasets. More will be said about these methods later.
Some applications like SAS use version- and platform-dependent file
formats to store datasets. In SAS, one makes use of the conversion
procedure CPORT to convert a SAS dataset into a transportable format.
The dataset is then transported, either physically on a tape or
electronically via an FTP session, and then imported with CIMPORT on
the target platform.
If the character set of the original host has been customised, as at
ICC, one needs to specify a modified translation table when using
CPORT.