S 4.221 Parallel Sysplex under z/OS
Initiation responsibility: Head of IT, IT Security Officer
Implementation responsibility: Specialists Responsible, Administrator
A Parallel Sysplex cluster consists of several z/OS systems that appear to the outside as a single system. The z/OS systems can run on one or more logical partitions (LPARs). All systems in the cluster are connected through a Coupling Facility for synchronisation purposes. When more than one LPAR is used, a timer facility must be used to synchronise the system time-of-day clocks. Additional information on this subject can be found in S 3.39 Introduction to the zSeries platform. Parallel Sysplex clusters are used when high requirements are placed on availability and scalability.
All z/OS systems in a Parallel Sysplex cluster are loaded from the same set of hard disks. The individual z/OS operating systems are distinguished using individual system definitions.
The following recommendations should be considered when using Parallel Sysplex clusters:
Use of the Coupling Facility
The Coupling Facility (CF) connects the LPARs. It also provides shared memory that is divided into various objects, which are referred to as Coupling Facility structures. Access to the CF is obtained via XES (Cross-System Extended Services). Three different types of structures can be defined in the CF:
Cache Structures
This structure provides high-performance memory that can be shared by several users. When data is read from the hard disk, a copy of the data is written to the user's own local memory buffer. Optionally, an additional copy can be placed in a Cache Structure of the Coupling Facility.
List Structures
This structure allows several users to exchange information among each other. The information is made available in lists (message passing) or in queues (queues of work).
Lock Structures
This structure can be used to control the use of resources in the Shared or Exclusive mode across all LPARs.
Operation
If a Parallel Sysplex cluster is to be operated for availability reasons, for example, then the Coupling Facility should be used with data sharing if possible. This applies at least to JES2/3 (Job Entry Subsystem), RACF (Resource Access Control Facility), VTAM (Virtual Telecommunications Access Method), the System Logger, CICS, IMS, and DB2. It should be examined whether it is necessary to design the Coupling Facility redundantly to meet the availability requirements of the overall system.
Coupling Facilities are defined and initialised using the HMC (Hardware Management Console). Recommendations for the use of this console can be found in S 4.207 Use and protection of system-related z/OS terminals.
Couple datasets
Couple datasets are used by the XCF (Cross-System Coupling Facility) to monitor information on the LPARs, groups or members. All LPARs of the Parallel Sysplex cluster must be able to access these datasets. The use of alternate couple datasets is recommended. In z/OS, the couple datasets must be protected using RACF. Only those employees (and their substitutes) who need to edit the files to perform their work should have write access to them (see S 4.211 Use of the z/OS security system RACF).
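Such protection can be sketched with generic dataset profiles, for example as below. The high-level qualifier SYS1.XCF and the group names XCFADM and SYSPROG are hypothetical; the actual couple dataset names and authorised groups are site-specific.

```
ADDSD    'SYS1.XCF.**' UACC(NONE)
PERMIT   'SYS1.XCF.**' ID(XCFADM)  ACCESS(UPDATE)
PERMIT   'SYS1.XCF.**' ID(SYSPROG) ACCESS(READ)
SETROPTS GENERIC(DATASET) REFRESH
```

With UACC(NONE), only the explicitly permitted administrator group can write to the datasets, while other system programmers retain read access.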
The IXCL1DSU utility is available to format the couple datasets. This program should be protected by RACF (PROGRAM class). The administrative utility IXCMIAPU allows you to define the Coupling Facility Resource Management (CFRM) policy. It should be protected by a corresponding profile in the RACF FACILITY class so that only authorised personnel can use it. Additional recommendations for protecting critical programs can be found in S 4.215 Protection of z/OS utilities that are critical to security.
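A possible RACF setup is sketched below, assuming the utility resides in SYS1.MIGLIB and using the hypothetical administrator group XCFADM. The FACILITY profile MVSADMIN.XCF.CFRM controls access to the CFRM administrative data; the profile name and required access level should be verified against the IBM documentation for the installed z/OS release.

```
RDEFINE  PROGRAM IXCL1DSU ADDMEM('SYS1.MIGLIB'//NOPADCHK) UACC(NONE)
PERMIT   IXCL1DSU CLASS(PROGRAM) ID(XCFADM) ACCESS(READ)
SETROPTS WHEN(PROGRAM) REFRESH

RDEFINE  FACILITY MVSADMIN.XCF.CFRM UACC(NONE)
PERMIT   MVSADMIN.XCF.CFRM CLASS(FACILITY) ID(XCFADM) ACCESS(UPDATE)
SETROPTS RACLIST(FACILITY) REFRESH
```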
Sysplex commands
The z/OS operating system provides the SETXCF system command for administration and monitoring purposes. It supports the following activities, among others:
- definition of the couple datasets
- switching between the primary couple dataset and backup couple dataset
- activation of a new CFRM policy
- starting and stopping of PATHIN and PATHOUT connections
- changing the structure size
- rebuilding structures after structure failures
To protect this command (and all other commands supporting the Parallel Sysplex cluster), corresponding RACF profiles must be defined (see S 4.210 Secure operation of the z/OS operating system).
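The console commands below illustrate typical SETXCF activities (the policy name CFRMPOL1 and the structure name are examples), followed by a sketch of the corresponding RACF protection in the OPERCMDS class. The profile name MVS.SETXCF, the required access level, and the group name XCFADM should be checked against the IBM documentation and the site's naming conventions.

```
SETXCF COUPLE,PSWITCH,TYPE=SYSPLEX
SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRMPOL1
SETXCF START,REBUILD,STRNAME=ISGLOCK

RDEFINE  OPERCMDS MVS.SETXCF UACC(NONE)
PERMIT   MVS.SETXCF CLASS(OPERCMDS) ID(XCFADM) ACCESS(CONTROL)
SETROPTS RACLIST(OPERCMDS) REFRESH
```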
XCF control
RMF (Resource Measurement Facility) generates an XCF Activity Report. Consideration should be given to using this report to monitor the message traffic between the z/OS operating systems in order to detect communication bottlenecks and deadlock situations early and to take preventive measures.
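The XCF Activity Report can be produced with the RMF postprocessor, for example as in the following sketch (the name of the SMF input dataset is hypothetical):

```
//RMFPP    EXEC PGM=ERBRMFPP
//MFPINPUT DD   DISP=SHR,DSN=SMF.DUMP.DATA
//SYSPRINT DD   SYSOUT=*
//SYSIN    DD   *
  REPORTS(XCF)
/*
```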
Consistent RACF database
A RACF database with uniform RACF definitions should be used for all LPARs in the entire Parallel Sysplex cluster.
Standards
To improve clarity and maintainability, standards should be introduced in the following areas:
- The parameter members of the PARMLIBs should be standardised. All names must be unique in the Parallel Sysplex cluster. This includes: dataset names, subsystem names, procedure names, and VTAM application IDs (see S 2.285 Determining standards for z/OS system definitions).
- All system settings for the local definitions in PARMLIB and PROCLIB should be uniform. It is recommended that the individual definition members have an identical structure.
- The System Managed Storage (SMS) structure must be uniform throughout the entire Parallel Sysplex cluster.
- The system software used on all LPARs should be as uniform as possible (it may be necessary in this case to change the software licenses).
Dimensioning
It must be ensured that the caches of the hard disk control units, the work disks, the Coupling Facility structures, and the SPOOL disks are correctly dimensioned. The size of the areas is derived first and foremost from the type and the requirements of the applications running on the Parallel Sysplex cluster. In many cases, the documentation from the software manufacturer will contain information in this regard.
Serialisation
Global Resource Serialization (a GRS cluster) must be configured so that system actions can be serialised. The GRS mode is defined in the IEASYSxx member of the PARMLIB (RING or STAR mode). If possible, the more modern STAR mode should be selected: in this topology, lock requests are processed via a lock structure in the Coupling Facility instead of being passed around a ring, which usually results in faster processing. The STAR mode also offers advantages in terms of availability.
Warning: The STAR mode is only possible in conjunction with the Coupling Facility.
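A minimal sketch of the corresponding IEASYSxx entries follows; the resource name list (RNL) member suffix 00 is an example. In STAR mode, GRS additionally requires the lock structure ISGLOCK to be defined in the CFRM policy.

```
GRS=STAR,
GRSRNL=(00),
```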
High availability through redundancy
Where the availability requirements are high or very high, it should be examined whether the use of the following redundancy mechanisms is appropriate:
- RACF with a primary and a backup database
- second Coupling Facility
- alternate couple datasets
- second timer (coupled using the FC 4048 high availability feature code and with a separate electrical circuit)
- backup system environment so that a system reboot can be performed immediately in the event of an error
- CTC GRS ring (channel-to-channel adapter via ESCON / Global Resource Serialization)
- backup Multiple Console Support (MCS) master console
- data backup of important control files, if possible using the Concurrent Copy option (ADRDSSU utility)
Additional information can be found in S 6.93 Contingency planning for z/OS systems.
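A backup of an important control file with the Concurrent Copy option might look as follows; all dataset names are examples and must be adapted to the site's naming conventions.

```
//BACKUP   EXEC PGM=ADRDSSU
//SYSPRINT DD   SYSOUT=*
//OUTDD    DD   DSN=BACKUP.PARMLIB.DUMP,DISP=(NEW,CATLG),
//              UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD   *
  DUMP DATASET(INCLUDE(SYS1.PARMLIB)) -
       OUTDDNAME(OUTDD) CONCURRENT
/*
```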
Hard disk access
The following recommendations relating to hard disk access should be considered:
- No hard disks from outside of the cluster should be made available in the Parallel Sysplex cluster. Hard disks not belonging to the cluster should only be set online for recovery purposes.
- Access to hard disks in the Parallel Sysplex cluster by other systems not belonging to the cluster should not be possible under production conditions.
- Consideration should be given to using the Enhanced Catalog Sharing option if the performance requirements are high.
- Test/development systems and production systems should not be operated in parallel in the same Parallel Sysplex cluster, if possible.
- The operating system should be loaded from a single set of system disks for all z/OS systems in the Parallel Sysplex cluster.
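Enhanced Catalog Sharing can be enabled and checked with the MODIFY CATALOG console command, as sketched below; it requires a corresponding structure (SYSIGGCAS_ECS) to be defined in the Coupling Facility.

```
F CATALOG,ECSHR(AUTOADD)
F CATALOG,ECSHR(STATUS)
```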
Symbolic variables
Symbolic variables should be used whenever possible in the PARMLIB definitions. This helps to avoid errors in the system administration and makes system cloning easier.
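A sketch of an IEASYMxx member follows; all symbol names and values are hypothetical.

```
SYSDEF   SYMDEF(&SYSR2='ZOSR2A')
SYSDEF   LPARNAME(LPAR1)
         SYSNAME(SYS1)
         SYMDEF(&SYSID='1')
```

A symbol defined in this way, as well as the predefined system symbols, can then be used in other definitions, for example in a dataset name such as SYS1.&SYSNAME..PARMLIB, so that one set of definitions serves all systems in the cluster.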
System Logger
The System Logger should be used with staging datasets; in the event of an error, other systems in the cluster access these datasets to recover the log data.
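Staging datasets are enabled per log stream in the LOGR policy, which is defined with the administrative data utility. In the sketch below, the structure name and high-level qualifier are examples; SYSPLEX.OPERLOG is a commonly used log stream name for the operations log.

```
DATA TYPE(LOGR) REPORT(YES)
DEFINE LOGSTREAM NAME(SYSPLEX.OPERLOG)
       STRUCTNAME(OPERLOG)
       STG_DUPLEX(YES) DUPLEXMODE(UNCOND)
       HLQ(IXGLOGR)
```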
Reducing the number of console messages
To keep the number of console messages manageable, it is recommended to enable message filtering (see S 4.210 Secure operation of the z/OS operating system). This is particularly important because all messages from all z/OS operating systems in a Parallel Sysplex cluster are displayed on a single MVS console.
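Message filtering is controlled via an MPFLSTxx member of the PARMLIB and activated with the SET MPF=xx console command. The message IDs below are examples of messages that might be suppressed from the console while still being written to the system log.

```
.DEFAULT,SUP(NO),AUTO(NO),RETAIN(YES)
IEF403I,SUP(YES)
IEF404I,SUP(YES)
```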
Review questions:
- When using a Parallel Sysplex cluster in z/OS: Is it examined whether it is necessary to design the Coupling Facility redundantly to meet the availability requirements of the overall system?
- Is the access to the couple datasets protected in z/OS using RACF?
- Is the administrative z/OS utility IXCMIAPU protected by a corresponding profile in the RACF FACILITY class to ensure that only authorised personnel have access to it?
- Is a RACF database with uniform RACF definitions used for all LPARs of the entire Parallel Sysplex cluster in z/OS?
- Is the System Managed Storage structure uniform in the entire Parallel Sysplex cluster in z/OS?
- When using a Parallel Sysplex cluster in z/OS: Is a GRS cluster installed for the serialisation of system actions?
- Are hard disks in the Parallel Sysplex cluster in z/OS only made available within the cluster?
- Are systems outside the cluster prevented from accessing hard disks of the Parallel Sysplex cluster in z/OS?