T 4.50 z/OS operating system overload

Even if a z/OS operating system is managed with the Workload Manager (WLM) so that an overload should not normally occur, a number of threats can still lead to an overload. An overload need not lead to a complete system halt. It is also possible that individual system resources are simply no longer available even though the system itself is still responding. The following situations are typical, but they are not the only threats of this type.

Spool full situation

The spool file in a Job Entry Subsystem (JESx) is intended to hold only a certain volume of output data. A program loop, for example, may write an unlimited amount of data to the JESx spool file. This can lead to a spool full situation in which new batch jobs can no longer be started. Online processes that are already running may remain active, provided they do not write any output to the spool file. As many JES commands require a usable spool file for execution, extensive (and time-consuming) recovery measures may be necessary to rectify the problem.
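The idea of catching a filling spool before it is completely full can be illustrated with a minimal sketch. The function name, the unit (track groups) and the 80 % warning threshold are assumptions made for illustration only; they are not taken from JES documentation and do not describe how JES itself monitors the spool.

```python
def spool_status(used_track_groups, total_track_groups, warn_pct=80.0):
    """Classify spool utilisation against a (hypothetical) warning threshold.

    Returns a tuple (utilisation in percent, status string).
    """
    if total_track_groups <= 0:
        raise ValueError("total_track_groups must be positive")
    pct = 100.0 * used_track_groups / total_track_groups
    if pct >= 100.0:
        # No space left: new batch jobs could no longer be started.
        return pct, "FULL"
    if pct >= warn_pct:
        # Still usable, but operators should act before the spool fills up.
        return pct, "WARNING"
    return pct, "OK"
```

A monitoring job could call such a check periodically and alert the operators at the "WARNING" stage, while JES commands can still be executed.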

Complete system halt

Unix processes in the USS subsystem (Unix System Services) are mapped to address spaces in z/OS. If sufficient memory is no longer available, these address spaces must be swapped to page disks using the Auxiliary Storage Manager (ASM). If the page disks are also insufficient, no further address spaces can be created.

If the number of Unix processes in USS is not limited and there is insufficient space on the page disks, starting too many Unix processes can cause security problems. One possible cause is a recursive function that continually starts new Unix processes. As a consequence, the system may practically come to a halt.
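How quickly recursive process creation grows, and what a limit changes, can be shown with a small simulation. This is a toy model, not z/OS behaviour: the function name, the assumption that every process starts one new process per generation, and the max_processes cap are all hypothetical.

```python
def simulate_spawning(generations, children_per_process=1, max_processes=None):
    """Return the total process count after each generation.

    Every existing process starts `children_per_process` new processes per
    generation. If `max_processes` is given, it acts as a system-wide cap.
    """
    counts = []
    total = 1  # the initial process
    for _ in range(generations):
        total += total * children_per_process
        if max_processes is not None:
            # With a cap, further start requests are simply refused.
            total = min(total, max_processes)
        counts.append(total)
    return counts


if __name__ == "__main__":
    print(simulate_spawning(5))                    # unlimited growth
    print(simulate_spawning(5, max_processes=10))  # growth stops at the cap
```

Without a cap the count doubles in every generation, so memory and page space are exhausted after only a few steps; with a cap the damage is bounded.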

z/OS (with 64-bit addressing) is considerably less affected by this problem than its predecessor, OS/390 (with 31-bit addressing). Because of the larger address range, more memory can be made available to the z/OS system, so the page disks are needed much later.

In general, commands or program routines that continuously start new processes can rapidly overload the system. Ultimately, this situation can make an Initial Program Load (IPL) necessary.

System overload due to an excessive number of JESx initiators

The administrator controls batch processing and its priorities through the number of initiators started. If too few initiators are started, batch jobs can queue up. If too many initiators are started, resources may be overloaded.

If too many batch jobs are started, there is a risk that the page datasets will be insufficient. This situation would require manual intervention by the operators in the system configuration.

If the job entry subsystem has been defined with a very large number of initiators that are not all activated immediately, entering the JES2 command $SI (instead of, e.g., $SI1-10) may start all defined initiators. As a result, more batch jobs than planned may run. Although this situation does not usually lead to a system halt, response times may become significantly longer.
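The difference between a start command with and without a range can be made concrete with a toy model. The function below is not a JES2 command parser; its name, its parameters and the figure of 200 defined initiators are assumptions chosen purely to illustrate the effect described above.

```python
def initiators_to_start(defined, requested_range=None):
    """Return the initiator numbers a start request selects in this toy model.

    defined         -- number of initiators defined to the subsystem
    requested_range -- (first, last) tuple, or None if no range was given
    """
    if requested_range is None:
        # No range given: every defined initiator is selected.
        return list(range(1, defined + 1))
    first, last = requested_range
    if first < 1 or last < first:
        raise ValueError("invalid initiator range")
    # A range limits the selection (and cannot exceed what is defined).
    return list(range(first, min(last, defined) + 1))


if __name__ == "__main__":
    print(len(initiators_to_start(200)))           # analogous to $SI
    print(len(initiators_to_start(200, (1, 10))))  # analogous to $SI1-10
```

In this model, omitting the range selects 200 initiators instead of the 10 that were intended, which is why a large number of defined but inactive initiators increases the impact of a single mistyped command.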

Delayed tape processing

If more tape units are requested simultaneously than are available, the backup of data to tape will be delayed. The backup jobs enter a wait state until tape units become free.
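The resulting delay can be estimated with a simple queueing sketch. It assumes, purely for illustration, that all backup jobs are submitted at the same time, that each job needs exactly one tape unit for a known duration, and that waiting jobs are served in submission order; the function name and parameters are hypothetical.

```python
import heapq


def schedule_tape_jobs(durations, drives):
    """Return a (start, end) time pair for each job, in submission order.

    durations -- run time of each backup job (e.g. in minutes)
    drives    -- number of tape units available
    """
    if drives < 1:
        raise ValueError("at least one tape unit is required")
    free_at = [0] * drives  # time at which each tape unit becomes free
    heapq.heapify(free_at)
    schedule = []
    for duration in durations:
        start = heapq.heappop(free_at)  # wait for the earliest free unit
        end = start + duration
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule


if __name__ == "__main__":
    # Three 30-minute backups competing for two tape units.
    print(schedule_tape_jobs([30, 30, 30], drives=2))
```

With three 30-minute jobs and only two tape units, the third job cannot start until minute 30, so the overall backup window doubles from 30 to 60 minutes.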

Examples