Recently, we had an odd issue in our client's production environment. Many years ago, maybe even before I knew there was something called a mainframe, someone had built a pair of jobs. The first job, let's call it LOGGER, reads log messages from an MQ queue and off-loads them to a mainframe dataset for archiving. This job is special: it never stops unless it abends or reads a message from the queue that instructs it to stop. That is where the second job comes in. Let's call it STOPJOB; it puts a special message in the same MQ queue telling the first job to stop. LOGGER used to start along with the OS and stop only when an Operator scheduled STOPJOB.
Everything ran fine until one day an Operator, by mistake, scheduled STOPJOB twice during a scheduled quarterly maintenance. That left two stop messages in the MQ queue. LOGGER read the first message and stopped normally. After the maintenance, when LOGGER was started along with the OS, it read the second stop message and stopped itself again. Nobody noticed until the MQ job abended due to a space issue, which caused quite a big problem. After a lot of investigation the root cause was found, and it was decided to put a fail-safe mechanism in place, as the same situation could very well happen again.
After a lot of discussion it was decided that the simplest solution would be to stop STOPJOB from being executed when LOGGER is not running. In other words, STOPJOB should put the stop message in the MQ queue only when it knows for sure that LOGGER is running.
The challenge was: how do we find out, from within a job, whether another job is currently executing?
As always, Google search was our best consultant here. After searching through the IBM Information Center and other libraries, we found that there is a REXX interface to SDSF. This is really cool and exciting: anything that you can do manually inside the spool can also be done from a REXX program, for example searching for jobs that are currently running, or setting the job prefix or owner.
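To give a feel for the interface, here is a minimal sketch that sets a prefix and owner and lists whatever the DA (display active) panel returns. The job name LOGGER is taken from the story above; the choice of the DA panel, the exit code 12 and the message handling are assumptions made for this sketch only.

```rexx
/* REXX - sketch only: list active address spaces for one prefix */
prefix = "LOGGER"                /* job name prefix to look for       */

isfrc = isfcalls("ON")           /* make the SDSF host environment    */
if isfrc <> 0 then do            /* available to this exec            */
  say "isfcalls(ON) failed, RC =" isfrc
  exit 12                        /* assumption: 12 signals an error   */
end

isfprefix = prefix               /* same effect as the PREFIX command */
isfowner  = "*"                  /* same effect as OWNER *            */

address SDSF "ISFEXEC DA"        /* run the DA panel command          */
if rc <> 0 then do
  say "ISFEXEC DA failed, RC =" rc
  say isfmsg                     /* SDSF short message, if any        */
  if datatype(isfmsg2.0, "W") then
    do i = 1 to isfmsg2.0        /* numbered SDSF messages            */
      say isfmsg2.i
    end
  call isfcalls "OFF"
  exit 12
end

/* JNAME.0 holds the row count, JNAME.1 .. JNAME.n the job names */
say JNAME.0 "active address space(s) match prefix" prefix
do i = 1 to JNAME.0
  say "  " JNAME.i
end

call isfcalls "OFF"              /* release the SDSF host environment */
exit 0
```

The special variables isfprefix and isfowner play the role of the PREFIX and OWNER commands you would type on an SDSF panel, and each panel column comes back as a stem named after the column.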
So we wrote a wee script and used it in STOPJOB just before the step that puts the stop message in the MQ queue.
JOBSTAT - REXX Script
/* REXX */
/*
Author: Chandan Chowdhury
Purpose
-------
Given a job name this script will go into executing system's spool and check if the job is currently running or not.
Return Code
-----------
RC=00, when job is running.
RC=01, when job is not running.
RC>04, when error
Interface: SDSF for REXX
*/
arg jobid /* job name to look for */
/* attach to host command environment */
IsfRC = IsfCalls("ON")
If IsfRC <> 0 Then Do
  say "IsfCalls: RC = " IsfRC
  exit IsfRC
End
IsfPrefix = jobid /* Only this job name */
IsfOwner = "*" /* Any Owner */
/* check the INPUT spool queue */
address sdsf "IsfExec i"
If RC <> 0 Then Do
  say "SDSF: RC = " RC
  exit RC
End
JRC = 00
/*
JNAME stem has the list of jobs matching the prefix and owner.
JNAME.0 has the count of jobs found.
*/
If JNAME.0 > 0 Then
  do
    JRC = 00 /* Job is found running, so set RC=00 */
  end
else
  do
    JRC = 01 /* Job is NOT found running, so set RC=01 */
  end
/* Detach from host command environment */
IsfRC = IsfCalls("OFF")
If IsfRC <> 0 Then Do
  say "IsfCalls RC = " IsfRC
  exit IsfRC
end
/* exit with a Return Code */
exit JRC
The job snippet below uses the script above:
//JOBSTAT EXEC PGM=IRXJCL,PARM='JOBSTAT LOGGER'
//SYSEXEC DD DISP=SHR,DSN=PROD.REXX.LIB
//DDDUMP DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
// IF ( RC = 0 ) THEN
//STOPJOB EXEC PGM=STOPJOB,PARM='LOGGER'
// ENDIF
Monday, August 26, 2013
Monday, April 08, 2013
DB2 Package Issue - Cross Collection package use
In this post I am trying to explain a wee complex scenario that we ran into while fixing a production application issue. This is not at all a good read, because I am writing it only to document the issue and the solution we chose, so that it can remind me later.
Restricted
- There are defined DB2 prefixes, let's call them AB01 and CD01, and a DB2 package with a given prefix can access only the tables under the same prefix. So to access CD01PROD.TABLE1 the package must have the prefix CD01PROD. Another package, say AB01PROD.NAB01, cannot access the table.
- A package can be added to only one collection. So the package NCD01 cannot coexist in both CD01PROD and AB01PROD.
Allowed
- A DB2 plan is allowed to access packages in multiple collections, provided those collections are mentioned in its PKLIST.
The Issue
We had an existing sub program called NCD01 which accesses the table CD01PROD.TABLE1 and confirms whether a particular policy is of a special type. We had another sub program, NAB02, which needed to do the same check in addition to its own logic. So instead of duplicating the logic from NCD01 in NAB02, we called NCD01 from NAB02 to reuse the existing logic. The situation, then, is that a program in collection AB01PROD needs access to a package in the CD01PROD collection. Due to restriction no. 2 we cannot have the package of NCD01 in the AB01PROD collection. This is no issue for the main program in AB01PROD, because we had added CD01PROD to the PKLIST of the main program, and that enables the main program in AB01PROD to use a package in the CD01PROD collection directly or via another sub program.
The issue was that this sub program NAB02 was also being called from a COBOL-DB2 Stored Procedure, NABSP01. Now any COBOL DB2 Stored Procedure runs as a sub program under the DB2 plan DSNTSERV.
Solution
We had to dynamically change the package path using the SET CURRENT PACKAGE PATH statement of DB2. So in NAB02, before calling NCD01, we set the package path to CD01PROD, then call NCD01, and afterwards reset the package path back to what it was.