Sterling OMS Health Monitor

By | 09/30/2017

In this post we will see how Sterling OMS uses its health monitor process.

Database table name: YFS_HEARTBEAT
Command to start the health monitor: startHealthMonitor.cmd

Various tools are available in the market for application health monitoring and reporting. Here are a few of the most commonly used ones.

  • Anturis
  • Dynatrace
  • AppDynamics
  • TraceView
  • Boundary

The next question that comes to mind: why do we need a health monitor in OMS when so many tools are available in the market?

The Sterling OMS health monitor process is mainly implemented for communication between the OMS servers (Application/Agent/Integration). When a user clears the cache with the Clear Cache button (System > Launch System Console > Clear Cache), the system needs to find all the running/active servers and inform them about the cache clear.

Before getting into the details, let's first understand the columns of the YFS_HEARTBEAT table and how they are used.

When any server (Application / Agent / Integration) is started, an entry is made in YFS_HEARTBEAT with status code "00" (Running).

Column Name | Data Type | Description
HEARTBEAT_KEY | Char (24) | The primary key for the YFS_HEARTBEAT table.
LAST_HEARTBEAT | DateTime | The timestamp of the last heartbeat.
SERVICE_NAME | Varchar2 (100) | The service, agent or component that collects and stores the statistics.
SERVER_NAME | Varchar2 (100) | A unique name to identify a server.
SERVER_TYPE | Varchar2 (40) | The type of the server. For example, the server type can be AGENT, INTEGRATION or APPSERVER.
SERVER_ID | Varchar2 (100) | The identifier associated with the server.
STATUS | Varchar2 (40) | The status associated with this server. Valid values: 00 (RUNNING), 01 (STOPPED), 02 (TERMINATE).
THREADS_CONFIGURED | Number (5,0) | The number of threads configured for the server.
ACTIVE_THREADS | Number (5,0) | The number of active threads in the server.
HOST_NAME | Varchar2 (100) | The host name on which the server is running.
SERVER_START_TIME | DateTime | The timestamp of the agent server start time.
PERCENT_CACHE_USED | Number (15,2) | The percentage of cache used.
RMI_OBJECT | BLOB | RMI object for the agent servers.

A few columns in this table are especially important:

  • SERVER_START_TIME: when exactly the server started
  • LAST_HEARTBEAT: the time when the server last communicated its status back to the health monitor
  • HOST_NAME: the host on which the server was started
  • SERVER_TYPE: AGENT, INTEGRATION or APPSERVER
  • RMI_OBJECT: the RMI object for the agent servers

RMI?

Remote Method Invocation. Yes, you read that correctly: Sterling uses RMI calls to communicate between servers.

RMI (Remote Method Invocation) is an API that provides a mechanism for building distributed applications in Java. It allows an object to invoke methods on an object running in another JVM.
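
To make the clear-cache scenario concrete, here is a minimal Python sketch of the idea: look at the heartbeat records, pick the servers that are running, and call each one. It is an illustration only, not Sterling's implementation. Sterling uses Java RMI, whereas this sketch substitutes a plain Python callable for the remote object held in RMI_OBJECT. The names `HeartbeatRow`, `remote_clear_cache` and `broadcast_clear_cache`, and the sample server names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

RUNNING = "00"  # status code for a running server in YFS_HEARTBEAT


@dataclass
class HeartbeatRow:
    """Simplified stand-in for one YFS_HEARTBEAT record."""
    server_name: str
    server_type: str  # AGENT, INTEGRATION or APPSERVER
    status: str       # "00", "01" or "02"
    # Assumption: a local callable replaces the real remote (RMI) object.
    remote_clear_cache: Callable[[], None]


def broadcast_clear_cache(rows: List[HeartbeatRow]) -> List[str]:
    """Invoke clear-cache on every running server; return the names notified."""
    notified = []
    for row in rows:
        if row.status != RUNNING:
            continue  # servers that are not running are skipped
        row.remote_clear_cache()
        notified.append(row.server_name)
    return notified


cleared = []  # records which servers actually received the call
rows = [
    HeartbeatRow("AppServer1", "APPSERVER", "00", lambda: cleared.append("AppServer1")),
    HeartbeatRow("ScheduleAgent", "AGENT", "01", lambda: cleared.append("ScheduleAgent")),
    HeartbeatRow("IntegServer", "INTEGRATION", "00", lambda: cleared.append("IntegServer")),
]
notified = broadcast_clear_cache(rows)
print(notified)
```

In this sketch only the two records with status 00 are called, which is why stale records left in status 00 matter: they would be treated as live targets.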

How is an entry made in the YFS_HEARTBEAT table?

yfs.properties configuration related to the Health Monitor

Property Name | Description
rmi.portrange | In a deployment with servers in two different network zones, the firewall between them must be configured to allow Remote Method Invocation (RMI) communication between them.
yantra.hm.purge.interval | Health monitor purge interval in days. System default value used for purging heartbeat, snapshot, and page cache records. If this value is not specified, the default value is 30 days.
yantra.statistics.persist.interval | Property that determines the statistics logging time interval. Valid values for minutes (M/m) = 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30. Valid values for hours (H/h) = 1, 2, 3, 4, 6, 8, or 12. Default = 10m.
yfs.heartbeat.refresh.interval | Valid values for minutes (M/m) = 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30. Valid values for hours (H/h) = 1, 2, 3, 4, 6, 8, or 12. Default = 10m.

So here is the important point:

By default, the heartbeat refresh interval is set to yantra.statistics.persist.interval / 2, which is 5 minutes when the persist interval is left at its default of 10m.
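
The arithmetic can be sketched in a few lines of Python. This is a hedged illustration: the helper names `interval_to_minutes` and `default_refresh_minutes` are hypothetical, and the parsing rules are inferred only from the valid values listed in the table above, not from Sterling's own code.

```python
# Valid values as listed for yantra.statistics.persist.interval.
VALID_MINUTES = {1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30}
VALID_HOURS = {1, 2, 3, 4, 6, 8, 12}


def interval_to_minutes(value: str) -> int:
    """Convert a value such as '10m' or '2H' to minutes, rejecting unlisted values."""
    value = value.strip()
    number, unit = int(value[:-1]), value[-1].lower()
    if unit == "m" and number in VALID_MINUTES:
        return number
    if unit == "h" and number in VALID_HOURS:
        return number * 60
    raise ValueError(f"unsupported interval: {value!r}")


def default_refresh_minutes(persist_interval: str = "10m") -> float:
    """Default heartbeat refresh interval: half of the persist interval."""
    return interval_to_minutes(persist_interval) / 2


print(default_refresh_minutes())      # persist interval 10m
print(default_refresh_minutes("1h"))  # persist interval 1 hour
```

With the default persist interval of 10m this gives a 5-minute refresh; a 1-hour persist interval would give 30 minutes.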

YFS_HEARTBEAT table record entry and update

  • As soon as a server starts, a record is inserted into the YFS_HEARTBEAT table with status code 00 (running).
  • The LAST_HEARTBEAT column is then updated at every refresh interval (every 5 minutes by default).
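
The two steps above can be simulated with an in-memory SQLite database. This is a sketch under stated assumptions: SQLite stands in for the real OMS database, the table is cut down to a handful of columns, and `register_server`, `refresh_heartbeat`, the key `HB-1` and the server name are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# Reduced version of YFS_HEARTBEAT; the real table has more columns.
conn.execute(
    "CREATE TABLE YFS_HEARTBEAT ("
    "HEARTBEAT_KEY TEXT PRIMARY KEY, SERVER_NAME TEXT, SERVER_TYPE TEXT, "
    "STATUS TEXT, SERVER_START_TIME TEXT, LAST_HEARTBEAT TEXT)"
)


def register_server(key, name, server_type, now):
    """On server start: insert a record with status 00 (running)."""
    ts = now.isoformat(sep=" ")
    conn.execute(
        "INSERT INTO YFS_HEARTBEAT VALUES (?, ?, ?, '00', ?, ?)",
        (key, name, server_type, ts, ts),
    )


def refresh_heartbeat(key, now):
    """On every refresh interval: move LAST_HEARTBEAT forward."""
    conn.execute(
        "UPDATE YFS_HEARTBEAT SET LAST_HEARTBEAT = ? WHERE HEARTBEAT_KEY = ?",
        (now.isoformat(sep=" "), key),
    )


started = datetime(2017, 9, 30, 1, 0, 0)
register_server("HB-1", "ScheduleAgent", "AGENT", started)
refresh_heartbeat("HB-1", started + timedelta(minutes=5))  # first refresh
row = conn.execute(
    "SELECT STATUS, SERVER_START_TIME, LAST_HEARTBEAT FROM YFS_HEARTBEAT"
).fetchone()
print(row)
```

After the first refresh, SERVER_START_TIME is unchanged while LAST_HEARTBEAT has moved forward by five minutes.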

Clear Cache Process

How does startHealthMonitor.cmd work?

Step 1
Operation: Delete records from the YFS_HEARTBEAT table whose status is not running (00) and whose MODIFYTS is more than 30 days old.
Query:
    Delete from YFS_HEARTBEAT
    Where STATUS != '00' AND
    MODIFYTS < {ts '2017-08-31 01:24:51'}
Remarks: The 30 days come from yantra.hm.purge.interval.

Step 2
Operation: Delete records from YFS_SNAPSHOT whose MODIFYTS is more than 30 days old.
Query:
    Delete from YFS_SNAPSHOT
    Where MODIFYTS < {ts '2017-08-31 01:24:52'}
Remarks: The 30 days come from yantra.hm.purge.interval.

Step 3
Operation: Delete records from the PLT_PAGED_DATA table whose LAST_ACCESSED is more than 30 days old.
Query:
    DELETE /*YANTRA*/ FROM PLT_PAGED_DATA
    WHERE LAST_ACCESSED < ?
Remarks: The 30 days come from yantra.hm.purge.interval.

Step 4
Operation: Select records from the heartbeat table that have status 00 but whose last heartbeat has not been updated in the past N minutes (10).
Query:
    SELECT /*YANTRA*/ YFS_HEARTBEAT.*
    FROM YFS_HEARTBEAT YFS_HEARTBEAT
    WHERE STATUS = '00'
    AND LAST_HEARTBEAT < 2017-09-30T01:23:03
Remarks: Uses yantra.statistics.persist.interval (10 min). Current time: 2017-09-30 01:33:03, so the condition is LAST_HEARTBEAT < current time - 10 minutes.

Step 5
Operation: For each heartbeat key returned by the previous query, do a select for update.
Query:
    SELECT /*YANTRA*/ YFS_HEARTBEAT.*
    FROM YFS_HEARTBEAT YFS_HEARTBEAT
    WHERE (YFS_HEARTBEAT.HEARTBEAT_KEY
    = '2017083118085025272')
    FOR UPDATE NOWAIT

Step 6
Operation: Update the status to terminated (02) for the selected heartbeat key.
Query:
    update /*YANTRA*/ YFS_HEARTBEAT
    set STATUS = '02',MODIFYUSERID = 'HM',
    MODIFYPROGID = 'HM',
    MODIFYTS = {ts '2017-09-30 01:33:22'},
    LOCKID=170 WHERE LOCKID = ?
    AND HEARTBEAT_KEY= ?

The above steps help keep only the active records in the YFS_HEARTBEAT table.
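
The heartbeat part of that flow (steps 1 and 4 to 6) can be condensed into a short Python sketch. It is a simplified model, not the product's code: records are plain dictionaries instead of database rows, locking is ignored, and `health_monitor_pass` and the sample keys are hypothetical. The two intervals use the defaults quoted above.

```python
from datetime import datetime, timedelta

PURGE_INTERVAL = timedelta(days=30)  # yantra.hm.purge.interval
STALE_AFTER = timedelta(minutes=10)  # yantra.statistics.persist.interval


def health_monitor_pass(rows, now):
    """Return the rows kept after one pass; stale running rows become '02'."""
    purge_cutoff = now - PURGE_INTERVAL
    stale_cutoff = now - STALE_AFTER
    kept = []
    for row in rows:
        # Step 1: drop non-running rows not modified within the purge interval.
        if row["status"] != "00" and row["modifyts"] < purge_cutoff:
            continue
        # Steps 4 to 6: a running row without a recent heartbeat is set to '02'.
        if row["status"] == "00" and row["last_heartbeat"] < stale_cutoff:
            row = dict(row, status="02", modifyts=now)
        kept.append(row)
    return kept


now = datetime(2017, 9, 30, 1, 33, 3)  # current time used in the step 4 example
rows = [
    {"key": "live", "status": "00",
     "last_heartbeat": now - timedelta(minutes=2), "modifyts": now},
    {"key": "killed", "status": "00",
     "last_heartbeat": now - timedelta(minutes=25),
     "modifyts": now - timedelta(minutes=25)},
    {"key": "old", "status": "01",
     "last_heartbeat": now - timedelta(days=40),
     "modifyts": now - timedelta(days=40)},
]
result = {r["key"]: r["status"] for r in health_monitor_pass(rows, now)}
print(result)
```

Here the record with a recent heartbeat is untouched, the one whose heartbeat is 25 minutes old is moved to 02, and the 40-day-old stopped record is purged.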

Questions

1. What happens if we stop the agent/application server (Control + C in a Windows command prompt)?

Answer: The status of the record is updated to 01 (Stopped). The next time the health monitor picks up this record, it is deleted.

2. What happens if we kill the server (agent/application)?

Answer: The record stays in status 00 (Running). The health monitor finds that its last heartbeat has not been updated for some time, so it changes the status to 02, as in the update query above. The record is removed later.

3. Can we trigger an email when a server terminates unexpectedly?

Answer: See the configuration below.

Health Monitor Configuration

4. Can we use other monitoring tools and stop using the OMS Health Monitor?

No. If records are not cleaned from the YFS_HEARTBEAT table, too many stale entries slow down processing. The OMS health monitor should stay enabled so that internal communication remains effective. Other monitoring tools can be used for CPU usage, disk space, and checking that servers are up and running.

5. How do we change the thresholds for the application server, API, and agent server?

  • Application Server: yantra.hm.appserver.threshold (yfs.properties)
  • API: yantra.hm.api.threshold (yfs.properties)
  • Agent/Integration Server: yantra.hm.agent.threshold (yfs.properties)

You can also modify them from the System Management Console.

Please share your feedback on this post. If you have any questions, please comment below or email us directly at support@activekite.com.

Happy Learning !!!!

11 thoughts on “Sterling OMS Health Monitor”

  1. Mohamed Shaikna

    Hello admin,
    Nice and useful Post – Thanks.
    But have a small doubt.

    “How to change thresholds for Application server, api, agent server ?”
    What thresholds do you mean here?? Please help..

    1. admin Post author

      The threshold can be changed from System > Launch System Management: click the server image under the Application Hosts section or the agent/integration server group. A threshold is simply the average response time, or the number of tasks that can be processed in a given time.

      For example admin server can have threshold of 0.20 sec
      Agent Servers can have 10,000 tasks as threshold
      Adjust Inventory API can have threshold of 8.0 sec

      Hope this helps

  2. ravi

    How do we know which cache tables an application is using?

    1. admin Post author

      Not directly. We learn it from experience. The API documentation tells us which tables each API uses, but it does not list all the cached tables. Cached table information can be found in the logs.

      1. Satheeshkumar Thangaraj

        There is an alternative way to this I suppose.
        dbClassCache.properties file has all the tables enabled for caching.
        Any custom tables (whose results need to be cached) can be enabled by including the dbClassCache-related properties for the custom tables in customer_overrides.properties.

        Something like this..

        dbclassCache..enabled=true
        CUSTOM_TABLE_NAME.class=com.yantra.shared.dbclasses.DBCacheHome

  3. Praveen

    Here, we need to create the HMAlert user for the health monitor agent. My question is whether the user should be Active or Inactive, and whether it works with or without LDAP integration.

    Thanks & Regards

