APM and Interop Services Stale simulatneously


Problem description:

All APM and Interop Services go stale at the same time. Stale issue happens when the scheduled actions job is running is specific days. Example: on weekends and month-end scheduled jobs. Scheduled actions were distributed to various week days and tried, but the issue reoccurred repeatedly. Application session blocking was identified as the main reasons for the stale but the logs and memory dumps did not contain any entries of errors, but it was obvious that the issue was happening on the database end. User was encouraged to take this up with Microsoft support and below is the summary of investigation and action taken. 

Summary of issue

APM Application Session Blocking

Analysis/Action plan

2022-06-25 05:11:08.367 Head Blocker - 83.3 {(17050229ms) LCK_M_U} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

2022-06-25 05:11:08.367 |-- 53.0 {(7213316ms) LCK_M_IX -> PAGE: 5:1:14013174} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )                    SELE

2022-06-25 05:11:08.367 |-- 69.0 {(17050019ms) LCK_M_IX -> PAGE: 5:1:14055585} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT @p

2022-06-25 05:11:08.367 |-- 81.0 {(17050203ms) LCK_M_IX -> PAGE: 5:1:14058024} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT @p

2022-06-25 05:11:08.367 |-- 83.5 {(17050217ms) CXPACKET} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

2022-06-25 05:11:08.367 | |-- 53.0 {(7213316ms) LCK_M_IX -> PAGE: 5:1:14013174} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )                    SE

2022-06-25 05:11:08.367 | |-- 69.0 {(17050019ms) LCK_M_IX -> PAGE: 5:1:14055585} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT

2022-06-25 05:11:08.367 | |-- 81.0 {(17050203ms) LCK_M_IX -> PAGE: 5:1:14058024} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT

2022-06-25 05:11:08.367 | |-- 83.2 {(61ms) CXPACKET} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

2022-06-25 05:11:08.367 | | |-- 53.0 {(7213316ms) LCK_M_IX -> PAGE: 5:1:14013174} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )                   

2022-06-25 05:11:08.367 | | |-- 69.0 {(17050019ms) LCK_M_IX -> PAGE: 5:1:14055585} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELEC

 

2022-06-25 05:11:08.367 Head Blocker - 83.3 {(17050229ms) LCK_M_U} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

2022-06-25 05:11:08.367 |-- 53.0 {(7213316ms) LCK_M_IX -> PAGE: 5:1:14013174} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )                    SELE

2022-06-25 05:11:08.367 |-- 69.0 {(17050019ms) LCK_M_IX -> PAGE: 5:1:14055585} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT @p

2022-06-25 05:11:08.367 |-- 81.0 {(17050203ms) LCK_M_IX -> PAGE: 5:1:14058024} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT @p

2022-06-25 05:11:08.367 |-- 83.5 {(17050217ms) CXPACKET} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

2022-06-25 05:11:08.367 | |-- 53.0 {(7213316ms) LCK_M_IX -> PAGE: 5:1:14013174} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )                    SE

2022-06-25 05:11:08.367 | |-- 69.0 {(17050019ms) LCK_M_IX -> PAGE: 5:1:14055585} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT

2022-06-25 05:11:08.367 | |-- 81.0 {(17050203ms) LCK_M_IX -> PAGE: 5:1:14058024} - create procedure oq.GetAppLock     @p_class_id_in - INSERT INTO oq.app_lock( class_id, object_id, session_id, client_id, user_id, lock_type, lock_time )              SELECT

2022-06-25 05:11:08.367 | |-- 83.2 {(61ms) CXPACKET} - -- This stored procedure is called from the read - delete al              from oq.app_lock al              where not exists ( select * from oq.connection_reg c where c.session_id = al.session_id )

 

The wait type for main blocker is CXPACKET, because the value of this Max degree of parallelism is 0.

Checked statistics for these tables “app_lock and connection_reg”, and found the statistics for this table app_lock is sampled update statistics.

 

Action plan

Try below action plan on your server,

Added index on this table "oq.app_lock" for column session_id
 

Monitored the application for two weeks, the issue has not occurred again.