DetailPage-MSS-KB

Microsoft small business knowledge base

Article ID: 2681388 - Last Review: July 9, 2012 - Revision: 2.0

SYMPTOMS

In Operations Manager, one of the performance concerns surrounds Operations Manager Database and Data Warehouse insertion times. The following is a description to help identify and troubleshoot problems concerning Database and Data Warehouse data insertion.  

Examine the Operations Manager Event log for the presence of Event ID 2115 events. These events typically indicate that performance issues exist on the Management Server or the Microsoft SQL Server that is hosting the OperationsManager or OperationsManager Data Warehouse databases. Database and Data Warehouse write action workflows run on the Management Servers and these workflows first retain the data received from the Agents and Gateway Servers in an internal buffer. They then gather this data from the internal buffer and insert it into the Database and Data Warehouse. When the first data insertion has completed, the workflows will then create another batch.  

The size of each batch of data depends on how much data is available in the buffer when the batch is created, however there is a maximum limit on the size of the data batch of up to 5000 data items.  If the data item incoming rate increases, or the data item insertion throughput to the Operation Manager and Data Warehouse databases throughput is reduced, the buffer will then accumulate more data and the batch size will grow larger.  There are several write action workflows that run on a Management Server.  These workflows handle data insertion to the Operations Manager and Data Warehouse databases for different data types.  For example:

  • Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
  • Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
  • Microsoft.SystemCenter.DataWarehouse.CollectEventData
  • Microsoft.SystemCenter.CollectAlerts
  • Microsoft.SystemCenter.CollectEntityState
  • Microsoft.SystemCenter.CollectPublishedEntityState
  • Microsoft.SystemCenter.CollectDiscoveryData
  • Microsoft.SystemCenter.CollectSignatureData
  • Microsoft.SystemCenter.CollectEventData

When a Database or Data Warehouse write action workflow on a Management Server experiences slow data batch insertion, for example times in excess of 60 seconds, it will begin logging Event ID 2115 to the Operations Manager Event log. This event is logged every one minute until the data batch is inserted into the Database or Data Warehouse, or the data is dropped by the write action workflow module. As a result, Event ID 2115 will be logged due to the latency inserting data into the Database or Data Warehouse. Below is an example Event logged due to data dropped by the write action workflow module:  

Event Type: Error
Event Source: HealthService
Event Category: None
Event ID: 4506
Computer: <RMS NAME>
Description:
Data was dropped due to too much outstanding data in rule "Microsoft.SystemCenter.OperationalDataReporting.SubmitOperationalDataFailed.Alert" running for instance <RMS NAME> with id:"{F56EB161-4ABE-5BC7-610F-4365524F294E}" in management group <MANAGEMENT GROUP NAME>.

 
Event ID 2115 contains 2 significant pieces of information.  First, the name of the Workflow that is experiencing the problem and second, the elapsed time since the workflow began inserting the last batch of data. 

For example:

Log Name: Operations Manager
Source:        HealthService
Event ID:      2115
Level:         Warning
Computer:      <RMS NAME>
Description:
A Bind Data Source in Management Group <MANGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds.  This indicates a performance or functional problem with the workflow.
 Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
 Instance    : <RMS NAME>
Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}


This particular Event ID 2115 message indicates that the workflow Microsoft.SystemCenter.CollectPublishedEntityState, which writes Entity State data to the Operations Manager database, is trying to insert a batch of Entity State data and it started 300 seconds ago.  In this example the insertion of the Entity State data has not yet finished.  Normally inserting a batch of data should complete within 60 seconds.  If the Workflow Id contains Data Warehouse then the problem concerns the Operations Manager Data Warehouse.  Otherwise, the problem would concern inserting data into the Operations Manager database.

 

CAUSE

As the description of Event ID 2115 states, this may indicate a database performance problem or too much data incoming from the agents. Event ID 2115 simply indicates there is a backlog inserting data into the Database; Operations Manager or Operations Manager Data Warehouse. These Events can originate from a number of possible causes. For example, a large amount of Discovery data, a Database connectivity issue or full database condition, potential disk or network constraints.

In Operations Manager, Discovery data insertion is a relatively expensive process. We define a burst of data as a short period of time where a significant amount of data is received by the Management Server. These bursts of data can cause Event ID 2115 since the data insertion should occur infrequently. If Event ID 2115 consistently appears for Discovery data collection, this can indicate either a Database or Data Warehouse insertion problem or Discovery rules in a Management Pack collecting too much discovery data.

Operations Manager configuration updates caused by Instance Space changes or Management Pack imports have a direct effect on CPU utilization on the Database Server and this can impact Database insertion times. Following a Management Pack import or a large instance space change, it is expected to see Event ID 2115 messages. For more information on this topic please see the following:

2603913 - How to detect and troubleshoot frequent configuration changes in Operations Manager (http://support.microsoft.com/kb/2603913) (http://support.microsoft.com/kb/2603913)

If the Operations Manager or Operations Manager Data Warehouse databases are out of space or offline, it is expected that the Management Server will continue to log Event ID 2115 messages to the Operations Manager Event log and the pending time will grow higher.

If the write action workflows cannot connect to the Operations Manager or Operations Manager Data Warehouse databases, or they are using invalid credentials to establish their connection, the data insertion will be blocked and Event ID 2115 messages will be logged accordingly until this situation is resolved.

In Operations Manager, expensive User Interface queries can impact resource utilization on the Database which can lead to latency in Database insertion times. When a user is performing an expensive User Interface operation it is possible to see Event ID 2115 messages logged.


Event ID 2115 messages can also indicate a performance problem if the Operations Manager Database and Data Warehouse databases are not properly configured. Performance problems on the database servers can lead to Event ID 2115 messages. Some possible causes include the following:

  • The SQL Log or TempDB database is too small or out of space.
  • The Network link from the Operations Manager and Data Warehouse database to the Management Server is bandwidth constrained or the latency is large. In this scenario we recommend to Management Server to be on the same LAN as the Operations Manager and Data Warehouse server.
  • The data disk hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a function problem. In this scenario we recommend leveraging RAID 10 and we also recommend enabling battery backed Write Cache on the Array Controller.
  • The Operations Manager Database or Data Warehouse server does not have sufficient memory or CPU resources.
  • The SQL Server instance hosting the Operations Manager Database or Data Warehouse is offline.

It is recommend that the Management Server reside on the same LAN as the Operations Manager and Data Warehouse database server.

Event ID 2115 messages can also occur if the disk subsystem hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a function problem. In this scenario we recommend leveraging RAID 10 and we also recommend enabling battery backed Write Cache on the Array Controller. 

RESOLUTION

The first step in troubleshooting Event ID 2115 is to identify what Data Items are being returned within the Event. For example, the Workflow ID indicates what type of Data Items (Discovery, Alerts, Event, Perf) as well as what Database in question. From the Workflow ID we can then determine the corresponding database. If the Workflow ID contains the term DataWarehouse, then the focus should be towards the Operations Manager Data Warehouse. In other instances the focus would be towards the Operations Manager Database.  

The seconds indicator in the Event ID 2115 message returns how long the workflow in question has been waiting to insert the data items into the Database.   

Scenario 1

In the example Event ID 2115 below, the problem concerns the workflow Microsoft.SystemCenter.CollectSignatureData.

Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Computer: <RMS NAME>
Description:
A Bind Data Source in Management Group <MANGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Instance : <RMS NAME>
Instance Id : {F56EB161-4ABE-5BC7-610F-4365524F294E}


Resolution 1

We can identify the Performance Signature Data Collection Rules in this example by executing the following SQL Query. This query should be executed in SQL Management Studio against the Operations Manager database.

-- Return all Performance Signature Collection Rules
Use OperationsManager
select
managementpack.mpname,
rules.rulename
from performancesignature with (nolock)
inner join rules with (nolock)
on rules.ruleid = performancesignature.learningruleid
inner join managementpack with(nolock)
on rules.managementpackid = managementpack.managementpackid
group by managementpack.mpname, rules.rulename
order by managementpack.mpname, rules.rulename

This query will return all Performance Signature Collection Rules and their respective Management Pack name. A column is returned for Management Pack name and Rule name.

The following Performance Monitor Counters on a Management Server will provide information concerning Database and Data Warehouse write action insertion batch size and time. If the batch size is growing larger, for example the default batch size is 5000 items, this indicates either the Management Server is slow inserting the data to the Database or Data Warehouse, or is receiving a burst of Data Items from the Agents or Gateway Servers.

· OpsMgr DB Write Action Modules(*)\Avg. Batch Size
· OpsMgr DB Write Action Modules(*)\Avg. Processing Time
· OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms
· OpsMgr DW Writer Module(*)\Avg. Batch Size

From the Database and Data Warehouse write action account Average Processing Time counter, we can understand how long it takes on average to write a batch of data to the Database and Data Warehouse. Depending upon the amount of time it takes to write a batch of data to the Database, this may present an opportunity for tuning.


Scenario 2

The SQL Server instance hosting the Operations Manager Database or Data Warehouse is offline. Event ID 2115 as well as Event ID 29200 appear within the Operations Manager Event log. For example:

Log Name: Operations Manager
Source:        HealthService
Date:          
Event ID:      2115
Level:         Warning
Description:

A Bind Data Source in Management Group MSFT has posted items to the workflow, but has not received a response in 60 seconds.  This indicates a performance or functional problem with the workflow.

Workflow Id : Microsoft.SystemCenter.CollectEventData
Instance    : name.contoso.local
Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}

Log Name:      Operations Manager
Source:        OpsMgr Config Service
Event ID:      29200
Level:         Error
Description:

OpsMgr Config Service has lost connectivity to the OpsMgr database, therefore it can not get any updates from the database. This may be a temporary issue that may be recovered from automatically. If the problem persists, it usually indicates a problem with the database. Reason:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)


Resolution 2

To resolve the issue in this scenario, follow these steps:
  1. Connect to the Server hosting the Operations Manager database.
  2. Open the Services applet.
  3. Verify that the SQL Server (MSSQLSERVER) service is started and running.
  4. If the SQL Server (MSSQLSERVER) service is not started and running, start the service.

Once Database connectivity is restored workflows should resume successfully storing data within the respective database.  Event ID 31554 validates that the information has been written successfully. 

For example:

Log Name: Operations Manager
Source:        Health Service Modules
Event ID:      31554
Task Category: Data Warehouse
Level:         Information

Description:
Workflow succeeded storing data in the Data Warehouse

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance name: name.contoso.local 
Instance ID: {88676CDF-E284-7838-AC70-E898DA1720CB}


Scenario 3

Event ID 2115 caused by invalid RunAs Credentials. 

Resolution 3

Examine the Operations Manager Event log for the following events.  These events typically indicate that the Data Warehouse SQL Server Authentication Account may have incorrect credentials. 

Log Name: Operations Manager
Source:        HealthService
Event ID:      7000
Task Category: Health Service
Level:         Error
Description:

The Health Service could not log on the RunAs account <ACCOUNT NAME> for management group <MANGEMENT GROUP NAME>.  The error is Logon failure: unknown user name or bad password.(1326L).  This will prevent the health service from monitoring or performing actions using this RunAs account

Log Name: Operations Manager
Source: HealthService
Event ID: 7015
Task Category: Health Service
Level: Error
Description:

The Health Service cannot verify the future validity of the RunAs account <ACCOUNT NAME> for management group <MANGEMENT GROUP NAME>. The error is Logon failure: unknown user name or bad password.(1326L).

To resolve the issue in this scenario, follow these steps:

1.     Open the Operations Manager console.
2.     Select Administration.
3.     Select Run As Configuration\Accounts.
4.     Configure the appropriate credentials for the Data Warehouse SQL Server Authentication Account.


Scenario 4

Event ID 2115 caused by disk performance issues.  The data disk hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem. In this scenario we recommend leveraging RAID 10 and we also recommend enabling battery backed Write Cache on the Array Controller.

Resolution 4

Counters to identify disk pressure

Capture these Physical Disk counters for all drives that contain SQL data or log files:
  • % Idle Time: How much disk idle time is being reported. Anything below 50 percent could indicate a disk bottleneck.
  • Avg. Disk Queue Length: This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable. However, if a LUN has 10 spindles, a value of 25 is too high. You could use the following formulas based on the RAID level and number of disks in the RAID configuration:
    • RAID 0: All of the disks are doing work in a RAID 0 set
    • Average Disk Queue Length <= # (Disks in the array) *2
    • RAID 1: half the disks are “doing work”; therefore, only half of them can be counted toward Disks Queue
    • Average Disk Queue Length <= # (Disks in the array/2) *2
    • RAID 10: half the disks are “doing work”; therefore, only half of them can be counted toward Disks Queue
    • Average Disk Queue Length <= # (Disks in the array/2) *2
    • RAID 5: All of the disks are doing work in a RAID 5 set
    • Average Disk Queue Length <= # (Disks in the array/2) *2
    • Avg. Disk sec/Transfer: The number of seconds it takes to complete one disk I/O
    • Avg. Disk sec/Read: The average time, in seconds, of a read of data from the disk
    • Avg. Disk sec/Write: The average time, in seconds, of a write of data to the disk
    • Disk Bytes/sec: The number of bytes being transferred to or from the disk per second
    • Disk Transfers/sec: The number of input and output operations per second (IOPS)

    NOTE The last three counters in this list should consistently have values of approximately .020 (20 ms) or lower and should never exceed.050 (50 ms). The following are the thresholds that are documented in the SQL Server performance troubleshooting guide:
        • Less than 10 ms: very good
        • Between 10 - 20 ms: okay
        • Between 20 - 50 ms: slow, needs attention
        • Greater than 50 ms: serious I/O bottleneck

    • Disk Bytes/sec: The number of bytes being transferred to or from the disk per second
    • Disk Transfers/sec: The number of input and output operations per second (IOPS)


When % Idle Time is low (10 percent or less), this means that the disk is fully utilized. In this case, the last two counters in this list (“Disk Bytes/sec” and “Disk Transfers/sec”) provide a good indication of the maximum throughput of the drive in bytes and in IOPS, respectively. The throughput of a SAN drive is highly variable, depending on the number of spindles, the speed of the drives, and the speed of the channel. The best bet is to check with the SAN vendor to find out how many bytes and IOPS the drive should support. If % Idle Time is low, and the values for these two counters do not meet the expected throughput of the drive, engage the SAN vendor to troubleshoot.

The following links provide deeper insight into troubleshooting SQL Server performance:
Scenario 5

Event ID 2115 is logged, and a management server generates an "unable to write data to the Data Warehouse" alert in System Center Operations Manager.

You experience the following symptoms on a management server computer that is running Microsoft System Center Operations Manager:

  • The management server generates one or more alerts that resemble the following:

    Performance data collection process unable to write data to the Data Warehouse

    Performance data collection process unable to write data to the Data Warehouse. Failed to store data in the Data Warehouse. Exception 'SqlException': Management Group with id '9069F7BD-55B8-C8E8-1CF9-4395F45527E2' is not allowed to access Data Warehouse under login 'DOMAIN\Action_Account' One or more workflows were affected by this. Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData Instance name: dataWarehouseServer.domain.com Instance ID: {48936EE3-4E3E-BEE1-8C09-AFDAB8ECF236} Management group: Management Group Name.
  • The following event is logged in the Operations Manager event log on the management server:

    Event Type: Warning
    Event Source: HealthService
    Event Category: None
    Event ID: 2115
    Date: date
    Time: time
    User: N/A
    Computer: ManagementServerName
    Description: A Bind Data Source in Management Group Management Group Name has posted items to the workflow, but has not received a response in 1712 seconds. This indicates a performance or functional problem with the workflow. Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData Instance : ManagementServerName.domain.com Instance Id : {C7FDDE2A-E0AA-4B80-70DE-1D50D9965221}


Resolution 5

This issue may occur if the management server does not have accounts that are specified for its data warehouse "Run As" profiles. This issue is more likely to affect a secondary management server.

To resolve this problem, follow these steps:
  1. On the computer that is running System Center Operations Manager, open the Operations Console.
  2. In the navigation pane, click Administration.
  3. Expand Security, and then click Run As Profiles.
  4. In the Run As Profiles view, double-click Data Warehouse Account.
  5. In the Run As Profile Properties - Data Warehouse Account properties dialog box, click the Run As Accounts tab, and then click New.
  6. In the Run As Account list, click Data Warehouse Action Account.
  7. In the Name list, click the management server that generated the alert.
  8. Click OK two times.
  9. Follow steps 4 through 8 to assign the appropriate Run As account to the following profiles:
    Data Warehouse Configuration Synchronization Reader Account
    Data Warehouse Report Deployment Account
    Data Warehouse SQL Server Authentication Account
  10. For each profile, select the Run As account that matches the name of the Run As profile. For example, make the following assignments:
    Assign the Data Warehouse Configuration Synchronization Reader Account to the Data Warehouse Configuration Synchronization Reader Account profile.
    Assign the Data Warehouse Report Deployment Account to the Data Warehouse Report Deployment Account profile.
    Assign the Data Warehouse SQL Server Authentication Account to the Data Warehouse SQL Server Authentication Account profile.
  11. On the management server that generated the alert, restart the OpsMgr Health Service.
  12. In the Operations Manager event log on the management server, verify that event ID 31554 events are logged.

    NOTE Event ID 31554 indicates that the monitor state has changed to Healthy. This change resolves the alert.


Scenario 6

Event ID 2115 occurs on a server running HP MPIO FF DSM XP v3.01 to which there are no LUNs presented. When the user opens Performance Monitor and attempts to add a counter, Performance Monitor will hang and the handle count for this application increases rapidly.

Resolution 6

There are two workaround regarding this issue.

  1. Rename the HPPerfProv.dll file and reboot Windows. Performance Monitor will work without issue when the file is renamed and not loaded.
  2. Have at least 1 LUN present on the system.
For more information on this issue please see the following:

Event ID 2115 is caused by HPPerfProv.DLL
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01743552&prodTypeId=18964&prodSeriesId=421492 (http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01743552&prodTypeId=18964&prodSeriesId=421492)


Note This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use (http://go.microsoft.com/fwlink/?LinkId=151500) for other considerations.

APPLIES TO
  • Microsoft System Center Operations Manager 2007
  • Microsoft System Center Operations Manager 2007 R2
  • Microsoft System Center Operations Manager 2007 Service Pack 1
  • Microsoft System Center 2012 Operations Manager
Keywords: 
kbtshoot KB2681388
Share
Additional support options
Ask The Microsoft Small Business Support Community
Contact Microsoft Small Business Support
Find Microsoft Small Business Support Certified Partner
Find a Microsoft Store For In-Person Small Business Support