- Community Home
- >
- Servers and Operating Systems
- >
- HPE ProLiant
- >
- ProLiant Servers (ML,DL,SL)
- >
- Re: CPU Drift on DL585 G1
ProLiant Servers (ML,DL,SL)
1748184
Members
3995
Online
108759
Solutions
Forums
Categories
Company
Local Language
юдл
back
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Discussions
Discussions
Forums
Forums
Discussions
юдл
back
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Blogs
Information
Community
Resources
Community Language
Language
Forums
Blogs
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО09-25-2008 06:54 AM
тАО09-25-2008 06:54 AM
CPU Drift on DL585 G1
We're having an issue with CPU drifting on a DL585 G1. We have applied the following already which was described on an HP Customer Advisory but it did not resolve the issue. I've also included the report from Microsoft analysis. Anyone know if this something Microsoft needs to continue to troubleshoot or is there an HP solution to it?
Thanks!
MT
_____________________
For AMD Opteron 200-series, Opteron 800-series, Opteron 1200-series, Opteron 2200-series and Opteron 8200-series processors , to ensure proper operation of tasks sensitive to system time, perform either of the following actions, based on the operating system environment:
Microsoft Windows Server 2003 (any edition)
Edit the BOOT.ini file and add the parameter "/usepmtimer," then reboot the server. Adding the "/usepmtimer" parameter to the BOOT.INI file configures the Windows operating system to use the PM_TIMER, rather than the Time Stamp Counter.
Note 1: When installing the AMD Opteron Processor with AMD PowerNow! Technology driver Version 1.3.2.16 (or later) from AMD, the BOOT.INI file will automatically be updated with the "/usepmtimer" parameter. While the driver itself does not resolve this issue, the installation process will make the necessary changes to the BOOT.INI file to prevent the issue from occurring.
___________________________
Below is the analysis report from Microsoft
Hi Everyone,
Pertaining to the PSSDIAG research that I have done, please find the analysis that I have done and the issues that we are facing related to the CPU drift.
ANALYSIS
============
PSSDIAG
===========
PSSDIAG.LOG
2008/09/23 18:25:58.56 PSSDIAG Collection started. Press Ctrl+C to stop.
2008/09/23 18:39:21.27 PSSDIAG Ctrl+C pressed. Shutting down the collector
Pssdiag ran for 14 mins
SQLDIAG
============
2008-09-23 14:24:57.400 Server Microsoft SQL Server 2005 - 9.00.3239.00 (X64)
Apr 9 2008 22:41:28
Copyright (c) 1988-2005 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
2008-09-23 14:24:57.480 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2008-09-23 14:24:57.480 Server Detected 4 CPUs. This is an informational message; no user action is required.
2008-09-23 14:24:58.620 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 14:24:58.860 spid1s A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 37560, committed (KB): 88504, memory utilization: 42%.
2008-09-23 14:25:01.790 spid5s Server name is 'CRPSCSMSQ69V1\PUB'
2008-09-23 14:25:01.790 spid5s The NETBIOS name of the local node that is running the server is 'CRPSCSMCS69N1'.
2008-09-23 14:54:02.410 spid61 Using 'xpSLS.dll' version '0005.00.00' to execute extended stored procedure 'xp_backup_database'. This is an informational message only; no user action is required.
2008-09-23 17:25:05.960 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 17:25:05.960 Server CPU time stamp frequency has changed from 2399922 to 1816584 ticks per millisecond. The new frequency will be used.
2008-09-23 17:29:05.960 Server The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
2008-09-23 17:52:56.150 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000005dbfa400
2008-09-23 17:53:07.620 Server The time stamp counter of CPU on scheduler id 3 is not synchronized with other CPUs
2008-09-23 18:13:13.040 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000005fdbe000
2008-09-23 18:15:04.570 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x000002bbbb0000
2008-09-23 18:18:54.760 spid4s SQL Server has encountered 4 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x00000003661e00
2008-09-23 18:19:04.770 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_01.mdf] in database [Results] (5). The OS file handle is 0x0000000000000870. The offset of the latest long I/O is: 0x000000b070e000
2008-09-23 18:22:55.040 spid4s SQL Server has encountered 3 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x0000020cf78000
2008-09-23 18:24:15.060 spid4s SQL Server has encountered 46 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000002849e800
2008-09-23 18:25:08.210 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 18:25:39.430 spid188 Configuration option 'blocked process threshold' changed from 5 to 5. Run the RECONFIGURE statement to install.
2008-09-23 18:25:45.030 spid188 SQL Trace ID 2 was started by login "CORP\NBEQ5YK".
2008-09-23 18:25:50.090 spid191 DBCC TRACEON 3004, server process ID (SPID) 191. This is an informational message only; no user action is required.
2008-09-23 18:25:50.100 spid191 DBCC TRACEON 3605, server process ID (SPID) 191. This is an informational message only; no user action is required.
2008-09-23 18:29:08.870 Server The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
2008-09-23 18:29:46.110 spid4s SQL Server has encountered 4 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x00000076a50000
2008-09-23 18:29:46.150 spid4s SQL Server has encountered 6 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x00000044d4ca00
2008-09-23 18:35:56.260 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x000002ce534000
Inference from logs:
1. The server has got working set size issues, but got the error message only once.
2. CPU drift issues still persists. This may impact the performance tuning on the box. As specified by Sharmila, we have to first ensure that the CPU drift is resolved.
3. There are tons of IO messages in the SQL Server error logs. The disk\IO\Hw\drivers looks to be issue for contention.
Not sure if these messages are bogus due to CPU drift.
sp_configure
==================
name minimum maximum config_value run_value
----------------------------------- ----------- ----------- ------------ -----------
max degree of parallelism 0 64 0 0
max server memory (MB) 16 2147483647 12288 12288
max worker threads 128 32767 0 0
min memory per query (KB) 512 2147483647 1024 1024
min server memory (MB) 0 2147483647 0 0
network packet size (B) 512 32767 32767 32767
Inference:
1. Max server memery-> 12GB
2. Netwrok packet sixe-> 32767. And this is very large packet size.
By default, the Microsoft .NET Framework Data Provider for SQL Server sets the network packet size to 8,192 bytes. Microsoft recommends to use not more that 8192 as network packet sixe, otherwise Cx might see performance issues.
http://support.microsoft.com/kb/903002/en-us
-> xp_msver
===============
8 ProcessorType 8664 NULL
19 PhysicalMemory 32599 32599
<\SQL-DIAG>
SQLNEXUS REPORT
===================
In the SQLNexus, I see lot of blocking and the top wait type as PageLatch_EX (Page Latch (non IO)). The max waititme is 30000secs.
But the all these blocking does not last long. The max duration of blocking remain is 24 secs Another thing is that we are not sure if the amount of time is correct since we are seeing the CPU drift issues.
The top waittypes are
PageLatch_Ex, the average waittime is 15123s LazyWriter_Sleep, the average waittime is 14929s SQLTrace_Buffer_Flush, the avg waittime is 14919s
In the SQL Nexus report the head blocker is waiting on Page latch in all blocking chains. I am seeing the blocking chain having Duration as 24 sec and Blocking 38 spids..
Head of Blocker: 79
Blocking start: 9/23/2008 6:36:58PM
Blocking Stop: 9/23/2008 6:37:22PM
Blocking duration: 24secs
Max Blocking Chain Size: 38
SPID 79 was running following command:
INSERT INTO CDSens_20080922 (RevalDate, JobRunUID, TradeLegUID, BucketUID, CreditCurveUID, ValuationAttribUID, ValueType...
and all other SPIDs are majorly waiting for the PageLatch_EX
This means, the SQL Server is doing lot of inserts\deletes\updates on the data pages and hence taking the latch on it. This may be due to the bad disk IO and bad query plans. Need to investigate it further.
Checked the perfmon counters and it looked good to me..
Physical Disk Minimum Maximum Average
============================================================================================================
Avg. Disk sec/Transfer : 0.001 | 0.002 | 0.001
Disk: 0 C: : 0 | 0.005 | 0
Disk: 1 D: : 0 | 0.006 | 0
Disk: 10 V: : 0 | 0 | 0
Disk: 11 U: : 0 | 0 | 0
Disk: 12 K: : 0 | 0 | 0
Disk: 13 F: : 0.001 | 0.067 | 0.008
Disk: 14 Y: : 0 | 0.01 | 0.001
Disk: 15 J: : 0 | 0 | 0
Disk: 16 M: : 0 | 0 | 0
Disk: 2 G: : 0.001 | 0.002 | 0.001
Disk: 3 : 0 | 0 | 0
Disk: 4 H: : 0 | 0.009 | 0.001
Disk: 5 E: : 0 | 0.003 | 0.001
Disk: 6 : 0 | 0.004 | 0
Disk: 7 : 0 | 0.008 | 0
Disk: 8 S: : 0 | 0 | 0
Disk: 9 T: : 0 | 0 | 0
Avg. Disk Queue Length : 0.446 | 1.649 | 0.868
Disk: 0 C: : 0 | 0.296 | 0.01
Disk: 1 D: : 0 | 0.275 | 0.003
Disk: 10 V: : 0 | 0 | 0
Disk: 11 U: : 0 | 0 | 0
Disk: 12 K: : 0 | 0 | 0
Disk: 13 F: : 0.011 | 0.826 | 0.167
Disk: 14 Y: : 0 | 0.017 | 0.003
Disk: 15 J: : 0 | 0 | 0
Disk: 16 M: : 0 | 0 | 0
Disk: 2 G: : 0.231 | 0.775 | 0.477
Disk: 3 : 0 | 0 | 0
Disk: 4 H: : 0 | 0.031 | 0.003
Disk: 5 E: : 0.09 | 0.756 | 0.203
Disk: 6 : 0 | 0.006 | 0.001
Disk: 7 : 0 | 0.012 | 0.001
Disk: 8 S: : 0 | 0 | 0
Disk: 9 T: : 0 | 0 | 0
So, the disk counters are perfect, we need to look for the plan of the query..
So, have to look in the query plan..
Bur before doing this, I would request you to resolve the CPU drift issue which can cause all sorts of problems for SQL, including incorrect query durations reported in Profiler or SET STATISTICS TIME ON, incorrect wait times in sysprocesses or SQLPERF(WAITSTATS)/sys.dm_os_wait_stats, bogus "I/O requests taking longer than 15 seconds to complete" warnings in the SQL errorlog, and even scheduler hangs.
**So, not looking at the plan at this point in time.
==To resolve the CPU drft issue we have already followed:
1. Set the server's power scheme to "Always 2. Install AMD's "Dual-Core Optimizer" patch.
I wanted you to set the processor affinity as well, but you cannot use procesor affinity since this is a cluster and on node 1 you have 4 dual core procs and on node 2 you have 4 single core procs.
==I requested you to run the RDTSCTEST.exe which would tell us the CPU drift on the Server, and here are the results:
G:\PSSDIAG>rdtsctest.exe
======================= Processor =======================
Processor Architecture: x64
Page Size: 4096
Processor Type: AMD64
Processor Type: Family 15, Model 5, Stepping 10
CPU Speed: 2400Mhz (2400000 ticks per ms)
Cache Line Size: 64
Cache Size: 1024KB
Active Processor Mask: 15 (0x000000000000000F)
Warning: CPUID Leaf 4 is not supported or disabled in BIOS. May be unable to det ermine core/processor mapping.
Hyper-Threading: Not capable
Multi-core: No
Physical Package/Socket Count: 4
Processor Core Count: 4 (1 per package)
Logical Processor Count: 4 (1 per core)
Logical Processor/Core/Socket Mapping
CPU APIC ID Physical ID Core ID HT/SMT ID
------ ----------- ----------- ----------- -----------
0 0 0 0 0
1 1 1 0 0
2 2 2 0 0
3 3 3 0 0
======================= NUMA ============================
Max NUMA Node: 3
Node 0 Processor Mask: 1 (0x0000000000000001)
Node 1 Processor Mask: 2 (0x0000000000000002)
Node 2 Processor Mask: 4 (0x0000000000000004)
Node 3 Processor Mask: 8 (0x0000000000000008)
======================= Memory ==========================
Total Physical Memory: 32599 MB
Avail Physical Memory: 17747 MB
% Memory In Use: 45
Total Page File: 64669 MB
Avail Page File: 50207 MB
NUMA Node 0 Avail Memory: 0 MB
NUMA Node 1 Avail Memory: 3101 MB
NUMA Node 2 Avail Memory: 0 MB
NUMA Node 3 Avail Memory: 3477 MB
======================= Actual CPU Speed ================
CPU 00 Current Speed: 2397Mhz
CPU 01 Current Speed: 2398Mhz
CPU 02 Current Speed: 2391Mhz
CPU 03 Current Speed: 2377Mhz
======================= RDTSC Deltas ==================== CPU 00 Ticks: 0x0000B00905A7D976, Drift: 2449197.2ms (0x55898781649 ticks) *Excessive Drift =============> IMP to NOTE CPU 01 Ticks: 0x0000AAB06D2FC32D, Drift: 0.0ms (0x00000000 ticks) CPU 02 Ticks: 0x0000ABC7CB7E1F7A, Drift: 499949.2ms (0x1175E4E5C4D ticks) *Excessive Drift ==============>IMP to NOTE
CPU 03 Ticks: 0x0000AB59C6B7137A, Drift: 303063.1ms (0xA95987504D ticks) *Excessive Drift ================IMP to NOTE
So, this clearly explains that the CPU drift is excessive on the server node.
==Recommended you that we need to contact the hardware vendor and windows paltform team (if they can help un reducing the CPU drift).
==At this point we will not do further research on the case (related to query plan etc).
ACTION PLAN
================
1. You will contact Hw vendor to reduce the CPU drift on the server
2. You will engage Platforms Engg (Dedicated to BOA) to check if he can help.
3. Once this is done, we will run PSSDIAG again on the server to get the data and do analysis.
4. Have advised you to reduce the Network packet size to atmost 8192KB (4096 would be the best)
Thanks!
MT
_____________________
For AMD Opteron 200-series, Opteron 800-series, Opteron 1200-series, Opteron 2200-series and Opteron 8200-series processors , to ensure proper operation of tasks sensitive to system time, perform either of the following actions, based on the operating system environment:
Microsoft Windows Server 2003 (any edition)
Edit the BOOT.ini file and add the parameter "/usepmtimer," then reboot the server. Adding the "/usepmtimer" parameter to the BOOT.INI file configures the Windows operating system to use the PM_TIMER, rather than the Time Stamp Counter.
Note 1: When installing the AMD Opteron Processor with AMD PowerNow! Technology driver Version 1.3.2.16 (or later) from AMD, the BOOT.INI file will automatically be updated with the "/usepmtimer" parameter. While the driver itself does not resolve this issue, the installation process will make the necessary changes to the BOOT.INI file to prevent the issue from occurring.
___________________________
Below is the analysis report from Microsoft
Hi Everyone,
Pertaining to the PSSDIAG research that I have done, please find the analysis that I have done and the issues that we are facing related to the CPU drift.
ANALYSIS
============
PSSDIAG
===========
PSSDIAG.LOG
2008/09/23 18:25:58.56 PSSDIAG Collection started. Press Ctrl+C to stop.
2008/09/23 18:39:21.27 PSSDIAG Ctrl+C pressed. Shutting down the collector
Pssdiag ran for 14 mins
SQLDIAG
============
2008-09-23 14:24:57.400 Server Microsoft SQL Server 2005 - 9.00.3239.00 (X64)
Apr 9 2008 22:41:28
Copyright (c) 1988-2005 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
2008-09-23 14:24:57.480 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2008-09-23 14:24:57.480 Server Detected 4 CPUs. This is an informational message; no user action is required.
2008-09-23 14:24:58.620 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 14:24:58.860 spid1s A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 37560, committed (KB): 88504, memory utilization: 42%.
2008-09-23 14:25:01.790 spid5s Server name is 'CRPSCSMSQ69V1\PUB'
2008-09-23 14:25:01.790 spid5s The NETBIOS name of the local node that is running the server is 'CRPSCSMCS69N1'.
2008-09-23 14:54:02.410 spid61 Using 'xpSLS.dll' version '0005.00.00' to execute extended stored procedure 'xp_backup_database'. This is an informational message only; no user action is required.
2008-09-23 17:25:05.960 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 17:25:05.960 Server CPU time stamp frequency has changed from 2399922 to 1816584 ticks per millisecond. The new frequency will be used.
2008-09-23 17:29:05.960 Server The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
2008-09-23 17:52:56.150 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000005dbfa400
2008-09-23 17:53:07.620 Server The time stamp counter of CPU on scheduler id 3 is not synchronized with other CPUs
2008-09-23 18:13:13.040 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000005fdbe000
2008-09-23 18:15:04.570 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x000002bbbb0000
2008-09-23 18:18:54.760 spid4s SQL Server has encountered 4 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x00000003661e00
2008-09-23 18:19:04.770 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_01.mdf] in database [Results] (5). The OS file handle is 0x0000000000000870. The offset of the latest long I/O is: 0x000000b070e000
2008-09-23 18:22:55.040 spid4s SQL Server has encountered 3 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x0000020cf78000
2008-09-23 18:24:15.060 spid4s SQL Server has encountered 46 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x0000002849e800
2008-09-23 18:25:08.210 Server The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.
2008-09-23 18:25:39.430 spid188 Configuration option 'blocked process threshold' changed from 5 to 5. Run the RECONFIGURE statement to install.
2008-09-23 18:25:45.030 spid188 SQL Trace ID 2 was started by login "CORP\NBEQ5YK".
2008-09-23 18:25:50.090 spid191 DBCC TRACEON 3004, server process ID (SPID) 191. This is an informational message only; no user action is required.
2008-09-23 18:25:50.100 spid191 DBCC TRACEON 3605, server process ID (SPID) 191. This is an informational message only; no user action is required.
2008-09-23 18:29:08.870 Server The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
2008-09-23 18:29:46.110 spid4s SQL Server has encountered 4 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x00000076a50000
2008-09-23 18:29:46.150 spid4s SQL Server has encountered 6 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\MSSQL\Log\Results_Log.ldf] in database [Results] (5). The OS file handle is 0x00000000000008A8. The offset of the latest long I/O is: 0x00000044d4ca00
2008-09-23 18:35:56.260 spid4s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\Results_Data_02.ndf] in database [Results] (5). The OS file handle is 0x000000000000088C. The offset of the latest long I/O is: 0x000002ce534000
Inference from logs:
1. The server has got working set size issues, but got the error message only once.
2. CPU drift issues still persists. This may impact the performance tuning on the box. As specified by Sharmila, we have to first ensure that the CPU drift is resolved.
3. There are tons of IO messages in the SQL Server error logs. The disk\IO\Hw\drivers looks to be issue for contention.
Not sure if these messages are bogus due to CPU drift.
sp_configure
==================
name minimum maximum config_value run_value
----------------------------------- ----------- ----------- ------------ -----------
max degree of parallelism 0 64 0 0
max server memory (MB) 16 2147483647 12288 12288
max worker threads 128 32767 0 0
min memory per query (KB) 512 2147483647 1024 1024
min server memory (MB) 0 2147483647 0 0
network packet size (B) 512 32767 32767 32767
Inference:
1. Max server memery-> 12GB
2. Netwrok packet sixe-> 32767. And this is very large packet size.
By default, the Microsoft .NET Framework Data Provider for SQL Server sets the network packet size to 8,192 bytes. Microsoft recommends to use not more that 8192 as network packet sixe, otherwise Cx might see performance issues.
http://support.microsoft.com/kb/903002/en-us
-> xp_msver
===============
8 ProcessorType 8664 NULL
19 PhysicalMemory 32599 32599
<\SQL-DIAG>
SQLNEXUS REPORT
===================
In the SQLNexus, I see lot of blocking and the top wait type as PageLatch_EX (Page Latch (non IO)). The max waititme is 30000secs.
But the all these blocking does not last long. The max duration of blocking remain is 24 secs Another thing is that we are not sure if the amount of time is correct since we are seeing the CPU drift issues.
The top waittypes are
PageLatch_Ex, the average waittime is 15123s LazyWriter_Sleep, the average waittime is 14929s SQLTrace_Buffer_Flush, the avg waittime is 14919s
In the SQL Nexus report the head blocker is waiting on Page latch in all blocking chains. I am seeing the blocking chain having Duration as 24 sec and Blocking 38 spids..
Head of Blocker: 79
Blocking start: 9/23/2008 6:36:58PM
Blocking Stop: 9/23/2008 6:37:22PM
Blocking duration: 24secs
Max Blocking Chain Size: 38
SPID 79 was running following command:
INSERT INTO CDSens_20080922 (RevalDate, JobRunUID, TradeLegUID, BucketUID, CreditCurveUID, ValuationAttribUID, ValueType...
and all other SPIDs are majorly waiting for the PageLatch_EX
This means, the SQL Server is doing lot of inserts\deletes\updates on the data pages and hence taking the latch on it. This may be due to the bad disk IO and bad query plans. Need to investigate it further.
Checked the perfmon counters and it looked good to me..
Physical Disk Minimum Maximum Average
============================================================================================================
Avg. Disk sec/Transfer : 0.001 | 0.002 | 0.001
Disk: 0 C: : 0 | 0.005 | 0
Disk: 1 D: : 0 | 0.006 | 0
Disk: 10 V: : 0 | 0 | 0
Disk: 11 U: : 0 | 0 | 0
Disk: 12 K: : 0 | 0 | 0
Disk: 13 F: : 0.001 | 0.067 | 0.008
Disk: 14 Y: : 0 | 0.01 | 0.001
Disk: 15 J: : 0 | 0 | 0
Disk: 16 M: : 0 | 0 | 0
Disk: 2 G: : 0.001 | 0.002 | 0.001
Disk: 3 : 0 | 0 | 0
Disk: 4 H: : 0 | 0.009 | 0.001
Disk: 5 E: : 0 | 0.003 | 0.001
Disk: 6 : 0 | 0.004 | 0
Disk: 7 : 0 | 0.008 | 0
Disk: 8 S: : 0 | 0 | 0
Disk: 9 T: : 0 | 0 | 0
Avg. Disk Queue Length : 0.446 | 1.649 | 0.868
Disk: 0 C: : 0 | 0.296 | 0.01
Disk: 1 D: : 0 | 0.275 | 0.003
Disk: 10 V: : 0 | 0 | 0
Disk: 11 U: : 0 | 0 | 0
Disk: 12 K: : 0 | 0 | 0
Disk: 13 F: : 0.011 | 0.826 | 0.167
Disk: 14 Y: : 0 | 0.017 | 0.003
Disk: 15 J: : 0 | 0 | 0
Disk: 16 M: : 0 | 0 | 0
Disk: 2 G: : 0.231 | 0.775 | 0.477
Disk: 3 : 0 | 0 | 0
Disk: 4 H: : 0 | 0.031 | 0.003
Disk: 5 E: : 0.09 | 0.756 | 0.203
Disk: 6 : 0 | 0.006 | 0.001
Disk: 7 : 0 | 0.012 | 0.001
Disk: 8 S: : 0 | 0 | 0
Disk: 9 T: : 0 | 0 | 0
So, the disk counters are perfect, we need to look for the plan of the query..
So, have to look in the query plan..
Bur before doing this, I would request you to resolve the CPU drift issue which can cause all sorts of problems for SQL, including incorrect query durations reported in Profiler or SET STATISTICS TIME ON, incorrect wait times in sysprocesses or SQLPERF(WAITSTATS)/sys.dm_os_wait_stats, bogus "I/O requests taking longer than 15 seconds to complete" warnings in the SQL errorlog, and even scheduler hangs.
**So, not looking at the plan at this point in time.
==To resolve the CPU drft issue we have already followed:
1. Set the server's power scheme to "Always 2. Install AMD's "Dual-Core Optimizer" patch.
I wanted you to set the processor affinity as well, but you cannot use procesor affinity since this is a cluster and on node 1 you have 4 dual core procs and on node 2 you have 4 single core procs.
==I requested you to run the RDTSCTEST.exe which would tell us the CPU drift on the Server, and here are the results:
G:\PSSDIAG>rdtsctest.exe
======================= Processor =======================
Processor Architecture: x64
Page Size: 4096
Processor Type: AMD64
Processor Type: Family 15, Model 5, Stepping 10
CPU Speed: 2400Mhz (2400000 ticks per ms)
Cache Line Size: 64
Cache Size: 1024KB
Active Processor Mask: 15 (0x000000000000000F)
Warning: CPUID Leaf 4 is not supported or disabled in BIOS. May be unable to det ermine core/processor mapping.
Hyper-Threading: Not capable
Multi-core: No
Physical Package/Socket Count: 4
Processor Core Count: 4 (1 per package)
Logical Processor Count: 4 (1 per core)
Logical Processor/Core/Socket Mapping
CPU APIC ID Physical ID Core ID HT/SMT ID
------ ----------- ----------- ----------- -----------
0 0 0 0 0
1 1 1 0 0
2 2 2 0 0
3 3 3 0 0
======================= NUMA ============================
Max NUMA Node: 3
Node 0 Processor Mask: 1 (0x0000000000000001)
Node 1 Processor Mask: 2 (0x0000000000000002)
Node 2 Processor Mask: 4 (0x0000000000000004)
Node 3 Processor Mask: 8 (0x0000000000000008)
======================= Memory ==========================
Total Physical Memory: 32599 MB
Avail Physical Memory: 17747 MB
% Memory In Use: 45
Total Page File: 64669 MB
Avail Page File: 50207 MB
NUMA Node 0 Avail Memory: 0 MB
NUMA Node 1 Avail Memory: 3101 MB
NUMA Node 2 Avail Memory: 0 MB
NUMA Node 3 Avail Memory: 3477 MB
======================= Actual CPU Speed ================
CPU 00 Current Speed: 2397Mhz
CPU 01 Current Speed: 2398Mhz
CPU 02 Current Speed: 2391Mhz
CPU 03 Current Speed: 2377Mhz
======================= RDTSC Deltas ==================== CPU 00 Ticks: 0x0000B00905A7D976, Drift: 2449197.2ms (0x55898781649 ticks) *Excessive Drift =============> IMP to NOTE CPU 01 Ticks: 0x0000AAB06D2FC32D, Drift: 0.0ms (0x00000000 ticks) CPU 02 Ticks: 0x0000ABC7CB7E1F7A, Drift: 499949.2ms (0x1175E4E5C4D ticks) *Excessive Drift ==============>IMP to NOTE
CPU 03 Ticks: 0x0000AB59C6B7137A, Drift: 303063.1ms (0xA95987504D ticks) *Excessive Drift ================IMP to NOTE
So, this clearly explains that the CPU drift is excessive on the server node.
==Recommended you that we need to contact the hardware vendor and windows paltform team (if they can help un reducing the CPU drift).
==At this point we will not do further research on the case (related to query plan etc).
ACTION PLAN
================
1. You will contact Hw vendor to reduce the CPU drift on the server
2. You will engage Platforms Engg (Dedicated to BOA) to check if he can help.
3. Once this is done, we will run PSSDIAG again on the server to get the data and do analysis.
4. Have advised you to reduce the Network packet size to atmost 8192KB (4096 would be the best)
3 REPLIES 3
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО09-25-2008 10:06 PM
тАО09-25-2008 10:06 PM
Re: CPU Drift on DL585 G1
hi
have you read this website?
http://digital.ni.com/public.nsf/allkb/D0FC3BE7D9F331398625731A005D76B8
and this
http://developer.amd.com/assets/TSC_Dual-Core_Utility.pdf
regards
have you read this website?
http://digital.ni.com/public.nsf/allkb/D0FC3BE7D9F331398625731A005D76B8
and this
http://developer.amd.com/assets/TSC_Dual-Core_Utility.pdf
regards
Didn't your momma teach you to say thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО09-30-2008 01:04 PM
тАО09-30-2008 01:04 PM
Re: CPU Drift on DL585 G1
Hi Karlo -
I'm currently presenting this to my team. Will let you know how it goes after implementing it.
I'm currently presenting this to my team. Will let you know how it goes after implementing it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО09-30-2008 01:07 PM
тАО09-30-2008 01:07 PM
Re: CPU Drift on DL585 G1
oh thanks for the update
OK let us know the results
OK let us know the results
Didn't your momma teach you to say thanks!
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
News and Events
Support
© Copyright 2024 Hewlett Packard Enterprise Development LP