ProLiant Servers (ML,DL,SL)

Isochron
Occasional Contributor

Possible Memory Errors With Dual CPU/Linux System?

I am seeing behavior in some Java apps that suggests RAM may be unreliable. I can query a string and be told that it is non-null, then pass that same string into a process for substring extraction, where I am told that the full string is null. This does not happen consistently.

Here's my hardware setup:

ProLiant DL145 G2
2 Opteron CPUs
16GB RAM (8 x 2GB)
RHEL4 build that only recognizes a single CPU

I added the 2nd CPU so that I would be able to access all 16GB of RAM. Since I am using RAM from the bank associated with CPU 2, but RHEL doesn't know about that CPU, could there be any problems with that RAM? My server is using about 14GB of RAM, so it is definitely into bank 2. Swap usage is minimal, at about 160KB.
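
One way to confirm how many CPUs the kernel has actually brought up is to count the `processor` entries in `/proc/cpuinfo`. The following is a minimal Python sketch, assuming the usual `processor : N` line format; the function name `count_cpus` is made up for illustration. Note that a dual-core Opteron would show up as two entries per socket.

```python
import re

def count_cpus(cpuinfo_text):
    """Count the logical CPUs listed in /proc/cpuinfo-style text."""
    # Each logical CPU the kernel knows about starts with a line
    # such as "processor : 0" (assumed format).
    return len(re.findall(r"^processor\s*:\s*\d+", cpuinfo_text, flags=re.MULTILINE))
```

To try it on a live system, pass it the contents of the file, for example `count_cpus(open("/proc/cpuinfo").read())`.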

My main question is whether the hardware/OS interaction with the RAM and CPUs could be problematic.

Thanks,

Andy Ford
Systems Administrator
Isochron, Inc.
7 REPLIES
James ~ Happy Dude
Honored Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

Andy,
I would be interested to know how much memory the server sees at POST!

NB: I am not an expert with RHEL, but I know these servers HAVE TO HAVE the 2nd CPU installed to even see the rest of the memory.

And, though this is a MUCH different issue:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00606132&jumpid=reg_R1002_USEN
I would recommend upgrading the firmware to the LATEST version:
http://h18023.www1.hp.com/support/files/server/us/download/25501.html

James.
Isochron
Occasional Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

James,

Thanks for the firmware tip.

As for the ProLiant, POST shows all 16GB, as does RHEL via "top".
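
The figure that "top" shows can also be read directly from `/proc/meminfo`. Below is a small Python sketch, assuming the usual `MemTotal: <n> kB` line; the function name `mem_total_gib` is hypothetical. The kernel normally reports somewhat less than the installed amount, because it reserves some memory for itself.

```python
import re

def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted to GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    # The value is given in kB (units of 1024 bytes); 1024 * 1024 of them make one GiB.
    return int(match.group(1)) / (1024.0 * 1024.0)
```

On a live system this could be called as `mem_total_gib(open("/proc/meminfo").read())`.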
James ~ Happy Dude
Honored Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

I see this:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01124841

Is there something similar for RHEL?
As I said, I am not into Linux/Unix; no way I can find it, even if the answers are in front of me! LOL!

James.
James ~ Happy Dude
Honored Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

And this one:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00883105

I see RHEL is also affected... This might be it!

James.
Isochron
Occasional Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

James,

Thanks for the PAE info. However, I am running the 64-bit version of RHEL4 (X64) and, according to the article, "64-bit operating systems, such as Windows Server 2003 x64 Edition, will recognize all of the installed memory without enabling PAE mode."
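
The arithmetic behind the PAE point can be sketched in a few lines of Python. This assumes the standard figures of 32-bit physical addresses without PAE and 36-bit physical addresses with it; `addressable_gib` is a name made up for this example.

```python
def addressable_gib(address_bits):
    """Memory addressable with the given number of address bits, in GiB."""
    return 2 ** address_bits / float(2 ** 30)

print(addressable_gib(32))  # 4.0  -> plain 32-bit addressing tops out at 4GB
print(addressable_gib(36))  # 64.0 -> PAE raises the ceiling to 64GB
```

Either way, 16GB sits above the 4GB limit, which is why PAE only matters for a 32-bit OS.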

Also, since both POST and the OS are reporting 16GB, which is installed, it doesn't *seem* to be a memory reporting error.

I just wonder if using two CPUs merely to access the additional RAM, while keeping the OS ignorant of the 2nd CPU, might lead to inconsistencies when handling the RAM associated with that 2nd CPU.
James ~ Happy Dude
Honored Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

Okies; refer to the guidelines in the attachment!

Is it all set up the same way?

James.
Isochron
Occasional Contributor

Re: Possible Memory Errors With Dual CPU/Linux System?

James,

Since I have all 8 RAM slots populated (each with a 2GB DIMM), the slot sequence is moot. Regarding the type of RAM, I am 99% sure that all 8 DIMMs are of the same speed and type, with the same part numbers. I installed the RAM back in July, so I have forgotten enough of the specifics to be unable to say 100% that it's all kosher.
However, I had this same document available when I was doing my homework before going from the 1GB DIMMs that were previously installed, to the 2GB units currently installed.
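
Rather than relying on memory, the DIMM details could be checked from the text output of `dmidecode` (run as root). The Python sketch below assumes the common layout of blank-line-separated blocks with a `Memory Device` heading and `Key: Value` fields such as `Size`, `Type`, `Speed` and `Part Number`; older dmidecode or BIOS versions may not report all of these. Both function names are hypothetical.

```python
def parse_memory_devices(dmidecode_text):
    """Return one dict of fields per populated 'Memory Device' block."""
    devices = []
    for block in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue  # not a DIMM slot record
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        # Empty slots report a size such as "No Module Installed"; skip them.
        if fields.get("Size", "").startswith("No Module"):
            continue
        devices.append(fields)
    return devices

def mismatched_fields(devices, keys=("Size", "Type", "Speed", "Part Number")):
    """Return the field names whose values differ between DIMMs."""
    return [k for k in keys if len(set(d.get(k) for d in devices)) > 1]
```

An empty list from `mismatched_fields` would mean every populated slot reports the same size, type, speed and part number.
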
At this point I should mention something I have neglected so far: I have 2 DL145 G2s, each with 2 Opteron CPUs and each with 8 2GB DIMMs. Both servers are experiencing the same problem when manipulating text strings in RAM. The problem does not happen consistently, and when I restart the IBM WebSphere application server involved, it goes away. Over time, the string problem becomes more pronounced.
Both ProLiants are heavily utilized RAM-wise, so I would expect both to be hitting DIMMs associated with CPU 2. When I had the 1GB DIMMs I only had 1 CPU, so the servers had 4GB of system RAM at that time. I did not see this issue at that time.
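
To help separate a hardware fault from an application problem, one option is to look for memory error reports in the kernel log (for example the output of `dmesg`, or `/var/log/messages`). The Python sketch below simply filters log text for a few keywords. The keyword list is an assumption, not a definitive set of kernel messages, and `suspicious_lines` is a hypothetical name; an empty result does not prove the RAM is good.

```python
import re

# Assumed keywords; adjust them to whatever your kernel actually logs.
PATTERNS = [r"machine check", r"\bmce\b", r"\becc\b"]

def suspicious_lines(log_text, patterns=PATTERNS):
    """Return the log lines that mention any of the given patterns."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [line for line in log_text.splitlines() if regex.search(line)]
```

Since both servers show the same symptom and a WebSphere restart clears it, a clean log on both machines would point more toward the software side than toward the DIMMs.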