HPE 9000 and HPE e3000 Servers
rp3440 incorrect RAM amount reported

 
phelixyrus
Occasional Advisor

I have two identical rp3440 systems with 4GB of physical memory installed (8 x 512) in each. One of the systems, however, only reports 2GB of RAM in SAM, STM, and dmesg. The output from cstm looks like this:

DIMM Slot Size (MB)
--------- ---------
0A 512
0B 512
1A 512
1B 512
2A 0
2B 0
3A 0
3B 0
4A 0
4B 0
5A 0
5B 0
--------- ---------
System Total (MB): 2048

My colleague told me that one of the RAM chips/slots after the first four (possibly 2A or 2B) might have failed, resulting in the remaining memory slots (3A and 3B) not being checked. Can anyone confirm this theory? Is there anything else that I can try before contacting HP support?

Thank you in advance for your help.
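
The check described above (slots that hold a DIMM but report 0 MB) can be sketched as a small script run against saved cstm output. This is only a minimal sketch, not an HP tool: the function names are hypothetical, the table layout is assumed to match the output pasted above, and the list of populated slots has to be supplied by hand.

```python
import re

# Sample copied from the cstm output quoted in this thread.
CSTM_TABLE = """\
DIMM Slot Size (MB)
--------- ---------
0A 512
0B 512
1A 512
1B 512
2A 0
2B 0
3A 0
3B 0
4A 0
4B 0
5A 0
5B 0
--------- ---------
System Total (MB): 2048
"""

def parse_dimm_table(text):
    """Return {slot: size_mb} for rows that look like '2A 0' (assumed layout)."""
    sizes = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\d+[AB])\s+(\d+)\s*", line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes

def missing_dimms(sizes, populated):
    """Slots known to hold a DIMM that nevertheless report 0 MB."""
    return sorted(slot for slot in populated if sizes.get(slot, 0) == 0)

# Assumption: the eight 512 MB DIMMs sit in slots 0A through 3B.
POPULATED = ["0A", "0B", "1A", "1B", "2A", "2B", "3A", "3B"]

sizes = parse_dimm_table(CSTM_TABLE)
print("Reported total (MB):", sum(sizes.values()))
print("Populated but not reported:", missing_dimms(sizes, POPULATED))
```

For the table above this reports a 2048 MB total and flags slots 2A, 2B, 3A, and 3B.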
Sameer_Nirmal
Honored Contributor

Re: rp3440 incorrect RAM amount reported

The amount of memory shown by SAM, STM, or dmesg is what the OS recognizes. The remaining memory may not have been made available to the OS at all because of a problem with those DIMMs (a POST failure, for example) when the system booted.

You can check the memory status from the BCH menu:
BCH -> IN -> ME

Detailed FRU information and logs can be found in:
MP -> CM -> DF
MP -> SL -> E



phelixyrus
Occasional Advisor

Re: rp3440 incorrect RAM amount reported

Based on Sameer's comment, I looked at the Event Logs and found that the DIMMs were hardware deallocated (MEM_DIMM_HW_DEALLOCATED). I found a possible cause on one of the HP docs which says that "another DIMM in the same rank has been software deallocated," and that a solution is to "replace the software deallocated DIMM."

So, my next question (and hopefully the last for this forum) is how do I locate a DIMM or DIMMs that were software deallocated?

Thanks again...
Sameer_Nirmal
Honored Contributor

Re: rp3440 incorrect RAM amount reported

Since you already know from the STM output which slots report no memory, and can see the DIMMs physically in the server, you can work out the deallocated memory locations.

The DIMM locations can also be determined from the event log.

Also refer to the following manual for memory configurations, rules, slot identification, etc.:
http://docs.hp.com/en/A7137-96002/A7137-96002.pdf
Michael Steele_2
Honored Contributor

Re: rp3440 incorrect RAM amount reported

Check for deconfigured memory via cstm:

echo "map selclass qualifier memory ;info;wait;il" | /usr/sbin/cstm
phelixyrus
Occasional Advisor

Re: rp3440 incorrect RAM amount reported

I could see that DIMM slots 2A, 2B, 3A, and 3B were deallocated by "hardware" (MEM_DIMM_HW_DEALLOCATED) in the MP Event Log. As I've mentioned, the recommended solution I've found in the HP doc is to replace the DIMM(s) that were deallocated by "software." Essentially, how can I precisely determine which of the 4 hardware deallocated chips (in slots 2A, 2B, 3A, and 3B) I need to replace?
Julian Hall
Occasional Advisor
Solution

Re: rp3440 incorrect RAM amount reported

Improvements have been made in the testing and deallocation of DIMMs with errors over the life cycle of the rp3440. I would recommend that you ensure that your PDC is at least 45.44.

If you examine the logs prior to the MEM_DIMM_HW_DEALLOCATED events, you may see earlier errors that clearly identify the DIMM with a problem, as in this example where DIMM 3B is suspect ...

312 SFW 2 2 0x448000A702E026F0 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:31
313 SFW 2 2 0x448000A702E02710 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:31
314 SFW 2 2 0x448000A702E02730 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:32
315 SFW 2 *5 0xEE8000D802E02750 0000000000000C83 MEM_PDT_TABLE_FULL
15 Mar 2006 14:27:32
316 SFW 2 *3 0x648001D202E02770 FFFFFFFF002AFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
317 SFW 2 *3 0x648001D202E02790 FFFFFFFF002BFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
318 SFW 2 *3 0x648001D202E027B0 FFFFFFFF003AFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
319 SFW 2 *3 0x648001D202E027D0 FFFFFFFF003BFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32

When a single DIMM is replaced within a quad, it is necessary to use the 'pdt clear' command at the BCH service menu, or rotate all the DIMMs in the quad to new slots.
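
The log reading described above can be sketched in a few lines of Python run against a saved copy of the event log. Treat this as a sketch under stated assumptions rather than a documented decoding: the column layout is assumed to match the sample above, the function names are hypothetical, and the idea that characters 11 and 12 of the data field carry the DIMM slot is inferred only from this sample (data ending in 003BFF74 for the DIMM identified as 3B).

```python
import re
from collections import Counter

# Sample copied from the event log quoted above.
EVENT_LOG = """\
312 SFW 2 2 0x448000A702E026F0 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:31
313 SFW 2 2 0x448000A702E02710 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:31
314 SFW 2 2 0x448000A702E02730 FFFFFFFF003BFF74 MEM_CORR_ERR
15 Mar 2006 14:27:32
315 SFW 2 *5 0xEE8000D802E02750 0000000000000C83 MEM_PDT_TABLE_FULL
15 Mar 2006 14:27:32
316 SFW 2 *3 0x648001D202E02770 FFFFFFFF002AFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
317 SFW 2 *3 0x648001D202E02790 FFFFFFFF002BFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
318 SFW 2 *3 0x648001D202E027B0 FFFFFFFF003AFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
319 SFW 2 *3 0x648001D202E027D0 FFFFFFFF003BFF74 MEM_DIMM_HW_DEALLOCATED
15 Mar 2006 14:27:32
"""

# Assumed columns: entry number, source, unnamed field, alert level
# (optionally prefixed with '*'), keyword, 16-hex-digit data field, event name.
ENTRY = re.compile(
    r"^\s*(\d+)\s+\S+\s+\S+\s+\*?(\d+)\s+0x[0-9A-Fa-f]+\s+([0-9A-Fa-f]{16})\s+(\w+)\s*$"
)

def parse_events(text):
    """Return one dict per log entry; timestamp lines do not match and are skipped."""
    events = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            number, level, data, name = m.groups()
            events.append(
                {"number": int(number), "level": int(level), "data": data, "name": name}
            )
    return events

def slot_from_data(data):
    """Assumption inferred from the sample only: ...003BFF74 means slot 3B."""
    return data[10:12]

def correctable_error_counts(events):
    """Count MEM_CORR_ERR entries per (assumed) DIMM slot."""
    return Counter(slot_from_data(e["data"]) for e in events if e["name"] == "MEM_CORR_ERR")

def hw_deallocated_slots(events):
    """Slots named in MEM_DIMM_HW_DEALLOCATED entries, in log order."""
    return [slot_from_data(e["data"]) for e in events if e["name"] == "MEM_DIMM_HW_DEALLOCATED"]

events = parse_events(EVENT_LOG)
print("Correctable errors by slot:", dict(correctable_error_counts(events)))
print("Hardware-deallocated slots:", hw_deallocated_slots(events))
```

On the sample this counts three correctable errors against 3B and lists 2A, 2B, 3A, and 3B as hardware deallocated, which matches the reading given above: one DIMM logged the errors and the rest of its quad was deallocated with it.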
Michael Steele_2
Honored Contributor

Re: rp3440 incorrect RAM amount reported

If you run:

echo "map selclass qualifier memory ;info;wait;il" | /usr/sbin/cstm

...the DIMM map and memory error log will tell you.

-- Information Tool Log for IPF_MEMORY on path memory --

Log creation time: Thu Nov 30 08:22:02 2006

Hardware path: memory


Basic Memory Description

Module Type: MEMORY
Page Size: 4096 Bytes
Total Physical Memory: N/A
Total Configured Memory: 4096 MB
Total Deconfigured Memory: N/A

Memory Board Inventory

DIMM Location Size(MB) DIMM Location Size(MB)
-------------------- -------- -------------------- --------
DIMM 0A 1024 DIMM 0B 1024
DIMM 1A 1024 DIMM 1B 1024
DIMM 2A ---- DIMM 2B ----
DIMM 3A ---- DIMM 3B ----
DIMM 4A ---- DIMM 4B ----
DIMM 5A ---- DIMM 5B ----

Total: 4096 (MB)

===========================================================================

Memory Error Log Summary

The memory error log is empty.

Page Deallocation Table (PDT)

The Page Deallocation Table is empty.

PDT Entries Used: 0
PDT Entries Free: 100
PDT Total Size: 100
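
If the information log is saved to a file, the inventory and PDT figures can be pulled out with a short script. This is a minimal sketch assuming the log layout shown above; the function names are hypothetical and nothing here is part of cstm itself.

```python
import re

# Excerpt copied from the information log shown above.
INFO_LOG = """\
DIMM Location Size(MB) DIMM Location Size(MB)
-------------------- -------- -------------------- --------
DIMM 0A 1024 DIMM 0B 1024
DIMM 1A 1024 DIMM 1B 1024
DIMM 2A ---- DIMM 2B ----
DIMM 3A ---- DIMM 3B ----
DIMM 4A ---- DIMM 4B ----
DIMM 5A ---- DIMM 5B ----

Total: 4096 (MB)

PDT Entries Used: 0
PDT Entries Free: 100
PDT Total Size: 100
"""

def reported_dimms(info_log):
    """Return {slot: size_mb} for slots that report a size ('----' is skipped)."""
    pairs = re.findall(r"DIMM\s+(\d+[AB])\s+(\d+)\b", info_log)
    return {slot: int(size) for slot, size in pairs}

def pdt_summary(info_log):
    """Return the three PDT counters as integers, keyed by their log labels."""
    summary = {}
    for key in ("PDT Entries Used", "PDT Entries Free", "PDT Total Size"):
        m = re.search(rf"{re.escape(key)}:\s*(\d+)", info_log)
        if m:
            summary[key] = int(m.group(1))
    return summary

dimms = reported_dimms(INFO_LOG)
pdt = pdt_summary(INFO_LOG)
print("Slots reporting memory:", sorted(dimms))
print("Reported total (MB):", sum(dimms.values()))
print("PDT full:", pdt["PDT Entries Free"] == 0)
```

A PDT with no free entries would be worth comparing against a MEM_PDT_TABLE_FULL event in the MP event log.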
phelixyrus
Occasional Advisor

Re: rp3440 incorrect RAM amount reported

Thanks, Julian...

In the iLO's Event Logs, I set the filter level to 3 (warning), so that's why I didn't see errors prior to MEM_DIMM_HW_DEALLOCATED. I've set the filter to 2, and then I was able to pinpoint which DIMM was having problems (MEM_CORR_ERR). I guess I can't assume that major events like this must be level 3 and above.
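
The effect of the filter can be illustrated with a toy model. This is a sketch only: it assumes the viewer shows entries whose alert level is at or above the filter value, which is consistent with what I saw but is not taken from HP documentation, and the alert levels are the ones in Julian's example log.

```python
# (entry number, alert level, event name), taken from Julian's example log.
EVENTS = [
    (312, 2, "MEM_CORR_ERR"),
    (313, 2, "MEM_CORR_ERR"),
    (314, 2, "MEM_CORR_ERR"),
    (315, 5, "MEM_PDT_TABLE_FULL"),
    (316, 3, "MEM_DIMM_HW_DEALLOCATED"),
]

def visible(events, filter_level):
    """Assumed behaviour: keep entries whose alert level >= filter_level."""
    return [e for e in events if e[1] >= filter_level]

def names(events):
    """Distinct event names, in first-seen order."""
    seen = []
    for _, _, name in events:
        if name not in seen:
            seen.append(name)
    return seen

print("Filter 3:", names(visible(EVENTS, 3)))
print("Filter 2:", names(visible(EVENTS, 2)))
```

With the filter at 3 the MEM_CORR_ERR entries drop out; at 2 they are listed ahead of the deallocation events.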

We replaced the single failed DIMM, and the server seemed to freeze during POST. So we took out all four DIMMs in the quad, rebooted the server, and noticed that the PDT had already been cleared at that point. We put all four DIMMs back in the same slots they came from, and everything now works fine.

So, I assume the 'pdt clear' command Julian mentioned must be run at the BCH service menu BEFORE the single DIMM is replaced?

Thank you all for your contribution. The problem has now been resolved.
phelixyrus
Occasional Advisor

Re: rp3440 incorrect RAM amount reported

See Julian's comment above.