
Finding bad dimm on the MP console of a RX7620

 
r rayfield
Occasional Advisor

Can I tell which memory DIMM is bad from the FRU output? One stick is causing a bank to be marked bad, and I need to find out which one.
14 replies
Torsten.
Acclaimed Contributor

Re: Finding bad dimm on the MP console of a RX7620

Hi,

Run STM on the memory, followed by logtool, to get more information:

echo "selclass qualifier memory;info;wait;infolog" | /usr/sbin/cstm
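If you want to keep a copy of that output to read later, the one-liner can be wrapped in a small script. This is only a sketch: the function name `run_memory_infolog`, the `CSTM` override variable and the default output file `/tmp/cstm_memory_infolog.txt` are made up for illustration; the command string itself is the one from the post.

```shell
#!/bin/sh
# Sketch: run the cstm memory query from the post and save its output to a file.
# run_memory_infolog, the CSTM override and the default output path are
# hypothetical names, not part of STM.

run_memory_infolog() {
    cstm_bin=${CSTM:-/usr/sbin/cstm}
    out=${1:-/tmp/cstm_memory_infolog.txt}

    if [ ! -x "$cstm_bin" ]; then
        echo "cstm not found at $cstm_bin (STM must be installed on HP-UX)" >&2
        return 1
    fi

    # Same command string as the one-liner: select the memory class,
    # run info, wait for it to finish, then print the info log.
    echo "selclass qualifier memory;info;wait;infolog" | "$cstm_bin" > "$out" 2>&1
    echo "cstm output saved to $out"
}

# Example: run_memory_infolog /tmp/memory_infolog.txt
```

Saving to a file makes it easier to post the relevant part of the log back to the thread.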

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Court Campbell
Honored Contributor

Re: Finding bad dimm on the MP console of a RX7620

Why not try cstm?

cstm
cstm> sel dev 2
cstm> info
cstm> infolog
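The same session can be run non-interactively by feeding the commands to cstm on standard input, as in Torsten's one-liner. A minimal sketch follows. `sel dev 2` is used only because device 2 is the number in the example above; on another system the memory device may have a different number, so check the device list first (assumption: cstm's `map` command shows it). The function names `cstm_dev_cmds` and `cstm_dev_info` and the `CSTM` override are hypothetical.

```shell
#!/bin/sh
# Sketch: non-interactive version of "sel dev N", "info", "infolog" in cstm.
# cstm_dev_cmds, cstm_dev_info and the CSTM override are hypothetical names.

# Build the cstm command string for one device number.
cstm_dev_cmds() {
    case "$1" in
        ''|*[!0-9]*) echo "device number must be numeric" >&2; return 1 ;;
    esac
    # "wait" is borrowed from the one-liner so infolog runs after info finishes.
    printf '%s\n' "sel dev $1;info;wait;infolog"
}

# Pipe the commands into cstm (default /usr/sbin/cstm).
cstm_dev_info() {
    cmds=$(cstm_dev_cmds "$1") || return 1
    echo "$cmds" | "${CSTM:-/usr/sbin/cstm}"
}

# Example: cstm_dev_info 2 > /tmp/dev2_infolog.txt
```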

"The difference between me and you? I will read the man page." and "Respect the hat." and "You could just do a search on ITRC, you don't need to start a thread on a topic that's been answered 100 times already." Oh, and "What. no points???"
r rayfield
Occasional Advisor

Re: Finding bad dimm on the MP console of a RX7620

OK, I see the bad bank, but what did you say to do next?

Cab 0 Cell 0 DIMM 7A 2048 Deconf 060206112125 62706
Cab 0 Cell 0 DIMM 7B 2048 Deconf 060206111346 62706
Cab 0 Cell 1 DIMM 6A 2048 Deconf 060206112602 62706
Cab 0 Cell 1 DIMM 6B 2048 Deconf 060206111006 62706
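To pull just the deconfigured DIMMs out of a longer FRU listing, a short awk filter is enough. This sketch assumes every DIMM line has the same layout as the lines above (`Cab <n> Cell <n> DIMM <slot> <size> <status> ...`) and that the field after the slot is the size in MB; the function name `list_deconf_dimms` is hypothetical.

```shell
#!/bin/sh
# Sketch: list DIMMs marked "Deconf" in MP FRU output read from stdin.
# Assumes the field layout of the lines posted in this thread:
#   Cab <n> Cell <n> DIMM <slot> <size> <status> ...
# list_deconf_dimms is a hypothetical helper name.

list_deconf_dimms() {
    awk '$1 == "Cab" && $3 == "Cell" && $5 == "DIMM" && $8 == "Deconf" {
        # Print cabinet, cell and slot; $7 is assumed to be the size in MB.
        printf "cab %s cell %s dimm %s (%s MB)\n", $2, $4, $6, $7
    }'
}

# Example: list_deconf_dimms < /tmp/fru_output.txt
```

Lines that don't match the assumed layout are simply ignored, so the filter can be run over a full FRU dump.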
Court Campbell
Honored Contributor

Re: Finding bad dimm on the MP console of a RX7620

I am not sure what you mean. If this server is under a support contract, you need to call HP and have them come out and replace the memory module.
Torsten.
Acclaimed Contributor

Re: Finding bad dimm on the MP console of a RX7620

Deconfigured doesn't mean bad.

Contact HP support.

Hope this helps!
Regards
Torsten.

r rayfield
Occasional Advisor

Re: Finding bad dimm on the MP console of a RX7620

It is 3rd party memory, HP does not support it.
Torsten.
Acclaimed Contributor

Re: Finding bad dimm on the MP console of a RX7620

So try running logtool within STM.
One bad DIMM will cause the whole quad (4 DIMMs) to be deconfigured.

What does the STM log give you?

Hope this helps!
Regards
Torsten.

r rayfield
Occasional Advisor

Re: Finding bad dimm on the MP console of a RX7620

I don't know what you are talking about with STM and logtool. I had never seen the cstm command before today. Can you give me the syntax, please?
Torsten.
Acclaimed Contributor

Re: Finding bad dimm on the MP console of a RX7620

As mentioned above, first run:

echo "selclass qualifier memory;info;wait;infolog" | /usr/sbin/cstm

Hope this helps!
Regards
Torsten.

r rayfield
Occasional Advisor

Re: Finding bad dimm on the MP console of a RX7620

OK, ran that, what next?
Torsten.
Acclaimed Contributor

Re: Finding bad dimm on the MP console of a RX7620

Interpret the results ;-)

Hope this helps!
Regards
Torsten.

r rayfield
Occasional Advisor

Re: Finding bad dimm on the MP console of a RX7620

Thanks
rick jones
Honored Contributor

Re: Finding bad dimm on the MP console of a RX7620

FWIW, I suspect that stm/cstm et al are reasonably well documented at http://docs.hp.com/

there is no rest for the wicked yet the virtuous have no pillows
Sameer_Nirmal
Honored Contributor

Re: Finding bad dimm on the MP console of a RX7620

You can't pinpoint the bad DIMM from the MP FRU output. The only clue it gives is that a DIMM which isn't detected at all won't show any FRU details. Instead, look at the DIMM events detected by the system firmware in the MP SEL log.

On the rx7620, the basic memory configuration requires filling a rank, i.e. a pair of DIMMs. Based on the output you posted, you have a problem on two ranks on two different cell boards. Within a rank, if one of the DIMMs generates a multi-bit error (MBE) or gets deconfigured, the other DIMM is deconfigured as well, making the whole rank unavailable. This server supports "memory chip spare" (chip kill).
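Following that explanation, a failed rank should show up as both halves of a pair (for example 7A and 7B) deconfigured together, which matches the posted output for cell 0 and cell 1. The sketch below groups deconfigured slots from the FRU output by cell and rank so that a half-deconfigured pair would stand out. It assumes the `Cab <n> Cell <n> DIMM <slot> <size> <status> ...` layout shown earlier in the thread and slot names made of a rank number plus `A` or `B`; the function name `group_by_rank` is hypothetical.

```shell
#!/bin/sh
# Sketch: group deconfigured DIMMs from MP FRU output (stdin) by rank.
# Assumptions: the "Cab <n> Cell <n> DIMM <slot> <size> <status> ..." layout
# shown in this thread, and slot names of a rank number plus A or B.
# group_by_rank is a hypothetical helper name.

group_by_rank() {
    awk '$1 == "Cab" && $3 == "Cell" && $5 == "DIMM" && $8 == "Deconf" {
        slot = $6
        rank = substr(slot, 1, length(slot) - 1)   # "7A" -> "7"
        key = "cab " $2 " cell " $4 " rank " rank
        if (!(key in slots)) order[++n] = key      # keep first-seen order
        slots[key] = slots[key] " " slot
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            status = (count[k] >= 2) ? "whole pair deconfigured" : "only one DIMM deconfigured"
            printf "%s:%s (%s)\n", k, slots[k], status
        }
    }'
}

# Example: group_by_rank < /tmp/fru_output.txt
```

As the next paragraph explains, a whole pair being deconfigured still doesn't say which of the two DIMMs is at fault.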

I think it's hard to pinpoint the exact bad DIMM, since the two DIMMs in a rank are (I believe) interleaved, so the other DIMM would report an error too. For an initial assessment, look at the MP SEL, the CSTM infolog and the CSTM logtool output. However, you may have to isolate the problem by removing the DIMMs one at a time and checking the PDT entries.

Run the CSTM logtool and look through its output for memory-related information and messages.

Create a file /tmp/logtool.txt containing:

#Print the formatted logtool to /tmp directory
#
# Syntax: 'cstm -f logtool.txt'
#
ru
logtool
rs
saveas
/tmp/logtool.summary
done
fl
/var/stm/logs/os
saveas
/tmp/logtool.formatted
done
quit
ok
quit
ok
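Rather than typing the command file by hand, it can be written with a here-document. This is only a convenience sketch: the function name `write_logtool_cmdfile` is hypothetical, and the file body is copied unchanged from the listing above.

```shell
#!/bin/sh
# Sketch: write the cstm command file from the thread to a given path.
# write_logtool_cmdfile is a hypothetical helper name; the body is copied
# verbatim from the post. The quoted 'EOF' stops the shell from expanding
# anything inside the here-document.

write_logtool_cmdfile() {
    target=${1:-/tmp/logtool.txt}
    cat > "$target" <<'EOF'
#Print the formatted logtool to /tmp directory
#
# Syntax: 'cstm -f logtool.txt'
#
ru
logtool
rs
saveas
/tmp/logtool.summary
done
fl
/var/stm/logs/os
saveas
/tmp/logtool.formatted
done
quit
ok
quit
ok
EOF
}

# Example: write_logtool_cmdfile /tmp/logtool.txt && cstm -f /tmp/logtool.txt
```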

Run cstm as

# cstm -f /tmp/logtool.txt

and then look at the logtool files it generates (/tmp/logtool.summary and /tmp/logtool.formatted).
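Once the logtool files exist, a case-insensitive grep is a quick first pass over them for memory-related messages. The keyword pattern here (memory, DIMM, deconfig, MBE, SBE, PDT) is an assumption drawn from terms used in this thread, not a documented set of logtool message strings, so adjust it after reading the real output. The function name `scan_logtool_output` is hypothetical; by default it reads the two files named in the command file.

```shell
#!/bin/sh
# Sketch: print memory-related lines from the logtool output files.
# The keyword pattern is an assumption based on terms used in this thread.
# scan_logtool_output is a hypothetical helper name.

scan_logtool_output() {
    [ $# -gt 0 ] || set -- /tmp/logtool.summary /tmp/logtool.formatted
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "skipping unreadable file: $f" >&2
            continue
        fi
        # -i ignore case, -n line numbers, -E extended regex.
        # MBE/SBE/PDT must stand alone so words like "number" don't match.
        grep -inE 'memory|dimm|deconfig|(^|[^a-zA-Z])(mbe|sbe|pdt)([^a-zA-Z]|$)' "$f" |
            sed "s|^|$f:|"
    done
}

# Example: scan_logtool_output | more
```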

From the EFI shell, you can check the memory and the PDT (Page Deallocation Table) using:

# dimmconfig
# pdt