08-30-2013 04:10 AM
rx2620 Memory problem
Hello all,
I have an rx2620 that has been running OpenVMS for the last 7 years. This morning it died, and it will not boot again.
Nobody touched the machine. According to the MP event log, the machine does not seem to see any memory at all.
Info about the machine:
[redb2] MP:CM> sysrev
SYSREV
Current firmware revisions
MP FW : E.03.30
BMC FW : 04.01
System FW : 04.10
The System Event log has:
363 SFW 0 0 0x148002C500E02180 0000000000000000 BOOT_REBOOT
30 Aug 2013 09:19:04
364 SFW 0 *3 0x64800FA000E021A0 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
30 Aug 2013 09:19:11
365 SFW 0 *3 0x64800FA000E021C0 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
30 Aug 2013 09:19:11
366 SFW *5 0xC15220638F0221E0 FF3F4070000F0300 Type-02 0f7000 1011712
30 Aug 2013 09:19:11
367 SFW 0 *7 0xE08000D100E021F0 0000000000000000 MEM_NO_MEM_FOUND
30 Aug 2013 09:19:11
368 SFW *5 0xC15220638F022210 FF3F4070000F0300 Type-02 0f7000 1011712
30 Aug 2013 09:19:11
369 SFW 0 *7 0xF480003700E02220 000000000000000F BOOT_HALT_CELL
30 Aug 2013 09:19:11
370 BMC 2 0x20522063EF022240 0180A37000120300 Type-02 127003 1208323
30 Aug 2013 09:20:47
371 BMC 2 0x2000000001022250 0150A17000120300 Type-02 127001 1208321
00:00:01
372 BMC 2 0x2000000007022260 FFFF006F01050300 Type-02 056f00 356096
00:00:07
373 BMC *3 0x2000000007022270 FFFF010302050300 Type-02 050301 328449
00:00:07
374 BMC 2 0x2000000007022280 FFFF018302050300 Type-02 058301 361217
00:00:07
375 MP 0 2 0x5E800A7A00E02290 0000000000000000 MP_SELFTEST_RESULT
00:00:11
376 BMC 2 0x20522066390222B0 FFFF0103FDC00300 Type-02 c00301 12583681
30 Aug 2013 09:30:33
377 BMC 2 0x205220663B0222C0 FFFF006F04140300 Type-02 146f00 1339136
30 Aug 2013 09:30:35
378 BMC 2 0x205220663B0222D0 0401A37004120300 Type-02 127003 1208323
30 Aug 2013 09:30:35
379 BMC 2 0x205220663E0222E0 FFFF027000120300 Type-02 127002 1208322
30 Aug 2013 09:30:38
380 BMC 2 0x205220663F0222F0 FFFF0108F10D0300 Type-02 0d0801 854017
30 Aug 2013 09:30:39
381 BMC 2 0x205220663F022300 FFFF0108F20D0300 Type-02 0d0801 854017
30 Aug 2013 09:30:39
382 BMC 2 0x205220663F022310 FFFF0108F30D0300 Type-02 0d0801 854017
30 Aug 2013 09:30:39
383 BMC 2 0x205220663F022320 FFFF006FFA220300 Type-02 226f00 2256640
30 Aug 2013 09:30:39
384 SFW 2 0xC152206648022330 FFFF000A001D0300 Type-02 1d0a00 1903104
30 Aug 2013 09:30:48
385 SFW 0 1 0x5480006300E02340 0000000000000000 BOOT_START
30 Aug 2013 09:30:48
386 BMC 2 0x2052206659022360 FFFF027000120300 Type-02 127002 1208322
30 Aug 2013 09:31:05
387 SFW 0 0 0x148002C500E02370 0000000000000000 BOOT_REBOOT
30 Aug 2013 09:31:13
388 SFW 0 *3 0x64800FA000E02390 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
30 Aug 2013 09:31:23
389 SFW 0 *3 0x64800FA000E023B0 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
30 Aug 2013 09:31:23
390 SFW *5 0xC15220666B0223D0 FF3F4070000F0300 Type-02 0f7000 1011712
30 Aug 2013 09:31:23
391 SFW 0 *7 0xE08000D100E023E0 0000000000000000 MEM_NO_MEM_FOUND
30 Aug 2013 09:31:23
392 SFW *5 0xC15220666B022400 FF3F4070000F0300 Type-02 0f7000 1011712
30 Aug 2013 09:31:23
393 SFW 0 *7 0xF480003700E02410 000000000000000F BOOT_HALT_CELL
30 Aug 2013 09:31:23
394 BMC 2 0x20522067D2022430 FFFF006F04140300 Type-02 146f00 1339136
30 Aug 2013 09:37:22
395 BMC 2 0x20522067D7022440 040EA37004120300 Type-02 127003 1208323
30 Aug 2013 09:37:27
According to some googling, this is caused either by incorrectly installed memory or by a bad DIMM.
Since this machine has not been opened in 7 years or so, I hope it is the latter. But I guess anything could be wrong.
Can anybody here see what the issue is?
Thanks in advance,
Richard.
08-30-2013 04:16 AM
Re: rx2620 Memory problem
Going further back in the log reveals lots and lots of:
321 SFW 2 0xC1520B6D7C021DD0 108F6070830C0300 Type-02 0c7000 815104
14 Aug 2013 11:43:56
322 SFW 0 2 0x448000A700E01DE0 FFFFFFFF002BFF74 MEM_CORR_ERR
14 Aug 2013 11:43:56
323 SFW 2 0xC1520BA23B021E00 108F6070830C0300 Type-02 0c7000 815104
14 Aug 2013 15:28:59
324 SFW 0 2 0x448000A700E01E10 FFFFFFFF002BFF74 MEM_CORR_ERR
14 Aug 2013 15:28:59
and:
325 SFW 0 2 0x408002B600E01E30 0000000000000000 MEM_PDT_SBE_PROMOTE
14 Aug 2013 15:28:59
But I can't tell which DIMM has the issue from this.
08-30-2013 10:34 AM
Re: rx2620 Memory problem
> But I can't tell which DIMM has the issue from this.
The DIMM location in these logs is 2B:
FFFFFFFF002BFF74
00 = cell or board
2B = DIMM slot
Also note that the rx2620 uses DIMMs in quads, so if you remove 2B, you will need to remove 2A, 3A and 3B as well.
-Bob
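As a minimal sketch, Bob's reading of the data field can be turned into a small bash helper. It assumes that every memory event carrying a location uses the same 16-hex-digit layout as the entries in this thread (digits 9 and 10 = cell or board, digits 11 and 12 = DIMM slot); that layout is inferred from these examples rather than from HP documentation, and the function name `dimm_location` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the 16-hex-digit data field of a memory event
# into cell/board and DIMM slot, assuming the layout seen in this thread:
#   FFFFFFFF 00 2B FF74  ->  digits 9-10 = cell or board, 11-12 = DIMM slot
dimm_location() {
  local data="$1"
  # Refuse anything that is not exactly 16 hex digits.
  if [[ ! "$data" =~ ^[0-9A-Fa-f]{16}$ ]]; then
    echo "unexpected data field: $data" >&2
    return 1
  fi
  # Bash substrings are 0-based: offset 8 = 9th digit, offset 10 = 11th digit.
  echo "cell=${data:8:2} slot=${data:10:2}"
}

dimm_location FFFFFFFF002BFF74   # prints "cell=00 slot=2B"
```

Applied to the two MEM_CHIPSPARE_DEALLOC_RANK data fields from the first post (`FFFFFFFF003BFF74` and `FFFFFFFF001BFF74`), the same helper gives slots 3B and 1B.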
08-30-2013 10:35 AM
Re: rx2620 Memory problem
You're viewing the System Event Log in Keyword Mode, which is the default mode. Switch the log viewer to Text Mode, and you may get the error messages in a more verbose form.
In an rx2620, memory must be installed in quads, i.e. sets of 4 DIMMs. There are 12 DIMM slots, i.e. 3 quads in total. (However, if 4 GB DIMMs are used, only 2 quads can be populated.)
MEM_CORR_ERR is a memory error that is correctable by the ECC subsystem. An uncorrectable memory error would cause the system to immediately crash. The system maintains a persistent Page Deallocation Table (PDT) that will be used to lock out the memory areas that are producing a lot of errors.
MEM_CHIPSPARE_DEALLOC_RANK indicates the system is rejecting a DIMM (or a quad), either because of an outright failure or because its rate of correctable memory errors is too high. Your system has two rejection messages per boot attempt: if those two rejected DIMMs are in different quads, and only 2 quads of DIMMs are installed, there are no "good" quads left in the system. That might explain why the system cannot boot.
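The reasoning above can be sketched as a small bash check. It assumes, following Bob's note that removing 2B also means removing 2A, 3A and 3B, that slots nA, nB, (n+1)A and (n+1)B with n even form one quad, so the quad number is the slot digit divided by two (quads 0 to 2 on the 12-slot rx2620). It also assumes installed quads are populated from quad 0 upward. The function names `slot_to_quad` and `usable_quads` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the quad reasoning above. Assumptions (not from HP docs):
#  - slots nA, nB, (n+1)A, (n+1)B with n even form one quad,
#    so quad number = slot digit / 2 (quads 0..2 on a 12-slot rx2620);
#  - installed quads are populated from quad 0 upward.

slot_to_quad() {
  local digit="${1:0:1}"   # "3B" -> "3"
  echo $(( digit / 2 ))
}

# usable_quads INSTALLED_QUADS REJECTED_SLOT...
# Prints the installed quads that contain no rejected DIMM, or "none".
usable_quads() {
  local installed="$1"; shift
  local -a bad=() good=()
  local slot q
  for slot in "$@"; do
    q=$(slot_to_quad "$slot")
    bad[q]=1                 # mark the whole quad as unusable
  done
  for (( q = 0; q < installed; q++ )); do
    if [[ -z "${bad[q]:-}" ]]; then
      good+=("$q")
    fi
  done
  if (( ${#good[@]} == 0 )); then
    echo "none"
  else
    echo "${good[*]}"
  fi
}

usable_quads 2 3B 1B   # 2 quads installed, 3B and 1B rejected: prints "none"
```

With a third quad installed, `usable_quads 3 3B 1B` would leave quad 2 usable, which is why the same two rejections only become fatal when there are exactly two quads.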
08-30-2013 10:55 AM
Re: rx2620 Memory problem
That looks like exactly what is going on, Matti:
364 SFW 0 *3 0x64800FA000E021A0 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
365 SFW 0 *3 0x64800FA000E021C0 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
DIMMs 3B and 1B are being deallocated, and each of them resides in a separate quad, likely the only two installed.
What I would suggest, Richard, is to make up one quad of DIMMs for slots 0A/B and 1A/B to get the system to boot. Just leave out the DIMMs that are currently in 1B, 3B and now 2B. Once you get replacement DIMMs, you can install them as needed.
-Bob
08-30-2013 02:20 PM - edited 08-30-2013 02:20 PM
Re: rx2620 Memory problem
Ah, thank you all so much. It's good to finally understand those messages. I have over a year's worth of the event log in a text file on my laptop, and now it is plain as day what is going on:
[ra@hamburger ~]$ grep MEM_CHIPSPARE_DEALLOC_RANK event.log
338 SFW 0 *3 0x64800FA000E01F40 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
339 SFW 0 *3 0x64800FA000E01F60 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
351 SFW 0 *3 0x64800FA000E02070 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
352 SFW 0 *3 0x64800FA000E02090 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
364 SFW 0 *3 0x64800FA000E021A0 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
365 SFW 0 *3 0x64800FA000E021C0 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
388 SFW 0 *3 0x64800FA000E02390 FFFFFFFF003BFF74 MEM_CHIPSPARE_DEALLOC_RANK
389 SFW 0 *3 0x64800FA000E023B0 FFFFFFFF001BFF74 MEM_CHIPSPARE_DEALLOC_RANK
1B and 3B are deactivated, and yes, since the system has 8 DIMMs, all memory is now gone.
And then there is the third DIMM that is giving headaches:
[ra@hamburger ~]$ grep -c MEM_CORR_ERR event.log
145
[ra@hamburger ~]$ grep MEM_CORR_ERR event.log | grep -v FFFFFFFF002BFF74 | wc -l
0
At least all the other DIMMs seem to be fine.
Again, thanks for your help :)
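The greps above can be generalized into a single pass that counts every MEM_* event per cell and slot, so one run over a saved log shows which DIMMs keep turning up. This is a sketch in bash and awk, assuming the keyword-mode line layout seen in this thread (keyword as the last field, 16-hex-digit data field just before it); entries whose data field does not start with FFFFFFFF, such as MEM_NO_MEM_FOUND or MEM_PDT_SBE_PROMOTE with all-zero data, are skipped. The function name `tally_mem_events` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count MEM_* events per cell/slot in a text dump of the SEL.
# Assumptions: keyword-mode lines as in this thread (keyword is the last field,
# 16-hex-digit data field just before it); data fields that do not start
# with FFFFFFFF (e.g. MEM_NO_MEM_FOUND's all-zero field) are skipped.
# Reads the files given as arguments, or standard input if there are none.
tally_mem_events() {
  awk '
    $NF ~ /^MEM_/ {
      data = $(NF - 1)
      if (length(data) == 16 && substr(data, 1, 8) == "FFFFFFFF") {
        cell = substr(data, 9, 2)    # awk substr is 1-based
        slot = substr(data, 11, 2)
        count[$NF " cell=" cell " slot=" slot]++
      }
    }
    END { for (key in count) print key, count[key] }
  ' "$@" | sort
}
```

Run against Richard's saved log as `tally_mem_events event.log`, it should list MEM_CORR_ERR only against slot 2B and MEM_CHIPSPARE_DEALLOC_RANK against slots 1B and 3B, matching his greps.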