Thursday, July 28, 2011

HP5400 blade failures

Every so often I find a dead blade or switch (thanks for the job security, HP). Looking back through the logs, you'll see tombstone errors like these:

W 07/28/11 19:55:11 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:52:05 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 19:52:04 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:48:58 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 19:48:56 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:45:50 00274 chassis: (84) Slot A: Blade Crash detected - Available
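If you'd rather spot this pattern without scrolling through the log by hand, here is a rough Python sketch that pulls the tombstone entries out of pasted log text and groups them by slot. It is not an HP tool. The log layout is assumed from the sample output above, and the `min_tombstones` threshold and the function names are hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed log layout, taken from the sample output in this post:
#   <severity> MM/DD/YY HH:MM:SS <event-id> <subsystem>: <message>
LINE_RE = re.compile(
    r"^[A-Z] (?P<ts>\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \d+ \w+: (?P<msg>.*)$"
)
TOMBSTONE_RE = re.compile(
    r"Slot (?P<slot>[A-Z]) Slave ROM Tombstone: (?P<code>0x[0-9A-Fa-f]+)"
)

SAMPLE = """\
W 07/28/11 19:55:11 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:52:05 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 19:52:04 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:48:58 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 19:48:56 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 19:45:50 00274 chassis: (84) Slot A: Blade Crash detected - Available
"""


def tombstones_by_slot(log_text):
    """Map each slot letter to its (timestamp, code) tombstones, oldest first."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip prompts, blank lines and anything unparseable
        t = TOMBSTONE_RE.search(m.group("msg"))
        if t:
            when = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
            found[t.group("slot")].append((when, t.group("code")))
    return {slot: sorted(events) for slot, events in found.items()}


def looping_slots(log_text, min_tombstones=2):
    """Slots with at least min_tombstones tombstones (an arbitrary threshold)."""
    return {
        slot: events
        for slot, events in tombstones_by_slot(log_text).items()
        if len(events) >= min_tombstones
    }


if __name__ == "__main__":
    for slot, events in looping_slots(SAMPLE).items():
        # Seconds between consecutive tombstones, i.e. the retry interval.
        gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
        codes = sorted({code for _, code in events})
        print(f"Slot {slot}: {len(events)} tombstones, codes {codes}, gaps {gaps}")
```

On the sample above this prints `Slot A: 3 tombstones, codes ['0x13000101'], gaps [188.0, 187.0]`, which shows the blade retrying its boot roughly every three minutes and failing with the same tombstone code each time.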

And what is the fix? A reboot, of course (if you are lucky). Fortunately, on a blade switch you can reload just the affected module instead of the whole chassis. Type:

#reload module A

(where A is the letter of the slot holding the failed blade)


In roughly three out of four cases this seems to fix it, and you may never have a problem with that blade again. In the other quarter, you can retry as many times as you like, but the only thing that works is calling HP and getting a replacement.


If you are lucky enough to have it come back, don't expect it to happen right away either. The reload itself takes around two minutes, and after that you will probably still see:

#show log -r
I 07/28/11 21:25:52 00422 chassis: Slot A Ready
I 07/28/11 21:25:36 00376 chassis: Slot A Download Complete
I 07/28/11 21:25:34 00375 chassis: Slot A Downloading
W 07/28/11 21:25:28 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:25:27 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:22:21 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:22:20 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:19:14 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:19:12 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:16:06 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:16:05 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
I 07/28/11 21:13:00 02756 chassis: Slot A is powered up.
I 07/28/11 21:12:57 02755 chassis: Slot A is powered down.
I 07/28/11 21:12:45 02762 chassis: Request for "reload module A".


You might notice that it took 13 minutes after the module reload (21:12:45 to 21:25:52) before the switch decided to bring up the blade, with a failed boot attempt roughly every three minutes in between. That is enough time for you to give up on it and start trekking out with a replacement. Cheers to those of you lucky enough to work with ProCurve gear!
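To measure that delay on your own switch instead of eyeballing timestamps, a small Python sketch like the one below can work it out from pasted `show log -r` output. It is only an illustration: the line layout and the exact message strings (`Request for "reload module A".`, `Slot A Ready`, `Failed to boot`) are assumed from the log shown above, and `reload_to_ready` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout, taken from the "show log -r" output above:
#   <severity> MM/DD/YY HH:MM:SS <event-id> chassis: <message>
LINE_RE = re.compile(r"^[A-Z] (\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \d+ chassis: (.*)$")

SAMPLE = """\
#show log -r
I 07/28/11 21:25:52 00422 chassis: Slot A Ready
I 07/28/11 21:25:36 00376 chassis: Slot A Download Complete
I 07/28/11 21:25:34 00375 chassis: Slot A Downloading
W 07/28/11 21:25:28 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:25:27 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:22:21 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:22:20 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:19:14 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:19:12 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
W 07/28/11 21:16:06 00374 chassis: Slot A Failed to boot-timeout-(ROM_ALIVE)
W 07/28/11 21:16:05 00374 chassis: Slot A Slave ROM Tombstone: 0x13000101
I 07/28/11 21:13:00 02756 chassis: Slot A is powered up.
I 07/28/11 21:12:57 02755 chassis: Slot A is powered down.
I 07/28/11 21:12:45 02762 chassis: Request for "reload module A".
"""


def reload_to_ready(log_text, slot):
    """Return (time to Ready, boot timeouts) for the latest reload of a slot.

    Events are compared by timestamp, so newest-first (-r) and oldest-first
    logs both work. Returns None if there is no reload request for the slot
    or no Ready message after it.
    """
    requests, readies, timeouts = [], [], []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "#show log -r" prompt line
        when = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
        msg = m.group(2)
        if msg == f'Request for "reload module {slot}".':
            requests.append(when)
        elif msg == f"Slot {slot} Ready":
            readies.append(when)
        elif msg.startswith(f"Slot {slot} Failed to boot"):
            timeouts.append(when)
    if not requests:
        return None
    start = max(requests)  # only look at the most recent reload
    after = [t for t in readies if t >= start]
    if not after:
        return None
    ready = min(after)
    retries = sum(1 for t in timeouts if start <= t <= ready)
    return ready - start, retries


if __name__ == "__main__":
    result = reload_to_ready(SAMPLE, "A")
    if result:
        elapsed, retries = result
        print(f"Slot A ready {elapsed} after reload, {retries} boot timeouts")
```

For the log above this prints `Slot A ready 0:13:07 after reload, 4 boot timeouts`.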
