
Re: Physical Disk Failures

Posted: Tue Sep 25, 2018 12:41 pm
by MammaGutt
Spinning media will gracefully fail at 6 failed chunklets.
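
To make that threshold concrete, here is a minimal Python sketch of the rule. It is only an illustration: the function name, the disk IDs, and the sample counts are hypothetical, and nothing here is real 3PAR CLI output or the array's actual logic. The only value taken from the post is the limit of 6 failed chunklets for spinning media.

```python
# Hypothetical illustration of the "fail at 6 failed chunklets" rule for spinning media.
# The limit comes from the post above; every name and sample value is assumed.
FAILED_CHUNKLET_LIMIT = 6


def disks_to_fail(failed_chunklets, limit=FAILED_CHUNKLET_LIMIT):
    """Return the IDs of disks whose failed-chunklet count has reached the limit."""
    return sorted(pd for pd, count in failed_chunklets.items() if count >= limit)


# Made-up per-disk failed-chunklet counts.
counts = {"pd12": 2, "pd17": 6, "pd23": 7}
print(disks_to_fail(counts))  # ['pd17', 'pd23']
```

In this sketch a disk with 5 failed chunklets stays in service, while one with 6 or more is flagged.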

Re: Physical Disk Failures

Posted: Sun Sep 30, 2018 1:37 pm
by IvaAn
ailean wrote: I tend to see three methods:

1) Disk fails, little warning and auto rebuild from parity.
2) Disk failing, sometimes get warnings and auto moves data elsewhere.
3) Disk not happy, maybe a few warnings or not available for allocations but requires manual servicing to start the data rebuild/move process before the engineer arrives.

Support is typically aware whether the disk is ready for replacement, but I'm not sure what info from the SP uploads they check for that. I suspect the estimates they sometimes give are generic, based on disk type/size.

The fun tends to begin when the extra load from the rebuild fails another disk and/or when inserting the new disk doesn't go to plan. Three different service companies and over a dozen different engineers in 5 years have led to random events during replacements, but no data loss. ;)

Yup, they check the SP uploads to see whether the disk is ready for replacement.

Re: Physical Disk Failures

Posted: Sun Oct 06, 2024 4:38 am
by Zinfamous22
Hi Guys, It’s been a while since I’ve seen a complete evacuation of a Mag. Performance, load, and percentage full are usually key factors to consider. Logging seems to be the standard now, although I remember there were concerns about how long you could run with it enabled. We’ve had situations where we had to run with Logging for several hours due to failed inserts of new disks, waiting for an engineer who knew enough about the 3PAR to resolve the issue.

I’ve seen this mainly during full Mag replacements, especially when FC450 disks weren’t available and we had to substitute with FC600 disks. Back then, all disks in the Mag had to be the same size. I recall some early disk replacements where the Mag was only about 10% full, and Logging was still a relatively new feature.
I’ve also had to manually start the service and remind engineers to come back in a few hours when support dropped the ball. There were times when certain support staff were using a remote portal that frequently failed, while other teams seemed to have access to more reliable tools and didn’t face the same issues.