New member, intro, and questions about 8400
Posted: Sat May 25, 2024 1:38 pm
Hello,
Let me first introduce myself so it's clear what my background is before I ask any questions.
We own a hosting company that specializes in managed services, bare metal, VMware, KVM, etc., though mostly bare metal dedicated. We are set up in a way that makes it very easy for us to deploy a Fibre Channel connected machine: we have >500 blade slots (HP c7000) and run mostly BL460c Gen9 and Gen10 blades. We have two air-gapped FC fabrics built on 16Gbps Brocade DCXs.
My journey from direct-attached storage to Fibre Channel SANs started in about 2008, when we bought an EMC CX3-40, and I was just amazed at how fast it was compared to what we were doing at the time, i.e. PE2850s with the PERC4 and 6 x 10K SCSI drives (usually the Fujitsu MBA3300RC or the Maxtor Atlas 10K V). I immediately realized that SAN was the way forward, as it gave us amazing flexibility to deploy anything anywhere and move things around if a system failed without having to send someone to the datacenter.
Around 2013-ish we bought our first EVA4400 and were amazed at how much easier it was to use and deploy than the CX. By 2019 we had 8 x EVA8400s with 324 x 450GB 15K disks, spread across a single contiguous row of racks. We also had a single EVA6400 and probably 6-7 EVA4400s that were used for smaller customers; we even had EVA4400s deployed at two customer sites, and one in the office.
Generally, it was an amazing period; it wasn't SSD fast, but it worked well.
Around 2019 one of the EVA8400s failed violently and both controllers went down; we had many customers screaming about fsck and the like. Our monitoring was *very* detailed, to the point that we were pulling syslogs out of Command View and even scanning for disk surface errors in its own internal logs with scripts we had made, and we were proactively ungrouping disks to keep the bad (older, tired) disks out of the system. It was an uphill battle after this, and the decision was made to wind down the EVAs and go to server-based SANs using Oracle Solaris ZFS, presenting the storage via COMSTAR/FC.
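To give an idea of the kind of scripting I mean, the sketch below shows the general approach: scan exported array event logs for surface-error entries and flag disks that cross a threshold as candidates for proactive ungrouping. The log format, disk-ID pattern, and threshold here are made up for illustration; they are not the actual Command View output we parsed.
[code]
#!/usr/bin/env python3
# Rough illustration of the proactive-ungroup idea: count surface-error
# events per disk in an exported event log and flag the worst offenders.
# Log format, field names, and threshold are hypothetical.
import re
from collections import Counter

SURFACE_ERR = re.compile(r'(media error|surface error|recovered read error)', re.I)
DISK_ID = re.compile(r'(enclosure \d+ bay \d+)', re.I)
THRESHOLD = 5   # flag a disk after this many surface-error events

def flag_suspect_disks(logfile):
    errors = Counter()
    with open(logfile) as f:
        for line in f:
            if SURFACE_ERR.search(line):
                m = DISK_ID.search(line)
                if m:
                    errors[m.group(1).lower()] += 1
    return [disk for disk, count in errors.items() if count >= THRESHOLD]

if __name__ == '__main__':
    for disk in flag_suspect_disks('eva_events.log'):
        print('candidate for ungrouping:', disk)
[/code]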
This has worked well from then until now; we probably have 40-45 machines running Solaris, each with either 100 x 10K SAS drives or 20-22 SAS SSDs, always in RAID 10.
We have a single point of failure: if one of these machines reboots or fails for any reason, all dependent client servers are down. We mitigate this by running mdraid on the blades for customers that need that type of availability, but in any event this is all a hassle and a pain.
I am now thinking of looking at a 3Par 84xx and tiptoeing into the 3Par world. I know I'm late, but I want to know if I can still do this.
My goals are as follows:
* I need to have, at the end of this project, 5-6 3Par SANs to minimize blast radius.
* I need some of them to be somewhere on the order of 30-50TB, 100% SSD
* I will need some of them to be around 80-100TB, 10K SAS and maybe SSD (if that auto-distribute block feature, i.e. sub-LUN auto-tiering / Adaptive Optimization, is actually real and works)
* I will then need 1 x 3Par, preferably an older one (7400, etc.), 100% full of 10K SAS, that would be used as a replication target for *all* of the others.
Ideally I would be able to keep replication fully synchronous rather than async, but if it really kills source performance I could survive with async.
With EVA we used to do this: for every 6 EVA8400s we had, we would replicate to another 8400 with 600GB drives. We used async for most vdisks because sync would slow down the source array (it was the IOs against the primary EVA's drives that increased latency to client workloads, not the FP ports being overloaded).
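For anyone weighing the same tradeoff: the textbook cost of sync is the acknowledgement path, since every host write has to wait for the remote copy to commit before it is acked. The back-of-the-envelope sketch below shows that effect with made-up numbers (not EVA measurements); in our case the bigger factor was the extra IO load on the primary's disks, but the ack math is why sync is never free.
[code]
# Illustrative arithmetic only; latencies are invented, not measured.
local_write_ms  = 4.0    # commit on the primary array
remote_write_ms = 6.0    # commit on the (slower) replication target
rtt_ms          = 1.0    # round trip between arrays on the FC fabric

# Async: the host is acked after the local commit; replication trails behind.
async_ack_ms = local_write_ms

# Sync: the host is acked only once the remote copy has also committed.
sync_ack_ms = local_write_ms + rtt_ms + remote_write_ms

print(f"async ack ~{async_ack_ms:.1f} ms, sync ack ~{sync_ack_ms:.1f} ms")
# On top of the ack latency, every replicated write is an extra IO stream
# the arrays have to service, which is where we actually felt it on the EVAs.
[/code]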
So I guess my questions are:
Am I too late to the 3Par club?
We do not have the budget to go and buy a brand new Primera.
We do, however, have the budget to get several refurb/good-used 3Par 8400s.
Do I really need a "4N" setup, or can I be OK with multiple 2N systems?
I don't like large blast radius situations, i.e. I am not the type of person who will say, "OK, let's build a giant 8400 with 4N and >400 drives and trust that we'll survive." I once had an EMC CX4-960 with >700 drives go sideways, and that is an experience I refuse to relive.
Thank you very much for any advice
Much appreciated