Re: VMware Fault Tolerance
Posted: Fri Dec 05, 2014 3:10 pm
Schmoog wrote: For true zero RTO, you need a distributed application a la web server farm, Microsoft Exchange DAG, etc.
I have to agree with this.
I have done a ton of research on this and have implemented some of the solutions (the latest 6-12 months ago): Zerto, Double-Take, VMware Metro Cluster (stretched VMware cluster), and MS clusters within VMware, to name a few. None of them provided a zero RTO.
If you find a product that truly provides a zero RTO, please let all of us know.
The best I was able to do with a dumb application was ~10-20 seconds (from outage detection to app startup). Most of that time was the Windows 2008 VM and the application starting up. That was with a VMware stretched cluster, 0.6 ms latency between Brocade directors, and 40 Gbps of bandwidth between them (20 Gbps FC, 20 Gbps Ethernet) over dual fiber paths of 50 and 57 miles. Stretched layer 2 networking. And the cost...ouch. $1 million/yr for the dark fiber lease. $500k in network gear. 500+ man-hours, etc. Total cost was ~$5 million. Both the FC and Ethernet networks plugged into a Cisco 15454 DWDM. Storage was dual HP XP 24k arrays front-ended by an FC device that kept the storage 100% synchronized (160 TB).
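To put the distance in perspective, here is a rough back-of-the-envelope sketch of the propagation delay over fiber runs of those lengths. It assumes light travels through glass at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and ignores DWDM gear, switches, and anything else in the path, so treat the numbers as a physical floor rather than what you would measure. The function and constant names are just mine for illustration.

```python
# Rough propagation delay over fiber. Assumption: light in glass moves at
# about 200,000 km/s, i.e. 200 km per millisecond. Equipment delay is ignored.
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0


def fiber_delay_ms(miles, round_trip=False):
    """Return the propagation delay in milliseconds for a fiber run."""
    one_way = (miles * KM_PER_MILE) / FIBER_KM_PER_MS
    return 2 * one_way if round_trip else one_way


if __name__ == "__main__":
    for miles in (50, 57):
        one_way = fiber_delay_ms(miles)
        rtt = fiber_delay_ms(miles, round_trip=True)
        print(f"{miles} miles: {one_way:.2f} ms one way, {rtt:.2f} ms round trip")
```

The point is that a synchronous write has to wait for at least one round trip to the far site before it can be acknowledged, so the distance alone puts a hard limit on how far apart the two sides can be.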
With a smart application, you can put it behind a load balancer and have a VM at each site...but how would it keep its database in sync? Which side does the database live on? Can it stay up all the time? How fast does it fail over to a secondary system?
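On the "how fast" question: even a load balancer needs time to notice that a node is gone. Here is a minimal sketch of that detection window under a simplified model that I am assuming, not one taken from any particular product: probes go out at a fixed interval, a dead node never answers so each probe runs to its timeout, and the node is pulled after a set number of consecutive failures. All parameter names are hypothetical.

```python
def detection_window(interval_s, timeout_s, failures_needed):
    """Return (best, worst) seconds before a dead node is marked down.

    Assumed model: probes are sent every interval_s seconds, a dead node
    never replies so each probe fails after timeout_s, and the node is
    marked down after failures_needed consecutive failed probes.
    """
    if timeout_s > interval_s:
        raise ValueError("this sketch assumes timeout_s <= interval_s")
    if failures_needed < 1:
        raise ValueError("failures_needed must be at least 1")
    # Best case: the node dies just before a probe is sent.
    best = (failures_needed - 1) * interval_s + timeout_s
    # Worst case: the node dies just after a probe was sent and answered.
    worst = failures_needed * interval_s + timeout_s
    return best, worst


if __name__ == "__main__":
    best, worst = detection_window(interval_s=2.0, timeout_s=1.0, failures_needed=3)
    print(f"node marked down between {best:.1f} s and {worst:.1f} s after failing")
```

You can tighten the interval and the failure count, but the window never reaches zero, and requests sent to the dead node in the meantime still fail.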
Multi-CPU FT is coming, but again, if one OS or app gets corrupted, both copies are. It only protects against hard VMware host outages.
I remember hearing about an application that can mirror the entire system, even active memory, between servers. I can't remember what app it was...maybe Oracle.
A co-worker just pointed out that Tandem and possibly mainframes can provide a zero RTO.
He also pointed out that your hardware, OS, and all layers of the application ALL need to support it. And even then, if you are in the middle of a transaction and something happens, there is still a chance that it gets interrupted or lost.
In my opinion, you should go back to whoever requested this and reset their expectations.
Also, there might be some cloud options available. But I don't know if any of the providers will guarantee 100% uptime with zero RTO.
Good luck.
Scott