We recently migrated our bare metal databases to Google Cloud Platform. To do that, our Google rep recommended a tool called CloudEndure. I mentioned this to some peers and figured it would be helpful to share a review of the product more broadly.
In short: It’s pretty amazing and once it works it works wonderfully.
CloudEndure can replicate from bare metal or cloud to cloud for migrations or DR. They do a block level migration of the base drive. For moving data to the cloud that meant that once the data was there, at any time I could spin up a “replica” which would begin running as a rebooted server on the disk at the point in time that the last update was made. This made testing and completing a migration of two database servers many hours less work than it would have been to set up slaves.
Their support is very responsive and very helpful as well.
We didn’t have to pay for this service. Google provides it for “free” to get you into their cloud. I still had to pay for the resources used by the replicator in GCP, of course. I learned from another company that outside Google their cost is something like $115/instance/month with a minimum commitment of 10 instances. That’s pretty steep for ongoing disaster recovery.
There were some things I think they could improve:
- The sales pipeline was silly long. Like they felt they needed to keep me in the pipeline. I talked to two different sales people over the course of a week before they would give me access to the tool to start using it. I had already decided to “buy” when I contacted them.
- It took four days to get the initial 7TB of data migrated. They stated that was due to the limits of our outbound pipeline at our datacenter which may be true; I’ve never measured it.
- When we were writing at more than about 20MB/s the replicator lagged behind. Again, they said that they were limited by outbound bandwidth. Their UI wasn’t always clear what was going on, but their support was responsive.
- There were several quotas we had to increase in our GCP project to allow this to work. Fortunately, CloudEndure was responsive and let me know what the issue was the first time and Google increases quotas within about a minute of the request.
- Spinning up a replica can take a couple hours (it was 40 minutes in my first test and 2 hours in my final move.). I am sure it’s just because of all the data being moved around, but I was hoping for something faster. [As a note, when I later spun up new instances from snapshots of the migrated system in GCP it took about 20 minutes.]
- When I stopped replication it removed the agent from the source servers and cleaned up all the resources in GCP, but left the agent running on the replicas (where it had been migrated). I had to contact them to uninstall that manually.