Expedient: The Podcast | Transcript: Your Disaster Recovery Doesn't Have To Be A Disaster

Your Disaster Recovery Doesn't Have To Be A Disaster

November 18, 2024 / 24:43/E11

00:00:00:00 - 00:00:04:13
Speaker 3
Hello, everyone. My name is AJ Kuftic. You didn't hear that before.

00:00:04:13 - 00:00:23:07
Speaker 3
You get to hear it now. I'm field CTO with Expedient and the person who doesn't know how to work a mute button. We are a full stack cloud service provider. We've been in this business for 20 years. We've been focusing a lot on the disaster recovery side of things. It has been something we've done for about 15 years now and is something that we find really, really important.

00:00:23:09 - 00:00:44:02
Speaker 3
But what a lot of people are looking at when they think of disaster recovery are these, you know, natural disasters: things like tornadoes, earthquakes, hurricanes, those sorts of things. How do we make this easier and better? And what a lot of folks see is they think it's just a one-time thing. But the big thing that happens is ransomware attacks.

00:00:44:04 - 00:01:12:12
Speaker 3
I'm just going to quickly go through this because I think we can get there faster. But when we look at ransomware attacks, the bigger change that we see is that these are not one-time things that happen. These are things that we actually need to clean up after. They're not easy to see. And this is a big shift for organizations to think about, because a lot of organizations are built around natural disasters.

00:01:12:14 - 00:01:33:03
Speaker 3
Bad thing happened, and now I need a failover. Now we've cleaned up from the bad thing and we go back. Ransomware attacks are not that way. And what we're seeing from ransomware attacks is you have 386 ransomware attacks in just healthcare just this year, per the American Hospital Association. That's huge, right? And by the way, those numbers are from like August.

00:01:33:09 - 00:01:58:23
Speaker 3
We've had two and a half months since then. That number's definitely gone up. But now you have 4,611 ransomware attacks in 2023. That's from SANS. That means there's a high likelihood that your organization might be part of that number. I hope it wasn't, but it might be part of that number. And as we start to see this shift, it's a big change for a lot of organizations to have to consider this as part of their disaster recovery strategy.

00:01:58:23 - 00:02:20:25
Speaker 3
And a lot of them are using just backups to do that recovery. That's a big problem. When we have this, you know, shift of trying to recover quickly, they're only using backups to do that recovery. And so you have to wait for things to actually get pulled back in from backup storage into production. So it takes a really long time.

00:02:20:27 - 00:02:54:04
Speaker 3
High likelihood of attack. Very slow recovery process. Those things don't mesh. And we also see that the recovery itself isn't the same. Natural disasters happen quickly and then dissipate. Right? Tornado comes in. There was before the tornado. The tornado happened. We failed over. We clean up from the tornado. We go back, right? The tornado dissipates. But ransomware is a lot like mold from the rain from the storm that gets into the walls, that starts to affect things and starts to have an impact on the health of the people that are in the house.

00:02:54:06 - 00:03:16:18
Speaker 3
And so to get rid of that, that's not something that's easy to do. They want to get in and infiltrate. They want to get in and take hold. And so what you'll see is attackers go laterally. They try to get into the DR environment. They want to get into the platforms to get in and stay there, so that even if you do kick them out of production, they're still sitting there in DR.

00:03:16:18 - 00:03:41:16
Speaker 3
And then when you recover, you're just bringing them back to life. This actually happens. Cyber insurance and a lot of incident response teams will halt the recovery process until they can verify that the bad guys have been removed, that there's no attackers, that there's no foothold to gain. Because if you do all the recovery and the attackers are still there, they're just going to re-encrypt everything, and you don't have a third place to recover to.

00:03:41:17 - 00:04:04:03
Speaker 3
So how do you know that your space is clean? How do you know that that's what's going on? And how do you know that that's a place where you can recover to, and know that there's not going to be an attacker who has a foothold there? The other thing we've seen is a lot of organizations want to change platforms, and that's definitely not the same when you're looking at a VMware shift to a hyperscale cloud.

00:04:04:06 - 00:04:34:02
Speaker 3
That's huge in terms of shifting the concept of an IT strategy and in terms of shifting a DR strategy as well. When you're utilizing, you know, Zerto to replicate your VMware VMs from one platform to another, that's pretty straightforward. VMs here, VMs there. We bring them up. Everybody's good. In a hyperscale environment, you start to utilize native services like Lambda or Kubernetes services from those platforms, where the cloud provider is managing that disaster recovery.

00:04:34:05 - 00:04:52:23
Speaker 3
How do you handle that? Do you need to make sure your data is available somewhere, or are you going to need to re-spin up images somewhere? Do you need to have your repository available in multiple places for your images or your scripts, or where things run from, where your code lives? That's a big, big shift from the way that you're doing things today.

00:04:52:25 - 00:05:12:21
Speaker 3
And so it requires you to re-architect how you do a recovery. And so that can also be a challenge and a lot of time-consuming effort just to do the same things you're doing today. But then also, if you're shifting just your hypervisor platform from something like VMware to Nutanix, and a lot of organizations are looking at that shift, that requires a capital purchase, right?

00:05:12:21 - 00:05:30:20
Speaker 3
I'm going from a three-tier, these are my hosts, this is my storage. I'm going to a hyperconverged platform. Well, now I need to have a second hyperconverged platform so I can replicate from point A to point B. So how does that not only impact your DR strategy, but your platform shift strategy in order to do that?

00:05:30:22 - 00:05:56:12
Speaker 3
That's a lot of stuff to think about. But what we're also finding now is that the math isn't the same either. All of those licensing changes that we talked about, and this isn't just one particular vendor. A lot of things have moved from that perpetual license to that subscription licensing, and that means that you are going to have a lot of your costs tied into running things in a DR environment that are never used.

00:05:56:14 - 00:06:13:06
Speaker 3
And so we actually did these numbers. I ran through these numbers. We're protecting 100 VMs and 30TB of storage. It's not a huge environment, right? It's big if you're a small environment. But for a lot of organizations who have 300 or 400 VMs, 100 VMs and 30TB of storage is maybe a percentage of their environment. Some people do.

00:06:13:06 - 00:06:38:24
Speaker 3
They have 400 VMs and they only protect 100 of them. So this is a fairly common thing that we see when we start talking to clients about DR. And doing that monthly yourself is somewhere between $15,000 and $16,000. And that is a large number for something you hope to never use. And this includes hardware, software, power, and data center space. And what a lot of organizations think they're doing to save money is they do what I call hand-me-down hardware.

00:06:38:24 - 00:07:07:07
Speaker 3
I bought new stuff for production. I put the old gear into DR. And now I'm running it there just fine. But you're now on less performant hardware. And for most of you who've been in IT for a long time, you know that old hardware equals higher failure possibilities. So now you're putting things in place, more or less, to check a box that you have a place to recover to, but you're not really doing the recovery. Because we're a VMware Pinnacle partner,

00:07:07:07 - 00:07:30:00
Speaker 3
We can do things like not charge for VMware licensing until you use it. That is a massive savings in terms of the amount that you're spending per month. And we can also do things like not charge for compute until you actually fail over. So that can also be a huge cost savings to you and your organization. So you really only focus on how much data you're replicating over, and not the data and the compute and otherwise.

00:07:30:02 - 00:07:46:25
Speaker 3
But the one that really sinks in is that most organizations do not consider the labor time, because at a certain point, somebody has to do the replication, somebody has to configure all that. Your second site, you have to patch and maintain and manage all that. You also have to make sure that your DR plans are up to date.

00:07:47:01 - 00:08:03:24
Speaker 3
That requires somebody to think about that. As much as, you know, it's just part of the job, yes, but it consumes time, just like patching does. I know that you're going to patch this month and next month and the month after that. That's all time that you have to consider. That is just baseline before you even get to:

00:08:03:28 - 00:08:22:05
Speaker 3
Here are the projects we're working on for ourselves. And also here are the projects we're working on for the business. Right. All of this time down here at the bottom of just keeping things going and keeping the lights on is what a lot of organizations just do not consider as part of the costs. But we do. And all of our costs include our services and our people's time.

00:08:22:07 - 00:08:46:16
Speaker 3
And when you include labor time, this gets worse. And so now you're talking about $145,000 to maintain all of this and to do the protection here that we can do for a fraction of that cost and include all the people time. And this is a big cost. That's $145,000. That's easily a full-time employee, if not one and a half to maybe even two in certain instances.

00:08:46:18 - 00:09:08:07
Speaker 3
That is a big savings that you can see from Expedient that allows you to actually focus on more important things, like, you know, making your business more agile, or actually making sure that platform shift goes well, or making sure that you're eliminating the technical debt. Those are all things that you can do because you have the time and capability to go do those things.

00:09:08:10 - 00:09:34:10
Speaker 3
But the bigger thing that we're seeing is that ransomware attacks require more than just replication. They require a combination of multiple platforms. You have workload replication where we're able to land the workloads on the secondary compute. So I've now got a place to land workloads in a clean space, and I'm doing that constant replication. So I have the ability to recover faster.

00:09:34:12 - 00:09:53:21
Speaker 3
But I have this much shorter term. In fact, most of the platforms that you'll see that do replication, things like Zerto and Nutanix, the software has a journal period of a few days. Right? The idea is that you have a bunch of little granular recovery points, but it's very short, right? Because it's designed for a fast failover.

00:09:53:21 - 00:10:17:00
Speaker 3
I'm going back to a single point in time that's very recent, because, think, you know, the natural disaster things. So you can do failover and fail back because the amount of data that's changing between those individual points is very small. But if your attacker has been sitting there for more than a few days, like they had in the case of National Auto Care when we talked to them, now my journal is a few days, but the attacker has been there for a month.

00:10:17:04 - 00:10:42:19
Speaker 3
They're in every single one of those recovery points. So to go into a data protection platform and be able to combine those two things together, now I have the ability to land my data on secondary storage. It's a little bit of a slower recovery, although with our platform we can bring things up very quickly and do instant recoveries, and while the workload is running, we can start to move that back into production storage. But it gives us a longer recovery point window.

00:10:42:21 - 00:11:02:11
Speaker 3
Right? I have the ability to go out to a month or two months, depending on how you want to retain your data and have the ability to recover back in time to before the attacker got in, or back in time to before things went south. And then we can still do test and live failover with that data onto the same compute that the workload replication is doing.
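
The journal-versus-backup reasoning above can be sketched in a few lines of Python. This is only an illustration, not Expedient's tooling: the function name choose_recovery_source and the specific windows (a 3-day journal for "a few days", 60 days of backup retention for "a month or two months") are assumptions made for the example.

```python
from datetime import datetime, timedelta


def choose_recovery_source(now, intrusion_start, journal_days, backup_retention_days):
    """Pick the copy that still holds a recovery point from before the intrusion.

    All names and windows here are hypothetical, for illustration only.
    """
    journal_oldest = now - timedelta(days=journal_days)
    backup_oldest = now - timedelta(days=backup_retention_days)

    # The journal only helps if its oldest point predates the attacker's arrival.
    if journal_oldest < intrusion_start:
        return "replication journal"
    # Otherwise fall back to the longer-retention backup copy.
    if backup_oldest < intrusion_start:
        return "backup"
    # The attacker has been present longer than anything we retain.
    return "none"


now = datetime(2024, 11, 18)
# Attacker arrived 2 days ago: the short journal still has a clean point.
print(choose_recovery_source(now, now - timedelta(days=2), 3, 60))
# Attacker has been present for a month: only the backup reaches back far enough.
print(choose_recovery_source(now, now - timedelta(days=30), 3, 60))
```

The design point is the same one made in the talk: the fast-failover journal and the longer-retention backup cover different dwell times, so a ransomware plan needs both.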

00:11:02:13 - 00:11:23:08
Speaker 3
This allows us to recover from those long-term infiltrations, and it also allows you to do things like test and dev resources, where I have all of my, you know, capabilities to bring things up, like, hey, this is where I'm going to run my test workloads, or I can recover into live, or I can do a test bubble where I can recover and do things, test against database schemas, that you can't do in production.

00:11:23:11 - 00:11:49:20
Speaker 3
You can also do things like run secondary production workloads like Active Directory domain controllers, DNS servers, SQL boxes that sit there and do application-level failover and not need to recover the whole machine there. So there's a number of different ways that that works. The other thing that we can do there is ensure that business continuity just happens and goes into a clean space that we know is good.

00:11:49:21 - 00:12:09:12
Speaker 3
Our infrastructure is away from your infrastructure. So the only thing that an attacker could be in is the replicated workload. So we know that we're recovering into a known good space. And that allows your cybersecurity and your insurance people to calm down so that when you need to do a recovery, it happens very quickly. So what does this look like?

00:12:09:17 - 00:12:28:20
Speaker 3
We maintain the hardware, hypervisor, management plane, replication, recovery plans, and protection groups. This is available in our East and West regions. This is a managed service from us, and we deliver it with dedicated resources per host or in our shared VMware environment. That's our enterprise cloud. We do it at a per-gig-of-RAM cost, so we can do this in multiple different ways.

00:12:28:20 - 00:12:52:07
Speaker 3
If you're a smaller environment, you've got 10, 20, 30 VMs, our shared VMware environment can be a really cost-effective method for doing that. And once you're above 75 VMs, our dedicated private cloud environments can be an easy way to onboard and get a cluster that looks just like what you're doing on-prem today, but do it in a secondary data center in a space that's already being fully managed, so you don't even have to think about it.

00:12:52:09 - 00:13:09:11
Speaker 3
And when we add in replication and management, that's a per VM cost that allows you to granularly assign what you're going to recover. So I have 400 VMs and I only want to protect 100. You're only paying for those 100. You're not trying to protect the entire world. So that allows you to be very granular and thoughtful about what you're doing.

00:13:09:13 - 00:13:29:15
Speaker 3
All of this is landing on all-flash, high-performance storage. We've done the calculations. All-flash versus spinning disk, cost-wise, to us is negligible. So we provide the better option because it's the better thing to do. All of this is done through single-pane-of-glass management. You're logging into vCenter or you're logging into Prism Central to see those workloads in a Nutanix environment.

00:13:29:15 - 00:13:48:26
Speaker 3
We connect the two Prism Centrals together, so you see it across the board. From a VMware side, you can see through the Zerto interface what your two sides look like, move your recovery plans around, and point in that direction. And when we want to add in things like backup as a service, which is the correct thing to do for protecting against a ransomware attack, we have multiple options to do that as well.

00:13:48:26 - 00:14:09:06
Speaker 3
So we can do per tenant in a multi-tenant environment, which is where we see a lot of our clients landing because they're protecting their VMs. But we also have clients who are utilizing dedicated clusters where they want to take advantage of more of the feature set of our data protection platform for things like antivirus scanning, or they want to have this dedicated environment that they can land even on premises.

00:14:09:09 - 00:14:29:19
Speaker 3
And we could do things like remote clusters, where that cluster runs on premises, protects your workload and replicates back to us, and can be utilized with our DR platform to protect across the board. We take care of the entire platform. So you can focus on, is everything in here being protected that needs to be protected, and not focus on, okay, did my job run?

00:14:29:24 - 00:14:52:16
Speaker 3
Did we get the right update? Did we make sure that everything was interop tested? We take care of all of that. But the biggest thing is that we protect the workloads across multiple levels. We can handle VMware workloads, Nutanix workloads, and even low-priority test/dev or longer-term protection with our cloud data protection platform, having multiple RPO and RTO options to actually get you to where you want to go.

00:14:52:18 - 00:15:09:12
Speaker 3
But I think the biggest thing, and I know that we sometimes talk about this as part of it being a service, but really what it comes down to is you're protected by the best. Anthony Jackman, who you see up here, is our chief innovation officer. He's in the top right corner there. He's been part of the webinars that I've done with him.

00:15:09:12 - 00:15:25:18
Speaker 3
Sharma degrees in the bottom left corner is our president, is a president of our board and one of the co-founders. We have been doing this for a very, very long time. We've assisted clients on hundreds of DR tests just this year alone. I think we've done about 150 tests this year. And only four of them have been marked as failed.

00:15:25:18 - 00:15:51:09
Speaker 3
And most of those were based on, hey, this VM didn't make it into this protection group, or this application didn't actually come up. The VM came up but the application didn't. This is a huge success rate for us. We want to ensure that you are successful, and we will assist you with tests as part of our Premier Care program. We've also successfully recovered clients from ransomware attacks, so we can handle things like, hey, there's this DR event that's happening for you.
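
As a quick check on the test figures quoted above (about 150 tests this year, four marked as failed), the success rate works out as follows. The numbers are the speaker's approximate figures, so the result is approximate too.

```python
# Figures as quoted in the talk; "about 150" is approximate.
total_tests = 150
failed_tests = 4

success_rate = (total_tests - failed_tests) / total_tests
print(f"{success_rate:.1%}")  # 146 of 150 tests passed, roughly 97.3%
```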

00:15:51:10 - 00:16:08:25
Speaker 3
We sit on the phone with you. National Auto Care, I keep coming back to them, but it's a really great story. We had somebody from our support team on the phone with them for 16 hours, helping them do recoveries. We got them back up in 48 hours, and most of that 48 hours was the FBI doing analysis and trying to make sure that all the attackers were out.

00:16:08:28 - 00:16:31:00
Speaker 3
We sit down and work with you as part of our delivery process to ensure that your networks, your workloads and recovery plans are all set to ensure it works right the first time. This is a big difference between running a cluster somewhere else and actually utilizing a service from a DR provider. And with Premier Care, we'll even sit down with you and review your recovery plans in an ongoing fashion to ensure that new workloads get added.

00:16:31:02 - 00:16:46:21
Speaker 3
I have definitely been a part of multiple DR tests where somebody went, hey, why didn't that application come up? Oh, because it didn't get added to the recovery group. And then we have to write that down and then make sure that it gets in there. And every single time, somebody would miss a VM or something would get misconfigured.

00:16:46:22 - 00:17:02:09
Speaker 3
We make sure to sit down and kind of work with you to ensure, hey, do you know that all of those are in there? Are you keeping up with those? Because your environment's not static and neither is ours. So we want to make sure that everything is protected. What we do here is we simplify recovery with DRaaS. We offer the hypervisor choice.

00:17:02:09 - 00:17:24:06
Speaker 3
So if you're switching platforms, we have a DR option to meet that need. But also, DIY is no longer the cheapest option. And you can leverage our Pinnacle partner status to save on those VMware licenses, because we don't have any cost until you fail over. We can also offload the maintenance and support for something you hope to never use, and we can integrate with co-location for those physical workloads.

00:17:24:06 - 00:17:49:08
Speaker 3
So I have a physical, you know, VPN concentrator. I have physical database servers, things that are hard to do in a virtual environment. We can land that in co-location and directly connect it in. It's been really, really powerful to do this for the last 15 years, and we continue to improve that in an ongoing fashion. But if you have any questions, up top here, you can scan this QR code to submit a question.

00:17:49:08 - 00:18:10:07
Speaker 3
We did get some really great questions in the pre-registration. So I think we can bring up that first question. We'll go through those ones, and if you ask questions, you can add those in. Can you talk about network redundancy in regards to the Expedient data center solution? So what we do is we have our 100-gig connectivity from our data centers.

00:18:10:07 - 00:18:38:09
Speaker 3
We also offer multiple carrier options from our data centers for on-prem customers. So if you have a carrier that you're using, it's probably in our data centers. But as part of the process of talking with us, we can go through and verify that that's the case. But the big thing that we do is we have the ability to do redundancy and full network failover between our data centers, so we can fail over the external IPs and avoid DNS being a problem.

00:18:38:12 - 00:18:58:23
Speaker 3
Or having to re-IP in that space. The big thing that we're also seeing is that for most clients, when they want to do a failover, the actual platform doesn't handle that well. They didn't write the code for it. There's a hard-coded IP. So from a networking standpoint, we've been doing this for a very long time.

00:18:58:23 - 00:19:16:05
Speaker 3
And we work with you to make sure that everything is built out to an enterprise-grade standard, but also in a redundant fashion, so that when you do that failover, it does work right. But we also have multiple network paths out of all of our data centers. And we build things to very, very, very high standards. So you also get to take advantage of that.

00:19:16:07 - 00:19:42:14
Speaker 3
Next question. How does Expedient's DRaaS solution compare against NC2? Well, NC2 is a product from Nutanix. For those of you who are not familiar with Nutanix, think VMC on AWS, where you get a cluster that is running in Azure or AWS. And the big thing around that is it is a cluster that is deployed, and then it's on you to maintain it from there.

00:19:42:17 - 00:20:00:05
Speaker 3
Right? So you have to do all the patching. You have to build out all the networking. You have to build all the recovery plans. You have to build out the entire DR strategy on that platform. So while it's great that you get a cluster, you have to do all the hard work, which is really what we're here to help with.

00:20:00:08 - 00:20:17:26
Speaker 3
I think the bigger side of this is it's also very expensive. You're running on hyperscaler bare metal nodes. So that can be an incredibly high cost, and you are paying for them all the time. There's no way to shave down the licensing that you are utilizing on that platform. You don't get to burst onto that platform.

00:20:17:26 - 00:20:37:11
Speaker 3
So you are paying for hot licensing all the time there. So that can also be very expensive. It is a good product in terms of being a cluster that you can use. But the difference is that it is a cluster that you are replicating to versus a service that is helping you. And I think that's the biggest difference of all.

00:20:37:14 - 00:21:02:22
Speaker 3
What is the best scenario to test during a DR exercise? I think the one that I've been talking about here for a while, ransomware, is the best one to test, because it's a combination of doing an active failover with your replication technology of choice, but also recovering from backups in certain instances, testing that you can go back, and testing both of those things, because most people don't test their backups.

00:21:02:22 - 00:21:21:29
Speaker 3
That's another fun thing that most people don't do, doing like test restores. But being able to say that you know you can do a failover and you know you can do a recovery from backup means that you have a very high confidence level that, in the event of a disaster or an attack, you will be able to get things back up and running.

00:21:21:29 - 00:21:40:18
Speaker 3
So I think doing a test like a ransomware scenario provides the opportunity to do that, versus just one where it's, hey, a tornado came through. The one that everybody always loves to use is, production is a crater and you can't talk to it anymore, because you are starting the process of getting people to not think, I can go back to production to get something.

00:21:40:20 - 00:22:02:12
Speaker 3
The whole idea was that production is gone. In this particular instance, yeah, production's still there, but it's very compromised. And so to not infect things, we have to keep things clean. And so that's really the scenario to test there as well, just to keep people in mind of, like, hey, maybe production is still sitting there and it's still available technically, but you need to not touch it.

00:22:02:14 - 00:22:19:00
Speaker 3
And I think we have one last question here. Do you have any tools to aid in consulting with enterprises on their DR planning process? The answer is yes. That's our pre-sales process, actually. A lot of what we do is we sit down with you to understand what are the workloads you want to replicate. How do you want to do that?

00:22:19:07 - 00:22:34:07
Speaker 3
How quickly do you want to be back up and running? What are your RPOs? What are your RTOs? Because it's not just, here's resources, you go figure it out. It's us delivering this as a service. And so that requires us to do a little bit more of a deep dive, and that requires us to sit down and help you through that.

00:22:34:09 - 00:22:48:02
Speaker 3
We have done DR assessments. We can do some deep assessments now, but the big thing that we want to do is sit down with you and help you understand, where are your pain points now? And I think for a lot of people, it's one of two things. Either they've never tested before. That's pain point number one.

00:22:48:02 - 00:23:06:03
Speaker 3
You just haven't done it before. Or two, you've done DR tests, but this is the problem: it takes too long to get resources to get back up and running, or we are not sure that our backups are able to recover fast enough, or we're only recovering from backups and every time it takes us forever, and we want to go faster.

00:23:06:10 - 00:23:23:18
Speaker 3
So you generally have an idea of what the problem is that you're trying to solve. So I would say, really, come talk to us and we'll help you figure out where your DR strategy needs to go. But with that, I do want to offer: scan this QR code right here to get the Gartner Market Guide for DRaaS.

00:23:23:21 - 00:23:40:24
Speaker 3
This talks a lot about DRaaS providers. We are in there. We are marked as one of the leaders because we're really good at what we do. Right? We've been doing this for 15 years. We are actively working on making it better every single day. And we are really here to help people solve problems. We're not here to just sell a product.

00:23:40:24 - 00:24:02:17
Speaker 3
It's very easy to do that. When we're trying to solve problems for people, we want to take the time, understand it, and really be with you every step of the way. And that is a huge thing for us. And with that, I want to invite you all to our December 12th webinar, where we're going to do a look back at 2024, which has been multiple years in one year.

00:24:02:20 - 00:24:18:11
Speaker 3
It seems like January was literally decades ago. But it was really just the one year, so we're gonna look back at 2024 with our CEO, Bryan Smith, and our chief innovation officer, Anthony Jackman, to see kind of what happened this year. And where do we stand today and what do we see going forward into 2025?

00:24:18:13 - 00:24:39:00
Speaker 3
It's going to be really great. I'm very excited about this because it's always good to do some reflection on what was happening and where do we see things going and trying to sort of reset and catch our breath and move forward. So I hope to see you next month. Again, you can scan the QR code right here to get that Market Guide for DRaaS from our friends at Gartner.

00:24:39:02 - 00:24:42:10
Speaker 3
And we will see you next month. Thanks, everybody.

Creators and Guests

Host

AJ Kuftic

AJ Kuftic is Principal Product Strategist for Expedient. AJ has over 15 years of experience as a customer and partner helping end users build solutions that are sustainable and easy to manage. Having knowledge across various silos of IT infrastructure gives AJ a unique perspective of the pain points and what customers are looking to improve. When AJ isn’t thinking about the next big thing, he spends his time with his wife and 2 children trying to bake the perfect loaf of bread.
