VMware Cloud Disaster Recovery – Advanced Solutions Design

Previously I talked about VCDR (VMware Cloud Disaster Recovery) Solution Validation, and how to properly run a proof of concept.  The reality is that while it would be nice if all applications were as straightforward as our WordPress example, there are often far more complex applications requiring far more advanced designs for a disaster recovery plan to be successful.

External Network Dependencies

For many applications, even in modern datacenters, external network dependencies, or virtual machines which are too large for traditional replication solutions, can create challenges when building disaster recovery plans.

To solve for this, third-party partners may be used to provide array-based or other host-based replication services.  This solution requires far more effort on the part of the managed services partner or the DR admin.  Since physical workloads, or workloads too large for VCDR, cannot be orchestrated through the traditional SaaS orchestrator, there is an additional requirement to both test and fail over manually.  A Layer 2 VPN between the VMware Cloud environment and the partner location provides the connectivity for the applications running in both environments.
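As a rough illustration, the following is a minimal sketch of the kind of post-failover check a DR admin might run by hand for these externally replicated workloads, since they sit outside the SaaS orchestrator.  The hostnames and ports are hypothetical placeholders, not values from any VCDR or partner product.

```python
# Minimal sketch of a manual post-failover check for workloads that sit
# outside the VCDR SaaS orchestrator (array- or host-based replication).
# All hostnames and ports below are hypothetical placeholders.
import socket

# Application endpoints expected to answer across the Layer 2 VPN after
# a manual failover to the partner location.
ENDPOINTS = [
    ("app-db.partner.example.local", 3306),
    ("app-web.partner.example.local", 443),
]

def check_endpoint(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "OK" if check_endpoint(host, port) else "UNREACHABLE"
        print(f"{host}:{port} -> {status}")
```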

Script VM

For more complex VM-only environments, some scripts may need to be run during the test and recovery phases.  Similar to bootstrap scripts for operating system provisioning, these scripts may be used for basic or even more advanced configuration changes.

The script VM should be an administrative or management server which is the first VM started in the test or production recovery plans.  Separate VMs may be designated for testing versus production recovery as well, enabling further isolated testing, one of the biggest value propositions of the solution.  Specific documentation is located here: Configure Script VM.
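To make the idea concrete, here is a purely illustrative sketch of the sort of configuration change a script VM might apply during a test or production recovery.  The file path, hostnames, and the test/recovery split are assumptions for this example, not anything prescribed by the VCDR documentation.

```python
# Purely illustrative sketch of the kind of configuration change a script VM
# might apply during a test or production recovery: pointing a recovered app
# at the database address used in the DR environment. Paths and hostnames
# are hypothetical, not part of the VCDR documentation.
import argparse
from pathlib import Path

def repoint_database(config_path: Path, dr_db_host: str, prod_db_host: str) -> None:
    """Swap the production database hostname for the DR one in an app config."""
    text = config_path.read_text()
    config_path.write_text(text.replace(prod_db_host, dr_db_host))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recovery-time config fix-up")
    parser.add_argument("--mode", choices=["test", "recovery"], required=True)
    args = parser.parse_args()

    # A test recovery might target an isolated copy of the database,
    # while a production recovery targets the replicated production database.
    dr_host = ("db01.test-bubble.example.local"
               if args.mode == "test"
               else "db01.dr.example.local")
    repoint_database(Path("/etc/myapp/app.conf"), dr_host, "db01.prod.example.local")
```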

Extending into VMC on AWS

With more complex architectures, it often makes sense to consider a mixed DR scenario.  In these cases, moving some of the applications to the VMC on AWS environment to run permanently, or leveraging another method for replicating outside the traditional VCDR SaaS orchestrator, may be warranted.  While this does present some risk, since these workloads are not tested with the rest of the VCDR environment, it does provide additional options.

With the recent addition of Cloud to Cloud DR, more options were made available for complex disaster recovery solutions.  Once an environment has been migrated full time to VMC on AWS, VCDR can be leveraged as a cost-effective backup solution between regions without a need to refactor applications.

Even in advanced DR scenarios, the VCDR solution is one of the more cost-effective and user-friendly options available.  With the simplicity of the VMC on AWS cloud-based interface, and policy-based protection and recovery plans, even more complex environments can take advantage of the automated testing and low management overhead.  The best and most impactful DR solution is the one which is tested and which will successfully recover in the event it is needed.


VMware Cloud Disaster Recovery – Solution Validation

In the previous post, I talked about the VCDR (VMware Cloud Disaster Recovery) solution overview.  The best way to determine if a solution is the right fit is to validate the solution in a similar environment to where it might run in production. For this post, the focus is on validating the solution and ensuring a successful POC (Proof of Concept) or Pilot.  As always, please contact me or your local VMware team if you would like to hear more, or have an opportunity to test this out in your environment.

Successful Validation Plans

The key to any successful validation plan is proper planning.  For testing, it is always best to limit the scope to two or three use cases at most.  In the case of VCDR, the following tests generally make the most sense.

  • Successfully recover one or two Windows or Linux web servers – Web servers are usually fairly simple to test initially.  Linux servers are generally faster to build and the licensing is open source, making for a good test case.
  • Successfully recover a 3-tier app – Something such as WordPress running on two to three Linux VMs (a web server, an app server, and a database server) is often a good candidate since it is simple to set up and results in a set of virtual machines which must be connected or the app will not work properly.
  • In addition to, or as an alternative to, the 3-tier app, any similar internal application which is a copy of production or a development system could be leveraged for testing.

The purpose of the test is to demonstrate replication of the virtual machines into the DR (Disaster Recovery) environment; the actual application is less relevant than validating the functionality of the solution.
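As a hedged example, a simple functional check like the following could serve as the pass/fail evidence for a recovery test: confirm the recovered web front end answers over HTTP.  The URL is a placeholder for whatever the test application exposes in the DR environment.

```python
# A small, illustrative check that could be run after a recovery test:
# confirm the recovered web front end answers over HTTP.
# The URL is a placeholder for whatever the test app exposes in the DR bubble.
from urllib.request import urlopen
from urllib.error import URLError

def app_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the recovered application answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except URLError:
        return False

if __name__ == "__main__":
    url = "http://wordpress-web01.dr-test.example.local/"
    print("recovered app reachable:", app_is_up(url))
```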

Setting up the “on premises” environment

It is critical that a POC never connects to production. POCs are very much meant to be a demonstration of how things might work within a lab. The POC environment is for a finite period, typically 14 days or less, just enough to demonstrate a few simple tests.

The lab setup for this should be very simple. A single vSphere host, or a very small isolated cluster, will suffice, with a vCenter instance and the test application installed. A small OVA will be installed in the environment as part of the POC, so there should be sufficient capacity for that as well.

One of the most critical prerequisites to be addressed before beginning is network connectivity. For most POCs it is recommended to use a route-based VPN connection to isolate traffic, although a policy-based VPN could work.  This will generally require engaging the network and firewall teams to prepare the environment.

Protecting the test workloads

The test cases above should be agreed upon.  The following is a formalized test plan that will be included in the POC test document.

  1. Demonstrate application protection and DR testing.  This will be accomplished by the following.
    1. Protect a single 3 tier application such as WordPress or similar from the lab environment into the VCDR environment.
    2. Complete up to 2 Full DR Recovery Tests and demonstrate the application running in the VCDR Pilot Light Environment.

The POC is very straightforward.  Simply deploy the VCDR Connector OVA to your lab vCenter, register the lab vCenter with the VCDR environment, and create the first protection group.

In the case of a POC, there will only be a single protection group.  We will add the three WordPress virtual machines to our demo using a name pattern based on how we named them.
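For illustration only, the name-pattern idea works much like a wildcard match over the VM inventory.  The sketch below assumes the three VMs were named with a "wordpress-" prefix; the names and pattern are examples from this POC, not required values.

```python
# Minimal sketch of name-pattern membership, similar in spirit to how a
# protection group can select VMs by naming pattern. The VM names and the
# "wordpress-*" pattern are examples from this POC, not required values.
from fnmatch import fnmatch

inventory = ["wordpress-web01", "wordpress-app01", "wordpress-db01", "jumpbox01"]
pattern = "wordpress-*"

members = [vm for vm in inventory if fnmatch(vm, pattern)]
print(members)  # ['wordpress-web01', 'wordpress-app01', 'wordpress-db01']
```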

Creating a DR plan requires mapping resources from the protected site, the lab in this case, to resources in our cloud DR environment.

A key decision point is the virtual network mapping.  You can choose to use the same network mappings for failover and testing.  In the POC we can use the same networks, but for a production deployment we want to ensure they are separate so we can run our tests in a bubble without impacting production workloads.
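A conceptual sketch of that decision, with example network names, might look like this: keep two separate mappings so a test recovery always lands in an isolated bubble segment.

```python
# Conceptual sketch of keeping test and failover network mappings separate,
# so DR tests run in an isolated "bubble" network. Network names are examples.
NETWORK_MAPPINGS = {
    "failover": {"lab-app-network": "sddc-app-network"},
    "test":     {"lab-app-network": "sddc-app-network-test-bubble"},
}

def target_network(source_network: str, mode: str) -> str:
    """Resolve the destination network for a given recovery mode."""
    return NETWORK_MAPPINGS[mode][source_network]

print(target_network("lab-app-network", "test"))      # isolated test segment
print(target_network("lab-app-network", "failover"))  # production segment
```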

Once we are all set up, the last thing to do is replicate the protection group, and then we can run our failover testing into the VMC on AWS nodes connected as VCDR Pilot Light nodes.

While this is fairly straightforward, the key to any successful POC is to have very specific success criteria.  Be sure to understand what you want to test and how you will show a successful outcome.  Provided the 3-tier app model fits your business, this is a great use case to start with to validate the solution and get some hands-on experience.  For more practice, check out our Hands-on Lab, https://docs.hol.vmware.com/HOL-2021/hol-2193-01-ism_html_en/, and be sure to come back as we continue to look at VMC on AWS, the direction the cloud is going, and the future of VMware.


VMware Cloud Disaster Recovery – Solution Overview

In November of 2020, I changed roles at VMware to join the VMware Cloud on AWS team as a Cloud Solutions Architect.  Going forward, I intend to work on a few posts related to the VMware Cloud product set, and cloud architecture.  I am a perpetual learner, so this is my way of sharing what I am working on. I welcome comments and feedback as I share. Many of the graphics in this post were taken from the VMware VCDR Documentation.

To start with, I wanted to focus on VCDR (VMware Cloud Disaster Recovery), based on the Datrium acquisition.  To be clear, this is not marketing fluff or a corporate perspective; this is my personal opinion based on the time I have spent working with VMware customers on the VCDR product.  I promise this may sound like marketecture, but this article lays the important foundation for the next several.

The Problem

DR (Disaster Recovery) is not an exciting topic.  It is basically the equivalent of buying life insurance; you know you should do it, but it is usually a low priority, until it isn’t.  We often think of disasters as fire, flood, earthquake, and other natural disasters, but recently malware has become the largest problem requiring a good DR plan.

When a system, or several systems, are compromised, it is likely that not only the file systems are affected, but also the backups, mount points, and even the DR location.  Prevention is the best way to solve this issue, but assuming you are attacked, a good DR plan is critical to restoring services quickly and securely.

The Overview

VCDR uniquely solves this problem with immutable (read only/unchangeable) backups and continuous automated compliance checking.   

The biggest challenge is that when we do backups, we are generally appending changed blocks, which makes for a far more efficient backup solution. This lowers the cost and the time to back up while still providing a point-in-time recovery solution. When the backup is compromised, the best case with this approach is to go back to a point before there was malware, assuming that copy is not somehow infected as well.

VCDR solves this problem by creating an immutable point in time copy of the data. Since each point in time copy is isolated from the others, malware cannot infect the previous points. The system can then pull together all the partial backups to make what appears to be a full backup at any given point. Since this is all being handled as a service, recovery is near instant, and the recovery admin can recover from as many points as needed to find the best point to restore to. 
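To illustrate the idea, and only the idea, here is a small conceptual sketch of how immutable, incremental point-in-time copies can be composed into what looks like a full backup at any chosen point, without ever modifying earlier copies.  This is not how VCDR is implemented internally; it simply demonstrates the principle described above.

```python
# Conceptual sketch only: a chain of immutable, incremental point-in-time
# copies can be synthesized into a "full" image at any point, without ever
# modifying earlier copies. Illustration of the idea, not VCDR internals.
from types import MappingProxyType

def record_point_in_time(changed_blocks: dict):
    """Store only the blocks that changed; freeze the copy so it cannot be altered."""
    return MappingProxyType(dict(changed_blocks))

def synthesize_full(snapshots: list, point: int) -> dict:
    """Merge incremental copies 0..point into what appears to be a full backup."""
    full = {}
    for snap in snapshots[: point + 1]:
        full.update(snap)  # later copies override earlier blocks
    return full

# Example: three recovery points for a tiny 4-block disk.
base    = record_point_in_time({0: "A0", 1: "B0", 2: "C0", 3: "D0"})
point_1 = record_point_in_time({1: "B1"})                       # one block changed
point_2 = record_point_in_time({1: "B1-bad", 2: "C-bad"})       # malware hits later

chain = [base, point_1, point_2]
print(synthesize_full(chain, 1))  # clean image from before the infection
print(synthesize_full(chain, 2))  # latest image, showing the compromised blocks
```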

As a Service

The promise of everything as a service seems like a great idea, but in practice it can create some challenges. It requires that we trust the service, and that we regularly test the service. VCDR is no exception. Because this is a part of the VMware Cloud portfolio, this enables adjacency to other VMware Cloud services, in particular VMware Cloud on AWS. Leveraging the Pilot Light service, some applications which are critical for recovery can be recovered directly to a cloud based service while the less critical services can be brought back online in the Datacenter once the problems are mitigated.

By providing a warm DR location, the costs are significantly mitigated, and by using the “as a service” model, many of the lower value tasks such as patching and server management are handled by the service owners, VMware in this case.

Some cool details

Aside from the immutable backups, the SaaS orchestrator and the Scale-out Cloud File System provide a significant edge for many users. The SaaS orchestrator provides a simple web interface to configure protection groups.  Setting up protection groups by name patterns or exclusion lists gives DR admins a simple setup, with no need to recover an onsite system or log into a new site before doing recovery.

The Scale-out Cloud File System is simply an object store which provides for far greater scale, as the name implies. For instant power-on of test virtual machines, this cloud-based file system removes the need for additional configuration during a declared disaster. Once the appropriate recovery point is identified, simply migrate the powered-on virtual machine back to the datacenter, or run it in the Pilot Light environment in VMC on AWS while the host is being prepared to receive the recovered VM.

Moving forward I will explore test cases for the VCDR service, where it fits within the Backup, Site Recovery Manager/Site Recovery service continuum, and even dig into the VMware Cloud on AWS services. 


Installing VMware Tanzu Basic on vSphere 7

With VMware Tanzu becoming more critical to the VMware strategy, I thought I would see what it is like to install in my lab without any prior experience with this specific product. I plan to write a few more posts about the experience and how it relates to the VMware Cloud strategy. As a disclaimer, this was done with nested virtualization, so this is not a performance test. William Lam wrote a post on an automated deployment, but I wanted to have a better understanding to share. To get myself started I watched Cormac Hogan’s video on the implementation.

Assuming the prerequisites are met, which are covered in several YouTube videos and other blogs, start by selecting “Workload Management” from the main menu in the vCenter web client. The initial choice allows you to select NSX-T, if installed; otherwise you will need to use HAProxy for the vCenter network.

On the next screen, select your cluster, click next, and then choose your control plane size. For a lab deployment, Tiny should suffice, depending on how many workloads will be deployed in the environment. On the next screen choose your storage policy; in my lab I am using vSAN to simplify things.

For the Load Balancer section, you just need to give it a name; something simple works. Select HAProxy as the type. The Data Plane API Address is the IP of the HAProxy appliance you set up, with a default port of 5556. Enter the username and password you configured when setting up HAProxy. For the Virtual IP Address Range, pick something in the workload network, separate from the management network and outside the DHCP scope.

In the “Server Certificate Authority” you will need to SSH into the HAProxy VM, and copy the output of “cat /etc/haproxy/ca.crt” into the field.
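As an optional sanity check, and assuming the Data Plane API endpoint presents a certificate signed by that CA, you can attempt a TLS handshake against port 5556 using the copied ca.crt before pasting it into the wizard.  The IP address and file path below are examples from a lab.

```python
# Optional sanity check, assuming the Data Plane API serves a certificate
# signed by the CA you just copied: try a TLS handshake against port 5556
# using that CA. If it succeeds, the certificate pasted into the wizard
# matches the endpoint. The IP address and file path are examples.
import socket
import ssl

HAPROXY_IP = "192.168.1.50"   # your HAProxy Data Plane API address
CA_FILE = "ca.crt"            # output of: cat /etc/haproxy/ca.crt

context = ssl.create_default_context(cafile=CA_FILE)
context.check_hostname = False  # appliance certs are often issued to an IP, not a DNS name

with socket.create_connection((HAPROXY_IP, 5556), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HAPROXY_IP) as tls:
        print("TLS handshake OK, peer certificate verified against", CA_FILE)
        print(tls.version())
```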

In the workload management section, select the management network being used for the deployment. Input the network information including the start of the IP range you want to use.

Under the workload network, select your workload network and fill in the information. This should be on a separate broadcast domain from the management network.

For the Service Network, pick a range that does not conflict with your existing networks, and at least a /23. Add your workload network from the previous screen below.
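A quick way to sanity-check those choices, shown here as a small example with placeholder ranges, is to confirm the service CIDR is at least a /23 and does not overlap the management or workload networks.

```python
# Sanity-check that the chosen Service Network CIDR does not overlap the
# management or workload networks, and is at least a /23.
# The example ranges are placeholders for your own environment.
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/23")
existing = [
    ipaddress.ip_network("192.168.1.0/24"),   # management network
    ipaddress.ip_network("192.168.10.0/24"),  # workload network
]

assert service_cidr.prefixlen <= 23, "Service network should be a /23 or larger"
for net in existing:
    assert not service_cidr.overlaps(net), f"Service network overlaps {net}"
print("Service network", service_cidr, "looks safe to use")
```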

Finally, select the content library you should already have subscribed to, and finish. It will take some time to provision, and then you can deploy k8s workloads natively in the vSphere environment.

A couple of thoughts on this: the install wasn’t too bad, but it did take a while to understand the networking configuration and set everything up correctly. I had also assumed this would be a little more like VMware Integrated Containers. While I have some understanding of deploying workloads through k8s, installing it involved a bit more learning. The next steps for me are to go through the deployment a few more times, and then start testing some workloads.

For those of us coming from the infrastructure side of things, this is going to be a great learning opportunity, and if you are up for the challenge, Kube Academy is an exceptional, no-cost resource to learn from the experts. For those who do not have a home lab to work with, VMware also offers a Hands-on Lab for vSphere with Tanzu at no charge as well.


The strange and exciting world of the Internet of Things

For some time I have been passionate about connected devices, home automation, and digging deeper into emerging technologies. If you have read my previous posts, I tend to share what I am passionate about. As I return to blogging I will be writing about the Internet of Things, connected devices, and more on home automation. I plan to write simultaneously about Enterprise IoT and Home Automation.

The focus of the first Enterprise IoT series will be on VMware’s Pulse IoT Center 2.0. Much of this builds on the exceptional work of a friend and colleague, Ken Osborne. Please take a look at his work here, http://iotken.com/.

On the home automation front, I have a number of updates. I will demonstrate building out a better home automation experience and further document my family’s experience interacting with connected devices.

Join me as I explore the strange and exciting world of connected devices and learn with me about where the future of technology seems to be leading us.


Who moved my VMware C# Client?

Years ago I was handed a rack of HP servers, a small EMC storage array, and a few CDs with something called ESX 2 on them. I was told I could use this software to put several virtual servers on the handful of physical servers I had available to me. There was a limited web client available, but most of my time was spent on the command line over SSH. The documentation was limited, so I spent much of my time writing procedures for the company I was at, quickly earning myself a promotion and a new role as a storage engineer.

Today VMware is announcing that the next release of the vSphere product line will deprecate the C# client in favor of the web client. As I have gone through this process, both as a vExpert and a VMware employee, there have been many questions. During our pre-announcement call with the product team at VMware, there were a number of concerns voiced about what will work on day 1 and what this does to the customers who have come to rely on performance. Rather than focus on the actual changes, most of which are still to be determined, it seemed more helpful to talk about the future of managing systems, and the future of operations.


When I started working on server administration, the number of systems one admin might manage was pretty low, maybe less than a dozen. With the advent of virtualization and cloud native applications, devops and no-ops, administrators are managing farms of servers, most of them virtual. We often hear about pets vs. cattle, the concept that most of our servers are moving from being pets, something we care for as a part of our family, to cattle, something we use to make money. If one of our cattle has a problem, we don’t spend too much time on it; we have many others, and we can just make more.

Whether it is a VMware product, OpenStack, or another management tool, abstracting deployment and management of systems is becoming more mainstream, and more cost effective. In this model, a management client is far less important than APIs and the full stack management they can enable. For the few use cases where the client is needed, the web client will continue to improve, but the true value is that these improvements will drive new APIs and new tools developed for managing systems. While change is never easy, a longer-term view of both where we came from and where we are going with the interfaces reminds us this is a necessary change, and less impactful than it may seem at first glance.


What is Dell really buying?

Standard disclaimer: these are my personal opinions, and do not reflect those of my employer or any insider knowledge. Take it for what it is worth.

When I heard rumors of the Dell EMC deal, I was pretty skeptical.  I am a numbers guy, and the amount of debt that would be required is a bit staggering.  Why would a company like Dell even want to acquire a company like EMC?  Especially after we all watched the pain they went through to take the company private.  Why would EMC want to go through the pain of being taken private, by a former competitor no less?  With the HP breakup, and IBM selling off a number of their product lines over the past decade or so, this almost seems counterintuitive, an attempt to recreate the big tech companies of the 90’s & 2000’s which are all but gone.

Sales and Engineering Talent

I have many friends at Dell, I was even a customer when I worked for some small startups many years ago.  In my experience, Dell is really good at putting together commodity products, and pricing them to move.  Their sales teams are good, but the compensation model makes them tough to partner with.

EMC has a world class sales and marketing organization.  EMC enterprise sales reps are all about the customer experience.  They are machines with amazing relationship skills, and they are well taken care of.  Engineering at EMC is a huge priority as well.  EMC’s higher end support offerings, while costly, are worth every penny.  I have seen them fly in engineers for some larger customers to fix problems.  EMC products are all about the customer experience.  Even though I have not been a fan of their hardware lately, they have done some amazing things around making the experience second to none.

An Enterprise Storage & Software product

Let’s be honest, Dell has not been a truly enterprise player in the storage and software arena.  If we look at the products they have acquired, a majority of them are mid market plays.  Compellent was supposed to be their big enterprise storage play, but that is mid market at best.  From a software perspective, most of the products are low end, and they don’t tend to develop them further.

EMC on the other hand has enterprise class storage.  Say what you want about the complexity of the VMAX line, it is pretty solid.  It may be a pain to manage sometimes, but it does set the standard in enterprise storage.  EMC has also done amazing things with software.  ViPR Controller and ViPR SRM are impressive technologies when implemented appropriately.  EMC has also done quite well with some of their other software products, but more so they treat software as a critical part of the stack.

VMware

Enough said, the real value for Dell is getting a good stake in VMware.  Like it or not, VMware is the market leader in hypervisors, cloud management, and software-defined networking, and is making incredible strides in automation and software-defined storage.  The best thing that EMC has done is allowing VMware to continue to be independent.  If Dell can stick to that plan, the rewards can be incredible.

The reality is this deal won’t change much in the short term from an IT industry perspective.  Large storage companies such as EMC and HP Storage are getting their lunch eaten by smaller more agile storage startups.  Servers are becoming more of a commodity, and software continues to be the path forward for many enterprises.  This is a good deal for both Dell and EMC, the challenge will be not to go the way of HP.  If I could give Michael Dell one piece of advice, it would be to hire smart people and listen to them.  Culture matters and the culture is what makes EMC and VMware what they are so don’t try to change it.  Culture is the true value of this acquisition.


VMware certification framework, long walks on the beach, teddy bears…

I am not a VMware administrator. I know this may come as a surprise to some of you, but I occasionally have to look things up in the docs, or hunt in the GUI for where some alert may be buried. I spend a fair amount of time in the labs, honing my skills, but it is generally for the purpose of understanding more about the products I represent, and being able to speak intelligently about them. I have been a VCP-DV since 2010, but I have worked on VMware ESX since 2006. I consulted on and wrote a great deal of documentation and many designs prior to becoming a VCP. I have the utmost respect for the certification, knowing how challenging it was for me, and how I continue to struggle with the exam itself, largely because I need to be better disciplined about sitting down and studying for it, and because I am not dealing with it as an administrator.

That being said, I am not a fan of the current certification process for VMware. I have brought this up to the education services team as well as those in the community, and we have seen some changes, but I think we need to see more. Looking at other vendors, EMC or HP for example, certifications are geared at career paths. As a consultant I obtained the EMC Technical Architect certification on the VNX products. It was challenging, and required three exams, but it was very focused on design, with some interface and hands-on knowledge required; for the most part it was built around design principles specific to the product along with some general design principles. HP’s storage architecture certification was similar, very focused on good design and solid product knowledge.

The main thing that differentiated these from the VMware certification process was the separation of an architect track from the implementation and engineering tracks. It is important for an architect to be able to understand the admin and engineering functions, but VMware’s entry point with a very specific administration exam is counterintuitive. Continuing on with the new VCIX certification, formerly VCAP, requiring an implementation exam again seems to be a bit off.

In my opinion, VMware Education should look at separating out the tracks, and changing some of the course work to reflect this. By forcing everyone up a single path, the value of the lower certifications is diluted, as it becomes a core requirement for many companies. That being said, I think there should be some crossover on each exam and in each course. We need to drive more people to a higher level. I will also say that the addition of the Network Virtualization track to the others is refreshing; I am excited to see that we are growing the education and certification tracks, but there needs to be more clarity and better paths to get to more advanced certifications.

One final thought I would leave you with, certifications are not the end all be all, much like education. I hold a BS and an MBA. The first thing I learned when I finished those is that my learning had just begun. As IT professionals it is incumbent upon us to continuously learn, grow, and improve. Versioning certifications is a necessary evil to make sure we are keeping up with our learning, but it falls to each of us to make sure we are pushing ourselves to learn, to seek out mentors, and to grow our own career.


VMware User Groups: Not just for VMware employees and Vendors

A couple weeks ago we had our annual Portland VMUG User Conference.  First of all, big kudos to the local VMUG leaders, and a big thank you to the VMUG national headquarters and the vendors who sponsored us.  A recurring theme with all of these events I participate in is the number of vendors and VMware employees presenting.  I say this not to be critical but to encourage a different mindset.  I am hesitant to say this, because I love getting up in front of VMware users, talking about what we are doing, and getting their feedback and questions.  One of my favorite parts about being here is talking to our customers.

Something which has made the rounds with the usual suspects is the concept of mentoring customers to speak at VMUGs.  Mike Laverick wrote this article last year, and I think we need to keep pushing this concept forward.  The VMUG has a program called Feed Forward to make this a reality.  Now I am not the foremost expert on presenting, but the VMUG is something I consider personally important, especially in Portland.  I have been a member for 4 years now, and I have been presenting for 2-3 of those years as a partner and VMware employee.  I have met so many cool people, and had so many amazing conversations through the process.

The VMUG is not about me, it is not about vendors, it is absolutely all about the customer.  It does very little good to have our partners and employees present every session.  Of course there are some customers who do present, but as a VMUG member, and someone who cares deeply for what we do, I would encourage you to get out there and speak up and get involved.  There are literally hundreds of us who are willing to help you and encourage you.  Most of us are not perfect presenters, but we just want you to be successful.  I encourage you to start small, but let us help you start being more involved and grow your personal brand at your local VMUG.


The universe is big. It’s vast and complicated and ridiculous.

As I was meeting with a customer recently, we got onto the topic of workload portability. It was interesting; we were discussing the various cloud providers, AWS, Azure, and VMware’s vCloud Air primarily, and how they, a VMware shop, could move workloads in and out of various cloud providers.

Most industry analysts, and those of us on the front lines trying to make this all work, or help our customers make it work, will agree that we are in a transition phase. Many people smarter than I have talked at length about how virtualization and infrastructure as a service is a bridge to get us to a new way of application development and delivery, one where all applications are delivered from the cloud, and where development is constant and iterative. Imagine patch Tuesday every hour every day…

So how do we get there? Well if virtualization is simply a bridge, that begs the question of portability of workloads, virtual machines in this case. Looking at the problem objectively, we have started down that path previously with the Open Virtualization Format (OVF), but that requires a powered-off virtual machine which is exported, copied, and then imported into the new system, which creates the proper format as part of the import process. But why can’t we just live migrate workloads without downtime between disparate hypervisors and clouds?

From my perspective the answer is simple: it is coming, it has to, but the vendors will hold out as long as they can. For some companies, the hypervisor battle is still waging. I think it is safe to say we are seeing the commoditization of the hypervisor. Looking at VMware’s products, the company is moving beyond being a hypervisor company; again, nothing insider here, just review the expansion into cloud management, network and storage virtualization, application delivery, and so much more, and increasingly they are able to manage other vendors’ hypervisors. We are seeing more focus on “Cloud Management Platforms”, and everyone wants to manage any hypervisor. It has to follow then that some standards emerge around the hypervisor, virtual hard drives, the whole stack, so we can start moving workloads within our own datacenters.

This does seem counterintuitive, but if we put this into perspective, there is very little advantage in consolidation at this point. Most companies are as consolidated as they will get; we are now just working to get many of them to the final 10% or so. It is rare to find a company that is not virtualizing production workloads now, so we need to look at what is next. Standards must prevail as they have in the physical compute, network, and storage platforms. This doesn’t negate the value of the hypervisor, but it does provide for choice, and differentiation around features and support.

I don’t suspect we will see this happen anytime soon, but it begs the question of why not? It would seem to be the logical progression.
