VMware Cloud Disaster Recovery – Advanced Solutions Design

Previously I talked about VCDR (VMware Cloud Disaster Disaster Recovery) Solutions Validation, and how to properly run a proof of concept.  The reality is that while it would be nice if all applications were as straightforward as our wordpress example, there are often far more complex applications requiring far more advanced designs for a disaster recovery plan to be successful.

External Network Dependencies

For many applications, even in modern datacenters, external network dependencies, or even Virtual Machines which are too large for traditional replication solutions can create challenges when creating disaster recovery plans.

To solve for this, 3rd party partners may be used to provide array or other host based replication services.  This solution requires far more effort on the part of the managed services partner, or the DR Admin.  Since the physical or too large for VCDR workloads cannot be orchestrated through the traditional SaaS orchestrator, there is an additional requirement to both test, and manually failing over.  A Layer 2 VPN between the VMware Cloud Environment and the partner location provides the connectivity for the applications running in both environments.

Script VM

For the more complex VM only environments, some scripts may need to be run during the test and recovery phases.Similar to bootstrap scripts for operating system provisioning, the scripts may be used for basic or even more advanced configuration changes.

The script VM should be an administrative or management server which is either the first VM started in the test recovery, or production recovery plans.  Separate VMs may be designated for testing versus production recovery as well, enabling further isolated testing one of the biggest value propositions of the solution.  Further specific documentation is located here, Configure Script VM.

Extending into VMC on AWS

WIth more complex architectures, it often makes sense to consider a mixed DR scenario.  In these cases, moving some of the applications to the VMC on AWS environment to run permanently, or leveraging another method for replicating outside the traditional VCDR SaaS orchestrator may be warranted.  While this does present some risk again since this is not tested with the rest of the VCDR environment, it does provide for additional options.  

With the recent addition of Cloud to Cloud DR, more options were made available for complex disaster recovery solutions.  Once an environment has been migrated full time to VMC on AWS, VCDR can be leveraged as a cost effective backup solution between reasons without a need to refactor applications.

Even in advanced DR scenarios, the VCDR solution is one of the more cost effective and user friendly available.  With the simplicity of the VMC on AWS cloud based interface, and policy based protection and recovery plans, even more complex environments can take advantage of the automated testing and low management overhead.  The best and most impactful DR solution is the one which is tested and which will successful recover in the event it is needed.

VMware Cloud Disaster Recovery – Advanced Solutions Design

Installing VMware Tanzu Basic on vSphere 7

With VMware Tanzu becoming more critical to the VMware Strategy, I thought I would see what it is like to install in my lab with out any experience on this specific product. I plan to write a few more posts about the experience and how it relates to VMware Cloud strategy. As a disclaimer this was done with nested virtualization, so this is not a performance test. William Lam wrote a post on an automated deployment, but I wanted to have a better understanding to share. To get myself started I watched Cormac Hogan’s video on the implementation.

Assuming the prerequisites are met which is covered in several youtube videos and other blogs, start with selecting “Workload Management” from the main menu in the vCenter web client. The initial choice is allows you to select NSX-T, if installed, or you will need to use HAProxy for the vCenter Network.

On the next screen, select your cluster, next and then choose your control plane size. For a lab deployment Tiny should suffice, depending on how many workloads to be deployed in the environment. On the next screen choose your storage policy, in my lab I am using vSAN to simplify things.

For the Load Balancer section, you just need to give it a name, something simple works and select HAProxy as the type. The Data Plane API Address is the IP of the HAProxy you setup, with a default port of 5556. Put in the username and password you put in when setting up HAProxy. The Virtual IP Address Range you should pick something in the workload network, separate from the management network and something not in the DHCP scope.

In the “Server Certificate Authority” you will need to SSH into the HAProxy VM, and copy the output of “cat /etc/haproxy/ca.crt” into the field.

In the workload management section, select the management network being used for the deployment. Input the network information including the start of the IP range you want to use.

Under the workload network, select your workload network and fill in the information. This should be on a separate broadcast domain from the management network.

For the Service Network pick something that is not conflicting with your existing networks, and at least a /23 range. Add your workload network from the previous screen below.

Finally select the content library you should have subscribed to already, and finish. It will take some time to provision and you can then provision k8s workloads natively in the vSphere environment.

A couple thoughts on this, the install wasn’t too bad, but it did take a while to understand the networking configuration, and setup everything correctly. I had also assumed this would be a little more like VMware Integrated Containers. While I have some understanding of deploying workloads through k8s, installing it is a bit more learning. The next steps for me are go through the deployment a few more times, and then start testing out some workloads running.

For those of us coming from the Infrastructure side of things, this is going to be a great learning opportunity, and if you are up for the challenge, Kube Academy is an exceptional, and no cost resource to learn from the experts. For those who do not have a home lab to work with, VMware also offers a Hands on Lab, for vSphere with Tanzu at no charge as well.

Installing VMware Tanzu Basic on vSphere 7

To home lab or not to home lab

As I often do, I am again debating my need for a home lab.  My job is highly technical, to take technology architecture and tie it all together with the strategic goals of my customers.  Keeping my technical skills up to date is a full time job in and of itself, and begs the question, should I build out a home lab, or are my cloud based labs sufficient.

One of the perks to working at a large company is the ability to use our internal lab systems.  This can also include my laptop with VMware Workstation or Fusion product which affords some limited testing capabilities, mostly due to memory constraints.  Most of the places I have been have had great internal labs, demo gear, etc, which has been nice.  I have often maintained my own equipment as well, but to what end.  Keeping the equipment up to date becomes a full time job, and adds little value to my daily job.

With the competition in cloud providers, many providers will provide low or no cost environments for testing.  While this is not always ideal, for the most part, we are now able to run nested virtual systems, testing various hypervisors, and other solutions.  Many companies are now providing virtual appliance based products which enable us to stay fairly up to date.

Of course one of my favorites is VMware’s Hands on Labs.  In fairness I am a bit biased, working at VMware, and with the hands on labs team as often as I can.  Since a large majority of what I do centers around VMware’s technology, I will often run through the labs myself to stay sharp on the technology.

While the home lab will always have a special place in my heart, and while I am growing a rather large collection of raspberry pi devices, I think my home lab will be limited to smaller lower power devices for IoT testing for the moment.  While always subject to change, it is tough to justify the capital expenditure when there are so many good alternatives.

To home lab or not to home lab

Enterprise Architecture – When is good enough, good enough?

In a conversation with a large customer recently we were discussions their enterprise architecture.  A new CIO had come in and wants to move them to a converged infrastructure.  I digging into what their environment was going to look like as they migrated, and why they wanted to make that move.  It came down to a good enough design versus maximizing hardware efficiency.  Rather than trying to squeeze every bit of efficiency out of the systems, they were looking at how could they deploy a standard, and get a high degree of efficiency but the focus was more on time to market with new features.

My first foray into enterprise architecture was early in my career at a casino.  I moved from a DBA role to a storage engineer position vacated by my new manager.  I spend most of my time designing for performance to resolve poorly coded applications.  As applications improved, I started to push application teams and vendors to fix the code on their side.  As I started to work on the virtualization infrastructure design for this and other companies, I took pride in driving CPU and memory as hard as I could.  Getting as close to maxing out the systems while providing enough overhead for failover.  We kept putting more and more virtual systems into fewer and fewer servers.  In hindsight we spent far more time designing, deploying, and managing our individual snowflake hosts and guests that what we were saving in capital costs.  We were masters of “straining the gnat to swallow the camel”.

Good enterprise design should always take advantage of new technologies.  Enterprise architects must be looking at roadmaps to prevent obsolescence.  With the increased rate of change, just think about unikernel vs containers vs virtual machines, we are moving faster than our hardware refresh cycles on all of our infrastructure.

This doesn’t mean that converged or hyper-converged infrastructure is better or worse, it is an option, but one that is restrictive since the vendor must certify your chosen hypervisor, management software, automation software, etc. with each part of the system they put together.  On the other hand, building your own requires you do that.
The best solution is going to come with compromises.  We cannot continue to look at virtual machines or services per physical host.  Time to market for new or updated features are the new infrastructure metric.  The application teams ability to deploy packaged or developed software is what matters.  For those of us who grew up as infrastructure engineers and architects, we need to change our thinking, change our focus, and continue to add value by being partners to our development and application admin brethren.  That is how we truly add business value.

Enterprise Architecture – When is good enough, good enough?

He who controls the management software controls the universe.

No one ever got fired for buying IBM.  Well…how did that work out?

When I started working in storage, it was a major portion of our capital budget.  When we made a decision on a storage platform, we had to write the proposal for the CIO to change to another brand, and we had better be sure we didn’t have issues on the new platform.  We didn’t buy on price, we bought on brand, period.

I was speaking with a customer recently, and they were talking about how they were moving to a storage startup which recently went through an IPO.  I asked them how happy they were about it, and the response was, something to the effect, it is great, but we will likely make a change in a few years when someone comes out with something new and cool.  This wasn’t an smb account, not a startup, this was a major healthcare account.  They were moving away from a major enterprise storage vendor, and they were not the first one I had spoken to who is going down this path.

I remember when virtualization really started to take off.  The concept was amazing, we thought we were going to see massive reduction in data-centers and physical servers.  Please raise your hand if you have less physical servers than you did 10 years ago.  Maybe you do, but for the most part I rarely see that anyone has significantly reduced the number of workloads.  I guess virtualization failed and was a bad idea, time to move on to something else?  Of course not, we just got more efficient and started to run more workloads on the same number of systems.  We got more efficient and better at what we do, we prevented server sprawl, and thus realized cost savings through cost avoidance.  What has changed though is moving from one server vendor to another is pretty simple.

If I were still in the business of running datacenters I would probably spread over two or more vendors with some standard builds to keep costs down, and provide better availability.  From a storage perspective I wouldn’t really care who my storage vendors were provided they could meet my requirements.  Honestly I would probably build a patchwork datacenter.  Sure it would be a bit more work with patching and such, but if there are API’s, and we can do centralized management to deploy firmware to each system, why not, why be loyal.  For that matter, why have a single switch vendor?

See what I did there?  It is all about the software.  Whether you believe VMware, Microsoft, Red Hat, or someone else will win, the reality is it is a software world.  If your hardware will play nice with my hypervisor, and my management tool, why should I use only one vendor, if it won’t, why should I use it?  It is all about applications and portability.  Hardware isn’t going away, but it is sure getting dumber, as it should, and we are pushing more value through software.  He who controls the management software controls the universe.


He who controls the management software controls the universe.

The times they are a changin

Disclaimer: I am a VMware employee. This is my opinion, and has been my opinion for some time prior to joining the company. Anything I write on this blog may not be reflective of VMware’s strategy or their products.

With this weeks announcements from VMware, there has been a great deal of confusion on what made it into the release. So as not to add to it, I wanted to focus more on something you likely missed if you weren’t watching closely. As I said in the disclaimer, this is not a VMware specific post, but they do seem to be in the lead here.

For many years I was big on building management infrastructure. It was an easy gig as a consultant, it scales and it is fairly similar from environment to environment. Looking back, it is a little funny to think about how hardware vendors did this. First they sell you servers, then they sell you more servers to manage the servers they sold you, plus some software to monitor. When we built out virtual environments we did the same thing. It was great, we did less physical servers, but the concept was the same.

If you pay close attention to the trends with the larger cloud providers, we are seeing a big push toward hybrid cloud. Now this is not remarkable unless we look closer at management. The biggest value to hybrid cloud, used to be that we could burst workloads to the cloud. As more businesses move to some form of hybird cloud, it seems that the larger value is not being locked into on premise cloud management software.

At VMworld 2014, as well as during the launch this week, VMware touted their vCloud Air product. Whether you like the product or not, the thing that caught my eye is the outside model of management. Rather than standing up a management system inside the datacenter, simply lease the appropriate management systems and software. Don’t like your provider, great get another. Again I want to point out, I am using VMware as my example here, but there are others doing the same thing, just not on the same scale yet.

While this is not going to be right for everyone, we need to start rethinking how we manage our environments.  The times they are a changin.

The times they are a changin

The universe is big. It’s vast and complicated and ridiculous.

As I was meeting with a customer recently, we got onto the topic of workload portability. It was interesting, we were discussing the various cloud providers, AWS, Azure, and VMware’s vCloud Air, primarily, and how could they, a VMware shop, move workloads in and out of various cloud providers.

Most industry analysts, and those of us on the front lines trying to make this all work, or help our customers make it work, will agree that we are in a transition phase. Many people smarter than I have talked at length about how virtualization and infrastructure as a service is a bridge to get us to a new way of application development and delivery, one where all applications are delivered from the cloud, and where development is constant and iterative. Imagine patch Tuesday every hour every day…

So how do we get there? Well if virtualization is simply a bridge, that begs the question of portability of workloads, virtual machines in this case. Looking at the problem objectively, we have started down that path previously with the Open Virtualization Format (OVF), but that requires a powered off Virtual Machine which is then exported, copied, and then imported to the new system which creates the proper format as part of the import process. But why can’t we just live migrate workloads without downtime between disparate hypervisors and clouds?

From my perspective the answer is simple, it is coming, it has to, but the vendors will hold out as long as they can. For some companies, the hypervisor battle is still waging. I think it is safe to say we are seeing the commoditization of the hypervisor. As we look at VMware’s products, they are moving from being a hypervisor company, again nothing insider here, just review the expansion into cloud management, network and storage virtualization, application delivery, and so much more, but more and more they are able to manage other vendors hypervisors. We are seeing more focus on “Cloud Management Platforms”, and everyone wants to manage any hypervisor. It has to follow then that some standards emerge around the hypervisor, virtual hard drives, the whole stack so we can start moving within our own datacenters.

This does seem counter intuitive, but if we put this into perspective, there is very little advantage in consolidation at this point. Most companies are as consolidated as they will get, we are now just working to get many of them to the final 10% or so. It is rare to find a company who is not virtualizing production workloads now, so now we need to look at what is next. Standards must prevail as they have in the physical compute, network, and storage platforms. This doesn’t negate the value of the hypervisor, but it does provide for choice, and differentiation around features and support.

I don’t suspect we will see this happen anytime soon, but it begs the question of why not? It would seem to be the logical progression.

The universe is big. It’s vast and complicated and ridiculous.

Hyper-Convergence: a paradigm shift, or an inevitable evolution?

With the recent article on CRN about HP considering the acquisition of Simplivity, http://www.crn.com/news/data-center/300073066/sources-hewlett-packard-in-talks-to-acquire-hyper-converged-infrastructure-startup-simplivity.htm, it seems a good time to look at what simplivity does, and why they are an attractive acquisition target.

In order to answer both questions, we need to look at history. First was the mainframe. That was great, but inflexible, so we moved to distributed computing. This was substantially better, and brought us into the new way of computing, but there was a great deal of waste. Virtualization came along and enabled us to get higher utilization rates on our systems, but this required an incredible amount of design work up front, and it allowed the siloed IT department to proliferate since it did not force anyone to learn a skillset outside their own particular area of expertice. This lead us to converged infrastructure, a concept that if you could get everything from a single vendor, or support from a single vendor at the very least. Finally came the converged system, it provided a single vendor/support solution, packaged as one system, and we used it to grow the infrastructure based on performance or capacity. It was largely inflexible, by design, but it was simple to scale, and predictible.

To solve this problem, companies started working on the concept of Hyper-Convergence. Basically there were smaller discrete converged systems, many of which created high availability zones not through redundant hardware in each node, but through clustering. The software lived on each discrete converged node, and it was good. Compute, Network, and Storage, all scaling out in pre-defined small discrete nodes, enabling capacity planning, and fewer IT administrators to manage larger environments. Truly Software Defined Data Center, but at a scale that could start small and grow organically.

Why then is this interesting for a company like HP? As always I am not an insider, I have no information that is not public, I am engaging in speculation, based on what I am seeing in the industry. Looking at HP’s Converged Systems strategy, looking at what the market is doing, I believe that in the near future, the larger players in this space will look to converged systems as the way to sell. Hyper-convergence is a way forward to address the market space that is either too small, or needing something that traditional converged systems cannot provide. Hyper-convergence can provide a complimentary product to existing converged systems, and will round out solutions in many datacenters.

Hyper-Convergence is a part of the inevitable evolution of technology, whether HP ends up purchasing simplivity, these types of conversations show that such concepts are starting to pickup steam. It is time for companies to innovate or die, and this is a perfect opportunity for an already great company to keep moving forward.

Hyper-Convergence: a paradigm shift, or an inevitable evolution?

Defining the cloud Part 4: Supported

As I try to bring this series to a close, I want to look at what I would consider one of the final high level requirements in evaluating a cloud solution.  In the previous posts, we looked at the cloud as being application centric, self service, and open.  These are critical, but one of the more important parts of any technology is support.  This is something which has plagued linux for years.  For many of us, linux and unix are considered to be far superior to windows for many reasons.  The challenge has been the support.  Certainly Red Hat has done a fairly good job of providing support around their Fedora Based Red Hat Enterprise Linux, but that is one distro.  Canonical provides some support around Ubuntu, and there are others.

The major challenge with the opensource community is just that, it is open.  Open is good, but when we look at the broader opensource community, many of the best tools are written and maintained by one person or a small group.  They provide some support for their systems, but often times that is done as a favor to the community, or for a very small fee, they need to keep day jobs to make that work.

One challenge which seems to be better understood with the cloud, especially around openstack, is the need for enterprise support.  More and more companies are starting to jump on board and provide support for openstack, or their variant.  This works well, so long as you only use the core modules which are common.  In order to make money, all companies want you to use their addons.  This leads to some interesting issues for customers who want to add automation on top of the cloud or other features not in the core.

At the end of the day, a compromise must be struck.  It is unlikely that most companies will use a single vendor for all their cloud software, although that could make it less challenging in some regards.  It comes down to trade offs, but it is certain that we will continue to see further definition and development around the cloud, and around enterprise support for technologies which further abstract us from the hardware and enable us to be more connected, use the data which is already being collected, and the devices which are being and will be developed for this crazy new world.

Defining the cloud Part 4: Supported

Defining the cloud Part 3: Open


This may seem like an odd topic for the cloud, but I think it is important.  One of the questions I have been asked many times when discussing cloud solutions with customers is around portability of virtual machines, and interoperability with other providers.  This of course raises some obvious concerns for companies who want to make money building or providing cloud services and platforms.

We live in a soundbite culture.  If it can’t be said in 140 characters or less, we don’t really want to read it.  Hopefully you are still reading at this point, this is way past a tweet.  We like monthly services versus owning a datacenter, who wants to pay for the equipment when you can just rent it in the cloud.  More and more services are popping up to make it simpler for us to rent houses for a day or a few days, get a taxi, rent a car by the mile, or a bike by the hour.  There is nothing wrong with this, but we need to understand the impact.  What if each car had different controls to steer, what if there was no standard?  How could the providers then create services, it is all based on an open and agreed upon standard.

In order for the cloud to be truly useful, it must be based on standards.  This is where OpenStack is the most important.  Going far beyond just a set of API’s, OpenStack enables us to have a core set of features that are common to everyone.  Of course in order to make money, beyond just selling support for this, many companies choose to add additional features which differentiate them.  This is not opensource, but still based on the open framework.  For most companies, this still uses open standards such as the rest API, and other standards based ways of consuming the service.  Even VMware, perhaps the largest cloud software provider, uses standard API’s, and supports popular tools for managing their systems.

Open standards, open API’s, and standards based management features are critical for the cloud.  Of course everyone wants you to choose their cloud, but to be honest, most of us consume multiple cloud services at once.  I use DropBox, Box.Net, Google Drive, Skydrive, and a few other cloud storage providers because they all have different use cases for me.  I use Netflix and Hulu Plus because they give me different content.  Why then should business consumers not use some AWS, some Google Enterprise Cloud, some HP Public Cloud, and perhaps even some of the other smaller providers?  For the cloud to continue to be of value, we will have to adjust to the multi service provider cloud, and everyone will have to compete on the best services, the best features, and the best value.