
TOPIC: Standard Operating Procedures for Virtualization Management (SOPs) October 7, 2008

Posted by Roel Gydé in Data Center Management.

As virtualization adds a new dimension to IT, it also adds a new dimension to management and maintenance. Virtualization will lower the TCO, increase the ROI and offer the capacity to react correctly to the changing needs of the business. These are all nice aspects, but it is not unthinkable that all these benefits go down the drain when your environment is not well prepared. By ‘the environment’ we mean the physical infrastructure and software, but also those maintaining the infrastructure.

In order to eliminate the “risks” involved in maintenance and management, setting up ‘Standard Operating Procedures’ is a good start, as long as those who need to execute these SOPs are properly trained.

What is a SOP? According to Wikipedia, a SOP is a set of instructions (with the force of a directive) covering features of operations that lend themselves to a definite or standardized procedure without loss of effectiveness.

So in plain English it comes down to a well-structured workflow that consists of triggers, prerequisites, actors, actions, output, tools and management, and that guides those who need to execute the SOP through a management task without being too legalistic. The goal is to set up SOPs that click together without being too rigid.
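The components named above can be captured in a simple record. The following is only a minimal sketch: the class name `SOP`, its field names and the `missing_components` helper are illustrative assumptions, not part of any product or standard.

```python
from dataclasses import dataclass, field


@dataclass
class SOP:
    """One standard operating procedure, built from the components named in the text."""
    name: str
    trigger: str = ""                                          # what starts the SOP
    prerequisites: list[str] = field(default_factory=list)     # documents, approvals, scheduling
    actors: list[str] = field(default_factory=list)            # job roles, not people
    actions: list[str] = field(default_factory=list)           # concise action steps
    expected_output: list[str] = field(default_factory=list)   # checklist of results
    tools: list[str] = field(default_factory=list)
    management: dict[str, str] = field(default_factory=dict)   # e.g. version, escalation

    def missing_components(self) -> list[str]:
        """Return the names of components that have not been filled in yet."""
        required = ["trigger", "prerequisites", "actors", "actions",
                    "expected_output", "tools", "management"]
        return [component for component in required if not getattr(self, component)]


# Hypothetical example SOP; the management part is deliberately left empty.
patch_host = SOP(
    name="Patch host",
    trigger="planned maintenance",
    prerequisites=["approved change ticket", "maintenance window scheduled"],
    actors=["system administrator"],
    actions=["put host into maintenance mode", "verify no VMs running", "apply patch"],
    expected_output=["latest patch applied", "ticket updated", "notification sent"],
    tools=["management console"],
)

print(patch_host.missing_components())  # ['management']
```

A check like this can help spot a draft SOP that still lacks one of its parts before it is handed to the people who have to execute it.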

A practical example of a set of SOPs that flow into a process:

A new employee has been hired by the company. This involves a number of IT tasks that need to be executed before that employee can start working.

SOP1 – creating new user within domain
SOP2 – creating mailbox for new user
SOP3 – creating phone extension for new user
SOP4 – setting application permissions for new user

All these SOPs flow into a complete process (e.g. setting up new employees within the IT department). The goal is not only to roll out high-quality, easy-to-follow SOPs; they can also be combined into, for example, a Six Sigma project that gives you the capacity to look for (financial) improvements.
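How the four SOPs above chain into one process can be sketched as an ordered list that stops at the first step that fails. This is an illustration under assumptions: `run_process` and the `execute` callback are hypothetical names, and the executor here only simulates the work.

```python
from typing import Callable, Optional

# The four SOPs from the example, in the order they have to run.
ONBOARDING = [
    "SOP1: create new user within domain",
    "SOP2: create mailbox for new user",
    "SOP3: create phone extension for new user",
    "SOP4: set application permissions for new user",
]


def run_process(sops: list[str],
                execute: Callable[[str], bool]) -> tuple[list[str], Optional[str]]:
    """Run SOPs in order and stop at the first one that fails.

    Returns the completed SOPs and the SOP that failed (None if all succeeded).
    """
    completed = []
    for sop in sops:
        if not execute(sop):
            return completed, sop
        completed.append(sop)
    return completed, None


# Simulated executor (assumption): pretend the mailbox step fails.
completed, failed = run_process(ONBOARDING, lambda sop: not sop.startswith("SOP2"))
print(len(completed), failed)  # 1 SOP2: create mailbox for new user
```

Recording where a process stopped is also the kind of data an improvement project could later analyze.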

Before a SOP gets under way, there has to be a trigger that crosses a threshold. In the management context of a virtualized data center, this comes down to planned maintenance, unplanned maintenance or a project request. Whatever type of trigger occurs, it is important that it is recorded in a ticketing system. In general there are three triggers:

  • Planned maintenance: upgrades, fixes, rollouts, rollbacks, implementations, … In general these planned triggers occur after the approval of a customer, a business unit or a manager. A formal approval is clearly necessary before you can start with this planned maintenance.
  • Unplanned maintenance: an event that has not been planned (for example a hardware failure) and that requires immediate action from the operations team. In this case no formal approval of a customer, business unit or manager is required.
  • Project request: this is probably the widest of the three triggers. It ranges from setting up a new VM to creating a new vDisk, provisioning a new server, … In most cases project requests can be planned well in advance and are based on a formal approval of the customer, business unit or manager.
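The three triggers and their approval rules can be sketched as a small gate. The names `Trigger` and `may_start` are assumptions made for illustration, and the sketch simplifies by treating every project request as requiring approval, whereas the text says this holds in most cases.

```python
from enum import Enum


class Trigger(Enum):
    PLANNED_MAINTENANCE = "planned maintenance"
    UNPLANNED_MAINTENANCE = "unplanned maintenance"
    PROJECT_REQUEST = "project request"


def may_start(trigger: Trigger, ticket_id: str, approved: bool) -> bool:
    """Decide whether a SOP may start for this trigger.

    Every trigger must be recorded in the ticketing system; only unplanned
    maintenance may proceed without a formal approval.
    """
    if not ticket_id:
        return False
    if trigger is Trigger.UNPLANNED_MAINTENANCE:
        return True
    return approved


# Hypothetical ticket numbers, for illustration only.
print(may_start(Trigger.UNPLANNED_MAINTENANCE, "T-1001", approved=False))  # True
print(may_start(Trigger.PLANNED_MAINTENANCE, "T-1002", approved=False))    # False
```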

Now that the triggers have been defined, it is important to indicate which prerequisites need to be met before the SOP gets to work. These prerequisites are often documents or approvals from the involved party, except for unplanned maintenance triggers. The prerequisites should also include scheduling (if possible).

Once the prerequisites have been met, it is important to indicate who needs to execute the SOP. This can be a system administrator, an IT manager, a helpdesk coordinator, IT Ops, … Make sure that you assign job roles to these actors instead of specific names.

Next come the action steps. These steps should be prescriptive, actionable and concise, and the tools that need to be used should be documented. It does not help to write a book of actions; you should have confidence in those who take the actions, as you have trained them.

  • Bad example: Log in to management console X, click on the server with the issue, right-click, …
  • Good example: Log in to management console, put host into maintenance mode, verify no VMs running, apply patch, …

If you document your actions too extensively, this will bore the actors, which will result in increased downtime, which in turn will result in decreased profitability for the business.

The actions that have been taken should result in an expected output. This output needs to be known when you set up a SOP. If you have a SOP for patch management, the output should at least be that the latest patch has been applied. Make sure you define the output as a checklist. This checklist should include the following:
  • The goal of the SOP has been met (for example applying a patch)
  • Ticket status within the ticketing system
  • A formal notification
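
The output checklist above can be verified mechanically once a SOP has run. A minimal sketch, assuming the result is reported as a dictionary of flags; the key names and the `outstanding_items` helper are hypothetical.

```python
# The three checklist items from the text, as hypothetical flag names.
REQUIRED_OUTPUT = ("goal_met", "ticket_updated", "notification_sent")


def outstanding_items(result: dict[str, bool]) -> list[str]:
    """Return the checklist items that are missing or not yet confirmed."""
    return [item for item in REQUIRED_OUTPUT if not result.get(item, False)]


# Example: the patch was applied and the ticket updated, but nobody was notified.
result = {"goal_met": True, "ticket_updated": True}
print(outstanding_items(result))  # ['notification_sent']
```

An empty list means the SOP delivered its expected output.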

The last part of the SOP should be the management part. Make sure SOPs are generally available and updated on a regular basis (document history!). Make sure you have an escalation procedure in place for when a SOP goes wrong (perhaps this can also be set out in a SOP). Metrics are also interesting, specifically for general management, and SOPs can even be used when training new hires.

One aspect has not been covered so far: testing. Make sure you test these SOPs before you move them from a draft version to a final version. If testing shows errors or issues, change the SOP before you move it to the final stage of implementation. When implementing these SOPs, make sure that all involved parties have received the necessary training. If you have the time (you should make the time!), do a test run with the SOP.

It is also very smart to prioritize the SOPs, as some SOPs solve bigger issues than others. For example, setting up a new VM should have a lower priority than a SOP that solves a security issue on a running VM.
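
Such a prioritization can be sketched with a priority queue. The categories and their ranking below are assumptions chosen to match the example (security before a new VM); requests with equal priority keep their arrival order.

```python
import heapq

# Hypothetical ranking: a lower number means the SOP is handled first.
PRIORITY = {"security": 0, "incident": 1, "maintenance": 2, "project": 3}


def order_requests(requests: list[tuple[str, str]]) -> list[str]:
    """Return SOP names ordered by priority, then by arrival order."""
    heap = [(PRIORITY[category], position, sop)
            for position, (category, sop) in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


requests = [
    ("project", "Build VM"),
    ("security", "Patch running VM"),
    ("maintenance", "Server maintenance"),
]
print(order_requests(requests))
# ['Patch running VM', 'Server maintenance', 'Build VM']
```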

Do all SOPs need to be documented? As long as there is no technological solution for this, yes. In the near future, vendors will provide the necessary tools to automate certain SOPs. For virtualization one of these tools will be Citrix’s Workflow Studio. It will offer the capacity to automatically start up new VMs when the load increases. Even though this SOP will be automated, it might be important to keep a copy of the SOP on paper.

What is the value of a SOP to a channel partner (VAR, reseller, integrator) or even an IT department with internal customers?

Not only do SOPs allow those servicing others to increase the profitability of the system engineers executing the SOPs (on site at the customer or remotely), they also offer better ‘invoicing capacity’ (SOPs are ticketed, ticketing means timestamping, timestamping means invoicing). Furthermore, tasks get documented, which lets the SEs rotate jobs and increases the satisfaction of the personnel.
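
The chain from ticketing to timestamping to invoicing can be sketched as simple arithmetic on ticket timestamps. This is only an illustration: the `billable_amount` function, the ISO timestamp format and the hourly rate are assumptions, not taken from any ticketing system.

```python
from datetime import datetime


def billable_amount(opened: str, closed: str, hourly_rate: float) -> float:
    """Compute the invoice amount for one ticket from its open and close timestamps."""
    start = datetime.fromisoformat(opened)
    end = datetime.fromisoformat(closed)
    if end < start:
        raise ValueError("ticket closed before it was opened")
    hours = (end - start).total_seconds() / 3600
    return round(hours * hourly_rate, 2)


# A ticket worked on for 1.5 hours at a hypothetical rate of 80 per hour.
print(billable_amount("2008-10-07T09:00:00", "2008-10-07T10:30:00", 80.0))  # 120.0
```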

On a commercial level, SOPs offer the possibility to increase credibility with the customer and even open doors when it comes to upselling (e.g. a SOP for adding a new VM results in the VM not being created because the hardware does not meet the prerequisites).

Call to action

If you have any SOPs that you wish to share with other virtualization professionals, feel free to forward the location where they can be found and we will make sure that they get the necessary visibility. We will publish a number of SOPs in the near future; in the meantime, start SOPPING (not meant literally).

Example set of SOPs

  • Build VM
  • Commission VM
  • Decommission VM
  • Server maintenance (multiple SOPs)
  • Incident response (multiple SOPs)
  • Contact supplier / vendor for support
  • Add LUN
  • Remove LUN
  • Patch template
  • Create snapshot
  • Revert snapshot
  • Disk Add
  • Disk Expand
  • Disk Remove

Microsoft TechED 2008: Hyper-V versus ESX June 19, 2008

Posted by Roel Gydé in Uncategorized.

For some bizarre reason the Hyper-V vs. ESX comparisons went public. A nice article on this discussion was posted by Peter Bruzzese on InfoWorld. The article was completely unbiased and kept to the facts. Unfortunately, once again some MS addicts and VMW addicts managed to get into a competing conversation on which technology was the best.

Personally I can understand that you prefer one brand over another, but comparing brands purely on features is beyond my understanding. It is not the brand that counts, it is the solution that counts. This virtualization solution should fit the current and future requirements of the business, must be easy to manage, … Clearly this decision cannot be taken at feature level. Below is our short reply to all the posts that Peter got on his article:

As a channel manager for a distributor in Europe, we are often confronted with channel players that do not see which solution or brand to propose. Comparing the products at feature level is not the best way. It all depends on whether you require an engine or a car, as Simon Crosby often states.

Given that VMware has been in this market much longer than other brands, it is normal that they are the leader. When new technology comes out, it is normal that everybody jumps on the bandwagon and chooses that brand, as there is no alternative.

If a company needs to take a decision regarding a brand/solution, expressions like “we’ve swapped 7000 VMs from datacenter 1 to datacenter 2 over a super high speed link 100km away” are totally irrelevant.

Decisions for a solution or a brand should be based on the current and future requirements of the business, the TCO, the ROI, manageability, the ecosystem of the solution, …

There are three major players in the ballpark right now, Citrix, Microsoft and VMware, with others following. But in the long term it will not be the hypervisor that wins; it will be the business that once again gets in the driver’s seat, and IT needs to align with it.

VKernel releases Capacity Bottleneck Analyzer 1.2 for VMware June 19, 2008

Posted by Roel Gydé in Uncategorized.

Imagine being able to instantly identify capacity bottlenecks on hosts, clusters and resource pools. Capacity bottlenecks, when not resolved, cause performance problems or even downtime. VKernel’s Capacity Bottleneck Analyzer Virtual Appliance immediately builds a list of current RAM, CPU, storage and network bottlenecks in a VMware infrastructure. It also predicts future capacity bottlenecks and alerts you when trends exceed customizable thresholds.

A free trial is available here. At present it is unclear whether the solution will also be available for XenServer or Hyper-V.

Virtual Service Oriented Grids June 18, 2008

Posted by Roel Gydé in Uncategorized.

Imagine if you take virtualization, combine it with SOA and add some grid computing … you get ‘virtual service oriented grids’.

With virtualization you can increase the load on your physical server; with SOA you enable the system to be agile and align with the business. When you add grid computing you get the most flexible, adaptive datacenter, one that responds to the requirements of the business. That is what IT should be about: adapting IT to the business.

Intel will be publishing a book on how ‘Virtual Service Oriented Grids’ will be changing the enterprise. You can find an abstract here. A must read for late summer.

Virtualization and IT OPS … June 9, 2008

Posted by Roel Gydé in Uncategorized.

Kevin Lees wrote an extensive article on the influence of virtualization on IT Operations and vice versa.

… Virtualization continues to be recognized as sufficiently mature for deployment in production environments. In fact, I would say it’s either rapidly approaching or has already arrived at the “knee in the curve.” The coming years will see an exponential increase in virtualized production environments as it passes this bend and begins its, what I believe will be, rapid climb up the steep part of the deployment curve.

What will influence how rapidly virtualization technologies are deployed in production? Certainly its increasing technical maturity will have a huge impact. For anyone who has attended VMworld or who takes regular notice of virtualization industry announcements, there should be little doubt that the advancements we see in virtualization’s core technology (for instance, hardware assisted virtualization) as well as in supporting solutions (like VMware’s Storage Motion, Stage Manager and Lifecycle Manager or Citrix’s XenCenter and XenMotion) will continue unabated. But, will the maturity of virtualization’s technologies and supporting solutions alone drive the steepness of the deployment curve? If not, what else might influence how rapidly virtualization is deployed in production environments? In my opinion, it will be IT Operations. This series of articles will look at virtualization’s impact on IT Operations. This first article discusses the advantages virtualization offers to IT Operations. The second article will address the IT Operations’ challenges presented by virtualization and the current state of available tools to address these challenges. The final article in the series will consider virtualization’s impact on IT Operations from an ITIL perspective. …

Integrien addresses virtualization complexity with “Integrien Alive” May 27, 2008

Posted by Roel Gydé in Uncategorized.

It’s no secret that the rapid adoption of data center virtualization, while providing the clear cost savings of server consolidation, creates a much more difficult environment for the management of performance and availability of virtualized applications. While other management vendors tout their ability to capture metrics from virtual environments, their tools still require massive amounts of manual effort to solve performance problems and simply cannot scale in a virtualized environment. Integrien is taking a new approach to virtualization management by automating much of this manual effort in today’s dynamic and complex virtualized environments.