Virtualization Technology News and Information
Rackspace Joins Amazon in Cloud Reboot Over Xen Hypervisor Bug

A few days ago, Amazon Web Services told EC2 customers that it would carry out an urgent patch-and-reboot cycle across a large part of its cloud environment, all within a window of only a few days.  The reason?  An as-yet undisclosed bug in the Xen hypervisor, on which Amazon's cloud is built.

But that news raised another question.  Amazon isn't alone in building a cloud on the Xen hypervisor.  So, if a bug is nasty enough to raise this kind of alarm in the EC2 community, why are there seemingly no alarms coming from other large Xen-based cloud service providers?

We still don't know exactly what the Xen XSA-108 bug is, as the details are embargoed until October 1, 2014, presumably to give providers a chance to patch before the information becomes public and the risk of exploitation rises.

Enter Rackspace.

While the company's OnMetal servers do not run on Xen, its NextGen Public Cloud environment, like Amazon EC2, is built on Xen technology.  Because of that, Rackspace warned customers that it too would begin a "cloud reboot" across its facilities.

Rackspace said it had come up with a solution to handle the Xen bug, and that it anticipated a reboot would be necessary for all Standard, Performance 1 and Performance 2 Cloud Servers within its Cloud Servers infrastructure.  The company has already begun implementing that plan.  It kicked off reboots on Sunday in the US and Europe, and said its Sydney data center would follow on Tuesday.

The company went on to explain:

"Recently, an issue that has the potential to impact a portion of the Public Cloud environment was reported. Our engineers and developers continue to work closely with our vendors and partners to apply the solution to remediate this issue. While we believe in transparent communication, there are times when we must withhold certain details in order to protect you, our customers."

In preparation for those reboots, the company recommended that customers take the following proactive steps to ensure that their environments return to normal operation once the patch and reboot are complete:

  • Verify all necessary services (Apache, IIS, MySQL, etc.) are configured to start on server boot
  • Ensure that you have up-to-date server images and file-level backups enabled, and confirm that you have backups of all critical data
  • Confirm that any unsaved changes, such as firewall rules and application configurations, are indeed saved
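
The first of those checks can be scripted on a Linux guest. What follows is only a rough sketch, assuming the server uses systemd and that `systemctl is-enabled` is available; Rackspace's notice did not include any script. The function names and the service names in the usage note are hypothetical, and the unit names in particular vary by distribution.

```python
import subprocess
from typing import Callable, Dict, Iterable


def systemctl_is_enabled(service: str) -> str:
    """Return the state printed by `systemctl is-enabled <service>`.

    Assumes a systemd-based Linux guest. Returns "unknown" when
    systemctl is missing or prints nothing on stdout.
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-enabled", service],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit just means "not enabled"
        )
    except FileNotFoundError:
        return "unknown"
    return result.stdout.strip() or "unknown"


def services_not_enabled(
    services: Iterable[str],
    query: Callable[[str], str] = systemctl_is_enabled,
) -> Dict[str, str]:
    """Map each service whose reported state is not "enabled" to that state.

    This is deliberately strict: states such as "static" or "masked"
    are also reported, so that a person can review them before the reboot.
    """
    flagged: Dict[str, str] = {}
    for name in services:
        state = query(name)
        if state != "enabled":
            flagged[name] = state
    return flagged
```

Calling `services_not_enabled(["apache2", "mysql"])` on a guest would return a dictionary of the services that need attention, and an empty dictionary if everything is set to start on boot. The `query` parameter exists so the check can be exercised with a stand-in function on a machine without systemd.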

The plan seems sound, and it should ensure minimal downtime for customers.  The problem for many is that it didn't give customers enough notice to make the necessary changes or take the necessary precautions.  According to reports, the alerts went out by email this past Friday sometime after 9:00PM.  While I realize that most people in the world are tethered to their smartphones (and as a result, their email), getting a message like that after 9:00PM doesn't exactly give you the warm fuzzies when it comes to maintenance alerts.

Shouldn't a maintenance alert go out during normal business hours?  And since Amazon alerted folks days earlier, wouldn't more lead time have been appreciated here as well?  Providing the appropriate steps is great, but timing, as they say, is everything.  A longer notice period, or an alert sent during normal business hours in the work week, would have been more appropriate in this instance.

And oh yeah, one other question.  Wasn't the cloud supposed to provide resiliency?  Did (and do) customers simply expect their servers to keep working through a patch and a reboot without having to take any steps themselves?

Published Monday, September 29, 2014 7:30 AM by David Marshall