[lug] Why is it SO easy to destroy cloud environments?

Rob Nagler nagler at bivio.biz
Tue Oct 9 11:43:42 MDT 2012


> I can't believe you found it within yourself to type that... even in jest!

This is very serious to me, and why we have no production VMs at
bivio.  We use the cloud, but only for development and test purposes.

How is "juju destroy-environment" any different from my for loop?

Here are the account cancellation policies for Linode and AWS:

http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?cancel-ec2.html
http://library.linode.com/linode-platform/billing#sph_account-cancellation

Just for reference, here's what it says when I click on Cancel Account in AWS:

 Account Cancellation

    Once you select to cancel your Amazon Web Services account, you will
    be required to sign up as a new user to begin using AWS again. All of
    your current data will permanently be deleted and you will immediately
    lose access to Amazon Web Services.

    Are you sure you want to cancel your AWS account? NOTE: You will not
    be able to undo this cancellation.

The last line is amazing.  Your backups, snapshots, data, VM config,
etc. are all going to be gone if you click "OK".  I've done this, and
they are gone, forever.

Every cloud platform I've seen has this "Destroy World!" feature,
which is actually much easier than "Hello, World!".  It takes way more
clicks to create and configure a single VM than to destroy your entire
platform.

I think we have learned nothing from our past.  Read this article by
Brian Reid from 1986:

ftp://rtfm.mit.edu/pub/reid.txt

This is worse than a screwdriver with a gelignite handle.  It's more
like a screwdriver with a bunker busting bomb attached to the handle.

I have talked to numerous so-called experts about this problem, and
nobody has 1) even thought of it, or 2) come up with a workaround.
Even if you do "rm -rf /" on a real server, it doesn't destroy your
backups as well.  It won't destroy disks in a vault, ever.  It doesn't
destroy the physical computers.  Also, btw, it doesn't happen very
quickly on a system with TBs of data.  Somebody would have to be
pretty sneaky and really good to kill a large site by running rm -rf
on all servers without you noticing.   "Destroy World!" is
instantaneous.

I've been working in distributed systems for decades.  I have done
some really, really dumb things, which were all recoverable.  As an
example, I was the creator of the rsh configuration which amplified
the problem in Reid's exposé above.  There are some real issues with
automation to this degree, and I learned that lesson in 1986.

It's only a matter of time before some large site goes down, hard and forever.

Rob


