[lug] LVM and disk failure

David L. Anselmi anselmi at anselmi.us
Sun Jan 8 14:57:55 MST 2006


Daniel Webb wrote:
> On Sun, Jan 08, 2006 at 09:39:11AM -0700, Dan Ferris wrote:
> 
>>LVM doesn't increase or decrease robustness.  It has nothing to do with 
>>robustness.
> 
> Everyone is saying this to me, but I think we're meaning two different things.
> What you mean is "LVM doesn't increase or decrease robustness of a given
> physical volume".  I agree.  What I mean is "LVM decreases the robustness of a
> logical volume relative to a similar size physical volume".  In other words, a
> 300G logical volume spanning three 100G physical volumes is 1/3 as robust as a
> logical volume spanning a single 300G physical volume.  That's assuming that
> each physical drive has the same chance of failures as any other.

You're correct from a probability of failure perspective.  But managing 
robustness (or even disk usage) across multiple drives isn't why LVM was 
created.

If you have multiple drives, you use RAID to keep them from increasing 
the chance of failure of your whole server.  Or perhaps not: maybe you 
don't care whether your audio files are available, so you just put them 
on a separate disk.  Either way, the chance of the web server 
functionality dying remains unchanged.
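
For example, a minimal two-drive mirror with mdadm might look roughly 
like this (device names and mount point are made up):

    # mirror two drives so one failure doesn't take the data with it
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mkfs.ext3 /dev/md0
    mkdir -p /srv/audio
    mount /dev/md0 /srv/audio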

What LVM is for is letting you move physical blocks around between your 
partitions (logical volumes).  Say you add a disk (or array) and want it 
divided between /opt and /srv (which already use part of another 
disk/array).  Or you realize (belatedly) that you don't really need 5GB 
in / but do need more than 2GB in /usr.
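
The new-disk case, for instance, goes something like this (the volume 
group name, device, and sizes are made up, and you'd want a backup 
first):

    # add the new disk to the pool and grow two file systems
    pvcreate /dev/sdd1                 # the new disk
    vgextend vg0 /dev/sdd1             # hand its blocks to the pool
    lvextend -L +50G /dev/vg0/opt
    resize2fs /dev/vg0/opt             # grow the FS to match the LV
    lvextend -L +50G /dev/vg0/srv      # (may need to unmount first,
    resize2fs /dev/vg0/srv             #  depending on kernel/tools)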

In the Windows workstation world (and server, to some degree) everything 
goes in C:.  That's because putting it elsewhere doesn't buy much, is a 
headache, and it's hard to reorganize things when you realize your shiny 
new app insists on running from your (undersized) C:.

In the Linux world you don't want /tmp to use up all your / space and 
crash the box.  You don't want anything in /home executed (ever, 
really).  So you make different partitions for different purposes.  But 
then resizing partitions on a single device, without LVM, really sucks.
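
On the mount-option side, something like these /etc/fstab entries (LV 
names are hypothetical) keep /tmp contained and /home non-executable:

    # separate, restricted mounts for /tmp and /home
    /dev/vg0/tmp   /tmp   ext3  nodev,nosuid,noexec  0 2
    /dev/vg0/home  /home  ext3  nodev,nosuid,noexec  0 2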

So first decide how to arrange your drives for performance and 
reliability.  That will depend on the number of drives, read vs. write 
load, etc.  And don't forget your controllers, which may give you extra 
data paths to use or become single points of failure.  (I've seen lots 
of data lost off RAIDs.  I've never seen a drive fail in a RAID. 
Granted my experience is a corner case.)

Now that your drives are installed, put them all in LVM so you have one 
big pool supplying blocks to your file systems.  (Sure, you may run 
across reasons to do something more complicated.  But prove that disk 
I/O is the problem before you optimize it.)
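
Roughly like this (array and group names are made up):

    # turn each disk/array into a PV and pool them in one VG
    pvcreate /dev/md0 /dev/md1
    vgcreate vg0 /dev/md0 /dev/md1
    # carve file systems out of the pool as needed
    lvcreate -L 10G -n usr vg0
    mkfs.ext3 /dev/vg0/usr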

> If your filesystem spans multiple physical volumes, it absolutely decreases
> robustness, unless that filesystem deals well with getting a large piece
> chopped out of it.

I doubt many file systems handle that well, because it wasn't a likely 
failure mode until things like LVM became widely used.  How else would 
you lose a third of your partition at once?

And there are probably better places to address that issue than the FS. 
No really, ext2 doesn't have to handle losing a third of the partition. 
If that happens you either restore from backup or your RAID knows how 
to handle it.  Separating the functions probably makes sense from a 
coding standpoint.

> For my 1000-disk example, I'd be a little worried even with RAID.

No you wouldn't.  1000 disks is a SAN and you have loads of redundancy 
built in there.  Or you got ripped off.  Or you built it yourself 
instead of buying the SAN you should have.

> In the case I have in mind, I'm thinking of user accounts that should be
> resizable.  I think a better way will be to create a new logical volume and
> filesystem for each user, instead of putting all users on the same filesystem.

That's an interesting idea.  You should research whether that's easier 
than using quotas (the traditional approach) and what the trade-offs 
are, and then write a paper.  The LUGs here would be interested in it, 
and maybe even USENIX et al.
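
For comparison (user, group, and limit values are made up), the two 
approaches look something like:

    # traditional: one /home file system plus quotas
    # (mount /home with usrquota, run quotacheck/quotaon once, then:)
    setquota -u alice 4000000 5000000 0 0 /home

    # per-user logical volumes instead
    lvcreate -L 5G -n home_alice vg0
    mkfs.ext3 /dev/vg0/home_alice
    mount /dev/vg0/home_alice /home/alice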

> Then at worst those users whose logical volumes cross a physical boundary are
> twice as likely to lose their filesystem, but that's not nearly as bad as the
> risk of doing it all on one filesystem across many physical drives.

Again, if you have many physical drives you need to address the impact 
of a failure and plan accordingly.  It's probably easier to use RAID and 
hot swap a bad drive than to try to figure out how to reconfigure LVM 
and restore the users' data.
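
With md RAID, for example, replacing a failed member is only a few 
commands (device names are made up):

    # mark the bad drive failed, pull it, add the replacement
    mdadm /dev/md0 --fail /dev/sdb1
    mdadm /dev/md0 --remove /dev/sdb1
    # (physically swap the drive, partition it to match, then:)
    mdadm /dev/md0 --add /dev/sdb1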

HTH,
Dave


