[lug] [Slightly OT] File Management?
Nate Duehr
nate at natetech.com
Tue Mar 24 16:08:03 MDT 2009
Paul,
The only small problem I see with this part is that commercial point-in-time
systems have been used successfully for years, have been tested and known to
work for years, and have limitations that are well documented by their
creators. (I won't speak for Linux systems, since many of them aren't
code-reviewed in any meaningful way, or held to peer-reviewed data integrity
standards, outside of the open-source community.)
There are always limitations when attempting to copy things "atomically" on
filesystems.
Usually such point-in-time systems are handled via methodologies that treat
the storage subsystem like a giant FIFO buffer at the lowest layers, even if
multiple things are writing to the storage from the system level. The
"snapshot request" is acted upon ONLY at a carefully calculated
point-in-time, not an arbitrary one: one where the system's state machine is
in the correct state for taking such a snapshot.
No write is confirmed back to the kernel (and ultimately out to the
application storing the data) until it has been written at least once into
a journal file; the FS then handles making whatever copies (RAID,
whatever) were requested of it. In extreme cases, modes are available that
require "X" copies of supremely important data before a write confirmation
is given to the OS's kernel. The application or the admin/owner of the
overall system must know this and understand that data loss can occur any
time not everything has been written and CONFIRMED by the filesystem,
although the filesystem can attempt to alleviate that minor pain in creative
and less error-corrected ways.
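As a rough application-level illustration of "journal first, confirm second",
here is a minimal sketch. It assumes a POSIX-like system where os.fsync()
asks the OS to push the data to stable storage; the function name
journaled_write and the journal file layout are made up for the example and
are not any particular filesystem's implementation.

```python
import os


def journaled_write(journal_path: str, record: bytes) -> int:
    """Append a record to a journal file and return only after fsync().

    The return value plays the role of the "write confirmation": the caller
    is told the write succeeded only once the OS has been asked to flush it.
    """
    fd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        while view:                      # os.write() may write only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush before confirming anything
        return len(record)
    finally:
        os.close(fd)
```

Anything an application hands off without waiting for that kind of
confirmation falls into the "may be lost" category described above.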
For any snapshot-based system, you as the admin make the request, and the
system only does it when everything in the FIFO has been committed to disk
and confirmed. Everything else is either being written to RAM (lost if
power removed right at this point in time), or written to a separate journal
that's typically faster than the filesystem itself and less error-corrected
-- but almost always still there.
Since we speak Unix/Linux here... almost all of these systems are
inode-based. If an inode changes after the snapshot time, a new inode/block
is written, and the old one is retained. If you request a file from the
point-in-time snapshot's "directory", the old inode number is used. If
you're operating against the current data, the most current inode containing
that data is used. The snapshots MUST be eventually deleted to recover this
"lost" duplicate inode space... eventually.
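The retain-the-old-block behavior can be sketched with a toy in-memory model.
This is only an illustration of the copy-on-write idea, not how any real
filesystem lays out inodes; the class CowStore and all of its method names
are invented for the example.

```python
class CowStore:
    """Toy copy-on-write store: data blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes
        self.live = {}        # file name -> current block id
        self.snapshots = {}   # snapshot name -> frozen {file name: block id}
        self._next_id = 0

    def write(self, name, data):
        # Always allocate a new block; the old one stays if a snapshot uses it.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id
        self._reclaim()

    def snapshot(self, snap):
        # A snapshot is just a frozen copy of the name -> block table.
        self.snapshots[snap] = dict(self.live)

    def read(self, name, snap=None):
        table = self.live if snap is None else self.snapshots[snap]
        return self.blocks[table[name]]

    def delete_snapshot(self, snap):
        del self.snapshots[snap]
        self._reclaim()

    def _reclaim(self):
        # Free every block that neither the live view nor a snapshot references.
        referenced = set(self.live.values())
        for table in self.snapshots.values():
            referenced.update(table.values())
        for block_id in list(self.blocks):
            if block_id not in referenced:
                del self.blocks[block_id]
```

Writing "v1", taking a snapshot, then writing "v2" leaves two blocks
allocated; the space for "v1" only comes back once the snapshot is deleted.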
Whether or not the APPLICATIONS above the lower layers have modes
appropriate for point-in-time backups is also often in question, and it's a
requirement that the admin KNOW how the applications handle such events.
All that's really necessary to handle point-in-time "snapshot" style backups
is applications smart enough to either a) stop writing temporarily or b)
recover gracefully from data loss. RDBMSs already do this. The
leaders of the RDBMS pack all have the ability to go into a "quiescent" mode
where database reads/writes are stopped for a period of time at a "good"
point-in-time for the RDBMS.
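Option a) can be sketched in a few lines. This is a toy, single-process
illustration of "stop writing temporarily", assuming every writer goes
through the same object; it is not how any particular RDBMS implements its
quiescent mode, and QuiescableStore is a hypothetical name.

```python
import threading
from contextlib import contextmanager


class QuiescableStore:
    """Toy store whose writers pause while a snapshot is being taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []

    def write(self, record):
        with self._lock:              # blocks for as long as we are quiesced
            self._records.append(record)

    def records(self):
        with self._lock:
            return tuple(self._records)

    @contextmanager
    def quiesced(self):
        # Hold the lock for the whole snapshot so no write lands mid-copy.
        with self._lock:
            yield tuple(self._records)
```

A backup job would wrap its copy in "with store.quiesced() as frozen:";
writers simply wait and carry on once the block exits.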
Other applications should at least attempt to use the native filesystem
locking mechanisms in the OS for a particular filesystem, so the "snapshot"
system at least has a solid chance of doing the right thing. In fact, it's
been pointed out by a great many that the recent data losses in ext4 are far
more a problem with badly written code, which neither calls fsync() to flush
its data nor uses the native file locking calls, than with the filesystem
itself. Too many coders skip these things, and/or refuse to use a file
locking wrapper library for their reads/writes of data. Spoiled by fast and
plentiful disk, today's application software has gotten "lazy" about data
integrity.
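For what the non-lazy version can look like from the application side, here
is a minimal sketch for Unix-like systems using Python's fcntl.flock() and
os.fsync(). The function name locked_durable_write is made up, and flock()
locks are advisory: they only help if every other program touching the file
takes the same lock, which is an assumption here.

```python
import fcntl
import os


def locked_durable_write(path: str, data: bytes) -> None:
    """Replace a file's contents under an exclusive advisory lock, then fsync."""
    # Append mode creates the file if needed without truncating it before
    # the lock is held.
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is granted
        f.truncate(0)                   # safe to discard old contents now
        f.write(data)
        f.flush()                       # push Python's buffer to the OS
        os.fsync(f.fileno())            # ask the OS to push it to disk
    # Closing the file releases the lock.
```

The lock keeps cooperating readers and writers from seeing a half-written
file; the fsync() is what narrows the window in which a crash loses the data.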
No backup is perfect in relation to a timeline -- even venerable old "tar"
didn't have warnings about "file has changed during backup" during its early
life, but it certainly does today...
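The same kind of check is easy to sketch for a home-grown backup script:
stat the file before and after reading it and flag any difference. This is
only in the spirit of that warning, not tar's actual logic; the function
name read_for_backup and the choice of comparing modification time and size
are assumptions for the example.

```python
import os


def read_for_backup(path: str):
    """Read a file and report whether it appears to have changed meanwhile."""
    before = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    after = os.stat(path)
    # A different mtime or size means the copy may not be point-in-time.
    changed = (before.st_mtime_ns, before.st_size) != (
        after.st_mtime_ns,
        after.st_size,
    )
    return data, changed
```

A check like this can only warn after the fact; it does nothing to make the
copy itself consistent.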
The only way to do it "right" requires that the point-in-time overall
"system" happens completely transparently to higher layer (and possibly
badly written) applications. That means the best point-in-time systems are
found down in the filesystem itself.
There are numerous filesystems that do this, today. They do work! (While
I'm not going to trust it yet with anything important, ext4 has modes that
do things similar to the above, but they're NOT the default settings for the
journaling type/methodology in use on most distros' systems. This is where
people get bitten... the devil is in the details, when a distro starts
making choices for you, instead of the admin making a conscious choice and
documenting the limitations to the overall business system being created or
used at the higher levels.)
Nate
-----Original Message-----
From: lug-bounces at lug.boulder.co.us [mailto:lug-bounces at lug.boulder.co.us]
On Behalf Of Paul E Condon
Sent: Tuesday, March 24, 2009 12:49 PM
To: lug at lug.boulder.co.us
Subject: Re: [lug] [Slightly OT] File Management?
On a slightly different level, I am skeptical of the very idea of
"point in time snapshots". At a level of detail that we can think
about, and may actually be coming into reality, there is no single
definition of time that is valid everywhere. This is a well known
result of the Special Theory of Relativity. More significantly,
because of transmission delays, it is not possible to know of a
commit of a transaction until some time after it has actually
occurred. This can lead to the system letting other transactions
commit that would not have been allowed had the first commit
been known about promptly. This is not a problem of a mistake
being made because of an inadequate implementation, but because of
an inadequate understanding of the nature of time. Talk of a
"point in time" is evidence of an inadequate understanding of
the nature of time.
--
Paul E Condon
pecondon at mesanetworks.net
_______________________________________________
Web Page: http://lug.boulder.co.us
Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
Join us on IRC: lug.boulder.co.us port=6667 channel=#colug