[lug] syncing deletes in backup (or picture workflows)

Anthony Foiani tkil at scrye.com
Tue Feb 1 21:47:54 MST 2011


Davide Del Vento <davide.del.vento at gmail.com> writes:

> Problem is not space/money - but you have to multiply your number by
> more than 3, since I need a drive for the desktop (SATA, your
> price is almost right), USB (your price is optimistic, especially
> for self-powered, which are the only ones I consider) and laptop
> (much higher price, if not impossible to find). In fact at present
> space is a problem only on the laptop.

I consider this in more detail below, but what I suggest is:

1. Enough room on the workstation that is immediately in front of you
   for you to download all data off the flash drive.  This can be your
   laptop, or a workstation, or whatever; either way, even 32GB flash
   devices should fit into the spare room of modern laptops and
   desktops.

2. On a persistent at-home connection (be it external drive, or a
   workstation that isn't a laptop, or similar), have enough storage
   for (a) all the raw footage you ever shoot, as well as room to
   duplicate (b) those pictures that you know you want to keep.

3. For full disaster recovery, use some sort of off-site storage for
   at least the subset (2b), along with any documentation that goes
   along with them.  This could be cloud storage, this could be an
   external HD that you take to the office and update once a month, it
   could be a DVD-R of all the "new stuff" since the last time you
   burned the previous one.

   (ALTHOUGH I wouldn't consider any of those formats "archival" in
   any sense; your best bet (at a reasonable price) is to get a new HD
   every 2-3 years, copy everything over to it, and only turn on that
   HD 1-2 times a month, keeping it in a dry, cool, dust-free environment
   the rest of the time.)

> But my main problem is clutter, so I don't *want* to keep every
> stupid photo.

That's assuming you know what you'll want/need in 10 years.  I'm bad
at precognition.  (But see below for possible remedies.)

> I know [about Tim Bray's photography blog posts], thanks.

Just checking.  He has some other good threads on photo manipulation
workflow.

You might also consider chatting with pros that have been dealing with
shooting digital for the last 10 years.  The mechanisms I discuss are
probably not applicable there (figure hundreds of shots per day, shot
raw on 20+ megapixel backs, so easily gigabytes of data *per day*).

> I'll have a deeper look at your scripts, thanks for sharing them

No problem.  'quick-photo-index' is old old old, but it still works,
so I've never bothered "fixing" it.  It also uses "filesystem as
database", which I prefer to "some other database, generate filesystem
in batch or at view time".  Moving a directory / tarball around is
easier than provisioning tools everywhere.

[This doesn't contradict Rob's comments about documentation being
superior to file naming; I agree, but I would just stick that
documentation into the EXIF/JPEG comments and cart them around with
the raw data itself.]

> It depends on your life. In mine so often this has become: memory
> card is full or almost full, we're leaving house in 10 minutes, the
> kids are screaming, the wife is in a hurry to do something before we
> leave and asks for my help.
>
> a) forget about taking the camera
>
> b) delete (some) pictures randomly on the camera when needed
>
> c) put them on the laptop and delete all of them on the camera
>    (fingers crossed that e.g. the laptop won't be stolen, or its disk
>    won't break etc)
>
> d) put them on the laptop and delete some of them on the camera -
>    making more PITA for later
>
> e) put them on the laptop and do a quick rsync, with all the
>    downsides of not sifting/sorting/deleting them *before*
>
> f) send rest of the family to hell, and "sit down, weed them out,
>    document, and forget about that trip - hoping that the rest of
>    the family went and had fun".

Unfortunately, you end up saying that you don't have time to do it
now, and you don't have time to do it later.  That's kinda a tough set
of requirements to meet, don't you think?

See also:  http://dilbert.com/strips/comic/1994-02-20/

> No suggestions on the tool I was looking for, then? Let me quote
> myself in case somebody is joining the thread only now:

Here's my suggestion.  [I'm assuming that you have storage lined up as
I described at the beginning of this reply (e.g., an "in your face"
workstation, some other persistent storage at home, and possibly
offsite storage).]

1. Write a script (or have someone else write it, or find a program
   that can do it, or see if even stupid windows auto-play will do it
   for you) that does the following:

   a. When the memory card is inserted (or the camera is attached via
      USB), all photos are copied off the removable media into the raw
      storage area.  Let's say we put it into a directory like
      "unsorted-YYYYMMDDHHMMSS".

   b. If the filenames on the camera / memory card are not guaranteed
      unique, one might either need to copy each folder separately, or
      flatten the media's namespace and provide guaranteed unique
      local names and a remapping table.

   c. Copy that directory to the at-home persistent storage.

2. Write another script that watches the at-home persistent storage.
   When a new batch of photos arrives, checksum each file, and
   optionally upload it to the offsite backup.

   a. By looking at the checksums, this step can detect any "duplicate
      raw image" uploads.

   b. Alternately, use content-based naming (rename by hash + link) to
      remove dups "magically".
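
A minimal sketch of step 1 (parts a and b), in Python.  The function
name `import_card`, the counter-prefixed local names, and the
`remap.tsv` file are all illustrative choices, not anything from an
existing tool; it also assumes the card is already mounted somewhere
readable.

```python
import shutil
import time
from pathlib import Path


def import_card(card_root, raw_store, stamp=None):
    """Copy every file under card_root into raw_store/unsorted-<stamp>/.

    The card's folder tree is flattened.  Each local name gets a counter
    prefix so two IMG_0001.JPG files from different folders cannot
    collide, and remap.tsv records where each local name came from.
    """
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    dest = Path(raw_store) / f"unsorted-{stamp}"
    dest.mkdir(parents=True)  # raises if this batch directory already exists
    files = sorted(p for p in Path(card_root).rglob("*") if p.is_file())
    lines = []
    for n, src in enumerate(files, start=1):
        local = f"{n:06d}-{src.name}"
        shutil.copy2(src, dest / local)  # copy2 also preserves timestamps
        lines.append(f"{local}\t{src.relative_to(card_root)}")
    (dest / "remap.tsv").write_text("\n".join(lines) + "\n")
    return dest
```

Step 1c is then one more call, something like
`shutil.copytree(dest, home_store / dest.name)`, where `home_store` is
wherever the at-home storage happens to be mounted.  Hooking this up to
"card inserted" events is platform-specific and left out here.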

These two steps provide you with:

A. "out the door in 10 minutes"
B. "the laptop is stolen"
C. "camera memory is full" [not while out, obviously]
D. "just got home, and I'm exhausted"
E. "oops, took more pics after uploading, don't want dups"
F. "don't need vast amounts of storage on the laptop"
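
Item E (no dups) falls out of step 2b.  Here is a rough sketch of
content-based naming, assuming SHA-256 as the checksum and assuming the
batch directories and the hash store live on the same filesystem (hard
links need that).  `dedupe_batch` and the flat `store_dir` layout are
made up for illustration.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def dedupe_batch(batch_dir, store_dir):
    """Hard-link every file in batch_dir to store_dir/<sha256>.

    Returns the batch file names whose content was already in the store.
    Those names are kept, but afterwards they share storage with the
    earlier copy instead of taking up space a second time.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    dups = []
    for path in sorted(Path(batch_dir).iterdir()):
        if not path.is_file():
            continue
        blob = store / file_digest(path)
        if blob.exists():
            dups.append(path.name)
            path.unlink()        # drop the redundant bytes...
            os.link(blob, path)  # ...and point the name at the stored copy
        else:
            os.link(path, blob)  # first sighting: give the data a hash name
    return dups
```

The returned list is also a cheap way to decide what still needs to go
to the offsite backup: anything reported as a dup was uploaded (or
queued) the first time it was seen.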

Now, at some point, you have the time / energy / focus to sort through
your raw footage. 

3. On the laptop (if still there), or on the workstation, or by
   copying the unsorted-YYYYMMDDHHMMSS directory back to laptop...

   a. Create a new, empty, sibling directory "sorted-YYYYMMDDHHMMSS"

   b. Copy the "keepers" to "sorted-...."  (note, *Copy*, not *Move*.)

   c. Apply any lossless transformations (rotations, comment metadata
      append) you care to.

   d. To apply lossy transforms (resizing, cropping, white balance, etc):

         i. Rename the original [in the "sorted-" directory] to "raw-....".

         ii. Copy "raw-...." to "....".

         iii. Apply lossy transforms to "...."

      Rationale: if it's important enough to keep, it's worth keeping
      as much data as possible.

   e. Copy the completed "sorted-..." directory to your workstation.

4. On the workstation, watch for new "sorted-..." directories.  When
   one appears, it could be immediately published (e.g., to flickr).
   It should always be mirrored to offsite storage, if you're using
   that option.  (Or, at least, queued for "next time we burn a DVD").
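
Step 3d is easy to get wrong by hand when you are tired, so it may be
worth a helper.  A sketch, with a hypothetical function name; it only
does the rename and copy (3d.i and 3d.ii), and the actual lossy edit
(3d.iii) is whatever tool you like, pointed at the returned working
copy.

```python
import shutil
from pathlib import Path


def prepare_for_lossy_edit(photo):
    """Rename photo to raw-<name>, then leave an editable copy at <name>.

    Returns (raw_path, working_path).  Refuses to run twice on the same
    file, so an already-preserved original is never overwritten.
    """
    photo = Path(photo)
    raw = photo.with_name("raw-" + photo.name)
    if raw.exists():
        raise FileExistsError(f"{raw} already exists; refusing to overwrite")
    photo.rename(raw)         # 3d.i: the untouched original becomes raw-....
    shutil.copy2(raw, photo)  # 3d.ii: a fresh copy takes over the old name
    return raw, photo
```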

These steps give you:

G. "get rid of what we currently think is trash"
H. "make sure we can find the important stuff"
I. "make it easy to backup / publish the important stuff"
J. "oops, I really *did* want that goofy pic from last week"
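
The "watch for new sorted-... directories" part of step 4 does not need
anything fancy.  A sketch that could run from cron, assuming a plain
text file (here called `done_file`, a made-up convention) that lists the
directories already published or mirrored:

```python
from pathlib import Path


def new_sorted_dirs(root, done_file):
    """Return names of sorted-* directories under root not yet in done_file."""
    done_path = Path(done_file)
    done = set(done_path.read_text().split()) if done_path.exists() else set()
    found = sorted(
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and p.name.startswith("sorted-")
    )
    return [name for name in found if name not in done]


def mark_done(done_file, name):
    """Record that a directory has been published / mirrored."""
    with open(done_file, "a") as f:
        f.write(name + "\n")
```

The publish or mirror command itself goes between the two calls; call
`mark_done` only after it succeeds, so a failed upload is retried on the
next run.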

So, no single simple tool.  But with the above [fairly straight-
forward] workflow, a few scripts, and a handful of standard tools, you
can get everything you are asking for and more.

> Of course this tool *must* lack the most important feature of a
> version control system, i.e. ability to revert to a previous status,
> which would be too space consuming for this amount of data.

View a "state" as a curated selection from an immutable "sea" of raw
content, and I think this is entirely doable.
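
To make that concrete: if the raw content sits in a directory of files
named by their hash, a "state" is just a small text file mapping names
to hashes, and reverting means re-linking.  A sketch under those
assumptions; the JSON manifest format and both function names are
invented for illustration, and hard links again assume one filesystem.

```python
import json
import os
from pathlib import Path


def save_state(selection, manifest_path):
    """Write a state: a dict of file name -> content hash, as JSON."""
    text = json.dumps(selection, indent=2, sort_keys=True)
    Path(manifest_path).write_text(text)


def restore_state(manifest_path, store_dir, out_dir):
    """Rebuild a directory for a saved state by hard-linking from the store.

    store_dir is the immutable "sea": one file per hash, never modified.
    Returns the sorted list of names that were restored.
    """
    selection = json.loads(Path(manifest_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True)  # raises if out_dir already exists
    for name, digest in selection.items():
        os.link(Path(store_dir) / digest, out / name)
    return sorted(selection)
```

Each saved state costs a few kilobytes no matter how many gigabytes of
photos it refers to, which is where the space objection goes away.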

HTH,
t.


