[lug] large file management

Quentin Hartman qhartman at gmail.com
Mon Mar 17 15:49:53 MDT 2014


In a more general sense, this sort of problem is solved by Document
Management System software. OpenKM is supposed to be good, but I've never
used it.

Subversion does work pretty well with binary files these days, especially
if your use case is append-mostly. You just need to set up some file-locking
workflows for editing. On your laptop, you could then do selective
checkouts to save space, and/or set up multiple repos to break things up:
say, one repo each for major events and/or topics, which are pulled in as
externals by an "all photos" repo. On your space-constrained machine, you
can also use a remote SVN repo browser to find what you want and get it
selectively, even if you don't break things up. To make this work well you
are talking about a deployment that goes beyond out-of-the-box config, but
it's totally doable. I manage several binary-mostly SVN repos that total
several hundred GB of data, and they work fine. Once set up, normal SVN
backup tools apply. Everything you need is well documented already. One
downside is that SVN is "slow" when doing these sorts of checkouts /
checkins because of all the overhead it introduces. Whether or not that is
tolerable is up to you.
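
As a rough sketch of the selective checkout and locking pieces mentioned
above: the repo URL, working copy name and directory names below are made up
for illustration, and the script only builds and prints the svn command
lines rather than running them. It assumes a repo with one top-level
directory per event, which is just one way to lay things out.

```python
import shlex

# Hypothetical values for illustration; substitute your own repo and paths.
REPO_URL = "https://svn.example.org/photos"
WORKING_COPY = "photos"


def sparse_checkout_cmds(repo_url, working_copy, wanted_dirs):
    """Return svn commands that check out only the listed top-level dirs."""
    # Start with an empty working copy: no files or subdirectories are fetched.
    cmds = [["svn", "checkout", "--depth", "empty", repo_url, working_copy]]
    for d in wanted_dirs:
        # Pull in one top-level directory, with everything below it.
        cmds.append(
            ["svn", "update", "--set-depth", "infinity", f"{working_copy}/{d}"]
        )
    return cmds


def needs_lock_cmd(path):
    """Return the svn command that marks a file as requiring a lock to edit."""
    # Files with svn:needs-lock set are read-only in working copies until
    # someone runs `svn lock` on them, which helps avoid conflicting edits
    # to binaries that cannot be merged.
    return ["svn", "propset", "svn:needs-lock", "*", path]


if __name__ == "__main__":
    for cmd in sparse_checkout_cmds(REPO_URL, WORKING_COPY,
                                    ["2013-hawaii", "2014-kids"]):
        print(shlex.join(cmd))
    print(shlex.join(needs_lock_cmd("photos/2014-kids/IMG_0001.jpg")))
```

On the laptop you would only list the directories you actually want local;
the backup machines can just do a normal full checkout.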

Of course, most people use Picasa and the like to solve this sort of
problem.


On Mon, Mar 17, 2014 at 3:24 PM, Davide Del Vento <
davide.del.vento at gmail.com> wrote:

> I have a large number of fairly large files (spoiler alert: the
> digital pictures of a lifetime). The collection is growing, mostly but
> not only by append. I have three copies onsite, and would like to add
> a fourth copy offsite. Two of the copies are actively edited
> (occasionally, but the frequency doesn't matter), and often appended to.
> The third (and fourth) copy is just for backup.
>
> Managing such a collection of files is becoming a nightmare. Consider
> some scenarios:
>
> Some files are added to Copy 1 (my laptop) taken from the SD card of
> the camera in a hurry. No synch is done with the other copies at this
> time, because I have to run out of the door to take pictures of the
> kids.
>
> Some files are deleted in Copy 2 (e.g. by my wife) because they were
> out of focus, blurred, not worth keeping. No synch is done with the
> other copies at this time, either.
>
> Some (old) files are deleted in Copy 2 (e.g. by myself) because we ran
> out of space on the laptop.
>
> Then, one is in the process of rsync'ing the 3 copies and my rsync
> workflow (discussed on this list some time ago, IIRC) breaks. Not that
> it was particularly good anyway, having to know "off channel" what
> happened to which copy before I could fire a reasonable command....
> Since I'm reading my swag at the BLUG (Cliff Stoll's "Silicon Snake
> Oil", but that's the subject for another post), I'm tempted to say
> "computers are crap", let my mother ship me my old film cameras and
> lenses from the early 80s (or maybe buy them here for cheap) and
> stop at that. Heck, those cameras worked even without batteries!
>
> Instead I say: there must be a better way. I mean, we solve this very
> same problem (for text files) zillions of times a day, don't we? But
> for large binaries, I can't just throw [git|hg|svn] at it and be
> happy. As mentioned in the scenario, at least on one machine space is
> tight. Yes, space is getting cheaper and the like, but: a) I don't
> have any money to buy a disk for that laptop and b) there is no reason
> to keep a copy of some pictures from a trip in 2003 on it anyway. And
> I am not even sure [git|hg|svn] is the right tool for the job.
>
> So googling around I found http://code.google.com/p/boar/ and
> http://www.cis.upenn.edu/~bcpierce/unison/index.html and I ask: have
> any of you tried either? Or any other piece of software worth
> trying for this purpose?
>
> Cheers,
> Davide