[lug] Which Database schema?

Bear Giles bgiles at coyotesong.com
Tue Nov 13 16:25:03 MST 2018


> In the meantime I'll leave it as 55 columns, each containing the same
native fields for different weeks/yr.

Eek!

You still want to normalize your data. If something downstream absolutely
requires the wide layout, you can create a view that presents the
normalized data that way.
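A minimal sketch of that idea in SQLite (the table and column names here are made up for illustration): store one row per week instead of 55 repeated column groups, and pivot a view back out only if some consumer insists on the wide shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized: one row per (item, week), not 55 repeated column groups.
conn.execute("""
    CREATE TABLE weekly_stats (
        item_id  INTEGER NOT NULL,
        week     INTEGER NOT NULL,   -- 1..55
        value    REAL,
        PRIMARY KEY (item_id, week)
    )
""")
conn.executemany(
    "INSERT INTO weekly_stats VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 12.5), (2, 1, 7.25)],
)

# If a report really wants the wide layout, a view can pivot it out
# (shown here for just the first two weeks).
conn.execute("""
    CREATE VIEW weekly_wide AS
    SELECT item_id,
           MAX(CASE WHEN week = 1 THEN value END) AS week_1,
           MAX(CASE WHEN week = 2 THEN value END) AS week_2
    FROM weekly_stats
    GROUP BY item_id
""")

rows = conn.execute("SELECT * FROM weekly_wide ORDER BY item_id").fetchall()
print(rows)   # [(1, 10.0, 12.5), (2, 7.25, None)]
```

The normalized table stays easy to query, index, and extend (a 56th week is an INSERT, not a schema change), while the view keeps the old consumers happy.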

The question of whether to split names and addresses into components is
actually pretty deep. Besides the problems that are obvious to us, in some
countries you can have more than one legal name, e.g., you might have
different names in different character sets. Or you might have different
names for different purposes or at different times (e.g., how people here
often change their name at marriage). In this case you want to keep it as
abstract as possible, modulo business needs. E.g., many sites want it split
out because they send the information to an address verification webapp.

But ironically that's also an increasingly moot point, since privacy
practices would require the information to be encrypted anyway. The best
way to handle that is to create a 'wallet' (name-value pairs), encrypt it,
and then store the encrypted value in the database. You can expose bits
for searchability, e.g., the first three letters of the last name, the
last four digits of the credit card number, everything but the address1
and address2 lines of the address. But with best practices, name, email
address, street address, etc., should be encrypted no matter how small
your database or app is.
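A rough sketch of the wallet pattern (all names here are hypothetical, and `toy_encrypt` is a stand-in XOR cipher for the demo only; a real system would use an authenticated cipher such as AES-GCM or Fernet, with keys from a key-management service): serialize the name-value pairs, encrypt the blob, and store only derived search tokens in the clear.

```python
import json

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Placeholder only -- use a real authenticated cipher (AES-GCM,
    # Fernet, ...) in production. XOR is its own inverse, which makes
    # the round trip below easy to demonstrate.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def store_customer(wallet: dict, key: bytes) -> dict:
    blob = toy_encrypt(json.dumps(wallet).encode(), key)
    return {
        # Searchable, less-sensitive fragments exposed in the clear:
        "last_name_prefix": wallet["last_name"][:3].lower(),
        "cc_last4": wallet["card_number"][-4:],
        # Everything else only as one encrypted blob:
        "wallet": blob,
    }

key = b"demo-key"   # in practice, fetched from a KMS, never hard-coded
row = store_customer(
    {"first_name": "Ada", "last_name": "Lovelace",
     "street": "12 St James Sq", "card_number": "4111111111111111"},
    key,
)
print(row["last_name_prefix"], row["cc_last4"])   # lov 1111

# Round trip to show the blob is recoverable with the key:
restored = json.loads(toy_encrypt(row["wallet"], key).decode())
assert restored["street"] == "12 St James Sq"
```

The database only ever sees the prefix, the last four digits, and an opaque blob, so a leaked dump exposes far less than a table of plaintext columns.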


On Tue, Nov 13, 2018 at 4:00 PM BC <bcarr at purgatoire.org> wrote:

>
>
> On 11/13/2018 3:50 PM, Steve Webb wrote:
> > I think that the schema is always problem-dependent.  For things
> > like FullName versus fname,lname,minitial - it really depends on how
> > you're querying and indexing your data.  If you think that you're
> > going to be querying based on last-name, it may be better to have
> > last-name have its own field.  Sometimes, it's not bad to have
> > both.  Also, size is a consideration too.  If you have a *HUGE*
> > dataset, then your indices may be larger or smaller depending on
> > your use-case, so if you're only indexing the first three characters
> > of a field, you may want to have separate fields instead of a single
> > larger field (if you're indexing the first 3 characters of a field
> > and you're searching a database of full names for a last name,
> > you'll probably end up doing a full table scan anyway).
> >
> > The interesting thing about multiple fields and creating/querying
> > indices, is that you could create a single index on the first 3
> > characters of the firstname field and the first 3 characters of the
> > lastname field and as long as your 'where' clause is filtering on
> > both fields, it'll use the index and be super-efficient!
> >
> > An interesting read is how Reddit started out with a "things" table
> > and a "data" table for the entire site. Things have changed since
> > then, but in the early days, the whole site ran off just two
> > tables.  The KISS principle at its most interesting if you ask me.
> >
> http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
> > (Lesson 3).
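Steve's composite prefix index can be sketched in SQLite, which supports indexes on expressions (names here are made up); when the WHERE clause filters on exactly those expressions, the planner can use the index instead of scanning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Steve", "Webb"), ("Bear", "Giles"), ("Stephanie", "Weber")])

# Single composite index on the first three characters of both fields.
conn.execute("""
    CREATE INDEX idx_name_prefix
    ON people (substr(firstname, 1, 3), substr(lastname, 1, 3))
""")

# Filtering on both indexed expressions lets the planner use the index.
query = """
    SELECT firstname, lastname FROM people
    WHERE substr(firstname, 1, 3) = 'Ste'
      AND substr(lastname, 1, 3) = 'Web'
"""
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)   # the plan names idx_name_prefix rather than a table scan
print(conn.execute(query).fetchall())
```

Note that filtering on only one of the two expressions, or on the raw column (e.g. `lastname = 'Webb'`), would not match the index key and can fall back to a full scan, which is Steve's point about indexing to match the query.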
>
> Thanks for this explanation (which I'll need to study) and the link.
>
> I guess what I was thinking was that the fastest overall process is to
> read everything into memory and do all the machinations in memory,
> since memory is sooooo much faster than disk I/O, even SSD I/O. That
> way, I'd be making two trips to the db, one for total read and the
> other for total write.  In my application the database data will never
> come close to overcoming the memory available (db size < 2MiB; memory
> size > 3GiB).
>
> So should db size vs memory size be a consideration as well?
>
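For a dataset that small, BC's two-trip pattern is easy to sketch with SQLite (the table and the transform are invented for the example): one SELECT pulls everything into memory, the work happens in plain Python structures, and one batched transaction writes it all back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 2.0), (2, 3.0), (3, 4.0)])
conn.commit()

# Trip 1: read the whole (small) table into memory.
rows = conn.execute("SELECT id, value FROM scores").fetchall()

# All the machinations happen in ordinary Python data structures.
updated = [(value * 10, id_) for id_, value in rows]

# Trip 2: write everything back in a single transaction.
with conn:
    conn.executemany("UPDATE scores SET value = ? WHERE id = ?", updated)

print(conn.execute("SELECT value FROM scores ORDER BY id").fetchall())
# [(20.0,), (30.0,), (40.0,)]
```

With a 2 MiB database against 3 GiB of RAM this works fine; the trade-off to keep in mind is that in-memory copies bypass the database's concurrency and crash-safety guarantees between the read and the write.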
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety