[lug] Which Database schema?

Tue Nov 13 15:50:27 MST 2018

I think that the schema is always problem-dependant.  For things like
FullName versus fname,lname,minitial - it really depends on how you're
querying and indexing your data.  If you think that you're going to be
querying based on last-name, it may be better to have last-name have its
own field.  Sometimes, it's not bad to have both.  Also, size is a
consideration too.  If you have a *HUGE* dataset, then your indices may be
larger or smaller depending on your use-case, so if you're only indexing
the first three characters of a field, you may want to have seperate fields
instead of a single larger field (if you're indexing the first 3 characters
of a field and you're searching a database of full names for a last name,
you'll probably end up doing a full table scan anyway).

The interesting thing about multiple fields and creating/querying indices,
is that you could create a single index on the first 3 characters of the
firstname field and the first 3 characters of the lastname field and as
long as your 'where' clause is filtering on both fields, it'll use the
index and be super-efficient!

An interesting read is how Reddit started out with a "things" table and a
"data" table for the entire site.  Things have changed since then, but in
the early days, the whole site ran off just two tables.  The KISS principle
at its most interesting if you ask me.
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
(Lesson 3).

On Tue, Nov 13, 2018 at 3:40 PM BC <bcarr at purgatoire.org> wrote:

>
>
> On 11/13/2018 3:22 PM, Zan Lynx wrote:
> > I recall reading a blog a while ago by a developer who regretted
> > making lots of columns.
> > Because of things like Chinese names he got rid of first name, last name
> and simply had name. Instead of street number, street, city, etc., he just
> had mailing address.
> >
> > Once he got outside of the US and European countries things just got too
> weird for his many columns.
>
> Essentially, I was wondering if it is ultimately faster to read and
> write one BIG column over a LITTLE column.
>
> If the disk isn't fragmented much, I would think one BIG column is
> faster, but Bear's suggestion wouldn't support that.  I may need to
> set up a test case and try it to see. In the meantime I'll leave it as
> a 55 columns each containing the same, native fields for different
> weeks/yr.
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety

-- 
Sent from Steve Webb's gmail account.
http://badcheese.com / http://twitter.com/scumola
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lug.boulder.co.us/pipermail/lug/attachments/20181113/09ff4e07/attachment.html>