[lug] Which Database schema?

BC bcarr at purgatoire.org
Tue Nov 13 16:00:40 MST 2018



On 11/13/2018 3:50 PM, Steve Webb wrote:
> I think that the schema is always problem-dependant.  For things 
> like FullName versus fname,lname,minitial - it really depends on how 
> you're querying and indexing your data.  If you think that you're 
> going to be querying based on last-name, it may be better to have 
> last-name have its own field.  Sometimes, it's not bad to have 
> both.  Also, size is a consideration too.  If you have a *HUGE* 
> dataset, then your indices may be larger or smaller depending on 
> your use-case, so if you're only indexing the first three characters 
> of a field, you may want to have seperate fields instead of a single 
> larger field (if you're indexing the first 3 characters of a field 
> and you're searching a database of full names for a last name, 
> you'll probably end up doing a full table scan anyway).
>
> The interesting thing about multiple fields and creating/querying 
> indices, is that you could create a single index on the first 3 
> characters of the firstname field and the first 3 characters of the 
> lastname field and as long as your 'where' clause is filtering on 
> both fields, it'll use the index and be super-efficient!
>
> An interesting read is how Reddit started out with a "things" table 
> and a "data" table for the entire site. Things have changed since 
> then, but in the early days, the whole site ran off just two 
> tables.  The KISS principle at its most interesting if you ask me. 
> http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html 
> (Lesson 3).

Thanks for this explanation (which I'll need to study) and the link.

I guess what I was thinking was that the fastest overall process is to 
read everything into memory and do all the machinations in memory, 
since memory is sooooo much faster than disk I/O, even SSD I/O. That 
way, I'd be making two trips to the db, one for total read and the 
other for total write.  In my application the database data will never 
come close to overcoming the memory available (db size < 2MiB; memory 
size > 3GiB).

So should db size vs memory size be a consideration as well?



More information about the LUG mailing list