[lug] Q. on SQL Foreign Keys (PostgreSQL)

Bear Giles bgiles at coyotesong.com
Wed Aug 23 15:31:39 MDT 2017


I used libpq years ago - it might be easier to use to get the metadata than
querying the information_schema directly.

To get the load order you need to create a graph with nodes = tables,
ordered links = referential integrity dependency. I don't think you need to
worry about the columns -- a foreign key must refer to a single table no
matter how many columns it uses.

Do a topological sort on the graph.

1. Go through the graph and remove all nodes with no inbound or outbound
dependencies. These tables can be loaded at any time - before or after the
rest of the data.

2. Go through the graph and list all nodes with no inbound dependencies and
at least one outbound dependency. These don't have any dependencies but
other tables depend on them. You can load these tables in any order.

3. Remove those nodes, decrement the inbound dependencies counter on the
corresponding nodes/tables, and repeat. This will create sets of tables and
while the order of the tables within the set don't matter the order of
tables between sets do.

This will either finish with no nodes remaining or you'll find a circular
dependency. That dependency may involve more than two tables - A depends on
B which depends on C which depends on A - since I think the database may be
smart enough to prevent you from creating a two-party dependency. In this
case you're hosed.

(Well, not really, but you need to know that it will require special
handling and the tables will be difficult to update without disabling and
reenabling constraints.)

Remember in all cases you want to turn off the indexes and foreign keys,
load the data, and then recreate the index (upstream table) or foreign key
(downstream table). That's a lot more efficient than populating the table
with the constraints in place. It';s up to you whether you do this in one
big move - in which case order doesn't matter as long as you create the
index before the foreign key - or at each step so you'll have earlier
warning of a problem.

Finally run 'vacuum analyze'.

On Wed, Aug 23, 2017 at 3:15 PM, <stimits at comcast.net> wrote:

> The database itself seems to be SQL92 or newer, not sure what PostgreSQL
> 9.4.9 uses. It has the ANSI information_schema, plus its on representation.
> I've been concentrating on using information_schema.
>
> My program which is doing the work is C++ (libpqxx).
>
> For this case actual mapping is only about depicting primary keys and
> secondary keys, with a line drawn from one table to the other when a
> foreign key needs a column. This should show load order for various data
> files, it doesn't need to be much more detail. So far I'm drawing an SVG
> image for a root node and all foreign keys depending on it, but I have some
> issues with multi-column  foreign key queries to relate one exact column of
> one specific table to another column (my current query incorrectly tries to
> map a single column to three different columns if there is a three-way
> relation...I don't need anything describing uniqueness since all I need to
> know is one table is loaded before another).
>
> I chose SVG as my initial output because it can be scaled infinitely (I
> sometimes toy with the idea of creating a 3D povray view just for fun). I
> hope I can add code to allow a web browser to click on a node and drill
> down to information on the XML file containing the node's data. Once I get
> that I can actually write a simple bash script to load files in the correct
> order, but stopping to edit data at break points. If I can edit data and
> keep load order correct the application becomes usable. It also becomes
> possible to know what new interfaces I must add to the web app which lost
> its interfaces over time from various updates. Being able to manually edit
> the data and get correct results seems to be the place to start rather than
> trying to fix an interface I don't understand.
>
> I'm not certain if I properly understand multi-column foreign key
> references...I probably do, but understanding how to automate an
> information_schema query which names another column implementing the data
> is failing in the multi-column case.
>
> ----- Original Message -----
> From: Bear Giles <bgiles at coyotesong.com>
> To: Boulder (Colorado) Linux Users Group -- General Mailing List <
> lug at lug.boulder.co.us>
> Sent: Wed, 23 Aug 2017 20:53:05 -0000 (UTC)
> Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)
> Neo4j is noSQL.
> What language(s) are you using?  I know how to collect the table metadata
> (columns, indexes, foreign keys, etc.) in Java and it would be easy to spit
> it back out in something like xml or json.
> However what do you plan do to with it. I don't remember if drawing a
> planar graph is NP-complete but with a thousand tables and multi-column
> referential integrity keys any physical depiction is going to be difficult.
> At a minimum you'll probably want to find some way to partition the tables
> into somewhat cohesive subunits.
> Note: if this is an old database and you can do this work on a copy (so
> you don't break everything) you can move the cohesive subunits into their
> own schemas. The cross-schema references just use a fully-qualified name.
> Each schema will be a lot easier to understand and maintain since it will
> be much more focused (and maybe even akin to a microservice architecture),
> e.g., for a pseudo-Amazon you would have separate schemas for customer
> accounts, payment information, orders (what was ordered), shipments (when
> was it shipped? to where?), book information, book reviews, etc.
> At the end of this you can create views so that all of the tables and
> views remain in their new schemas but still appear to be in a single schema
> (probably 'public') that the application already expects.
> Oh yeah - back to the plotting. There's a plotting library that takes a
> graph description and produces a dynamic image of it. You can tug on
> individual nodes to reorganize it. I'm drawing a complete blank on the name
> of it though - it's not java specific. It's used on a number of websites
> though, and IIRC you can customize it so that the graph nodes could be ERD
> elements although the results would probably be utterly unreadable.
>
> On Wed, Aug 23, 2017 at 2:40 PM, <stimits at comcast.net> wrote:
>
>> It sounds like Neo4j is an actual database, rather than a tool to map
>> other people's database (though it looks like something I would put on a
>> PostgreSQL wish list).
>>
>> I specifically have data from other people's applications for which the
>> applications evolved over time and the original interface to install the
>> data was lost. The same data is used, but only direct SQL edit is now
>> possible on part of that data. I need to be able to map out some load
>> orders from some XML files in order to correctly edit that sample data for
>> real world use, and secondarily create a working interface after figuring
>> out what is missing. The database in use is PostgreSQL, so if Neo4j can map
>> this, I'm all for it. It seems though that I would have to dump
>> PostgreSQL...which isn't really an alternative. Plus this is all free work
>> and free software running on Linux, no money is available for purchasing
>> commercial licensing or third party support.
>>
>> ----- Original Message -----
>> From: Alan Robertson <alanr at unix.sh>
>> To: lug at lug.boulder.co.us
>> Sent: Wed, 23 Aug 2017 20:24:51 -0000 (UTC)
>> Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)
>> Maybe use Neo4j?  ;-)
>> /me ducks...
>> --
>>   Alan Robertson
>>   alanr at unix.sh
>> On Wed, Aug 23, 2017, at 01:25 PM, stimits at comcast.net wrote:
>>
>> Hi,
>>
>> I'm wondering if there is someone I might be able to ask a question
>> off-list about SQL foreign keys (I'm using PostgreSQL, but this should
>> apply to most SQL servers). Basically I'm trying to build an application
>> which will read metadata tables and draw a graphical chart of complex
>> relations (perhaps 1000 tables and almost all interrelated) and having a
>> problem with foreign key descriptions. I've been trying to figure this out
>> for quite some time, but can't wrap my brain around it. I want to know if
>> my idea of how multi-column foreign keys work is correct, or if I need to
>> approach this in a different way. Thanks!
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>>
>>
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lug.boulder.co.us/pipermail/lug/attachments/20170823/ac9103a3/attachment.html>


More information about the LUG mailing list