[lug] Q. on SQL Foreign Keys (PostgreSQL)

stimits at comcast.net stimits at comcast.net
Wed Aug 23 16:27:05 MDT 2017


Submitting a query and filling in a vector, followed by turning it into a tree, is fairly trivial. The thing to know is that the XML files may not load all columns at the same time in a given table. Perhaps more important is that implementing a new interface for the application also will not update all columns at the same time, but I will have need for specific columns. So it seems that knowing a given column refers to another particular column is useful even if not mandatory (there are some references to business models for which exact knowledge of column referencing another specific column would also be very handy when looking at it in a more abstract way). I have not yet seen it, but I suppose it is possible a multi-column key could could require a unique value across other columns where not all columns are in the same table...basically I want to be certain even if I never run into that case.
 
So far what I do is load a vector of all tables and columns. Then I promote regular columns to primary keys where applicable (this works...perhaps my primary key query is correct or else I've failed to detect two attempts to promote the same key to primary key status).
 
After this I try to promote relevant columns to foreign key status. This is where it goes wrong because my knowledge of information_schema is insufficient...I should be getting one column promoted to a foreign key status pointing at a single other column. My concept is that even if multiple columns are being required in combination to be unique that the data in a particular column is itself determined only by the knowledge of another single column. Somehow I need to get the JOIN to not match the column marked as a foreign key with columns which are used as a uniqueness constraint...only one column should provide the data the key column matches even if other columns are part of uniqueness. I have trouble describing this, and this is one of the things I want to understand: Perhaps my concept is wrong and this can't be done where one column of a foreign key always points at the data of one and only one remote column (I'm quite willing to throw out uniqueness requirements thrown in by other parts of the key).
 
This is probably getting a bit off topic for Linux though, I don't know if perhaps the rest of the world is getting bored of the conversation :P
 
If interested, this is the query which should run on any DB with the ANSI information_schema, but is being run on a Fedora PostgreSQL 9.4 server (multi-column constraints fail to provide a unique row for a local_table.key_column to point at):
 
SELECT    tc.table_name AS local_table,    kcu.column_name AS key_column,    ccu.table_name AS fk_table,    ccu.column_name AS fk_column FROM information_schema.table_constraints AS tc     JOIN information_schema.key_column_usage AS kcu      ON tc.constraint_name = kcu.constraint_name                AND tc.constraint_type = 'FOREIGN KEY'    JOIN information_schema.constraint_column_usage AS ccu      ON ccu.constraint_name = tc.constraint_nameWHERE constraint_type = 'FOREIGN KEY' AND tc.table_nameIN (SELECT table_name  FROM information_schema.tables WHERE table_schema='public'   AND table_type='BASE TABLE')AND (tc.table_name, kcu.column_name) <> (ccu.table_name, ccu.column_name)ORDER BY local_table, key_column, fk_table;
 
----- Original Message -----From: Bear Giles <bgiles at coyotesong.com>To: Boulder (Colorado) Linux Users Group -- General Mailing List <lug at lug.boulder.co.us>Sent: Wed, 23 Aug 2017 21:31:39 -0000 (UTC)Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)


I used libpq years ago - it might be easier to use to get the metadata than querying the information_schema directly.

To get the load order you need to create a graph with nodes = tables, ordered links = referential integrity dependency. I don't think you need to worry about the columns -- a foreign key must refer to a single table no matter how many columns it uses.

Do a topological sort on the graph.

1. Go through the graph and remove all nodes with no inbound or outbound dependencies. These tables can be loaded at any time - before or after the rest of the data.

2. Go through the graph and list all nodes with no inbound dependencies and at least one outbound dependency. These don't have any dependencies but other tables depend on them. You can load these tables in any order.

3. Remove those nodes, decrement the inbound dependencies counter on the corresponding nodes/tables, and repeat. This will create sets of tables and while the order of the tables within the set don't matter the order of tables between sets do.

This will either finish with no nodes remaining or you'll find a circular dependency. That dependency may involve more than two tables - A depends on B which depends on C which depends on A - since I think the database may be smart enough to prevent you from creating a two-party dependency. In this case you're hosed.

(Well, not really, but you need to know that it will require special handling and the tables will be difficult to update without disabling and reenabling constraints.)

Remember in all cases you want to turn off the indexes and foreign keys, load the data, and then recreate the index (upstream table) or foreign key (downstream table). That's a lot more efficient than populating the table with the constraints in place. It';s up to you whether you do this in one big move - in which case order doesn't matter as long as you create the index before the foreign key - or at each step so you'll have earlier warning of a problem.

Finally run 'vacuum analyze'.


On Wed, Aug 23, 2017 at 3:15 PM, <stimits at comcast.net> wrote:

The database itself seems to be SQL92 or newer, not sure what PostgreSQL 9.4.9 uses. It has the ANSI information_schema, plus its on representation. I've been concentrating on using information_schema.
 
My program which is doing the work is C++ (libpqxx).
 
For this case actual mapping is only about depicting primary keys and secondary keys, with a line drawn from one table to the other when a foreign key needs a column. This should show load order for various data files, it doesn't need to be much more detail. So far I'm drawing an SVG image for a root node and all foreign keys depending on it, but I have some issues with multi-column  foreign key queries to relate one exact column of one specific table to another column (my current query incorrectly tries to map a single column to three different columns if there is a three-way relation...I don't need anything describing uniqueness since all I need to know is one table is loaded before another).
 
I chose SVG as my initial output because it can be scaled infinitely (I sometimes toy with the idea of creating a 3D povray view just for fun). I hope I can add code to allow a web browser to click on a node and drill down to information on the XML file containing the node's data. Once I get that I can actually write a simple bash script to load files in the correct order, but stopping to edit data at break points. If I can edit data and keep load order correct the application becomes usable. It also becomes possible to know what new interfaces I must add to the web app which lost its interfaces over time from various updates. Being able to manually edit the data and get correct results seems to be the place to start rather than trying to fix an interface I don't understand.
 
I'm not certain if I properly understand multi-column foreign key references...I probably do, but understanding how to automate an information_schema query which names another column implementing the data is failing in the multi-column case.


 
----- Original Message -----From: Bear Giles <bgiles at coyotesong.com>To: Boulder (Colorado) Linux Users Group -- General Mailing List <lug at lug.boulder.co.us>Sent: Wed, 23 Aug 2017 20:53:05 -0000 (UTC)Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)


Neo4j is noSQL.

What language(s) are you using?  I know how to collect the table metadata (columns, indexes, foreign keys, etc.) in Java and it would be easy to spit it back out in something like xml or json.

However what do you plan do to with it. I don't remember if drawing a planar graph is NP-complete but with a thousand tables and multi-column referential integrity keys any physical depiction is going to be difficult. At a minimum you'll probably want to find some way to partition the tables into somewhat cohesive subunits.

Note: if this is an old database and you can do this work on a copy (so you don't break everything) you can move the cohesive subunits into their own schemas. The cross-schema references just use a fully-qualified name. Each schema will be a lot easier to understand and maintain since it will be much more focused (and maybe even akin to a microservice architecture), e.g., for a pseudo-Amazon you would have separate schemas for customer accounts, payment information, orders (what was ordered), shipments (when was it shipped? to where?), book information, book reviews, etc.

At the end of this you can create views so that all of the tables and views remain in their new schemas but still appear to be in a single schema (probably 'public') that the application already expects.

Oh yeah - back to the plotting. There's a plotting library that takes a graph description and produces a dynamic image of it. You can tug on individual nodes to reorganize it. I'm drawing a complete blank on the name of it though - it's not java specific. It's used on a number of websites though, and IIRC you can customize it so that the graph nodes could be ERD elements although the results would probably be utterly unreadable.


On Wed, Aug 23, 2017 at 2:40 PM, <stimits at comcast.net> wrote:

It sounds like Neo4j is an actual database, rather than a tool to map other people's database (though it looks like something I would put on a PostgreSQL wish list).
 
I specifically have data from other people's applications for which the applications evolved over time and the original interface to install the data was lost. The same data is used, but only direct SQL edit is now possible on part of that data. I need to be able to map out some load orders from some XML files in order to correctly edit that sample data for real world use, and secondarily create a working interface after figuring out what is missing. The database in use is PostgreSQL, so if Neo4j can map this, I'm all for it. It seems though that I would have to dump PostgreSQL...which isn't really an alternative. Plus this is all free work and free software running on Linux, no money is available for purchasing commercial licensing or third party support.


 
----- Original Message -----From: Alan Robertson <alanr at unix.sh>To: lug at lug.boulder.co.usSent: Wed, 23 Aug 2017 20:24:51 -0000 (UTC)Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)


Maybe use Neo4j?  ;-)

/me ducks...


--
  Alan Robertson
  alanr at unix.sh




On Wed, Aug 23, 2017, at 01:25 PM, stimits at comcast.net wrote:

Hi,
 
I'm wondering if there is someone I might be able to ask a question off-list about SQL foreign keys (I'm using PostgreSQL, but this should apply to most SQL servers). Basically I'm trying to build an application which will read metadata tables and draw a graphical chart of complex relations (perhaps 1000 tables and almost all interrelated) and having a problem with foreign key descriptions. I've been trying to figure this out for quite some time, but can't wrap my brain around it. I want to know if my idea of how multi-column foreign keys work is correct, or if I need to approach this in a different way. Thanks!
_______________________________________________
Web Page:  http://lug.boulder.co.us
Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety





_______________________________________________

Web Page:  http://lug.boulder.co.us

Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety






_______________________________________________

Web Page:  http://lug.boulder.co.us

Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lug.boulder.co.us/pipermail/lug/attachments/20170823/262ca707/attachment.html>


More information about the LUG mailing list