[lug] Q. on SQL Foreign Keys (PostgreSQL)

Mon Aug 28 15:47:51 MDT 2017

The actual application doing the mapping of the database is C++ using ordinary queries and not running on tomcat. The application which will later need edits is on tomcat (it isn't my application), but is independent of everything except the database itself. I was hoping the ANSI schema tables would provide knowledge on foreign key column relations. The JDBC queries you listed are what I am looking for, but it has to be done without JDBC/ODBC via a SQL query (in theory this would work on any database with ANSI information_schema).

What would be really interesting is if the JDBC front end you mentioned could be mapped out via PostgreSQL "explain" utilities to see what the actual query is behind this. Unfortunately I am struggling to discover how information_schema knows a particular column used as a foreign key corresponds to another column when the constraint was created using a multi-column constraint. Nowhere can I find a table relating the foreign column to another table's column unless everything was single column to start with.

----- Original Message -----From: Bear Giles <bgiles at coyotesong.com>To: Boulder (Colorado) Linux Users Group -- General Mailing List <lug at lug.boulder.co.us>Sent: Sun, 27 Aug 2017 13:01:47 -0000 (UTC)Subject: Re: [lug] Q. on SQL Foreign Keys (PostgreSQL)

If you're using tomcat to present the information I think you'll be far better off working with JDBC instead of trying to work with queries of the information schema.

The process is pretty straightforward.

1. get a Connection.

2. call conn.getDatabaseMetaData().  https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html

3. call metadata.getTables() to get a list of all tables. You can specify null for most of the fields to get a list of all tables in the database - but you'll probably want to specify 'public' for schema to avoid pulling in the tables used by the database itself, etc.

4. call metadata.getPrimaryKeys() to get a list of primary keys for each table.

5. call metadata.getIndexInfo() to get a list of all indexes for each table. Note: we know that referential integrity requires the target be a unique index, either a primary key or a secondary index. IIRC each index has a name, either explicit or system generated, so you'll want to create a bidirectional map of (table, columns) and index name and can then use that as shorthand when identifying dependencies.

6. call metadata.getExportedKeys() and metadata.getImportedKeys() to get the referential integrity checks. I think that's 'from' and 'to'.  You can use the map discussed above to create a list of (from-table, to-table, index-name).

7. there's also a metadata.getCrossReference() method but I've never used it.

8. finally, as an aside there's a metadata.getBestRowIdentifier() method that can help you if a table doesn't have a primary key.

All of these methods return a ResultSet. You'll have to iterate over it but it's well-documented and it's easy to figure out what goes together - all of the leading columns will be the same and (iirc) there's usually a position number as well. You might want to do a ResultSetMetaData query on the ResultSet since the spec only requires the minimum columns. Many databases provide additional information. Also some databases provide a 'blank' record and then provide additional information.

You only have to do this once - obviously if you're retrieving this information directly from the database you can write summary information to your own tables.

(Note: from a security perspective use two tables. Create a read-only user with only SELECT permissions on the original database to retrieve the metadata, create a standard user on a second database to write any summary tables. Your tomcat instance should only connect to the latter. "second database" can be the same physical database server - just use a different user account. Your analysis program will need to establish two connections but that's not a problem.)

As I mentioned earlier if you're working on a copy of the databases - you can use pg_dump to dump just the schema and import it into a new database - then you should consider moving tables into new schemas as you work. It could be as simple as "public" and "done", or it could collect related tables into separate schemas. PostgreSQL is nice since you can specify multiple schemas to use without a prefix - it's a 'set xxx=x,y,z' call after you establish the connection. I forgot the 'xxx' at the moment but it's easy to find. 

Bear

P.S., you can specify null values on the getPrimaryKey(), etc., and get information for all of the tables in a single call. That's slightly more efficient from a database perspective - you'll have a single query instead of a thousand - and it's not much harder to code for. However conceptually it's probably better to do one call to get a list of tables, then call a separate method that gets all of the keys, indexes, etc., for each table. Obviously you'll want to get all of the tables + indexes (+ names) before doing the foreign reference calls so you can immediately reduce multi-column foreign keys to the single index name.

On Fri, Aug 25, 2017 at 3:42 PM, <stimits at comcast.net> wrote:

...
> On 08/23/2017 04:27 PM, stimits at comcast.net wrote:>> My concept is that even if multiple columns are being required in combination to be unique that the data in >a particular column is itself determined only by the knowledge of another single column.
>
>Not following this. The compound key usage I know has> T1.A -> T2.A> T1.B -> T2.B

This is how I believe it works, but I'm trying to verify. My current query is probably wrong because it results in something like:
T1.A->T2.A
T1.A->T2.B
T1.B->T2.A
T1.B->T2.B
(...note T1.A pointing at multiple providers...my JOIN is wrong but I'm struggling to understand ANSI information_schema in a way which allows one-and-only-one column-to-column relation...there does not seem to be a way to ignore T1.A->T2.B and T1.B->T2.A using information_schema...I was hoping someone here knows more about information_schema since it is ANSI)

So this does probably confirm the query JOIN is wrong instead of my concept of one-to-one column-to-single-column foreign key relation.
  >where you join on A & B in both tables. The fields A and B need not have a>uniqueness constraint in either table. So what are the foreign key dependencies>here? T1.A depends on both T2.A and T2.B?
>

So far as I know multiple columns of the table with the foreign key are combined in a single constraint such that T1.A and T1.B are unique, and such that T1.A points at a different table.column, and T1.B also points to a different table.column. I'm struggling to find a way to limit the result to a single column pointing at the single column...the query fails because it can't distinguish which of the foreign columns corresponds to one particular local column.

>Or are you referring to a situation where T1.X is a string concatenation>or a struct with values from T2.A, T2.B, T2.C? Saying that T1.X has>a foreign key depending on T2.A, T2.B & T2.C would make sense here.

I was thinking perhaps my idea of a single column pointing at a single column might fail if it were possible for a foreign key to be defined as a concatenation of two text/varchar columns. I don't think these tables do this, but if it is possible to do so, then I probably have to program to take this into account. Every time I simplify and don't include corner cases I find I have to start over.

>If you can find joins with the fields in question maybe it will clarify things.>Knowing if these relationships are supposed to drive cascading operations (e.g.>delete) would probably also help.

_______________________________________________

Web Page:  http://lug.boulder.co.us

Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lug.boulder.co.us/pipermail/lug/attachments/20170828/85e6b3bb/attachment.html>