[lug] Q. on SQL Foreign Keys (PostgreSQL)

Sun Aug 27 07:01:47 MDT 2017

If you're using tomcat to present the information I think you'll be far
better off working with JDBC instead of trying to work with queries of the
information schema.

The process is pretty straightforward.

1. get a Connection.

2. call conn.getDatabaseMetaData().
https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html

3. call metadata.getTables() to get a list of all tables. You can specify
null for most of the fields to get a list of all tables in the database -
but you'll probably want to specify 'public' for schema to avoid pulling in
the tables used by the database itself, etc.

4. call metadata.getPrimaryKeys() to get a list of primary keys for each
table.

5. call metadata.getIndexInfo() to get a list of all indexes for each
table. Note: we know that referential integrity requires the target be a
unique index, either a primary key or a secondary index. IIRC each index
has a name, either explicit or system generated, so you'll want to create a
bidirectional map of (table, columns) and index name and can then use that
as shorthand when identifying dependencies.

6. call metadata.getExportedKeys() and metadata.getImportedKeys() to get
the referential integrity checks. I think that's 'from' and 'to'.  You can
use the map discussed above to create a list of (from-table, to-table,
index-name).

7. there's also a metadata.getCrossReference() method but I've never used
it.

8. finally, as an aside there's a metadata.getBestRowIdentifier() method
that can help you if a table doesn't have a primary key.

All of these methods return a ResultSet. You'll have to iterate over it but
it's well-documented and it's easy to figure out what goes together - all
of the leading columns will be the same and (iirc) there's usually a
position number as well. You might want to do a ResultSetMetaData query on
the ResultSet since the spec only requires the minimum columns. Many
databases provide additional information. Also some databases provide a
'blank' record and then provide additional information.

You only have to do this once - obviously if you're retrieving this
information directly from the database you can write summary information to
your own tables.

(Note: from a security perspective use two tables. Create a read-only user
with only SELECT permissions on the original database to retrieve the
metadata, create a standard user on a second database to write any summary
tables. Your tomcat instance should only connect to the latter. "second
database" can be the same physical database server - just use a different
user account. Your analysis program will need to establish two connections
but that's not a problem.)

As I mentioned earlier if you're working on a copy of the databases - you
can use pg_dump to dump just the schema and import it into a new database -
then you should consider moving tables into new schemas as you work. It
could be as simple as "public" and "done", or it could collect related
tables into separate schemas. PostgreSQL is nice since you can specify
multiple schemas to use without a prefix - it's a 'set xxx=x,y,z' call
after you establish the connection. I forgot the 'xxx' at the moment but
it's easy to find.

Bear

P.S., you can specify null values on the getPrimaryKey(), etc., and get
information for all of the tables in a single call. That's slightly more
efficient from a database perspective - you'll have a single query instead
of a thousand - and it's not much harder to code for. However conceptually
it's probably better to do one call to get a list of tables, then call a
separate method that gets all of the keys, indexes, etc., for each table.
Obviously you'll want to get all of the tables + indexes (+ names) before
doing the foreign reference calls so you can immediately reduce
multi-column foreign keys to the single index name.

On Fri, Aug 25, 2017 at 3:42 PM, <stimits at comcast.net> wrote:

> ...
> > On 08/23/2017 04:27 PM, stimits at comcast.net wrote:
> >> My concept is that even if multiple columns are being required in
> combination to be unique that the data in >a particular column is itself
> determined only by the knowledge of another single column.
> >
> >Not following this. The compound key usage I know has
> > T1.A -> T2.A
> > T1.B -> T2.B
>
> This is how I believe it works, but I'm trying to verify. My current query
> is probably wrong because it results in something like:
> T1.A->T2.A
> T1.A->T2.B
> T1.B->T2.A
> T1.B->T2.B
> (...note T1.A pointing at multiple providers...my JOIN is wrong but I'm
> struggling to understand ANSI information_schema in a way which allows
> one-and-only-one column-to-column relation...there does not seem to be a
> way to ignore T1.A->T2.B and T1.B->T2.A using information_schema...I was
> hoping someone here knows more about information_schema since it is ANSI)
>
> So this does probably confirm the query JOIN is wrong instead of my
> concept of one-to-one column-to-single-column foreign key relation.
>
> >where you join on A & B in both tables. The fields A and B need not have a
> >uniqueness constraint in either table. So what are the foreign key
> dependencies
> >here? T1.A depends on both T2.A and T2.B?
> >
>
> So far as I know multiple columns of the table with the foreign key are
> combined in a single constraint such that T1.A and T1.B are unique, and
> such that T1.A points at a different table.column, and T1.B also points to
> a different table.column. I'm struggling to find a way to limit the result
> to a single column pointing at the single column...the query fails because
> it can't distinguish which of the foreign columns corresponds to one
> particular local column.
>
> >Or are you referring to a situation where T1.X is a string concatenation
> >or a struct with values from T2.A, T2.B, T2.C? Saying that T1.X has
> >a foreign key depending on T2.A, T2.B & T2.C would make sense here.
>
> I was thinking perhaps my idea of a single column pointing at a single
> column might fail if it were possible for a foreign key to be defined as a
> concatenation of two text/varchar columns. I don't think these tables do
> this, but if it is possible to do so, then I probably have to program to
> take this into account. Every time I simplify and don't include corner
> cases I find I have to start over.
>
> >If you can find joins with the fields in question maybe it will clarify
> things.
> >Knowing if these relationships are supposed to drive cascading operations
> (e.g.
> >delete) would probably also help.
>
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lug.boulder.co.us/pipermail/lug/attachments/20170827/627b1cd9/attachment.html>