Make pg_dump emit more accurate dependency information.

While pg_dump has included dependency information in archive-format output ever since 7.3, it never made any large effort to ensure that that information was actually useful. In particular, in common situations where dependency chains include objects that aren't separately emitted in the dump, the dependencies shown for objects that were emitted would reference the dump IDs of these un-dumped objects, leaving no clue about which other objects the visible objects indirectly depend on. So far, parallel pg_restore has managed to avoid tripping over this misfeature, but only by dint of some crude hacks like not trusting dependency information in the pre-data section of the archive. It seems prudent to do something about this before it rises up to bite us, so instead of emitting the "raw" dependencies of each dumped object, recursively search for its actual dependencies among the subset of objects that are being dumped. Back-patch to 9.2, since that code hasn't yet diverged materially from HEAD. At some point we might need to back-patch further, but right now there are no known cases where this is actively necessary. (The one known case, bug #6699, is fixed in a different way by my previous patch.) Since this patch depends on 9.2 changes that made TOC entries be marked before output commences as to whether they'll be dumped, back-patching further would require additional surgery; and as of now there's no evidence that it's worth the risk.

Make pg_dump emit more accurate dependency information.
While pg_dump has included dependency information in archive-format output ever since 7.3, it never made any large effort to ensure that that information was actually useful. In particular, in common situations where dependency chains include objects that aren't separately emitted in the dump, the dependencies shown for objects that were emitted would reference the dump IDs of these un-dumped objects, leaving no clue about which other objects the visible objects indirectly depend on. So far, parallel pg_restore has managed to avoid tripping over this misfeature, but only by dint of some crude hacks like not trusting dependency information in the pre-data section of the archive. It seems prudent to do something about this before it rises up to bite us, so instead of emitting the "raw" dependencies of each dumped object, recursively search for its actual dependencies among the subset of objects that are being dumped. Back-patch to 9.2, since that code hasn't yet diverged materially from HEAD. At some point we might need to back-patch further, but right now there are no known cases where this is actively necessary. (The one known case, bug #6699, is fixed in a different way by my previous patch.) Since this patch depends on 9.2 changes that made TOC entries be marked before output commences as to whether they'll be dumped, back-patching further would require additional surgery; and as of now there's no evidence that it's worth the risk.
8a504a36 · Tom Lane · a1ef01fe · 8a504a36 · 8a504a36
Commit 8a504a36 authored Jun 25, 2012 by Tom Lane
Expand all Hide whitespace changes
Inline Side-by-side

Showing with 182 additions and 42 deletions

src/bin/pg_dump/pg_backup_archiver.c src/bin/pg_dump/pg_backup_archiver.c +11 -5

src/bin/pg_dump/pg_dump.c src/bin/pg_dump/pg_dump.c +171 -37

No files found.
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3498,9 +3498,14 @@ restore_toc_entries_parallel(ArchiveHandle *AH)
 	 * Do all the early stuff in a single connection in the parent. There's no
 	 * great point in running it in parallel, in fact it will actually run
 	 * faster in a single connection because we avoid all the connection and
-	 * setup overhead.	Also, pg_dump is not currently very good about showing
-	 * all the dependencies of SECTION_PRE_DATA items, so we do not risk
-	 * trying to process them out-of-order.
+	 * setup overhead.  Also, pre-9.2 pg_dump versions were not very good
+	 * about showing all the dependencies of SECTION_PRE_DATA items, so we do
+	 * not risk trying to process them out-of-order.
+	 *
+	 * Note: as of 9.2, it should be guaranteed that all PRE_DATA items appear
+	 * before DATA items, and all DATA items before POST_DATA items.  That is
+	 * not certain to be true in older archives, though, so this loop is coded
+	 * to not assume it.
 	 */
 	skipped_some = false;
 	for (next_work_item = AH->toc->next; next_work_item != AH->toc; next_work_item = next_work_item->next)
@@ -4162,8 +4167,9 @@ fix_dependencies(ArchiveHandle *AH)

 	/*
 	 * Count the incoming dependencies for each item.  Also, it is possible
-	 * that the dependencies list items that are not in the archive at all.
-	 * Subtract such items from the depCounts.
+	 * that the dependencies list items that are not in the archive at all
+	 * (that should not happen in 9.2 and later, but is highly likely in
+	 * older archives).  Subtract such items from the depCounts.
 	 */
 	for (te = AH->toc->next; te != AH->toc; te = te->next)
 	{

--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c