Commits · 632ae6829f7abda34e15082c91d9dfb3fc0f298b · Abuhujair Javed / Postgres FD Implementation

16 Aug, 2011 9 commits

Forget about targeting catalog cache invalidations by tuple TID. · 632ae682

Tom Lane authored Aug 16, 2011

The TID isn't stable enough: we might queue an sinval event before a VACUUM
FULL, and then process it afterwards, when the target tuple no longer has
the same TID. So we must invalidate entries on the basis of hash value
only. The old coding can be shown to result in various bizarre,
hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on
system catalogs, and could easily result in permanent catalog corruption,
up to and including complete loss of tables.

This commit is just a minimal fix that removes the unsafe comparison.
We should remove transmission of the tuple TID from sinval messages
altogether, and then arrange to suppress the extra message in the common
case of a heap_update that doesn't change the key hashvalue. But that's
going to be much more invasive, and will only produce a probably-marginal
performance gain, so it doesn't seem like material for a back-patch.

Back-patch to 9.0. Before that, VACUUM FULL refused to do any tuple moving
if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and
CLUSTER would give up altogether), so there was no risk of moving a tuple
that might be the subject of an unsent sinval message.

632ae682

Fix incorrect order of operations during sinval reset processing. · f4d7f1ad

Tom Lane authored Aug 16, 2011

We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.

This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.

The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2". To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations. This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.

Back-patch to 9.0 where the relation mapper was introduced.

f4d7f1ad

Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL. · 7b0d0e93

Tom Lane authored Aug 16, 2011

This works around the problem that a catalog cache entry might contain a
toast pointer that we try to dereference just as a VACUUM FULL completes
on that catalog. We will see the sinval message on the cache entry when
we acquire lock on the toast table, but by that point we've already told
tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
structural standpoint to update the pointer before we use it. Much less
painful to ensure that toast pointers are not invalidated in the first
place. We have to add a bit of code to deal with the case that a value
that previously wasn't toasted becomes so; but that should be a
seldom-exercised corner case, so the inefficiency shouldn't be significant.

Back-patch to 9.0. In prior versions, we didn't allow CLUSTER on system
catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
there was no problem.

7b0d0e93

Fix race condition in relcache init file invalidation. · 2ada6779

Tom Lane authored Aug 16, 2011

The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale.  The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.

Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages.  This is more straightforward, and might even be a bit
faster since only one unlink call is needed.

This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.

2ada6779

Adjust total size in pg_basebackup progress report when reality changes · 1bb69245

Magnus Hagander authored Aug 16, 2011

When streaming including WAL, the size estimate will always be incorrect,
since we don't know how much WAL is included. To make sure the output doesn't
look completely unreasonable, this patch increases the total size whenever we
go past the estimate, to make sure we never go above 100%.

1bb69245

Fix bogus comment that claimed that the new BACKUP METHOD line in · 2877c67b
Heikki Linnakangas authored Aug 16, 2011
```
backup_label was new in 9.0. Spotted by Fujii Masao.
```
2877c67b

Make pg_basebackup progress report translatable · 3b3f0935

Peter Eisentraut authored Aug 16, 2011

Also fix a potential portability bug, because INT64_FORMAT is only
guaranteed to be available with snprintf, not fprintf.

3b3f0935

Use less cryptic variable names · 005e5c30
Peter Eisentraut authored Aug 16, 2011

005e5c30
In pg_upgrade, avoid dumping orphaned temporary tables. This makes the · 2411fbda
Bruce Momjian authored Aug 15, 2011
```
pg_upgrade schema matching pattern match pg_dump/pg_dumpall.

Fix for 9.0, 9.1, and 9.2.
```
2411fbda

15 Aug, 2011 2 commits

Adjust regression tests for error message change · 5845f427
Peter Eisentraut authored Aug 15, 2011

5845f427

Add "Reason code" prefix to internal SSI error messages · e5475a80

Peter Eisentraut authored Aug 15, 2011

This makes it clearer that the error message is perhaps not supposed
to be understood by users, and it also makes it somewhat clearer that
it was not accidentally omitted from translation.

Idea from Heikki Linnakangas, except that we don't mark "Reason code"
for translation at this point, because that would make the
implementation too cumbersome.

e5475a80

14 Aug, 2011 3 commits

Fix unsafe order of operations in foreign-table DDL commands. · 52994e9e

Tom Lane authored Aug 14, 2011

When updating or deleting a system catalog tuple, it's necessary to acquire
RowExclusiveLock on the catalog before looking up the tuple; otherwise a
concurrent VACUUM FULL on the catalog might move the tuple to a different
TID before we can apply the update. Coding patterns that find the tuple
via a table scan aren't at risk here, but when obtaining the tuple from a
catalog cache, correct ordering is important; and several routines in
foreigncmds.c got it wrong. Noted while running the regression tests in
parallel with VACUUM FULL of assorted system catalogs.

For consistency I moved all the heap_open calls to the starts of their
functions, including a couple for which there was no actual bug.

Back-patch to 8.4 where foreigncmds.c was added.

52994e9e

Message style improvements · 85612039
Peter Eisentraut authored Aug 14, 2011

85612039
Fix typo · 7431cb25
Peter Eisentraut authored Aug 14, 2011

7431cb25

13 Aug, 2011 2 commits

Fix incorrect timeout handling during initial authentication transaction. · 592b615d

Tom Lane authored Aug 13, 2011

The statement start timestamp was not set before initiating the transaction
that is used to look up client authentication information in pg_authid.
In consequence, enable_sig_alarm computed a wrong value (far in the past)
for statement_fin_time. That didn't have any immediate effect, because the
timeout alarm was set without reference to statement_fin_time; but if we
subsequently blocked on a lock for a short time, CheckStatementTimeout
would consult the bogus value when we cancelled the lock timeout wait,
and then conclude we'd timed out, leading to immediate failure of the
connection attempt. Thus an innocent "vacuum full pg_authid" would cause
failures of concurrent connection attempts. Noted while testing other,
more serious consequences of vacuum full on system catalogs.

We should set the statement timestamp before StartTransactionCommand(),
so that the transaction start timestamp is also valid. I'm not sure if
there are any non-cosmetic effects of it not being valid, but the xact
timestamp is at least sent to the statistics machinery.

Back-patch to 9.0. Before that, the client authentication timeout was done
outside any transaction and did not depend on this state to be valid.

592b615d

Make USECS_PER_* timestamp macros visible even when we are not using · 6d7bd5de
Bruce Momjian authored Aug 12, 2011
```
integer timestamps.
```
6d7bd5de

11 Aug, 2011 5 commits

Add major features list and introductory text for 9.1 release notes. · eb72adc8
Bruce Momjian authored Aug 11, 2011
```
Backpatch to 9.1, obviously.
```
eb72adc8

Teach unix_latch.c to use poll() where available. · a180776f

Tom Lane authored Aug 11, 2011

poll() is preferred over select() on platforms where both are available,
because it tends to be a bit faster and it doesn't have an arbitrary limit
on the range of FD numbers that can be accessed. The FD range limit does
not appear to be a risk factor for any 9.1 usages, so this doesn't need to
be back-patched, but we need to have it in place if we keep on expanding
the uses of WaitLatch.

a180776f

Display both per-table and per-column FDW options in psql's \d output. · d82d8486
Robert Haas authored Aug 11, 2011
```
Along the way, rename "Options" to "FDW Options" in various places for
consistency and clarity.

Shigeru Hanada
```
d82d8486

Unbreak legacy syntax "COMMENT ON RULE x IS y", with no relation name. · 5057366e

Robert Haas authored Aug 11, 2011

check_object_ownership() isn't happy about the null relation pointer.
We could fix it there, but this seems more future-proof.

5057366e

Change psql's \dd command to do something more useful. · 59414cde

Robert Haas authored Aug 11, 2011

Instead of displaying comments on an arbitrary subset of the object
types which support them, make \dd display comments on exactly those
object types which don't have their own backlash commands.  We now
regard the display of comments as properly the job of the relevant
backslash command (though many of them do so only in verbose mode)
rather than something that \dd should be responsible for.  However,
a handful of object types have no backlash command, so make \dd
give information about those.

Josh Kupershmidt

59414cde

10 Aug, 2011 5 commits

Remove wal_sender_delay GUC, because it's no longer useful. · cff75130

Tom Lane authored Aug 10, 2011

The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.

Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch.  The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.

cff75130

Add a bit of debug logging to backend_read_statsfile(). · 79b2ee20

Tom Lane authored Aug 10, 2011

This is in hopes of learning more about what causes "pgstat wait timeout"
warnings in the buildfarm. This patch should probably be reverted once
we've learned what we can. As coded, it will result in regression test
"failures" at half the delay that the existing code does, so I expect
to see a few more than before.

79b2ee20

Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a

Tom Lane authored Aug 10, 2011

In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens. This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno. Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch. Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom

4dab3d5a

Oops, we're working on version 9.2 already, not 9.1. Update the · 1f1b70a7
Heikki Linnakangas authored Aug 10, 2011
```
PG_CONTROL_VERSION accordingly; I updated it wrong in previous commit.
```
1f1b70a7

If backup-end record is not seen, and we reach end of recovery from a · 41f9ffd9

Heikki Linnakangas authored Aug 10, 2011

streamed backup, throw an error and refuse to start up. The restore has not
finished correctly in that case and the data directory is possibly corrupt.
We already errored out in case of archive recovery, but could not during
crash recovery because we couldn't distinguish between the case that
pg_start_backup() was called and the database then crashed (must not error,
data is OK), and the case that we're restoring from a backup and not all
the needed WAL was replayed (data can be corrupt).

To distinguish those cases, add a line to backup_label to indicate
whether the backup was taken with pg_start/stop_backup(), or by streaming
(ie. pg_basebackup).

This requires re-initdb, because of a new field added to the control file.

41f9ffd9

09 Aug, 2011 7 commits

Measure WaitLatch's timeout parameter in milliseconds, not microseconds. · 9f17ffd8

Tom Lane authored Aug 09, 2011

The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines.  Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.

Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.

Peter Geoghegan

9f17ffd8

Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db

Tom Lane authored Aug 09, 2011

Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code. Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them. In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.

4e15a4db

Avoid creating PlaceHolderVars immediately within PlaceHolderVars. · cff60f2d

Tom Lane authored Aug 09, 2011

Such a construction is useless since the lower PlaceHolderVar is already
nullable; no need to make it more so. Noted while pursuing bug #6154.

This is just a minor planner efficiency improvement, since the final plan
will come out the same anyway after PHVs are flattened. So not worth the
risk of back-patching.

cff60f2d

Use clearer notation for getnameinfo() return handling · f4a9da0a

Peter Eisentraut authored Aug 09, 2011

Writing

    if (getnameinfo(...))
        handle_error();

reads quite strangely, so use something like

    if (getnameinfo(...) != 0)
        handle_error();

instead.

f4a9da0a

Change the way string relopts are allocated. · 77949a29

Heikki Linnakangas authored Aug 09, 2011

Don't try to allocate the default value for a string relopt in the same
palloc chunk as the relopt_string struct. That didn't work too well if you
added a built-in string relopt in the stringRelOpts array, as it's not
possible to have an initializer for a variable length struct in C. This
makes the code slightly simpler too.

While we're at it, move the call to validator function in
add_string_reloption to before the allocation, so that if someone does pass
a bogus default value, we don't leak memory.

77949a29

Fix grammar and spelling in log message. · 5b6c8436
Heikki Linnakangas authored Aug 09, 2011

5b6c8436

Fix nested PlaceHolderVar expressions that appear only in targetlists. · 77ba2325

Tom Lane authored Aug 09, 2011

A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.

While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.

Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.

77ba2325

08 Aug, 2011 4 commits

Teach psql to display the comments on SQL/MED objects in verbose mode. · d82a9d2a

Robert Haas authored Aug 08, 2011

The relevant backslash commands already exist, so we're just adding an
additional column.  With this commit, all objects that have psql backslash
commands and accept comments should now display those comments at least
in verbose mode.

Josh Kupershmidt, with doc additions by me.

d82a9d2a

Teach psql to display the comments on conversions and domains. · c9ac00e6

Robert Haas authored Aug 08, 2011

\dc and \dD now accept a "+" option, which will cause the comments to
be displayed.  Along the way, correct a few oversights in the previous
commit in this area, 3b17efdf - namely,
(1) when \dL+ is used, make description still be the last column, for
consistency with what we've done elsewhere; and (2) document the
difference between \dC and \dC+.

Josh Kupershmidt, with a couple of doc changes by me.

c9ac00e6

Teach vacuumlo to limit number of removals, via new -l option. · b69f2e36

Robert Haas authored Aug 08, 2011

Also, handle failure better: don't just blindly keep trying to delete
stuff after the transaction has already failed.

Tim Lewis, reviewed by Josh Kupershmidt, with further hacking by me.

b69f2e36

Correct the lie in pg_config.h.win32 about having inttypes.h. · f54e373d

Andrew Dunstan authored Aug 08, 2011

This lie has been harmless until now, but has been exposed by the
change to include postgres.h before the python headers, which
in some versions include inttypes.h if HAVE_INTTYPES_H is set.

f54e373d

07 Aug, 2011 3 commits
- Add "Compatibility" section to ALTER EXTENSION reference page · 04b17684
  Peter Eisentraut authored Aug 07, 2011
```
Almost all other pages have one; this one must have been forgotten.
```
  04b17684
- Remove whitespace from end of lines · f29a0044
  Peter Eisentraut authored Aug 07, 2011
  
  f29a0044
- Rename "Example" sections to "Examples" in dblink chapter · a559ed5e
  Peter Eisentraut authored Aug 07, 2011
```
For consistency with other man pages.
```
  a559ed5e