Commits · 08e261cbc94ce9a72c0660b2786eaadae9f6fb41 · Abuhujair Javed / Postgres FD Implementation

01 Nov, 2011 9 commits

Fix race condition with toast table access from a stale syscache entry. · 08e261cb

Tom Lane authored Nov 01, 2011

If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.

The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.

The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
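
As a rough illustration of the fix described above, here is a toy Python model. It is not PostgreSQL code, and `ToastTable`, `ToastPointer`, `cache_tuple`, and `read_field` are hypothetical names invented for this sketch. It shows why flattening out-of-line fields at cache-insert time removes the dependence on toast chunks that vacuum may later delete.

```python
class ToastPointer:
    """Hypothetical stand-in for an out-of-line toasted field."""

    def __init__(self, value_id):
        self.value_id = value_id


class ToastTable:
    """Hypothetical toast storage: value id -> data, removable by vacuum."""

    def __init__(self):
        self.chunks = {}

    def store(self, value_id, data):
        self.chunks[value_id] = data
        return ToastPointer(value_id)

    def fetch(self, pointer):
        if pointer.value_id not in self.chunks:
            raise LookupError(
                f"missing chunk number 0 for toast value {pointer.value_id}")
        return self.chunks[pointer.value_id]

    def vacuum(self, value_id):
        # Models vacuum removing the toast tuples of a dead parent tuple.
        self.chunks.pop(value_id, None)


def cache_tuple(cache, key, tup, toast, detoast_first):
    """Put a tuple into the cache, optionally flattening toasted fields first."""
    if detoast_first:
        tup = {name: toast.fetch(v) if isinstance(v, ToastPointer) else v
               for name, v in tup.items()}
    cache[key] = tup


def read_field(cache, key, field, toast):
    """Read a cached field, detoasting lazily if it is still a pointer."""
    value = cache[key][field]
    return toast.fetch(value) if isinstance(value, ToastPointer) else value
```

In this model, caching with `detoast_first=True` and then calling `vacuum` still lets `read_field` succeed, whereas the same sequence with `detoast_first=False` raises `LookupError`, mirroring the "missing chunk number 0" failure.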

Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.

08e261cb

Clean up whitespace and indentation in parser and scanner files · 654e1f96
Peter Eisentraut authored Nov 01, 2011
```
These are not touched by pgindent, so clean them up a bit manually.
```
654e1f96
Comment changes to show bgwriter no longer performs checkpoints. · f3ebaad4
Simon Riggs authored Nov 01, 2011

f3ebaad4
Have checkpointer send stats once each processing loop. · 3ba18205
Simon Riggs authored Nov 01, 2011
```
Noted by Fujii Masao
```
3ba18205
Update pg_upgrade comment on missing 'postgres' database. · 09d1174e
Bruce Momjian authored Nov 01, 2011

09d1174e
Add new file for checkpointer.c · bf405ba8
Simon Riggs authored Nov 01, 2011

bf405ba8
Allow pg_upgrade to upgrade an old cluster that doesn't have a · a50d860a
Bruce Momjian authored Nov 01, 2011
```
'postgres' database.
```
a50d860a

Split work of bgwriter between 2 processes: bgwriter and checkpointer. · 806a2aee

Simon Riggs authored Nov 01, 2011

bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
This has a beneficial effect on performance at high write rates, but it is
mainly a refactoring to more easily allow changes for power reduction by
simplifying the previously tortuous code required to allow page
cleaning and checkpointing to time slice in the same process.

Patch by me, Review by Dickson Guedes

806a2aee

Document that multiple LDAP servers can be specified · 589adb86
Magnus Hagander authored Nov 01, 2011

589adb86

31 Oct, 2011 1 commit

Stop btree indexscans upon reaching nulls in either direction. · 6980f817

Tom Lane authored Oct 31, 2011

The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.

Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date. (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)

6980f817

30 Oct, 2011 2 commits

Support more locale-specific formatting options in cash_out(). · 6743a878

Tom Lane authored Oct 30, 2011

The POSIX spec defines locale fields for controlling the ordering of the
value, sign, and currency symbol in monetary output, but cash_out only
supported a small subset of these options.  Fully implement p/n_sign_posn,
p/n_cs_precedes, and p/n_sep_by_space per spec.  Fix up cash_in so that
it will accept all these format variants.

Also, make sure that thousands_sep is only inserted to the left of the
decimal point, as required by spec.
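
The thousands_sep rule can be sketched in Python as follows. This is not the cash_out code; `group_integer_part` is a hypothetical helper, and it assumes a fixed group size of three digits, whereas a real locale supplies its own grouping rules.

```python
def group_integer_part(digits, thousands_sep=",", decimal_point="."):
    """Insert thousands_sep only to the left of the decimal point.

    Assumption: groups are always three digits wide.
    """
    whole, point, frac = digits.partition(decimal_point)
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])   # peel off the rightmost three digits
        whole = whole[:-3]
    groups.insert(0, whole)
    # The fractional part is appended untouched, so it never gets a separator.
    return thousands_sep.join(groups) + point + frac
```

For example, `group_integer_part("1234567.8901")` yields `"1,234,567.8901"`: the fractional digits are left ungrouped.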

Per bug #6144 from Eduard Kracmar and discussion of bug #6277.  This patch
includes some ideas from Alexander Lakhin's proposed patch, though it is
very different in detail.

6743a878

Further improvement of make_greater_string. · eb5834d5

Tom Lane authored Oct 30, 2011

Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position. To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code. Remove unnecessary logic to
restore the old character value on failure. Additional comment and
formatting cleanup.
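
The search strategy can be sketched roughly as follows. This is a simplified Python model rather than the actual function: it assumes plain code-point comparison and a small character range, `is_valid` is a hypothetical stand-in for an encoding check, and unlike the patch it does try every code in the range.

```python
def make_greater_string(s, is_valid=lambda ch: True, max_code=0x7F):
    """Return a string sorting after every string that has s as a prefix.

    Returns None if no such string can be built. Assumes code-point ordering.
    """
    chars = list(s)
    while chars:
        code = ord(chars[-1])
        # Consider every larger code at this position, not just code + 1.
        for candidate in range(code + 1, max_code + 1):
            ch = chr(candidate)
            if is_valid(ch):
                return "".join(chars[:-1]) + ch
        # Nothing worked at this position: truncate and retry one to the left.
        chars.pop()
    return None
```

With the defaults, `make_greater_string("abc")` gives `"abd"`, and an input whose last character is already `max_code` falls back to incrementing the previous position.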

eb5834d5

29 Oct, 2011 4 commits

Update visibilitymap.c header comments. · fae54e4a
Robert Haas authored Oct 29, 2011
```
Recent work on index-only scans left this somewhat out of date.
```
fae54e4a

Fix assorted bogosities in cash_in() and cash_out(). · 7609239f

Tom Lane authored Oct 29, 2011

cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law.  In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign.  Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output.  Also, make cash_in handle
trailing negative signs, which formerly it would reject.  Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug.  IMO that
justifies patching this all the way back.

7609239f

Improve make_greater_string() with encoding-specific incrementers. · 78d523b6

Robert Haas authored Oct 29, 2011

This infrastructure doesn't in any way guarantee that the character
we produce will sort after the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.

78d523b6

Remove pg_upgrade dependency on the 'postgres' database existing in the · 51eba98c
Bruce Momjian authored Oct 28, 2011
```
new cluster.   vacuumdb, used by pg_upgrade, still has this dependency.
```
51eba98c

28 Oct, 2011 9 commits

Allow hint bits to be set sooner for temporary and unlogged tables. · 53f1ca59

Robert Haas authored Oct 28, 2011

We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway. Per off-list report from Heikki Linnakangas, waiting
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15. In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.
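
The rule can be sketched as a small predicate. This is an illustrative Python model only: `can_set_hint_bit` and its parameters are hypothetical names, log positions are modeled as plain integers, and the permanent-table branch reflects one reading of the message rather than the actual code.

```python
def can_set_hint_bit(relation_is_permanent, commit_lsn, flushed_lsn):
    """Decide whether a 'committed' hint bit may be set on a tuple now.

    Assumption: commit_lsn is the position of the commit record and
    flushed_lsn is how far the log has been durably written.
    """
    if not relation_is_permanent:
        # Temporary and unlogged tables do not survive a crash, so there
        # is no need to wait for the commit record to reach disk.
        return True
    # For permanent tables, wait until the commit record is durable.
    return commit_lsn <= flushed_lsn
```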

53f1ca59

Demote some sanity checks in BufferIsValid() to assertions. · b6335a3f
Robert Haas authored Oct 28, 2011
```
Testing reveals that this macro is a hot-spot for index-only-scans.
Per discussion with Tom Lane.
```
b6335a3f

Remove hard-coded "\connect postgres" from pg_dumpall. · deb15803

Robert Haas authored Oct 28, 2011

This doesn't appear to accomplish anything useful, and does make the
restore fail if the postgres database happens to have been dropped.

deb15803

De-parallelize ecpg build some more. · 74812624

Tom Lane authored Oct 28, 2011

Make sure ecpg/include/ is rebuilt before the other subdirectories,
so that ecpg_config.h is up to date. This is not likely to matter
during production builds, only development, so no back-patch.

74812624

Clarify that ORDER BY/FOR UPDATE can't malfunction at higher iso levels. · 9cf12dfd
Robert Haas authored Oct 28, 2011
```
Kevin Grittner
```
9cf12dfd
Change "and and" to "and". · 6c21105f
Robert Haas authored Oct 28, 2011
```
Report by Vik Reykja, patch by Kevin Grittner.
```
6c21105f
Clarify pg_upgrade error message that the 'postgres' database must exist · 9846dcfb
Bruce Momjian authored Oct 28, 2011
```
in the old cluster.
```
9846dcfb
Update docs to point to the timezone library's new home at IANA. · ece12659
Tom Lane authored Oct 27, 2011
```
The recent unpleasantness with copyrights has accelerated a move that
was already in planning.
```
ece12659
Update pg_upgrade testing instructions. · 38f3c7c4
Bruce Momjian authored Oct 27, 2011

38f3c7c4

27 Oct, 2011 3 commits

Fix the number of lwlocks needed by the "fast path" lock patch. It needs · cbf65509

Heikki Linnakangas authored Oct 27, 2011

one lock per backend or auxiliary process - the need for a lock for each
aux process was not accounted for in NumLWLocks(). No-one noticed,
because the three locks needed for the three aux processes fit into the
few extra lwlocks we allocate for 3rd party modules that don't call
RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).

cbf65509

Avoid recursion while processing ELSIF lists in plpgsql. · 051d1ba7

Tom Lane authored Oct 27, 2011

The original implementation of ELSIF in plpgsql converted the construct
into nested simple IF statements. This was prone to stack overflow with
long ELSIF lists, in two different ways. First, it's difficult to generate
the parsetree without using right-recursion in the bison grammar, and
that's prone to parser stack overflow since nothing can be reduced until
the whole list has been read. Second, we'd recurse during execution, thus
creating an unnecessary risk of execution-time stack overflow. Rewrite
so that the ELSIF list is represented as a flat list, scanned via iteration
not recursion, and generated through left-recursion in the grammar.
Per a gripe from Håvard Kongsgård.
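
The difference between the two representations can be sketched in Python. This is a toy model, not plpgsql's data structures: `NestedIf`, `FlatIf`, `exec_nested`, and `exec_flat` are hypothetical names, and a statement "body" is reduced to a value that gets returned.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Cond = Callable[[dict], bool]


@dataclass
class NestedIf:
    """Old shape: each ELSIF is another IF nested in the ELSE branch."""
    cond: Cond
    then_body: Any
    else_body: Any = None  # a plain body, or another NestedIf


@dataclass
class FlatIf:
    """New shape: the ELSIF branches form a flat list."""
    cond: Cond
    then_body: Any
    elsif_list: List[Tuple[Cond, Any]] = field(default_factory=list)
    else_body: Any = None


def exec_nested(stmt, env):
    # Recursion depth grows with the number of ELSIF branches.
    if stmt.cond(env):
        return stmt.then_body
    if isinstance(stmt.else_body, NestedIf):
        return exec_nested(stmt.else_body, env)
    return stmt.else_body


def exec_flat(stmt, env):
    # Constant stack depth: the branches are scanned by iteration.
    if stmt.cond(env):
        return stmt.then_body
    for cond, body in stmt.elsif_list:
        if cond(env):
            return body
    return stmt.else_body
```

In this model, `exec_flat` handles thousands of branches with constant stack depth, while `exec_nested` needs one stack frame per branch it has to skip.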

051d1ba7

Add simple script to check for right recursion in Bison grammars. · 756a4ed5

Tom Lane authored Oct 27, 2011

We should generally use left-recursion not right-recursion to parse lists.
Bison hasn't got any built-in way to check for this type of inefficiency,
and I didn't find anything on the net in a quick search, so I wrote a
little Perl script to do it. Add to src/tools/ so we don't have to
re-invent this wheel next time we wonder if we're doing anything stupid.

Currently, the only place that seems to need fixing is plpgsql's stmt_else
production, so the problem doesn't appear to be common enough to warrant
trying to include such a test in our standard build process. If we did
want to do that, we'd need a way to ignore some false positives, such as
a_expr := '-' a_expr
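
A check of this kind might look like the following Python sketch. It is not the Perl script added by this commit: `find_right_recursion` is a hypothetical function, the grammar is assumed to be already parsed into a dict, and only direct right recursion (last symbol equals the left-hand side) is detected.

```python
def find_right_recursion(grammar):
    """Report productions whose last right-hand-side symbol is their own LHS.

    grammar maps each nonterminal to a list of productions, where each
    production is a list of symbol names.
    """
    hits = []
    for lhs, productions in grammar.items():
        for rhs in productions:
            if len(rhs) > 1 and rhs[-1] == lhs:
                hits.append((lhs, rhs))
    return hits


# Illustrative grammar; the rule names are made up for this example.
example_grammar = {
    "right_list": [["item"], ["item", "right_list"]],   # right-recursive list
    "left_list": [["item"], ["left_list", "item"]],     # left-recursive list
    "a_expr": [["'-'", "a_expr"], ["NUM"]],             # flagged, but harmless
}
```

Running `find_right_recursion(example_grammar)` reports `right_list` and also `a_expr`, the latter being the sort of false positive the message mentions.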

756a4ed5

26 Oct, 2011 7 commits

Typo fixes. · bf820136

Tom Lane authored Oct 26, 2011

expect -> except, noted by Andrew Dunstan.  Also, "cannot" seems more
readable here than "can not", per David Wheeler.

bf820136

Improve planner's ability to recognize cases where an IN's RHS is unique. · 3e4b3465

Tom Lane authored Oct 26, 2011

If the right-hand side of a semijoin is unique, then we can treat it like a
normal join (or another way to say that is: we don't need to explicitly
unique-ify the data before doing it as a normal join). We were recognizing
such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP
BY decoration, but there's another way: if the RHS is a plain relation with
unique indexes, we can check if any of the indexes prove the output is
unique. Most of the infrastructure for that was there already in the join
removal code, though I had to rearrange it a bit. Per reflection about a
recent example in pgsql-performance.
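
The core of the uniqueness test can be sketched as a subset check. This Python model is a deliberate simplification: `rhs_is_provably_unique` is a hypothetical name, and it ignores details a real planner must consider, such as operator compatibility and partial indexes.

```python
def rhs_is_provably_unique(join_columns, unique_indexes):
    """True if some unique index is fully covered by the equated columns.

    join_columns: RHS columns compared for equality by the semijoin.
    unique_indexes: list of column lists, one per unique index on the RHS.
    """
    join_cols = set(join_columns)
    return any(
        len(index_cols) > 0 and set(index_cols) <= join_cols
        for index_cols in unique_indexes
    )
```

If the check succeeds, each outer row can match at most one RHS row, so the semijoin can be performed as a normal join without first unique-ifying the RHS.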

3e4b3465

Fix pg_bsd_indent bug where newlines were not being trimmed from typedef · 360429e1
Bruce Momjian authored Oct 26, 2011
```
lines.  Update pg_bsd_indent required version to 1.1 (and update ftp
site).

Problem reported by Magnus.
```
360429e1

Implement streaming xlog for backup tools · d9bae531

Magnus Hagander authored Oct 26, 2011

Add option for parallel streaming of the transaction log while a
base backup is running, to get the logfiles before the server has
removed them.

Also add a tool called pg_receivexlog, which streams the transaction
log into files, creating a log archive without having to wait for
segments to complete, thus decreasing the window of data loss without
having to waste space using archive_timeout. This works best in
combination with archive_command - suggested usage docs etc coming later.

d9bae531

MingW doesn't support wcstombs_s()... · 2b64f3f1
Magnus Hagander authored Oct 26, 2011

2b64f3f1

Change FK trigger naming convention to fix self-referential FKs. · 1e3b21dd

Tom Lane authored Oct 26, 2011

Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and
"RI_ConstraintTrigger_c_NNNN" for FK check triggers. This ensures the
action trigger fires first in self-referential cases where the very same
row update fires both an action and a check trigger. This change provides
a non-probabilistic solution for bug #6268, at the risk that it could break
client code that is making assumptions about the exact names assigned to
auto-generated FK triggers. Hence, change this in HEAD only. No need for
forced initdb since old triggers continue to work fine.
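
The effect of the new names on firing order can be illustrated with a short Python sketch. `new_trigger_name` and `firing_order` are hypothetical helpers, and name ordering is assumed to be plain ASCII comparison, modeled here with `sorted()`.

```python
def new_trigger_name(kind, oid):
    """Build a name in the new style; kind is 'a' (action) or 'c' (check)."""
    if kind not in ("a", "c"):
        raise ValueError("kind must be 'a' or 'c'")
    return f"RI_ConstraintTrigger_{kind}_{oid}"


def firing_order(names):
    # Assumption: triggers for one event fire in ASCII order of their names.
    return sorted(names)
```

Because "a" sorts before "c", the action trigger comes first whatever the OIDs are: `firing_order([new_trigger_name("c", 99999), new_trigger_name("a", 100000)])` puts `RI_ConstraintTrigger_a_100000` ahead of `RI_ConstraintTrigger_c_99999`.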

1e3b21dd

Change FK trigger creation order to better support self-referential FKs. · 58958726

Tom Lane authored Oct 26, 2011

When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event. The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.

Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgnames, and the auto-generated names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID. So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.

This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint. Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches. A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention. We'll fix it
that way in HEAD in a separate patch.
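
The failure case is easy to reproduce with plain string comparison. In this Python sketch, `old_trigger_name` and `fires_in_creation_order` are hypothetical helpers, and name ordering is assumed to be ASCII comparison.

```python
def old_trigger_name(oid):
    """Build a name in the old style, with the OID as a decimal suffix."""
    return f"RI_ConstraintTrigger_{oid}"


def fires_in_creation_order(first_oid, second_oid):
    """True if the trigger created first also sorts first by name."""
    return old_trigger_name(first_oid) < old_trigger_name(second_oid)
```

Here `fires_in_creation_order(16384, 16385)` is true, but `fires_in_creation_order(99999, 100000)` is false, because the names compare as strings and "1" sorts before "9".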

Back-patch to all supported branches, since this bug has existed for a long
time.

58958726

25 Oct, 2011 5 commits

Fix typo · b0bec068
Magnus Hagander authored Oct 25, 2011

b0bec068
Make event_source visible on all platforms · a87b9ae1
Magnus Hagander authored Oct 25, 2011
```
On non-Windows platforms, we just ignore any value set there.

Noted by Jaime Casanova
```
a87b9ae1
Remove argument decoration that appears unsupported on mingw · 9c4c8c84
Magnus Hagander authored Oct 25, 2011

9c4c8c84
Support configurable eventlog application names on Windows · d8ea33f2
Magnus Hagander authored Oct 25, 2011
```
This allows different instances to use the eventlog with different
identifiers, by setting the event_source GUC, similar to how
syslog_ident works.

Original patch by MauMau, heavily modified by Magnus Hagander
```
d8ea33f2
Add debugging aid in isolationtester · 90d8e8ff
Alvaro Herrera authored Oct 24, 2011

90d8e8ff