- 01 Nov, 2012 3 commits
-
Tom Lane authored
In bug #7626, Brian Dunavant exposes a performance problem created by commit 3b8968f2: that commit attempted to consider *all* possible combinations of indexable join clauses, but if said clauses join to enough different relations, there's an exponential increase in the number of outer-relation sets considered.

In Brian's example, all the clauses come from the same equivalence class, which means it's redundant to use more than one of them in an indexscan anyway. So we can prevent the problem in this class of cases (which is probably the majority of real examples) by rejecting combinations that would only serve to add a known-redundant clause.

But that still leaves us exposed to exponential growth of planning time when the query has a lot of non-equivalence join clauses that are usable with the same index. I chose to prevent such cases by setting an upper limit on the number of relation sets considered, equal to ten times the number of index clauses considered so far. (This sliding limit still allows new relsets to be added on as we move to additional index columns, which is probably more important than considering even more combinations of clauses for the previous column.) This should keep the amount of work done roughly linear rather than exponential in the apparent query complexity.

This part of the fix is pretty ad-hoc; but without a clearer idea of real-world cases for which this would result in markedly inferior plans, it's hard to see how to do better.
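As a rough standalone illustration of the arithmetic (a toy program, not planner code), compare the 2^n - 1 nonempty combinations an uncapped search over n single-relation join clauses could visit against the sliding ten-times-the-clause-count cap described above:

    #include <stdio.h>

    int
    main(void)
    {
        for (int n = 5; n <= 30; n += 5)
        {
            unsigned long long uncapped = (1ULL << n) - 1;  /* 2^n - 1 subsets */
            unsigned long long capped = 10ULL * n;          /* sliding limit */

            printf("%2d clauses: uncapped %13llu relsets, capped at %3llu\n",
                   n, uncapped, capped);
        }
        return 0;
    }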
-
Peter Eisentraut authored
Several hacks in certain modes already thought this was a bad idea, so just disable it globally.
- 31 Oct, 2012 4 commits
-
Alvaro Herrera authored
-
Tom Lane authored
Per bug #7631 from Rob Johnson. The code is operating as designed, but the docs didn't explain it.
-
Alvaro Herrera authored
Commit dfda6eba (which changed segment numbers to use a single 64 bit variable instead of log/seg) introduced a couple of bogus choices of exactly which log segment number variable to use in each case. This is currently pretty harmless; in one place, the bogus number was only being used in an error message for a pretty unlikely condition (failure to fsync a WAL segment file). In the other, it was using a global variable instead of the local variable; but all callsites were passing the value of the global variable anyway. No need to backpatch because that commit is not on earlier branches.
-
Alvaro Herrera authored
In its original conception, it was leaving some objects in the old schema, but without their proper pg_depend entries; this meant that the old schema could be dropped, causing future pg_dump calls to fail on the affected database. This was originally reported by Jeff Frost as #6704; there have been other complaints elsewhere that can probably be traced to this bug.

To fix, be more consistent about altering a table's subsidiary objects along with the table itself; this requires some restructuring in how tables are relocated when altering an extension -- hence the new AlterTableNamespaceInternal routine which encapsulates it for both the ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the previous one: certain objects would be reached twice via the dependency graph, and the second attempt to move them would cause the entire operation to fail. Per discussion, it seems the best fix for this is to do more careful tracking of objects already moved: we now maintain a list of moved objects, to avoid attempting to do it twice for the same object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
-
- 28 Oct, 2012 1 commit
-
Peter Eisentraut authored
The introduction of the .y -> .c pattern rule causes some .c files such as bootparse.c to be considered intermediate files in the .y -> .c -> .o rule chain, which make would automatically delete. But in coverage mode, the processing tools such as genhtml need those files, so mark them as "precious" so that make preserves them.
-
- 26 Oct, 2012 3 commits
-
Kevin Grittner authored
This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurrence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent.

This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evaluation to determine what change, if any, is needed.

Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL.

Documentation changes will be submitted in a separate patch.

Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.
-
Tom Lane authored
generate_base_implied_equalities_const() should prefer plain Consts over other em_is_const eclass members when choosing the "pivot" value that all the other members will be equated to. This makes it more likely that the generated equalities will be useful in constraint-exclusion proofs. Per report from Rushabh Lathia.
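A hedged sketch of that selection policy, with invented types (the real code walks EquivalenceMember nodes and their em_is_const markings):

    #include <stddef.h>

    typedef struct Member
    {
        int     is_plain_const;     /* a literal Const node? */
        int     is_pseudo_const;    /* stand-in for em_is_const */
        /* ... the member's expression would live here ... */
    } Member;

    /* Prefer a plain Const as the pivot; fall back to any pseudo-constant. */
    static const Member *
    choose_pivot(const Member *members, int n)
    {
        const Member *fallback = NULL;

        for (int i = 0; i < n; i++)
        {
            if (members[i].is_plain_const)
                return &members[i];     /* best for constraint-exclusion proofs */
            if (members[i].is_pseudo_const && fallback == NULL)
                fallback = &members[i];
        }
        return fallback;
    }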
-
Tom Lane authored
Represent a sequence's current value as a separate TableDataInfo dumpable object, so that it can be dumped within the data section of the archive rather than in pre-data. This fixes an undesirable inconsistency between the meanings of "--data-only" and "--section=data", and also fixes dumping of sequences that are marked as extension configuration tables, as per a report from Marko Kreen back in July. The main cost is that we do one more SQL query per sequence, but that's probably not very meaningful in most databases. Back-patch to 9.1, since it has the extension configuration issue even though not the --section switch.
-
- 24 Oct, 2012 2 commits
-
Tom Lane authored
To provide some bias against using a large index when a small one would do as well, genericcostestimate adds a "fudge factor", which for a long time was random_page_cost * index_pages/10000. However, this can grow to be the dominant term in indexscan cost estimates when the index involved is large enough, a behavior that was never intended. Change to a ln(1 + n/10000) formulation, which has nearly the same behavior up to a few hundred pages but tails off significantly thereafter. (A log curve seems correct on first principles, since what we're trying to account for here is index descent costs, which are typically logarithmic.) Per bug #7619 from Niko Kiirala. Possibly this change should get back-patched, but I'm hesitant to mess with cost estimates in stable branches.
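A small standalone comparison of the two formulations, taking the default random_page_cost of 4.0 purely as an example (build with -lm):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        const double random_page_cost = 4.0;
        const double pages[] = {100, 1000, 10000, 100000, 1000000};

        for (int i = 0; i < (int) (sizeof(pages) / sizeof(pages[0])); i++)
        {
            /* old fudge factor: linear in index size */
            double old_fudge = random_page_cost * pages[i] / 10000.0;
            /* new fudge factor: ln(1 + n/10000), tails off for big indexes */
            double new_fudge = random_page_cost * log(1.0 + pages[i] / 10000.0);

            printf("%8.0f pages: old %9.4f   new %8.4f\n",
                   pages[i], old_fudge, new_fudge);
        }
        return 0;
    }

Up to a few hundred pages the two curves are nearly identical; by a million pages the linear term has grown to 400 while the logarithmic one stays under 20.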
-
Tom Lane authored
Views should not have any pg_attribute entries for system columns. However, we forgot to remove such entries when converting a table to a view. This could lead to crashes later on, if someone attempted to reference such a column, as reported by Kohei KaiGai. Patch in HEAD only. This bug has been there forever, but in the back branches we will have to defend against existing mis-converted views, so it doesn't seem worthwhile to change the conversion code too.
-
- 23 Oct, 2012 1 commit
-
Alvaro Herrera authored
... and have sepgsql use it to determine whether to check permissions during certain operations. Indexes that are being created as a result of REINDEX, for instance, do not need to have their permissions checked; they were already checked when the index was created. Author: KaiGai Kohei, slightly revised by me
-
- 21 Oct, 2012 1 commit
-
Kevin Grittner authored
For the non-concurrent case there is an AccessExclusiveLock on both the index and the heap at a time during which no other process is using either, before which the index is maintained and used for scans, and after which the index is no longer used or maintained. Predicate locks can safely be moved from the index to the related heap relation under the protection of these locks. This was done prior to the introduction of DROP INDEX CONCURRENTLY and continues to be done for non-concurrent index drops.

For concurrent index drops, the predicate locks must be moved when there are no index scans in progress on that index and no more can subsequently start, and before heap inserts stop maintaining the index. As long as these conditions are guaranteed when the TransferPredicateLocksToHeapRelation() function is called, stronger locks are not needed for correctness.

Kevin Grittner based on questions by Tom Lane in reviewing the DROP INDEX CONCURRENTLY patch and in cooperation with Andres Freund and Simon Riggs.
-
- 20 Oct, 2012 2 commits
-
Tom Lane authored
In commit 4317e024, I accidentally broke this behavior while rearranging code to ensure that --create wouldn't affect whether a DATABASE entry gets put into archive-format output. Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is either useless or dangerous depending on the usage scenario. It should not do that, and no longer does. A bright spot is that this refactoring makes it easy to allow the combination of --clean and --create to work sensibly, ie, emit DROP DATABASE then CREATE DATABASE before reconnecting. Ordinarily we'd consider that a feature addition and not back-patch it, but it seems silly to not include the extra couple of lines required in the 9.2 version of the code. Per report from Guillaume Lelarge, though this is slightly more extensive than his proposed patch.
-
Tom Lane authored
Per Thom Brown.
-
- 19 Oct, 2012 4 commits
-
Tom Lane authored
The code seems to have been written to handle the pre-parse-analysis representation, where an ExecuteStmt would appear directly under CreateTableAsStmt. But in reality the function is only run on already-parse-analyzed statements, so there will be a Query node in between. We'd not noticed the bug because the function is generally not used at all except in extended query protocol. Per report from Robert Haas and Rushabh Lathia.
-
Tom Lane authored
An out-of-memory error during expand_table() on a palloc-based hash table would leave a partially-initialized entry in the table. This would not be harmful for transient hash tables, since they'd get thrown away anyway at transaction abort. But for long-lived hash tables, such as the relcache hash, this would effectively corrupt the table, leading to crash or other misbehavior later. To fix, rearrange the order of operations so that table enlargement is attempted before we insert a new entry, rather than after adding it to the hash table. Problem discovered by Hitoshi Harada, though this is a bit different from his proposed patch.
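A toy stand-in for the reordering, using a doubling array rather than dynahash's real structures: enlargement is attempted before the new entry is linked in, so an allocation failure leaves the container fully consistent:

    #include <stdlib.h>

    typedef struct Table
    {
        int    *items;
        size_t  nitems;
        size_t  capacity;
    } Table;

    /* Returns 0 on success, -1 if out of memory; table untouched on failure. */
    static int
    table_insert(Table *t, int value)
    {
        if (t->nitems == t->capacity)
        {
            size_t  newcap = t->capacity ? t->capacity * 2 : 8;
            int    *newitems = realloc(t->items, newcap * sizeof(int));

            /* Enlargement failed: report error with the table still intact. */
            if (newitems == NULL)
                return -1;
            t->items = newitems;
            t->capacity = newcap;
        }
        /* Only now add the entry; no partially-initialized state is possible. */
        t->items[t->nitems++] = value;
        return 0;
    }

On failure the caller sees an error while the table remains usable, which is the property a long-lived structure like the relcache hash needs.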
-
Tom Lane authored
Per bug #7615 from Marko Tiikkaja. Apparently nobody ever tried this case before ...
-
Simon Riggs authored
Canceling DROP INDEX CONCURRENTLY during wait could allow an orphaned index to be left behind which could not be dropped.

Backpatch to 9.2

Andres Freund, tested by Abhijit Menon-Sen
-
- 18 Oct, 2012 11 commits
-
Tom Lane authored
Remove useless duplicate initialization of bucket headers, don't use a dlist_mutable_iter in a performance-critical path that doesn't need it, make some other cosmetic changes for consistency's sake.
-
Tom Lane authored
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after do not need access to the list header, and indeed insisting on that negates one of the main advantages of a doubly-linked list. In consequence, revert addition of "cache_bucket" field to CatCTup.
-
Tom Lane authored
Make foreach macros less syntactically dangerous, and fix some typos in evidently-never-tested ones. Add missing slist_next_node and slist_head_node functions. Fix broken dlist_check code. Assorted comment improvements.
-
Andrew Dunstan authored
-
Heikki Linnakangas authored
Don't leak a file descriptor if the file is empty or we can't read its size. Expect there to be a newline at the end of the last line, too. If there isn't, ignore anything after the last newline. This makes it a tiny bit more robust in case the file is appended to concurrently, so that we don't return the last line if it hasn't been fully written yet. And this makes the code a bit less obscure, anyway. Per Tom Lane's suggestion. Backpatch to all supported branches.
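A hedged sketch of the reading rule (assuming lines fit in a fixed buffer; not the actual backend code): only newline-terminated lines count, anything after the final newline is treated as a possibly half-written line, and the file is closed on every path:

    #include <stdio.h>
    #include <string.h>

    /*
     * Copy the last newline-terminated line of the file into "out".
     * Returns 1 on success, 0 if the file is missing, empty, or contains
     * no complete line.  Assumes lines fit in the local buffer.
     */
    static int
    read_last_complete_line(const char *path, char *out, size_t outlen)
    {
        FILE   *fp = fopen(path, "r");
        char    buf[1024];
        int     found = 0;

        if (fp == NULL)
            return 0;

        while (fgets(buf, sizeof(buf), fp) != NULL)
        {
            /* A line with no trailing newline may still be being appended
             * by another process, so ignore it. */
            if (strchr(buf, '\n') != NULL && strlen(buf) < outlen)
            {
                strcpy(out, buf);
                found = 1;
            }
        }

        fclose(fp);             /* no descriptor leak, even for empty files */
        return found;
    }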
-
Simon Riggs authored
for recent concurrent changes. Abhijit Menon-Sen
-
Simon Riggs authored
Concurrent behaviour was flawed when using a two-step process, so add an additional phase of processing to ensure concurrency for both SELECTs and INSERT/UPDATE/DELETEs.

Backpatch to 9.2

Andres Freund, tweaked by me
-
Tom Lane authored
If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches.

Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.
-
Alvaro Herrera authored
Implementation idea from Tom Lane

Author: Joel Jacobson
Reviewed by Joachim Wieland
-
Simon Riggs authored
-
Simon Riggs authored
Backpatch to 9.2 to ensure bugs are fixed. Abhijit Menon-Sen
-
- 17 Oct, 2012 5 commits
-
Tom Lane authored
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely. Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.) This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.
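A simplified stand-in for the lifetime rule, with invented names bearing no resemblance to smgr.c's real interface: handles nobody owns are closed when the transaction ends, so descriptors for deleted files cannot accumulate:

    #include <stdio.h>

    #define MAX_HANDLES 16

    typedef struct Handle
    {
        FILE   *fd;
        void   *owner;          /* NULL means "unowned": close at EOXact */
        int     in_use;
    } Handle;

    static Handle handles[MAX_HANDLES];

    /* Called at transaction end: unowned handles must not outlive it. */
    static void
    at_eoxact_close_unowned(void)
    {
        for (int i = 0; i < MAX_HANDLES; i++)
        {
            if (handles[i].in_use && handles[i].owner == NULL)
            {
                fclose(handles[i].fd);  /* lets the kernel reclaim space */
                handles[i].in_use = 0;
            }
        }
    }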
-
Tom Lane authored
This reverts commit fba105b1. That approach had problems with the smgr-level state not tracking what we really want to happen, and with the VFD-level state not tracking the smgr-level state very well either. In consequence, it was still possible to hold kernel file descriptors open for long-gone tables (as in recent report from Tore Halset), and yet there were also cases of FDs being closed undesirably soon. A replacement implementation will follow.
-
Alvaro Herrera authored
Provide a common implementation of embedded singly-linked and doubly-linked lists. "Embedded" in the sense that the nodes' next/previous pointers exist within some larger struct; this design choice reduces memory allocation overhead. Most of the implementation uses inlineable functions (where supported), for performance.

Some existing uses of both types of lists have been converted to the new code, for demonstration purposes. Other uses can (and probably will) be converted in the future. Since dllist.c is unused after this conversion, it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
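A minimal self-contained illustration of the embedded-list design (the real API lives in src/include/lib/ilist.h; the names here are simplified): the node sits inside the larger struct, so no per-element allocation is needed, and deletion needs only the node itself, not the list header:

    #include <stddef.h>
    #include <stdio.h>

    typedef struct dnode
    {
        struct dnode *prev;
        struct dnode *next;
    } dnode;

    typedef struct dlist
    {
        dnode   head;           /* circular: head.next is the first element */
    } dlist;

    /* Recover the enclosing struct from a pointer to its embedded node. */
    #define dlist_container(type, member, ptr) \
        ((type *) ((char *) (ptr) - offsetof(type, member)))

    static void
    dlist_init(dlist *list)
    {
        list->head.prev = list->head.next = &list->head;
    }

    static void
    dlist_push_tail(dlist *list, dnode *node)
    {
        node->prev = list->head.prev;
        node->next = &list->head;
        node->prev->next = node;
        list->head.prev = node;
    }

    /* Note: only the node is needed, not the list header. */
    static void
    dlist_delete(dnode *node)
    {
        node->prev->next = node->next;
        node->next->prev = node->prev;
    }

    typedef struct CacheEntry
    {
        int     key;
        dnode   node;           /* embedded list node, no extra allocation */
    } CacheEntry;

    int
    main(void)
    {
        dlist   list;
        CacheEntry a = {1}, b = {2};

        dlist_init(&list);
        dlist_push_tail(&list, &a.node);
        dlist_push_tail(&list, &b.node);
        dlist_delete(&a.node);  /* header not required */

        for (dnode *n = list.head.next; n != &list.head; n = n->next)
            printf("key = %d\n", dlist_container(CacheEntry, node, n)->key);
        return 0;
    }

That header-free dlist_delete is the same property the cleanup commits above rely on when dropping the "cache_bucket" field from CatCTup.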
-
Simon Riggs authored
-
Simon Riggs authored
-
- 16 Oct, 2012 1 commit
-
Bruce Momjian authored
output mode, cause the hex digits after the period to always be at least four hex digits, with zero-padding.
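A tiny demonstration of the padding change with example values (the real formatting happens in the server's log-output code):

    #include <stdio.h>

    int
    main(void)
    {
        long    high = 0x508e7c23;  /* example values only */
        int     low = 0x2f;

        printf("before: %lx.%x\n", high, low);      /* 508e7c23.2f */
        printf("after:  %lx.%04x\n", high, low);    /* 508e7c23.002f */
        return 0;
    }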
-
- 15 Oct, 2012 2 commits
-
Tom Lane authored
... because the latter plays games with the privileges for language SQL. It looks like running alter_generic in parallel with "misc" is OK though. Also, adjust serial_schedule to maintain the same test ordering (up to parallelism) as parallel_schedule.
-
Heikki Linnakangas authored
Fujii Masao
-