Commits · 8f0d8f481e86514bb35538827df7e1e35baee368 · Abuhujair Javed / Postgres FD Implementation

21 Jan, 2013 3 commits

Fix one-byte buffer overrun in PQprintTuples(). · 8f0d8f48

Tom Lane authored Jan 20, 2013

This bug goes back to the original Postgres95 sources.  Its significance
to modern PG versions is marginal, since we have not used PQprintTuples()
internally in a very long time, and it doesn't seem to have ever been
documented either.  Still, it *is* exposed to client apps, so somebody
out there might possibly be using it.

Xi Wang

8f0d8f48

Fix error-checking typo in check_TSCurrentConfig(). · 535e69a4
Tom Lane authored Jan 20, 2013
```
The code failed to detect an out-of-memory failure.

Xi Wang
```
535e69a4

doc: Fix syntax of a URL · 693eb9df

Peter Eisentraut authored Jan 20, 2013

Leading white space before the "http:" is apparently treated as a
relative link at least by some browsers.

693eb9df

20 Jan, 2013 2 commits

Fix an O(N^2) performance issue for sessions modifying many relations. · d5b31cc3

Tom Lane authored Jan 20, 2013

AtEOXact_RelationCache() scanned the entire relation cache at the end of
any transaction that created a new relation or assigned a new relfilenode.
Thus, clients such as pg_restore had an O(N^2) performance problem that
would start to be noticeable after creating 10000 or so tables. Since
typically only a small number of relcache entries need any cleanup, we
can fix this by keeping a small list of their OIDs and doing hash_searches
for them. We fall back to the full-table scan if the list overflows.

Ideally, the maximum list length would be set at the point where N
hash_searches would cost just less than the full-table scan. Some quick
experimentation says that point might be around 50-100; I (tgl)
conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried
about here, which is short single-statement transactions, it's unlikely
there would ever be more than about a dozen list entries anyway; so it's
probably not worth being too tense about the value.

We could avoid the hash_searches by instead keeping the target relcache
entries linked into a list, but that would be noticeably more complicated
and bug-prone because of the need to maintain such a list in the face of
relcache entry drops. Since a relcache entry can only need such cleanup
after a somewhat-heavyweight filesystem operation, trying to save a
hash_search per cleanup doesn't seem very useful anyway --- it's the scan
over all the not-needing-cleanup entries that we wish to avoid here.

Jeff Janes, reviewed and tweaked a bit by Tom Lane

d5b31cc3

Clarify that streaming replication can be both async and sync · 0a2da528
Magnus Hagander authored Jan 20, 2013
```
Josh Kupershmidt
```
0a2da528

19 Jan, 2013 4 commits

Use SET TRANSACTION READ ONLY in pg_dump, if server supports it. · 26d905a1

Tom Lane authored Jan 19, 2013

This currently does little except serve as documentation. (The one case
where it has a performance benefit, SERIALIZABLE mode in 9.1 and up, was
already using READ ONLY mode.) However, it's possible that it might have
performance benefits in future, and in any case it seems like good
practice since it would catch any accidentally non-read-only operations.

Pavan Deolasee

26d905a1

Modernize string literal syntax in tutorial example. · 4b94cfb5

Tom Lane authored Jan 19, 2013

Un-double the backslashes in the LIKE patterns, since
standard_conforming_strings is now the default.  Just to be sure, include
a command to set standard_conforming_strings to ON in the example.

Back-patch to 9.1, where standard_conforming_strings became the default.

Josh Kupershmidt, reviewed by Jeff Janes

4b94cfb5

Make pgxs build executables with the right suffix. · 9f10f7dc

Andrew Dunstan authored Jan 19, 2013

Complaint and patch from Zoltán Böszörményi.

When cross-compiling, the native make doesn't know
about the Windows .exe suffix, so it only builds with
it when explicitly told to do so.

The native make will not see the link between the target
name and the built executable, and might this do unnecesary
work, but that's a bigger problem than this one, if in fact
we consider it a problem at all.

Back-patch to all live branches.

9f10f7dc

libpq doc: Clarify what commands return PGRES_TUPLES_OK · fb197290

Peter Eisentraut authored Jan 18, 2013

The old text claimed that INSERT and UPDATE always return
PGRES_COMMAND_OK, but INSERT/UPDATE with RETURNING return
PGRES_TUPLES_OK.

Josh Kupershmidt

fb197290

18 Jan, 2013 8 commits

Protect against SnapshotNow race conditions in pg_tablespace scans. · c2a14bc7

Tom Lane authored Jan 18, 2013

Use of SnapshotNow is known to expose us to race conditions if the tuple(s)
being sought could be updated by concurrently-committing transactions.
CREATE DATABASE and DROP DATABASE are particularly exposed because they do
heavyweight filesystem operations during their scans of pg_tablespace,
so that the scans run for a very long time compared to most.  Furthermore,
the potential consequences of a missed or twice-visited row are nastier
than average:

* createdb() could fail with a bogus "file already exists" error, or
  silently fail to copy one or more tablespace's worth of files into the
  new database.

* remove_dbtablespaces() could miss one or more tablespaces, thus failing
  to free filesystem space for the dropped database.

* check_db_file_conflict() could likewise miss a tablespace, leading to an
  OID conflict that could result in data loss either immediately or in
  future operations.  (This seems of very low probability, though, since a
  duplicate database OID would be unlikely to start with.)

Hence, it seems worth fixing these three places to use MVCC snapshots, even
though this will someday be superseded by a generic solution to SnapshotNow
race conditions.

Back-patch to all active branches.

Stephen Frost and Tom Lane

c2a14bc7

Rename new latex longtable function name, for consistency · 530bbfac
Bruce Momjian authored Jan 18, 2013

530bbfac

Unbreak lock conflict detection for Hot Standby. · d8c38966

Robert Haas authored Jan 18, 2013

This got broken in the original fast-path locking patch, because
I failed to account for the fact that Hot Standby startup process
might take a strong relation lock on a relation in a database to
which it is not bound, and confused MyDatabaseId with the database
ID of the relation being locked.

Report and diagnosis by Andres Freund.  Final form of patch by me.

d8c38966

Improve pg_upgrade error report · 600250d0

Bruce Momjian authored Jan 18, 2013

If the cluster alignments don't match, output this suggestion:

	Likely one cluster is a 32-bit install, the other 64-bit

600250d0

Fix off-by-one bug in xlog reading logic · 8c17144c
Alvaro Herrera authored Jan 18, 2013
```
Bug reported by Michael Paquier

Author: Andres Freund
```
8c17144c

psql latex fixes · 74a82baf

Bruce Momjian authored Jan 18, 2013

Remove extra line at bottom of table for new 'latex' mode border=3.
Also update 'latex'-longtable 'tableattr' docs to say
'whitespace-separated' instead of 'space'.

74a82baf

Now that START_REPLICATION returns the next timeline's ID after reaching end · 6f7cddc7

Heikki Linnakangas authored Jan 18, 2013

of timeline, take advantage of that in walreceiver.

Startup process is still in control of choosign the target timeline, by
scanning the timeline history files present in pg_xlog, but walreceiver now
uses the next timeline's ID to fetch its history file immediately after it
has finished streaming the old timeline. Before, the standby would first try
to restart streaming on the old timeline, which fetches the missing timeline
history file as a side-effect, and only then restart from the new timeline.
This patch eliminates the extra iteration, which speeds up the timeline
switch and reduces the noise in the log caused by the extra restart on the
old timeline.

6f7cddc7

Use the right timeline when beginning to stream from master. · 2ff65553

Heikki Linnakangas authored Jan 18, 2013

The xlogreader refactoring broke the logic to decide which timeline to start
streaming from. XLogPageRead() uses the timeline history to check which
timeline the requested WAL position falls into. However, after the
refactoring, XLogPageRead() is always first called with the first page in
the segment, to verify the segment header, and only then with the actual WAL
position we're interested in. That first read of the segment's header made
XLogPageRead() to always start streaming from the old timeline containing
the segment header, not the timeline containing the actual record, if there
was a timeline switch within the segment.

I thought I fixed this yesterday, but that fix was too narrow and only fixed
this for the corner-case that the timeline switch happened in the first page
of the segment. To fix this more robustly, pass explicitly the position of
the record we're actually interested in to XLogPageRead, and use that to
decide which timeline to read from, rather than deduce it from the page and
offset.

Per report from Fujii Masao.

2ff65553

17 Jan, 2013 14 commits

When xlogreader asks the callback function to read a page, make sure we · 88228e6f

Heikki Linnakangas authored Jan 17, 2013

get a large enough part of the page to include the beginning of the next
record we're interested in. The XLogPageRead callback uses the requested
length to decide which timeline to stream WAL from, and if the first call
is short, and the page contains a timeline switch, we'll repeatedly try
to stream that page from the old timeline, and never get across the
timeline switch.

88228e6f

I added a result set to START_STREAMING command, but neglected walreceiver. · 3684a534

Heikki Linnakangas authored Jan 17, 2013

The patch to allow pg_receivexlog to switch timeline added a result set
after copy has ended in START_STREAMING command, to return the next
timeline's ID to the client. But walreceived didn't get the memo, and threw
an error on the unexpected result set. Fix.

3684a534

Accelerate end-of-transaction dropping of relations · 279628a0

Alvaro Herrera authored Jan 17, 2013

When relations are dropped, at end of transaction we need to remove the
files and clean the buffer pool of buffers containing pages of those
relations. Previously we would scan the buffer pool once per relation
to clean up buffers. When there are many relations to drop, the
repeated scans make this process slow; so we now instead pass a list of
relations to drop and scan the pool once, checking each buffer against
the passed list. When the number of relations is larger than a
threshold (which as of this patch is being set to 20 relations) we sort
the array before starting, and bsearch the array; when it's smaller, we
simply scan the array linearly each time, because that's faster. The
exact optimal threshold value depends on many factors, but the
difference is not likely to be significant enough to justify making it
user-settable.

This has been measured to be a significant win (a 15x win when dropping
100,000 relations; an extreme case, but reportedly a real one).

Author: Tomas Vondra, some tweaks by me
Reviewed by: Robert Haas, Shigeru Hanada, Andres Freund, Álvaro Herrera

279628a0

Make pg_receivexlog and pg_basebackup -X stream work across timeline switches. · 0b632913

Heikki Linnakangas authored Jan 17, 2013

This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_STREAMING command.

When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.

This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding the positions.

0b632913

Improve memory space management in tuplesort and tuplestore. · 8ae35e91

Tom Lane authored Jan 17, 2013

The code originally just doubled the size of the tuple-pointer array so
long as that would fit in allowedMem. This could result in failing to use
as much as half of allowedMem, if (as is typical) the last doubling attempt
didn't quite fit. Worse, we might double the array size but be unable to
use most of the added slots, because there was no room left within the
allowedMem limit for tuples the slots should point to. To fix, double only
so long as we've used less than half of allowedMem in total. Then do one
more array enlargement, but scale it based on total memory consumption so
far. This will work nicely as long as the average tuple size is reasonably
stable, and in any case should be better than the old method.

This change will result in large sort operations consuming a larger
fraction of work_mem than they typically did in the past. The release
notes should mention that users may want to revisit their work_mem
settings, if they'd tuned those settings based on the old behavior of
sorting.

Jeff Janes, reviewed by Peter Geoghegan and Robert Haas

8ae35e91

Fix a couple of error-handling bugs in the xlogreader patch. · 1296d5c5

Heikki Linnakangas authored Jan 17, 2013

XLogReadRecord should reset its state on every error, to make sure it
re-reads the page on next call. It was inconsistent in that some errors did
that, but some did not.

In ReadRecord(), don't give up on an error if we're in standby mode. The
loop was set up to retry, but the checks within the loop broke out of the
loop on any error.

Andres Freund, with some tweaking by me.

1296d5c5

Add a latex-longtable output format to psql · b14f81bc

Bruce Momjian authored Jan 17, 2013

latex longtable is more powerful than the 'tabular' output format
'latex' uses.  Also add border=3 support to 'latex'.

b14f81bc

Silence compiler warnings · 8ef69616
Magnus Hagander authored Jan 17, 2013

8ef69616

Make GiST indexes on-disk compatible with 9.2 again. · 9ee4d06f

Heikki Linnakangas authored Jan 17, 2013

The patch that turned XLogRecPtr into a uint64 inadvertently changed the
on-disk format of GiST indexes, because the NSN field in the GiST page
opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that
field back to the two-field struct that XLogRecPtr was before. This is the
same we did to LSNs in the page header to avoid changing on-disk format.

Bump catversion, as this invalidates any existing GiST indexes built on
9.3devel.

9ee4d06f

Base the default SSL ciphers on DEFAULT instead of ALL · bba486f3

Magnus Hagander authored Jan 17, 2013

It's better to start from what the OpenSSL people consider a good
default and then remove insecure things (low encryption, exportable
encryption and md5 at this point) from that, instead of starting
from everything that exists and remove from that. We trust the
OpenSSL people to make good choices about what the default is.

bba486f3

Make size-output fixed length in pg_basebackup verbose mode · 4eebf130
Magnus Hagander authored Jan 17, 2013
```
This way the line doesn't shift right as the amount of data processed
increases.
```
4eebf130

Truncate filenames in the leadning end in pg_basebackup verbose output · d7e9ca7f

Magnus Hagander authored Jan 17, 2013

When truncating at the end, like before, the output would often end up
just showing the path instead of the filename.

Also increase the length of the filename by 5, which still keeps us at
less than 80 characters in most outputs.

d7e9ca7f

Support multiple -t/--table arguments for more commands · f3af5344

Magnus Hagander authored Jan 17, 2013

On top of the previous support in pg_dump, add support to specify
multiple tables (by using the -t option multiple times) to
pg_restore, clsuterdb, reindexdb and vacuumdb.

Josh Kupershmidt, reviewed by Karl O. Pinc

f3af5344

Get rid of pg_dump's README · 36bdfa52

Peter Eisentraut authored Jan 16, 2013

It was largely full of outdated and incorrect information.  Move the few
notes which were still relevant into header comments of pg_backup_tar.c
and pg_dumpall.c.

Josh Kupershmidt

36bdfa52

16 Jan, 2013 1 commit

Split out XLog reading as an independent facility · 7fcbf6a4

Alvaro Herrera authored Jan 16, 2013

This new facility can not only be used by xlog.c to carry out crash
recovery, but also by external programs.  By supplying a function to
read XLog pages from somewhere, all the WAL reading can be used for
completely different purposes.

For the standard backend use, the behavior should be pretty much the
same as previously.  As for non-backend programs, an hypothetical
pg_xlogdump program is now closer to reality, but some more backend
support is still necessary.

This patch was originally submitted by Andres Freund in a different
form, but Heikki Linnakangas opted for and authored another design of
the concept.  Andres has advanced the patch since Heikki's initial
version.  Review and some (mostly cosmetics) changes by me.

7fcbf6a4

15 Jan, 2013 5 commits

Make \? help message more clear when not connected. · 8606dd81

Heikki Linnakangas authored Jan 15, 2013

On second thought, "none" could mislead to think that you're connected a
database with that name. Duplicate the whole string, so that it can be
more easily translated. In back-branches, thought, just use an empty string
in place of the database name, to avoid adding a translatable string.

8606dd81

Don't pass NULL to fprintf, if not currently connected to a database. · b04ce529
Heikki Linnakangas authored Jan 15, 2013
```
Backpatch all the way to 8.3. Fixes bug #7811, per report and diagnosis by
Meng Qingzhong.
```
b04ce529

Rework order of checks in ALTER / SET SCHEMA · 7ac5760f

Alvaro Herrera authored Jan 15, 2013

When attempting to move an object into the schema in which it already
was, for most objects classes we were correctly complaining about
exactly that ("object is already in schema"); but for some other object
classes, such as functions, we were instead complaining of a name
collision ("object already exists in schema").  The latter is wrong and
misleading, per complaint from Robert Haas in
CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com

To fix, refactor the way these checks are done.  As a bonus, the
resulting code is smaller and can also share some code with Rename
cases.

While at it, remove use of getObjectDescriptionOids() in error messages.
These are normally disallowed because of translatability considerations,
but this one had slipped through since 9.1.  (Not sure that this is
worth backpatching, though, as it would create some untranslated
messages in back branches.)

This is loosely based on a patch by KaiGai Kohei, heavily reworked by
me.

7ac5760f

Give a proper error message if connecting to incompatible server. · ffda0597
Heikki Linnakangas authored Jan 15, 2013
```
The WAL streaming message format changed in 9.3, so 9.3 pg_basebackup or
pg_receivelog won't work against older servers.
```
ffda0597

Fix hash_update_hash_key() to handle same-bucket case correctly. · 1b794d3f

Tom Lane authored Jan 14, 2013

Original coding would corrupt the hashtable if the item being updated was
at the end of its bucket chain and the new hash key hashed to that same
bucket. Diagnosis and fix by Heikki Linnakangas.

1b794d3f

14 Jan, 2013 3 commits

Return value of lseek() can be negative on failure. · 3f4b1749

Heikki Linnakangas authored Jan 15, 2013

Because the return value of lseek() was assigned to an unsigned size_t
variable, we'd fail to notice an error return code -1. Compiler gave a
warning about this.

Andres Freund

3f4b1749

Fix obsolete SQL syntax in comment. · 325c54b6

Tom Lane authored Jan 14, 2013

This was legal back in the days of add_missing_from, though perhaps
never good style.  It's not legal anymore ...

Jan Urbański

325c54b6

Reject out-of-range dates in to_date(). · 5c4eb916

Tom Lane authored Jan 14, 2013

Dates outside the supported range could be entered, but would not print
reasonably, and operations such as conversion to timestamp wouldn't behave
sanely either.  Since this has the potential to result in undumpable table
data, it seems worth back-patching.

Hitoshi Harada

5c4eb916