- 14 Mar, 2016 11 commits
-
-
Robert Haas authored
We don't support any parallel write operations at present, so choosing a parallel plan causes us to error out. Also, add a new regression test that uses EXPLAIN ANALYZE SELECT INTO; if we'd had this previously, force_parallel_mode testing would have caught this issue. Mithun Cy and Robert Haas
-
Tom Lane authored
In the initial revision of the upper-planner pathification work, the only available way for an FDW or custom-scan provider to inject Paths representing post-scan-join processing was to insert them during scan-level GetForeignPaths or similar processing. While that's not impossible, it'd require quite a lot of duplicative processing to look forward and see if the extension would be capable of implementing the whole query. To improve matters for custom-scan providers, provide a hook function at the point where the core code is about to start filling in upperrel Paths. At this point Paths are available for the whole scan/join tree, which should reduce the amount of redundant effort considerably. (An alternative design that was suggested was to provide a separate hook for each post-scan-join processing step, but that seems messy and not clearly more useful.) Following our time-honored tradition, there's no documentation for this hook outside the source code. As-is, this hook is only meant for custom scan providers, which we can't assume very much about. A follow-on patch will implement an FDW callback to let FDWs do the same thing in a somewhat more structured fashion.
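As a rough illustration (not taken from the commit itself), an extension installs such a hook from _PG_init() using the usual hook-chaining pattern. The hook name and signature below follow the released 9.6 API (create_upper_paths_hook); they may differ in detail from the hook as first committed, so treat them as an assumption.

    /*
     * Sketch only: chaining a planner hook from an extension's _PG_init().
     * The hook name and signature are assumed from the 9.6-era API.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/planner.h"

    PG_MODULE_MAGIC;

    static create_upper_paths_hook_type prev_create_upper_paths_hook = NULL;

    static void
    my_create_upper_paths(PlannerInfo *root, UpperRelationKind stage,
                          RelOptInfo *input_rel, RelOptInfo *output_rel)
    {
        /* Give any previously installed hook its chance first. */
        if (prev_create_upper_paths_hook)
            prev_create_upper_paths_hook(root, stage, input_rel, output_rel);

        /*
         * Paths for the whole scan/join tree are already present in
         * input_rel at this point, so the extension can decide whether it
         * can implement this post-scan-join step itself and, if so, add a
         * CustomPath to output_rel with add_path().
         */
    }

    void
    _PG_init(void)
    {
        prev_create_upper_paths_hook = create_upper_paths_hook;
        create_upper_paths_hook = my_create_upper_paths;
    }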
-
Tom Lane authored
Although the default choice of rel->reltarget should typically be sufficient for scan or join paths, it's not at all sufficient for the purposes PathTargets were invented for; in particular not for upper-relation Paths. So break API compatibility by adding a PathTarget argument to create_foreignscan_path(). To ease updating of existing code, accept a NULL value of the argument as selecting rel->reltarget.
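A hypothetical GetForeignPaths callback adapted to the new API could simply pass NULL to keep the old behavior. The sketch below assumes the 9.6-era argument list (root, rel, target, rows, startup_cost, total_cost, pathkeys, required_outer, fdw_outerpath, fdw_private); the function name and cost numbers are made up.

    /* Sketch only; parameter list assumed from the 9.6-era declaration. */
    #include "postgres.h"
    #include "optimizer/pathnode.h"

    static void
    myGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
    {
        ForeignPath *path;

        (void) foreigntableid;      /* not needed in this sketch */

        path = create_foreignscan_path(root, baserel,
                                       NULL,     /* NULL selects baserel->reltarget */
                                       baserel->rows,
                                       10.0,     /* startup_cost: made-up value */
                                       1000.0,   /* total_cost: made-up value */
                                       NIL,      /* no pathkeys */
                                       NULL,     /* no required_outer rels */
                                       NULL,     /* no fdw_outerpath */
                                       NIL);     /* no fdw_private data */
        add_path(baserel, (Path *) path);
    }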
-
Tom Lane authored
In commit 19a54114 I did not make PathTarget a subtype of Node, and embedded a RelOptInfo's reltarget directly into it rather than having a separately-allocated Node. In hindsight that was misguided micro-optimization, enabled by the fact that at that point we didn't have any Paths with custom PathTargets. Now that PathTarget processing has been fleshed out some more, it's easier to see that it's better to have PathTarget as an independent Node type, even if it does cost us one more palloc to create a RelOptInfo. So change it while we still can. This commit just changes the representation, without doing anything more interesting than that.
-
Tom Lane authored
Negative klen has been documented since Perl 5.16, and 5.6 is no longer supported, so there is no need to comment about it. Dagfinn Ilmari Mannsåker
-
Tom Lane authored
Perl's integers are pointer-sized, so can hold more than INT_MAX on LP64 platforms, and come in both signed (IV) and unsigned (UV). Floating point values (NV) may also be larger than double. Since Perl 5.19.4, array indices are SSize_t instead of I32, so allow up to SSize_t_max on those versions. The limit is not imposed just by av_extend's argument type, but by all the array handling code, so remove the speculative comment. Dagfinn Ilmari Mannsåker
-
Robert Haas authored
Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh Bapat, and slightly expanded by me.
-
Tom Lane authored
Commit 23a27b03 widened the rows-stored counters to uint64, but that's academic unless we allow the tuple pointer array to exceed 1GB. (It might be a good idea to provide some other limit on how much storage a SPITupleTable can eat. On the other hand, there are plenty of other ways to drive a backend into swap hell.) Dagfinn Ilmari Mannsåker
-
Robert Haas authored
The old code is bad for two reasons. First, it has an off-by-one error. Second, it won't help if you aren't running with assertions enabled. Per discussion, we want a check here in that case too. Author: KaiGai Kohei, adjusted by me. Reviewed-by: Petr Jelinek Discussion: 56E0D547.1030101@2ndquadrant.com
-
Tom Lane authored
I didn't bother with a catversion bump. Report and patch by Thomas Munro
-
Tom Lane authored
Previously, configure would take any string, including an empty string, leading to obscure compile failures in guc.c. It seems worth expending a few lines of code to ensure that the argument is a decimal number between 1 and 65535. Report and patch by Jim Nasby; reviews by Alex Shulgin, Peter Eisentraut, Ivan Kartyshov
-
- 13 Mar, 2016 7 commits
-
-
Tom Lane authored
Commit d88976cf removed this code from ginFreeScanKeys():

    if (entry->list)
        pfree(entry->list);

evidently in the belief that that ItemPointer array is allocated in the keyCtx and so would be reclaimed by the following MemoryContextReset. Unfortunately, it isn't and it won't. It'd likely be a good idea for that to become so, but as a simple and back-patchable fix in the meantime, restore this code to ginFreeScanKeys(). Also, add a similar pfree to where startScanEntry() is about to zero out entry->list. I am not sure if there are any code paths where this change prevents a leak today, but it seems like cheap future-proofing.

In passing, make the initial allocation of so->entries[] use palloc not palloc0. The code doesn't depend on unused entries being zero; if it did, the array-enlargement code in ginFillScanEntry() would be wrong. So using palloc0 initially can only serve to confuse readers about what the invariant is.

Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in <CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
-
Peter Eisentraut authored
-
Magnus Hagander authored
Per suggestion from Tomas Vondra Author: Julien Rouhaud
-
Magnus Hagander authored
Noted by Tomas Vondra
-
Tom Lane authored
_strtoui64() is available in MSVC builds, but apparently not with other Windows toolchains. Thanks to Petr Jelinek for the diagnosis.
- 12 Mar, 2016 4 commits
-
-
Tom Lane authored
Commit a2dabf0e had the bright idea that it could modify a "const" global variable if it merely casted away const from a pointer. This does not work on platforms where the compiler puts "const" variables into read-only storage. Depressingly, we evidently have no such platforms in our buildfarm ... an oversight I have now remedied. (The one platform that is known to catch this is recent OS X with -fno-common.) Per report from Chris Ruprecht. Back-patch to 9.5 where the bogus code was introduced.
-
Tom Lane authored
This patch widens SPI_processed, EState's es_processed field, PortalData's portalPos field, FuncCallContext's call_cntr and max_calls fields, ExecutorRun's count argument, PortalRunFetch's result, and the max number of rows in a SPITupleTable to uint64, and deals with (I hope) all the ensuing fallout. Some of these values were declared uint32 before, and others "long". I also removed PortalData's posOverflow field, since that logic seems pretty useless given that portalPos is now always 64 bits. The user-visible results are that command tags for SELECT etc will correctly report tuple counts larger than 4G, as will plpgsql's GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples than that are still not exactly the norm, but they're becoming more common. Most values associated with FETCH/MOVE distances, such as PortalRun's count argument and the count argument of most SPI functions that have one, remain declared as "long". It's not clear whether it would be worth promoting those to int64; but it would definitely be a large dollop of additional API churn on top of this, and it would only help 32-bit platforms which seem relatively less likely to see any benefit. Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
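For extension code, the practical consequence is that the row counter read after an SPI call is now a uint64 and should be printed with UINT64_FORMAT. A minimal sketch (the table name is made up, error handling mostly omitted):

    /* Minimal sketch: consuming the widened SPI row counter from C. */
    #include "postgres.h"
    #include "executor/spi.h"

    static void
    report_row_count(void)
    {
        if (SPI_connect() != SPI_OK_CONNECT)
            elog(ERROR, "SPI_connect failed");

        SPI_execute("SELECT * FROM big_table", true, 0);   /* read-only, no row limit */

        /* SPI_processed is now uint64, so use the 64-bit format macro. */
        elog(NOTICE, "scanned " UINT64_FORMAT " rows", SPI_processed);

        SPI_finish();
    }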
-
Andres Freund authored
Buildfarm members gaur and pademelon are old enough not to know about MAP_FAILED, which is used in 428b1d6b. Include portability/mem.h to fix, as already done in a bunch of other places.
-
Tom Lane authored
CitusDB is using these and doesn't wish to redesign their code right now. I am not on board with this being a good idea, or a good precedent, but I lack the energy to fight about it.
-
- 11 Mar, 2016 16 commits
-
-
Robert Haas authored
Commit a892234f added a second bit per page to the visibility map, but pg_upgrade has been unaware of it up until now. Therefore, a pg_upgrade from an earlier major release of PostgreSQL to any commit preceding this one and following the one mentioned above would result in invalid visibility map contents on the new cluster, very possibly leading to data corruption. This plugs that hole. Masahiko Sawada, reviewed by Jeff Janes, Bruce Momjian, Simon Riggs, Michael Paquier, Andres Freund, me, and others.
-
Tom Lane authored
It is frequently useful for volatile, set-returning, or expensive functions in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are done. Otherwise, the functions might be executed for every row of the table despite the presence of LIMIT, and/or be executed in an unexpected order. For example, in

    SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;

it's probably desirable that the nextval() values are ordered the same as x, and that nextval() is not run more than 10 times. In the past, Postgres was inconsistent in this area: you would get the desirable behavior if the ordering were performed via an indexscan, but not if it had to be done by an explicit sort step. Getting the desired behavior reliably required contortions like

    SELECT x, nextval('seq') FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target expressions (that is, those that are not used as DISTINCT, ORDER BY, or GROUP BY columns) so that they effectively occur after sorting, even if an explicit sort step is necessary. Volatile expressions and set-returning expressions are always postponed, so as to provide consistent semantics. Expensive expressions (costing more than 10 times typical operator cost, which by default would include any user-defined function) are postponed if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but there are costs associated with doing so: it requires an extra Result plan node which adds some overhead, and postponement changes the volume of data going through the sort step, perhaps for the worse. Since we tend not to have very good estimates of the output width of nontrivial expressions, it's hard to have much confidence in our ability to predict whether postponement would increase or decrease the cost of the sort; therefore this patch doesn't attempt to make decisions conditionally on that. Between these factors and a general desire not to change query behavior when there's not a demonstrable benefit, it seems best to be conservative about applying postponement. We might tweak the decision rules in the future, though.

Konstantin Knizhnik, heavily rewritten by me
-
Teodor Sigaev authored
Also, this fixes a dynamic array allocation that is disallowed by ANSI C. Author: Stas Kelvich
-
Teodor Sigaev authored
Some dictionaries have duplicated base words with different affix sets; we just merge those sets into one. But previously, merging the sets of affixes was actually a concatenation of strings, which is wrong for the numeric representation of affixes, because that representation uses a comma to separate affixes. Author: Artur Zakirov
-
Teodor Sigaev authored
-
Teodor Sigaev authored
Adds several tsvector editing functions: convert tsvector to/from text array, set weight for given lexemes, delete lexeme(s), unnest, filter lexemes with given weights. Author: Stas Kelvich, with some editorialization by me. Reviewers: Tomas Vondra, Teodor Sigaev
-
Tom Lane authored
Teach make_group_input_target() and make_window_input_target() to work entirely with the PathTarget representation of tlists, rather than constructing a tlist and immediately deconstructing it into PathTarget format. In itself this only saves a few palloc's; the bigger picture is that it opens the door for sharing cost_qual_eval work across all of planner.c's constructions of PathTargets. I'll come back to that later. In support of this, flesh out tlist.c's infrastructure for PathTargets a bit more.
-
Magnus Hagander authored
New configuration parameter auto_explain.sample_ratio makes it possible to log just a fraction of the queries meeting the configured threshold, to reduce the amount of logging. Author: Craig Ringer and Julien Rouhaud Review: Petr Jelinek
-
Robert Haas authored
Andreas Karlsson and Robert Haas
-
Robert Haas authored
Per Amit Kapila.
-
Magnus Hagander authored
Much cruft had accumulated over time, with a large number of parameters passed down through deeply nested functions. With this refactoring, introduce a StreamCtl structure that holds the parameters, and pass around a pointer to that structure instead. This makes it much easier to add or remove fields that are needed deeper down in the implementation without having to modify every function header in the file. Patch by me after much nagging from Andres. Reviewed by Craig Ringer and Daniel Gustafsson
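The shape of the change is the common C refactoring of bundling a long parameter list into a control struct and passing a single pointer; the sketch below is generic and its field and function names are hypothetical, not the actual StreamCtl members.

    /* Generic illustration; names here are hypothetical, not the real StreamCtl. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    typedef struct StreamCtlExample
    {
        uint64_t startpos;                  /* WAL position to start streaming at */
        uint32_t timeline;                  /* timeline to stream from */
        int      standby_message_timeout;   /* feedback interval, in milliseconds */
        bool     synchronous;               /* report flush position eagerly? */
        char    *partial_suffix;            /* suffix for partially written files */
    } StreamCtlExample;

    /*
     * Every helper down the call chain takes the same single pointer, so adding
     * or removing a field no longer means editing every prototype in the file.
     */
    static bool
    open_walfile_example(const StreamCtlExample *stream)
    {
        return stream->partial_suffix != NULL;
    }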
-
Simon Riggs authored
emit_log_hook could only see the translated text, making it harder to identify which message was being sent. Pass original text to allow the exact message to be identified, whichever language is used for logging. Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp Author: Kyotaro Horiguchi
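An emit_log_hook consumer can now match on the untranslated text. In the sketch below, emit_log_hook and ErrorData are the existing API, while the message_id field name and the filtering logic are assumptions made for illustration.

    /*
     * Sketch of an emit_log_hook that filters on the original (untranslated)
     * message text; the message_id field name is an assumption about how the
     * new information is exposed in ErrorData.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/elog.h"

    PG_MODULE_MAGIC;

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    my_emit_log(ErrorData *edata)
    {
        /* Compare against the untranslated text, not the localized message. */
        if (edata->message_id != NULL &&
            strcmp(edata->message_id, "deadlock detected") == 0)
            edata->output_to_server = true;

        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }

    void
    _PG_init(void)
    {
        prev_emit_log_hook = emit_log_hook;
        emit_log_hook = my_emit_log;
    }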
-
Robert Haas authored
The old code is wrong, because it returns a pointer to an automatic variable. And it's also more clever than we really need to be considering that the case it's worrying about should never happen.
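The underlying C pitfall is the classic one shown below; this is a generic illustration, not the code that was fixed.

    /* Generic illustration of the bug class, not the PostgreSQL code. */
    #include <stdio.h>

    /*
     * BROKEN: buf is an automatic variable, so it ceases to exist when the
     * function returns and the caller is left with a dangling pointer.
     */
    static const char *
    describe_broken(int n)
    {
        char buf[32];
        snprintf(buf, sizeof(buf), "value=%d", n);
        return buf;             /* undefined behavior for the caller */
    }

    /*
     * FIXED: static storage outlives the call; alternatively the caller could
     * supply the buffer, or the function could allocate one.
     */
    static const char *
    describe_fixed(int n)
    {
        static char buf[32];
        snprintf(buf, sizeof(buf), "value=%d", n);
        return buf;
    }

    int
    main(void)
    {
        printf("%s\n", describe_fixed(42));
        (void) describe_broken;             /* kept only to show the broken form */
        return 0;
    }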
-
Andres Freund authored
Reported-By: Peter Eisentraut Discussion: 56E2239E.1050607@gmx.net
-
Andres Freund authored
Up to now, the buffers written during a checkpoint were written out in the order they appear in the BufferDescriptors array. That's nearly random in a lot of cases, which performs badly on rotating media, but even on SSDs it causes slowdowns. To avoid that, sort the buffers before writing them out. We currently sort by tablespace, relfilenode, fork and block number. One of the major reasons that previously wasn't done was fear of imbalance between tablespaces. To address that, writes are now balanced between tablespaces. The other prime concern was that the relatively large allocation to sort the buffers in might fail, preventing checkpoints from happening. Thus pre-allocate the required memory in shared memory, at server startup. This particularly makes it more efficient to have checkpoint flushing enabled, because that'll often result in a lot of writes that can be coalesced into one flush. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund
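The sort key can be pictured as a comparator over (tablespace, relfilenode, fork, block); the struct and function names in this sketch are approximations, not the actual bufmgr.c code.

    /* Illustrative comparator; names are approximate, not the real bufmgr.c ones. */
    #include <stdint.h>

    typedef struct CkptSortItemExample
    {
        uint32_t tsId;          /* tablespace OID */
        uint32_t relNode;       /* relfilenode */
        int      forkNum;       /* fork number within the relation */
        uint32_t blockNum;      /* block number within the fork */
        int      buf_id;        /* shared buffer this entry refers to */
    } CkptSortItemExample;

    /* qsort()-style comparison: tablespace, then relation, then fork, then block. */
    static int
    ckpt_buforder_cmp(const void *pa, const void *pb)
    {
        const CkptSortItemExample *a = pa;
        const CkptSortItemExample *b = pb;

        if (a->tsId != b->tsId)
            return (a->tsId < b->tsId) ? -1 : 1;
        if (a->relNode != b->relNode)
            return (a->relNode < b->relNode) ? -1 : 1;
        if (a->forkNum != b->forkNum)
            return (a->forkNum < b->forkNum) ? -1 : 1;
        if (a->blockNum != b->blockNum)
            return (a->blockNum < b->blockNum) ? -1 : 1;
        return 0;
    }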
-
Andres Freund authored
Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed.

On Linux it is possible to control this by reducing the global dirty limits significantly, which reduces the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always want this behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2); several POSIX systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on Linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible, this patch does not contain an implementation for Windows.

With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes is controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the appropriate flush distance differs for each, and because the performance considerations of controlled flushing for each of these are different.

A later patch will add checkpoint sorting; after that, flushes from the checkpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random writes, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so does sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund
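On Linux the flush hint boils down to sync_file_range(2) with SYNC_FILE_RANGE_WRITE, which starts writeback for a byte range without blocking and without evicting the pages. The snippet below is a standalone sketch of that system call, not the PostgreSQL wrapper code.

    /* Standalone sketch of the Linux flush hint; not the PostgreSQL wrapper. */
    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>

    /*
     * Ask the kernel to begin writeback of [offset, offset + nbytes) so dirty
     * pages don't pile up until the next fsync(). Unlike
     * posix_fadvise(POSIX_FADV_DONTNEED), the pages stay in the page cache.
     */
    static void
    hint_flush(int fd, off_t offset, off_t nbytes)
    {
    #ifdef SYNC_FILE_RANGE_WRITE
        if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range");
    #else
        (void) fd;
        (void) offset;
        (void) nbytes;                      /* API not available on this platform */
    #endif
    }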
-
- 10 Mar, 2016 2 commits
-
-
Tom Lane authored
All along, this function should have treated WindowFuncs in a manner similar to Aggrefs, ie with an option whether or not to recurse into them. By not considering the case, it was always recursing, which is OK for most callers (although I suspect that the case in prepare_sort_from_pathkeys might represent a bug). But now we need return-without-recursing behavior as well. There are also more than a few callers that should never see a WindowFunc, and now we'll get some error checking on that.
-
Robert Haas authored
Commit a892234f gave us enough infrastructure to avoid vacuuming pages where every tuple on the page is already frozen. So, replace the notion of a scan_all or whole-table vacuum with the less onerous notion of an "aggressive" vacuum, which will scan pages that are all-visible, but still skip those that are all-frozen. This should greatly reduce the cost of anti-wraparound vacuuming on large clusters where the majority of data is never touched between one cycle and the next, because we'll no longer have to read all of those pages only to find out that we don't need to do anything with them. Patch by me, reviewed by Masahiko Sawada.
-