1. 21 Dec, 2016 1 commit
    • Fix strange behavior (and possible crashes) in full text phrase search. · 89fcea1a
      Tom Lane authored
      In an attempt to simplify the tsquery matching engine, the original
      phrase search patch invented rewrite rules that would rearrange a
      tsquery so that no AND/OR/NOT operator appeared below a PHRASE operator.
      But this approach had numerous problems.  The rearrangement step was
      missed by ts_rewrite (and perhaps other places), allowing tsqueries
      to be created that would cause Assert failures or perhaps crashes at
      execution, as reported by Andreas Seltenreich.  The rewrite rules
      effectively defined semantics for operators underneath PHRASE that were
      buggy, or at least unintuitive.  And because rewriting was done in
      tsqueryin() rather than at execution, the rearrangement was user-visible,
      which is not very desirable --- for example, it might cause unexpected
      matches or failures to match in ts_rewrite.
      
      As a somewhat independent problem, the behavior of nested PHRASE operators
      was only sane for left-deep trees; queries like "x <-> (y <-> z)" did not
      behave intuitively at all.
      
      To fix, get rid of the rewrite logic altogether, and instead teach the
      tsquery execution engine to manage AND/OR/NOT below a PHRASE operator
      by explicitly computing the match location(s) and match widths for these
      operators.
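
As a rough illustration of "computing match locations and widths", here is a toy model in Python. It is not the PostgreSQL implementation: the node classes, the `spans` function, and the (first, last) span representation are all hypothetical, and only OR and nested PHRASE are modeled (AND and NOT are left out). It does show why tracking where a sub-match starts and ends makes "x <-> (y <-> z)" behave the same as the left-deep form.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

# A match is a (first_position, last_position) pair, 1-based.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Phrase:
    left: "Node"
    right: "Node"
    distance: int = 1  # "<->" means the right side starts one position later


Node = Union[Word, Or, Phrase]


def spans(node: Node, doc: List[str]) -> Set[Span]:
    """Return every span of doc matched by node (toy semantics)."""
    if isinstance(node, Word):
        return {(i, i) for i, tok in enumerate(doc, start=1) if tok == node.text}
    if isinstance(node, Or):
        # OR matches wherever either side matches.
        return spans(node.left, doc) | spans(node.right, doc)
    if isinstance(node, Phrase):
        left = spans(node.left, doc)
        right = spans(node.right, doc)
        # The right match must start `distance` after the left match ends;
        # the combined match covers both, so its width grows accordingly.
        return {
            (l_start, r_end)
            for (l_start, l_end) in left
            for (r_start, r_end) in right
            if r_start - l_end == node.distance
        }
    raise TypeError(f"unsupported node: {node!r}")


doc = ["x", "y", "z"]
right_deep = Phrase(Word("x"), Phrase(Word("y"), Word("z")))
left_deep = Phrase(Phrase(Word("x"), Word("y")), Word("z"))
print(spans(right_deep, doc), spans(left_deep, doc))
```

In this model both tree shapes yield the same single span covering positions 1 through 3, because each sub-match reports its own extent instead of a single position.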
      
      This requires introducing some additional fields into the publicly visible
      ExecPhraseData struct; but since there's no way for third-party code to
      pass such a struct to TS_phrase_execute, it shouldn't create an ABI problem
      as long as we don't move the offsets of the existing fields.
      
      Another related problem was that index searches supposed that "!x <-> y"
      could be lossily approximated as "!x & y", which isn't correct because
      the latter will reject, say, "x q y" which the query itself accepts.
      This required some tweaking in TS_execute_ternary along with the main
      tsquery engine.
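
The "x q y" counterexample can be checked with a small Python sketch. Both functions are hypothetical toy readings of the two queries, not PostgreSQL code; in particular, treating a "y" at the very start of the document as a match for "!x <-> y" is an assumption of this sketch.

```python
from typing import List


def matches_not_x_then_y(doc: List[str]) -> bool:
    """Toy reading of '!x <-> y': some y is not immediately preceded by x.

    Assumption: a y at the first position counts as "not preceded by x".
    """
    return any(
        tok == "y" and (i == 0 or doc[i - 1] != "x")
        for i, tok in enumerate(doc)
    )


def matches_not_x_and_y(doc: List[str]) -> bool:
    """Toy reading of '!x & y': the document has y and has no x anywhere."""
    return "y" in doc and "x" not in doc


doc = ["x", "q", "y"]
exact = matches_not_x_then_y(doc)    # the y follows q, not x
approx = matches_not_x_and_y(doc)    # an x exists somewhere in the document
print(exact, approx)
```

A lossy index approximation is only safe if it never rejects a document the exact query accepts. Here the exact check accepts the document while the approximation rejects it, which is the inconsistency the commit describes.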
      
      Back-patch to 9.6 where phrase operators were introduced.  While this
      could be argued to change behavior more than we'd like in a stable branch,
      we have to do something about the crash hazards and index-vs-seqscan
      inconsistency, and it doesn't seem desirable to let the unintuitive
      behaviors induced by the rewriting implementation stand as precedent.
      
      Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us
      Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
  2. 16 Dec, 2016 1 commit
    • Improve documentation around TS_execute(). · 23c75b55
      Tom Lane authored
      I got frustrated by the lack of commentary in this area, so here is some
      reverse-engineered documentation, along with minor stylistic cleanup.
      No code changes more significant than removal of unused variables.
      
      Back-patch to 9.6, not because that's useful in itself, but because
      we have some bugs to fix in phrase search and this would cause merge
      failures if it's only in HEAD.
  3. 27 Jun, 2016 1 commit
    • Do not fall back to AND for FTS phrase operator. · 3dbbd0f0
      Teodor Sigaev authored
      If lexemes carry no positional information, the phrase operator no longer
      falls back to the AND operator. This change requires modifying the TS_execute()
      interface, because in some places (in indexes, for example) positional information
      is inaccessible, and in those cases we need to force the fallback to AND.
      
      Per discussion c19fcfec308e6ccd952cdde9e648b505@mail.gmail.com
  4. 09 Jun, 2016 1 commit
  5. 07 Apr, 2016 1 commit
    • Phrase full text search. · bb140506
      Teodor Sigaev authored
      The patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
      The on-disk and binary in/out formats of tsquery are backward compatible.
      It has two side effects:
      - the sort order of tsquery changes, so users who have a btree index over
        tsquery should reindex it
      - tsquery output contains fewer parentheses, which makes it more
        readable
      
      Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
      Reviewers: Alexander Korotkov, Artur Zakirov
  6. 02 Mar, 2016 1 commit
    • Create stub functions to support pg_upgrade of old contrib/tsearch2. · eb43e851
      Tom Lane authored
      Commits 9ff60273 and dbe23289 adjusted the declarations
      of some core functions referenced by contrib/tsearch2's install script,
      forgetting that in a pg_upgrade situation, we'll be trying to restore
      operator class definitions that reference the old signatures.  We've
      hit this problem before; solve it in the same way as before, namely by
      installing stub functions that have the expected signature and just
      invoke the correct function.  Per report from Jeff Janes.
      
      (Someday we ought to stop supporting contrib/tsearch2, but I'm not
      sure today is that day.)
  7. 02 Jan, 2016 1 commit
  8. 06 Jan, 2015 1 commit
  9. 12 Mar, 2014 1 commit
    • Allow opclasses to provide tri-valued GIN consistent functions. · c5608ea2
      Heikki Linnakangas authored
      With the GIN "fast scan" feature, GIN can skip items without fetching all
      the keys for them, if it can prove that they don't match regardless of
      those keys. So far, it has done the proving by calling the boolean
      consistent function with all combinations of TRUE/FALSE for the unfetched
      keys, but since that's O(2^n), it becomes infeasible with more than a few
      keys. We can avoid calling consistent with all the combinations, if we can
      tell the operator class implementation directly which keys are unknown.
      
      This commit includes a triConsistent function for the built-in array and
      tsvector opclasses.
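
The difference between the two proving strategies can be sketched in Python. This is a toy model, not the GIN API: `brute_force`, `tri_and`, `tri_or`, and the example query are all hypothetical, and `None` stands in for an unfetched (unknown) key.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Ternary = Optional[bool]  # True, False, or None meaning "unknown"


def tri_and(a: Ternary, b: Ternary) -> Ternary:
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Ternary, b: Ternary) -> Ternary:
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def brute_force(
    consistent: Callable[[List[bool]], bool], keys: Sequence[Ternary]
) -> Tuple[Ternary, int]:
    """Call a boolean consistent function for every assignment of unknowns.

    Returns the verdict (None if the assignments disagree) and the number
    of calls made, which is 2 ** (number of unknown keys).
    """
    unknown = [i for i, k in enumerate(keys) if k is None]
    results = set()
    calls = 0
    for combo in product([False, True], repeat=len(unknown)):
        trial = list(keys)
        for i, value in zip(unknown, combo):
            trial[i] = value
        results.add(consistent(trial))
        calls += 1
    verdict = results.pop() if len(results) == 1 else None
    return verdict, calls


# Example query: k0 & (k1 | k2), with k1 and k2 not yet fetched.
keys = [False, None, None]
print(brute_force(lambda k: k[0] and (k[1] or k[2]), keys))
print(tri_and(keys[0], tri_or(keys[1], keys[2])))
```

With two unknown keys the brute-force check needs four calls to prove the item cannot match, while the ternary evaluation reaches the same verdict in a single pass over the query.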
      
      Alexander Korotkov, with some changes by me.
  10. 07 Jan, 2014 1 commit
  11. 01 Jan, 2013 1 commit
  12. 24 Jun, 2012 1 commit
    • Replace int2/int4 in C code with int16/int32 · b8b2e3b2
      Peter Eisentraut authored
      The latter was already the dominant use, and it's preferable because
      in C the convention is that intXX means XX bits.  Therefore, allowing
      mixed use of int2, int4, int8, int16, int32 is obviously confusing.
      
      Remove the typedefs for int2 and int4 for now.  They don't seem to be
      widely used outside of the PostgreSQL source tree, and the few uses
      can probably be cleaned up by the time this ships.
  13. 01 Jan, 2012 1 commit
  14. 16 Feb, 2011 1 commit
    • Add backwards-compatible declarations of some core GIN support functions. · 6595dd04
      Tom Lane authored
      These are needed to support reloading dumps of 9.0 installations containing
      contrib/intarray or contrib/tsearch2.  Since not only regular dump/reload
      but binary upgrade would fail, it seems worth the trouble to carry these
      stubs for a while.  Note that the contrib opclasses referencing these
      functions will still work fine, since GIN doesn't actually pay any
      attention to the declared signature of a support function.
  15. 09 Jan, 2011 1 commit
  16. 01 Jan, 2011 1 commit
  17. 20 Sep, 2010 1 commit
  18. 02 Jan, 2010 1 commit
  19. 16 Jul, 2009 1 commit
    • Make backend header files C++ safe · de160e2c
      Peter Eisentraut authored
      This alters various incidental uses of C++ key words to use other similar
      identifiers, so that a C++ compiler won't choke outright.  You still
      (probably) need extern "C" { }; around the inclusion of backend headers.
      
      based on a patch by Kurt Harriman <harriman@acm.org>
      
      Also add a script cpluspluscheck to check for C++ compatibility in the
      future.  As of right now, this passes without error for me.
  20. 11 Jun, 2009 1 commit
  21. 01 Jan, 2009 1 commit
  22. 16 May, 2008 1 commit
  23. 21 Apr, 2008 1 commit
    • Allow float8, int8, and related datatypes to be passed by value on machines where Datum is 8 bytes wide. · 8472bf7a
      Tom Lane authored
      Since this will break old-style C functions
      (those still using version 0 calling convention) that have arguments or
      results of these types, provide a configure option to disable it and retain
      the old pass-by-reference behavior.  Likewise, provide a configure option
      to disable the recently-committed float4 pass-by-value change.
      
      Zoltan Boszormenyi, plus configurability stuff by me.
  24. 25 Mar, 2008 1 commit
    • Simplify and standardize conversions between TEXT datums and ordinary C strings. · 220db7cc
      Tom Lane authored
      This patch introduces four support functions cstring_to_text,
      cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
      two macros CStringGetTextDatum and TextDatumGetCString.  A number of
      existing macros that provided variants on these themes were removed.
      
      Most of the places that need to make such conversions now require just one
      function or macro call, in place of the multiple notational layers that used
      to be needed.  There are no longer any direct calls of textout or textin,
      and we got most of the places that were using handmade conversions via
      memcpy (there may be a few still lurking, though).
      
      This commit doesn't make any serious effort to eliminate transient memory
      leaks caused by detoasting toasted text objects before they reach
      text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
      places where it was easy, but much more could be done.
      
      Brendan Jurd and Tom Lane
  25. 01 Jan, 2008 1 commit
  26. 28 Nov, 2007 1 commit
  27. 15 Nov, 2007 2 commits
  28. 13 Nov, 2007 1 commit
  29. 23 Oct, 2007 1 commit
  30. 21 Oct, 2007 1 commit
  31. 19 Oct, 2007 1 commit
  32. 10 Sep, 2007 1 commit
  33. 07 Sep, 2007 1 commit
    • Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> with small edits by me · e5be8998
      Teodor Sigaev authored

      - Broke the QueryItem struct into QueryOperator and QueryOperand.
        Type was really the only common field between them. QueryItem still
        exists, and is used in the TSQuery struct as before, but it's now a
        union of the two. Many other changes fell from that, like separation
        of pushval_asis function into pushValue, pushOperator and pushStop.
      
      - Moved some structs that were for internal use only from header files
        to the right .c-files.
      
      - Moved tsvector parser to a new tsvector_parser.c file. Parser code was
        about half of the size of tsvector.c, it's also used from tsquery.c, and
        it has some data structures of its own, so it seems better to separate
        it. Cleaned up the API so that TSVectorParserState is not accessed from
        outside tsvector_parser.c.
      
      - Separated the enumerations (#defines, really) used for the QueryItem.type
        field from those used as return codes from gettoken_query. Sharing them
        was just accidental code sharing.
      
      - Removed ParseQueryNode struct used internally by makepol and friends.
        push*-functions now construct QueryItems directly.
      
      - Changed int4 variables to just ints for variables like "i" or "array
        size", where the storage-size was not significant.
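
The QueryItem change in the first bullet can be pictured as a tagged union whose variants share only a leading type field. The sketch below uses Python's ctypes to model that layout. The field lists and the QI_VAL / QI_OPR values are simplified assumptions for illustration, not the actual PostgreSQL declarations.

```python
import ctypes

# Hypothetical tag values for the shared "type" field.
QI_VAL = 1  # item is an operand (a lexeme)
QI_OPR = 2  # item is an operator


class QueryOperand(ctypes.Structure):
    # Simplified, assumed fields; only "type" is shared with QueryOperator.
    _fields_ = [
        ("type", ctypes.c_int8),
        ("weight", ctypes.c_uint8),
        ("valcrc", ctypes.c_int32),
    ]


class QueryOperator(ctypes.Structure):
    # Simplified, assumed fields.
    _fields_ = [
        ("type", ctypes.c_int8),
        ("oper", ctypes.c_int8),
        ("left", ctypes.c_uint32),
    ]


class QueryItem(ctypes.Union):
    # All three members start at offset 0, so "type" can be read
    # before deciding which variant is stored.
    _fields_ = [
        ("type", ctypes.c_int8),
        ("qoperator", QueryOperator),
        ("qoperand", QueryOperand),
    ]


def describe(item: QueryItem) -> str:
    """Dispatch on the shared type tag, then read the matching variant."""
    if item.type == QI_OPR:
        return f"operator {chr(item.qoperator.oper)}"
    if item.type == QI_VAL:
        return f"operand weight={item.qoperand.weight}"
    return "unknown"


op_item = QueryItem(qoperator=QueryOperator(type=QI_OPR, oper=ord("&"), left=1))
val_item = QueryItem(qoperand=QueryOperand(type=QI_VAL, weight=3, valcrc=0))
print(describe(op_item), "|", describe(val_item))
```

Code that walks an array of items reads the tag first and only then touches the operator or operand fields, which is what lets the two structs diverge in everything except the type.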
  34. 25 Aug, 2007 1 commit
    • Cleanup for some problems in tsearch patch: · 7351b5fa
      Tom Lane authored
      - ispell initialization crashed on empty dictionary file
      - ispell initialization crashed on affix file with prefixes but no suffixes
      - stop words file was run through pg_verify_mbstr, with database
        encoding, but it's supposed to be UTF-8; similar bug for synonym files
      - bunch of comments added, typos fixed, and other cleanup
      
      Introduced consistent encoding checking/conversion of data read from tsearch
      configuration files, by doing this in a single t_readline() subroutine
      (replacing direct usages of fgets).  Cleaned up API for readstopwords too.
      
      Heikki Linnakangas
  35. 21 Aug, 2007 1 commit