• Tom Lane's avatar
    First cut at making useful selectivity estimates for range queries · 0dbffa70
    Tom Lane authored
    (ie, WHERE x > lowbound AND x < highbound).  It's not very bright yet
    but it does something useful.  Also, rename intltsel/intgtsel to
    scalarltsel/scalargtsel to reflect usage better.  Extend convert_to_scalar
    to do something a little bit useful with string data types.  Still need
    to make it do something with date/time datatypes, but I'll wait for
    Thomas's datetime unification dust to settle first.  Eventually the
    routine ought not have any type-specific knowledge at all; it ought to
    be calling a type-dependent routine found via a pg_type column; but
    that's a task for another day.
    0dbffa70
xindex.sgml 21.3 KB
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.7 2000/01/24 07:16:49 tgl Exp $
Postgres documentation
-->

 <chapter id="xindex">
  <title>Interfacing Extensions To Indices</title>

  <para>
   The procedures described thus far let you define a new type, new
   functions and new operators.  However, we cannot yet define a secondary
   index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
   hash access method) over a new type or its operators.
  </para>

  <para>
   Look back at
   <xref endterm="EXTEND-CATALOGS" linkend="EXTEND-CATALOGS">.
   The right half shows the  catalogs  that we must modify in order to tell
   <productname>Postgres</productname> how to use a user-defined type and/or
   user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
    pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
   Unfortunately, there is no simple command to do this.  We will demonstrate
   how to modify these catalogs through a running example:  a  new  operator
   class for the <acronym>B-tree</acronym> access method that stores and
   sorts complex numbers in ascending absolute value order.
  </para>

  <para>
   The <filename>pg_am</filename> class contains one instance for every user
   defined access method.  Support for the heap access method is built into
   <productname>Postgres</productname>, but every other access method is
   described here.  The schema is

   <table tocentry="1">
    <title>Index Schema</title>
    <titleabbrev>Indices</titleabbrev>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Attribute</entry>
       <entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>amname</entry>
       <entry>name of the access method</entry>
      </row>
      <row>
       <entry>amowner</entry>
       <entry>object id of the owner's instance in pg_user</entry>
      </row>
      <row>
       <entry>amkind</entry>
       <entry>not used at present, but set to 'o' as a place holder</entry>
      </row>
      <row>
       <entry>amstrategies</entry>
       <entry>number of strategies for this access method (see below)</entry>
      </row>
      <row>
       <entry>amsupport</entry>
       <entry>number of support routines for this access method (see below)</entry>
      </row>
      <row>
       <entry>amgettuple</entry>
      </row>
      <row>
       <entry>aminsert</entry>
      </row>
      <row>
       <entry>...</entry>
       <entry>procedure  identifiers  for  interface routines to the access
	method.  For example, regproc ids for opening,  closing,  and
	getting instances from the access method appear here.</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>

  <para>
   The <acronym>object ID</acronym> of the instance in
   <filename>pg_am</filename> is used as a foreign key in lots of other
   classes.  You  don't  need to  add a new instance to this class; all
   you're interested in is the <acronym>object ID</acronym> of the access
   method instance you want to extend:

   <programlisting>
SELECT oid FROM pg_am WHERE amname = 'btree';

         +----+
         |oid |
         +----+
         |403 |
         +----+
   </programlisting>

   We will use that <command>SELECT</command> in a <command>WHERE</command>
   clause later.
  </para>

  <para>
   The <filename>amstrategies</filename> attribute exists to standardize
   comparisons across data types.  For example, <acronym>B-tree</acronym>s
   impose a strict ordering on keys, lesser to greater.  Since
   <productname>Postgres</productname> allows the user to define operators,
   <productname>Postgres</productname> cannot look at the name of an operator
   (eg, ">" or "<") and tell what kind of comparison it is.  In fact,
   some  access methods don't impose any ordering at all.  For example,
   <acronym>R-tree</acronym>s express a rectangle-containment relationship,
   whereas a hashed data structure expresses only bitwise similarity based
   on the value of a hash function.  <productname>Postgres</productname>
   needs some consistent way of taking a qualification in your query,
   looking at the operator and then deciding if a usable index exists.  This
   implies that <productname>Postgres</productname> needs to know, for
   example, that the  "<="  and  ">" operators partition a
   <acronym>B-tree</acronym>.  <productname>Postgres</productname>
   uses strategies to express these relationships  between
   operators and the way they can be used to scan indices.
  </para>

  <para>
   Defining a new set of strategies is beyond the scope of this discussion,
   but we'll explain how <acronym>B-tree</acronym> strategies work because
   you'll need to know that to add a new operator class. In the
   <filename>pg_am</filename> class, the amstrategies attribute is the
   number of strategies defined for this access method. For
   <acronym>B-tree</acronym>s, this number is 5.  These strategies
   correspond to

   <table tocentry="1">
    <title>B-tree Strategies</title>
    <titleabbrev>B-tree</titleabbrev>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Operation</entry>
       <entry>Index</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>less than</entry>
       <entry>1</entry>
      </row>
      <row>
       <entry>less than or equal</entry>
       <entry>2</entry>
      </row>
      <row>
       <entry>equal</entry>
       <entry>3</entry>
      </row>
      <row>
       <entry>greater than or equal</entry>
       <entry>4</entry>
      </row>
      <row>
       <entry>greater than</entry>
       <entry>5</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>

  <para>
   The idea is that you'll need to add procedures corresponding to the
   comparisons above to the <filename>pg_amop</filename> relation (see below).
   The access method code can use these strategy numbers, regardless of data
   type, to figure out how to partition the <acronym>B-tree</acronym>,
   compute selectivity, and so on.  Don't worry about the details of adding
   procedures yet; just understand that there must be a set of these
   procedures for <filename>int2, int4, oid,</filename> and every other
   data type on which a <acronym>B-tree</acronym> can operate.
  </para>

  <para>
   Sometimes, strategies aren't enough information for the system to figure
   out how to use an index.  Some access methods require other support
   routines in order to work. For example, the <acronym>B-tree</acronym>
   access method must be able to compare two keys and determine whether one
   is greater than, equal to, or less than the other.  Similarly, the
   <acronym>R-tree</acronym> access method must be able to compute
   intersections,  unions, and sizes of rectangles.  These
   operations do not correspond to user qualifications in
   SQL queries;  they are administrative routines used by
   the access methods, internally.
  </para>

  <para>
   In order to manage diverse support routines consistently across all
   <productname>Postgres</productname> access methods,
   <filename>pg_am</filename> includes an attribute called
   <filename>amsupport</filename>.  This attribute records the number of
   support routines used by an access method.  For <acronym>B-tree</acronym>s,
   this number is one -- the routine to take two keys and return -1, 0, or
   +1, depending on whether the first key is less than, equal
   to, or greater than the second.

   <note>
    <para>
     Strictly  speaking, this routine can return a negative
     number (< 0), 0, or a non-zero positive number (> 0).
    </para>
   </note>
  </para>

  <para>
   The <filename>amstrategies</filename> entry in pg_am is just the number
   of strategies defined for the access method in question.  The procedures
   for less than, less equal, and so on don't appear in
   <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
   is just the number of support routines required by  the  access
   method.  The actual routines are listed elsewhere.
  </para>

  <para>
   The next class of interest is pg_opclass.  This class exists only to
   associate a name and default type with an oid.  In pg_amop, every
   <acronym>B-tree</acronym> operator class has a set of procedures, one
   through five, above. Some existing opclasses are <filename>int2_ops,
    int4_ops, and oid_ops</filename>.  You need to add an instance with your
   opclass name (for example, <filename>complex_abs_ops</filename>) to
   <filename>pg_opclass</filename>.  The <filename>oid</filename> of
   this instance is a foreign key in other classes.

   <programlisting>
INSERT INTO pg_opclass (opcname, opcdeftype)
    SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex_abs';

SELECT oid, opcname, opcdeftype
    FROM pg_opclass
    WHERE opcname = 'complex_abs_ops';

         +------+-----------------+------------+
         |oid   | opcname         | opcdeftype |
         +------+-----------------+------------+
         |17314 | complex_abs_ops |      29058 |
         +------+-----------------+------------+
   </programlisting>

   Note that the oid for your <filename>pg_opclass</filename> instance will
   be different!  Don't worry about this though.  We'll get this number
   from the system later just like we got the oid of the type here.
  </para>

  <para>
   So now we have an access method and an operator  class.
   We  still  need  a  set of operators; the procedure for
   defining operators was discussed earlier in  this  manual.
   For  the  complex_abs_ops  operator  class on Btrees,
   the operators we require are:

   <programlisting>
        absolute value less-than
        absolute value less-than-or-equal
        absolute value equal
        absolute value greater-than-or-equal
        absolute value greater-than
   </programlisting>
  </para>

  <para>
   Suppose the code that implements the functions  defined
   is stored in the file
   <filename>PGROOT/src/tutorial/complex.c</filename>
  </para>

  <para>
   Part of the code look like this: (note that we will only show the
   equality operator for the rest of the examples.  The other four
   operators are very similar.  Refer to <filename>complex.c</filename>
   or <filename>complex.source</filename> for the details.)

   <programlisting>
#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)

         bool
         complex_abs_eq(Complex *a, Complex *b)
         {
             double amag = Mag(a), bmag = Mag(b);
             return (amag==bmag);
         }
   </programlisting>
  </para>

  <para>
   There are a couple of important things that are happening below.
  </para>

  <para>
   First, note that operators for less-than, less-than-or equal, equal,
   greater-than-or-equal, and greater-than for <filename>int4</filename>
   are being defined.  All of these operators are already defined for
   <filename>int4</filename> under the names &lt;, &lt;=, =, &gt;=,
   and &gt;. The new operators behave differently, of course.  In order
   to guarantee that <productname>Postgres</productname> uses these
   new operators rather than the old ones, they need to be named differently
   from the old ones.  This is a key point: you can overload operators in
   <productname>Postgres</productname>, but only if the operator isn't
   already defined for the argument types.  That is, if you have &lt;
   defined for (int4, int4), you can't define it again.
   <productname>Postgres</productname> does not check this when you define
   your operator, so be careful.  To avoid this problem, odd names will be
   used for the operators.  If you get this wrong, the access methods
   are likely to crash when you try to do scans.
  </para>

  <para>
   The other important point is that all the operator functions return
   Boolean values.  The access methods rely on this fact.  (On the other
   hand, the support function returns whatever the particular access method
   expects -- in this case, a signed integer.) The final routine in the
   file is the "support routine" mentioned when we discussed the amsupport
   attribute of the <filename>pg_am</filename> class.  We will use this
   later on.  For now, ignore it.
  </para>

  <para>
   <programlisting>
CREATE FUNCTION complex_abs_eq(complex_abs, complex_abs)
              RETURNS bool
              AS 'PGROOT/tutorial/obj/complex.so'
              LANGUAGE 'c';
   </programlisting>
  </para>

  <para>
   Now define the operators that use them.  As noted, the operator names
   must be unique among all operators that take two <filename>int4</filename>
   operands.  In order to see if the operator names listed below are taken,
   we can do a query  on <filename>pg_operator</filename>:

   <programlisting>
    /*
     * this query uses the regular expression operator (~)
     * to find three-character operator names that end in
     * the character &amp;
     */
    SELECT *
     FROM pg_operator
     WHERE oprname ~ '^..&amp;$'::text;
   </programlisting>

  </para>

  <para>
   to see if your name is taken for the types you want.  The important
   things here are the procedure (which are the <acronym>C</acronym>
   functions defined above) and the restriction and join selectivity
   functions.  You should just use the ones used below--note that there
   are different such functions for the less-than, equal, and greater-than
   cases.  These must be supplied, or the access method will crash when it
   tries to use the operator.  You should copy the names for restrict and
   join, but use the procedure names you defined in the last step.

   <programlisting>
CREATE OPERATOR = (
     leftarg = complex_abs, rightarg = complex_abs,
     procedure = complex_abs_eq,
     restrict = eqsel, join = eqjoinsel
         )
   </programlisting>
  </para>

  <para>
   Notice that five operators corresponding to less,  less equal, equal,
   greater, and greater equal are defined.
  </para>

  <para>
   We're just about finished. the last thing we need to do is to update
   the <filename>pg_amop</filename> relation.  To do this, we need the
   following attributes:

   <table tocentry="1">
    <title><filename>pg_amproc</filename> Schema</title>
    <titleabbrev><filename>pg_amproc</filename></titleabbrev>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Attribute</entry>
       <entry>Description</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>amopid</entry>
       <entry>the <filename>oid</filename> of the <filename>pg_am</filename> instance
	for  B-tree (== 403, see above)</entry>
      </row>
      <row>
       <entry>amopclaid</entry>
       <entry>the <filename>oid</filename> of the
	<filename>pg_opclass</filename>  instance for <filename>complex_abs_ops</filename>
	(== whatever you got instead  of <filename>17314</filename>, see above)</entry>
      </row>
      <row>
       <entry>amopopr</entry>
       <entry>the <filename>oid</filename>s of the  operators  for the opclass
	(which we'll get in just a minute)</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
  </para>

  <para>
   So we need the <filename>oid</filename>s of the operators we just
   defined.  We'll look up the names of all the operators that take
   two <filename>complex</filename>es, and pick ours out:
   
   <programlisting>
    SELECT o.oid AS opoid, o.oprname
     INTO TABLE complex_ops_tmp
     FROM pg_operator o, pg_type t
     WHERE o.oprleft = t.oid and o.oprright = t.oid
      and t.typname = 'complex_abs';

         +------+---------+
         |oid   | oprname |
         +------+---------+
         |17321 | &lt;       |
         +------+---------+
         |17322 | &lt;=      |
         +------+---------+
         |17323 |  =      |
         +------+---------+
         |17324 | &gt;=      |
         +------+---------+
         |17325 | &gt;       |
         +------+---------+
   </programlisting>

   (Again, some of your <filename>oid</filename> numbers will almost
   certainly be different.)  The operators we are interested in are those
   with <filename>oid</filename>s 17321 through 17325.  The values you
   get will probably be different, and you should substitute them for the
   values below.  We will do this with a select statement.
  </para>

  <para>
   Now we're ready to update <filename>pg_amop</filename> with our new
   operator class.  The most important thing in this entire discussion
   is that the operators are ordered, from less equal through greater
   equal, in <filename>pg_amop</filename>.  We add the instances we need:

   <programlisting>
    INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
        SELECT am.oid, opcl.oid, c.opoid, 1
        FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
        WHERE amname = 'btree' AND
            opcname = 'complex_abs_ops' AND
            c.oprname = '<';
   </programlisting>

   Now do this for the other operators substituting for the "1" in the
   third line above and the "<" in the last line.  Note the order:
   "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
   than or equal" is 4, and "greater than" is 5.
  </para>

  <para>
   The next step is registration of the "support routine" previously
   described in our discussion of <filename>pg_am</filename>.  The
   <filename>oid</filename> of this support routine is stored in the
   <filename>pg_amproc</filename> class, keyed by the access method
   <filename>oid</filename> and the operator class <filename>oid</filename>.
   First, we need to register the function in
   <productname>Postgres</productname> (recall that we put the
   <acronym>C</acronym> code that implements this routine in the bottom of
   the file in which we implemented the operator routines):

   <programlisting>
    CREATE FUNCTION complex_abs_cmp(complex, complex)
     RETURNS int4
     AS 'PGROOT/tutorial/obj/complex.so'
     LANGUAGE 'c';

    SELECT oid, proname FROM pg_proc
     WHERE proname = 'complex_abs_cmp';

         +------+-----------------+
         |oid   | proname         |
         +------+-----------------+
         |17328 | complex_abs_cmp |
         +------+-----------------+
   </programlisting>

   (Again, your <filename>oid</filename> number will probably be different
   and you should substitute the value you see for the value below.)
   We can add the new instance as follows:

   <programlisting>
    INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
        SELECT a.oid, b.oid, c.oid, 1
            FROM pg_am a, pg_opclass b, pg_proc c
            WHERE a.amname = 'btree' AND
                b.opcname = 'complex_abs_ops' AND
                c.proname = 'complex_abs_cmp';
   </programlisting>
  </para>

  <para>
   Now we need to add a hashing strategy to allow the type to be indexed.
   We do this by using another type in pg_am but we reuse the same ops.

   <programlisting>
    INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
        SELECT am.oid, opcl.oid, c.opoid, 1
        FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
        WHERE amname = 'hash' AND
            opcname = 'complex_abs_ops' AND
            c.oprname = '=';
   </programlisting>
  </para>

  <para>
   In order to use this index in a where clause, we need to modify the
   <filename>pg_operator</filename> class as follows.

   <programlisting>
    UPDATE pg_operator
        SET oprrest = 'eqsel'::regproc, oprjoin = 'eqjoinsel'
        WHERE oprname = '=' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
        WHERE oprname = '<filename>' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
        WHERE oprname = '<filename>' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'scalarltsel'::regproc, oprjoin = 'scalarltjoinsel'
        WHERE oprname = '<' AND 
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'scalarltsel'::regproc, oprjoin = 'scalarltjoinsel'
        WHERE oprname = '<=' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'scalargtsel'::regproc, oprjoin = 'scalargtjoinsel'
        WHERE oprname = '>' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
    
    UPDATE pg_operator
        SET oprrest = 'scalargtsel'::regproc, oprjoin = 'scalargtjoinsel'
        WHERE oprname = '>=' AND
            oprleft = oprright AND
            oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');</filename></filename>
   </programlisting> 
  </para>

  <para>
   And last (Finally!) we register a description of this type.

   <programlisting>
    INSERT INTO pg_description (objoid, description) 
    SELECT oid, 'Two part G/L account'
	    FROM pg_type WHERE typname = 'complex_abs';
   </programlisting> 
  </para>

 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:"/usr/lib/sgml/catalog"
sgml-local-ecat-files:nil
End:
-->