• Tom Lane's avatar
    Make reformat_dat_file.pl preserve all blank lines. · 2cdf359f
    Tom Lane authored
    In its original form, reformat_dat_file.pl smashed consecutive blank
    lines to a single blank line, which was helpful for mopping up excess
    whitespace during the bootstrap data format conversion.  But going
    forward, there seems little reason to do that; if developers want to
    put in multiple blank lines, let 'em.  This makes it conform to the
    documentation I (tgl) wrote, too.
    
    In passing, clean up some sloppy markup choices in bki.sgml.
    
    John Naylor
    
    Discussion: https://postgr.es/m/28827.1523039259@sss.pgh.pa.us
    2cdf359f
bki.sgml 36.8 KB
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967
<!-- doc/src/sgml/bki.sgml -->

<chapter id="bki">
 <title>System Catalog Declarations and Initial Contents</title>

 <para>
  <productname>PostgreSQL</productname> uses many different system catalogs
  to keep track of the existence and properties of database objects, such as
  tables and functions.  Physically there is no difference between a system
  catalog and a plain user table, but the backend C code knows the structure
  and properties of each catalog, and can manipulate it directly at a low
  level.  Thus, for example, it is inadvisable to attempt to alter the
  structure of a catalog on-the-fly; that would break assumptions built into
  the C code about how rows of the catalog are laid out.  But the structure
  of the catalogs can change between major versions.
 </para>

 <para>
  The structures of the catalogs are declared in specially formatted C
  header files in the <filename>src/include/catalog/</filename> directory of
  the source tree.  In particular, for each catalog there is a header file
  named after the catalog (e.g., <filename>pg_class.h</filename>
  for <structname>pg_class</structname>), which defines the set of columns
  the catalog has, as well as some other basic properties such as its OID.
  Other critical files defining the catalog structure
  include <filename>indexing.h</filename>, which defines the indexes present
  on all the system catalogs, and <filename>toasting.h</filename>, which
  defines TOAST tables for catalogs that need one.
 </para>

 <para>
  Many of the catalogs have initial data that must be loaded into them
  during the <quote>bootstrap</quote> phase
  of <application>initdb</application>, to bring the system up to a point
  where it is capable of executing SQL commands.  (For
  example, <filename>pg_class.h</filename> must contain an entry for itself,
  as well as one for each other system catalog and index.)  This
  initial data is kept in editable form in data files that are also stored
  in the <filename>src/include/catalog/</filename> directory.  For example,
  <filename>pg_proc.dat</filename> describes all the initial rows that must
  be inserted into the <structname>pg_proc</structname> catalog.
 </para>

 <para>
  To create the catalog files and load this initial data into them, a
  backend running in bootstrap mode reads a <acronym>BKI</acronym>
  (Backend Interface) file containing commands and initial data.
  The <filename>postgres.bki</filename> file used in this mode is prepared
  from the aforementioned header and data files, while building
  a <productname>PostgreSQL</productname> distribution, by a Perl script
  named <filename>genbki.pl</filename>.
  Although it's specific to a particular <productname>PostgreSQL</productname>
  release, <filename>postgres.bki</filename> is platform-independent and is
  installed in the <filename>share</filename> subdirectory of the
  installation tree.
 </para>

 <para>
  <filename>genbki.pl</filename> also produces a derived header file for
  each catalog, for example <filename>pg_class_d.h</filename> for
  the <structname>pg_class</structname> catalog.  This file contains
  automatically-generated macro definitions, and may contain other macros,
  enum declarations, and so on that can be useful for client C code that
  reads a particular catalog.
 </para>

 <para>
  Most Postgres developers don't need to be directly concerned with
  the <acronym>BKI</acronym> file, but almost any nontrivial feature
  addition in the backend will require modifying the catalog header files
  and/or initial data files.  The rest of this chapter gives some
  information about that, and for completeness describes
  the <acronym>BKI</acronym> file format.
 </para>

 <sect1 id="system-catalog-declarations">
  <title>System Catalog Declaration Rules</title>

  <para>
   The key part of a catalog header file is a C structure definition
   describing the layout of each row of the catalog.  This begins with
   a <literal>CATALOG</literal> macro, which so far as the C compiler is
   concerned is just shorthand for <literal>typedef struct
   FormData_<replaceable>catalogname</replaceable></literal>.
   Each field in the struct gives rise to a catalog column.
   Fields can be annotated using the BKI property macros described
   in <filename>genbki.h</filename>, for example to define a default value
   for a field or mark it as nullable or not nullable.
   The <literal>CATALOG</literal> line can also be annotated, with some
   other BKI property macros described in <filename>genbki.h</filename>, to
   define other properties of the catalog as a whole, such as whether
   it has OIDs (by default, it does).
  </para>

  <para>
   The system catalog cache code (and most catalog-munging code in general)
   assumes that the fixed-length portions of all system catalog tuples are
   in fact present, because it maps this C struct declaration onto them.
   Thus, all variable-length fields and nullable fields must be placed at
   the end, and they cannot be accessed as struct fields.
   For example, if you tried to
   set <structname>pg_type</structname>.<structfield>typrelid</structfield>
   to be NULL, it would fail when some piece of code tried to reference
   <literal>typetup-&gt;typrelid</literal> (or worse,
   <literal>typetup-&gt;typelem</literal>, because that follows
   <structfield>typrelid</structfield>).  This would result in
   random errors or even segmentation violations.
  </para>

  <para>
   As a partial guard against this type of error, variable-length or
   nullable fields should not be made directly visible to the C compiler.
   This is accomplished by wrapping them in <literal>#ifdef
   CATALOG_VARLEN</literal> ... <literal>#endif</literal> (where
   <literal>CATALOG_VARLEN</literal> is a symbol that is never defined).
   This prevents C code from carelessly trying to access fields that might
   not be there or might be at some other offset.
   As an independent guard against creating incorrect rows, we
   require all columns that should be non-nullable to be marked so
   in <structname>pg_attribute</structname>.  The bootstrap code will
   automatically mark catalog columns as <literal>NOT NULL</literal>
   if they are fixed-width and are not preceded by any nullable column.
   Where this rule is inadequate, you can force correct marking by using
   <literal>BKI_FORCE_NOT_NULL</literal>
   and <literal>BKI_FORCE_NULL</literal> annotations as needed.  But note
   that <literal>NOT NULL</literal> constraints are only enforced in the
   executor, not against tuples that are generated by random C code,
   so care is still needed when manually creating or updating catalog rows.
  </para>

  <para>
   Frontend code should not include any <filename>pg_xxx.h</filename>
   catalog header file, as these files may contain C code that won't compile
   outside the backend.  (Typically, that happens because these files also
   contain declarations for functions
   in <filename>src/backend/catalog/</filename> files.)
   Instead, frontend code may include the corresponding
   generated <filename>pg_xxx_d.h</filename> header, which will contain
   OID <literal>#define</literal>s and any other data that might be of use
   on the client side.  If you want macros or other code in a catalog header
   to be visible to frontend code, write <literal>#ifdef
   EXPOSE_TO_CLIENT_CODE</literal> ... <literal>#endif</literal> around that
   section to instruct <filename>genbki.pl</filename> to copy that section
   to the <filename>pg_xxx_d.h</filename> header.
  </para>

  <para>
   A few of the catalogs are so fundamental that they can't even be created
   by the <acronym>BKI</acronym> <literal>create</literal> command that's
   used for most catalogs, because that command needs to write information
   into these catalogs to describe the new catalog.  These are
   called <firstterm>bootstrap</firstterm> catalogs, and defining one takes
   a lot of extra work: you have to manually prepare appropriate entries for
   them in the pre-loaded contents of <structname>pg_class</structname>
   and <structname>pg_type</structname>, and those entries will need to be
   updated for subsequent changes to the catalog's structure.
   (Bootstrap catalogs also need pre-loaded entries
   in <structname>pg_attribute</structname>, but
   fortunately <filename>genbki.pl</filename> handles that chore nowadays.)
   Avoid making new catalogs be bootstrap catalogs if at all possible.
  </para>
 </sect1>

 <sect1 id="system-catalog-initial-data">
  <title>System Catalog Initial Data</title>

  <para>
   Each catalog that has any manually-created initial data (some do not)
   has a corresponding <literal>.dat</literal> file that contains its
   initial data in an editable format.
  </para>

  <sect2 id="system-catalog-initial-data-format">
   <title>Data File Format</title>

   <para>
    Each <literal>.dat</literal> file contains Perl data structure literals
    that are simply eval'd to produce an in-memory data structure consisting
    of an array of hash references, one per catalog row.
    A slightly modified excerpt from <filename>pg_database.dat</filename>
    will demonstrate the key features:
   </para>

<programlisting>
[

# LC_COLLATE and LC_CTYPE will be replaced at initdb time with user choices
# that might contain non-word characters, so we must double-quote them.

{ oid =&gt; '1', oid_symbol =&gt; 'TemplateDbOid',
  descr =&gt; 'database\'s default template',
  datname =&gt; 'template1', datdba =&gt; 'PGUID', encoding =&gt; 'ENCODING',
  datcollate =&gt; '"LC_COLLATE"', datctype =&gt; '"LC_CTYPE"', datistemplate =&gt; 't',
  datallowconn =&gt; 't', datconnlimit =&gt; '-1', datlastsysoid =&gt; '0',
  datfrozenxid =&gt; '0', datminmxid =&gt; '1', dattablespace =&gt; '1663',
  datacl =&gt; '_null_' },

]
</programlisting>

   <para>
    Points to note:
   </para>

   <itemizedlist>

    <listitem>
     <para>
      The overall file layout is: open square bracket, one or more sets of
      curly braces each of which represents a catalog row, close square
      bracket.  Write a comma after each closing curly brace.
     </para>
    </listitem>

    <listitem>
     <para>
      Within each catalog row, write comma-separated
      <replaceable>key</replaceable> <literal>=&gt;</literal>
      <replaceable>value</replaceable> pairs.  The
      allowed <replaceable>key</replaceable>s are the names of the catalog's
      columns, plus the metadata keys <literal>oid</literal>,
      <literal>oid_symbol</literal>, and <literal>descr</literal>.
      (The use of <literal>oid</literal> and <literal>oid_symbol</literal>
      is described in <xref linkend="system-catalog-oid-assignment"/>
      below.  <literal>descr</literal> supplies a description string for
      the object, which will be inserted
      into <structname>pg_description</structname>
      or <structname>pg_shdescription</structname> as appropriate.)
      While the metadata keys are optional, the catalog's defined columns
      must all be provided, except when the catalog's <literal>.h</literal>
      file specifies a default value for the column.
     </para>
    </listitem>

    <listitem>
     <para>
      All values must be single-quoted.  Escape single quotes used within
      a value with a backslash.  (Backslashes meant as data need not be
      doubled, however; this follows Perl's rules for simple quoted
      literals.)
     </para>
    </listitem>

    <listitem>
     <para>
      Null values are represented by <literal>_null_</literal>.
     </para>
    </listitem>

    <listitem>
     <para>
      If a value is a macro to be expanded
      by <application>initdb</application>, it should also contain double
      quotes as shown above, unless we know that no special characters can
      appear within the string that will be substituted.
     </para>
    </listitem>

    <listitem>
     <para>
      Comments are preceded by <literal>#</literal>, and must be on their
      own lines.
     </para>
    </listitem>

    <listitem>
     <para>
      To aid readability, field values that are OIDs of other catalog
      entries can be represented by names rather than numeric OIDs.
      This is described in <xref linkend="system-catalog-oid-references"/>
      below.
     </para>
    </listitem>

    <listitem>
     <para>
      Since hashes are unordered data structures, field order and line
      layout aren't semantically significant.  However, to maintain a
      consistent appearance, we set a few rules that are applied by the
      formatting script <filename>reformat_dat_file.pl</filename>:

      <itemizedlist>

       <listitem>
        <para>
         Within each pair of curly braces, the metadata
         fields <literal>oid</literal>, <literal>oid_symbol</literal>,
         and <literal>descr</literal> (if present) come first, in that
         order, then the catalog's own fields appear in their defined order.
        </para>
       </listitem>

       <listitem>
        <para>
         Newlines are inserted between fields as needed to limit line length
         to 80 characters, if possible.  A newline is also inserted between
         the metadata fields and the regular fields.
        </para>
       </listitem>

       <listitem>
        <para>
         If the catalog's <literal>.h</literal> file specifies a default
         value for a column, and a data entry has that same
         value, <filename>reformat_dat_file.pl</filename> will omit it from
         the data file.  This keeps the data representation compact.
        </para>
       </listitem>

       <listitem>
        <para>
         <filename>reformat_dat_file.pl</filename> preserves blank lines
         and comment lines as-is.
        </para>
       </listitem>

      </itemizedlist>

      It's recommended to run <filename>reformat_dat_file.pl</filename>
      before submitting catalog data patches.  For convenience, you can
      simply change to <filename>src/include/catalog/</filename> and
      run <literal>make reformat-dat-files</literal>.
     </para>
    </listitem>

    <listitem>
     <para>
      If you want to add a new method of making the data representation
      smaller, you must implement it
      in <filename>reformat_dat_file.pl</filename> and also
      teach <function>Catalog::ParseData()</function> how to expand the
      data back into the full representation.
     </para>
    </listitem>

   </itemizedlist>
  </sect2>

  <sect2 id="system-catalog-oid-assignment">
   <title>OID Assignment</title>

   <para>
    A catalog row appearing in the initial data can be given a
    manually-assigned OID by writing an <literal>oid
    =&gt; <replaceable>nnnn</replaceable></literal> metadata field.
    Furthermore, if an OID is assigned, a C macro for that OID can be
    created by writing an <literal>oid_symbol
    =&gt; <replaceable>name</replaceable></literal> metadata field.
   </para>

   <para>
    Pre-loaded catalog rows must have preassigned OIDs if there are OID
    references to them in other pre-loaded rows.  A preassigned OID is
    also needed if the row's OID must be referenced from C code.
    If neither case applies, the <literal>oid</literal> metadata field can
    be omitted, in which case the bootstrap code assigns an OID
    automatically, or leaves it zero in a catalog that has no OIDs.
    In practice we usually preassign OIDs for all or none of the pre-loaded
    rows in a given catalog, even if only some of them are actually
    cross-referenced.
   </para>

   <para>
    Writing the actual numeric value of any OID in C code is considered
    very bad form; always use a macro, instead.  Direct references
    to <structname>pg_proc</structname> OIDs are common enough that there's
    a special mechanism to create the necessary macros automatically;
    see <filename>src/backend/utils/Gen_fmgrtab.pl</filename>.  Similarly
    &mdash; but, for historical reasons, not done the same way &mdash;
    there's an automatic method for creating macros
    for <structname>pg_type</structname>
    OIDs.  <literal>oid_symbol</literal> entries are therefore not
    necessary in those two catalogs.  Likewise, macros for
    the <structname>pg_class</structname> OIDs of system catalogs and
    indexes are set up automatically.  For all other system catalogs, you
    have to manually specify any macros you need
    via <literal>oid_symbol</literal> entries.
   </para>

   <para>
    To find an available OID for a new pre-loaded row, run the
    script <filename>src/include/catalog/unused_oids</filename>.
    It prints inclusive ranges of unused OIDs (e.g., the output
    line <quote>45-900</quote> means OIDs 45 through 900 have not been
    allocated yet).  Currently, OIDs 1-9999 are reserved for manual
    assignment; the <filename>unused_oids</filename> script simply looks
    through the catalog headers and <filename>.dat</filename> files
    to see which ones do not appear.  You can also use
    the <filename>duplicate_oids</filename> script to check for mistakes.
    (That script is run automatically at compile time, and will stop the
    build if a duplicate is found.)
   </para>

   <para>
    The OID counter starts at 10000 at the beginning of a bootstrap run.
    If a catalog row is in a table that requires OIDs, but no OID was
    preassigned by an <literal>oid</literal> field, then it will
    receive an OID of 10000 or above.
   </para>
  </sect2>

  <sect2 id="system-catalog-oid-references">
   <title>OID Reference Lookup</title>

   <para>
    Cross-references from one initial catalog row to another can be written
    by just writing the preassigned OID of the referenced row.  But
    that's error-prone and hard to understand, so for frequently-referenced
    catalogs, <filename>genbki.pl</filename> provides mechanisms to write
    symbolic references instead.  Currently this is possible for references
    to access methods, functions, operators, opclasses, opfamilies, and
    types.  The rules are as follows:
   </para>

   <itemizedlist>

    <listitem>
     <para>
      Use of symbolic references is enabled in a particular catalog column
      by attaching <literal>BKI_LOOKUP(<replaceable>lookuprule</replaceable>)</literal>
      to the column's definition, where <replaceable>lookuprule</replaceable>
      is <literal>pg_am</literal>, <literal>pg_proc</literal>,
      <literal>pg_operator</literal>, <literal>pg_opclass</literal>,
      <literal>pg_opfamily</literal>, or <literal>pg_type</literal>.
      <literal>BKI_LOOKUP</literal> can be attached to columns of
      type <type>Oid</type>, <type>regproc</type>, <type>oidvector</type>,
      or <type>Oid[]</type>; in the latter two cases it implies performing a
      lookup on each element of the array.
     </para>
    </listitem>

    <listitem>
     <para>
      In such a column, all entries must use the symbolic format except
      when writing <literal>0</literal> for InvalidOid.  (If the column is
      declared <type>regproc</type>, you can optionally
      write <literal>-</literal> instead of <literal>0</literal>.)
      <filename>genbki.pl</filename> will warn about unrecognized names.
     </para>
    </listitem>

    <listitem>
     <para>
      Access methods are just represented by their names, as are types.
      Type names must match the referenced <structname>pg_type</structname>
      entry's <structfield>typname</structfield>; you do not get to use any
      aliases such as <literal>integer</literal>
      for <literal>int4</literal>.
     </para>
    </listitem>

    <listitem>
     <para>
      A function can be represented by
      its <structfield>proname</structfield>, if that is unique among
      the <filename>pg_proc.dat</filename> entries (this works like regproc
      input).  Otherwise, write it
      as <replaceable>proname(argtypename,argtypename,...)</replaceable>,
      like regprocedure.  The argument type names must be spelled exactly as
      they are in the <filename>pg_proc.dat</filename> entry's
      <structfield>proargtypes</structfield> field.  Do not insert any
      spaces.
     </para>
    </listitem>

    <listitem>
     <para>
      Operators are represented
      by <replaceable>oprname(lefttype,righttype)</replaceable>,
      writing the type names exactly as they appear in
      the <filename>pg_operator.dat</filename>
      entry's <structfield>oprleft</structfield>
      and <structfield>oprright</structfield> fields.
      (Write <literal>0</literal> for the omitted operand of a unary
      operator.)
     </para>
    </listitem>

    <listitem>
     <para>
      The names of opclasses and opfamilies are only unique within an
      access method, so they are represented
      by <replaceable>access_method_name</replaceable><literal>/</literal><replaceable>object_name</replaceable>.
     </para>
    </listitem>

    <listitem>
     <para>
      In none of these cases is there any provision for
      schema-qualification; all objects created during bootstrap are
      expected to be in the pg_catalog schema.
     </para>
    </listitem>
   </itemizedlist>

   <para>
    <filename>genbki.pl</filename> resolves all symbolic references while it
    runs, and puts simple numeric OIDs into the emitted BKI file.  There is
    therefore no need for the bootstrap backend to deal with symbolic
    references.
   </para>
  </sect2>

  <sect2 id="system-catalog-recipes">
   <title>Recipes for Editing Data Files</title>

   <para>
    Here are some suggestions about the easiest ways to perform common tasks
    when updating catalog data files.
   </para>

   <formalpara>
    <title>Add a new column with a default to a catalog:</title>
    <para>
     Add the column to the header file with
     a <literal>BKI_DEFAULT(<replaceable>value</replaceable>)</literal>
     annotation.  The data file need only be adjusted by adding the field
     in existing rows where a non-default value is needed.
    </para>
   </formalpara>

   <formalpara>
    <title>Add a default value to an existing column that doesn't have
     one:</title>
    <para>
     Add a <literal>BKI_DEFAULT</literal> annotation to the header file,
     then run <literal>make reformat-dat-files</literal> to remove
     now-redundant field entries.
    </para>
   </formalpara>

   <formalpara>
    <title>Remove a column, whether it has a default or not:</title>
    <para>
     Remove the column from the header, then run <literal>make
     reformat-dat-files</literal> to remove now-useless field entries.
    </para>
   </formalpara>

   <formalpara>
    <title>Change or remove an existing default value:</title>
    <para>
     You cannot simply change the header file, since that will cause the
     current data to be interpreted incorrectly.  First run <literal>make
     expand-dat-files</literal> to rewrite the data files with all
     default values inserted explicitly, then change or remove
     the <literal>BKI_DEFAULT</literal> annotation, then run <literal>make
     reformat-dat-files</literal> to remove superfluous fields again.
    </para>
   </formalpara>

   <formalpara>
    <title>Ad-hoc bulk editing:</title>
    <para>
     <filename>reformat_dat_file.pl</filename> can be adapted to perform
     many kinds of bulk changes.  Look for its block comments showing where
     one-off code can be inserted.  In the following example, we are going
     to consolidate two boolean fields in <structname>pg_proc</structname>
     into a char field:

     <orderedlist>
      <listitem>
       <para>
        Add the new column, with a default,
        to <filename>pg_proc.h</filename>:
<programlisting>
+    /* see PROKIND_ categories below */
+    char        prokind BKI_DEFAULT(f);
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Create a new script based on <filename>reformat_dat_file.pl</filename>
        to insert appropriate values on-the-fly:
<programlisting>
-           # At this point we have the full row in memory as a hash
-           # and can do any operations we want. As written, it only
-           # removes default values, but this script can be adapted to
-           # do one-off bulk-editing.
+           # One-off change to migrate to prokind
+           # Default has already been filled in by now, so change to other
+           # values as appropriate
+           if ($values{proisagg} eq 't')
+           {
+               $values{prokind} = 'a';
+           }
+           elsif ($values{proiswindow} eq 't')
+           {
+               $values{prokind} = 'w';
+           }
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Run the new script:
<programlisting>
$ cd src/include/catalog
$ perl -I ../../backend/catalog  rewrite_dat_with_prokind.pl  pg_proc.dat
</programlisting>
        At this point <filename>pg_proc.dat</filename> has all three
        columns, <structfield>prokind</structfield>,
        <structfield>proisagg</structfield>,
        and <structfield>proiswindow</structfield>, though they will appear
        only in rows where they have non-default values.
       </para>
      </listitem>

      <listitem>
       <para>
        Remove the old columns from <filename>pg_proc.h</filename>:
<programlisting>
-    /* is it an aggregate? */
-    bool        proisagg BKI_DEFAULT(f);
-
-    /* is it a window function? */
-    bool        proiswindow BKI_DEFAULT(f);
</programlisting>
       </para>
      </listitem>

      <listitem>
       <para>
        Finally, run <literal>make reformat-dat-files</literal> to remove
        the useless old entries from <filename>pg_proc.dat</filename>.
       </para>
      </listitem>
     </orderedlist>

     For further examples of scripts used for bulk editing, see
     <filename>convert_oid2name.pl</filename>
     and <filename>remove_pg_type_oid_symbols.pl</filename> attached to this
     message:
     <ulink url="https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com"></ulink>
    </para>
   </formalpara>
  </sect2>
 </sect1>

 <sect1 id="bki-format">
  <title><acronym>BKI</acronym> File Format</title>

  <para>
   This section describes how the <productname>PostgreSQL</productname>
   backend interprets <acronym>BKI</acronym> files.  This description
   will be easier to understand if the <filename>postgres.bki</filename>
   file is at hand as an example.
  </para>

  <para>
   <acronym>BKI</acronym> input consists of a sequence of commands.  Commands are made up
   of a number of tokens, depending on the syntax of the command.
   Tokens are usually separated by whitespace, but need not be if
   there is no ambiguity.  There is no special command separator; the
   next token that syntactically cannot belong to the preceding
   command starts a new one.  (Usually you would put a new command on
   a new line, for clarity.)  Tokens can be certain key words, special
   characters (parentheses, commas, etc.), numbers, or double-quoted
   strings.  Everything is case sensitive.
  </para>

  <para>
   Lines starting with <literal>#</literal> are ignored.
  </para>

 </sect1>

 <sect1 id="bki-commands">
  <title><acronym>BKI</acronym> Commands</title>

  <variablelist>
   <varlistentry>
    <term>
     <literal>create</literal>
     <replaceable class="parameter">tablename</replaceable>
     <replaceable class="parameter">tableoid</replaceable>
     <optional><literal>bootstrap</literal></optional>
     <optional><literal>shared_relation</literal></optional>
     <optional><literal>without_oids</literal></optional>
     <optional><literal>rowtype_oid</literal> <replaceable>oid</replaceable></optional>
     (<replaceable class="parameter">name1</replaceable> =
     <replaceable class="parameter">type1</replaceable>
     <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional> <optional>,
     <replaceable class="parameter">name2</replaceable> =
     <replaceable class="parameter">type2</replaceable>
     <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional>,
     ...</optional>)
    </term>

    <listitem>
     <para>
      Create a table named <replaceable
      class="parameter">tablename</replaceable>, and having the OID
      <replaceable class="parameter">tableoid</replaceable>,
      with the columns given in parentheses.
     </para>

     <para>
      The following column types are supported directly by
      <filename>bootstrap.c</filename>: <type>bool</type>,
      <type>bytea</type>, <type>char</type> (1 byte),
      <type>name</type>, <type>int2</type>,
      <type>int4</type>, <type>regproc</type>, <type>regclass</type>,
      <type>regtype</type>, <type>text</type>,
      <type>oid</type>, <type>tid</type>, <type>xid</type>,
      <type>cid</type>, <type>int2vector</type>, <type>oidvector</type>,
      <type>_int4</type> (array), <type>_text</type> (array),
      <type>_oid</type> (array), <type>_char</type> (array),
      <type>_aclitem</type> (array).  Although it is possible to create
      tables containing columns of other types, this cannot be done until
      after <structname>pg_type</structname> has been created and filled with
      appropriate entries.  (That effectively means that only these
      column types can be used in bootstrap catalogs, but non-bootstrap
      catalogs can contain any built-in type.)
     </para>

     <para>
      When <literal>bootstrap</literal> is specified,
      the table will only be created on disk; nothing is entered into
      <structname>pg_class</structname>,
      <structname>pg_attribute</structname>, etc, for it.  Thus the
      table will not be accessible by ordinary SQL operations until
      such entries are made the hard way (with <literal>insert</literal>
      commands).  This option is used for creating
      <structname>pg_class</structname> etc themselves.
     </para>

     <para>
      The table is created as shared if <literal>shared_relation</literal> is
      specified.
      It will have OIDs unless <literal>without_oids</literal> is specified.
      The table's row type OID (<structname>pg_type</structname> OID) can optionally
      be specified via the <literal>rowtype_oid</literal> clause; if not specified,
      an OID is automatically generated for it.  (The <literal>rowtype_oid</literal>
      clause is useless if <literal>bootstrap</literal> is specified, but it can be
      provided anyway for documentation.)
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>open</literal> <replaceable class="parameter">tablename</replaceable>
    </term>

    <listitem>
     <para>
      Open the table named
      <replaceable class="parameter">tablename</replaceable>
      for insertion of data.  Any currently open table is closed.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>close</literal> <optional><replaceable class="parameter">tablename</replaceable></optional>
    </term>

    <listitem>
     <para>
      Close the open table.  The name of the table can be given as a
      cross-check, but this is not required.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>insert</literal> <optional><literal>OID =</literal> <replaceable class="parameter">oid_value</replaceable></optional> <literal>(</literal> <replaceable class="parameter">value1</replaceable> <replaceable class="parameter">value2</replaceable> ... <literal>)</literal>
    </term>

    <listitem>
     <para>
      Insert a new row into the open table using <replaceable
      class="parameter">value1</replaceable>, <replaceable
      class="parameter">value2</replaceable>, etc., for its column
      values and <replaceable
      class="parameter">oid_value</replaceable> for its OID.  If
      <replaceable class="parameter">oid_value</replaceable> is zero
      (0) or the clause is omitted, and the table has OIDs, then the
      next available OID is assigned.
     </para>

     <para>
      NULL values can be specified using the special key word
      <literal>_null_</literal>.  Values containing spaces must be
      double quoted.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>declare</literal> <optional><literal>unique</literal></optional>
     <literal>index</literal> <replaceable class="parameter">indexname</replaceable>
     <replaceable class="parameter">indexoid</replaceable>
     <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
     <literal>using</literal> <replaceable class="parameter">amname</replaceable>
     <literal>(</literal> <replaceable class="parameter">opclass1</replaceable>
     <replaceable class="parameter">name1</replaceable>
     <optional>, ...</optional> <literal>)</literal>
    </term>

    <listitem>
     <para>
      Create an index named <replaceable
      class="parameter">indexname</replaceable>, having OID
      <replaceable class="parameter">indexoid</replaceable>,
      on the table named
      <replaceable class="parameter">tablename</replaceable>, using the
      <replaceable class="parameter">amname</replaceable> access
      method.  The fields to index are called <replaceable
      class="parameter">name1</replaceable>, <replaceable
      class="parameter">name2</replaceable> etc., and the operator
      classes to use are <replaceable
      class="parameter">opclass1</replaceable>, <replaceable
      class="parameter">opclass2</replaceable> etc., respectively.
      The index file is created and appropriate catalog entries are
      made for it, but the index contents are not initialized by this command.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <literal>declare toast</literal>
     <replaceable class="parameter">toasttableoid</replaceable>
     <replaceable class="parameter">toastindexoid</replaceable>
     <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
    </term>

    <listitem>
     <para>
      Create a TOAST table for the table named
      <replaceable class="parameter">tablename</replaceable>.
      The TOAST table is assigned OID
      <replaceable class="parameter">toasttableoid</replaceable>
      and its index is assigned OID
      <replaceable class="parameter">toastindexoid</replaceable>.
      As with <literal>declare index</literal>, filling of the index
      is postponed.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><literal>build indices</literal></term>

    <listitem>
     <para>
      Fill in the indices that have previously been declared.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>

 </sect1>

 <sect1 id="bki-structure">
  <title>Structure of the Bootstrap <acronym>BKI</acronym> File</title>

  <para>
   The <literal>open</literal> command cannot be used until the tables it uses
   exist and have entries for the table that is to be opened.
   (These minimum tables are <structname>pg_class</structname>,
   <structname>pg_attribute</structname>, <structname>pg_proc</structname>, and
   <structname>pg_type</structname>.)   To allow those tables themselves to be filled,
   <literal>create</literal> with the <literal>bootstrap</literal> option implicitly opens
   the created table for data insertion.
  </para>

  <para>
   Also, the <literal>declare index</literal> and <literal>declare toast</literal>
   commands cannot be used until the system catalogs they need have been
   created and filled in.
  </para>

  <para>
   Thus, the structure of the <filename>postgres.bki</filename> file has to
   be:
   <orderedlist>
    <listitem>
     <para>
      <literal>create bootstrap</literal> one of the critical tables
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>insert</literal> data describing at least the critical tables
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>close</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      Repeat for the other critical tables.
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>create</literal> (without <literal>bootstrap</literal>) a noncritical table
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>open</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>insert</literal> desired data
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>close</literal>
     </para>
    </listitem>
    <listitem>
     <para>
      Repeat for the other noncritical tables.
     </para>
    </listitem>
    <listitem>
     <para>
      Define indexes and toast tables.
     </para>
    </listitem>
    <listitem>
     <para>
      <literal>build indices</literal>
     </para>
    </listitem>
   </orderedlist>
  </para>

  <para>
   There are doubtless other, undocumented ordering dependencies.
  </para>
 </sect1>

 <sect1 id="bki-example">
  <title>BKI Example</title>

  <para>
   The following sequence of commands will create the
   table <literal>test_table</literal> with OID 420, having two columns
   <literal>cola</literal> and <literal>colb</literal> of type
   <type>int4</type> and <type>text</type>, respectively, and insert
   two rows into the table:
<programlisting>
create test_table 420 (cola = int4, colb = text)
open test_table
insert OID=421 ( 1 "value1" )
insert OID=422 ( 2 _null_ )
close test_table
</programlisting>
  </para>
 </sect1>
</chapter>