<!--
$PostgreSQL: pgsql/doc/src/sgml/page.sgml,v 1.17 2003/12/14 00:10:32 neilc Exp $
-->

<chapter id="page">

<title>Page Files</title>

<abstract>
<para>
A description of the database file page format.
</para>
</abstract>

<para>
This section provides an overview of the page format used by
<productname>PostgreSQL</productname> tables and indexes.  (Index
access methods need not use this page format.  At present, all index
methods do use this basic format, but the data kept on index metapages
usually doesn't follow the item layout rules exactly.)  TOAST tables
and sequences are formatted just like a regular table.
</para>

<para>
In the following explanation, a
<firstterm>byte</firstterm>
is assumed to contain 8 bits.  In addition, the term
<firstterm>item</firstterm>
refers to an individual data value that is stored on a page.  In a table,
an item is a row; in an index, an item is an index entry.
</para>

<para>
<xref linkend="page-table"> shows the basic layout of a page.
There are five parts to each page.
</para>

<table tocentry="1" id="page-table">
<title>Sample Page Layout</title>
<titleabbrev>Page Layout</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>
Item
</entry>
<entry>Description</entry>
</row>
</thead>

<tbody>

<row>
 <entry>PageHeaderData</entry>
 <entry>20 bytes long. Contains general information about the page, including
free space pointers.</entry>
</row>

<row>
<entry>ItemIdData</entry>
<entry>Array of item identifiers, each an (offset,length) pair pointing to an actual item.</entry>
</row>

<row>
<entry>Free space</entry>
<entry>The unallocated space. New item identifiers are allocated from the start of this area, new items from the end.</entry>
</row>

<row>
<entry>Items</entry>
<entry>The actual items themselves.</entry>
</row>

<row>
<entry>Special Space</entry>
<entry>Index access method specific data. Different methods store different
data. Empty in ordinary tables.</entry>
</row>

</tbody>
</tgroup>
</table>

 <para>

  The first 20 bytes of each page consist of a page header
  (PageHeaderData). Its format is detailed in <xref
  linkend="pageheaderdata-table">. The first two fields,
  <structfield>pd_lsn</structfield> and <structfield>pd_sui</structfield>,
  hold WAL-related information. They are followed by three 2-byte integer
  fields
  (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
  and <structfield>pd_special</structfield>). These represent byte offsets to
  the start of unallocated space, to the end of unallocated space, and to the
  start of the special space.

 </para>
 
 <table tocentry="1" id="pageheaderdata-table">
 <title>PageHeaderData Layout</title>
 <titleabbrev>PageHeaderData Layout</titleabbrev>
 <tgroup cols="4">   
 <thead>
  <row> 
   <entry>Field</entry>
   <entry>Type</entry>
   <entry>Length</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry>pd_lsn</entry>
   <entry>XLogRecPtr</entry>
   <entry>8 bytes</entry>
   <entry>LSN: next byte after last byte of xlog record for last change to this page</entry>
  </row>
  <row>
   <entry>pd_sui</entry>
   <entry>StartUpID</entry>
   <entry>4 bytes</entry>
   <entry>SUI of last change (currently used by the heap access method only)</entry>
  </row>
  <row>
   <entry>pd_lower</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to start of free space.</entry>
  </row>
  <row>
   <entry>pd_upper</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to end of free space.</entry>
  </row>
  <row>
   <entry>pd_special</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to start of special space.</entry>
  </row>
  <row>
   <entry>pd_pagesize_version</entry>
   <entry>uint16</entry>
   <entry>2 bytes</entry>
   <entry>Page size and layout version number information.</entry>
  </row>
 </tbody>
 </tgroup>
 </table>

 <para>
  All the details may be found in
  <filename>src/include/storage/bufpage.h</filename>.
 </para>
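
 <para>
  As a rough illustration of the header fields listed above, the following C
  sketch mirrors the six fields and computes the size of the unallocated
  region from <structfield>pd_lower</structfield> and
  <structfield>pd_upper</structfield>.  The names
  <literal>SketchPageHeader</literal> and
  <literal>sketch_free_space</literal> are hypothetical, and representing
  <structfield>pd_lsn</structfield> as two 32-bit words is an assumption made
  only to obtain the 8-byte width.  The header file named above remains the
  authoritative definition.
 </para>

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the header fields described in the table.
 * Field widths follow the table; the C types are stand-ins.
 */
typedef struct SketchPageHeader
{
    uint32_t pd_lsn[2];           /* 8 bytes (assumed: two 32-bit words) */
    uint32_t pd_sui;              /* 4 bytes */
    uint16_t pd_lower;            /* offset to start of free space */
    uint16_t pd_upper;            /* offset to end of free space */
    uint16_t pd_special;          /* offset to start of special space */
    uint16_t pd_pagesize_version; /* page size and layout version */
} SketchPageHeader;

/* Bytes of unallocated space between pd_lower and pd_upper. */
static size_t
sketch_free_space(const SketchPageHeader *hdr)
{
    if (hdr->pd_upper <= hdr->pd_lower)
        return 0;               /* no free space, or an inconsistent header */
    return (size_t) (hdr->pd_upper - hdr->pd_lower);
}
```

 <para>
  For instance, a header with <structfield>pd_lower</structfield> equal to 28
  and <structfield>pd_upper</structfield> equal to 8000 describes 7972 bytes
  of unallocated space.
 </para>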

 <para>  
  Special space is a region at the end of the page that is allocated at page
  initialization time and contains information specific to an access method.
  The last 2 bytes of the page header,
  <structfield>pd_pagesize_version</structfield>, store both the page size
  and a version indicator.  Beginning with
  <productname>PostgreSQL</productname> 7.3 the version number is 1; prior
  releases used version number 0.  (The basic page layout and header format
  have not changed, but the layout of heap row headers has.)  The page size
  is present mainly as a cross-check; there is no support for having
  more than one page size in an installation.
 </para>
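
 <para>
  The text above does not spell out how the two values share the 2-byte
  field.  The sketch below assumes one plausible arrangement: the page size,
  taken to be a multiple of 256, occupies the high-order byte and the version
  number the low-order byte.  Both that split and the helper names are
  assumptions and should be checked against
  <filename>src/include/storage/bufpage.h</filename>.
 </para>

```c
#include <stdint.h>

/* Combine a page size (assumed to be a multiple of 256) with a version. */
static uint16_t
sketch_pack_pagesize_version(uint16_t page_size, uint8_t version)
{
    return (uint16_t) ((page_size & 0xFF00u) | version);
}

/* Recover the page size from the high-order byte. */
static uint16_t
sketch_get_page_size(uint16_t pagesize_version)
{
    return (uint16_t) (pagesize_version & 0xFF00u);
}

/* Recover the layout version from the low-order byte. */
static uint8_t
sketch_get_layout_version(uint16_t pagesize_version)
{
    return (uint8_t) (pagesize_version & 0x00FFu);
}
```

 <para>
  Under these assumptions, a page size of 8192 with version 1 is stored as
  the hexadecimal value 2001.
 </para>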

 <para>

  Following the page header are item identifiers
  (<type>ItemIdData</type>), each requiring four bytes.
  An item identifier contains a byte-offset to
  the start of an item, its length in bytes, and a set of attribute bits
  which affect its interpretation.
  New item identifiers are allocated
  as needed from the beginning of the unallocated space.
  The number of item identifiers present can be determined by looking at
  <structfield>pd_lower</>, which is increased to allocate a new identifier.
  Because an item
  identifier is never moved until it is freed, its index may be used on a
  long-term basis to reference an item, even when the item itself is moved
  around on the page to compact free space.  In fact, every pointer to an
  item (<type>ItemPointer</type>, also known as
  <type>CTID</type>) created by
  <productname>PostgreSQL</productname> consists of a page number and the
  index of an item identifier.

 </para>
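
 <para>
  The arithmetic implied by the paragraph above can be sketched in C as
  follows.  The 20-byte header and 4-byte identifier sizes come from the
  text; the helper names, the <literal>SketchRowPointer</literal> type, and
  the choice to number identifiers starting at 1 are assumptions made for
  illustration.
 </para>

```c
#include <stdint.h>

enum
{
    SKETCH_PAGE_HEADER_SIZE = 20,   /* header size given in the text */
    SKETCH_ITEM_ID_SIZE = 4         /* each item identifier is four bytes */
};

/* Number of item identifiers implied by pd_lower. */
static unsigned
sketch_item_id_count(uint16_t pd_lower)
{
    if (pd_lower <= SKETCH_PAGE_HEADER_SIZE)
        return 0;
    return (unsigned) (pd_lower - SKETCH_PAGE_HEADER_SIZE) / SKETCH_ITEM_ID_SIZE;
}

/* Byte offset within the page of an identifier (index assumed 1-based). */
static unsigned
sketch_item_id_offset(unsigned index)
{
    return SKETCH_PAGE_HEADER_SIZE + (index - 1) * SKETCH_ITEM_ID_SIZE;
}

/*
 * A pointer to an item: page number plus identifier index.  This is an
 * in-memory illustration only, not the on-disk layout.
 */
typedef struct SketchRowPointer
{
    uint32_t page_number;
    uint16_t item_index;
} SketchRowPointer;
```

 <para>
  Because the pointer stores an identifier index rather than a byte offset,
  it stays valid when the item itself is moved within the page.
 </para>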

 <para>
 
  The items themselves are stored in space allocated backwards from the end
  of unallocated space.  The exact structure varies depending on what the
  table is to contain. Tables and sequences both use a structure named
  <type>HeapTupleHeaderData</type>, described below.

 </para>
 
 <para>
 
  The final section is the <quote>special section</quote>, which may
  contain anything the access method wishes to store. Ordinary tables
  do not use this at all (indicated by setting
  <structfield>pd_special</> to equal the page size).
  
 </para>
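
 <para>
  A minimal sketch of the check described above: the special section runs
  from <structfield>pd_special</structfield> to the end of the page, so its
  size is zero exactly when <structfield>pd_special</structfield> equals the
  page size.  The function names are hypothetical, and the page size is
  passed in as a parameter rather than read from the header.
 </para>

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the special section at the end of the page. */
static size_t
sketch_special_size(uint16_t pd_special, size_t page_size)
{
    if (pd_special >= page_size)
        return 0;               /* pd_special == page size: no special space */
    return page_size - pd_special;
}

/* Nonzero if the page reserves any special space. */
static int
sketch_has_special_space(uint16_t pd_special, size_t page_size)
{
    return sketch_special_size(pd_special, page_size) > 0;
}
```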
 
 <para>

  All table rows are structured the same way. There is a fixed-size
  header (occupying 23 bytes on most machines), followed by an optional null
  bitmap, an optional object ID field, and the user data. The header is
  detailed
  in <xref linkend="heaptupleheaderdata-table">.  The actual user data
  (columns of the row) begins at the offset indicated by
  <structfield>t_hoff</>, which must always be a multiple of the MAXALIGN
  distance for the platform.
  The null bitmap is
  only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in
  <structfield>t_infomask</structfield>. If it is present it begins just after
  the fixed header and occupies enough bytes to have one bit per data column
  (that is, <structfield>t_natts</> bits altogether). In this list of bits, a
  1 bit indicates not-null and a 0 bit indicates null.  When the bitmap is not
  present, all columns are assumed not-null.
  The object ID is only present if the <firstterm>HEAP_HASOID</firstterm> bit
  is set in <structfield>t_infomask</structfield>.  If present, it appears just
  before the <structfield>t_hoff</> boundary.  Any padding needed to make
  <structfield>t_hoff</> a MAXALIGN multiple will appear between the null
  bitmap and the object ID.  (This in turn ensures that the object ID is
  suitably aligned.)
  
 </para>
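
 <para>
  The layout rules above can be sketched as a small calculation of
  <structfield>t_hoff</structfield>, together with a null-bitmap test.  This
  is an illustration under stated assumptions, not the server's code: the
  23-byte fixed header is the figure quoted above, the object ID is assumed
  to be 4 bytes wide, MAXALIGN is passed in as a parameter, and the bitmap is
  assumed to number columns from the least significant bit of each byte.  All
  function names are hypothetical.
 </para>

```c
#include <stddef.h>
#include <stdint.h>

enum
{
    SKETCH_FIXED_HEADER = 23,   /* fixed header size quoted in the text */
    SKETCH_OID_SIZE = 4         /* assumed width of the object ID field */
};

/* Round len up to the next multiple of maxalign. */
static size_t
sketch_align_up(size_t len, size_t maxalign)
{
    return (len + maxalign - 1) / maxalign * maxalign;
}

/* Offset of the user data: header, optional bitmap, padding, optional OID. */
static size_t
sketch_header_offset(int natts, int has_nulls, int has_oid, size_t maxalign)
{
    size_t len = SKETCH_FIXED_HEADER;

    if (has_nulls)
        len += (size_t) ((natts + 7) / 8);  /* one bit per column */
    if (has_oid)
        len += SKETCH_OID_SIZE;             /* sits just before t_hoff */
    return sketch_align_up(len, maxalign);
}

/* A 0 bit means null; attnum counts from 0, bit order is an assumption. */
static int
sketch_att_is_null(const uint8_t *bits, int attnum)
{
    return !(bits[attnum >> 3] & (1 << (attnum & 7)));
}
```

 <para>
  With a MAXALIGN of 4, for example, a ten-column row with a null bitmap and
  no object ID needs 23 + 2 = 25 bytes, which rounds up to a
  <structfield>t_hoff</structfield> of 28.
 </para>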
 
 <table tocentry="1" id="heaptupleheaderdata-table">
 <title>HeapTupleHeaderData Layout</title>
 <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
 <tgroup cols="4">   
 <thead>
  <row> 
   <entry>Field</entry>
   <entry>Type</entry>
   <entry>Length</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry>t_xmin</entry>
   <entry>TransactionId</entry>
   <entry>4 bytes</entry>
   <entry>insert XID stamp</entry>
  </row>
  <row>
   <entry>t_cmin</entry>
   <entry>CommandId</entry>
   <entry>4 bytes</entry>
   <entry>insert CID stamp (overlays with t_xmax)</entry>
  </row>
  <row>
   <entry>t_xmax</entry>
   <entry>TransactionId</entry>
   <entry>4 bytes</entry>
   <entry>delete XID stamp</entry>
  </row>
  <row>
   <entry>t_cmax</entry>
   <entry>CommandId</entry>
   <entry>4 bytes</entry>
   <entry>delete CID stamp (overlays with t_xvac)</entry>
  </row>
  <row>
   <entry>t_xvac</entry>
   <entry>TransactionId</entry>
   <entry>4 bytes</entry>
   <entry>XID for VACUUM operation moving row version</entry>
  </row>
  <row>
   <entry>t_ctid</entry>
   <entry>ItemPointerData</entry>
   <entry>6 bytes</entry>
   <entry>current TID of this or newer row version</entry>
  </row>
  <row>
   <entry>t_natts</entry>
   <entry>int16</entry>
   <entry>2 bytes</entry>
   <entry>number of attributes</entry>
  </row>
  <row>
   <entry>t_infomask</entry>
   <entry>uint16</entry>
   <entry>2 bytes</entry>
   <entry>various flags</entry>
  </row>
  <row>
   <entry>t_hoff</entry>
   <entry>uint8</entry>
   <entry>1 byte</entry>
   <entry>offset to user data</entry>
  </row>
 </tbody>
 </tgroup>
 </table>

 <para>
   All the details may be found in
   <filename>src/include/access/htup.h</filename>.
 </para>

 <para>
 
  Interpreting the actual data can only be done with information obtained
  from other tables, mostly <firstterm>pg_attribute</firstterm>. The
  key values needed to locate each field are
  <structfield>attlen</structfield> and
  <structfield>attalign</structfield>. A particular attribute cannot be
  located directly unless the row contains only fixed width fields and no
  NULLs; otherwise the attributes before it must be examined first.
  All this trickery is wrapped up in the functions
  <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
  and <firstterm>heap_getsysattr</firstterm>.
  
 </para>
 <para>

  To read the data you need to examine each attribute in turn. First check
  whether the field is NULL according to the null bitmap. If it is, skip to
  the next attribute. Otherwise, advance to the alignment boundary the field
  requires.  If the field is a fixed width field, its bytes are simply stored
  at that position. A variable length field (attlen == -1) is more
  complicated: it is stored using the variable length structure
  <type>varattrib</type>.
  Depending on the flags, the data may be either inline, compressed or in
  another table (TOAST).
  
 </para>
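
 <para>
  The walk described above can be sketched for the fixed width case as
  follows.  This is a simplified illustration, not the logic of
  <firstterm>heap_getattr</firstterm>: the <literal>SketchAttr</literal>
  type and function name are hypothetical, the alignment is assumed to have
  been translated already from <structfield>attalign</structfield> into a
  byte count, the null bitmap is assumed to use the least significant bit
  first, and variable length fields are reported as not locatable rather
  than decoded.
 </para>

```c
#include <stddef.h>
#include <stdint.h>

/* Per-column facts, as they might be derived from pg_attribute. */
typedef struct SketchAttr
{
    int16_t attlen;             /* byte length, or -1 for variable length */
    uint8_t align;              /* required alignment in bytes (assumed) */
} SketchAttr;

/*
 * Offset of column `target` (0-based) from the start of the user data.
 * Returns -1 if the column is null or a variable length field is met.
 * nullbits may be NULL when the row has no null bitmap.
 */
static long
sketch_attr_offset(const SketchAttr *atts, int natts,
                   const uint8_t *nullbits, int target)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        int is_null = nullbits != NULL &&
            !(nullbits[i >> 3] & (1 << (i & 7)));

        if (is_null)
        {
            if (i == target)
                return -1;      /* null columns occupy no space */
            continue;
        }
        if (atts[i].attlen < 0)
            return -1;          /* variable length: not handled here */

        /* Advance to the alignment boundary this column requires. */
        off = (off + atts[i].align - 1) / atts[i].align * atts[i].align;
        if (i == target)
            return (long) off;
        off += (size_t) atts[i].attlen;
    }
    return -1;                  /* target out of range */
}
```

 <para>
  For columns of lengths 4, 1, 8 and 2 bytes with matching alignments and no
  nulls, the sketch yields offsets 0, 4, 8 and 16: the third column is pushed
  from offset 5 to the next 8-byte boundary.
 </para>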
</chapter>