<!-- $PostgreSQL: pgsql/doc/src/sgml/monitoring.sgml,v 1.65 2009/03/23 01:52:38 tgl Exp $ -->

<chapter id="monitoring">
 <title>Monitoring Database Activity</title>

 <indexterm zone="monitoring">
  <primary>monitoring</primary>
  <secondary>database activity</secondary>
 </indexterm>

 <indexterm zone="monitoring">
  <primary>database activity</primary>
  <secondary>monitoring</secondary>
 </indexterm>

 <para>
  A database administrator frequently wonders, <quote>What is the system
  doing right now?</quote>
  This chapter discusses how to find that out.
 </para>

  <para>
   Several tools are available for monitoring database activity and
   analyzing performance.  Most of this chapter is devoted to describing
   <productname>PostgreSQL</productname>'s statistics collector,
   but one should not neglect regular Unix monitoring programs such as
   <command>ps</>, <command>top</>, <command>iostat</>, and <command>vmstat</>.
   Also, once one has identified a
   poorly-performing query, further investigation might be needed using
   <productname>PostgreSQL</productname>'s <xref linkend="sql-explain"
   endterm="sql-explain-title"> command.
   <xref linkend="using-explain"> discusses <command>EXPLAIN</>
   and other methods for understanding the behavior of an individual
   query.
  </para>

 <sect1 id="monitoring-ps">
  <title>Standard Unix Tools</title>

  <indexterm zone="monitoring-ps">
   <primary>ps</primary>
   <secondary>to monitor activity</secondary>
  </indexterm>

  <para>
   On most platforms, <productname>PostgreSQL</productname> modifies its
   command title as reported by <command>ps</>, so that individual server
   processes can readily be identified.  A sample display is

<screen>
$ ps auxww | grep ^postgres
postgres   960  0.0  1.1  6104 1480 pts/1    SN   13:17   0:00 postgres -i
postgres   963  0.0  1.1  7084 1472 pts/1    SN   13:17   0:00 postgres: writer process
postgres   965  0.0  1.1  6152 1512 pts/1    SN   13:17   0:00 postgres: stats collector process   
postgres   998  0.0  2.3  6532 2992 pts/1    SN   13:18   0:00 postgres: tgl runbug 127.0.0.1 idle
postgres  1003  0.0  2.4  6532 3128 pts/1    SN   13:19   0:00 postgres: tgl regression [local] SELECT waiting
postgres  1016  0.1  2.4  6532 3080 pts/1    SN   13:19   0:00 postgres: tgl regression [local] idle in transaction
</screen>

   (The appropriate invocation of <command>ps</> varies across different
   platforms, as do the details of what is shown.  This example is from a
   recent Linux system.)  The first process listed here is the
   master server process.  The command arguments
   shown for it are the same ones given when it was launched.  The next two
   processes are background worker processes automatically launched by the
   master process.  (The <quote>stats collector</> process will not be present
   if you have set
   the system not to start the statistics collector.)  Each of the remaining
   processes is a server process handling one client connection.  Each such
   process sets its command line display in the form

<screen>
postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <replaceable>activity</>
</screen>

  The user, database, and connection source host items remain the same for
  the life of the client connection, but the activity indicator changes.
  The activity can be <literal>idle</> (i.e., waiting for a client command),
  <literal>idle in transaction</> (waiting for client inside a <command>BEGIN</> block),
  or a command type name such as <literal>SELECT</>.  Also,
  <literal>waiting</> is attached if the server process is presently waiting
  on a lock held by another server process.  In the above example we can infer
  that process 1003 is waiting for process 1016 to complete its transaction and
  thereby release some lock or other.
  </para>

  <para>
   If you have turned off <xref linkend="guc-update-process-title"> then the
   activity indicator is not updated; the process title is set only once
   when a new process is launched.  On some platforms this saves a useful
   amount of per-command overhead; on others it is insignificant.
  </para>

  <tip>
  <para>
  <productname>Solaris</productname> requires special handling. You must
  use <command>/usr/ucb/ps</command>, rather than
  <command>/bin/ps</command>. You also must use two <option>w</option>
  flags, not just one. In addition, your original invocation of the
  <command>postgres</command> command must have a shorter
  <command>ps</command> status display than that provided by each
  server process.  If you fail to do all three things, the <command>ps</>
  output for each server process will be the original <command>postgres</>
  command line.
  </para>
  </tip>
 </sect1>

 <sect1 id="monitoring-stats">
  <title>The Statistics Collector</title>

  <indexterm zone="monitoring-stats">
   <primary>statistics</primary>
  </indexterm>

  <para>
   <productname>PostgreSQL</productname>'s <firstterm>statistics collector</>
   is a subsystem that supports collection and reporting of information about
   server activity.  Presently, the collector can count accesses to tables
   and indexes in both disk-block and individual-row terms.  It also tracks
   total numbers of rows in each table, and the last vacuum and analyze times
   for each table.  It can also count calls to user-defined functions and
   the total time spent in each one.
  </para>

  <para>
   <productname>PostgreSQL</productname> also supports determining the exact
   command currently being executed by other server processes.  This is an
   independent facility that does not depend on the collector process.
  </para>

 <sect2 id="monitoring-stats-setup">
  <title>Statistics Collection Configuration</title>

  <para>
   Since collection of statistics adds some overhead to query execution,
   the system can be configured to collect or not collect information.
   This is controlled by configuration parameters that are normally set in
   <filename>postgresql.conf</>.  (See <xref linkend="runtime-config"> for
   details about setting configuration parameters.)
  </para>

  <para>
   The parameter <xref linkend="guc-track-counts"> controls whether
   statistics are collected about table and index accesses.
  </para>

  <para>
   The parameter <xref linkend="guc-track-functions"> enables tracking of
   usage of user-defined functions.
  </para>

  <para>
   The parameter <xref linkend="guc-track-activities"> enables monitoring
   of the current command being executed by any server process.
  </para>

  <para>
   Normally these parameters are set in <filename>postgresql.conf</> so
   that they apply to all server processes, but it is possible to turn
   them on or off in individual sessions using the <xref
   linkend="sql-set" endterm="sql-set-title"> command. (To prevent
   ordinary users from hiding their activity from the administrator,
   only superusers are allowed to change these parameters with
   <command>SET</>.)
  </para>
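
  <para>
   As a rough illustration of per-session control, a superuser might
   inspect and override the tracking parameters as shown here.  This is
   only a sketch: the value <literal>pl</literal> for
   <varname>track_functions</varname> is an assumption about the accepted
   settings, so check that parameter's own documentation for the values
   your release supports.

<programlisting>
-- Show the settings in effect for this session.
SHOW track_counts;
SHOW track_activities;

-- Superuser only: change tracking for the current session.
SET track_functions = 'pl';     -- assumed value, see the text above
SET track_activities = off;
</programlisting>
  </para>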

  <para>
   The statistics collector communicates with the backends needing 
   information (including autovacuum) through temporary files.
   These files are stored in the <filename>pg_stat_tmp</filename> subdirectory.
   When the postmaster shuts down, a permanent copy of the statistics
   data is stored in the <filename>global</filename> subdirectory. For increased
   performance, the parameter <xref linkend="guc-stats-temp-directory"> can
   be pointed at a RAM-based file system, decreasing physical I/O requirements.
  </para>

 </sect2>

 <sect2 id="monitoring-stats-views">
  <title>Viewing Collected Statistics</title>

  <para>
   Several predefined views, listed in <xref
   linkend="monitoring-stats-views-table">, are available to show the results
   of statistics collection.  Alternatively, one can
   build custom views using the underlying statistics functions.
  </para>

  <para>
   When using the statistics to monitor current activity, it is important
   to realize that the information does not update instantaneously.
   Each individual server process transmits new statistical counts to
   the collector just before going idle; so a query or transaction still in
   progress does not affect the displayed totals.  Also, the collector itself
   emits a new report at most once per <varname>PGSTAT_STAT_INTERVAL</varname>
   milliseconds (500 unless altered while building the server).  So the
   displayed information lags behind actual activity.  However, current-query
   information collected by <varname>track_activities</varname> is
   always up-to-date.
  </para>

  <para>
   Another important point is that when a server process is asked to display
   any of these statistics, it first fetches the most recent report emitted by
   the collector process and then continues to use this snapshot for all
   statistical views and functions until the end of its current transaction.
   So the statistics will appear not to change as long as you continue the
   current transaction.  Similarly, information about the current queries of
   all processes is collected when any such information is first requested
   within a transaction, and the same information will be displayed throughout
   the transaction.
   This is a feature, not a bug, because it allows you to perform several
   queries on the statistics and correlate the results without worrying that
   the numbers are changing underneath you.  But if you want to see new
   results with each query, be sure to do the queries outside any transaction
   block.  Alternatively, you can invoke
   <function>pg_stat_clear_snapshot</function>(), which will discard the
   current transaction's statistics snapshot (if any).  The next use of
   statistical information will cause a new snapshot to be fetched.
  </para>
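
  <para>
   The snapshot behavior can be observed with a short session such as
   the following sketch.  The choice of
   <structname>pg_stat_activity</structname> and of
   <literal>count(*)</literal> is arbitrary; any statistics view or
   function would serve.

<programlisting>
BEGIN;
-- The first access fetches the snapshot used for the rest of the transaction.
SELECT count(*) FROM pg_stat_activity;

-- Repeating the query here would show the same snapshot.
-- Discard it, so that the next access fetches a new one.
SELECT pg_stat_clear_snapshot();
SELECT count(*) FROM pg_stat_activity;
COMMIT;
</programlisting>
  </para>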

  <table id="monitoring-stats-views-table">
   <title>Standard Statistics Views</title>

   <tgroup cols="2">
    <thead>
     <row>
      <entry>View Name</entry>
      <entry>Description</entry>
     </row>
    </thead>

    <tbody>
     <row>
      <entry><structname>pg_stat_activity</></entry>
      <entry>One row per server process, showing database OID, database
      name, process <acronym>ID</>, user OID, user name, current query,
      query's waiting status, time at which the current transaction and
      current query began execution, time at which the process was
      started, and client's address and port number.  The columns that
      report data on the current query are available unless the parameter
      <varname>track_activities</varname> has been turned off.
      Furthermore, these columns are only visible if the user examining
      the view is a superuser or the same as the user owning the process
      being reported on.
     </entry>
     </row>

     <row>
      <entry><structname>pg_stat_bgwriter</></entry>
      <entry>One row only, showing cluster-wide statistics from the
      background writer: number of scheduled checkpoints, requested
      checkpoints, buffers written by checkpoints and cleaning scans,
      and the number of times the background writer stopped a cleaning scan
      because it had written too many buffers.  Also includes
      statistics about the shared buffer pool, including buffers written
      by backends (that is, not by the background writer) and total buffers
      allocated.
     </entry>
     </row>

     <row>
      <entry><structname>pg_stat_database</></entry>
      <entry>One row per database, showing database OID, database name,
      number of active server processes connected to that database,
      number of transactions committed and rolled back in that database,
      total disk blocks read, total buffer hits (i.e., block
      read requests avoided by finding the block already in buffer cache),
      and numbers of rows returned, fetched, inserted, updated, and deleted.
     </entry>
     </row>

     <row>
      <entry><structname>pg_stat_all_tables</></entry>
      <entry>For each table in the current database (including TOAST tables),
      the table OID, schema and table name, number of sequential
      scans initiated, number of live rows fetched by sequential
      scans, number of index scans initiated (over all indexes
      belonging to the table), number of live rows fetched by index
      scans, numbers of row insertions, updates, and deletions,
      number of row updates that were HOT (i.e., no separate index update),
      numbers of live and dead rows,
      the last time the table was vacuumed manually,
      the last time it was vacuumed by the autovacuum daemon,
      the last time it was analyzed manually,
      and the last time it was analyzed by the autovacuum daemon.
      </entry>
     </row>

     <row>
      <entry><structname>pg_stat_sys_tables</></entry>
      <entry>Same as <structname>pg_stat_all_tables</>, except that only
      system tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_stat_user_tables</></entry>
      <entry>Same as <structname>pg_stat_all_tables</>, except that only user
      tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_stat_all_indexes</></entry>
      <entry>For each index in the current database,
      the table and index OID, schema, table and index name,
      number of index scans initiated on that index, number of
      index entries returned by index scans, and number of live table rows
      fetched by simple index scans using that index.
      </entry>
     </row>

     <row>
      <entry><structname>pg_stat_sys_indexes</></entry>
      <entry>Same as <structname>pg_stat_all_indexes</>, except that only
      indexes on system tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_stat_user_indexes</></entry>
      <entry>Same as <structname>pg_stat_all_indexes</>, except that only
      indexes on user tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_statio_all_tables</></entry>
      <entry>For each table in the current database (including TOAST tables),
      the table OID, schema and table name, number of disk
      blocks read from that table, number of buffer hits, numbers of
      disk blocks read and buffer hits in all indexes of that table,
      numbers of disk blocks read and buffer hits from that table's
      auxiliary TOAST table (if any), and numbers of disk blocks read
      and buffer hits for the TOAST table's index.
      </entry>
     </row>

     <row>
      <entry><structname>pg_statio_sys_tables</></entry>
      <entry>Same as <structname>pg_statio_all_tables</>, except that only
      system tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_statio_user_tables</></entry>
      <entry>Same as <structname>pg_statio_all_tables</>, except that only
      user tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_statio_all_indexes</></entry>
      <entry>For each index in the current database,
      the table and index OID, schema, table and index name,
      and numbers of disk blocks read and buffer hits in that index.
      </entry>
     </row>

     <row>
      <entry><structname>pg_statio_sys_indexes</></entry>
      <entry>Same as <structname>pg_statio_all_indexes</>, except that only
      indexes on system tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_statio_user_indexes</></entry>
      <entry>Same as <structname>pg_statio_all_indexes</>, except that only
      indexes on user tables are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_statio_all_sequences</></entry>
      <entry>For each sequence object in the current database,
      the sequence OID, schema and sequence name,
      and numbers of disk blocks read and buffer hits in that sequence.
      </entry>
     </row>

     <row>
      <entry><structname>pg_statio_sys_sequences</></entry>
      <entry>Same as <structname>pg_statio_all_sequences</>, except that only
      system sequences are shown.  (Presently, no system sequences are defined,
      so this view is always empty.)</entry>
     </row>

     <row>
      <entry><structname>pg_statio_user_sequences</></entry>
      <entry>Same as <structname>pg_statio_all_sequences</>, except that only
      user sequences are shown.</entry>
     </row>

     <row>
      <entry><structname>pg_stat_user_functions</></entry>
      <entry>For all tracked functions, function OID, schema, name, number
      of calls, total time, and self time.  Self time is the
      amount of time spent in the function itself, whereas total time also
      includes the time spent in functions it called.  Time values are in
      milliseconds.
     </entry>
     </row>

    </tbody>
   </tgroup>
  </table>
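
  <para>
   As an example of using these views, the following sketch lists what
   each server process is doing, oldest query first.  The column names
   (<structfield>procpid</>, <structfield>usename</>,
   <structfield>datname</>, <structfield>waiting</>,
   <structfield>query_start</>, <structfield>current_query</>) are
   assumed to match the release this chapter describes; they are not
   spelled out in the table above and can differ in other releases.

<programlisting>
-- One row per server process; query columns are hidden from
-- non-superusers looking at other users' processes.
SELECT procpid, usename, datname, waiting, query_start, current_query
  FROM pg_stat_activity
 ORDER BY query_start;
</programlisting>
  </para>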

  <para>
   The per-index statistics are particularly useful to determine which
   indexes are being used and how effective they are.
  </para>

  <para>
   Beginning in <productname>PostgreSQL</productname> 8.1, indexes can be
   used either directly or via <quote>bitmap scans</>.  In a bitmap scan
   the output of several indexes can be combined via AND or OR rules;
   so it is difficult to associate individual heap row fetches 
   with specific indexes when a bitmap scan is used.  Therefore, a bitmap
   scan increments the
   <structname>pg_stat_all_indexes</>.<structfield>idx_tup_read</>
   count(s) for the index(es) it uses, and it increments the
   <structname>pg_stat_all_tables</>.<structfield>idx_tup_fetch</>
   count for the table, but it does not affect
   <structname>pg_stat_all_indexes</>.<structfield>idx_tup_fetch</>.
  </para>

  <note>
   <para>
    Before <productname>PostgreSQL</productname> 8.1, the
    <structfield>idx_tup_read</> and <structfield>idx_tup_fetch</> counts
    were essentially always equal.  Now they can be different even without
    considering bitmap scans, because <structfield>idx_tup_read</> counts
    index entries retrieved from the index while <structfield>idx_tup_fetch</>
    counts live rows fetched from the table; the latter will be less if any
    dead or not-yet-committed rows are fetched using the index.
   </para>
  </note>
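
  <para>
   A query along the following lines can help spot indexes that are
   rarely used.  It is a sketch rather than a recommended procedure; the
   <structfield>idx_scan</> column name is an assumption not stated
   above, and a low count means only that the index has seen little use
   since the counters were last reset.

<programlisting>
-- Least-scanned indexes on user tables first.
SELECT schemaname, relname, indexrelname,
       idx_scan, idx_tup_read, idx_tup_fetch
  FROM pg_stat_user_indexes
 ORDER BY idx_scan, schemaname, relname;
</programlisting>
  </para>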

  <para>
   The <structname>pg_statio_</> views are primarily useful to
   determine the effectiveness of the buffer cache.  When the number
   of actual disk reads is much smaller than the number of buffer
   hits, then the cache is satisfying most read requests without
   invoking a kernel call. However, these statistics do not give the
   entire story: due to the way in which <productname>PostgreSQL</>
   handles disk I/O, data that is not in the
   <productname>PostgreSQL</> buffer cache might still reside in the
   kernel's I/O cache, and might therefore still be fetched without
   requiring a physical read. Users interested in obtaining more
   detailed information on <productname>PostgreSQL</> I/O behavior are
   advised to use the <productname>PostgreSQL</> statistics collector
   in combination with operating system utilities that allow insight
   into the kernel's handling of I/O.
  </para>
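
  <para>
   One way to summarize buffer cache effectiveness per table is sketched
   below.  The column names <structfield>heap_blks_read</> and
   <structfield>heap_blks_hit</> are assumptions about the view's
   definition, and the computed fraction says nothing about reads that
   were satisfied from the kernel's cache.

<programlisting>
-- Fraction of table block requests satisfied from the buffer cache.
-- nullif() avoids division by zero for tables never read.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_fraction
  FROM pg_statio_user_tables
 ORDER BY heap_blks_read DESC;
</programlisting>
  </para>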

  <para>
   Other ways of looking at the statistics can be set up by writing
   queries that use the same underlying statistics access functions as
   these standard views do.  These functions are listed in <xref
   linkend="monitoring-stats-funcs-table">.  The per-database access
   functions take a database OID as argument to identify which
   database to report on.  The per-table and per-index functions take
   a table or index OID.  The functions for function-call statistics
   take a function OID.  (Note that only tables, indexes, and functions
   in the current database can be seen with these functions.)  The
   per-server-process access functions take a server process
   number, which ranges from one to the number of currently active
   server processes.
  </para>
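
  <para>
   For instance, the per-database access functions can be applied to the
   rows of <structname>pg_database</structname>, passing each database's
   OID.  The column aliases in this sketch are illustrative only.

<programlisting>
SELECT d.datname,
       pg_stat_get_db_numbackends(d.oid)   AS backends,
       pg_stat_get_db_xact_commit(d.oid)   AS commits,
       pg_stat_get_db_xact_rollback(d.oid) AS rollbacks
  FROM pg_database d
 ORDER BY d.datname;
</programlisting>
  </para>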

  <table id="monitoring-stats-funcs-table">
   <title>Statistics Access Functions</title>

   <tgroup cols="3">
    <thead>
     <row>
      <entry>Function</entry>
      <entry>Return Type</entry>
      <entry>Description</entry>
     </row>
    </thead>

    <tbody>
     <row>
      <entry><literal><function>pg_stat_get_db_numbackends</function>(<type>oid</type>)</literal></entry>
      <entry><type>integer</type></entry>
      <entry>
       Number of active server processes for database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_xact_commit</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Transactions committed in database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_xact_rollback</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Transactions rolled back in database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_blocks_fetched</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of disk block fetch requests for database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_blocks_hit</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of disk block fetch requests found in cache for database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_tuples_returned</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of tuples returned for database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_tuples_fetched</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of tuples fetched for database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_tuples_inserted</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of tuples inserted in database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_tuples_updated</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of tuples updated in database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_db_tuples_deleted</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of tuples deleted in database
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of sequential scans done when argument is a table,
       or number of index scans done when argument is an index
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_returned</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of rows read by sequential scans when argument is a table,
       or number of index entries returned when argument is an index
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_fetched</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of table rows fetched by bitmap scans when argument is a table,
       or table rows fetched by simple index scans using the index
       when argument is an index
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_inserted</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of rows inserted into table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_updated</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of rows updated in table (includes HOT updates)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_deleted</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of rows deleted from table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_tuples_hot_updated</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of rows HOT-updated in table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_live_tuples</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of live rows in table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_dead_tuples</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of dead rows in table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_blocks_fetched</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of disk block fetch requests for table or index
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_blocks_hit</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of disk block requests found in cache for table or index
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_last_vacuum_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>timestamptz</type></entry>
      <entry>
       Time of the last vacuum initiated by the user on this table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_last_autovacuum_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>timestamptz</type></entry>
      <entry>
       Time of the last vacuum initiated by the autovacuum daemon on this table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_last_analyze_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>timestamptz</type></entry>
      <entry>
       Time of the last analyze initiated by the user on this table
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_last_autoanalyze_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>timestamptz</type></entry>
      <entry>
       Time of the last analyze initiated by the autovacuum daemon on this
       table
      </entry>
     </row>

     <row>
       <!-- See also the entry for this in func.sgml -->
      <entry><literal><function>pg_backend_pid</function>()</literal></entry>
      <entry><type>integer</type></entry>
      <entry>
       Process ID of the server process attached to the current session
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_activity</function>(<type>integer</type>)</literal></entry>
      <entry><type>setof record</type></entry>
      <entry>
       Returns a record of information about the backend with the specified pid, or
       one record for each active backend in the system if <symbol>NULL</symbol> is
       specified. The fields returned are the same as in the 
       <structname>pg_stat_activity</structname> view
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_function_calls</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Number of times the function has been called.
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_function_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Total wall clock time spent in the function, in microseconds.  Includes
       the time spent in functions called by this one.
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_function_self_time</function>(<type>oid</type>)</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       Time spent in only this function. Time spent in called functions
       is excluded.
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_idset</function>()</literal></entry>
      <entry><type>setof integer</type></entry>
      <entry>
       Set of currently active server process numbers (from 1 to the
       number of active server processes).  See usage example in the text
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_pid</function>(<type>integer</type>)</literal></entry>
      <entry><type>integer</type></entry>
      <entry>
       Process ID of the given server process
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_dbid</function>(<type>integer</type>)</literal></entry>
      <entry><type>oid</type></entry>
      <entry>
       Database ID of the given server process
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_userid</function>(<type>integer</type>)</literal></entry>
      <entry><type>oid</type></entry>
      <entry>
       User ID of the given server process
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_activity</function>(<type>integer</type>)</literal></entry>
      <entry><type>text</type></entry>
      <entry>
       Active command of the given server process, but only if the
       current user is a superuser or the same user as that of
       the session being queried (and
       <varname>track_activities</varname> is on)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_waiting</function>(<type>integer</type>)</literal></entry>
      <entry><type>boolean</type></entry>
      <entry>
       True if the given server process is waiting for a lock,
       but only if the current user is a superuser or the same user as that of
       the session being queried (and
       <varname>track_activities</varname> is on)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_activity_start</function>(<type>integer</type>)</literal></entry>
      <entry><type>timestamp with time zone</type></entry>
      <entry>
       The time at which the given server process' currently
       executing query was started, but only if the
       current user is a superuser or the same user as that of
       the session being queried (and
       <varname>track_activities</varname> is on)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_xact_start</function>(<type>integer</type>)</literal></entry>
      <entry><type>timestamp with time zone</type></entry>
      <entry>
       The time at which the given server process' currently
       executing transaction was started, but only if the
       current user is a superuser or the same user as that of
       the session being queried (and
       <varname>track_activities</varname> is on)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_start</function>(<type>integer</type>)</literal></entry>
      <entry><type>timestamp with time zone</type></entry>
      <entry>
       The time at which the given server process was started, or
       null if the current user is neither a superuser nor the same user
       as that of the session being queried
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_client_addr</function>(<type>integer</type>)</literal></entry>
      <entry><type>inet</type></entry>
      <entry>
       The IP address of the client connected to the given
       server process. Null if the connection is over a Unix domain
       socket. Also null if the current user is neither a superuser nor
       the same user as that of the session being queried
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_backend_client_port</function>(<type>integer</type>)</literal></entry>
      <entry><type>integer</type></entry>
      <entry>
       The IP port number of the client connected to the given
       server process.  -1 if the connection is over a Unix domain
       socket. Null if the current user is neither a superuser nor the
       same user as that of the session being queried
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_bgwriter_timed_checkpoints</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of times the background writer has started timed checkpoints
       (because the <varname>checkpoint_timeout</varname> time has expired)
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_bgwriter_requested_checkpoints</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of times the background writer has started checkpoints based
       on requests from backends because <varname>checkpoint_segments</varname>
       has been exceeded or because the <command>CHECKPOINT</command>
       command has been issued
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_bgwriter_buf_written_checkpoints</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of buffers written by the background writer during checkpoints
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_bgwriter_buf_written_clean</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of buffers written by the background writer for routine cleaning of
       dirty pages
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_bgwriter_maxwritten_clean</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of times the background writer has stopped its cleaning scan because
       it has written more buffers than specified in the
       <varname>bgwriter_lru_maxpages</varname> parameter
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_buf_written_backend</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The number of buffers written by backends because they needed
       to allocate a new buffer
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_get_buf_alloc</function>()</literal></entry>
      <entry><type>bigint</type></entry>
      <entry>
       The total number of buffer allocations
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_clear_snapshot</function>()</literal></entry>
      <entry><type>void</type></entry>
      <entry>
       Discard the current statistics snapshot
      </entry>
     </row>

     <row>
      <entry><literal><function>pg_stat_reset</function>()</literal></entry>
      <entry><type>void</type></entry>
      <entry>
       Reset all statistics counters for the current database to zero
       (requires superuser privileges)
      </entry>
     </row>
    </tbody>
   </tgroup>
  </table>
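
  <para>
   As a minimal sketch, the background writer counters listed above can be
   read together in a single query.  The column aliases are chosen here for
   illustration only:

<programlisting>
SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
       pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
       pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
       pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
       pg_stat_get_buf_written_backend() AS buffers_backend;
</programlisting>

   A high proportion of requested checkpoints relative to timed ones might
   indicate that <varname>checkpoint_segments</varname> is set too low for
   the workload, though this depends on how often
   <command>CHECKPOINT</command> is issued manually.
  </para>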

   <note>
    <para>
     <function>blocks_fetched</function> minus
     <function>blocks_hit</function> gives the number of kernel
     <function>read()</> calls issued for the table, index, or
     database; but the actual number of physical reads is usually
     lower due to kernel-level buffering.
    </para>
   </note>
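
  <para>
   A rough sketch of this calculation at the database level follows.  It
   assumes the per-database access functions
   <function>pg_stat_get_db_blocks_fetched</function> and
   <function>pg_stat_get_db_blocks_hit</function>; the alias
   <literal>blocks_read</literal> is illustrative:

<programlisting>
SELECT d.datname,
       pg_stat_get_db_blocks_fetched(d.oid) -
       pg_stat_get_db_blocks_hit(d.oid) AS blocks_read
    FROM pg_database d
    ORDER BY blocks_read DESC;
</programlisting>

   The result is an upper bound on physical reads, since some of these
   requests are likely to be satisfied from the kernel's cache.
  </para>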

  <para>
   All functions to access information about backends are indexed by backend ID
   number, except <function>pg_stat_get_activity</function>, which is indexed by PID.
   The function <function>pg_stat_get_backend_idset</function> provides
   a convenient way to generate one row for each active server process.  For
   example, to show the <acronym>PID</>s and current queries of all server processes:

<programlisting>
SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       pg_stat_get_backend_activity(s.backendid) AS current_query
    FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
</programlisting>
  </para>
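
  <para>
   The same pattern can be combined with the other per-backend functions.
   As a sketch (the column aliases are illustrative), the following query
   lists only those server processes that are currently waiting for a lock,
   together with how long their current query has been running:

<programlisting>
SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       now() - pg_stat_get_backend_activity_start(s.backendid) AS runtime,
       pg_stat_get_backend_activity(s.backendid) AS current_query
    FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s
    WHERE pg_stat_get_backend_waiting(s.backendid);
</programlisting>

   Because these functions return null for sessions the current user is not
   permitted to inspect, such sessions do not appear in the result unless
   the query is run as a superuser.
  </para>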

 </sect2>
 </sect1>

 <sect1 id="monitoring-locks">
  <title>Viewing Locks</title>

  <indexterm zone="monitoring-locks">
   <primary>lock</primary>
   <secondary>monitoring</secondary>
  </indexterm>

  <para>
   Another useful tool for monitoring database activity is the
   <structname>pg_locks</structname> system view.  It allows the
   database administrator to view information about the outstanding
   locks in the lock manager. For example, this capability can be used
   to:

   <itemizedlist>
    <listitem>
     <para>
      View all the locks currently outstanding, all the locks on
      relations in a particular database, all the locks on a
      particular relation, or all the locks held by a particular
      <productname>PostgreSQL</productname> session.
     </para>
    </listitem>

    <listitem>
     <para>
      Determine the relation in the current database with the most
      ungranted locks (which might be a source of contention among
      database clients).
     </para>
    </listitem>

    <listitem>
     <para>
      Determine the effect of lock contention on overall database
      performance, as well as the extent to which contention varies
      with overall database traffic.
     </para>
    </listitem>
   </itemizedlist>

   Details of the <structname>pg_locks</structname> view appear in
   <xref linkend="view-pg-locks">.
   For more information on locking and managing concurrency with
   <productname>PostgreSQL</productname>, refer to <xref linkend="mvcc">.
  </para>
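
  <para>
   As an illustrative sketch of the second use listed above, the following
   query counts ungranted relation-level locks per relation in the current
   database.  The aliases <literal>relname</literal> and
   <literal>ungranted</literal> are chosen here for the example:

<programlisting>
SELECT l.relation::regclass AS relname,
       count(*) AS ungranted
    FROM pg_locks l
    WHERE NOT l.granted
      AND l.locktype = 'relation'
      AND l.database = (SELECT oid FROM pg_database
                        WHERE datname = current_database())
    GROUP BY l.relation
    ORDER BY ungranted DESC;
</programlisting>

   Relations at the top of the result are candidates for closer inspection;
   an empty result simply means no session is waiting for a relation lock
   at that moment.
  </para>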
 </sect1>

 <sect1 id="dynamic-trace">
  <title>Dynamic Tracing</title>

 <indexterm zone="dynamic-trace">
  <primary>DTrace</primary>
 </indexterm>

  <para>
   <productname>PostgreSQL</productname> provides facilities to support
   dynamic tracing of the database server. This allows an external
   utility to be called at specific points in the code and thereby trace
   execution.
  </para>

  <para>
   A number of probes or trace points are already inserted into the source
   code. These probes are intended to be used by database developers and
   administrators. By default the probes are not compiled into
   <productname>PostgreSQL</productname>; the user needs to explicitly tell
   the configure script to make the probes available.
  </para>

  <para>
   Currently, only the
   <ulink url="http://opensolaris.org/os/community/dtrace/">DTrace</ulink>
   utility is supported, which is available
   on OpenSolaris, Solaris 10, and Mac OS X Leopard. It is expected that
   DTrace will be available in the future on FreeBSD and possibly other
   operating systems.  The
   <ulink url="http://sourceware.org/systemtap/">SystemTap</ulink> project
   for Linux also provides a DTrace equivalent.  Supporting other dynamic
   tracing utilities is theoretically possible by changing the definitions for
   the macros in <filename>src/include/utils/probes.h</>.
  </para>

  <sect2 id="compiling-for-trace">
   <title>Compiling for Dynamic Tracing</title>

  <para>
   By default, probes are not available, so you will need to
   explicitly tell the configure script to make the probes available
   in <productname>PostgreSQL</productname>. To include DTrace support,
   specify <option>--enable-dtrace</> to configure.  See <xref
   linkend="install-procedure"> for further information.
  </para>
  </sect2>

  <sect2 id="trace-points">
   <title>Built-in Probes</title>

  <para>
   A number of standard probes are provided in the source code,
   as shown in <xref linkend="dtrace-probe-point-table">.
   More can certainly be added to enhance
   <productname>PostgreSQL</productname>'s observability.
  </para>

 <table id="dtrace-probe-point-table">
  <title>Built-in DTrace Probes</title>
  <tgroup cols="3">
   <thead>
    <row>
     <entry>Name</entry>
     <entry>Parameters</entry>
     <entry>Description</entry>
    </row>
   </thead>

   <tbody>

    <row>
     <entry>transaction-start</entry>
     <entry>(LocalTransactionId)</entry>
     <entry>Probe that fires at the start of a new transaction.
      arg0 is the transaction id.</entry>
    </row>
    <row>
     <entry>transaction-commit</entry>
     <entry>(LocalTransactionId)</entry>
     <entry>Probe that fires when a transaction completes successfully.
      arg0 is the transaction id.</entry>
    </row>
    <row>
     <entry>transaction-abort</entry>
     <entry>(LocalTransactionId)</entry>
     <entry>Probe that fires when a transaction completes unsuccessfully.
      arg0 is the transaction id.</entry>
    </row>
    <row>
     <entry>query-start</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the processing of a query is started.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-done</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the processing of a query is complete.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-parse-start</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the parsing of a query is started.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-parse-done</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the parsing of a query is complete.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-rewrite-start</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the rewriting of a query is started.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-rewrite-done</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires when the rewriting of a query is complete.
      arg0 is the query string.</entry>
    </row>
    <row>
     <entry>query-plan-start</entry>
     <entry>()</entry>
     <entry>Probe that fires when the planning of a query is started.</entry>
    </row>
    <row>
     <entry>query-plan-done</entry>
     <entry>()</entry>
     <entry>Probe that fires when the planning of a query is complete.</entry>
    </row>
    <row>
     <entry>query-execute-start</entry>
     <entry>()</entry>
     <entry>Probe that fires when the execution of a query is started.</entry>
    </row>
    <row>
     <entry>query-execute-done</entry>
     <entry>()</entry>
     <entry>Probe that fires when the execution of a query is complete.</entry>
    </row>
    <row>
     <entry>statement-status</entry>
     <entry>(const char *)</entry>
     <entry>Probe that fires anytime the server process updates its
      <structname>pg_stat_activity</>.<structfield>current_query</> status.
      arg0 is the new status string.</entry>
    </row>
    <row>
     <entry>checkpoint-start</entry>
     <entry>(int)</entry>
     <entry>Probe that fires when a checkpoint is started.
      arg0 holds the bitwise flags used to distinguish different checkpoint
      types, such as shutdown, immediate or force.</entry>
    </row>
    <row>
     <entry>checkpoint-done</entry>
     <entry>(int, int, int, int, int)</entry>
     <entry>Probe that fires when a checkpoint is complete.
      (The probes listed next fire in sequence during checkpoint processing.)
      arg0 is the number of buffers written. arg1 is the total number of
      buffers. arg2, arg3 and arg4 contain the number of xlog file(s) added,
      removed and recycled respectively.</entry>
    </row>
    <row>
     <entry>clog-checkpoint-start</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the CLOG portion of a checkpoint is started.
      arg0 is true for normal checkpoint, false for shutdown
      checkpoint.</entry>
    </row>
    <row>
     <entry>clog-checkpoint-done</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the CLOG portion of a checkpoint is
      complete. arg0 has the same meaning as for clog-checkpoint-start.</entry>
    </row>
    <row>
     <entry>subtrans-checkpoint-start</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the SUBTRANS portion of a checkpoint is
      started.
      arg0 is true for normal checkpoint, false for shutdown
      checkpoint.</entry>
    </row>
    <row>
     <entry>subtrans-checkpoint-done</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the SUBTRANS portion of a checkpoint is
      complete. arg0 has the same meaning as for
      subtrans-checkpoint-start.</entry>
    </row>
    <row>
     <entry>multixact-checkpoint-start</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the MultiXact portion of a checkpoint is
      started.
      arg0 is true for normal checkpoint, false for shutdown
      checkpoint.</entry>
    </row>
    <row>
     <entry>multixact-checkpoint-done</entry>
     <entry>(bool)</entry>
     <entry>Probe that fires when the MultiXact portion of a checkpoint is
      complete. arg0 has the same meaning as for
      multixact-checkpoint-start.</entry>
    </row>
    <row>
     <entry>buffer-checkpoint-start</entry>
     <entry>(int)</entry>
     <entry>Probe that fires when the buffer-writing portion of a checkpoint
      is started.
      arg0 holds the bitwise flags used to distinguish different checkpoint
      types, such as shutdown, immediate or force.</entry>
    </row>
    <row>
     <entry>buffer-sync-start</entry>
     <entry>(int, int)</entry>
     <entry>Probe that fires when we begin to write dirty buffers during
      checkpoint (after identifying which buffers must be written).
      arg0 is the total number of buffers.
      arg1 is the number that are currently dirty and need to be written.</entry>
    </row>
    <row>
     <entry>buffer-sync-written</entry>
     <entry>(int)</entry>
     <entry>Probe that fires after each buffer is written during checkpoint.
      arg0 is the ID number of the buffer.</entry>
    </row>
    <row>
     <entry>buffer-sync-done</entry>
     <entry>(int, int, int)</entry>
     <entry>Probe that fires when all dirty buffers have been written.
      arg0 is the total number of buffers.
      arg1 is the number of buffers actually written by the checkpoint process.
      arg2 is the number that were expected to be written (arg1 of
      buffer-sync-start); any difference reflects other processes flushing
      buffers during the checkpoint.</entry>
    </row>
    <row>
     <entry>buffer-checkpoint-sync-start</entry>
     <entry>()</entry>
     <entry>Probe that fires after dirty buffers have been written to the
      kernel, and before starting to issue fsync requests.</entry>
    </row>
    <row>
     <entry>buffer-checkpoint-done</entry>
     <entry>()</entry>
     <entry>Probe that fires when syncing of buffers to disk is
      complete.</entry>
    </row>
    <row>
     <entry>twophase-checkpoint-start</entry>
     <entry>()</entry>
     <entry>Probe that fires when the two-phase portion of a checkpoint is
      started.</entry>
    </row>
    <row>
     <entry>twophase-checkpoint-done</entry>
     <entry>()</entry>
     <entry>Probe that fires when the two-phase portion of a checkpoint is
      complete.</entry>
    </row>
    <row>
     <entry>buffer-read-start</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool)</entry>
     <entry>Probe that fires when a buffer read is started.
      arg0 and arg1 contain the fork and block numbers of the page (but
      arg1 will be -1 if this is a relation extension request).
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.
      arg5 is true for a local buffer, false for a shared buffer.
      arg6 is true for a relation extension request, false for normal
      read.</entry>
    </row>
    <row>
     <entry>buffer-read-done</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool, bool)</entry>
     <entry>Probe that fires when a buffer read is complete.
      arg0 and arg1 contain the fork and block numbers of the page (if this
      is a relation extension request, arg1 now contains the block number
      of the newly added block).
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.
      arg5 is true for a local buffer, false for a shared buffer.
      arg6 is true for a relation extension request, false for normal
      read.
      arg7 is true if the buffer was found in the pool, false if not.</entry>
    </row>
    <row>
     <entry>buffer-flush-start</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires before issuing any write request for a shared
      buffer.
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.</entry>
    </row>
    <row>
     <entry>buffer-flush-done</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires when a write request is complete.  (Note
      that this just reflects the time to pass the data to the kernel;
      the data has typically not actually been written to disk yet.)
      The arguments are the same as for buffer-flush-start.</entry>
    </row>
    <row>
     <entry>buffer-write-dirty-start</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires when a server process begins to write a dirty
      buffer.  (If this happens often, it implies that
      <xref linkend="guc-shared-buffers"> is too
      small or the bgwriter control parameters need adjustment.)
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.</entry>
    </row>
    <row>
     <entry>buffer-write-dirty-done</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires when a dirty-buffer write is complete.
      The arguments are the same as for buffer-write-dirty-start.</entry>
    </row>
    <row>
     <entry>wal-buffer-write-dirty-start</entry>
     <entry>()</entry>
     <entry>Probe that fires when a server process begins to write a
      dirty WAL buffer because no more WAL buffer space is available.
      (If this happens often, it implies that
      <xref linkend="guc-wal-buffers"> is too small.)</entry>
    </row>
    <row>
     <entry>wal-buffer-write-dirty-done</entry>
     <entry>()</entry>
     <entry>Probe that fires when a dirty WAL buffer write is complete.</entry>
    </row>
    <row>
     <entry>xlog-insert</entry>
     <entry>(unsigned char, unsigned char)</entry>
     <entry>Probe that fires when a WAL record is inserted.
      arg0 is the resource manager (rmid) for the record.
      arg1 contains the info flags.</entry>
    </row>
    <row>
     <entry>xlog-switch</entry>
     <entry>()</entry>
     <entry>Probe that fires when a WAL segment switch is requested.</entry>
    </row>
    <row>
     <entry>smgr-md-read-start</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires when beginning to read a block from a relation.
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.</entry>
    </row>
    <row>
     <entry>smgr-md-read-done</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int)</entry>
     <entry>Probe that fires when a block read is complete.
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.
      arg5 is the number of bytes actually read, while arg6 is the number
      requested (if these are different it indicates trouble).</entry>
    </row>
    <row>
     <entry>smgr-md-write-start</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid)</entry>
     <entry>Probe that fires when beginning to write a block to a relation.
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.</entry>
    </row>
    <row>
     <entry>smgr-md-write-done</entry>
     <entry>(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int)</entry>
     <entry>Probe that fires when a block write is complete.
      arg0 and arg1 contain the fork and block numbers of the page.
      arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
      identifying the relation.
      arg5 is the number of bytes actually written, while arg6 is the number
      requested (if these are different it indicates trouble).</entry>
    </row>
    <row>
     <entry>sort-start</entry>
     <entry>(int, bool, int, int, bool)</entry>
     <entry>Probe that fires when a sort operation is started.
      arg0 indicates heap, index or datum sort.
      arg1 is true for unique-value enforcement.
      arg2 is the number of key columns.
      arg3 is the number of kilobytes of work memory allowed.
      arg4 is true if random access to the sort result is required.</entry>
    </row>
    <row>
     <entry>sort-done</entry>
     <entry>(bool, long)</entry>
     <entry>Probe that fires when a sort is complete.
      arg0 is true for external sort, false for internal sort.
      arg1 is the number of disk blocks used for an external sort,
      or kilobytes of memory used for an internal sort.</entry>
    </row>
    <row>
     <entry>lwlock-acquire</entry>
     <entry>(LWLockId, LWLockMode)</entry>
     <entry>Probe that fires when an LWLock has been acquired.
      arg0 is the LWLock's ID.
      arg1 is the requested lock mode, either exclusive or shared.</entry>
    </row>
    <row>
     <entry>lwlock-release</entry>
     <entry>(LWLockId)</entry>
     <entry>Probe that fires when an LWLock has been released (but note
      that any released waiters have not yet been awakened).
      arg0 is the LWLock's ID.</entry>
    </row>
    <row>
     <entry>lwlock-wait-start</entry>
     <entry>(LWLockId, LWLockMode)</entry>
     <entry>Probe that fires when an LWLock was not immediately available and
      a server process has begun to wait for the lock to become available.
      arg0 is the LWLock's ID.
      arg1 is the requested lock mode, either exclusive or shared.</entry>
    </row>
    <row>
     <entry>lwlock-wait-done</entry>
     <entry>(LWLockId, LWLockMode)</entry>
     <entry>Probe that fires when a server process has been released from its
      wait for an LWLock (it does not actually have the lock yet).
      arg0 is the LWLock's ID.
      arg1 is the requested lock mode, either exclusive or shared.</entry>
    </row>
    <row>
     <entry>lwlock-condacquire</entry>
     <entry>(LWLockId, LWLockMode)</entry>
     <entry>Probe that fires when an LWLock was successfully acquired when the
      caller specified no waiting.
      arg0 is the LWLock's ID.
      arg1 is the requested lock mode, either exclusive or shared.</entry>
    </row>
    <row>
     <entry>lwlock-condacquire-fail</entry>
     <entry>(LWLockId, LWLockMode)</entry>
     <entry>Probe that fires when an LWLock was not successfully acquired when
      the caller specified no waiting.
      arg0 is the LWLock's ID.
      arg1 is the requested lock mode, either exclusive or shared.</entry>
    </row>
    <row>
     <entry>lock-wait-start</entry>
     <entry>(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE)</entry>
     <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
      has begun to wait because the lock is not available.
      arg0 through arg3 are the tag fields identifying the object being
      locked.  arg4 indicates the type of object being locked.
      arg5 indicates the lock type being requested.</entry>
    </row>
    <row>
     <entry>lock-wait-done</entry>
     <entry>(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE)</entry>
     <entry>Probe that fires when a request for a heavyweight lock (lmgr lock)
      has finished waiting (i.e., has acquired the lock).
      The arguments are the same as for lock-wait-start.</entry>
    </row>
    <row>
     <entry>deadlock-found</entry>
     <entry>()</entry>
     <entry>Probe that fires when a deadlock is found by the deadlock
      detector.</entry>
    </row>

   </tbody>
   </tgroup>
  </table>

 <table id="typedefs-table">
  <title>Defined Types Used in Probe Parameters</title>
  <tgroup cols="2">
   <thead>
    <row>
     <entry>Type</entry>
     <entry>Definition</entry>
    </row>
   </thead>

   <tbody>

    <row>
     <entry>LocalTransactionId</entry>
     <entry>unsigned int</entry>
    </row>
    <row>
     <entry>LWLockId</entry>
     <entry>int</entry>
    </row>
    <row>
     <entry>LWLockMode</entry>
     <entry>int</entry>
    </row>
    <row>
     <entry>LOCKMODE</entry>
     <entry>int</entry>
    </row>
    <row>
     <entry>BlockNumber</entry>
     <entry>unsigned int</entry>
    </row>
    <row>
     <entry>Oid</entry>
     <entry>unsigned int</entry>
    </row>
    <row>
     <entry>ForkNumber</entry>
     <entry>int</entry>
    </row>
    <row>
     <entry>bool</entry>
     <entry>char</entry>
    </row>

   </tbody>
   </tgroup>
  </table>


  </sect2>

  <sect2 id="using-trace-points">
   <title>Using Probes</title>

  <para>
   The example below shows a DTrace script for analyzing transaction
   counts in the system, as an alternative to snapshotting
   <structname>pg_stat_database</> before and after a performance test:
<programlisting>
#!/usr/sbin/dtrace -qs

postgresql$1:::transaction-start
{
      @start["Start"] = count();
      self->ts  = timestamp;
}

postgresql$1:::transaction-abort
{
      @abort["Abort"] = count();
}

postgresql$1:::transaction-commit
/self->ts/
{
      @commit["Commit"] = count();
      @time["Total time (ns)"] = sum(timestamp - self->ts);
      self->ts=0;
}
</programlisting>
   When executed, the example D script gives output such as:
<screen>
# ./txn_count.d `pgrep -n postgres` or ./txn_count.d &lt;PID&gt;
^C

Start                                          71
Commit                                         70
Total time (ns)                        2312105013
</screen>
  </para>
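  <para>
   The same pattern can be applied to other probes.  As a minimal sketch,
   not a tested script, the following measures time spent waiting for
   heavyweight locks using the lock-wait-start and lock-wait-done probes
   listed above.  The variable name <literal>self->wait_start</> and the
   aggregation labels are illustrative choices, not part of
   <productname>PostgreSQL</productname>:
<programlisting>
#!/usr/sbin/dtrace -qs

/* Record the time at which this backend begins waiting for a lock. */
postgresql$1:::lock-wait-start
{
      self->wait_start = timestamp;
}

/* On acquisition, count the wait and add up the elapsed time. */
postgresql$1:::lock-wait-done
/self->wait_start/
{
      @waits["Lock waits"] = count();
      @wtime["Total wait time (ns)"] = sum(timestamp - self->wait_start);
      self->wait_start = 0;
}
</programlisting>
   As in the transaction example, the predicate on lock-wait-done skips
   waits that began before tracing started.
  </para>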
  <para>
   Remember that DTrace scripts need to be carefully written and
   debugged; otherwise, the trace information collected might
   be meaningless.  In most cases where problems are found, it is the
   instrumentation that is at fault, not the underlying system.  When
   discussing information found using dynamic tracing, be sure to include
   the script used, so that it too can be checked and discussed.
  </para>
  <para>
   More example scripts can be found in the PgFoundry
   <ulink url="http://pgfoundry.org/projects/dtrace/">dtrace project</ulink>.
  </para>
  </sect2>

  <sect2 id="defining-trace-points">
   <title>Defining New Probes</title>

  <para>
   New probes can be defined within the code wherever the developer
   desires, though this will require a recompilation. Below are the steps
   for inserting new probes:
  </para>

  <procedure>
   <step>
    <para>
     Decide on probe names and data to be made available through the probes
    </para>
   </step>

   <step>
    <para>
     Add the probe definitions to <filename>src/backend/utils/probes.d</>
    </para>
   </step>

   <step>
    <para>
     Include <filename>pg_trace.h</> if it is not already present in the
     module(s) containing the probe points, and insert TRACE_POSTGRESQL
     probe macros at the desired locations in the source code
    </para>
   </step>

   <step>
    <para>
     Recompile and verify that the new probes are available
    </para>
   </step>
  </procedure>

  <formalpara>
   <title>Example:</title>
   <para>
    Here is an example of how you would add a probe to trace all new
    transactions by transaction ID.
   </para>
  </formalpara>

  <procedure>
   <step>
    <para>
     Decide that the probe will be named transaction-start and requires
     a parameter of type LocalTransactionId
    </para>
   </step>

   <step>
    <para>
     Add the probe definition to <filename>src/backend/utils/probes.d</>:
<programlisting>
      ...
      probe transaction__start(LocalTransactionId);
      ...
</programlisting>
     Note the use of the double underline in the probe name. In a DTrace
     script using the probe, the double underline needs to be replaced with a
     hyphen.
    </para>

    <para>
     You should take care that the data types specified for the probe
     parameters match the data types of the variables used in the macro.
     Otherwise, you will get compilation errors.
    </para>
   </step>

   <step>
    <para>
     At compile time, transaction__start is converted to a macro called
     TRACE_POSTGRESQL_TRANSACTION_START (note the underscores are single
     here), which is available by including <filename>pg_trace.h</>.
     Add the macro call to the appropriate location in the source code.
     In this case, it looks like the following:

<programlisting>
    TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);
</programlisting>
    </para>
   </step>

   <step>
    <para>
     After recompiling and running the new binary, check that your newly added
     probe is available by executing the following DTrace command.  You
     should see similar output:
<screen>
# dtrace -ln transaction-start
   ID    PROVIDER          MODULE           FUNCTION NAME
18705 postgresql49878     postgres     StartTransactionCommand transaction-start
18755 postgresql49877     postgres     StartTransactionCommand transaction-start
18805 postgresql49876     postgres     StartTransactionCommand transaction-start
18855 postgresql49875     postgres     StartTransactionCommand transaction-start
18986 postgresql49873     postgres     StartTransactionCommand transaction-start
</screen>
    </para>
   </step>
  </procedure>
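
  <para>
   Once the probe is listed, a D script can read its parameter.  The
   following is a minimal sketch, assuming the probe was defined with a
   single LocalTransactionId parameter as in the steps above; the message
   text is an illustrative choice.  It prints the local transaction ID
   (available as <literal>arg0</>) each time a transaction starts in the
   process whose PID is passed as the script's first argument:
<programlisting>
#!/usr/sbin/dtrace -qs

/* transaction__start in probes.d is referenced as transaction-start here. */
postgresql$1:::transaction-start
{
      printf("transaction started: local transaction ID %u\n", arg0);
}
</programlisting>
  </para>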

  </sect2>

 </sect1>

</chapter>