PL/pgSQL A procedural language for the PostgreSQL RDBMS Jan Wieck <jwieck@debis.com> Preface PL/pgSQL is a procedural language based on SQL designed for the PostgreSQL database system. The extensibility features of PostgreSQL are mostly based on the ability to define functions for various operations. Functions could have been written in PostgreSQL's SQL dialect or in the C programming language. Functions written in C are compiled into a shared object and loaded by the database backend process on demand. Also the trigger features of PostgreSQL are based on functions but required the use of the C language. Since version 6.3 PostgreSQL supports the definition of procedural languages. In the case of a function or trigger procedure defined in a procedural language, the database has no builtin knowlege how to interpret the functions source text. Instead, the function and trigger calls are passed into a handler that knows the details of the language. The handler itself is a function compiled into a shared object and loaded on demand. Overview The PL/pgSQL language is case insensitive. All keywords and identifiers can be used in upper-/lowercase mixed. PL/pgSQL is a block oriented language. A block is defined as [<<label>>] [DECLARE -- declarations] BEGIN -- statements END; There can be any number of subblocks in the statements section of a block. Subblocks can be used to hide variables from outside a block of statements (see Scope and visability below). The variables declared in the declarations section preceding a block are initialized to their default values every time the block is entered, not only once per function call. It is important not to misunderstand the meaning of BEGIN/END for grouping statements in PL/pgSQL and the database commands for transaction control. Functions or trigger procedures cannot start or commit transactions and PostgreSQL transactions cannot have subtransactions. Comments There are two types of comments in PL/pgSQL. A double dash '--' starts a comment that extends to the end of the line. A '/*' starts a block comment that extends to the next '*/'. Block comments cannot be nested, but double dash comments can be enclosed into a block comment and double dashes can hide '/*' and '*/'. Declarations All variables, rows and records used in a block or it's subblocks must be declared in the declarations section of the block except for the loop variable of a FOR loop iterating over a range of integer values. The parameters given to the function are automatically declared with the usual identifiers $n. The declarations have the following syntax: <name> [CONSTANT] <type> [NOT NULL] [DEFAULT | := <value>]; Declares a variable of the specified type. If the variable is declared as CONSTANT, the value cannot be changed. If NOT NULL is specified, an assignment of a NULL value results in a runtime error. Since the default value of a variable is the SQL NULL value, all variables declared as NOT NULL must also have a default value. The default value is evaluated at the actual function call. So assigning 'now' to an abstime variable causes the variable to have the time of the actual function call, not when the function was compiled (during it's first call since the lifetime of the database connection). <name> <class>%ROWTYPE; Declares a row with the structure of the given class. Class must be an existing table- or viewname of the database. The fields of the row are accessed in the dot notation. Parameters to a procedure could be tuple types. In that case the corresponding identifier $n will be a rowtype. Only the user attributes of a tuple are accessible in the row. There must be no whitespaces between the classname, the percent and the ROWTYPE keyword. The fields of the rowtype inherit the tables fieldsizes for char() etc. data types (atttypmod from pg_attribute). <name> RECORD; Records are similar to rowtypes, but they have no predefined structure They are used in selections and FOR loops to hold one actual database tuple from a select operation. One and the same record can be used in different selections. Accessing a record or an attempt to assign a value to a record field when there's no actual tuple in it results in a runtime error. The new and old tuples in triggers are given to the trigger procedure as records. This is necessary, because under PostgreSQL one and the same trigger procedure can handle trigger events for different tables. <name> ALIAS FOR $n; For better readability of the code it's possible to define an alias for a positional parameter to the function. RENAME <oldname> TO <newname>; Change the name of a variable, record or rowtype. This is useful if new or old should be referenced by another name inside a trigger procedure. Datatypes The type of a variable can be any of the existing data types of the database. <type> above is defined as: postgesql-basetype or variable%TYPE or rowtype.field%TYPE or class.field%TYPE As for the rowtype declaration, there must be no whitespaces between the classname, the percent and the TYPE keyword. Expressions All expressions used in PL/pgSQL statements are processed using the backends executor. Since even a constant looking expression can have a totally different meaning for a particular data type (as 'now' for abstime), it is impossible for the PL/pgSQL parser to identify real constant values other than the NULL keyword. The expressions are evaluated by internally executing a query SELECT <expr> over the SPI manager. In the expression, occurences of variable identifiers are substituted by parameters and the actual values from the variables are passed to the executor as query parameters. All the expressions used in a PL/pgSQL function are only prepared and saved once. If record fields are used in expressions or database statements, the data types of the fields should not change between calls of one and the same expression. Keep this in mind when writing trigger procedures that handle events for more than one table. Statements Anything not understood by the parser as specified below will be put into a query and sent down to the database engine to execute. The resulting query should not return any data (insert, update, delete queries and all utility statements). Assignment An assignment of a value to a variable or rowtype field is written as: <identifier> := <expr>; If the expressions result data type doesn't match the variables data type, or the variables atttypmod value is known (as for char(20)), the result value will be implicitly casted by the PL/pgSQL executor using the result types output- and the variables type input- functions. Note that this could potentially result in runtime errors generated by the types input functions. An assignment of a complete selection into a record or rowtype can be done as: SELECT expressions INTO <target> FROM fromlist; Target can be a record or rowtype variable, or a comma separated list of variables and record/row fields. If a rowtype or a variable list is used as target, the selected values must exactly match the structure of the target(s) or a runtime error occurs. The fromlist can be followed by any valid qualification, grouping, sorting etc. There is a special condition [NOT] FOUND that can be used immediately after a SELECT INTO to check if the data has been found. SELECT * INTO myrec FROM EMP WHERE empname = myname; IF NOT FOUND THEN RAISE EXCEPTION 'employee % not found', myname; END IF; If the selection returns multiple rows, only the first is moved into the target fields. All others are discarded. Calling another function If a function should be called, this is normally done by a SELECT query. But there are cases where someone isn't interested in the functions result. PERFORM querystring; executes a 'SELECT querystring' over the SPI manager and discards the result. Returning from the function RETURN <expr>; The function terminates and the value of <expr> will be returned to the upper executor. The return value of a function cannot be undefined. If control reaches the end of the toplevel block of the function without hitting a RETURN statement, a runtime error will occur. Aborting and messages As indicated above there is an RAISE statement that can throw messages into the PostgreSQL elog mechanism. RAISE level 'format' [, identifier [...]]; Inside the format, % can be used as a placeholder for the following, comma separated identifiers. The identifiers must specify an existing variable or row/record field. Conditionals IF <expr> THEN -- statements [ELSE -- statements] END IF; The expression <expr> must return a value that at least can be casted into a boolean. Loops There are multiple types of loops. [<<label>>] LOOP -- statements END LOOP; An unconditional loop that must be terminated explicitly by an EXIT statement. The optional label can be used by EXIT statements of nested loops to specify which level of nesting should be terminated. [<<label>>] WHILE <expr> LOOP -- statements END LOOP; A conditional loop that is executed as long as the evaluation of <expr> returns true. [<<label>>] FOR <name> IN [REVERSE] <expr>..<expr> LOOP -- statements END LOOP. A loop that iterates over a range of integer values. The variable <name> is automatically created as type integer and exists only inside the loop. The two expressions giving the lower and upper bound of the range are evaluated only when entering the loop. The iteration step is 1. FOR <recname|rowname> IN <select_clause> LOOP -- statements END LOOP; The record or row is assigned all the rows resulting from the select clause and the statements executed for each. If the loop is terminated with an EXIT statement, the last accessed row is still accessible in the record or rowtype. EXIT [label] [WHEN <expr>]; If no label given, the innermost loop is terminated and the statement following END LOOP is executed next. If label is given, it must be the label of the current or an upper level of nested loops or blocks. Then the named loop or block is terminated and control continues with the statement after the loops/blocks corresponding END. Trigger procedures PL/pgSQL can also be used to define trigger procedures. They are created using CREATE FUNCTION as a function with no arguments and a return type of opaque. There are some PostgreSQL specific details in functions used as trigger procedures. First they have some special variables created above the toplevel statement block. These are: new (record) The new database tuple on INSERT/UPDATE operations at ROW level. old (record) The old database tuple on UPDATE/DELETE operations at ROW level. tg_name (type name) The triggers name from pg_trigger. tg_when (type text) A string of either 'BEFORE' or 'AFTER' depending on the triggers definition. tg_level (type text) A string of either 'ROW' or 'STATEMENT' depending on the triggers definition. tg_op (type text) A string of 'INSERT', 'UPDATE' or 'DELETE' telling for which operation the trigger is actually fired. tg_relid (type oid) The Oid of the relation for which the trigger is actually fired. tg_relname (type name) The relations name for which the trigger is actually fired. tg_nargs (type integer) The number of arguments given to the trigger procedure in the CREATE TRIGGER statement. tg_argv[] (types text) The arguments from the CREATE TRIGGER statement. The index counts from 0 and can be given as expression. Invalid indices (< 0 or >= tg_nargs) result in a NULL value. Second, they must return either NULL, or a record/rowtype containing exactly the structure of the table the trigger was fired for. Triggers fired AFTER might allways return NULL with no effect. Triggers fired BEFORE signal the trigger manager to skip the operation for this actual row when returning NULL. Otherwise, the returned record/rowtype replaces the inserted/updated tuple in the operation. It is possible to replace single values directly in new and return that, or to build a complete new record/rowtype to return. Exceptions PostgreSQL doesn't have a very smart exception handling model. Whenever the parser, planner/optimizer or executor decide that a statement cannot be processed any longer, the whole transaction gets aborted and the the system jumps back into the mainloop using longjmp() to get the next query from the client application. It is possible to hook into the longjmp() mechanism to notice that this happens. But currently it's impossible to tell what really caused the abort (input/output conversion error, floating point error, parse error) And it's possible that the backend is in an inconsistent state at this point so returning to the upper executor or issuing more commands might corrupt the whole database. Thus, the only thing PL/pgSQL currently does when it encounters an abort during execution of a function or trigger procedure is to write some additional DEBUG log messages telling in which function and where (line number and type of statement) this happened. This might change in the future.