Commit 95985127 authored by Tom Lane

Update README with proposed new method for determining calling convention

of user-defined functions (forget 'C' vs 'newC', instead require an info
function to be present for new-style functions).  Also update some other
out-of-date commentary.
parent f6bc9867
Proposal for function-manager redesign 24-May-2000 Proposal for function-manager redesign 19-Nov-2000
-------------------------------------- --------------------------------------
We know that the existing mechanism for calling Postgres functions needs We know that the existing mechanism for calling Postgres functions needs
...@@ -24,10 +24,6 @@ can be done on an incremental file-by-file basis --- we won't need a ...@@ -24,10 +24,6 @@ can be done on an incremental file-by-file basis --- we won't need a
written in the old style can be left in place indefinitely, to provide written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions. backward compatibility for user-written C functions.
Note that neither the old function manager nor the redesign are intended
to handle functions that accept or return sets. Those sorts of functions
need to be handled by special querytree structures.
Changes in pg_proc (system data about a function) Changes in pg_proc (system data about a function)
------------------------------------------------- -------------------------------------------------
...@@ -37,7 +33,8 @@ This is a boolean value which will be TRUE if the function is "strict", ...@@ -37,7 +33,8 @@ This is a boolean value which will be TRUE if the function is "strict",
that is it always returns NULL when any of its inputs are NULL. The that is it always returns NULL when any of its inputs are NULL. The
function manager will check this field and skip calling the function when function manager will check this field and skip calling the function when
it's TRUE and there are NULL inputs. This allows us to remove explicit it's TRUE and there are NULL inputs. This allows us to remove explicit
NULL-value tests from many functions that currently need them. A function NULL-value tests from many functions that currently need them (not to
mention fixing many more that need them but don't have them). A function
that is not marked "strict" is responsible for checking whether its inputs that is not marked "strict" is responsible for checking whether its inputs
are NULL or not. Most builtin functions will be marked "strict". are NULL or not. Most builtin functions will be marked "strict".
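The strictness check can be pictured with a small self-contained sketch. Everything here is a simplified stand-in invented for illustration (SketchCallInfo, call_with_strict_check, and add_one are not the real fmgr definitions); it only shows the idea that the caller, not the function, tests for NULL inputs when the function is marked strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the real fmgr types (illustrative only). */
typedef uintptr_t Datum;

#define SKETCH_MAX_ARGS 8

typedef struct SketchCallInfo
{
    bool    isnull;                     /* set true if the result is NULL */
    short   nargs;                      /* number of arguments passed */
    Datum   arg[SKETCH_MAX_ARGS];       /* argument values */
    bool    argnull[SKETCH_MAX_ARGS];   /* true where an argument is NULL */
} SketchCallInfo;

typedef Datum (*SketchFunction) (SketchCallInfo *fcinfo);

/*
 * Hypothetical caller-side helper: if the function is strict and any
 * input is NULL, skip the call entirely and report a NULL result.
 */
static Datum
call_with_strict_check(SketchFunction fn, bool fn_strict,
                       SketchCallInfo *fcinfo)
{
    int     i;

    assert(fcinfo->nargs >= 0 && fcinfo->nargs <= SKETCH_MAX_ARGS);
    fcinfo->isnull = false;
    if (fn_strict)
    {
        for (i = 0; i < fcinfo->nargs; i++)
        {
            if (fcinfo->argnull[i])
            {
                fcinfo->isnull = true;
                return (Datum) 0;
            }
        }
    }
    return fn(fcinfo);
}

/* A strict function can omit NULL tests of its own. */
static Datum
add_one(SketchCallInfo *fcinfo)
{
    return fcinfo->arg[0] + 1;
}
```

A function that is not marked strict would be called even with NULL inputs, and would have to inspect argnull[] itself.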
...@@ -67,7 +64,9 @@ typedef struct ...@@ -67,7 +64,9 @@ typedef struct
Oid fn_oid; /* OID of function (NOT of handler, if any) */ Oid fn_oid; /* OID of function (NOT of handler, if any) */
short fn_nargs; /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */ short fn_nargs; /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
bool fn_strict; /* function is "strict" (NULL in => NULL out) */ bool fn_strict; /* function is "strict" (NULL in => NULL out) */
bool fn_retset; /* function returns a set (over multiple calls) */
void *fn_extra; /* extra space for use by handler */ void *fn_extra; /* extra space for use by handler */
MemoryContext fn_mcxt; /* memory context to store fn_extra in */
} FmgrInfo; } FmgrInfo;
For an ordinary built-in function, fn_addr is just the address of the C For an ordinary built-in function, fn_addr is just the address of the C
...@@ -79,8 +78,9 @@ to denote a not-yet-initialized FmgrInfo struct. fn_extra will always ...@@ -79,8 +78,9 @@ to denote a not-yet-initialized FmgrInfo struct. fn_extra will always
be NULL when an FmgrInfo is first filled by the function lookup code, but be NULL when an FmgrInfo is first filled by the function lookup code, but
a function handler could set it to avoid making repeated lookups of its a function handler could set it to avoid making repeated lookups of its
own when the same FmgrInfo is used repeatedly during a query.) fn_nargs own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
is the number of arguments expected by the function, and fn_strict is is the number of arguments expected by the function, fn_strict is its
its strictness flag. strictness flag, and fn_retset shows whether it returns a set; all of
these values come from the function's pg_proc entry.
FmgrInfo already exists in the current code, but has fewer fields. This FmgrInfo already exists in the current code, but has fewer fields. This
change should be transparent at the source-code level. change should be transparent at the source-code level.
...@@ -109,15 +109,17 @@ context is NULL for an "ordinary" function call, but may point to additional ...@@ -109,15 +109,17 @@ context is NULL for an "ordinary" function call, but may point to additional
info when the function is called in certain contexts. (For example, the info when the function is called in certain contexts. (For example, the
trigger manager will pass information about the current trigger event here.) trigger manager will pass information about the current trigger event here.)
If context is used, it should point to some subtype of Node; the particular If context is used, it should point to some subtype of Node; the particular
kind of context can then be indicated by the node type field. (A callee kind of context is indicated by the node type field. (A callee should
should always check the node type before assuming it knows what kind of always check the node type before assuming it knows what kind of context is
context is being passed.) fmgr itself puts no other restrictions on the use being passed.) fmgr itself puts no other restrictions on the use of this
of this field. field.
resultinfo is NULL when calling any function from which a simple Datum resultinfo is NULL when calling any function from which a simple Datum
result is expected. It may point to some subtype of Node if the function result is expected. It may point to some subtype of Node if the function
returns more than a Datum. Like the context field, resultinfo is a hook returns more than a Datum. (For example, resultinfo is used when calling a
for expansion; fmgr itself doesn't constrain the use of the field. function that returns a set, as discussed below.) Like the context field,
resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
of the field.
nargs, arg[], and argnull[] hold the arguments being passed to the function. nargs, arg[], and argnull[] hold the arguments being passed to the function.
Notice that all the arguments passed to a function (as well as its result Notice that all the arguments passed to a function (as well as its result
...@@ -257,27 +259,15 @@ types. Modules or header files that define specialized SQL datatypes ...@@ -257,27 +259,15 @@ types. Modules or header files that define specialized SQL datatypes
(eg, timestamp) should define appropriate macros for those types, so that (eg, timestamp) should define appropriate macros for those types, so that
functions manipulating the types can be coded in the standard style. functions manipulating the types can be coded in the standard style.
For non-primitive data types (particularly variable-length types) it For non-primitive data types (particularly variable-length types) it won't
probably won't be very practical to hide the pass-by-reference nature of be very practical to hide the pass-by-reference nature of the data type,
the data type, so the PG_GETARG and PG_RETURN macros for those types so the PG_GETARG and PG_RETURN macros for those types won't do much more
probably won't do more than DatumGetPointer/PointerGetDatum plus the than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
appropriate typecast. Functions returning such types will need to TOAST discussion, below). Functions returning such types will need to
palloc() their result space explicitly. I recommend naming the GETARG palloc() their result space explicitly. I recommend naming the GETARG and
and RETURN macros for such types to end in "_P", as a reminder that they RETURN macros for such types to end in "_P", as a reminder that they
produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *". produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *".
For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
data value. There might be a few cases where the still-toasted value is
wanted, but I am having a hard time coming up with examples. For the
moment I'd say that any such code could use a lower-level macro that is
just ((struct varlena *) DatumGetPointer(fcinfo->arg[n])).
Note: the above examples assume that arguments will be counted starting at
zero. We could have the ARG macros subtract one from the argument number,
so that arguments are counted starting at one. I'm not sure if that would be
more or less confusing. Does anyone have a strong feeling either way about
it?
When a function needs to access fcinfo->flinfo or one of the other auxiliary When a function needs to access fcinfo->flinfo or one of the other auxiliary
fields of FunctionCallInfo, it should just do it. I doubt that providing fields of FunctionCallInfo, it should just do it. I doubt that providing
syntactic-sugar macros for these cases is useful. syntactic-sugar macros for these cases is useful.
...@@ -319,10 +309,6 @@ that this style of coding cannot pass a NULL input value nor cope with ...@@ -319,10 +309,6 @@ that this style of coding cannot pass a NULL input value nor cope with
a NULL result (it couldn't before, either!). We can make the helper a NULL result (it couldn't before, either!). We can make the helper
routines elog an error if they see that the function returns a NULL. routines elog an error if they see that the function returns a NULL.
(Note: direct calls like this will have to be changed at the same time
that their called routines are changed to the new style. But that will
still be a lot less of a constraint than a "big bang" conversion.)
When invoking a function that has a known argument signature, we have When invoking a function that has a known argument signature, we have
usually written either usually written either
result = fmgr(targetfuncOid, ... args ... ); result = fmgr(targetfuncOid, ... args ... );
...@@ -349,6 +335,68 @@ have to change in the first step of implementation, but they can ...@@ -349,6 +335,68 @@ have to change in the first step of implementation, but they can
continue to support the same external appearance. continue to support the same external appearance.
Support for TOAST-able data types
---------------------------------
For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
data value. There might be a few cases where the still-toasted value is
wanted, but the vast majority of cases want the de-toasted result, so
that will be the default. To get the argument value without causing
de-toasting, use PG_GETARG_RAW_VARLENA_P(n).
Some functions require a modifiable copy of their input values. In these
cases, it's silly to do an extra copy step if we copied the data anyway
to de-TOAST it. Therefore, each toastable datatype has an additional
fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
guaranteed-fresh copy, combining this with the detoasting step if possible.
There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
pointer if and only if it is different from the original value of the n'th
argument. This can be used to free the de-toasted value of the n'th
argument, if it was actually de-toasted. Currently, doing this is not
necessary for the majority of functions because the core backend code
releases temporary space periodically, so that memory leaked in function
execution isn't a big problem. However, as of 7.1 memory leaks in
functions that are called by index searches will not be cleaned up until
end of transaction. Therefore, functions that are listed in pg_amop or
pg_amproc should be careful not to leak detoasted copies, and so these
functions do need to use PG_FREE_IF_COPY() for toastable inputs.
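To make the free-if-copy rule concrete, here is a toy model that compiles on its own. It does not use the real macros or the real varlena layout; SketchValue, sketch_detoast, sketch_free_if_copy, and sketch_length are hypothetical names, and malloc/free stand in for palloc/pfree. The point is only that the pointer is freed exactly when detoasting produced a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a toastable value; "toasted" means it must be expanded. */
typedef struct SketchValue
{
    bool    toasted;
    char    data[32];
} SketchValue;

static int  live_copies = 0;    /* detoasted copies not yet freed */

/* Returns the input itself when no detoasting is needed, else a fresh copy. */
static SketchValue *
sketch_detoast(SketchValue *orig)
{
    SketchValue *copy;

    if (!orig->toasted)
        return orig;
    copy = malloc(sizeof(SketchValue));
    assert(copy != NULL);
    memcpy(copy, orig, sizeof(SketchValue));
    copy->toasted = false;
    live_copies++;
    return copy;
}

/* Same rule as described for PG_FREE_IF_COPY: free only if ptr is a copy. */
static void
sketch_free_if_copy(SketchValue *ptr, SketchValue *orig)
{
    if (ptr != orig)
    {
        free(ptr);
        live_copies--;
    }
}

/* An index-support-style function that must not leak detoasted inputs. */
static int
sketch_length(SketchValue *arg)
{
    SketchValue *val = sketch_detoast(arg);
    int          len = (int) strlen(val->data);

    sketch_free_if_copy(val, arg);
    return len;
}
```

Calling sketch_length on either a plain or a "toasted" value leaves live_copies at zero, which is the property wanted for functions listed in pg_amop or pg_amproc.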
A function should never try to re-TOAST its result value; it should just
deliver an untoasted result that's been palloc'd in the current memory
context. When and if the value is actually stored into a tuple, the
tuple toaster will decide whether toasting is needed.
Functions accepting or returning sets
-------------------------------------
As of 7.1, Postgres has limited support for functions returning sets;
this is presently handled only in SELECT output expressions, and the
behavior is to generate a separate output tuple for each set element.
There is no direct support for functions accepting sets; instead, the
function will be called multiple times, once for each element of the
input set. This behavior will very likely be changed in future releases,
but here is how it works now:
If a function is marked in pg_proc as returning a set, then it is called
with fcinfo->resultinfo pointing to a node of type ReturnSetInfo. A
function that desires to return a set should raise an error "called in
context that does not accept a set result" if resultinfo is NULL or does
not point to a ReturnSetInfo node. ReturnSetInfo contains a single field
"isDone", which should be set to one of these values:
    ExprSingleResult      /* expression does not return a set */
    ExprMultipleResult    /* this result is an element of a set */
    ExprEndResult         /* there are no more elements in the set */
A function returning set returns one set element per call, setting
fcinfo->resultinfo->isDone to ExprMultipleResult for each element.
After all elements have been returned, the next call should set
isDone to ExprEndResult and return a null result. (Note it is possible
to return an empty set by doing this on the first call.)
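The call protocol for a set-returning function might look roughly like the following sketch. The struct layouts are simplified stand-ins (the Sketch* types, the error field used in place of raising an error, and the functions count_to_three and sum_set are all assumptions made for illustration); only the three isDone values are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;

typedef enum
{
    ExprSingleResult,       /* expression does not return a set */
    ExprMultipleResult,     /* this result is an element of a set */
    ExprEndResult           /* there are no more elements in the set */
} ExprDoneCond;

/* Simplified stand-ins for ReturnSetInfo and the call info struct. */
typedef struct SketchReturnSetInfo
{
    ExprDoneCond isDone;
} SketchReturnSetInfo;

typedef struct SketchCallInfo
{
    SketchReturnSetInfo *resultinfo;    /* NULL if caller wants a plain Datum */
    bool        isnull;                 /* result is NULL */
    void       *fn_extra;               /* per-call-site state (here: an int) */
    const char *error;                  /* stands in for raising an error */
} SketchCallInfo;

/* Hypothetical set-returning function: yields 1, 2, 3, one per call. */
static Datum
count_to_three(SketchCallInfo *fcinfo)
{
    int    *next = (int *) fcinfo->fn_extra;

    if (fcinfo->resultinfo == NULL)
    {
        fcinfo->error = "called in context that does not accept a set result";
        return (Datum) 0;
    }
    assert(next != NULL);
    if (*next < 3)
    {
        (*next)++;
        fcinfo->resultinfo->isDone = ExprMultipleResult;
        fcinfo->isnull = false;
        return (Datum) *next;
    }
    /* All elements delivered: signal the end and return a null result. */
    fcinfo->resultinfo->isDone = ExprEndResult;
    fcinfo->isnull = true;
    return (Datum) 0;
}

/* Caller side: keep calling until ExprEndResult, summing the elements. */
static int
sum_set(SketchCallInfo *fcinfo)
{
    int     sum = 0;

    for (;;)
    {
        Datum   d = count_to_three(fcinfo);

        if (fcinfo->error != NULL ||
            fcinfo->resultinfo->isDone == ExprEndResult)
            break;
        sum += (int) d;
    }
    return sum;
}
```

If the counter already stood at 3 on the first call, the function would report ExprEndResult immediately, which is the empty-set case mentioned above.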
Notes about function handlers Notes about function handlers
----------------------------- -----------------------------
...@@ -361,49 +409,91 @@ function is invoked many times. (fn_extra can only be used as a hint, ...@@ -361,49 +409,91 @@ function is invoked many times. (fn_extra can only be used as a hint,
since callers are not required to re-use an FmgrInfo struct. since callers are not required to re-use an FmgrInfo struct.
But in performance-critical paths they normally will do so.) But in performance-critical paths they normally will do so.)
Issue: in what context should a handler allocate memory that it intends
to use for fn_extra data? The current palloc context when the handler
is actually called might be considerably shorter-lived than the FmgrInfo
struct, which would lead to dangling-pointer problems at the next use
of the FmgrInfo. Perhaps FmgrInfo should also store a memory context
identifier that the handler could use to allocate space of the right
lifespan. (Having fmgr_info initialize this to CurrentMemoryContext
should work in nearly all cases, though a few places might have to
set it differently.) At the moment I have not done this, since the
existing PL handlers only need to set fn_extra to point at long-lived
structures (data in their own caches) and don't really care which
context the FmgrInfo is in anyway.

Are there any other things needed by the call handlers for PL/pgsql and
other languages?

During the conversion process, support for old-style builtin functions
and old-style user-written C functions will be provided by appropriate
function handlers. For example, the handler for old-style builtins
looks roughly like fmgr_c() used to.

System table updates
--------------------

In the initial phase, two new entries will be added to pg_language
for language types "newinternal" and "newC", corresponding to
builtin and dynamically-loaded functions having the new calling
convention.

There will also be a change to pg_proc to add the new "proisstrict"
column.

Then pg_proc entries will be changed from language code "internal" to
"newinternal" piecemeal, as the associated routines are rewritten.
(This will imply several rounds of forced initdbs as the contents of
pg_proc change, but I think we can live with that.)

The old language names "internal" and "C" will continue to refer to
functions with the old calling convention. We should deprecate
old-style functions because of their portability problems, but the
support for them will only be one small function handler routine,
so we can leave them in place for as long as necessary.

The expected calling convention for PL call handlers will need to change
all-at-once, but fortunately there are not very many of them to fix.

If the handler wants to allocate memory to hold fn_extra data, it should
NOT do so in CurrentMemoryContext, since the current context may well be
much shorter-lived than the context where the FmgrInfo is. Instead,
allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
context. fn_mcxt normally points at the context that was
CurrentMemoryContext at the time the FmgrInfo structure was created;
in any case it is required to be a context at least as long-lived as the
FmgrInfo itself.

Telling the difference between old- and new-style functions
-----------------------------------------------------------

During the conversion process, we carried two different pg_language
entries, "internal" and "newinternal", for internal functions. The
function manager used the language code to distinguish which calling
convention to use. (Old-style internal functions were supported via
a function handler.) As of Nov. 2000, no old-style internal functions
remain, so we can drop support for them. We will remove the old "internal"
pg_language entry and rename "newinternal" to "internal".

The interim solution for dynamically-loaded compiled functions has been
similar: two pg_language entries "C" and "newC". This naming convention
is not desirable for the long run, and yet we cannot stop supporting
old-style user functions. Instead, it seems better to use just one
pg_language entry "C", and require the dynamically-loaded library to
provide additional information that identifies new-style functions.
This avoids compatibility problems --- for example, existing dump
scripts will identify PL language handlers as being in language "C",
which would be wrong under the "newC" convention. Also, this approach
should generalize more conveniently for future extensions to the function
interface specification.

Given a dynamically loaded function named "foo" (note that the name being
considered here is the link-symbol name, not the SQL-level function name),
the function manager will look for another function in the same dynamically
loaded library named "pg_finfo_foo". If this second function does not
exist, then foo is assumed to be called old-style, thus ensuring backwards
compatibility with existing libraries. If the info function does exist,
it is expected to have the signature

    Pg_finfo_record * pg_finfo_foo (void);

The info function will be called by the fmgr, and must return a pointer
to a Pg_finfo_record struct. (The returned struct will typically be a
statically allocated constant in the dynamic-link library.) The current
definition of the struct is just
    typedef struct {
        int api_version;
    } Pg_finfo_record;
where api_version is 0 to indicate old-style or 1 to indicate new-style
calling convention. In future releases, additional fields may be defined
after api_version, but these additional fields will only be used if
api_version is greater than 1.
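The lookup rule can be sketched as follows. This is not the fmgr code: the symbol table, sketch_lookup, and sketch_calling_convention are hypothetical stand-ins for the dynamic loader and the function manager, and rejecting unrecognized api_version values with -1 is an assumption of the sketch rather than something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    int     api_version;
} Pg_finfo_record;

typedef Pg_finfo_record *(*InfoFunction) (void);

/* Toy symbol table standing in for a dynamically loaded library. */
typedef struct SketchSymbol
{
    const char  *name;
    InfoFunction info;
} SketchSymbol;

/* Info function for "foo"; the library has none for any other name. */
static Pg_finfo_record *
pg_finfo_foo(void)
{
    static Pg_finfo_record my_finfo = { 1 };

    return &my_finfo;
}

static const SketchSymbol sketch_library[] = {
    { "pg_finfo_foo", pg_finfo_foo }
};

/* Stand-in for the dynamic-linker symbol lookup. */
static InfoFunction
sketch_lookup(const char *symbol)
{
    size_t  i;

    for (i = 0; i < sizeof(sketch_library) / sizeof(sketch_library[0]); i++)
    {
        if (strcmp(sketch_library[i].name, symbol) == 0)
            return sketch_library[i].info;
    }
    return NULL;
}

/* Returns 0 for old-style, 1 for new-style, -1 for an unrecognized record. */
static int
sketch_calling_convention(const char *funcname)
{
    char            symbol[128];
    InfoFunction    info;
    const Pg_finfo_record *rec;

    assert(strlen(funcname) + sizeof("pg_finfo_") <= sizeof(symbol));
    strcpy(symbol, "pg_finfo_");
    strcat(symbol, funcname);

    info = sketch_lookup(symbol);
    if (info == NULL)
        return 0;               /* no info function: assume old-style */

    rec = info();
    if (rec == NULL)
        return -1;
    if (rec->api_version == 0 || rec->api_version == 1)
        return rec->api_version;
    return -1;                  /* version this sketch does not know */
}
```

With this table, "foo" is classified as new-style and any other name falls back to old-style, which is what keeps existing libraries working unchanged.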
These details will be hidden from the author of a dynamically loaded
function by using a macro. To define a new-style dynamically loaded
function named foo, write
    PG_FUNCTION_INFO_V1(foo);

    Datum
    foo(PG_FUNCTION_ARGS)
    {
        ...
    }
The function itself is written using the same conventions as for new-style
internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
Note that old-style and new-style functions can be intermixed in the same
library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
each one.
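For readers wondering what such a macro has to generate, here is one plausible shape, written as a standalone sketch. SKETCH_FUNCTION_INFO_V1 is a hypothetical name and its expansion is an assumption, not the definition shipped in the fmgr headers; Datum and the one-argument signature of foo are simplified so the example compiles by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real definitions live in the fmgr headers. */
typedef uintptr_t Datum;

typedef struct
{
    int     api_version;
} Pg_finfo_record;

/*
 * Hypothetical expansion of the info macro: define pg_finfo_<name>()
 * returning a statically allocated record with api_version 1.  The
 * trailing declaration only exists to absorb the caller's semicolon.
 */
#define SKETCH_FUNCTION_INFO_V1(funcname) \
    Pg_finfo_record *pg_finfo_##funcname(void); \
    Pg_finfo_record *pg_finfo_##funcname(void) \
    { \
        static Pg_finfo_record my_finfo = { 1 }; \
        return &my_finfo; \
    } \
    extern int sketch_swallow_semicolon

/* New-style function: tagged by the macro, then written as usual. */
SKETCH_FUNCTION_INFO_V1(foo);

Datum foo(Datum arg);

Datum
foo(Datum arg)
{
    return arg + 1;
}

/* Check a library author could run: the macro really tagged foo. */
static int
foo_is_new_style(void)
{
    Pg_finfo_record *rec = pg_finfo_foo();

    assert(rec != NULL);
    return rec->api_version == 1;
}
```

A function in the same file without the macro line would simply have no pg_finfo_ companion, and so would be treated as old-style.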
The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
foo ... LANGUAGE 'C' regardless of whether it is old- or new-style.
New-style dynamic functions will be invoked directly by fmgr, and will
therefore have the same performance as internal functions after the initial
pg_proc lookup overhead. Old-style dynamic functions will be invoked via
a handler, and will therefore have a small performance penalty.
To allow old-style dynamic functions to work safely on toastable datatypes,
the handler for old-style functions will automatically detoast toastable
arguments before passing them to the old-style function. A new-style
function is expected to take care of toasted arguments by using the
standard argument access macros defined above.