Update README with proposed new method for determining calling convention
of user-defined functions (forget 'C' vs 'newC', instead require an info function to be present for new-style functions). Also update some other out-of-date commentary.

Commit 95985127 authored Nov 19, 2000 by Tom Lane

Showing 1 changed file with 173 additions and 83 deletions

src/backend/utils/fmgr/README +173 -83

--- a/src/backend/utils/fmgr/README
+++ b/src/backend/utils/fmgr/README
-Proposal for function-manager redesign			24-May-2000
+Proposal for function-manager redesign			19-Nov-2000
 --------------------------------------

 We know that the existing mechanism for calling Postgres functions needs
@@ -24,10 +24,6 @@ can be done on an incremental file-by-file basis --- we won't need a
 written in the old style can be left in place indefinitely, to provide
 backward compatibility for user-written C functions.

-Note that neither the old function manager nor the redesign are intended
-to handle functions that accept or return sets.  Those sorts of functions
-need to be handled by special querytree structures.
-

 Changes in pg_proc (system data about a function)
 -------------------------------------------------
@@ -37,7 +33,8 @@ This is a boolean value which will be TRUE if the function is "strict",
 that is it always returns NULL when any of its inputs are NULL.  The
 function manager will check this field and skip calling the function when
 it's TRUE and there are NULL inputs.  This allows us to remove explicit
-NULL-value tests from many functions that currently need them.  A function
+NULL-value tests from many functions that currently need them (not to
+mention fixing many more that need them but don't have them).  A function
 that is not marked "strict" is responsible for checking whether its inputs
 are NULL or not.  Most builtin functions will be marked "strict".
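
The strictness rule above can be illustrated with a small standalone sketch.
It is not part of the patch and does not use the real fmgr definitions: the
Sketch* types, SKETCH_MAX_ARGS, sketch_int_add and sketch_invoke are all
hypothetical stand-ins. It only shows the idea that the caller, not the
function body, filters out NULL inputs when the function is marked strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ARGS 4           /* hypothetical limit for this sketch */

/* Stand-in for a Datum-style value; not the real PostgreSQL typedef. */
typedef long SketchDatum;

typedef struct SketchCall
{
    int         nargs;
    SketchDatum arg[SKETCH_MAX_ARGS];
    bool        argnull[SKETCH_MAX_ARGS];
    bool        isnull;             /* set to true when the result is NULL */
} SketchCall;

typedef SketchDatum (*SketchFunc) (SketchCall *call);

typedef struct SketchFuncInfo
{
    SketchFunc  fn_addr;
    bool        fn_strict;          /* mirrors the pg_proc strictness flag */
} SketchFuncInfo;

/* A strict function body: no NULL tests needed, the caller filters NULLs. */
static SketchDatum
sketch_int_add(SketchCall *call)
{
    return call->arg[0] + call->arg[1];
}

/*
 * Call a function through its info struct.  For a strict function with any
 * NULL input, skip the call entirely and report a NULL result.
 */
static SketchDatum
sketch_invoke(const SketchFuncInfo *info, SketchCall *call)
{
    int         i;

    assert(call->nargs >= 0 && call->nargs <= SKETCH_MAX_ARGS);
    call->isnull = false;
    if (info->fn_strict)
    {
        for (i = 0; i < call->nargs; i++)
        {
            if (call->argnull[i])
            {
                call->isnull = true;
                return 0;           /* value is ignored when isnull is set */
            }
        }
    }
    return info->fn_addr(call);
}
```

With both arguments non-NULL the sketch calls sketch_int_add normally; as soon
as one argnull[] entry is true, sketch_invoke returns without calling it and
sets isnull.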

@@ -67,7 +64,9 @@ typedef struct
    Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
    short       fn_nargs;   /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
    bool        fn_strict;  /* function is "strict" (NULL in => NULL out) */
+    bool        fn_retset;  /* function returns a set (over multiple calls) */
    void       *fn_extra;   /* extra space for use by handler */
+    MemoryContext fn_mcxt;  /* memory context to store fn_extra in */
 } FmgrInfo;

 For an ordinary built-in function, fn_addr is just the address of the C
@@ -79,8 +78,9 @@ to denote a not-yet-initialized FmgrInfo struct.  fn_extra will always
 be NULL when an FmgrInfo is first filled by the function lookup code, but
 a function handler could set it to avoid making repeated lookups of its
 own when the same FmgrInfo is used repeatedly during a query.)  fn_nargs
-is the number of arguments expected by the function, and fn_strict is
-its strictness flag.
+is the number of arguments expected by the function, fn_strict is its
+strictness flag, and fn_retset shows whether it returns a set; all of
+these values come from the function's pg_proc entry.

 FmgrInfo already exists in the current code, but has fewer fields.  This
 change should be transparent at the source-code level.
@@ -109,15 +109,17 @@ context is NULL for an "ordinary" function call, but may point to additional
 info when the function is called in certain contexts.  (For example, the
 trigger manager will pass information about the current trigger event here.)
 If context is used, it should point to some subtype of Node; the particular
-kind of context can then be indicated by the node type field.  (A callee
-should always check the node type before assuming it knows what kind of
-context is being passed.)  fmgr itself puts no other restrictions on the use
-of this field.
+kind of context is indicated by the node type field.  (A callee should
+always check the node type before assuming it knows what kind of context is
+being passed.)  fmgr itself puts no other restrictions on the use of this
+field.

 resultinfo is NULL when calling any function from which a simple Datum
 result is expected.  It may point to some subtype of Node if the function
-returns more than a Datum.  Like the context field, resultinfo is a hook
-for expansion; fmgr itself doesn't constrain the use of the field.
+returns more than a Datum.  (For example, resultinfo is used when calling a
+function that returns a set, as discussed below.)  Like the context field,
+resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
+of the field.

 nargs, arg[], and argnull[] hold the arguments being passed to the function.
 Notice that all the arguments passed to a function (as well as its result
@@ -257,27 +259,15 @@ types.  Modules or header files that define specialized SQL datatypes
 (eg, timestamp) should define appropriate macros for those types, so that
 functions manipulating the types can be coded in the standard style.

-For non-primitive data types (particularly variable-length types) it
-probably won't be very practical to hide the pass-by-reference nature of
-the data type, so the PG_GETARG and PG_RETURN macros for those types
-probably won't do more than DatumGetPointer/PointerGetDatum plus the
-appropriate typecast.  Functions returning such types will need to
-palloc() their result space explicitly.  I recommend naming the GETARG
-and RETURN macros for such types to end in "_P", as a reminder that they
+For non-primitive data types (particularly variable-length types) it won't
+be very practical to hide the pass-by-reference nature of the data type,
+so the PG_GETARG and PG_RETURN macros for those types won't do much more
+than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
+TOAST discussion, below).  Functions returning such types will need to
+palloc() their result space explicitly.  I recommend naming the GETARG and
+RETURN macros for such types to end in "_P", as a reminder that they
 produce or take a pointer.  For example, PG_GETARG_TEXT_P yields "text *".

-For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
-data value.  There might be a few cases where the still-toasted value is
-wanted, but I am having a hard time coming up with examples.  For the
-moment I'd say that any such code could use a lower-level macro that is
-just ((struct varlena *) DatumGetPointer(fcinfo->arg[n])).
-
-Note: the above examples assume that arguments will be counted starting at
-zero.  We could have the ARG macros subtract one from the argument number,
-so that arguments are counted starting at one.  I'm not sure if that would be
-more or less confusing.  Does anyone have a strong feeling either way about
-it?
-
 When a function needs to access fcinfo->flinfo or one of the other auxiliary
 fields of FunctionCallInfo, it should just do it.  I doubt that providing
 syntactic-sugar macros for these cases is useful.
@@ -319,10 +309,6 @@ that this style of coding cannot pass a NULL input value nor cope with
 a NULL result (it couldn't before, either!).  We can make the helper
 routines elog an error if they see that the function returns a NULL.

-(Note: direct calls like this will have to be changed at the same time
-that their called routines are changed to the new style.  But that will
-still be a lot less of a constraint than a "big bang" conversion.)
-
 When invoking a function that has a known argument signature, we have
 usually written either
 	result = fmgr(targetfuncOid, ... args ... );
@@ -349,6 +335,68 @@ have to change in the first step of implementation, but they can
 continue to support the same external appearance.


+Support for TOAST-able data types
+---------------------------------
+
+For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
+data value.  There might be a few cases where the still-toasted value is
+wanted, but the vast majority of cases want the de-toasted result, so
+that will be the default.  To get the argument value without causing
+de-toasting, use PG_GETARG_RAW_VARLENA_P(n).
+
+Some functions require a modifiable copy of their input values.  In these
+cases, it's silly to do an extra copy step if we copied the data anyway
+to de-TOAST it.  Therefore, each toastable datatype has an additional
+fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
+guaranteed-fresh copy, combining this with the detoasting step if possible.
+
+There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
+pointer if and only if it is different from the original value of the n'th
+argument.  This can be used to free the de-toasted value of the n'th
+argument, if it was actually de-toasted.  Currently, doing this is not
+necessary for the majority of functions because the core backend code
+releases temporary space periodically, so that memory leaked in function
+execution isn't a big problem.  However, as of 7.1 memory leaks in
+functions that are called by index searches will not be cleaned up until
+end of transaction.  Therefore, functions that are listed in pg_amop or
+pg_amproc should be careful not to leak detoasted copies, and so these
+functions do need to use PG_FREE_IF_COPY() for toastable inputs.
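
The free-if-copy idea can be modeled outside the backend with a short sketch.
It is not part of the patch. Everything named Sketch*/sketch_* is a
hypothetical stand-in: a "compressed" flag plays the role of a toasted value,
malloc/free play the role of palloc/pfree, and the macro takes the call struct
explicitly instead of relying on an fcinfo variable in scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a stored value; "compressed" plays the role of "toasted". */
typedef struct SketchValue
{
    bool        compressed;
    char        data[32];
} SketchValue;

typedef struct SketchCall
{
    void       *arg[2];
} SketchCall;

static int  sketch_live_copies = 0; /* counts copies not yet freed */

/* Return the argument as-is if it is plain, else a fresh expanded copy. */
static SketchValue *
sketch_getarg(SketchCall *call, int n)
{
    SketchValue *orig = (SketchValue *) call->arg[n];
    SketchValue *copy;

    assert(orig != NULL);
    if (!orig->compressed)
        return orig;
    copy = malloc(sizeof(SketchValue));
    if (copy == NULL)
        abort();
    memcpy(copy, orig, sizeof(SketchValue));
    copy->compressed = false;
    sketch_live_copies++;
    return copy;
}

/* Free ptr only if it differs from the original n'th argument. */
#define SKETCH_FREE_IF_COPY(call, ptr, n) \
    do { \
        if ((void *) (ptr) != (call)->arg[n]) \
        { \
            free(ptr); \
            sketch_live_copies--; \
        } \
    } while (0)

/* A comparison function of the kind an index search might call. */
static int
sketch_compare(SketchCall *call)
{
    SketchValue *a = sketch_getarg(call, 0);
    SketchValue *b = sketch_getarg(call, 1);
    int         result = strcmp(a->data, b->data);

    SKETCH_FREE_IF_COPY(call, a, 0);
    SKETCH_FREE_IF_COPY(call, b, 1);
    return result;
}
```

The comparison leaves the caller's values untouched and releases only the
copies it caused to be made, which is the behavior wanted for functions
reachable from index searches.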
+
+A function should never try to re-TOAST its result value; it should just
+deliver an untoasted result that's been palloc'd in the current memory
+context.  When and if the value is actually stored into a tuple, the
+tuple toaster will decide whether toasting is needed.
+
+
+Functions accepting or returning sets
+-------------------------------------
+
+As of 7.1, Postgres has limited support for functions returning sets;
+this is presently handled only in SELECT output expressions, and the
+behavior is to generate a separate output tuple for each set element.
+There is no direct support for functions accepting sets; instead, the
+function will be called multiple times, once for each element of the
+input set.  This behavior will very likely be changed in future releases,
+but here is how it works now:
+
+If a function is marked in pg_proc as returning a set, then it is called
+with fcinfo->resultinfo pointing to a node of type ReturnSetInfo.  A
+function that desires to return a set should raise an error "called in
+context that does not accept a set result" if resultinfo is NULL or does
+not point to a ReturnSetInfo node.  ReturnSetInfo contains a single field
+"isDone", which should be set to one of these values:
+
+    ExprSingleResult             /* expression does not return a set */
+    ExprMultipleResult           /* this result is an element of a set */
+    ExprEndResult                /* there are no more elements in the set */
+
+A function returning set returns one set element per call, setting
+fcinfo->resultinfo->isDone to ExprMultipleResult for each element.
+After all elements have been returned, the next call should set
+isDone to ExprEndResult and return a null result.  (Note it is possible
+to return an empty set by doing this on the first call.)
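
The call-per-element protocol can be sketched in a few lines of standalone C.
This is not part of the patch and is only an illustration under stated
assumptions: the Sketch* types are simplified stand-ins, the "next" and
"failed" fields are hypothetical (a real function would keep its state
elsewhere and would raise an error rather than set a flag), and only the three
isDone values are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    ExprSingleResult,               /* expression does not return a set */
    ExprMultipleResult,             /* this result is an element of a set */
    ExprEndResult                   /* there are no more elements in the set */
} SketchExprDoneCond;

typedef struct SketchReturnSetInfo
{
    SketchExprDoneCond isDone;
} SketchReturnSetInfo;

typedef struct SketchCall
{
    SketchReturnSetInfo *resultinfo;    /* NULL if caller cannot take a set */
    int         limit;              /* the function's single argument */
    int         next;               /* per-query state; hypothetical */
    bool        isnull;
    bool        failed;             /* stands in for raising an error */
} SketchCall;

/* Return the integers 1..limit, one element per call. */
static int
sketch_series(SketchCall *call)
{
    assert(call != NULL);
    call->isnull = false;
    if (call->resultinfo == NULL)
    {
        /* "called in context that does not accept a set result" */
        call->failed = true;
        call->isnull = true;
        return 0;
    }
    if (call->next < call->limit)
    {
        call->resultinfo->isDone = ExprMultipleResult;
        return ++call->next;
    }
    call->resultinfo->isDone = ExprEndResult;
    call->isnull = true;
    return 0;
}

/* Caller side: keep calling until the function reports ExprEndResult. */
static int
sketch_sum_set(int limit)
{
    SketchReturnSetInfo rsi = { ExprSingleResult };
    SketchCall  call = { &rsi, limit, 0, false, false };
    int         sum = 0;

    for (;;)
    {
        int         value = sketch_series(&call);

        if (rsi.isDone == ExprEndResult)
            break;
        sum += value;
    }
    return sum;
}
```

A limit of zero exercises the empty-set case: the very first call reports
ExprEndResult and the caller's loop body never consumes an element.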
+
+
 Notes about function handlers
 -----------------------------

@@ -361,49 +409,91 @@ function is invoked many times.  (fn_extra can only be used as a hint,
 since callers are not required to re-use an FmgrInfo struct.
 But in performance-critical paths they normally will do so.)

-Issue: in what context should a handler allocate memory that it intends
-to use for fn_extra data?  The current palloc context when the handler
-is actually called might be considerably shorter-lived than the FmgrInfo
-struct, which would lead to dangling-pointer problems at the next use
-of the FmgrInfo.  Perhaps FmgrInfo should also store a memory context
-identifier that the handler could use to allocate space of the right
-lifespan.  (Having fmgr_info initialize this to CurrentMemoryContext
-should work in nearly all cases, though a few places might have to
-set it differently.)  At the moment I have not done this, since the
-existing PL handlers only need to set fn_extra to point at long-lived
-structures (data in their own caches) and don't really care which
-context the FmgrInfo is in anyway.
-
-Are there any other things needed by the call handlers for PL/pgsql and
-other languages?
-
-During the conversion process, support for old-style builtin functions
-and old-style user-written C functions will be provided by appropriate
-function handlers.  For example, the handler for old-style builtins
-looks roughly like fmgr_c() used to.
-
-
-System table updates
--------------------
-
-In the initial phase, two new entries will be added to pg_language
-for language types "newinternal" and "newC", corresponding to
-builtin and dynamically-loaded functions having the new calling
-convention.
-
-There will also be a change to pg_proc to add the new "proisstrict"
-column.
-
-Then pg_proc entries will be changed from language code "internal" to
-"newinternal" piecemeal, as the associated routines are rewritten.
-(This will imply several rounds of forced initdbs as the contents of
-pg_proc change, but I think we can live with that.)
-
-The old language names "internal" and "C" will continue to refer to
-functions with the old calling convention.  We should deprecate
-old-style functions because of their portability problems, but the
-support for them will only be one small function handler routine,
-so we can leave them in place for as long as necessary.
-
-The expected calling convention for PL call handlers will need to change
-all-at-once, but fortunately there are not very many of them to fix.
+If the handler wants to allocate memory to hold fn_extra data, it should
+NOT do so in CurrentMemoryContext, since the current context may well be
+much shorter-lived than the context where the FmgrInfo is.  Instead,
+allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
+context.  fn_mcxt normally points at the context that was
+CurrentMemoryContext at the time the FmgrInfo structure was created;
+in any case it is required to be a context at least as long-lived as the
+FmgrInfo itself.
+
+
+Telling the difference between old- and new-style functions
+-----------------------------------------------------------
+
+During the conversion process, we carried two different pg_language
+entries, "internal" and "newinternal", for internal functions.  The
+function manager used the language code to distinguish which calling
+convention to use.  (Old-style internal functions were supported via
+a function handler.)  As of Nov. 2000, no old-style internal functions
+remain, so we can drop support for them.  We will remove the old "internal"
+pg_language entry and rename "newinternal" to "internal".
+
+The interim solution for dynamically-loaded compiled functions has been
+similar: two pg_language entries "C" and "newC".  This naming convention
+is not desirable for the long run, and yet we cannot stop supporting
+old-style user functions.  Instead, it seems better to use just one
+pg_language entry "C", and require the dynamically-loaded library to
+provide additional information that identifies new-style functions.
+This avoids compatibility problems --- for example, existing dump
+scripts will identify PL language handlers as being in language "C",
+which would be wrong under the "newC" convention.  Also, this approach
+should generalize more conveniently for future extensions to the function
+interface specification.
+
+Given a dynamically loaded function named "foo" (note that the name being
+considered here is the link-symbol name, not the SQL-level function name),
+the function manager will look for another function in the same dynamically
+loaded library named "pg_finfo_foo".  If this second function does not
+exist, then foo is assumed to be called old-style, thus ensuring backwards
+compatibility with existing libraries.  If the info function does exist,
+it is expected to have the signature
+
+	Pg_finfo_record * pg_finfo_foo (void);
+
+The info function will be called by the fmgr, and must return a pointer
+to a Pg_finfo_record struct.  (The returned struct will typically be a
+statically allocated constant in the dynamic-link library.)  The current
+definition of the struct is just
+
+	typedef struct {
+		int	api_version;
+	} Pg_finfo_record;
+
+where api_version is 0 to indicate old-style or 1 to indicate new-style
+calling convention.  In future releases, additional fields may be defined
+after api_version, but these additional fields will only be used if
+api_version is greater than 1.
+
+These details will be hidden from the author of a dynamically loaded
+function by using a macro.  To define a new-style dynamically loaded
+function named foo, write
+
+	PG_FUNCTION_INFO_V1(foo);
+
+	Datum
+	foo(PG_FUNCTION_ARGS)
+	{
+		...
+	}
+
+The function itself is written using the same conventions as for new-style
+internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
+Note that old-style and new-style functions can be intermixed in the same
+library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
+each one.
+
+The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
+foo ... LANGUAGE 'C' regardless of whether it is old- or new-style.
+
+New-style dynamic functions will be invoked directly by fmgr, and will
+therefore have the same performance as internal functions after the initial
+pg_proc lookup overhead.  Old-style dynamic functions will be invoked via
+a handler, and will therefore have a small performance penalty.
+
+To allow old-style dynamic functions to work safely on toastable datatypes,
+the handler for old-style functions will automatically detoast toastable
+arguments before passing them to the old-style function.  A new-style
+function is expected to take care of toasted arguments by using the
+standard argument access macros defined above.
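
The info-function convention described above can be tried out with a
standalone sketch. It is not part of the patch and makes several assumptions:
SKETCH_FUNCTION_INFO_V1 is a hypothetical stand-in for PG_FUNCTION_INFO_V1
(the real macro's expansion is not shown in this README), and a small static
table, sketch_library, stands in for symbol lookup in a dynamically loaded
library. Only the Pg_finfo_record layout and the pg_finfo_ naming rule come
from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    int         api_version;
} Pg_finfo_record;

typedef Pg_finfo_record *(*SketchInfoFunc) (void);

/*
 * Hypothetical stand-in for PG_FUNCTION_INFO_V1: declare and define
 * pg_finfo_<name>, returning a static record with api_version 1.  The
 * trailing extern declaration absorbs the semicolon at the use site.
 */
#define SKETCH_FUNCTION_INFO_V1(funcname) \
    Pg_finfo_record *pg_finfo_##funcname(void); \
    Pg_finfo_record * \
    pg_finfo_##funcname(void) \
    { \
        static Pg_finfo_record my_finfo = { 1 }; \
        return &my_finfo; \
    } \
    extern int no_such_variable

SKETCH_FUNCTION_INFO_V1(foo);

typedef struct SketchSymbol
{
    const char *name;
    SketchInfoFunc func;
} SketchSymbol;

/* Stands in for the symbols exported by one dynamically loaded library. */
static const SketchSymbol sketch_library[] = {
    {"pg_finfo_foo", pg_finfo_foo},
};

/*
 * Return the calling-convention version for link symbol "name":
 * 0 if no info function exists (old style), else the recorded api_version.
 */
static int
sketch_api_version(const char *name)
{
    char        infoname[128];
    size_t      i;

    assert(name != NULL);
    snprintf(infoname, sizeof(infoname), "pg_finfo_%s", name);
    for (i = 0; i < sizeof(sketch_library) / sizeof(sketch_library[0]); i++)
    {
        if (strcmp(sketch_library[i].name, infoname) == 0)
            return sketch_library[i].func()->api_version;
    }
    return 0;
}
```

Looking up "foo" finds pg_finfo_foo and reports version 1, while a symbol with
no matching info function falls back to 0, the old-style convention. This is
how one library can mix both kinds of function.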