=for comment
The part of this file between =for mg_vtable.pl markers is auto
generated by mg_vtable.pl; any changes there need to be made instead to
mg_vtable.pl
=head1 NAME
perlguts - Introduction to the Perl API
=head1 DESCRIPTION
This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.
=head1 Variables
=head2 Datatypes
Perl has three typedefs that handle Perl's three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data types.
=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV
=head2 What is an "IV"?
Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.
U8, U16, U32, and U64 are to declare the corresponding unsigned integer
types.
If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined. Use IV and UV to declare the largest practicable integer
types, and C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned
type, which may not be usable in all circumstances.
A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.
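The sizing guarantees described above have close analogues in standard C99, which can help when reasoning about them outside of Perl. The sketch below is plain C, not Perl's own definitions: it assumes C<intptr_t> as a stand-in for the pointer-holding property of IV and C<int_least16_t> as a stand-in for I16. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "IV can hold a pointer" guarantee: round-trip a
 * pointer through a signed integer type that is wide enough for it.
 * Returns 1 if the pointer comes back unchanged. */
int pointer_survives_round_trip(void *p)
{
    intptr_t as_int;
    void *back;

    assert(p != NULL);
    as_int = (intptr_t)p;   /* pointer -> integer */
    back = (void *)as_int;  /* integer -> pointer */
    return back == p;
}

/* Stand-in for I16: storage width, in bits, of the smallest native
 * type that has at least 16 bits. */
unsigned least16_bits(void)
{
    return (unsigned)(sizeof(int_least16_t) * CHAR_BIT);
}
```

On a platform where C<short> is 16 bits, C<least16_bits()> would typically report 16; the only portable promise is that the result is not smaller than 16.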
=for apidoc_section $integer
=for apidoc Ayh ||IV
=for apidoc_item ||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64
=for apidoc Ayh ||UV
=for apidoc_item ||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64
=head2 Working with SVs
An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But,
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)
=for apidoc_section $floating
=for apidoc Ayh||NV
The seven routines are:
SV* newSViv(IV);
SV* newSVuv(UV);
SV* newSVnv(double);
SV* newSVpv(const char*, STRLEN);
SV* newSVpvn(const char*, STRLEN);
SV* newSVpvf(const char*, ...);
SV* newSVsv(SV*);
C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.
=for apidoc_section $string
=for apidoc Ayh||STRLEN
In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
* allocated */
To change the value of an I<already-existing> SV, there are eight routines:
void sv_setiv(SV*, IV);
void sv_setuv(SV*, UV);
void sv_setnv(SV*, double);
void sv_setpv(SV*, const char*);
void sv_setpvn(SV*, const char*, STRLEN);
void sv_setpvf(SV*, const char*, ...);
void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
SV **, Size_t, bool *);
void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.
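The pitfall with embedded NULs can be seen without any Perl at all. The following sketch is plain C with hypothetical helper names. It shows how a length computed by C<strlen> falls short of the true length when the data contains a NUL in the middle, which is why passing an explicit length is the safer choice for binary data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length as a strlen()-based routine would see it: counting stops at
 * the first NUL byte. buf must be NUL-terminated somewhere. */
size_t length_by_strlen(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Number of bytes that would be silently dropped if a buffer holding
 * true_len bytes of data were measured with strlen() instead. */
size_t bytes_lost(const char *buf, size_t true_len)
{
    size_t seen = length_by_strlen(buf);
    return seen < true_len ? true_len - seen : 0;
}
```

For the five data bytes C<"ab\0cd">, C<length_by_strlen> reports 2, so three bytes would be lost by a routine that measures the string itself.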
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.
C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.
The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L</Magic Virtual Tables> later in this document.
All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.
To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:
=over
=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)
=item * C<SvNV(SV*)> (C<double>)
=item * Strings are a bit complicated:
=over
=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>
If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.
This is suitable for Perl strings that represent bytes.
=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>
If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.
This is suitable for Perl strings that represent characters.
B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.
=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string
is C<"\xff\xff">, then depending on the SV's internal encoding you might get
back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets. To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).
B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.
When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.
=back
(See L</How do I pass a Perl string to a C library?> for more details.)
In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.
Also remember that C doesn't allow you to safely say
C<foo(SvPVbyte(s, len), len);>, because the order in which the arguments
are evaluated is unspecified. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
SV *s;
STRLEN len;
char *ptr;
ptr = SvPVbyte(s, len);
foo(ptr, len);
=back
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
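The two properties just described (growth only, and no implicit room for the terminator) can be illustrated with a small stand-alone model. This is a sketch in plain C, not Perl's implementation: C<ToyBuf>, C<toy_grow> and C<toy_append> are hypothetical names, and real SVs may round allocations up, which this model does not do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;  /* allocated storage (plays the role of SvPVX) */
    size_t cur;  /* bytes in use, excluding the NUL (like SvCUR) */
    size_t len;  /* bytes allocated (like SvLEN) */
} ToyBuf;

/* Ensure at least newlen bytes are allocated. Never shrinks and never
 * adds a byte for the terminator on its own. Returns the (possibly
 * moved) buffer, or NULL if the allocation fails. */
char *toy_grow(ToyBuf *b, size_t newlen)
{
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->buf, newlen);
        if (p == NULL)
            return NULL;
        b->buf = p;
        b->len = newlen;
    }
    return b->buf;
}

/* Append n bytes. The "+ 1" is the caller's job: it reserves the
 * room for the trailing NUL that toy_grow() does not add. */
int toy_append(ToyBuf *b, const char *src, size_t n)
{
    char *s = toy_grow(b, b->cur + n + 1);
    if (s == NULL)
        return -1;
    memcpy(s + b->cur, src, n);
    b->cur += n;
    s[b->cur] = '\0';
    return 0;
}
```

Asking C<toy_grow> for fewer bytes than are already allocated leaves the buffer untouched, mirroring the "increase only" behaviour described above.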
If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:
(void)SvPVbyte_force(sv, len);
s = SvGROW(sv, len + needlen + 1);
/* something that may write up to needlen bytes at s+len, and
actually writes newlen bytes
eg. newlen = read(fd, s + len, needlen);
ignoring errors for these examples
*/
s[len + newlen] = '\0';
SvCUR_set(sv, len + newlen);
SvUTF8_off(sv);
SvSETMAGIC(sv);
If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().
If you don't need the existing content of the SV, you can avoid some
copying with:
SvPVCLEAR(sv);
s = SvGROW(sv, needlen + 1);
/* something that may write up to needlen bytes at s, and actually
writes newlen bytes
eg. newlen = read(fd, s, needlen);
*/
s[newlen] = '\0';
SvCUR_set(sv, newlen);
SvPOK_only(sv); /* also clears SVf_UTF8 */
SvSETMAGIC(sv);
Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().
If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:
Newx(buf, somesize+1, char);
/* ... fill in buf ... */
buf[somesize] = '\0';
sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
/* buf now belongs to perl, don't release it */
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.
SvIOK(SV*)
SvNOK(SV*)
SvPOK(SV*)
Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:
/* references handled elsewhere */
if (SvIsBOOL(sv)) {
/* originally boolean */
...
}
else if (SvPOK(sv)) {
/* originally a string */
...
}
else if (SvNIOK(sv)) {
/* originally numeric */
...
}
else {
/* something special or undef */
}
You can get and set the current length of the string stored in an SV with
the following macros:
SvCUR(SV*)
SvCUR_set(SV*, STRLEN val)
You can also get a pointer to the end of the string stored in the SV
with the macro:
SvEND(SV*)
But note that these last three macros are valid only if C<SvPOK()> is true.
If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:
void sv_catpv(SV*, const char*);
void sv_catpvn(SV*, const char*, STRLEN);
void sv_catpvf(SV*, const char*, ...);
void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
Size_t, bool *);
void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.
If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:
SV* get_sv("package::varname", 0);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:
SvOK(SV*)
The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, it'll work correctly for:
foo(undef);
But won't work when called as:
$x = undef;
foo($x);
So, to repeat: always use SvOK() to check whether an sv is defined.
Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).
There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.
Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:
SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.
To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).
=head2 Offsets
Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)
Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv)> minus the number of chopped bytes in memory, and the PV
pointer is pointing into the middle of this allocated storage.
This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:
% ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7ffb7bc05b50 ""\0
CUR = 0
LEN = 10
Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:
% ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
Dump($a)'
SV = PV(0x7ffa04008a70) at 0x7ffa04030390
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
OFFSET = 1
PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
CUR = 8
LEN = 9
Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)
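The bookkeeping behind the offset hack can be modelled in a few lines of plain C. This is a simplified stand-alone sketch, not Perl's code: C<ToyStr>, C<toy_chop> and C<toy_real_start> are hypothetical names, and the model keeps the chopped-byte count in a struct field, whereas the implementation described above stores it inside the string buffer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char  *pv;      /* visible start of the string (like SvPVX) */
    size_t cur;     /* visible length in bytes (like SvCUR) */
    size_t len;     /* allocated bytes counted from pv (like SvLEN) */
    size_t offset;  /* bytes chopped so far (Dump's OFFSET) */
} ToyStr;

/* Discard everything before ptr, which must point into the visible
 * string. No bytes are moved; only the bookkeeping changes. */
void toy_chop(ToyStr *s, char *ptr)
{
    size_t delta;

    assert(s != NULL && ptr >= s->pv && ptr <= s->pv + s->cur);
    delta = (size_t)(ptr - s->pv);
    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The real start of the allocation, which is what must be handed
 * back to the allocator when the buffer is freed or resized. */
char *toy_real_start(const ToyStr *s)
{
    return s->pv - s->offset;
}
```

Chopping one byte from a nine-byte string held in a ten-byte buffer leaves a length of 8, an allocation of 9 and an offset of 1 in this model, the same figures as in the second dump above.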
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
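The same idea for arrays can be sketched as follows. This is a toy model in plain C under stated assumptions, not the code in F<av.c>: C<ToyArray> and C<toy_shift> are hypothetical names, the elements are C<int*> rather than C<SV*>, and reference counting is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int      **alloc;  /* real start of the C array (like AvALLOC) */
    int      **array;  /* first visible element (like AvARRAY) */
    ptrdiff_t  fill;   /* highest visible index, -1 if empty (like AvFILL) */
    ptrdiff_t  max;    /* highest usable index from array (like AvMAX) */
} ToyArray;

/* Remove and return the first element in constant time by sliding
 * the visible window forward; nothing is copied. Returns NULL when
 * the array is empty. */
int *toy_shift(ToyArray *a)
{
    int *first;

    assert(a != NULL);
    if (a->fill < 0)
        return NULL;
    first = a->array[0];
    a->array += 1;
    a->fill  -= 1;
    a->max   -= 1;
    return first;
}
```

After a shift, C<alloc> still points at the original allocation, which is the pointer that has to be used when the storage is eventually released.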
=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av
=head2 What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.
If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:
SvIOKp(SV*)
SvNOKp(SV*)
SvPOKp(SV*)
These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.
There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
In general, though, it's best to use the C<Sv*V> macros.
=head2 Working with AVs
There are two main, longstanding ways to create and load an AV. The first
method creates an empty AV:
AV* newAV();
The second method both creates the AV and initially populates it with SVs:
AV* av_make(SSize_t num, SV **ptr);
The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.
Perl v5.36 added two new ways to create an AV and allocate a SV** array
without populating it. These are more efficient than a newAV() followed by an
av_extend().
/* Creates but does not initialize (Zero) the SV** array */
AV *av = newAV_alloc_x(1);
/* Creates and does initialize (Zero) the SV** array */
AV *av = newAV_alloc_xz(1);
The numerical argument refers to the number of array elements to allocate, not
an array index, and must be >0. The first form must only ever be used when all
elements will be initialized before any read occurs. Reading a non-initialized
SV* - i.e. treating a random memory address as a SV* - is a serious bug.
Once the AV has been created, the following operations are possible on it:
void av_push(AV*, SV*);
SV* av_pop(AV*);
SV* av_shift(AV*);
void av_unshift(AV*, SSize_t num);
These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.
Here are some other functions:
Size_t av_count(AV*);
SSize_t av_top_index(AV*);
SV** av_fetch(AV*, SSize_t key, I32 lval);
SV** av_store(AV*, SSize_t key, SV* val);
C<av_count> returns the number of elements in the array (including
any empty slots (undefined ones) that are intermixed with filled-in ones).
The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. It is
always equal to S<C<av_count() - 1>>. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.
A few more:
void av_clear(AV*);
void av_undef(AV*);
void av_extend(AV*, SSize_t key);
The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.
If you know the name of an array variable, you can get a pointer to its AV
by using the following:
AV* get_av("package::varname", 0);
This returns NULL if the variable does not exist.
See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.
=head3 More efficient working with new or vanilla AVs
Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
functions:
=over
=item * C<av_store_simple>
=item * C<av_fetch_simple>
=item * C<av_push_simple>
=back
These are drop-in replacements, but can only be used on straightforward
AVs that meet the following criteria:
=over
=item * are not magical
=item * are not readonly
=item * are "real" (refcounted) AVs
=item * have an av_top_index value > -2
=back
AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and
C<newAV_alloc_xz> are all compatible at the time of creation. It is
only if they are declared readonly or unreal, have magic attached, or
are otherwise configured unusually that they will stop being compatible.
Note that some interpreter functions may attach magic to an AV as part
of normal operations. It is therefore safest, unless you are sure of the
lifecycle of an AV, to only use these new functions close to the point
of AV creation.
=head2 Working with HVs
To create an HV, you use the following routine:
HV* newHV();
Once the HV has been created, the following operations are possible on it:
SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
The C<klen> parameter is the length of the key being passed in. (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key.) The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.
Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.
The first of these two functions checks if a hash table entry exists, and the
second deletes it.
bool hv_exists(HV*, const char* key, U32 klen);
SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.
And more miscellaneous functions:
void hv_clear(HV*);
void hv_undef(HV*);
Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> function
deletes both the entries and the hash table itself.
Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.
=for apidoc_section $HV
=for apidoc Ayh||HE
I32 hv_iterinit(HV*);
/* Prepares starting point to traverse hash table */
HE* hv_iternext(HV*);
/* Get the next entry, and return a pointer to a
structure that has both the key and value */
char* hv_iterkey(HE* entry, I32* retlen);
/* Get the key from an HE structure and also return
the length of the key string */
SV* hv_iterval(HV*, HE* entry);
/* Return an SV pointer to the value of the HE
structure */
SV* hv_iternextsv(HV*, char** key, I32* retlen);
/* This convenience routine combines hv_iternext,
hv_iterkey, and hv_iterval. The key and retlen
arguments are return values for the key and its
length. The value is returned in the SV* argument */
If you know the name of a hash variable, you can get a pointer to its HV
by using the following:
HV* get_hv("package::varname", 0);
This returns NULL if the variable does not exist.
The hash algorithm is defined in the C<PERL_HASH> macro:
PERL_HASH(hash, key, klen)
The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.
See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen
=head2 Hash API Extensions
Beginning with version 5.004, the following functions are also supported:
HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
bool hv_exists_ent (HV* tb, SV* key, U32 hash);
SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
SV* hv_iterkeysv (HE* entry);
Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).
They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.
The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.
HePV(HE* he, STRLEN len)
HeVAL(HE* he)
HeHASH(HE* he)
HeSVKEY(HE* he)
HeSVKEY_force(HE* he)
HeSVKEY_set(HE* he, SV* sv)
These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:
HeKEY(HE* he)
HeKLEN(HE* he)
Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.
=head2 AVs, HVs and undefined values
Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.
For example, intuition tells you that this XS code:
AV *av = newAV();
av_store( av, 0, &PL_sv_undef );
is equivalent to this Perl code:
my @av;
$av[0] = undef;
Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker for indicating that an array element has not
yet been initialized. Thus, C<exists $av[0]> would be true for the above
Perl code, but false for the array generated by the XS code. In perl 5.20,
storing &PL_sv_undef will create a read-only element, because the scalar
&PL_sv_undef itself is stored, not a copy.
Similar problems can occur when storing C<&PL_sv_undef> in HVs:
hv_store( hv, "key", 3, &PL_sv_undef, 0 );
This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:
Modification of non-creatable hash value attempted
In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.
You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:
Modification of a read-only value attempted
To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.
Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:
av_store( av, 42, newSV(0) );
hv_store( hv, "foo", 3, newSV(0), 0 );
=head2 References
References are a special type of scalar that point to other data types
(including other references).
To create a reference, use either of the following functions:
SV* newRV_inc((SV*) thing);
SV* newRV_noinc((SV*) thing);
The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.
Once you have a reference, you can use the following macro to dereference
the reference:
SvRV(SV*)
then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.
To determine if an SV is a reference, you can use the following macro:
SvROK(SV*)
To discover what type of value the reference refers to, use the following
macro and then check the return value.
SvTYPE(SvRV(SV*))
The most useful types that will be returned are:
SVt_PVAV Array
SVt_PVHV Hash
SVt_PVCV Code
SVt_PVGV Glob (possibly a file handle)
Any numerical value returned which is less than SVt_PVAV will be a scalar
of some form.
See L<perlapi/svtype> for more details.
=head2 Blessed References and Class Objects
References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.
A reference can be blessed into a package with the following function:
SV* sv_bless(SV* sv, HV* stash);
The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.
/* Still under construction */
The following function upgrades rv to a reference if it is not already one,
and creates a new SV for rv to point to. If C<classname> is non-null, the SV
is blessed into the specified class. The SV is returned.
SV* newSVrv(SV* rv, const char* classname);
The following three functions copy an integer, unsigned integer or double
into an SV whose reference is C<rv>. The SV is blessed if C<classname> is
non-null.
SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is rv. The SV is blessed if C<classname>
is non-null.
SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. The SV is blessed if
C<classname> is non-null.
SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
STRLEN length);
The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.
int sv_isa(SV* sv, const char* name);
The following function tests whether the SV is a reference to a blessed object.
int sv_isobject(SV* sv);
The following function tests whether the SV is derived from the specified
class. SV can be either a reference to a blessed object or a string
containing a class name. This is the function implementing the
C<UNIVERSAL::isa> functionality.
bool sv_derived_from(SV* sv, const char* name);
To check if you've got an object derived from a specific class you have
to write:
if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
=head2 Creating New Variables
To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.
SV* get_sv("package::varname", GV_ADD);
AV* get_av("package::varname", GV_ADD);
HV* get_hv("package::varname", GV_ADD);
Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.
There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:
=over
=item GV_ADDMULTI
Marks the variable as multiply defined, thus preventing the:
Name used only once: possible typo
warning.
=item GV_ADDWARN
Issues the warning:
Had to create <varname> unexpectedly
if the variable did not exist before the function was called.
=back
If you do not specify a package name, the variable is created in the current
package.
=head2 Reference Counts and Mortality
Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:
int SvREFCNT(SV* sv);
SV* SvREFCNT_inc(SV* sv);
void SvREFCNT_dec(SV* sv);
(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)
However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references. A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope. An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.
To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating. The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership. Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code. For example, transferring ownership of a
reference from one owner to another doesn't change the reference count
at all, so may be achieved with no actual code. (The transferring code
doesn't touch the referenced object, but does need to ensure that the
former owner knows that it no longer owns the reference, and that the
new owner knows that it now does.)
An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed. Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible. For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.
Many functions have some kind of reference manipulation as
part of their purpose. Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts. For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller. This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent. It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.
For example, imagine you want to return a reference from an XSUB
function. Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine. This reference
needs to be disposed of before the routine is complete, otherwise it
will leak, preventing the SV from ever being destroyed. So to create
an RV referencing the SV, it is most convenient to pass the SV to
C<newRV_noinc()>, which consumes that reference. Now the XSUB routine
no longer owns a reference to the SV, but does own a reference to the RV,
which in turn owns a reference to the SV. The ownership of the reference
to the RV is then transferred by the process of returning the RV from
the XSUB.
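The two ways of creating an RV can be sketched with a plain C toy model
of reference ownership. It is not the Perl API: every C<toy_> name is
hypothetical, and C<toy_live> merely counts objects that have not yet
been destroyed so that the effect of each pattern is visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an xV: a reference count plus, for an RV,
 * the referent it owns a reference to. */
typedef struct toy_sv {
    int            refcnt;
    struct toy_sv *referent;  /* NULL unless this is an RV */
} toy_sv;

int toy_live = 0;  /* number of objects not yet destroyed */

/* New objects start with one reference, owned by the caller. */
toy_sv *toy_new(void)
{
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);  /* sketch only: no real error handling */
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

/* Destroy one reference; free the object when none remain.  A dying
 * RV releases the reference it owns to its referent. */
void toy_dec(toy_sv *sv)
{
    if (sv == NULL)
        return;
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        toy_dec(sv->referent);
        free(sv);
        toy_live--;
    }
}

/* Models newRV_noinc: the caller's reference to sv passes to the RV. */
toy_sv *toy_newrv_noinc(toy_sv *sv)
{
    toy_sv *rv = toy_new();
    rv->referent = sv;
    return rv;
}

/* Models newRV_inc: a new reference is created for the RV, and the
 * caller stays responsible for the one it already held. */
toy_sv *toy_newrv_inc(toy_sv *sv)
{
    sv->refcnt++;
    return toy_newrv_noinc(sv);
}

/* Returns how many objects this pattern leaves alive. */
int toy_demo_noinc(void)
{
    int before = toy_live;
    toy_sv *sv = toy_new();            /* we own one reference to sv */
    toy_sv *rv = toy_newrv_noinc(sv);  /* rv owns it now; we own rv  */
    toy_dec(rv);                       /* the receiver drops the RV  */
    return toy_live - before;
}

/* Returns how many objects are left when the caller forgets to
 * release the reference it kept after the newRV_inc pattern. */
int toy_demo_inc_leak(void)
{
    int before = toy_live;
    toy_sv *sv = toy_new();
    toy_sv *rv = toy_newrv_inc(sv);    /* sv->refcnt is now 2 */
    toy_dec(rv);                       /* drops rv and one ref to sv */
    (void)sv;                          /* our own reference is never released */
    return toy_live - before;
}
```

Under this model C<toy_demo_noinc()> leaves nothing behind, while
C<toy_demo_inc_leak()> leaves the referent alive because the caller never
released the reference it kept.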
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading. It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later". Usually the "short time later" is the end of
the current Perl statement. However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates. Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.
Mortal references are mainly used for xVs that are placed on perl's
main stack. The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted. Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive. So when putting an
(uncounted) reference on the stack, it is vitally important to ensure that
there will be a counted reference to the same xV that will last at least
as long as the uncounted reference. But it's also important that that
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life. For there to be a mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
especially to be put on the stack and would otherwise be unreferenced.
To create a mortal reference, use the functions:
SV* sv_newmortal()
SV* sv_mortalcopy(SV*)
SV* sv_2mortal(SV*)
C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack. Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:
SV *tmp = sv_newmortal();
sv_setiv(tmp, an_integer);
Because that takes multiple C statements, it is quite common to see this idiom instead:
SV *tmp = sv_2mortal(newSViv(an_integer));
The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.
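How mortal references with different "death dates" can coexist may be
easier to see in a plain C toy model of the temps stack. It is not the
Perl API: all C<toy_> names are hypothetical, and the floor handling is
only a simplified picture of what SAVETMPS and FREETMPS arrange.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;  /* hypothetical stand-in for an xV */

int     toy_live = 0;             /* objects not yet destroyed        */
toy_sv *toy_tmps[TOY_TMPS_MAX];   /* the model temps stack            */
int     toy_tmps_ix = 0;          /* number of entries in use         */
int     toy_tmps_floor = 0;       /* entries below this index survive */

toy_sv *toy_new(void)
{
    toy_sv *sv = malloc(sizeof *sv);
    assert(sv != NULL);  /* sketch only: no real error handling */
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

void toy_dec(toy_sv *sv)
{
    assert(sv != NULL && sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv);
        toy_live--;
    }
}

/* Models sv_2mortal: the caller's reference now belongs to the temps
 * stack, which will release it later. */
toy_sv *toy_2mortal(toy_sv *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models SAVETMPS: raise the floor to the current top and return the
 * old floor so the caller can restore it when the scope ends. */
int toy_savetmps(void)
{
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* Models FREETMPS: release every mortal reference above the floor. */
void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_dec(toy_tmps[--toy_tmps_ix]);
}

/* Returns (live objects after the inner FREETMPS) * 10
 *       + (live objects after the outer FREETMPS). */
int toy_demo_scopes(void)
{
    int after_inner, old_floor;

    toy_2mortal(toy_new());        /* mortal in the outer scope       */
    old_floor = toy_savetmps();    /* enter an inner scope            */
    toy_2mortal(toy_new());        /* two mortals in the inner scope  */
    toy_2mortal(toy_new());
    toy_freetmps();                /* releases only the inner two     */
    after_inner = toy_live;
    toy_tmps_floor = old_floor;    /* leave the inner scope           */
    toy_freetmps();                /* now the outer mortal goes too   */
    return after_inner * 10 + toy_live;
}
```

The inner cleanup leaves the outer mortal untouched because it sits below
the floor; it is released only once the floor has been restored.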
=head2 Stashes and Globs
A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:
Scalar Value
Array Value
Hash Value
I/O Handle
Format
Subroutine
There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.
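The naming rule for nested stashes can be sketched as a small plain C
helper that splits a package name into the keys one would look up at
each level. This is a toy illustration, not the Perl API: C<toy_stash_key>
and C<toy_stash_key_is> are hypothetical, and the sketch ignores special
cases such as a leading C<main::>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_MAX 64

/* Copies the stash key for the idx-th component of a package name
 * into out.  For "Bar::Baz", idx 0 gives "Bar::" and idx 1 gives
 * "Baz::".  Returns 1 on success, 0 if there is no such component or
 * the key does not fit in TOY_KEY_MAX bytes. */
int toy_stash_key(const char *package, int idx, char out[TOY_KEY_MAX])
{
    const char *start = package;
    const char *end;
    size_t len;

    assert(package != NULL && out != NULL && idx >= 0);

    for (;;) {
        end = strstr(start, "::");
        len = end ? (size_t)(end - start) : strlen(start);
        if (idx == 0)
            break;
        if (end == NULL)
            return 0;            /* ran out of components */
        start = end + 2;         /* skip past the separator */
        idx--;
    }
    if (len == 0 || len + 3 > TOY_KEY_MAX)
        return 0;
    memcpy(out, start, len);
    memcpy(out + len, "::", 3);  /* appends "::" and the terminator */
    return 1;
}

/* Convenience wrapper: does component idx produce the expected key? */
int toy_stash_key_is(const char *package, int idx, const char *expected)
{
    char key[TOY_KEY_MAX];
    return toy_stash_key(package, idx, key) && strcmp(key, expected) == 0;
}
```

For C<Bar::Baz> the helper yields C<Bar::> and then C<Baz::>, matching the
order in which the stashes are nested.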
=for apidoc_section $GV
=for apidoc Amnh||PL_defstash
To get the stash pointer for a particular package, use the function:
HV* gv_stashpv(const char* name, I32 flags)
HV* gv_stashsv(SV*, I32 flags)
The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<flags> flag will create a new package if it is set to GV_ADD.
The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.
Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:
HV* SvSTASH(SvRV(SV*));
then use the following to get the package name itself:
char* HvNAME(HV* stash);
If you need to bless or re-bless an object you can use the following
function:
SV* sv_bless(SV*, HV* stash)
where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.
For more information on references and blessings, consult L<perlref>.
=head2 I/O Handles
Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR>
from opendir().
You can create a new IO object:
IO* newIO();
Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.
The IO object contains an input and output PerlIO handle:
PerlIO *IoIFP(IO *io);
PerlIO *IoOFP(IO *io);
=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io
Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output. For a file, if both are present they will be the
same PerlIO object.
Distinct input and output PerlIO objects are created for sockets and
character devices.
The IO object also contains other data associated with Perl I/O
handles:
IV IoLINES(io); /* $. */
IV IoPAGE(io); /* $% */
IV IoPAGE_LEN(io); /* $= */
IV IoLINES_LEFT(io); /* $- */
char *IoTOP_NAME(io); /* $^ */
GV *IoTOP_GV(io); /* $^ */
char *IoFMT_NAME(io); /* $~ */
GV *IoFMT_GV(io); /* $~ */
char *IoBOTTOM_NAME(io);
GV *IoBOTTOM_GV(io);
char IoTYPE(io);
U8 IoFLAGS(io);
=for apidoc_sections $io_scn, $formats_section
=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io
Most of these are involved with L<formats|perlform>.
IoFLAGS() may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< C<< $io->untaint >>|IO::Handle/"$io->untaint" >>.
=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT
The IO object may also contain a directory handle:
DIR *IoDIRP(io);
=for apidoc Amh|DIR *|IoDIRP|IO *io
suitable for use with PerlDir_read() etc.
All of these accessor macros are lvalues; there are no distinct
C<_set> macros to modify the members of the IO object.
=head2 Double-Typed SVs
Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.
Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.