=for comment
The part of this file between =for mg_vtable.pl markers is auto generated
by mg_vtable.pl; any changes there need to be made instead to mg_vtable.pl

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core.  It is far
from complete and probably contains many errors.  Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name.  These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number.  For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>.  But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

U8, U16, U32, and U64 declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined.  Use IV and UV to declare the largest practicable, and
C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned, but which
may not be usable in all circumstances.

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.
=for apidoc_section $integer
=for apidoc Ayh ||IV
=for apidoc_item ||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64
=for apidoc Ayh ||UV
=for apidoc_item ||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64

=head2 Working with SVs

An SV can be created and loaded with one command.  There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value".  You might think that it is misnamed
because it is described as pointing only to strings.  However, it is
possible to have it point to other things.  For example, it could point
to an array of UVs.  But using it for non-strings requires care, as the
underlying assumption of much of the internals is that PVs are just for
strings.  Often, for example, a trailing C<NUL> is tacked on
automatically.  The non-string use is documented only in this paragraph.)

=for apidoc_section $floating
=for apidoc Ayh||NV

The seven routines that create and load an SV are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc_section $string
=for apidoc Ayh||STRLEN

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len).  If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX.  In both cases
the SV has the undef value.
    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>.  Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs.  The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>).  This pointer may be NULL if that information is not
important.  Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".  See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.
To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer.  This is tricky, though; if your
Perl string is C<"\xff\xff">, then depending on the SV's internal
encoding you might get back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets.  To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the
C<char*> returned is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>).  If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case.  But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.
In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>.  It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated.  If so, it will
call the function C<sv_grow>.  Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions
typically do C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV.  This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV.  This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that modifies up to needlen bytes at s+len, but
       modifies newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn().  If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().
If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that modifies up to needlen bytes at s, but modifies
       newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags().  That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string.  Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur.  From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:

    /* references handled elsewhere */
    if (SvIsBOOL(sv)) {
        /* originally boolean */
        ...
    }
    else if (SvPOK(sv)) {
        /* originally a string */
        ...
    }
    else if (SvNIOK(sv)) {
        /* originally numeric */
        ...
    }
    else {
        /* something special or undef */
    }

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, STRLEN val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.
If you want to append something to the end of the string stored in an
C<SV*>, you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>.  In the second, you specify the length of the string
yourself.  The third function processes its arguments like C<sprintf> and
appends the formatted output.  The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument.  The fifth function
extends the string stored in the first SV with the string stored in the
second SV.  It also forces the second SV to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic".  See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed.  Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>.  For example,
when interfacing Perl code, it'll work correctly for:

  foo(undef);

But won't work when called as:

  $x = undef;
  foo($x);

So, to repeat, always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively.  Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise.  Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results.  Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>.  Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer.  The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly.  (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example.  Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play.
In the current implementation, the final byte of a string buffer is used as
a copy-on-write reference count.  If the buffer is not big enough, then
copy-on-write is skipped.  First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10.  (It may differ on your platform.)
Extend the length of the string to one less than 10, and do a substitution:

 % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                            Dump($a)'
 SV = PV(0x7ffa04008a70) at 0x7ffa04030390
   REFCNT = 1
   FLAGS = (POK,OOK,pPOK)
   OFFSET = 1
   PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
   CUR = 8
   LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET.  The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.  (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer.  This is
subject to change.)

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array.  These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array.  See C<av_shift> in F<av.c>.

=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros.  Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV.  The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false.  (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.)  Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values.  So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two main, longstanding ways to create and load an AV.  The first
method creates an empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s.  Once the
AV has been created, the SVs can be destroyed, if so desired.

Perl v5.36 added two new ways to create an AV and allocate a SV** array
without populating it.  These are more efficient than a newAV() followed by an
av_extend().

    /* Creates but does not initialize (Zero) the SV** array */
    AV *av = newAV_alloc_x(1);
    /* Creates and does initialize (Zero) the SV** array */
    AV *av = newAV_alloc_xz(1);

The numerical argument refers to the number of array elements to allocate, not
an array index, and must be >0.  The first form must only ever be used when all
elements will be initialized before any read occurs.  Reading a non-initialized
SV* - i.e. treating a random memory address as a SV* - is a serious bug.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value.  You must then use C<av_store> (described below) to assign values
to these new elements.
Here are some other functions:

    Size_t   av_count(AV*);
    SSize_t  av_top_index(AV*);
    SV**     av_fetch(AV*, SSize_t key, I32 lval);
    SV**     av_store(AV*, SSize_t key, SV* val);

C<av_count> returns the number of elements in the array (including any
empty slots (undefined ones) that are intermixed with filled-in ones).

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl).  If the array is empty, -1 is returned.  It is
always equal to S<C<av_count() - 1>>.

The C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>.  Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak.  Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself.  The C<av_undef> function will
delete all the elements in the array plus the array itself.  The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements.  If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.
=head3 More efficient working with new or vanilla AVs

Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
functions:

=over

=item * C<av_store_simple>

=item * C<av_fetch_simple>

=item * C<av_push_simple>

=back

These are drop-in replacements, but can only be used on straightforward
AVs that meet the following criteria:

=over

=item * are not magical

=item * are not readonly

=item * are "real" (refcounted) AVs

=item * have an av_top_index value > -2

=back

AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and
C<newAV_alloc_xz> are all compatible at the time of creation.  It is
only if they are declared readonly or unreal, have magic attached, or
are otherwise configured unusually that they will stop being compatible.

Note that some interpreter functions may attach magic to an AV as part
of normal operations.  It is therefore safest, unless you are sure of the
lifecycle of an AV, to only use these new functions close to the point
of AV creation.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key).  The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).  The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>.  To access the scalar value, you must first dereference the return
value.  However, you should check to make sure that the return value is
not NULL before dereferencing it.
The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table.  The C<hv_undef> function
deletes both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead).  The key is a string pointer; the value is an C<SV*>.  However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

=for apidoc_section $HV
=for apidoc Ayh||HE

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is the SV* return value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.
The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures.  These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time).  See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries.  Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once.  See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.
=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs.  Although
this may be a rare case, it can be tricky.  That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true.  In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.  In perl 5.20, storing
&PL_sv_undef will create a read-only element, because the scalar
&PL_sv_undef itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes.  This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs.  Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).
To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>.  The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.  For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than SVt_PVAV will be a scalar
of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming.  In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class).  Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value.  The C<stash> argument
specifies which class the reference will belong to.  See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to.  If C<classname> is
non-null, the SV is blessed into the specified class.  The SV is returned.

	SV* newSVrv(SV* rv, const char* classname);

The following three functions copy integer, unsigned integer or double
into an SV whose reference is C<rv>.  SV is blessed if C<classname> is
non-null.
	SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
	SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
	SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is rv.  SV is blessed if C<classname>
is non-null.

	SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length.  SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                                                         STRLEN length);

The following function tests whether the SV is blessed into the specified
class.  It does not check inheritance relationships.

	int  sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

	int  sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class.  SV can be either a reference to a blessed object or a string
containing a class name.  This is the function implementing the
C<UNIVERSAL::isa> functionality.

	bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

	if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", GV_ADD);
    AV*  get_av("package::varname", GV_ADD);
    HV*  get_hv("package::varname", GV_ADD);

Notice the use of GV_ADD as the second parameter.  The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features.  Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism.  SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1.  If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)

However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references.  A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope.  An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.

To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating.  The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership.  Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code.
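
The distinction between a bare count and ownership can be illustrated
with a toy model in plain C.  This is only a sketch under stated
assumptions: C<toy_xv>, C<toy_slot>, C<toy_result> and every C<toy_*>
function are invented for illustration and are not part of the Perl API.
They merely imitate the way a count is raised and lowered, so that the
bookkeeping can be followed without Perl's headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an xV: only the reference count is modelled.
 * None of these names exist in the Perl API. */
typedef struct toy_xv {
    int refcnt;   /* number of references, whoever owns them */
    int *freed;   /* set to 1 on destruction so callers can observe it */
} toy_xv;

/* What the demo functions report back. */
typedef struct toy_result {
    int count_seen;  /* count observed while the slot held a reference */
    int destroyed;   /* 1 if the value had been destroyed by the end */
} toy_result;

/* A new value starts with a count of 1; the caller owns that reference. */
toy_xv *toy_new(int *freed)
{
    toy_xv *xv = malloc(sizeof *xv);
    if (xv == NULL)
        abort();
    xv->refcnt = 1;
    xv->freed = freed;
    *freed = 0;
    return xv;
}

/* Create one more reference; the caller owns the new one. */
toy_xv *toy_inc(toy_xv *xv)
{
    xv->refcnt++;
    return xv;
}

/* Give up one reference; releasing the last one destroys the value. */
void toy_dec(toy_xv *xv)
{
    if (--xv->refcnt == 0) {
        *xv->freed = 1;
        free(xv);
    }
}

/* A container slot that owns at most one reference. */
typedef struct toy_slot {
    toy_xv *held;
} toy_slot;

/* Ownership transfer: the caller hands its reference to the slot.
 * The count is untouched; only the owner changes. */
void toy_slot_take(toy_slot *slot, toy_xv *xv)
{
    slot->held = xv;
}

/* The slot releases the reference it owns, if any. */
void toy_slot_clear(toy_slot *slot)
{
    toy_xv *xv = slot->held;
    slot->held = NULL;
    if (xv != NULL)
        toy_dec(xv);
}

/* Hand our only reference to the slot, then let the slot drop it. */
toy_result toy_demo_transfer(void)
{
    toy_result r;
    toy_slot slot = { NULL };
    toy_xv *xv = toy_new(&r.destroyed);  /* count 1, owned here */

    toy_slot_take(&slot, xv);            /* count 1, owned by the slot */
    r.count_seen = xv->refcnt;
    toy_slot_clear(&slot);               /* sole owner lets go */
    return r;
}

/* Keep our own reference and give the slot a second one. */
toy_result toy_demo_share(void)
{
    toy_result r;
    toy_slot slot = { NULL };
    toy_xv *xv = toy_new(&r.destroyed);  /* count 1, owned here */

    toy_slot_take(&slot, toy_inc(xv));   /* count 2, one owner each */
    r.count_seen = xv->refcnt;
    toy_slot_clear(&slot);               /* slot's reference released */
    toy_dec(xv);                         /* our reference released */
    return r;
}
```

In C<toy_demo_transfer> no counting code runs at the point where
ownership changes hands, yet the value is still destroyed exactly once,
because exactly one owner remains to release the reference.  In
C<toy_demo_share> each owner releases only the reference it owns.
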
For example, transferring ownership of a reference from one owner to
another doesn't change the reference count at all, so may be achieved
with no actual code.  (The transferring code doesn't touch the
referenced object, but does need to ensure that the former owner knows
that it no longer owns the reference, and that the new owner knows that
it now does.)

An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed.  Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible.  For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.

Many functions have some kind of reference manipulation as
part of their purpose.  Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts.  For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller.  This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent.  It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.

For example, imagine you want to return a reference from an XSUB
function.  Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine.
This reference needs to be disposed of before the routine is complete,
otherwise it will leak, preventing the SV from ever being destroyed.
So to create an RV referencing the SV, it is most convenient to pass
the SV to C<newRV_noinc()>, which consumes that reference.  Now the XSUB
routine no longer owns a reference to the SV, but does own a reference
to the RV, which in turn owns a reference to the SV.  The ownership of
the reference to the RV is then transferred by the process of returning
the RV from the XSUB.

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading.  It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later".  Usually the "short time later" is the end of
the current Perl statement.  However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates.  Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, SAVETMPS and FREETMPS.  See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.

Mortal references are mainly used for xVs that are placed on perl's
main stack.  The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted.  Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive.
So when putting an (uncounted) reference on the stack, it is vitally
important to ensure that there will be a counted reference to the same xV
that will last at least as long as the uncounted reference.  But it's also
important that that counted reference be cleaned up at an appropriate
time, and not unduly prolong the xV's life.  Creating a mortal reference
is often the best way to satisfy this requirement, particularly if the xV
was created specifically to be put on the stack and would otherwise be
unreferenced.

To create a mortal reference, use the functions:

    SV*  sv_newmortal()
    SV*  sv_mortalcopy(SV*)
    SV*  sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal.
C<sv_mortalcopy()> creates an xV whose value is a copy of a supplied xV
and whose sole reference is mortal.
C<sv_2mortal()> mortalises an existing xV reference: it transfers
ownership of a reference from the caller to the temps stack.

Because C<sv_newmortal> gives the new SV no value, it must normally be
given one via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that takes multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that
exist in the C<main> package.  To get at the items in other packages,
append the string "::" to the package name.  The items in the C<Foo>
package are in the stash C<Foo::> in PL_defstash.  The items in the
C<Bar::Baz> package are in the stash C<Baz::> in C<Bar::>'s stash.

=for apidoc_section $GV
=for apidoc Amnh||PL_defstash

To get the stash pointer for a particular package, use the function:

    HV*  gv_stashpv(const char* name, I32 flags)
    HV*  gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV.  Remember that a stash is just a hash table, so you get back an
C<HV*>.  The C<flags> argument will create a new package if it is set to
GV_ADD.

The name that C<gv_stash*v> wants is the name of the package whose symbol
table you want.  The default package is called C<main>.  If you have
multiply nested packages, pass their names to C<gv_stash*v>, separated by
C<::> as in the Perl language itself.

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash.  The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 I/O Handles

Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
from opendir().

You can create a new IO object:

    IO*  newIO();

Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.

The IO object contains an input and output PerlIO handle:

  PerlIO *IoIFP(IO *io);
  PerlIO *IoOFP(IO *io);

=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io

Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output.  For a file, if both are present they will be the
same PerlIO object.

Distinct input and output PerlIO objects are created for sockets and
character devices.

The IO object also contains other data associated with Perl I/O
handles:

  IV IoLINES(io);                /* $. */
  IV IoPAGE(io);                 /* $% */
  IV IoPAGE_LEN(io);             /* $= */
  IV IoLINES_LEFT(io);           /* $- */
  char *IoTOP_NAME(io);          /* $^ */
  GV *IoTOP_GV(io);              /* $^ */
  char *IoFMT_NAME(io);          /* $~ */
  GV *IoFMT_GV(io);              /* $~ */
  char *IoBOTTOM_NAME(io);
  GV *IoBOTTOM_GV(io);
  char IoTYPE(io);
  U8 IoFLAGS(io);

=for apidoc_sections $io_scn, $formats_section
=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io

Most of these are involved with L<formats|perlform>.

IoFLAGS() may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.

=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT

The IO object may also contain a directory handle:

  DIR *IoDIRP(io);

=for apidoc Amh|DIR *|IoDIRP|IO *io

suitable for use with PerlDir_read() etc.

All of these accessor macros are lvalues; there are no distinct C<_set>
macros to modify the members of the IO object.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference.  Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.  For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C