Skip to content

Add OPpTARGET_MY optimization to OP_UNDEF #20077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 25, 2022

Conversation

richardleach
Copy link
Contributor

This allows the existing undef OP to act on a pad SV. The following
two cases are optimized:

undef my $x, currently implemented as:

    4     <1> undef vK/1 ->5
    3        <0> padsv[$x:1,2] sRM/LVINTRO ->4

my $a = undef, currently implemented as:

    5     <2> sassign vKS/2 ->6
    3        <0> undef s ->4
    4        <0> padsv[$x:1,2] sRM*/LVINTRO ->5

These are now just represented as:
3 <1> undef[$x:1,2] vK/LVINTRO,TARGMY ->4

The undef $x case gets a slight performance boost, as shown in this toy example:
my $x; for (0..10_000_000) { undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x }
Blead:

            714.52 msec task-clock                #    0.956 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               198      page-faults               #    0.277 K/sec
     3,170,725,221      cycles                    #    4.438 GHz
         5,948,953      stalled-cycles-frontend   #    0.19% frontend cycles idle
           231,911      stalled-cycles-backend    #    0.01% backend cycles idle
    10,843,683,798      instructions              #    3.42  insn per cycle
                                                  #    0.00  stalled cycles per insn
     2,260,727,959      branches                  # 3163.998 M/sec

Patched:

            602.19 msec task-clock                #    0.974 CPUs utilized
                 1      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               199      page-faults               #    0.330 K/sec
     2,724,503,940      cycles                    #    4.524 GHz
           313,679      stalled-cycles-frontend   #    0.01% frontend cycles idle
           191,871      stalled-cycles-backend    #    0.01% backend cycles idle
     8,943,649,796      instructions              #    3.28  insn per cycle
                                                  #    0.00  stalled cycles per insn
     1,760,721,729      branches                  # 2923.875 M/sec

The $x = undef case, in which more optimization is achieved, performs much better, as shown in this toy example:
my $x; for (0..10_000_000) { $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef }

Blead:
          1,224.48 msec task-clock                #    0.990 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               205      page-faults               #    0.167 K/sec
     5,663,539,065      cycles                    #    4.625 GHz
         4,336,701      stalled-cycles-frontend   #    0.08% frontend cycles idle
             4,385      stalled-cycles-backend    #    0.00% backend cycles idle
    19,644,168,889      instructions              #    3.47  insn per cycle
                                                  #    0.00  stalled cycles per insn
     3,760,824,142      branches                  # 3071.356 M/sec

Patched:

            628.51 msec task-clock                #    0.974 CPUs utilized
                 1      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               200      page-faults               #    0.318 K/sec
     2,716,312,069      cycles                    #    4.322 GHz
           432,725      stalled-cycles-frontend   #    0.02% frontend cycles idle
           224,782      stalled-cycles-backend    #    0.01% backend cycles idle
     8,943,691,044      instructions              #    3.29  insn per cycle
                                                  #    0.00  stalled cycles per insn
     1,760,729,375      branches                  # 2801.443 M/sec

Also some bench.pl comparisons:

expr::sassign::undef_lex
$x = undef

        blead  undef
       ------ ------
    Ir 100.00 318.00
    Dr 100.00 317.65
    Dw 100.00 412.50
  COND 100.00 300.00
   IND 100.00 150.00

COND_m 100.00 100.00
 IND_m 100.00 200.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_lex_direc
undef $x

        blead  undef
       ------ ------
    Ir 100.00 142.00
    Dr 100.00 158.82
    Dw 100.00 162.50
  COND 100.00 142.86
   IND 100.00 150.00

COND_m 100.00 100.00
 IND_m 100.00 150.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_my_lex
my $x = undef

        blead  undef
       ------ ------
    Ir 100.00 164.50
    Dr 100.00 180.85
    Dw 100.00 196.15
  COND 100.00 173.68
   IND 100.00 133.33

COND_m 100.00 100.00
 IND_m 100.00 200.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

expr::sassign::undef_my_lex_direc
undef my $x

        blead  undef
       ------ ------
    Ir 100.00 112.43
    Dr 100.00 123.40
    Dw 100.00 119.23
  COND 100.00 115.79
   IND 100.00 133.33

COND_m 100.00 100.00
 IND_m 100.00 150.00

 Ir_m1 100.00 100.00
 Dr_m1 100.00 100.00
 Dw_m1 100.00 100.00

 Ir_mm 100.00 100.00
 Dr_mm 100.00 100.00
 Dw_mm 100.00 100.00

@richardleach
Copy link
Contributor Author

Rebased following OP_PADSV_STORE merge. Awaiting some reviewers.

@haarg
Copy link
Contributor

haarg commented Aug 23, 2022

This isn't just an optree optimization. undef $x and $x = undef have different behavior. $x = undef leaves a PV allocated, but clears its flags. This means re-using the variable for a string of the same size will not allocate any new memory.

Without this patch:

perl -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x600000a28210 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
  PV = 0x600000a28210 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13700b670) at 0x1370226d0
  REFCNT = 1
  FLAGS = ()
  PV = 0

With this patch:

./perl -Ilib -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x600001172c70 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
  PV = 0
SV = PV(0x13800b670) at 0x13880df80
  REFCNT = 1
  FLAGS = ()
  PV = 0

This may be a reasonable change, but would at least need to be noted in the commit message.

@richardleach
Copy link
Contributor Author

This isn't just an optree optimization. undef $x and $x = undef have different behavior. $x = undef leaves a PV allocated, but clears its flags. This means re-using the variable for a string of the same size will not allocate any new memory.

Good point. I think the existing behaviour should be preserved. If there's a flag spare, it looks straightforward to implement.

@demerphq
Copy link
Collaborator

demerphq commented Aug 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Aug 23, 2022 via email

@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from 3594208 to b06b35c Compare August 23, 2022 15:05
@richardleach
Copy link
Contributor Author

On Tue, 23 Aug 2022 at 15:25, demerphq @.***> wrote:
I vote that the existing behavior be preserved, it can have some interesting performance implications.

That's okay, I understood it as you meant it.

The current branch should now preserve the pre-existing behaviour w.r.t. PVs:

perl5.37.4 -e'use Devel::Peek; my $f; Dump $f; $f = "x" x 40; Dump($f); $f = undef; Dump($f); undef $f; Dump($f);'
SV = NULL(0x0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x5636bb8f8ed0 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
  PV = 0x5636bb8f8ed0 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"\0
  CUR = 40
  LEN = 42
SV = PV(0x5636bb8cdff0) at 0x5636bb8f1720
  REFCNT = 1
  FLAGS = ()
  PV = 0

I've added the tests you suggested to t/perf/opcount.t, will just tidy them slightly so they don't break unnecessarily in the next opcode update.

@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from b06b35c to a286f49 Compare August 23, 2022 20:29
@richardleach
Copy link
Contributor Author

Ok. tests tidied. I think all review comments to date have been addressed.

This allows the existing `undef` OP to act on a pad SV. The following
two cases are optimized:

`undef my $x`, currently implemented as:
    4     <1> undef vK/1 ->5
    3        <0> padsv[$x:1,2] sRM/LVINTRO ->4

`my $a = undef`, currently implemented as:

    5     <2> sassign vKS/2 ->6
    3        <0> undef s ->4
    4        <0> padsv[$x:1,2] sRM*/LVINTRO ->5

These are now just represented as:

   3     <1> undef[$x:1,2] vK/SOMEFLAGS ->4

Note: The two cases are not quite functionally identical, as `$x = undef`
clears the SV flags but preserves any PV allocation for later reuse,
whereas `undef $x` does free any PV allocation. This behaviour difference
is preserved through use of the OPpUNDEF_KEEP_PV flag.
@richardleach richardleach force-pushed the hydahy/undef-padsv-sassign branch from a286f49 to 2e69278 Compare August 24, 2022 21:18
@demerphq
Copy link
Collaborator

Looks good to me again today,. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants