Skip to content

Implement OP_PADSV_STORE - combined sassign/padsv OP #19943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 17, 2022

Conversation

richardleach
Copy link
Contributor

This commit introduces a new OP to replace simple cases
of OP_SASSIGN and OP_PADSV.

For example, my $x = 1 is currently implemented as:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <0> padsv[$x:1,2] sRM*/LVINTRO
4  <2> sassign vKS/2

But now will be turned into:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <1> padsv_store[$x:1,2] vKMS/LVINTRO

This intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.

Notes:

  • This optimization was suggested by @leonerd as a good project for getting feet wet.
  • This was my first foray into Deparse.pm; no idea if this was the most appropriate way of supporting this new op.
  • Ideas for separate optimizations of simple assignments of CONSTs, PADSVs, and UNDEF into single ops are already in the works.

Performance:
The following unrealistic code shows the op-related speedup:

my ($l0, $l1, $l2, $l3, $l3, $l5, $l6, $l7, $l7, $l9);
for (0..10_000_000) {
    $l0 = $l1 = $l2 = $l3 = $l4 = $l5 = $l6 = $l7 = $l8 = $l9 = 1 
}

Blead:

            870.42 msec task-clock                #    0.985 CPUs utilized          
                 1      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               202      page-faults               #    0.232 K/sec                  
     3,800,528,668      cycles                    #    4.366 GHz                    
        79,206,368      stalled-cycles-frontend   #    2.08% frontend cycles idle   
           217,927      stalled-cycles-backend    #    0.01% backend cycles idle    
    15,523,885,258      instructions              #    4.08  insn per cycle         
                                                  #    0.01  stalled cycles per insn
     3,030,769,392      branches                  # 3481.972 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.883580599 seconds time elapsed

Patched:

            721.04 msec task-clock                #    0.983 CPUs utilized          
                 1      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               197      page-faults               #    0.273 K/sec                  
     3,163,626,846      cycles                    #    4.388 GHz                    
        51,457,402      stalled-cycles-frontend   #    1.63% frontend cycles idle   
                30      stalled-cycles-backend    #    0.00% backend cycles idle    
    12,563,719,698      instructions              #    3.97  insn per cycle         
                                                  #    0.00  stalled cycles per insn
     2,470,736,436      branches                  # 3426.638 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.733857229 seconds time elapsed

The extent to which users will notice a benefit in real code will depend on things like how often the optimized op is used and what path is taken through Perl_sv_setsv.

@richardleach richardleach requested a review from leonerd July 11, 2022 21:05
@richardleach richardleach added the needs-work The pull request needs changes still label Jul 11, 2022
@richardleach richardleach removed the request for review from leonerd July 11, 2022 22:29
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 993ac65 to 40d86f6 Compare July 11, 2022 22:34
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch 2 times, most recently from 530e157 to b631520 Compare July 21, 2022 20:00
@richardleach richardleach removed the needs-work The pull request needs changes still label Jul 21, 2022
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from b631520 to ec76523 Compare July 25, 2022 07:47
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch 3 times, most recently from f650f25 to 0b25d5b Compare July 27, 2022 21:31
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 0b25d5b to 1f54c9d Compare July 27, 2022 22:46
@richardleach richardleach requested a review from leonerd July 27, 2022 22:48
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 1f54c9d to 70c0312 Compare July 27, 2022 22:51
@richardleach
Copy link
Contributor Author

Any further comments on this one, or does it look good to merge?

Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that's here looks good, but one tiny thing is missing.

regen/op_private needs to remark that the new opcode makes use of the OPp_DEFER bits. Add it to the list around line 416.

@richardleach
Copy link
Contributor Author

Thanks for spotting that. I'll take a look later today. Might not be straightforward - regen complains that:
addbits(): bit 4 of padsv_store already specified (OPpTARGET_MY) at ./regen/op_private line 417.

(I can't remember offhand if B::Deparse needs the OPpTARGET_MY flag.)

@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 70c0312 to 87c9ebc Compare August 1, 2022 18:54
@richardleach
Copy link
Contributor Author

Assuming all CI tests pass, OP_PADSV_CONST doesn't seem to need the OPpTARGET_MY flag, so I've taken that out and added in the OPpDEREF flag bits.

@richardleach richardleach added the do not merge Don't merge this PR, at least for now label Aug 5, 2022
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 87c9ebc to a4da112 Compare August 6, 2022 15:41
@richardleach
Copy link
Contributor Author

richardleach commented Aug 6, 2022

After working on #20036 and looking a bit more at this PR, I realised that the basic optimization will not need to support OPpDEREF and have removed that branch from pp_padsv_store.

Note: It also seemed simple enough to add support for assigning simple padsv-padsv-sassign optrees, which I originally did. However, this involved turning OP_PADSV_STORE into an UNOP_aux and stashing the second padoffset in the op_aux slot. That worked fine, but there was no way to make Deparse.pm work (short of hacking around in B.xs, which definitely seems like a bad move.) So I dropped that commit.

Hopefully, UNOPs (and possibly other op types) will soon get an op_src member, at which point the padsv-padsv-sassign commit can be tweaked to use it and submitted as a separate PR.

@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from a4da112 to 0b993ae Compare August 6, 2022 18:43
@richardleach richardleach removed the do not merge Don't merge this PR, at least for now label Aug 6, 2022
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 0b993ae to 066cb27 Compare August 6, 2022 21:56
richardleach added a commit to richardleach/perl5 that referenced this pull request Aug 7, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny whitespace issue but otherwise this all looks good now.

Will you be writing a perldelta.pod entry?

t/perf/opcount.t Outdated
Comment on lines 795 to 797
padsv => 1,
padsv_store => 3,
undef => 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny whitespace alignment issue here. Make the longest one have a single space, and align the others at the => to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I took out the undef assignment, since #20077 will change it, and aligned the spaces on the remainder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: perldelta.pod, an entry per optimization seems like too much. Would a placeholder that covers multiple (and could link to individual PRs or commits) suffice? Something like:
"Multiple optree optimizations have been applied in order to speed up common operations."

This commit introduces a new OP to replace simple cases
of OP_SASSIGN and OP_PADSV.

For example, 'my $x = 1' is currently implemented as:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <0> padsv[$x:1,2] sRM*/LVINTRO
4  <2> sassign vKS/2

But now will be turned into:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <1> padsv_store[$x:1,2] vKMS/LVINTRO

This intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
@richardleach richardleach force-pushed the hydahy/OP_PADSV_STORE branch from 066cb27 to 61c4fed Compare August 16, 2022 13:45
Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, looking good now.
This can be merged assuming CI is happy.

@richardleach richardleach merged commit 9fdd7fc into Perl:blead Aug 17, 2022
@richardleach richardleach deleted the hydahy/OP_PADSV_STORE branch August 17, 2022 10:19
richardleach added a commit to richardleach/perl5 that referenced this pull request Aug 23, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
richardleach added a commit to richardleach/perl5 that referenced this pull request Aug 25, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
richardleach added a commit to richardleach/perl5 that referenced this pull request Aug 31, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
richardleach added a commit that referenced this pull request Sep 7, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH #19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
scottchiefbaker pushed a commit to scottchiefbaker/perl5 that referenced this pull request Nov 3, 2022
This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants