Skip to content

OP_AELEMFASTLEX_STORE - combined sassign/aelemfast_lex #20063

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 7, 2022

Conversation

richardleach
Copy link
Contributor

@richardleach richardleach commented Aug 7, 2022

This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH #19943)

For example, my @ary; $ary[0] = "boo" is currently implemented as:

    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:

    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.

Performance:
The following unrealistic code shows the op-related speedup:

my (@l0, @l1, @l2, @l3, @l4, @l5, @l6, @l7, @l8, @l9);
for (0..10_000_000) {
    $l0[0] = $l1[1] = $l2[2] = $l3[3] = $l4[4] = $l5[5] = $l6[6] = $l7[7] = $l8[8] = $l9[9] = 1 
}

Blead:

          1,069.74 msec task-clock                #    0.999 CPUs utilized          
                15      context-switches          #    0.014 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               200      page-faults               #    0.187 K/sec                  
     4,746,898,970      cycles                    #    4.437 GHz                    
        91,116,259      stalled-cycles-frontend   #    1.92% frontend cycles idle   
             1,534      stalled-cycles-backend    #    0.00% backend cycles idle    
    17,304,048,596      instructions              #    3.65  insn per cycle         
                                                  #    0.01  stalled cycles per insn
     3,350,806,733      branches                  # 3132.369 M/sec  

Patched:

            713.06 msec task-clock                #    0.999 CPUs utilized          
                10      context-switches          #    0.014 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               200      page-faults               #    0.280 K/sec                  
     3,153,196,578      cycles                    #    4.422 GHz                    
        24,931,297      stalled-cycles-frontend   #    0.79% frontend cycles idle   
                50      stalled-cycles-backend    #    0.00% backend cycles idle    
    12,603,789,338      instructions              #    4.00  insn per cycle         
                                                  #    0.00  stalled cycles per insn
     2,550,753,544      branches                  # 3577.208 M/sec   

Note: Name is a bit unwieldy, suggestions welcome. (@leonerd - I think this PR is something different to what you had in mind as OP_PADAV_STORE?)

@richardleach richardleach requested review from leonerd and removed request for leonerd August 17, 2022 19:49
@richardleach richardleach force-pushed the hydahy/OP_AELEMFASTLEX_STORE branch from e497c6d to 7224a5b Compare August 23, 2022 22:46
@richardleach richardleach force-pushed the hydahy/OP_AELEMFASTLEX_STORE branch from 7224a5b to 3d32067 Compare August 25, 2022 23:33
@richardleach
Copy link
Contributor Author

Rebased after #20077 merge.

Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall direction looks good. One small point

@leonerd
Copy link
Contributor

leonerd commented Aug 31, 2022

(@leonerd - I think this PR is something different to what you had in mind as OP_PADAV_STORE?)

Indeed. For OP_PADAV_STORE I was thinking more to optimise code like my @arr; @arr = (1,2,3); by rewriting the entire OP_AASSIGN into OP_PADAV - analogous to the SASSIGN/PADSV case you already did.

This commit introduces a new OP to replace simple cases of OP_SASSIGN
and OP_AELEMFAST_LEX. (Similar concept to GH Perl#19943)

For example, `my @ary; $ary[0] = "boo"` is currently implemented as:
    7     <2> sassign vKS/2 ->8
    5        <$> const[PV "boo"] s ->6
    -        <1> ex-aelem sKRM*/2 ->7
    6           <0> aelemfast_lex[@ary:1,2] sRM ->7
    -           <0> ex-const s ->-

But now will be turned into:
    6     <1> aelemfastlex_store[@ary:1,2] vKS ->7
    5        <$> const(PV "boo") s ->6
    -        <1> ex-aelem sKRM*/2 ->6
    -           <0> ex-aelemfast_lex sRM ->6
    -           <0> ex-const s ->-

This is intended to be a transparent performance optimization.
It should be applicable for RHS optrees of varying complexity.
@richardleach
Copy link
Contributor Author

This shouldn't be merged until 20114 is fixed for Memoize::Once; possibly the peephole code in this PR should be amended to handle the RHS actually being the op_other of e.g. an ORASSIGN.

@richardleach
Copy link
Contributor Author

It doesn't seem like any change akin to c5bbf81 is needed here for Memoize::Once.

I think this is ready to merge.

@demerphq
Copy link
Collaborator

demerphq commented Sep 7, 2022

@richardleach you can merge yourself no?

@richardleach
Copy link
Contributor Author

@richardleach you can merge yourself no?

Yes, I'll do it when I get a moment, assuming no-one shouts "Nooo, don't doo iiitt" between now and then.

@richardleach richardleach merged commit aafefcb into Perl:blead Sep 7, 2022
@richardleach richardleach deleted the hydahy/OP_AELEMFASTLEX_STORE branch September 7, 2022 19:27
@iabyn
Copy link
Contributor

iabyn commented Sep 21, 2022 via email

@richardleach
Copy link
Contributor Author

Thanks for the feedback, @iabyn. On:

  1. Yep, that's true. I probably over-simplified, thinking that negative values would still go the slow route through av_fetch.
  2. Aha!

I'll prepare a PR that addresses both suggestions.

@richardleach
Copy link
Contributor Author

Not sure if I interpreted point 2 correctly, but assert(!(SvTEMP(targ) && SvREFCNT(targ) == 1) fails in multiple places in the test suite.

@richardleach
Copy link
Contributor Author

assert(!(SvTEMP(targ) && SvREFCNT(targ) == 1 && !SvSMAGICAL(targ))); doesn't get tripped by any tests, but I'm still unsure of whether or not it's safe to exchange the full if() condition for it. Guidance welcome!

@iabyn
Copy link
Contributor

iabyn commented Sep 26, 2022 via email

@richardleach
Copy link
Contributor Author

Thanks. I've updated #20331 accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants