The Wayback Machine - https://web.archive.org/web/20210728045124/https://github.com/ruby/ruby/pull/3393
Enable arm64 optimizations that exist for power/x86 #3393

Merged: nurse merged 3 commits into ruby:master from AGSaidi:arm64-unaligned on Aug 13, 2020


@AGSaidi (Contributor) commented Aug 6, 2020

Enable, for aarch64/arm64 systems, a set of optimizations that already exist for power and x86.

Passes make check with these changes.

On the string benchmarks, the unaligned-access change improves performance
by an average of 1.04x (min 0.96x, max 1.21x, median 1.01x).
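The unaligned-access commit touches regint.h, siphash.c, st.c and include/ruby/internal/config.h (see the diffstat at the end of this page). The general idea is that code which reads whole machine words from byte buffers can skip the slow byte-by-byte path when the target tolerates misaligned loads. Below is a minimal sketch of that idea, not Ruby's code: `SKETCH_UNALIGNED_OK`, `load64_word`, `load64_bytewise` and `sum_words` are hypothetical names. The fast path uses `memcpy`, which is well-defined for any alignment in ISO C and which compilers typically lower to a single load on x86-64 and arm64; a direct pointer cast to a misaligned address is undefined behavior even where the hardware allows it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical platform switch: take the word-load path only on little-endian
 * targets known to tolerate unaligned loads, so both paths agree on byte order. */
#if (defined(__x86_64__) || defined(__aarch64__)) && \
    defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#  define SKETCH_UNALIGNED_OK 1
#else
#  define SKETCH_UNALIGNED_OK 0
#endif

/* Portable little-endian load, one byte at a time: safe at any address,
 * but several instructions per word. */
static inline uint64_t load64_bytewise(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Word load from a possibly misaligned address, in host byte order. */
static inline uint64_t load64_word(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sums nwords consecutive 64-bit little-endian words starting at buf,
 * which need not be 8-byte aligned. */
uint64_t sum_words(const unsigned char *buf, size_t nwords)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < nwords; i++) {
        const unsigned char *p = buf + 8 * i;
#if SKETCH_UNALIGNED_OK
        sum += load64_word(p);      /* one load per word */
#else
        sum += load64_bytewise(p);  /* eight loads per word */
#endif
    }
    return sum;
}
```

Both paths return the same result; the switch only decides how many instructions each word costs, which is the kind of saving the string benchmarks above measure.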

The GC optimization improves benchmark/gc/hash1 by 5%.
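The gc.c and gc.h hunks are not shown on this page. Assuming they follow the pattern the commit message points to ("Similar to x86 and powerpc optimizations"), one optimization of this kind is reading the machine stack pointer with a single inline-assembly instruction, so a conservative GC can find the end of the stack it has to scan without an out-of-line helper call. The sketch below illustrates that assumption only; `SKETCH_SET_STACK_END` and `stack_distance` are hypothetical names, and the portable fallback approximates the stack end with the address of a fresh local.

```c
#include <assert.h>
#include <stdint.h>

/* Store the current stack pointer into *(p), where p is a uintptr_t *. */
#if defined(__GNUC__) && defined(__aarch64__)
#  define SKETCH_SET_STACK_END(p) \
     __asm__ __volatile__ ("mov %0, sp" : "=r" (*(p)))
#elif defined(__GNUC__) && defined(__x86_64__)
#  define SKETCH_SET_STACK_END(p) \
     __asm__ __volatile__ ("movq %%rsp, %0" : "=r" (*(p)))
#else
/* Fallback: the address of a new local lies near the current stack end. */
#  define SKETCH_SET_STACK_END(p) \
     do { volatile char marker_ = 0; *(p) = (uintptr_t)&marker_; } while (0)
#endif

/* Byte distance between a recorded stack end and an address on the stack;
 * a conservative collector would scan the words in between. */
uintptr_t stack_distance(uintptr_t stack_end, const volatile void *on_stack)
{
    uintptr_t a = (uintptr_t)on_stack;
    return a > stack_end ? a - stack_end : stack_end - a;
}
```

The inline-assembly variants cost one register move; the fallback needs no architecture knowledge, which is why a portable build keeps it.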

The vm_exec.c changes make a large difference on some benchmarks (e.g., 1.38x on block_handler_type_iseq).

AGSaidi added 3 commits Aug 6, 2020

* Enable unaligned accesses on arm64

64-bit Arm platforms support unaligned accesses.

Running the string benchmarks this change improves performance
by an average of 1.04x, min .96x, max 1.21x, median 1.01x

* arm64 enable gc optimizations

Similar to x86 and powerpc optimizations.

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|hash1  |       0.225|     0.237|
|       |           -|     1.05x|
|hash2  |       0.110|     0.110|
|       |       1.00x|         -|

* vm_exec.c: improve performance for arm64

|                               |compare-ruby|built-ruby|
|:------------------------------|-----------:|---------:|
|vm_array                       |     26.501M|   27.959M|
|                               |           -|     1.06x|
|vm_attr_ivar                   |     21.606M|   31.429M|
|                               |           -|     1.45x|
|vm_attr_ivar_set               |     21.178M|   26.113M|
|                               |           -|     1.23x|
|vm_backtrace                   |       6.621|     6.668|
|                               |           -|     1.01x|
|vm_bigarray                    |     26.205M|   29.958M|
|                               |           -|     1.14x|
|vm_bighash                     |    504.155k|  479.306k|
|                               |       1.05x|         -|
|vm_block                       |     16.692M|   21.315M|
|                               |           -|     1.28x|
|block_handler_type_iseq        |       5.083|     7.004|
|                               |           -|     1.38x|
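The ratio rows in these tables can be recomputed from the raw numbers: the faster build's column gets the ratio and the other shows `-`. A small sketch, assuming each value is a throughput where higher is better (the tables do not state units); `speedup`, `print_row` and `enum winner` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Which build won a benchmark row. */
enum winner { WINNER_TIE, WINNER_COMPARE, WINNER_BUILT };

/* Returns the speedup factor (>= 1.0) of the faster build and reports which
 * one it was, assuming both values are throughputs (higher is better). */
double speedup(double compare, double built, enum winner *who)
{
    assert(compare > 0.0 && built > 0.0);
    if (built > compare) {
        *who = WINNER_BUILT;
        return built / compare;
    }
    if (compare > built) {
        *who = WINNER_COMPARE;
        return compare / built;
    }
    *who = WINNER_TIE;
    return 1.0;
}

/* Prints one row with the ratio rounded to two decimals, as in the tables. */
void print_row(const char *name, double compare, double built)
{
    enum winner who;
    double r = speedup(compare, built, &who);
    printf("%-24s %10.3f %10.3f  %.2fx (%s)\n", name, compare, built, r,
           who == WINNER_BUILT   ? "built-ruby" :
           who == WINNER_COMPARE ? "compare-ruby" : "tie");
}
```

For example, `print_row("vm_attr_ivar", 21.606, 31.429)` reports 1.45x for built-ruby, and the vm_bighash row (504.155 vs 479.306) comes out as 1.05x in favor of compare-ruby, matching the tables.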
@nurse approved these changes Aug 6, 2020

Comment on lines +99 to +103 of vm_exec.c:

```c
#elif defined(__GNUC__) && defined(__aarch64__)
DECL_SC_REG(const VALUE *, pc, "19");
DECL_SC_REG(rb_control_frame_t *, cfp, "20");
#define USE_MACHINE_REGS 1
```
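The snippet above appears to bind the VM's program counter (`pc`) and control-frame pointer (`cfp`) to fixed registers 19 and 20, which are callee-saved under the AArch64 procedure call standard, using GCC's explicit register variable extension. The sketch below shows that extension on a toy interpreter loop; the opcodes, `run_toy_vm`, `PC_REG`, and the choice of `r14` on x86-64 are hypothetical. GCC documents local register variables as guaranteed only when used as operands of extended `asm`, so elsewhere the binding is a request the compiler may not honor everywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Callee-saved register for the toy VM's program counter. x19 mirrors the
 * pc register in the snippet above; r14 on x86-64 is an assumption. */
#if defined(__GNUC__) && defined(__aarch64__)
#  define PC_REG __asm__("x19")
#elif defined(__GNUC__) && defined(__x86_64__)
#  define PC_REG __asm__("r14")
#else
#  define PC_REG  /* no binding: the compiler chooses */
#endif

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interprets code until OP_HALT and returns the value left on the stack. */
long run_toy_vm(const long *code)
{
    register const long *pc PC_REG = code;  /* ask to keep pc in PC_REG */
    long stack[16];
    size_t sp = 0;                           /* number of live stack slots */

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = *pc++;             /* operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp == 1);
            return stack[0];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Running the program `OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT` returns 20. Whether pinning `pc` actually speeds up the loop depends on the compiler's own register allocation, which is the question raised in the review below.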


@shyouhei (Member) commented Aug 6, 2020

Does this really help? We know that recent compilers are smarter than they were when we wrote the sibling code for the other architectures. Read more: https://bugs.ruby-lang.org/issues/12225

cc @nurse


@AGSaidi (Author, Contributor) commented Aug 6, 2020

@shyouhei The only difference between compare-ruby and built-ruby in the numbers in the commit message above is the two hunks in vm_exec.c. I'm happy to run other benchmarks if you'd like, but the change appears to improve performance substantially. I double-checked the result: with all diffs removed, comparing against the ruby I built before my patches gave results within ±2%; I then reapplied these two hunks, re-ran the benchmarks, and observed the improvements shown here (up to 1.38x).


@nurse (Member) commented Aug 6, 2020

As far as I remember, there is another example, with clang, showing that it is still effective.
And the commit message says 1.2x; that seems worth introducing this change.


@shyouhei (Member) commented Aug 7, 2020

OK then. We need to investigate what is going on, but that can be separate from this pull request.


@AGSaidi (Author, Contributor) commented Aug 8, 2020

@nurse Anything else you'd like to see before you merge?


@nurse (Member) commented Aug 12, 2020

I think this is OK to merge.
@shyouhei Do you have any other concerns?


@shyouhei (Member) commented Aug 13, 2020

@nurse No, LGTM.

@nurse merged commit 511b55b into ruby:master on Aug 13, 2020
100 checks passed. GitHub Actions checks listed on the page:

- Workflows: CodeQL-Build, check_branch
- make: make (check) (two separate jobs), make (check, --jit), make (check, --jit-wait), make (check, ubuntu-20.04), make (check, ubuntu-20.04, -DRUBY_DEBUG), make (check, ubuntu-18.04), make (check, ubuntu-18.04, -DRUBY_DEBUG), make (check, ubuntu-16.04), make (test, windows-2019, 2019), make (test-bundler-parallel), make (test-bundler-parallel, ubuntu-20.04), make (test-bundler-parallel, ubuntu-20.04, -DRUBY_DEBUG), make (test-bundler-parallel, ubuntu-18.04), make (test-bundler-parallel, ubuntu-18.04, -DRUBY_DEBUG), make (test-bundled-gems), make (test-bundled-gems, ubuntu-20.04), make (test-bundled-gems, ubuntu-20.04, -DRUBY_DEBUG), make (test-bundled-gems, ubuntu-18.04), make (test-bundled-gems, ubuntu-18.04, -DRUBY_DEBUG), make (test-all TESTS=--repeat-count=2, ubuntu-20.04), make (test-all TESTS=--repeat-count=2, ubuntu-18.04), make (leaked-globals), make (leaked-globals, ubuntu-20.04), make (leaked-globals, ubuntu-18.04)
- Compilers: gcc-10, gcc-9, gcc-8, gcc-7, gcc-6, gcc-5, gcc-4.8, clang-11, clang-10, clang-9, clang-8, clang-7, clang-6.0, clang-5.0, clang-4.0, clang-3.9
- C++ standard modes: c++98, c++11, c++14, c++17, c++2a
- Build variants: jemalloc, valgrind, coroutine=ucontext, coroutine=copy, disable-mathn, disable-jit-support, disable-dln, disable-rubygems, OPT_THREADED_CODE=1, OPT_THREADED_CODE=2, OPT_THREADED_CODE=3
- Debug and configuration macros: NDEBUG, RUBY_DEBUG, ARRAY_DEBUG, BIGNUM_DEBUG, CCAN_LIST_DEBUG, CPDEBUG=-1, ENC_DEBUG, GC_DEBUG, HASH_DEBUG, ID_TABLE_DEBUG, RGENGC_DEBUG=-1, SYMBOL_DEBUG, THREAD_DEBUG=-1, RGENGC_CHECK_MODE, TRANSIENT_HEAP_CHECK_MODE, VM_CHECK_MODE, USE_EMBED_CI=0, USE_FLONUM=0, USE_LAZY_LOAD, USE_RINCGC=0, USE_SYMBOL_GC=0, USE_THREAD_CACHE=0, USE_TRANSIENT_HEAP=0, USE_RUBY_DEBUG_LOG=1, DEBUG_FIND_TIME_NUMGUESS, DEBUG_INTEGER_PACK, ENABLE_PATH_CHECK, GC_DEBUG_STRESS_TO_CLASS, GC_ENABLE_LAZY_SWEEP=0, GC_PROFILE_DETAIL_MEMOTY, GC_PROFILE_MORE_DETAIL, CALC_EXACT_MALLOC_SIZE, MALLOC_ALLOCATED_SIZE_CHECK, IBF_ISEQ_ENABLE_LOCAL_BUFFER
@AGSaidi deleted the AGSaidi:arm64-unaligned branch Aug 19, 2020
matzbot pushed a commit that referenced this pull request Mar 20, 2021
	Enable arm64 optimizations that exist for power/x86 (#3393)

	* Enable unaligned accesses on arm64

	64-bit Arm platforms support unaligned accesses.

	Running the string benchmarks this change improves performance
	by an average of 1.04x, min .96x, max 1.21x, median 1.01x

	* arm64 enable gc optimizations

	Similar to x86 and powerpc optimizations.

	|       |compare-ruby|built-ruby|
	|:------|-----------:|---------:|
	|hash1  |       0.225|     0.237|
	|       |           -|     1.05x|
	|hash2  |       0.110|     0.110|
	|       |       1.00x|         -|

	* vm_exec.c: improve performance for arm64

	|                               |compare-ruby|built-ruby|
	|:------------------------------|-----------:|---------:|
	|vm_array                       |     26.501M|   27.959M|
	|                               |           -|     1.06x|
	|vm_attr_ivar                   |     21.606M|   31.429M|
	|                               |           -|     1.45x|
	|vm_attr_ivar_set               |     21.178M|   26.113M|
	|                               |           -|     1.23x|
	|vm_backtrace                   |       6.621|     6.668|
	|                               |           -|     1.01x|
	|vm_bigarray                    |     26.205M|   29.958M|
	|                               |           -|     1.14x|
	|vm_bighash                     |    504.155k|  479.306k|
	|                               |       1.05x|         -|
	|vm_block                       |     16.692M|   21.315M|
	|                               |           -|     1.28x|
	|block_handler_type_iseq        |       5.083|     7.004|
	|                               |           -|     1.38x|
	---
	 gc.c                           | 13 +++++++++++++
	 gc.h                           |  2 ++
	 include/ruby/internal/config.h |  2 ++
	 regint.h                       |  2 +-
	 siphash.c                      |  2 +-
	 st.c                           |  2 +-
	 vm_exec.c                      |  8 ++++++++
	 7 files changed, 28 insertions(+), 3 deletions(-)
3 participants