Benchmarks #535

Merged
merged 11 commits into from Aug 27, 2020

Conversation

thomwolf
Member

@thomwolf thomwolf commented Aug 26, 2020

Adding some benchmarks with DVC/CML

To add a new tracked benchmark:

  • Create a new Python benchmarking script in ./benchmarks/. The script can use the utilities in ./benchmarks/utils.py and should write a JSON file with its results to ./benchmarks/results/.
  • Add a new pipeline stage to dvc.yaml, named after your new benchmark.

That's it
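
A minimal sketch of what such a benchmarking script could look like, under stated assumptions: it does not use the helpers in ./benchmarks/utils.py (their API is not shown in this thread), and `time_call`, `sort_numbers`, `run_benchmark` and the file name `benchmark_example.json` are hypothetical names chosen for illustration. It only shows the general shape: time some operations and write a flat `{metric: seconds}` JSON file.

```python
import json
import os
import timeit


def time_call(func, *args, **kwargs):
    """Return the wall-clock duration of one call to func, in seconds."""
    start = timeit.default_timer()
    func(*args, **kwargs)
    return timeit.default_timer() - start


def sort_numbers(n):
    # Hypothetical workload standing in for a real dataset operation.
    return sorted(range(n, 0, -1))


def run_benchmark(results_path):
    """Time each workload and write {metric_name: seconds} as JSON."""
    times = {"sort_numbers": time_call(sort_numbers, 100_000)}
    # Create the results directory if it does not exist yet.
    os.makedirs(os.path.dirname(results_path), exist_ok=True)
    with open(results_path, "w") as f:
        json.dump(times, f)
    return times
```

A real script would call something like `run_benchmark("benchmarks/results/benchmark_example.json")` under an `if __name__ == "__main__":` guard, so that the matching stage in dvc.yaml can run it as a command and track the JSON file it produces.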

github-actions[bot]

@github-actions github-actions bot commented on 2bd385a Aug 26, 2020

| Path | Metric | Value | Change |
| --- | --- | --- | --- |
| benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_array2d | 0.02028 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_flattened_sequence | 0.01713 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_nested_sequence | 0.05165 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_array2d | 0.03896 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_flattened_sequence | 0.4111 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_nested_sequence | 0.50053 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_array2d | 0.0081 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_flattened_sequence | 0.00373 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_nested_sequence | 0.00643 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_array2d | 0.03772 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_flattened_sequence | 0.40853 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_nested_sequence | 0.43381 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_array2d | 0.18192 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_flattened_sequence | 0.14279 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_nested_sequence | 0.53453 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_unformated after write_array2d | 0.04543 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_unformated after write_flattened_sequence | 0.38055 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | read_unformated after write_nested_sequence | 0.39166 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | write_array2d | 0.11773 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | write_flattened_sequence | 1.95853 | diff not supported |
| benchmarks/results/benchmark_array_xd.json | write_nested_sequence | 2.03393 | diff not supported |

github-actions[bot]

@github-actions github-actions bot commented on 9afaeac Aug 26, 2020

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
| --- | --- | --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.015843 | None | None |
| read_unformated after write_nested_sequence | 0.420557 | None | None |
| read_formatted_as_numpy after write_flattened_sequence | 0.120379 | None | None |
| read_col_unformated after write_nested_sequence | 0.440375 | None | None |
| read_unformated after write_array2d | 0.052844 | None | None |
| read_formatted_as_numpy after write_nested_sequence | 0.440510 | None | None |
| write_array2d | 0.133248 | None | None |
| read_col_formatted_as_numpy after write_array2d | 0.006893 | None | None |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016099 | None | None |
| read_col_unformated after write_array2d | 0.031535 | None | None |
| read_batch_unformated after write_nested_sequence | 0.540828 | None | None |
| write_nested_sequence | 1.901576 | None | None |
| write_flattened_sequence | 1.742698 | None | None |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004162 | None | None |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.049559 | None | None |
| read_col_unformated after write_flattened_sequence | 0.375369 | None | None |
| read_batch_unformated after write_array2d | 0.029470 | None | None |
| read_unformated after write_flattened_sequence | 0.375827 | None | None |
| read_batch_unformated after write_flattened_sequence | 0.380684 | None | None |
| read_formatted_as_numpy after write_array2d | 0.166508 | None | None |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006616 | None | None |
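
The new/old/diff rows in this report suggest a comparison of the current results against an earlier baseline, with `None` where no baseline exists yet. A minimal sketch of that kind of comparison, assuming both sides are flat `{metric: seconds}` dicts; `diff_metrics` is a hypothetical helper for illustration, not the DVC/CML implementation, and the 0.100000 baseline value is made up:

```python
def diff_metrics(new, old=None):
    """Return {metric: new - old}, or None where no baseline exists."""
    old = old or {}
    return {
        name: (value - old[name]) if name in old else None
        for name, value in new.items()
    }


# Two of the "new" values reported above.
new = {"write_array2d": 0.133248, "write_nested_sequence": 1.901576}

# Without a baseline, every diff is None, as in the report.
print(diff_metrics(new))

# With a (made-up) baseline for one metric, only that metric gets a diff.
print(diff_metrics(new, {"write_array2d": 0.100000}))
```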

github-actions[bot]

@github-actions github-actions bot commented on 3f74c3c Aug 26, 2020

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
| --- | --- | --- | --- |
| write_array2d | 0.112083 | None | None |
| read_unformated after write_flattened_sequence | 0.324461 | None | None |
| read_col_formatted_as_numpy after write_array2d | 0.006592 | None | None |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013271 | None | None |
| read_formatted_as_numpy after write_nested_sequence | 0.435596 | None | None |
| read_unformated after write_array2d | 0.037740 | None | None |
| read_batch_unformated after write_nested_sequence | 0.451111 | None | None |
| read_formatted_as_numpy after write_flattened_sequence | 0.114933 | None | None |
| read_batch_unformated after write_flattened_sequence | 0.335022 | None | None |
| read_col_unformated after write_flattened_sequence | 0.354036 | None | None |
| read_col_unformated after write_array2d | 0.032292 | None | None |
| read_unformated after write_nested_sequence | 0.356591 | None | None |
| read_batch_formatted_as_numpy after write_array2d | 0.016453 | None | None |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.048334 | None | None |
| write_nested_sequence | 1.912305 | None | None |
| read_formatted_as_numpy after write_array2d | 0.142521 | None | None |
| read_batch_unformated after write_array2d | 0.028766 | None | None |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007722 | None | None |
| write_flattened_sequence | 1.856903 | None | None |
| read_col_unformated after write_nested_sequence | 0.376145 | None | None |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004819 | None | None |

github-actions[bot]

@github-actions github-actions bot commented on f55b9c6 Aug 26, 2020

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
| --- | --- | --- | --- |
| read_col_unformated after write_nested_sequence | 0.405741 | None | None |
| read_formatted_as_numpy after write_flattened_sequence | 0.109036 | None | None |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008678 | None | None |
| read_col_formatted_as_numpy after write_array2d | 0.004623 | None | None |
| read_batch_unformated after write_flattened_sequence | 0.344870 | None | None |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.047032 | None | None |
| read_unformated after write_array2d | 0.048535 | None | None |
| write_nested_sequence | 1.867043 | None | None |
| write_flattened_sequence | 1.839632 | None | None |
| read_batch_formatted_as_numpy after write_array2d | 0.015776 | None | None |
| read_batch_unformated after write_array2d | 0.028840 | None | None |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013365 | None | None |
| read_batch_unformated after write_nested_sequence | 0.496086 | None | None |
| read_col_unformated after write_array2d | 0.032345 | None | None |
| read_formatted_as_numpy after write_array2d | 0.144169 | None | None |
| read_unformated after write_flattened_sequence | 0.338251 | None | None |
| read_formatted_as_numpy after write_nested_sequence | 0.452199 | None | None |
| write_array2d | 0.113647 | None | None |
| read_col_unformated after write_flattened_sequence | 0.353510 | None | None |
| read_unformated after write_nested_sequence | 0.394501 | None | None |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003315 | None | None |

github-actions[bot]

@github-actions github-actions bot commented on 415e2c7 Aug 26, 2020

Benchmark: benchmark_indices_mapping.json

| metric | new |
| --- | --- |
| select | 2.165722 |
| shard | 4.315449 |
| shuffle | 4.794725 |
| sort | 5.004330 |
| train_test_split | 4.784939 |

Benchmark: benchmark_array_xd.json

| metric | new |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.019596 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014151 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.050607 |
| read_batch_unformated after write_array2d | 0.033170 |
| read_batch_unformated after write_flattened_sequence | 0.429964 |
| read_batch_unformated after write_nested_sequence | 0.525586 |
| read_col_formatted_as_numpy after write_array2d | 0.004999 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004152 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006640 |
| read_col_unformated after write_array2d | 0.035762 |
| read_col_unformated after write_flattened_sequence | 0.440961 |
| read_col_unformated after write_nested_sequence | 0.466149 |
| read_formatted_as_numpy after write_array2d | 0.148929 |
| read_formatted_as_numpy after write_flattened_sequence | 0.114383 |
| read_formatted_as_numpy after write_nested_sequence | 0.492467 |
| read_unformated after write_array2d | 0.040082 |
| read_unformated after write_flattened_sequence | 0.409890 |
| read_unformated after write_nested_sequence | 0.415031 |
| write_array2d | 0.121587 |
| write_flattened_sequence | 1.917372 |
| write_nested_sequence | 2.006842 |

github-actions[bot]

@github-actions github-actions bot commented on b3e4bc9 Aug 26, 2020

Benchmark: benchmark_array_xd.json

| metric | new |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.016454 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014436 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.050274 |
| read_batch_unformated after write_array2d | 0.029738 |
| read_batch_unformated after write_flattened_sequence | 0.367386 |
| read_batch_unformated after write_nested_sequence | 0.492249 |
| read_col_formatted_as_numpy after write_array2d | 0.006601 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003421 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008121 |
| read_col_unformated after write_array2d | 0.033053 |
| read_col_unformated after write_flattened_sequence | 0.359951 |
| read_col_unformated after write_nested_sequence | 0.404901 |
| read_formatted_as_numpy after write_array2d | 0.149146 |
| read_formatted_as_numpy after write_flattened_sequence | 0.108415 |
| read_formatted_as_numpy after write_nested_sequence | 0.500373 |
| read_unformated after write_array2d | 0.037859 |
| read_unformated after write_flattened_sequence | 0.345045 |
| read_unformated after write_nested_sequence | 0.371243 |
| write_array2d | 0.111501 |
| write_flattened_sequence | 1.898830 |
| write_nested_sequence | 2.008218 |

Benchmark: benchmark_indices_mapping.json

| metric | new |
| --- | --- |
| select | 2.150440 |
| shard | 4.372631 |
| shuffle | 4.661105 |
| sort | 4.731103 |
| train_test_split | 4.883732 |

github-actions[bot]

@github-actions github-actions bot commented on 7ac08a2 Aug 26, 2020

Benchmark: benchmark_indices_mapping.json

| metric | new |
| --- | --- |
| select | 2.196049 |
| shard | 4.292400 |
| shuffle | 4.817259 |
| sort | 4.550962 |
| train_test_split | 4.881731 |

Benchmark: benchmark_array_xd.json

| metric | new |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.015546 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013859 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.050551 |
| read_batch_unformated after write_array2d | 0.031996 |
| read_batch_unformated after write_flattened_sequence | 0.384817 |
| read_batch_unformated after write_nested_sequence | 0.487362 |
| read_col_formatted_as_numpy after write_array2d | 0.007239 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003579 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008220 |
| read_col_unformated after write_array2d | 0.034057 |
| read_col_unformated after write_flattened_sequence | 0.378443 |
| read_col_unformated after write_nested_sequence | 0.422653 |
| read_formatted_as_numpy after write_array2d | 0.146439 |
| read_formatted_as_numpy after write_flattened_sequence | 0.122786 |
| read_formatted_as_numpy after write_nested_sequence | 0.489544 |
| read_unformated after write_array2d | 0.041093 |
| read_unformated after write_flattened_sequence | 0.377722 |
| read_unformated after write_nested_sequence | 0.410560 |
| write_array2d | 0.116973 |
| write_flattened_sequence | 1.864543 |
| write_nested_sequence | 1.881439 |

@thomwolf thomwolf requested a review from lhoestq Aug 26, 2020
Member

@lhoestq lhoestq left a comment

This is so cool!

@thomwolf thomwolf merged commit 5c89ed1 into master Aug 27, 2020
4 checks passed
@thomwolf thomwolf deleted the benchmark branch Aug 27, 2020