huggingface / datasets Public
Benchmarks #535
Path | Metric | Value | Change |
---|---|---|---|
benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_array2d | 0.02028 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_flattened_sequence | 0.01713 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_batch_formatted_as_numpy after write_nested_sequence | 0.05165 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_array2d | 0.03896 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_flattened_sequence | 0.4111 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_batch_unformated after write_nested_sequence | 0.50053 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_array2d | 0.0081 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_flattened_sequence | 0.00373 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_formatted_as_numpy after write_nested_sequence | 0.00643 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_array2d | 0.03772 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_flattened_sequence | 0.40853 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_col_unformated after write_nested_sequence | 0.43381 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_array2d | 0.18192 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_flattened_sequence | 0.14279 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_formatted_as_numpy after write_nested_sequence | 0.53453 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_unformated after write_array2d | 0.04543 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_unformated after write_flattened_sequence | 0.38055 | diff not supported |
benchmarks/results/benchmark_array_xd.json | read_unformated after write_nested_sequence | 0.39166 | diff not supported |
benchmarks/results/benchmark_array_xd.json | write_array2d | 0.11773 | diff not supported |
benchmarks/results/benchmark_array_xd.json | write_flattened_sequence | 1.95853 | diff not supported |
benchmarks/results/benchmark_array_xd.json | write_nested_sequence | 2.03393 | diff not supported |
Benchmark: benchmark_array_xd.json
metric | read_batch_formatted_as_numpy after write_array2d | read_unformated after write_nested_sequence | read_formatted_as_numpy after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_unformated after write_array2d | read_formatted_as_numpy after write_nested_sequence | write_array2d | read_col_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_col_unformated after write_array2d | read_batch_unformated after write_nested_sequence | write_nested_sequence | write_flattened_sequence | read_col_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_flattened_sequence | read_batch_unformated after write_array2d | read_unformated after write_flattened_sequence | read_batch_unformated after write_flattened_sequence | read_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_nested_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.015843 | 0.420557 | 0.120379 | 0.440375 | 0.052844 | 0.440510 | 0.133248 | 0.006893 | 0.016099 | 0.031535 | 0.540828 | 1.901576 | 1.742698 | 0.004162 | 0.049559 | 0.375369 | 0.029470 | 0.375827 | 0.380684 | 0.166508 | 0.006616 |
old | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
diff | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
Benchmark: benchmark_array_xd.json
metric | write_array2d | read_unformated after write_flattened_sequence | read_col_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_batch_unformated after write_nested_sequence | read_formatted_as_numpy after write_flattened_sequence | read_batch_unformated after write_flattened_sequence | read_col_unformated after write_flattened_sequence | read_col_unformated after write_array2d | read_unformated after write_nested_sequence | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_nested_sequence | write_nested_sequence | read_formatted_as_numpy after write_array2d | read_batch_unformated after write_array2d | read_col_formatted_as_numpy after write_nested_sequence | write_flattened_sequence | read_col_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_flattened_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.112083 | 0.324461 | 0.006592 | 0.013271 | 0.435596 | 0.037740 | 0.451111 | 0.114933 | 0.335022 | 0.354036 | 0.032292 | 0.356591 | 0.016453 | 0.048334 | 1.912305 | 0.142521 | 0.028766 | 0.007722 | 1.856903 | 0.376145 | 0.004819 |
old | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
diff | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
Benchmark: benchmark_array_xd.json
metric | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | write_nested_sequence | write_flattened_sequence | read_batch_formatted_as_numpy after write_array2d | read_batch_unformated after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_unformated after write_array2d | read_formatted_as_numpy after write_array2d | read_unformated after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | write_array2d | read_col_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_flattened_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.405741 | 0.109036 | 0.008678 | 0.004623 | 0.344870 | 0.047032 | 0.048535 | 1.867043 | 1.839632 | 0.015776 | 0.028840 | 0.013365 | 0.496086 | 0.032345 | 0.144169 | 0.338251 | 0.452199 | 0.113647 | 0.353510 | 0.394501 | 0.003315 |
old | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
diff | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None |
Benchmark: benchmark_indices_mapping.json
metric | select | shard | shuffle | sort | train_test_split |
---|---|---|---|---|---|
new | 2.165722 | 4.315449 | 4.794725 | 5.004330 | 4.784939 |
Benchmark: benchmark_array_xd.json
metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.019596 | 0.014151 | 0.050607 | 0.033170 | 0.429964 | 0.525586 | 0.004999 | 0.004152 | 0.006640 | 0.035762 | 0.440961 | 0.466149 | 0.148929 | 0.114383 | 0.492467 | 0.040082 | 0.409890 | 0.415031 | 0.121587 | 1.917372 | 2.006842 |
Benchmark: benchmark_array_xd.json
metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.016454 | 0.014436 | 0.050274 | 0.029738 | 0.367386 | 0.492249 | 0.006601 | 0.003421 | 0.008121 | 0.033053 | 0.359951 | 0.404901 | 0.149146 | 0.108415 | 0.500373 | 0.037859 | 0.345045 | 0.371243 | 0.111501 | 1.898830 | 2.008218 |
Benchmark: benchmark_indices_mapping.json
metric | select | shard | shuffle | sort | train_test_split |
---|---|---|---|---|---|
new | 2.150440 | 4.372631 | 4.661105 | 4.731103 | 4.883732 |
Benchmark: benchmark_indices_mapping.json
metric | select | shard | shuffle | sort | train_test_split |
---|---|---|---|---|---|
new | 2.196049 | 4.292400 | 4.817259 | 4.550962 | 4.881731 |
Benchmark: benchmark_array_xd.json
metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
new | 0.015546 | 0.013859 | 0.050551 | 0.031996 | 0.384817 | 0.487362 | 0.007239 | 0.003579 | 0.008220 | 0.034057 | 0.378443 | 0.422653 | 0.146439 | 0.122786 | 0.489544 | 0.041093 | 0.377722 | 0.410560 | 0.116973 | 1.864543 | 1.881439 |
Adding some benchmarks with DVC/CML
To add a new tracked benchmark, add a script in ./benchmarks/. The script can use the utilities in ./benchmarks/utils.py and should output a JSON file with results in ./benchmarks/results/.
That's it.
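As a rough illustration of what such a script might look like, here is a minimal sketch. It does not use ./benchmarks/utils.py, since that module's helpers are not shown here; the `timed` decorator, the `sort_numbers` workload, the `run_benchmark` function, and the `benchmark_example.json` file name are all hypothetical. The flat "metric name to seconds" JSON layout is an assumption based on the Metric/Value tables above, not a confirmed format.

```python
import json
import os
import timeit
from functools import wraps


def timed(func):
    """Hypothetical helper: the wrapped call returns elapsed seconds."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = timeit.default_timer()
        func(*args, **kwargs)
        return timeit.default_timer() - start
    return wrapper


@timed
def sort_numbers(n=100_000):
    # Placeholder workload; a real benchmark would exercise the library here.
    sorted(range(n, 0, -1))


def run_benchmark(results_dir="./benchmarks/results",
                  name="benchmark_example.json"):
    # Assumed layout: one flat JSON object mapping metric name -> seconds.
    times = {"sort_numbers": sort_numbers()}
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, name)
    with open(path, "w") as f:
        json.dump(times, f, indent=2)
    return path


if __name__ == "__main__":
    print(run_benchmark())
```

Running the script writes one JSON file per benchmark, with one key per timed function, so a metrics tool can pick up each value by name.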