Skip to content

Moving to a new masking approach #13341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 44 commits into
base: develop
Choose a base branch
from
Open

Moving to a new masking approach #13341

wants to merge 44 commits into from

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Jun 21, 2025

Pull Request Description

  • Updated LongArrayList allowing for initial capacity and copying to an array.
  • Fix for Azure SAS token support.
  • Removes remaining methods from Storage (slice, applyMask and applyFilter).
    • Removes newInstance and newUnderlyingArray from SpecialisedStorage as only used by these.
  • Also removes iteratorWithIndex from base ColumnStorage<?> as unused. Specialised versions are still kept.
  • Remove use of concrete storage types and use interfaces instead.
  • Made the Builder interface return the builder allowing for a more fluid use.
    • Also adjusted BuilderForType<T> to use the typed argument more.
  • Moved makeEmpty into the Builder class as a static method producing a masked single value storage.
  • Removed BoolBuilder.makeConstant, use the Builder method instead.
  • Moved DoubleStorageFacade inside NumericColumnAdapter and privatised.
  • Added StorageIterators.forEachOverBooleanStorage as this was missing.
  • Created reusable implementations of ColumnStorageIterators for Boolean, Long and Double storages.
  • Removed unused ComputedNullableLongStorage.
  • Aligned getData methods on DoubleStorage, LongStorage and BoolStorage to getArray.
  • The new IndexMapper (similar to the old OrderMask) keeps a single mask layer.
    • Has implementations for a single value (Constant), single slice, array remapping and reverse slice.
    • Has caching on merge, so when creating layering, we flatten into a single IndexMapper.
    • Supports the same NOT_FOUND_INDEX that OrderMask had.
    • ToDo: add Java-based unit tests.
  • New MaskOperation:
    • Implements slicing and masking by creating a MaskedStorage with an IndexMapper.
    • MaskedStorage has a parent storage and index mapper.
    • Specialised versions cover the interfaces of ColumnStorage.
    • When masking a masked storage, the `IndexMappers are merged and a new single layer mask is created.
  • Changed SliceRange to be long-based and to have a static method to produce a long[] array.
  • Column now uses ColumnStorage<T> not Storage<T>.
  • Column.applyFilter removed - only used by Table.filter and replaced by Column.mask.
  • Remove Column.applyMask, Table.applyMask and OrderMask.
  • Moved slice and mask to use the MaskOperation.
  • Updated Table.filter to use the MaskOperation.
  • Altered MultiValueKey and MultiValueIndex to have long row indexes.
    • Temporary work around so aggregators still get a List<Integer> will fix in next PR.
  • Updated JoinResult to use IndexMapper instead of OrderMask.
    • Updated CrossJoin to have a progress handler.
    • Updated SortJoin, CompoundHashJoin and SimpleHashJoin to be long-indexed.
    • Updated LookupJoin to be long indexed.
  • Altered GroupingOrderingVisitor to be long-indexed based.
    • Need to revisit in next PR and use builders and examine structure more.
  • Added Column.reverse and Table.reverse using IndexMapper.Reversed.
  • Altered OrderBuilder.buildMask to take a single OrderRule instead of a Vector.
    • Amended to work with long row indexes.
    • Also returns a long[] and then lets you call mask on the column.
  • Amended Fan_Out to use mask approach.
    • Applies to the table to optimise building the new table.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

Copy link

github-actions bot commented Jun 21, 2025

✨ GUI Checks Results

Summary

  • 🧹 Prettier: ✅ Passed
  • 🧰 GUI Checks: ❌ Failed
  • 📚 Storybook: ❌ Failed

⚠️ Action Required: Please review the failed checks and fix them before merging.

See individual check results for more details.

@jdunkerley jdunkerley marked this pull request as ready for review June 24, 2025 13:02
@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Jun 24, 2025
builder.append(item);
yield size == 1
? builder.seal()
: MaskOperation.getSlicedStorage(builder.seal(), new IndexMapper.Constant(size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it.


IndexMapper merge(IndexMapper other) {
if (_mergeCache == null) {
_mergeCache = new LeastRecentlyUsedCache<>(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want a WeakLeastRecentlyUsedCache derived from WeakHashMap so collected elements don't take up slots that other elements count use.

*/
public static Column slice(Column column, long start, long length) {
long currentSize = column.getSize();
long newSize = Math.max(0, Math.min(currentSize - start, length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't consider it an error to ask for a slice that is outside the original range -- we might tolerate this in the Enso code, but at this level it perhaps should be considered a logic error.

@jdunkerley
Copy link
Member Author

No significant decrease in performance on this PR but the last one does appear to have. Will look into what I can get back in later PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants