fix: Updated semantic chunking tutorial #205

Merged: 1 commit, Mar 2, 2025

Conversation

bhavnicksm
Contributor

TL;DR

Add "chonkie[semantic]" to the installs and add the new chunking params during init of the SemanticChunker. Also use potion-base-32M for improved chunk quality.

Copilot summary

This pull request includes updates to the tutorials/semantic_chunking.ipynb file to improve the tutorial's functionality and performance. The most important changes include updating the library installation command, modifying the model used for chunking, and adjusting the execution counts and outputs for various cells.

Library and model updates:

  • Updated the library installation command to include the chonkie[semantic] package.
  • Changed the embedding model from minishlab/potion-base-8M to minishlab/potion-base-32M and adjusted chunking parameters for better performance.
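A minimal sketch of what the updated tutorial cell might look like after these two changes. Only the install extra `chonkie[semantic]` and the model name `minishlab/potion-base-32M` come from this PR; the `threshold` and `chunk_size` values are placeholder assumptions, since the PR description does not list the new parameters, and `CHUNKER_KWARGS` and `build_chunker` are hypothetical names used for illustration.

```python
# Install first (shell): pip install "chonkie[semantic]"

# Keyword arguments for the chunker. Only the model name is taken from the PR;
# the other two values are illustrative assumptions, not the tutorial's settings.
CHUNKER_KWARGS = {
    "embedding_model": "minishlab/potion-base-32M",  # from the PR description
    "threshold": 0.5,    # assumed similarity threshold (placeholder)
    "chunk_size": 512,   # assumed maximum tokens per chunk (placeholder)
}


def build_chunker():
    """Construct a SemanticChunker with the settings above.

    The import is done lazily so this module loads even when the
    optional semantic extra is not installed.
    """
    from chonkie import SemanticChunker

    return SemanticChunker(**CHUNKER_KWARGS)
```

Calling `build_chunker()` and then `chunker.chunk(text)` on a string should return the chunks for that text; the first call loads the embedding model, so it needs the semantic extra installed and, if the model is not already cached, network access.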

Execution counts and outputs:

  • Updated execution counts for multiple cells to reflect the new order of execution.
  • Modified the outputs of cells to reflect the new chunking results and performance metrics.

Metadata updates:

  • Updated the kernel display name and Python version in the notebook metadata.

Add "chonkie[semantic]" to the installs and add the new chunking params
@Pringled self-requested a review March 2, 2025 12:55
Member

@Pringled left a comment

Thanks for making this PR @bhavnicksm 😄! LGTM, merging 🚀

@Pringled changed the title from “Fix #175: Update semantic_chunking.ipynb to have "chonkie[semantic]" and updated init params” to “fix: Updated semantic chunking tutorial” on Mar 2, 2025
@Pringled merged commit a00aaab into MinishLab:main Mar 2, 2025
2 participants