Releases: MinishLab/model2vec
0.6.0
What's Changed
- docs: update chonkie link on tutorial readme by @iaurg in #235
- Fix dates in README.md by @Pringled in #238
- fix: add default arg for push_to_hub by @stephantul in #240
- fix: remove direct dependency on specific hf utils by @stephantul in #244
- feat: smaller tokenizers by @stephantul in #243
- feat: update lock by @stephantul in #246
- feat: allow passing validation set explicitly by @JarbasAl in #245
- docs: Added multilingual results by @Pringled in #247
- fix: distillation for models without card by @JarbasAl in #248
- feat: add supertokenizers by @stephantul in #236
- clean-up print statement by @stephantul in #249
- fix: small typing issue by @stephantul in #250
- docs: Added new logo by @Pringled in #252
- fix: missing unk, fix bug by @stephantul in #251
- bump version by @stephantul in #258
- feat: make normalization dependent on spacing by @stephantul in #259
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- fix: Updated semantic chunking tutorial by @bhavnicksm in #205
- rewrite backend by @stephantul in #207
- fix bibtex by @stephantul in #208
- feat: Added py.typed file by @Pringled in #214
- fix: pretokenize tokens before checking vocabulary by @stephantul in #215
- feat: add dimensionality during loading by @stephantul in #216
- feat: add quantization by @stephantul in #217
- feat: save load subfolder by @stephantul in #218
- feat: Added quantization for from_sentence_transformers by @Pringled in #219
- feat: faster inference for large vocab by @stephantul in #221
- feat: track token provenance by @stephantul in #222
- fix: typing issues, bug in inference by @stephantul in #224
- fix: issues with unk and pad by @stephantul in #225
- bug: fix 0 score in evaluate by @stephantul in #226
- fix: precision during training by @stephantul in #228
- fix: issue with unk in unigram by @stephantul in #227
- docs: add info about quantization and dimensionality reduction by @stephantul in #231
- increment version by @stephantul in #232
New Contributors
- @bhavnicksm made their first contribution in #205
Full Changelog: 0.4.1...v0.5.0
0.4.1
What's Changed
- docs: Added training plot, added more training results by @Pringled in #189
- feat: Added min and max epochs to fit by @Pringled in #190
- docs: Update model card template by @Pringled in #192
- feat: Add multilabel classification for training by @Pringled in #191
- feat: Add evaluate function for classifiers by @Pringled in #195
- docs: Added discord badge by @Pringled in #193
- fix: only allows named args in pretrain by @stephantul in #200
- Bump version by @Pringled in #204
Full Changelog: 0.4.0...0.4.1
0.4.0
What's Changed
- Add fittable by @stephantul in #140
- fix scores in readme by @stephantul in #179
- docs: Refactored main docs, added separate docs directory, added training docs by @Pringled in #181
- docs: Update README.md by @Pringled in #183
- Update README.md by @Pringled in #184
- feat: replace 8m by 32m for training by @stephantul in #182
- docs: update scores in README by @stephantul in #186
- docs: Moved training results to results directory, updated docs and description by @Pringled in #187
- Bump version by @Pringled in #188
Full Changelog: v0.3.9...0.4.0
v0.3.9
What's Changed
- docs: Added new model results by @Pringled in #167
- docs: Update plot by @Pringled in #169
- feat: add trust-remote-code option by @stephantul in #173
- feat: Add SIF-like coef by @stephantul in #174
- increase version by @stephantul in #176
Full Changelog: v0.3.8...v0.3.9
v0.3.8
What's Changed
- docs: fix docstrings in distill by @stephantul in #157
- remove unnecessary import by @stephantul in #161
- remove deduplication tutorial by @stephantul in #159
- fix: issue with modernbert tokenizer, add token pattern to _distill by @stephantul in #158
- fix: fix typing issue by @stephantul in #162
- feat: float pca dims by @stephantul in #163
- feat: Add optional embedding normalization to StaticModel loading by @davidberenstein1957 in #164
- feat: Improve distill for modernBERT by @stephantul in #165
- increase version by @stephantul in #166
New Contributors
- @davidberenstein1957 made their first contribution in #164
Full Changelog: v0.3.7...v0.3.8
v0.3.7
v0.3.6
What's Changed
- Add loading from st by @stephantul in #151
- Bump version by @Pringled in #152
Full Changelog: v0.3.5...v0.3.6
v0.3.5
v0.3.4
What's Changed
- docs: Add txtai integration docs by @Pringled in #130
- docs: Reworked documentation by @Pringled in #131
- feat: Added semantic chunking with chonkie tutorial by @Pringled in #133
- feat: Updated config values by @Pringled in #136
- feat: add support for pattern for unused tokens. by @stephantul in #138
- feat: Add multiprocessing by @Pringled in #141 (suggested by davidmezzetti in #139)
- feat: Added multiprocessing threshold parameter by @Pringled in #142
- docs: Add langchain example by @Pringled in #143
- fix: Removed unneeded tokenize call by @Pringled in #144
- docs: update README.md by @eltociear in #145
- Bump version by @Pringled in #146
New Contributors
- @eltociear made their first contribution in #145
Full Changelog: v0.3.3...v0.3.4