[Feature]: Support highlight #42589

Open
@xiaofan-luan

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

🔧 Milvus Highlight Design (v0.2)

✅ Schema Definition

Highlight is enabled per field in the collection schema:

{
  "field_name": "content",
  "data_type": "VarChar",
  "enable_highlight": true,
  "highlight_config": {
    "mode": "bm25"  // one of "bm25", "colbert", "rerank"
  }
}
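As a rough illustration of how such a field definition might be checked before a collection is created, here is a minimal sketch. The helper `validate_highlight_field` and the restriction to `VarChar` are assumptions made for this example; they are not part of the proposal or of any existing Milvus API.

```python
from typing import Any

# Modes listed in the proposal; everything else here is a hypothetical helper.
ALLOWED_MODES = {"bm25", "colbert", "rerank"}


def validate_highlight_field(field: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one field definition (empty if valid)."""
    problems: list[str] = []
    if not field.get("enable_highlight", False):
        return problems  # highlight is off, nothing to check
    # Assumption: highlight only makes sense on text (VarChar) fields.
    if field.get("data_type") != "VarChar":
        problems.append("highlight is only sketched here for VarChar fields")
    mode = field.get("highlight_config", {}).get("mode")
    if mode not in ALLOWED_MODES:
        problems.append(f"unknown highlight mode: {mode!r}")
    return problems


field = {
    "field_name": "content",
    "data_type": "VarChar",
    "enable_highlight": True,
    "highlight_config": {"mode": "bm25"},
}
print(validate_highlight_field(field))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every schema issue at once.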

🔍 Query API Extension

Highlight must specify the target field:

{
  "highlight": {
    "field": "content",
    "top_k": 5  // optional, for rerank/colbert
  }
}

🧾 Response Format

{
  "field": "content",
  "text": "...",
  "highlights": [
    {"start": 15, "end": 22, "score": 0.81}
  ]
}
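To show how a client might consume this response, the sketch below wraps each highlighted span in an HTML tag. The proposal does not say whether offsets are characters or bytes, or whether `end` is inclusive; this example assumes character offsets with an exclusive `end` and non-overlapping spans. The function name `render_highlights` and the sample text are made up for illustration.

```python
from html import escape


def render_highlights(text: str, highlights: list[dict], tag: str = "em") -> str:
    """Wrap each highlighted span of `text` in an HTML tag.

    Assumes character offsets with an exclusive `end`, and non-overlapping spans.
    """
    out: list[str] = []
    cursor = 0
    for span in sorted(highlights, key=lambda h: h["start"]):
        start, end = span["start"], span["end"]
        out.append(escape(text[cursor:start]))  # untouched text before the span
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        cursor = end
    out.append(escape(text[cursor:]))  # trailing text after the last span
    return "".join(out)


response = {
    "field": "content",
    "text": "Milvus is a vector database built for scale.",
    "highlights": [{"start": 12, "end": 27, "score": 0.81}],
}
print(render_highlights(response["text"], response["highlights"]))
# prints: Milvus is a <em>vector database</em> built for scale.
```

If the final design uses byte offsets or an inclusive `end`, the slicing would need to change accordingly.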

Modes
BM25 (keyword)

Based on an n-gram index.

Highlights matched keywords.

Lightweight and fast.
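A minimal sketch of keyword highlighting, to make the expected output concrete. It scans the text with a regular expression instead of consulting an n-gram index, so it illustrates the result shape rather than the proposed implementation. The function name is hypothetical, and no `score` is produced because per-term BM25 weights are out of scope here.

```python
import re


def keyword_highlights(text: str, query: str) -> list[dict]:
    """Return spans for every whole-word, case-insensitive match of a query term."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    spans = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in terms:
            spans.append({"start": match.start(), "end": match.end()})
    return spans


print(keyword_highlights("Vector search in Milvus uses vector indexes.", "vector index"))
# prints: [{'start': 0, 'end': 6}, {'start': 29, 'end': 35}]
```

Note that "indexes" is not highlighted for the query term "index": this sketch does no stemming or analysis, which a real text pipeline would likely apply.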

ColBERT (multi-vector)

Requires a ColBERT-based index.

Highlights the document tokens with the highest dot product against the query tokens.

Token-level semantic match.
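The token selection described above can be sketched as follows: score each document token by its best dot product with any query token, then keep the `top_k` tokens. The embeddings are toy 2-d vectors invented for this example, and the function names are hypothetical; a real ColBERT model would supply much higher-dimensional vectors.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def colbert_token_scores(doc_vecs, query_vecs) -> list[float]:
    """Score each document token by its maximum dot product with any query token."""
    return [max(dot(d, q) for q in query_vecs) for d in doc_vecs]


def top_k_tokens(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring document tokens, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)


# Toy 2-d embeddings, made up for illustration.
doc_tokens = ["milvus", "stores", "vectors"]
doc_vecs = [[1.0, 0.0], [0.1, 0.1], [0.0, 1.0]]
query_vecs = [[0.9, 0.1], [0.2, 0.8]]

scores = colbert_token_scores(doc_vecs, query_vecs)
picked = top_k_tokens(scores, k=2)
print([doc_tokens[i] for i in picked])  # prints ['milvus', 'vectors']
```

Mapping the selected token indices back to `start`/`end` offsets would additionally require the tokenizer's offset mapping, which is not shown here.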

ReRank (cross-encoder)

Works with rerank models like BGE-Reranker.

Highlights tokens with high attention or logit scores.

Higher quality, but slower.

See https://medium.com/%40hlealpablo/extracting-token-level-importance-from-cross-encoders-for-quick-simple-highlighting-ee4cca36764b for details
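Once a model has produced per-token scores, they still need to be turned into the `start`/`end`/`score` spans of the response format. The sketch below shows one possible way: keep tokens at or above a threshold and merge neighbours separated by at most one character. The threshold, the merge rule, the function name, and the sample scores are all assumptions for illustration; extracting the scores from a cross-encoder is not shown.

```python
def merge_token_spans(tokens: list[dict], threshold: float) -> list[dict]:
    """Merge neighbouring tokens whose score is >= threshold into highlight spans.

    Each token is {"start": int, "end": int, "score": float}, sorted by start.
    Tokens separated by at most one character (e.g. a space) are merged,
    and the merged span keeps the highest token score.
    """
    spans: list[dict] = []
    for tok in tokens:
        if tok["score"] < threshold:
            continue
        if spans and tok["start"] - spans[-1]["end"] <= 1:
            spans[-1]["end"] = tok["end"]
            spans[-1]["score"] = max(spans[-1]["score"], tok["score"])
        else:
            spans.append(dict(tok))  # copy so the input is not mutated
    return spans


# Offsets for the text "fast vector search engine"; scores are made up.
tokens = [
    {"start": 0, "end": 4, "score": 0.1},    # fast
    {"start": 5, "end": 11, "score": 0.9},   # vector
    {"start": 12, "end": 18, "score": 0.7},  # search
    {"start": 19, "end": 25, "score": 0.2},  # engine
]
print(merge_token_spans(tokens, threshold=0.5))
# prints: [{'start': 5, 'end': 18, 'score': 0.9}]
```

Merging adjacent tokens keeps the response compact and avoids rendering a phrase such as "vector search" as two separate highlights.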

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

Labels

kind/feature: Issues related to feature request from users
