docsense.models.embeddings

Text embedding model implementation using Qwen.

Classes

EmbeddingModel([model_name, device, ...])

Text embedding model using Qwen.

class EmbeddingModel(model_name='Qwen/Qwen2-7B', device='cuda', max_length=512, normalize_embeddings=True)

Text embedding model using Qwen.

Parameters:
  • model_name (str)

  • device (str)

  • max_length (int)

  • normalize_embeddings (bool)

__init__(model_name='Qwen/Qwen2-7B', device='cuda', max_length=512, normalize_embeddings=True)

Initialize the embedding model.

Parameters:
  • model_name (str) – Name or path of the Qwen model

  • device (str) – Device to run the model on ('cuda' or 'cpu')

  • max_length (int) – Maximum sequence length for tokenization

  • normalize_embeddings (bool) – Whether to L2-normalize the embeddings
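With normalize_embeddings=True, each embedding vector is scaled to unit L2 norm. A minimal NumPy sketch of that step, assuming the model yields a float array of shape (num_texts, embedding_dim). The helper name l2_normalize and its eps guard are illustrative, not part of docsense.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of a (num_texts, embedding_dim) array to unit L2 norm.

    Hypothetical helper illustrating what normalize_embeddings=True implies;
    eps prevents division by zero for an all-zero row.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


vectors = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(vectors)
# rows become [0.6, 0.8] and [0.0, 1.0]; each now has norm 1.0
print(np.linalg.norm(unit, axis=1))
```

After normalization, the dot product of two rows equals their cosine similarity, which is the usual reason to enable this option for retrieval.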

encode(texts, batch_size=8)

Generate embeddings for the given texts.

Parameters:
  • texts (Union[str, List[str]]) – Single text or list of texts to encode

  • batch_size (int) – Number of texts passed through the model in each batch
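The batch_size parameter bounds how many texts go through the model in one forward pass, which generally trades throughput against device memory. The splitting can be sketched as follows; iter_batches is a hypothetical helper, and the assumption that encode wraps a single string into a one-element list is inferred from the Union[str, List[str]] signature, not confirmed by the source.

```python
from typing import Iterator, List, Union


def iter_batches(texts: Union[str, List[str]], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts.

    Hypothetical helper: a single string is wrapped in a one-element list,
    mirroring encode()'s Union[str, List[str]] input.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if isinstance(texts, str):
        texts = [texts]
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


docs = [f"doc {i}" for i in range(19)]
print([len(batch) for batch in iter_batches(docs, batch_size=8)])  # [8, 8, 3]
print(list(iter_batches("hello")))  # [['hello']]
```

The final batch may be smaller than batch_size; per-batch results would then be concatenated row-wise into one array.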

Return type:

ndarray

Returns:

numpy array of embeddings with shape (num_texts, embedding_dim)
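Because the result is a plain NumPy array with one row per input text, downstream retrieval code can operate on it directly. When normalize_embeddings=True, a matrix-vector product gives cosine similarities. The sketch below uses small hand-written unit vectors in place of real encode() output so it runs without the model; top_k is a hypothetical helper.

```python
import numpy as np


def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k corpus rows most similar to the query.

    Assumes query has shape (1, embedding_dim) and corpus has shape
    (num_texts, embedding_dim), both already L2-normalized, so the dot
    product equals cosine similarity.
    """
    scores = corpus @ query[0]      # shape (num_texts,)
    return np.argsort(-scores)[:k]  # highest scores first


# Stand-ins for encode() output; every row already has unit norm.
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([[0.8, 0.6]])
print(top_k(query, corpus))  # [1 0]
```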

get_embedding_dim()

Get the dimension of the embeddings.

Return type:

int

Returns:

Embedding dimension, i.e. the size of the last axis (embedding_dim) of the array returned by encode()
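For unit tests of code that depends on this class, loading a 7B-parameter model is usually impractical. A hypothetical stand-in that mirrors the documented encode and get_embedding_dim contract might look like this. FakeEmbeddingModel and its dim argument are illustrative, not part of docsense, and its vectors are deterministic hashes of the text, not semantic embeddings.

```python
import hashlib
from typing import List, Union

import numpy as np


class FakeEmbeddingModel:
    """Deterministic test double mirroring EmbeddingModel's public methods.

    Each vector is seeded from a SHA-256 hash of its text, so identical
    texts always map to identical vectors. They carry no semantic meaning.
    """

    def __init__(self, dim: int = 16, normalize_embeddings: bool = True):
        self.dim = dim
        self.normalize_embeddings = normalize_embeddings

    def _embed_one(self, text: str) -> np.ndarray:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
        return rng.standard_normal(self.dim)

    def encode(self, texts: Union[str, List[str]], batch_size: int = 8) -> np.ndarray:
        # batch_size is accepted for signature compatibility but unused here.
        if isinstance(texts, str):
            texts = [texts]
        if not texts:
            return np.empty((0, self.dim))
        vectors = np.stack([self._embed_one(t) for t in texts])
        if self.normalize_embeddings:
            vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
        return vectors

    def get_embedding_dim(self) -> int:
        return self.dim


model = FakeEmbeddingModel(dim=8)
emb = model.encode(["alpha", "beta", "alpha"])
print(emb.shape)  # (3, 8)
```

Keeping the same method names and return shapes lets the stand-in be swapped in wherever an EmbeddingModel instance is expected.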