docsense.models.llm

LLM (Large Language Model) wrapper implementation.

Classes

LLMModel([model_name, device, max_length, ...])

Wrapper for Qwen language model.

class LLMModel(model_name='Qwen/Qwen2-7B', device='cuda', max_length=2048, temperature=0.0, top_p=1.0, repetition_penalty=1.1)[source]

Wrapper for Qwen language model.

__init__(model_name='Qwen/Qwen2-7B', device='cuda', max_length=2048, temperature=0.0, top_p=1.0, repetition_penalty=1.1)[source]

Initialize the LLM model.

Parameters:
  • model_name (str) – Name of the Qwen model to use

  • device (str) – Device to run the model on ('cuda' or 'cpu')

  • max_length (int) – Maximum sequence length for generation

  • temperature (float) – Sampling temperature (0.0 for deterministic output)

  • top_p (float) – Nucleus sampling parameter (1.0 for no filtering)

  • repetition_penalty (float) – Penalty for repeating tokens
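The constructor defaults imply greedy (deterministic) decoding: a temperature of 0.0 and a top_p of 1.0 disable sampling, while the repetition penalty still discourages loops. A minimal sketch of how these arguments could map onto generation keyword arguments — this mapping is an assumption for illustration, not docsense's actual implementation:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class _LLMConfig:
    """Stand-in mirroring LLMModel's constructor defaults (hypothetical)."""
    model_name: str = "Qwen/Qwen2-7B"
    device: str = "cuda"
    max_length: int = 2048
    temperature: float = 0.0   # 0.0 => deterministic output
    top_p: float = 1.0         # 1.0 => no nucleus filtering
    repetition_penalty: float = 1.1

    def generation_kwargs(self) -> Dict[str, Any]:
        # A temperature of 0.0 is conventionally treated as greedy
        # decoding (sampling disabled) rather than division by zero.
        deterministic = self.temperature == 0.0
        return {
            "max_length": self.max_length,
            "do_sample": not deterministic,
            "top_p": self.top_p,
            "repetition_penalty": self.repetition_penalty,
        }


cfg = _LLMConfig()
kwargs = cfg.generation_kwargs()
```

With the defaults above, `generation_kwargs()` reports `do_sample=False`, so repeated calls on the same input produce the same output.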

generate(question, context)[source]

Generate answer based on context.

Parameters:
  • question – Question to answer

  • context – Context passages used to ground the answer
Return type:

Dict[str, Any]
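Since generate() returns a Dict[str, Any] rather than a bare string, callers should index into the result. The stub below only illustrates the call shape; the key names ("answer" here) are assumptions, as the actual dictionary contents are not documented on this page:

```python
from typing import Any, Dict, List


def generate_stub(question: str, context: List[str]) -> Dict[str, Any]:
    """Stand-in with the same signature shape as LLMModel.generate.

    The returned keys are hypothetical, chosen only for illustration.
    """
    prompt = "\n".join(context) + "\n\nQuestion: " + question
    return {
        "answer": "(generated text would go here)",
        "prompt_tokens": len(prompt.split()),  # crude word count stand-in
    }


result = generate_stub(
    "What does the wrapper do?",
    ["LLMModel wraps a Qwen language model."],
)
```

A caller would then read the answer with something like `result["answer"]`, guarding for whichever keys the real implementation returns.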

get_max_length()[source]

Get the maximum sequence length.

Return type:

int

Returns:

Maximum sequence length
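A common reason to query the maximum sequence length is to budget how much retrieved context can fit into a prompt. A minimal sketch, assuming a word-based count as a stand-in for real tokenization and a hypothetical `reserved` margin for the question and answer:

```python
from typing import List


def fit_context(chunks: List[str], max_length: int, reserved: int = 256) -> List[str]:
    """Keep whole chunks, in order, until the length budget is exhausted.

    `max_length` would come from LLMModel.get_max_length(); `reserved`
    and the word-based counting are illustrative assumptions.
    """
    budget = max_length - reserved  # leave room for question + answer
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        n = len(chunk.split())
        if used + n > budget:
            break
        kept.append(chunk)
        used += n
    return kept


chunks = ["alpha beta", "gamma delta epsilon", "zeta"]
kept = fit_context(chunks, max_length=260, reserved=256)
```

Truncating at chunk boundaries keeps each retained passage intact, at the cost of occasionally leaving some of the budget unused.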