docsense.indexer
Document indexing and processing module.
- class Document(content, metadata=None)
Represents a document or a chunk of a document.
A Document object contains the actual text content and associated metadata. The metadata can include information like source, timestamps, chunk positions, etc.
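As a rough illustration of the content-plus-metadata shape described above, here is a minimal stand-in class. It is not the library's implementation: the dataclass layout, the empty-dict default, and the metadata keys `source` and `chunk_index` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Stand-in for docsense.indexer.Document (assumed shape, not the real class)."""

    content: str
    metadata: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Assumption: omitting metadata behaves like passing an empty dict.
        if self.metadata is None:
            self.metadata = {}


# One chunk of a larger file; the metadata keys are illustrative only.
chunk = Document(
    content="FAISS builds an index over dense vectors.",
    metadata={"source": "notes/faiss.md", "chunk_index": 0},
)

# A document created without metadata.
whole = Document("A document without metadata.")
```

Keeping the metadata as a plain dictionary means it can be serialized to JSON without extra work, which matches how the vector store is described as persisting it.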
- class DocumentLoader
Load documents from various sources.
- load_directory(path)
Load all supported documents from a directory recursively.
- Parameters:
- Return type:
- Returns:
List of Document objects containing file contents and metadata
- Raises:
FileNotFoundError – If the directory does not exist
NotADirectoryError – If the path is not a directory
ValueError – If no supported documents are found
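The recursive walk and the three documented exceptions can be sketched with the standard library alone. This is an approximation under stated assumptions, not the loader's actual code: the function name `load_directory_sketch`, the `SUPPORTED_SUFFIXES` set, and the use of plain dictionaries in place of Document objects are all hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical set of supported extensions; the real loader's list is not documented here.
SUPPORTED_SUFFIXES = {".txt", ".md"}


def load_directory_sketch(path: str) -> List[Dict[str, object]]:
    """Recursively read supported files, raising the documented exceptions."""
    root = Path(path)
    if not root.exists():
        raise FileNotFoundError(f"Directory does not exist: {root}")
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")

    documents: List[Dict[str, object]] = []
    # rglob("*") walks the whole tree; sorting keeps the result order stable.
    for file_path in sorted(root.rglob("*")):
        if file_path.is_file() and file_path.suffix.lower() in SUPPORTED_SUFFIXES:
            documents.append(
                {
                    "content": file_path.read_text(encoding="utf-8"),
                    "metadata": {"source": str(file_path)},
                }
            )

    if not documents:
        raise ValueError(f"No supported documents found in {root}")
    return documents
```

Checking for existence before checking for a directory keeps the two error cases distinct, so a caller can tell a mistyped path from a path that points at a file.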
- class VectorStore(dimension, index_path=None, use_gpu=True)
Vector store for document embeddings using FAISS.
This class implements a vector store that uses FAISS for efficient similarity search of document embeddings. It supports:
- GPU acceleration when available
- Persistence to disk
- Document metadata management
- IVF (Inverted File) index for faster search
- __init__(dimension, index_path=None, use_gpu=True)
Initialize the vector store.
- Parameters:
- Raises:
ValueError – If dimension is invalid
RuntimeError – If GPU initialization fails
- add_documents(documents, embeddings)
Add documents and their embeddings to the store.
- Parameters:
- Raises:
ValueError – If the number of documents doesn't match the number of embeddings, or if the embedding dimensions don't match
- Return type:
None
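The two ValueError conditions for add_documents amount to a count check followed by a per-vector length check. The helper below is a hypothetical sketch of that validation using plain Python sequences in place of NumPy arrays; `check_batch` is not part of the documented API.

```python
from typing import Sequence


def check_batch(
    documents: Sequence[object],
    embeddings: Sequence[Sequence[float]],
    dimension: int,
) -> None:
    """Raise ValueError for either mismatch that add_documents documents."""
    # One embedding is expected per document.
    if len(documents) != len(embeddings):
        raise ValueError(
            f"Got {len(documents)} documents but {len(embeddings)} embeddings"
        )
    # Every embedding must have the store's dimension.
    for row, vector in enumerate(embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"Embedding {row} has dimension {len(vector)}, expected {dimension}"
            )


# A consistent batch passes silently.
check_batch(["doc a", "doc b"], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dimension=3)
```

Validating before touching the index means a bad batch leaves the store unchanged, which is the behavior a caller would most likely expect.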
- search(query_embedding, k=2)
Search for most similar documents using the query embedding.
- Parameters:
query_embedding (ndarray) – Query vector with shape (dimension,) or (1, dimension)
k (int) – Number of results to return
- Return type:
- Returns:
List of (document, distance) tuples sorted by similarity (closest first)
- Raises:
ValueError – If query_embedding has an invalid shape
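To make the documented contract concrete (a query of shape (dimension,) or (1, dimension), and results as (document, distance) tuples with the closest first), here is a brute-force stand-in. It is not how the store works internally: FAISS is replaced by a linear scan, plain lists stand in for ndarray, Euclidean distance is an assumed metric, and the names `normalize_query` and `search_sketch` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize_query(query: Sequence, dimension: int) -> List[float]:
    """Accept a flat vector or a single-row nested vector, else raise ValueError."""
    # A nested list with exactly one row plays the role of shape (1, dimension).
    if len(query) == 1 and isinstance(query[0], (list, tuple)):
        query = query[0]
    is_nested = any(isinstance(x, (list, tuple)) for x in query)
    if is_nested or len(query) != dimension:
        raise ValueError("query_embedding must have shape (dimension,) or (1, dimension)")
    return [float(x) for x in query]


def search_sketch(
    documents: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    query: Sequence,
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k nearest (document, distance) pairs, closest first."""
    dimension = len(embeddings[0])
    vector = normalize_query(query, dimension)
    # Linear scan: compute the distance from the query to every stored vector.
    scored = [
        (doc, math.dist(vector, emb)) for doc, emb in zip(documents, embeddings)
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


docs = ["a", "b", "c"]
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
results = search_sketch(docs, vectors, [0.0, 0.0], k=2)
print(results)  # [('a', 0.0), ('b', 1.0)]
```

A linear scan compares the query with every stored vector, so its cost grows with the size of the store. An IVF index, as mentioned in the class description, narrows the search to a subset of the vectors instead.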
- save()
Save the index and metadata to disk.
This method saves both the FAISS index and document metadata to the specified index path. The index is saved in FAISS binary format and metadata in JSON.
- Raises:
ValueError – If no index path was specified
IOError – If saving fails
- Return type:
None
- load()
Load the index and metadata from disk.
This method loads both the FAISS index and document metadata from the specified index path. The index is loaded from FAISS binary format and metadata from JSON.
- Raises:
ValueError – If no index path was specified or the dimensions do not match
FileNotFoundError – If the index or metadata files are missing
IOError – If loading fails
- Return type:
None
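The JSON half of the save and load round trip, together with the documented error cases, might look roughly like the sketch below. Several details are assumptions: the function names, the `.json` sibling file derived from the index path, and storing the dimension inside the metadata payload. The FAISS index itself is left out to keep the example dependency-free; FAISS provides `faiss.write_index` and `faiss.read_index` for its binary format, though whether this module calls them is not stated here.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def save_metadata(index_path: Optional[str], dimension: int, records: List[Dict]) -> None:
    """Write document metadata as JSON next to the (hypothetical) index file."""
    if index_path is None:
        raise ValueError("No index path was specified")
    # Assumed naming scheme: same path as the index, with a .json suffix.
    meta_file = Path(index_path).with_suffix(".json")
    payload = {"dimension": dimension, "documents": records}
    meta_file.write_text(json.dumps(payload), encoding="utf-8")


def load_metadata(index_path: Optional[str], dimension: int) -> List[Dict]:
    """Read the metadata back, checking the path and the stored dimension."""
    if index_path is None:
        raise ValueError("No index path was specified")
    meta_file = Path(index_path).with_suffix(".json")
    if not meta_file.exists():
        raise FileNotFoundError(f"Metadata file is missing: {meta_file}")
    payload = json.loads(meta_file.read_text(encoding="utf-8"))
    # Refuse to load metadata written for a store of a different dimension.
    if payload["dimension"] != dimension:
        raise ValueError(
            f"Stored dimension {payload['dimension']} does not match {dimension}"
        )
    return payload["documents"]
```

Recording the dimension alongside the metadata is one way to detect a mismatch at load time, before any query reaches an index built for vectors of a different size.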
Modules
- Document processing module for loading and chunking documents.
- Document loader implementation.
- Vector store implementation using FAISS.