Available Processors
Document Parser
Extracts text from various document formats.
Text Chunker
Splits text into optimal chunks for embedding.
Metadata Extractor
Automatically extracts metadata from documents.
Processing Pipeline
You can chain processors together.
Supported Formats
| Format | Parser | Notes |
|---|---|---|
| | Yes | OCR available for scanned docs |
| DOCX | Yes | Preserves formatting |
| PPTX | Yes | Extracts slide content |
| HTML | Yes | Cleans and extracts text |
| Markdown | Yes | Direct processing |
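
As a rough illustration of the chaining described under Processing Pipeline, the sketch below passes a document through a chunking step and then a metadata step. It is not this product's API: `Document`, `chunk_text`, `extract_metadata`, and `run_pipeline` are hypothetical names, and the 50-word chunk size and the metadata fields are arbitrary choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container passed from one processor to the next."""
    text: str
    chunks: List[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


# In this sketch a processor is any callable that takes a Document and returns one.
Processor = Callable[[Document], Document]


def chunk_text(doc: Document, max_words: int = 50) -> Document:
    """Split the text into chunks of at most max_words words (illustrative rule)."""
    words = doc.text.split()
    doc.chunks = [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
    return doc


def extract_metadata(doc: Document) -> Document:
    """Record simple statistics as metadata (a stand-in for real extraction)."""
    doc.metadata["word_count"] = len(doc.text.split())
    doc.metadata["chunk_count"] = len(doc.chunks)
    return doc


def run_pipeline(doc: Document, processors: List[Processor]) -> Document:
    """Apply each processor in order, feeding its output to the next."""
    for processor in processors:
        doc = processor(doc)
    return doc


result = run_pipeline(
    Document(text="word " * 120),
    [chunk_text, extract_metadata],
)
print(result.metadata)
```

Because every step shares the same `Document -> Document` shape, steps can be reordered or swapped without changing `run_pipeline`. Order still matters in this sketch: `extract_metadata` reads the chunks, so it has to run after `chunk_text`.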