Unstructured

Unstructured is an open-source ETL solution that converts complex documents into structured data for language models, featuring enterprise-grade capabilities like workflow orchestration, document partitioning, enrichment, chunking, and embedding.

Language: HTML
Latest Release: 0.18.22
License: Apache License 2.0

Key Features

  • Transforms unstructured documents into structured data
  • Enterprise-grade workflow automation
  • Supports partitioning, enrichment, and embedding
  • Optimized for extracting data for language models
  • Open-source and production-ready
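
The partitioning and chunking steps named above can be sketched in plain Python. This is only an illustration of the idea and does not use the Unstructured library or its API: `Element`, `partition_text`, and `chunk_by_section` are hypothetical names, and the title heuristic is an assumption made for this example. Enrichment and embedding are left out.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One typed piece of a document (hypothetical type, not the library's)."""
    category: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw: str) -> list[Element]:
    """Split raw text into elements on blank lines.

    Toy heuristic (an assumption): a short single-line block with no
    trailing period is a title; everything else is narrative text.
    """
    elements = []
    for block in raw.split("\n\n"):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        is_title = (
            "\n" not in block.strip()
            and len(text) <= 40
            and not text.endswith(".")
        )
        elements.append(Element("Title" if is_title else "NarrativeText", text))
    return elements


def chunk_by_section(elements: list[Element], max_chars: int = 200) -> list[str]:
    """Group elements into chunks, starting a new chunk at each title
    or when adding the next element would exceed max_chars."""
    chunks, current = [], ""
    for el in elements:
        starts_section = el.category == "Title"
        too_long = len(current) + len(el.text) + 1 > max_chars
        if current and (starts_section or too_long):
            chunks.append(current)
            current = ""
        current = f"{current}\n{el.text}" if current else el.text
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Overview\n\nUnstructured data is hard to query.\n\n"
    "Usage\n\nPartition first, then chunk."
)
elements = partition_text(doc)
chunks = chunk_by_section(elements)
```

Each chunk here keeps a title together with the text under it, which is the kind of unit typically passed on to an embedding step.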

Alternative Tools

  • LangChain
  • Apache Tika
  • doccano
  • Haystack


Community

Stars: 13.4k
Open Issues: 233
Forks: 1.1k