Is Olmocr free to use?

Yes. Olmocr is open source and distributed under the Apache License 2.0.

What language is Olmocr written in?

Olmocr is written primarily in Python.

What are alternatives to Olmocr?

Popular open-source alternatives to Olmocr include pdfplumber, pdftotext, pdfminer.six, and unstructured.

What category does Olmocr belong to?

Olmocr is categorized under AI, Tools, Development, and Utils.

🤖 AI 🛠️ Tools 👨‍💻 Development 🔨 Utils

17.1k

Olmocr

A toolkit designed for converting and linearizing PDFs to create datasets optimized for large language model (LLM) training and evaluation.

Olmocr is built in Python and distributed under the Apache License 2.0. It has 17.1k GitHub stars, and its latest release is v0.4.27.

Stats refreshed 3 months ago

Language: Python
Latest Release: v0.4.27
License: Apache License 2.0

Key Features

PDF linearization for dataset creation
Optimized for large language model (LLM) workflows
Supports automated text extraction
Facilitates preparation of training datasets
Command-line utility for easy usage

Alternative Tools

pdfplumber, pdftotext, pdfminer.six, unstructured

Resources

GitHub Repository

Community

Stars

17.1k

Open Issues

Forks

1.4k

183.6k

N8n

A powerful fair-code workflow automation platform with native AI capabilities. It combines visual workflow building with the ability to include custom code. Self-host it or use the cloud version, with more than 400 integrations available.

👨‍💻 Development ⚙️ DevOps 🔁 API 🤖 AI

173.4k

AutoGPT

AutoGPT is the vision of accessible AI for everyone, providing tools to focus on what matters in building and utilizing AI.

🤖 AI 👨‍💻 Development 🛠️ Tools

168.6k

Ollama

Ollama is an AI platform that enables you to easily run and experiment with popular open source and proprietary language models like OpenAI gpt-oss, DeepSeek-R1, Gemma 3, and more—locally, with a focus on privacy and low setup overhead.

🤖 AI 👨‍💻 Development 🛠️ Tools

159.2k

Transformers

State-of-the-art machine learning library for PyTorch, TensorFlow, and JAX, providing thousands of pre-trained models for natural language processing, computer vision, and other areas.

🤖 AI 🧰 Framework 🔬 Data Science

158.9k

Stable-diffusion-webui

A comprehensive web-based user interface for Stable Diffusion, providing powerful tools for generating and modifying images using AI models directly from your browser.

🤖 AI 🌐 Web 🛠️ Tools 📈 Visualization

139.5k