Data Science Tools

42 tools available

📘 195.1k

JavaScript Algorithms

A comprehensive collection of algorithms and data structures implemented in JavaScript, complete with explanations and additional reading links.

👨‍💻 Development, 🎓 Learning & Career, 🟡 JavaScript, 🔬 Data Science

🤗 153.9k

Transformers

State-of-the-art machine learning library for PyTorch, TensorFlow, and JAX, providing thousands of pre-trained models for natural language processing, computer vision, and other areas.

🤖 AI, 🧰 Framework, 🔬 Data Science

🔊 92.1k

Whisper

Robust Speech Recognition via Large-Scale Weak Supervision, capable of transcribing and translating spoken language.

🤖 AI, 🛠️ Tools, 👨‍💻 Development, 🔬 Data Science

🧠 71.0k

Awesome Machine Learning

A curated list of awesome Machine Learning frameworks, libraries, and software, providing insights into various useful resources in the domain.

🕶️ Awesome, 🤖 AI, 🔬 Data Science

📊 69.4k

Superset

Apache Superset is an open-source data visualization and data exploration platform, offering a wide range of visualization options and a user-friendly interface for interactive data exploration.

🔬 Data Science, 📈 Visualization, 📊 Analytics

🖼️ 66.2k

PaddleOCR

PaddleOCR is a powerful and lightweight open source OCR toolkit that enables conversion of images and PDF documents into structured data for AI applications. It supports over 100 languages, making it a versatile bridge between image/PDF content and large language models (LLMs) for a wide range of tasks.

🤖 AI, 🔬 Data Science, 🛠️ Tools, 🐍 Python

📊 61.7k

Prometheus

The Prometheus monitoring system and time series database for collecting and storing metrics as time series data.

🔎 Monitoring, 🛠️ Tools, 🔨 Utils, 🔬 Data Science

🌐 57.2k

Crawl4AI

Crawl4AI is an open-source and LLM-friendly web crawler and scraper that facilitates efficient data extraction from the web.

🤖 AI, 🌐 Web, 🛠️ Tools, 🔬 Data Science

📊 45.0k

Metabase

Metabase is an easy-to-use open source Business Intelligence and Embedded Analytics tool that enables seamless interaction with data for everyone.

📊 Analytics, 📈 Visualization, 🔬 Data Science

📊 41.7k

Milvus

A high-performance, cloud-native vector database designed for scalable and efficient vector Approximate Nearest Neighbor (ANN) search.

💾 Database, 🤖 AI, 🔬 Data Science

⚡ 39.5k

ClickHouse

ClickHouse® is a high-performance real-time analytics database management system designed for handling large volumes of data at incredible speed.

💾 Database, 📊 Analytics, 🏎️ Performance, 🔬 Data Science

🧠 37.9k

MindsDB

MindsDB is an AI query engine: a platform for building AI models that can learn from and answer questions over large-scale federated data.

🤖 AI, 🔬 Data Science, 📊 Analytics, 🛠️ Tools

🔧 32.0k

CyberChef

The Cyber Swiss Army Knife - a powerful web app for encryption, encoding, compression, and data analysis.

๐ŸŒ Web๐Ÿ”ฌ Data Science๐Ÿ”’ Security๐Ÿ› ๏ธ Tools
๐Ÿง  30.9k

Tinygrad

A minimalist deep learning framework designed to be simple and easy to understand, inspired by PyTorch and micrograd.

👨‍💻 Development, 🤖 AI, 🔬 Data Science

📊 28.1k

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard, and share your data.

🔬 Data Science, 📈 Visualization, 📊 Analytics

🌀 27.7k

STORM

STORM is an LLM-powered knowledge curation system designed to research topics and generate comprehensive full-length reports with citations. Developed by Stanford's OVAL team, STORM leverages large language models to streamline information gathering and synthesis.

🤖 AI, 📚 Documentation, 🔬 Data Science

🗃️ 24.3k

System_prompts_leaks

A collection of extracted system prompts from popular chatbots, including ChatGPT, Claude, and Gemini.

🤖 AI, 🛠️ Tools, 🔬 Data Science

🏷️ 23.9k

Label Studio

Label Studio is a versatile data labeling and annotation tool supporting multiple data types with a standardized output format.

🤖 AI, 🔬 Data Science, 🛠️ Tools

🤖 23.6k

Haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components such as models, vector databases, and file converters to pipelines or agents that can interact with your data. Best suited for building Retrieval-Augmented Generation (RAG), question answering, semantic search, or conversational agent chatbots.

🤖 AI, 🧰 Framework, ⚙️ DevOps, 🔬 Data Science

🏆 22.9k

Best-of-ml-python

A curated and ranked collection of top machine learning libraries and tools for Python. Regularly updated to highlight the best open-source machine learning projects in the Python ecosystem.

🕶️ Awesome, 🔬 Data Science, 🐍 Python, 🤖 AI

🤖 22.8k

PandasAI

PandasAI lets you interact with data sources such as databases and data lakes in conversational language. It uses large language models (LLMs) and RAG to provide intuitive data analysis.

🤖 AI, 🔬 Data Science, 📊 Analytics, 🛠️ Tools

📊 21.7k

Handsontable

Handsontable is a JavaScript data grid and data table offering a spreadsheet-like look and feel, fully compatible with React, Angular, and Vue. Developed and supported by the Handsontable team.

👨‍💻 Development, 🟡 JavaScript, 🔬 Data Science, 📈 Visualization

🧠 21.1k

Graphiti

Graphiti helps to build real-time knowledge graphs tailored for AI agents, enhancing their decision-making and data processing capabilities.

🤖 AI, 🔬 Data Science, 👨‍💻 Development, 🛠️ Tools

🌐 20.6k

ThingsBoard

Open-source IoT Platform for comprehensive device management, data collection, processing, and visualization.

🔬 Data Science, 📈 Visualization, ⚙️ DevOps

🔄 20.3k

Airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.

๐Ÿ” API๐Ÿ’พ Database๐Ÿ”ฌ Data Scienceโš™๏ธ DevOps
๐Ÿ“Š 19.2k

Cube

Cube is a universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics. It serves as an access layer for data analytics, providing real-time data access and integration across multiple sources.

📊 Analytics, 🤖 AI, 🔬 Data Science, API

🔄 18.6k

Prefect

Prefect is a powerful workflow orchestration framework for building, running, and monitoring resilient data pipelines in Python.

👨‍💻 Development, 🔬 Data Science, ⚙️ DevOps, 🐍 Python

📘 18.5k

Anthropic Cookbook

A collection of notebooks and recipes showcasing innovative and effective ways of using Claude for machine learning and AI experiments.

🤖 AI, 🔬 Data Science, 👨‍💻 Development, 📚 Documentation

📄 13.4k

Unstructured

Unstructured is an open-source ETL solution that converts complex documents into structured data for language models, featuring enterprise-grade capabilities like workflow orchestration, document partitioning, enrichment, chunking, and embedding.

🤖 AI, 🔬 Data Science, 👨‍💻 Development, 🛠️ Tools

🚀 11.1k

StarRocks

StarRocks is an open source, high-performance analytical database and query engine designed for sub-second analytics both on and off the data lakehouse. It excels in multi-dimensional analytics, real-time analytics, and complex ad-hoc queries. StarRocks is a Linux Foundation project.

💾 Database, 📊 Analytics, 🏎️ Performance, 🔬 Data Science
