Data Science Tools
35 tools available
JavaScript Algorithms
A comprehensive collection of algorithms and data structures implemented in JavaScript, complete with explanations and additional reading links.
Transformers
State-of-the-art Machine Learning library for PyTorch, TensorFlow, and JAX, providing thousands of pre-trained models for natural language processing, computer vision, and other areas.
Whisper
Robust Speech Recognition via Large-Scale Weak Supervision, capable of transcribing and translating spoken language.
Awesome Machine Learning
A curated list of awesome Machine Learning frameworks, libraries, and software, providing insights into various useful resources in the domain.
Superset
Apache Superset is an open-source data visualization and data exploration platform, offering a wide range of visualization options and a user-friendly interface for interactive data exploration.
Prometheus
The Prometheus monitoring system and time series database for collecting and storing metrics as time series data.
Crawl4AI
Crawl4AI is an open-source and LLM-friendly web crawler and scraper that facilitates efficient data extraction from the web.
Metabase
Metabase is an easy-to-use open source Business Intelligence and Embedded Analytics tool that enables seamless interaction with data for everyone.
ClickHouse
ClickHouse® is a high-performance real-time analytics database management system designed for handling large volumes of data at incredible speed.
Milvus
A high-performance, cloud-native vector database designed for scalable and efficient vector Approximate Nearest Neighbor (ANN) search.
Mindsdb
MindsDB is a query engine for AI, a platform for building AI models that can learn and answer questions over large-scale federated data.
CyberChef
The Cyber Swiss Army Knife - a powerful web app for encryption, encoding, compression, and data analysis.
Tinygrad
A minimalist deep learning framework designed to be simple and easy to understand, inspired by PyTorch and micrograd.
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard, and share your data.
STORM
STORM is an LLM-powered knowledge curation system designed to research topics and generate comprehensive full-length reports with citations. Developed by Stanford's OVAL team, STORM leverages large language models to streamline information gathering and synthesis.
Label-studio
Label Studio is a versatile data labeling and annotation tool supporting multiple data types with a standardized output format.
Haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components such as models, vector databases, and file converters to pipelines or agents that can interact with your data. Best suited for building Retrieval-Augmented Generation (RAG), question answering, semantic search, or conversational agent chatbots.
Handsontable
Handsontable is a JavaScript data grid and data table offering a spreadsheet-like look and feel, fully compatible with React, Angular, and Vue. Developed and supported by the Handsontable team.
Pandas-ai
PandasAI allows you to interact with your data sources, such as databases and data lakes, using conversational language. It leverages Large Language Models (LLMs) and RAG to provide intuitive data analysis.
Thingsboard
Open-source IoT Platform for comprehensive device management, data collection, processing, and visualization.
Airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Both self-hosted and Cloud-hosted.
Cube
Cube is a universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics. It serves as an access layer for data analytics, providing real-time data access and integration across multiple sources.
Prefect
Prefect is a powerful workflow orchestration framework for building, running, and monitoring resilient data pipelines in Python.
Anthropic Cookbook
A collection of notebooks and recipes showcasing innovative and effective ways of using Claude for machine learning and AI experiments.
Graphiti
Graphiti helps to build real-time knowledge graphs tailored for AI agents, enhancing their decision-making and data processing capabilities.
Memvid
Video-based AI memory library for storing millions of text chunks in MP4 files with lightning-fast semantic search, eliminating the need for a traditional database.
System_prompts_leaks
A collection of extracted system prompts from popular chatbots, including ChatGPT, Claude, and Gemini.
Lightdash
Lightdash is a self-serve business intelligence (BI) tool designed to empower data teams to work more efficiently and effectively.
Modelcontextprotocol
Specification and documentation for the Model Context Protocol, an open standard for connecting AI applications to external data sources and tools.
Jitsu
Jitsu is an open-source Segment alternative with a fully-scriptable data ingestion engine designed for modern data teams. Set up a real-time data pipeline in minutes, not days.