evals
openai · openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
B · Good evidence · Execution High
Showing the first 24 of 45,975 indexed skills.
slopus · slopus/happy
Mobile and Web client for Codex and Claude Code, with realtime voice, encryption and fully featured
B · Good evidence · Execution High
openai · openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
C · Review first · Execution High
mlc-ai · mlc-ai/web-llm
High-performance In-browser LLM Inference Engine
B · Good evidence · Execution High
xming521 · xming521/WeClone
🚀 One-stop solution for creating your AI twin from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind the model to a chatbot to bring your digital self to life.
B · Good evidence · Execution High
karpathy · karpathy/llm-council
LLM Council works together to answer your hardest questions
C · Review first · Execution High
richards199999 · richards199999/Thinking-Claude
Let your Claude be able to think
B · Good evidence · Execution High
kvcache-ai · kvcache-ai/ktransformers
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
B · Good evidence · Execution High
anthropics · anthropics/claude-plugins-official
Official, Anthropic-managed directory of high quality Claude Code Plugins.
B · Good evidence · Execution High
openai · openai/skills
Skills Catalog for Codex
B · Good evidence · Execution High
openai · openai/baselines
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
B · Good evidence · Execution High
argoproj · argoproj/argo-workflows
Workflow Engine for Kubernetes
B · Good evidence · Execution High
ai-shifu · ai-shifu/ChatALL
Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, 讯飞星火, 文心一言 and more, discover the best answers
B · Good evidence · Execution High
anthropics · anthropics/claude-quickstarts
A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API
B · Good evidence · Execution High
ZJU-LLMs · ZJU-LLMs/Foundations-of-LLMs
A book for Learning the Foundations of LLMs
B · Good evidence · Execution High
amplication · amplication/amplication
Amplication brings order to the chaos of large-scale software development by creating Golden Paths for developers - streamlined workflows that drive consistency, enable high-quality code practices, simplify onboarding, and accelerate standardized delivery across teams.
B · Good evidence · Execution High
claude-code-best · claude-code-best/claude-code
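Authentic Claude Code in a runnable, buildable, debuggable version; TypeScript types fully fixed; enterprise-grade reliability; safe and malware-free, with the lock file preserved, so bun i works directly; start with bun run dev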
B · Good evidence · Execution High
dagger · dagger/dagger
Automation engine to build, test and ship any codebase. Runs locally, in CI, or directly in the cloud
B · Good evidence · Execution High
HKUDS · HKUDS/RAG-Anything
RAG-Anything: All-in-One RAG Framework
B · Good evidence · Execution High
ansible · ansible/awx
AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.
B · Good evidence · Execution High
Hammerspoon · Hammerspoon/hammerspoon
Staggeringly powerful macOS desktop automation with Lua
B · Good evidence · Execution High
alibaba · alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
B · Good evidence · Execution High
llmware-ai · llmware-ai/llmware
Unified framework for building enterprise RAG pipelines with small, specialized models
B · Good evidence · Execution High