codefuse-devops-eval
DevOps benchmark
An industrial-first evaluation suite for assessing foundation models (LLMs) in the DevOps/AIOps domain.
685 stars
9 watching
43 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
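
Benchmarks of this kind typically score a model by comparing its chosen option against the gold answer on multiple-choice questions (compare hkust-nlp/ceval below). The following is a minimal sketch of such a scoring harness; the CSV path, the question/A/B/C/D/answer column names, and the `predict()` stub are illustrative assumptions, not this repository's actual data format or API.

```python
# Hypothetical scoring harness for a C-Eval-style multiple-choice benchmark.
# Column names and file path are assumptions for illustration only.
import csv

def predict(question: str, choices: dict[str, str]) -> str:
    """Stand-in for a real model call; returns one of the choice keys ('A'-'D')."""
    return "A"  # replace with an actual LLM call in practice

def score(csv_path: str) -> float:
    """Accuracy over rows assumed to have columns: question, A, B, C, D, answer."""
    correct = total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            choices = {key: row[key] for key in ("A", "B", "C", "D")}
            correct += predict(row["question"], choices) == row["answer"]
            total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    print(f"accuracy: {score('devops_eval_sample.csv'):.2%}")
```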
Related projects:
| Repository | Description | Stars |
|---|---|---|
| codefuse-ai/codefuse-devops-model | An industrial-first language model for answering questions in the DevOps domain | 588 |
| codefuse-ai/codefuse-chatbot | An AI-powered tool designed to simplify and optimize various stages of the software development lifecycle | 1,181 |
| codefuse-ai/test-agent | A tool that empowers software testing with large language models | 559 |
| codefuse-ai/mftcoder | A framework for fine-tuning large language models on multiple tasks to improve their accuracy and efficiency | 637 |
| bregman-arie/howtheydevops | A collection of publicly available resources on DevOps practices from companies around the world | 725 |
| princeton-nlp/intercode | An interactive code environment framework for evaluating language agents through execution feedback | 194 |
| open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,771 |
| hkust-nlp/ceval | An evaluation suite providing multiple-choice questions for foundation models across disciplines, with tools for assessing model performance | 1,636 |
| johnathan79717/codeforces-parser | Generates sample tests and input/output files for competitive programming contests | 137 |
| alco/benchfella | Tools for comparing and benchmarking small code snippets | 516 |
| microsoft/codexglue | A benchmark dataset and open challenge to improve AI models' ability to understand and generate code | 1,560 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
| joelwmale/codeception-action | A GitHub Action for running Codeception tests in CI workflows | 15 |
| openai/procgen | A benchmark for evaluating reinforcement learning agents on procedurally generated, game-like environments | 1,021 |