codefuse-devops-eval

DevOps benchmark

An industrial-first evaluation benchmark for assessing foundation models (LLMs) in the DevOps/AIOps domain.
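
The suite is primarily a set of multiple-choice questions scored by answer accuracy (in the style of C-Eval, listed among the related projects below). A minimal sketch of that scoring step follows; the CSV file name and the "answer"/"prediction" column names are illustrative assumptions, not the repository's actual data format or API.

```python
# Minimal sketch: score a multiple-choice benchmark run by answer accuracy.
# The file name and the "answer"/"prediction" columns are assumptions for
# illustration; they are not taken from codefuse-devops-eval itself.
import csv

def score_predictions(path: str) -> float:
    """Return accuracy for a CSV with gold 'answer' and model 'prediction' columns."""
    correct = total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            # Compare the gold choice letter (e.g. "A") with the model's pick.
            if row["prediction"].strip().upper() == row["answer"].strip().upper():
                correct += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    print(f"accuracy = {score_predictions('devops_eval_predictions.csv'):.3f}")
```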

GitHub

Stars: 685
Watchers: 9
Forks: 43
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
codefuse-ai/codefuse-devops-model | An industrial-first language model for answering questions in the DevOps domain | 588
codefuse-ai/codefuse-chatbot | An AI-powered tool designed to simplify and optimize various stages of the software development lifecycle | 1,181
codefuse-ai/test-agent | A tool that empowers software testing with large language models | 559
codefuse-ai/mftcoder | A framework for fine-tuning large language models with multiple tasks to improve their accuracy and efficiency | 637
bregman-arie/howtheydevops | A collection of publicly available resources on DevOps practices from companies around the world | 725
princeton-nlp/intercode | An interactive code environment framework for evaluating language agents through execution feedback | 194
open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19
cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,771
hkust-nlp/ceval | An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance | 1,636
johnathan79717/codeforces-parser | Generates sample tests and input/output files for competitive programming contests | 137
alco/benchfella | Tools for comparing and benchmarking small code snippets | 516
microsoft/codexglue | A benchmark dataset and open challenge to improve AI models' ability to understand and generate code | 1,560
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939
joelwmale/codeception-action | An action for running Codeception tests in GitHub workflows | 15
openai/procgen | A benchmark for evaluating reinforcement learning agent performance on procedurally generated, game-like environments | 1,021