codefuse-devops-eval
DevOps benchmark
An industrial-first evaluation suite for assessing foundation models (LLMs) in the DevOps/AIOps domain.
685 stars
9 watching
43 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
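
Benchmarks of this kind typically score a model by comparing its chosen option against the gold answer on multiple-choice questions (compare hkust-nlp/ceval below). The following is a minimal sketch of such a scoring harness; the CSV path, the question/A/B/C/D/answer column names, and the `predict()` stub are illustrative assumptions, not this repository's actual data format or API.

```python
# Hypothetical scoring harness for a C-Eval-style multiple-choice benchmark.
# Column names and file path are assumptions for illustration only.
import csv

def predict(question: str, choices: dict[str, str]) -> str:
    """Stand-in for a real model call; returns one of the choice keys ('A'-'D')."""
    return "A"  # replace with an actual LLM call in practice

def score(csv_path: str) -> float:
    """Accuracy over rows assumed to have columns: question, A, B, C, D, answer."""
    correct = total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            choices = {key: row[key] for key in ("A", "B", "C", "D")}
            correct += predict(row["question"], choices) == row["answer"]
            total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    print(f"accuracy: {score('devops_eval_sample.csv'):.2%}")
```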
Related projects:
| Repository | Description | Stars |
|---|---|---|
| codefuse-ai/codefuse-devops-model | An industrial-first language model for answering questions in the DevOps domain | 588 |
| codefuse-ai/codefuse-chatbot | An AI-powered tool designed to simplify and optimize various stages of the software development lifecycle | 1,181 |
| codefuse-ai/test-agent | A tool that empowers software testing with large language models | 559 |
| codefuse-ai/mftcoder | A framework for fine-tuning large language models on multiple tasks to improve their accuracy and efficiency | 637 |
| bregman-arie/howtheydevops | A collection of publicly available resources on DevOps practices from companies around the world | 725 |
| princeton-nlp/intercode | An interactive code environment framework for evaluating language agents through execution feedback | 194 |
| open-evals/evals | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,771 |
| hkust-nlp/ceval | An evaluation suite providing multiple-choice questions for foundation models across disciplines, with tools for assessing model performance | 1,636 |
| johnathan79717/codeforces-parser | Generates sample tests and input/output files for competitive programming contests | 137 |
| alco/benchfella | Tools for comparing and benchmarking small code snippets | 516 |
| microsoft/codexglue | A benchmark dataset and open challenge to improve AI models' ability to understand and generate code | 1,560 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
| joelwmale/codeception-action | A GitHub Action for running Codeception tests in CI workflows | 15 |
| openai/procgen | A benchmark for evaluating reinforcement learning agents on procedurally generated, game-like environments | 1,021 |