codefuse-devops-eval
DevOps benchmark
An evaluation suite for assessing foundation models in the DevOps field.
Industrial-first evaluation benchmark for LLMs in the DevOps/AIOps domain.
690 stars
9 watching
44 forks
Language: Python
Last commit: 8 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An industrial-first language model for answering questions in the DevOps domain | 596 |
| | An AI-powered tool designed to simplify and optimize various stages of the software development lifecycle | 1,202 |
| | A tool that empowers software testing with large language models | 565 |
| | A framework for fine-tuning large language models with multiple tasks to improve their accuracy and efficiency | 647 |
| | A collection of publicly available resources on DevOps practices from companies around the world | 733 |
| | An interactive code environment framework for evaluating language agents through execution feedback | 198 |
| | A framework for evaluating OpenAI models and an open-source registry of benchmarks | 19 |
| | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
| | An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance | 1,650 |
| | Generates sample tests and input/output files for competitive programming contests | 137 |
| | Tools for comparing and benchmarking small code snippets | 514 |
| | A benchmark dataset and open challenge to improve AI models' ability to understand and generate code | 1,575 |
| | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| | An action for running Codeception tests in GitHub workflows | 15 |
| | A benchmark for evaluating reinforcement learning agent performance on procedurally generated game-like environments | 1,030 |