codefuse-devops-eval
DevOps benchmark
An industrial-first evaluation suite for assessing foundation models (LLMs) in the DevOps/AIOps domain.
690 stars
9 watching
44 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
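To give a rough sense of what an evaluation suite of this kind does, below is a minimal sketch of a multiple-choice scoring loop: it formats each question, asks the model for a single option letter, and reports accuracy. The `ask_model` callable, the sample questions, and the prompt format are hypothetical placeholders for illustration only, not the repository's actual API or data.

```python
# Minimal sketch of a multiple-choice evaluation loop. Everything here
# (ask_model, the sample questions, the prompt format) is a made-up
# placeholder illustrating the general technique, not this repo's API.

from typing import Callable, Dict, List

# Hypothetical DevOps-style multiple-choice items (question, options, answer key).
QUESTIONS: List[Dict] = [
    {
        "question": "Which HTTP status code indicates that an upstream server timed out?",
        "options": {"A": "502", "B": "503", "C": "504", "D": "500"},
        "answer": "C",
    },
    {
        "question": "Which command shows real-time CPU and memory usage on Linux?",
        "options": {"A": "ls", "B": "top", "C": "cat", "D": "pwd"},
        "answer": "B",
    },
]


def evaluate(ask_model: Callable[[str], str]) -> float:
    """Return accuracy over QUESTIONS.

    `ask_model` takes a formatted prompt and returns a single option letter.
    """
    correct = 0
    for item in QUESTIONS:
        options = "\n".join(f"{key}. {text}" for key, text in item["options"].items())
        prompt = f"{item['question']}\n{options}\nAnswer with a single letter."
        prediction = ask_model(prompt).strip().upper()[:1]
        correct += prediction == item["answer"]
    return correct / len(QUESTIONS)


if __name__ == "__main__":
    # Stand-in "model" that always answers "B"; swap in a real model call.
    print(f"accuracy: {evaluate(lambda prompt: 'B'):.2f}")
```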
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An industrial-first language model for answering questions in the DevOps domain. | 596 |
| | An AI-powered tool designed to simplify and optimize various stages of the software development lifecycle. | 1,202 |
| | A tool that empowers software testing with large language models. | 565 |
| | A framework for fine-tuning large language models on multiple tasks to improve their accuracy and efficiency. | 647 |
| | A collection of publicly available resources on DevOps practices from companies around the world. | 733 |
| | An interactive code environment framework for evaluating language agents through execution feedback. | 198 |
| | A framework for evaluating OpenAI models and an open-source registry of benchmarks. | 19 |
| | A platform for comparing and evaluating AI and machine learning algorithms at scale. | 1,779 |
| | An evaluation suite providing multiple-choice questions for foundation models across various disciplines, with tools for assessing model performance. | 1,650 |
| | Generates sample tests and input/output files for competitive programming contests. | 137 |
| | Tools for comparing and benchmarking small code snippets. | 514 |
| | A benchmark dataset and open challenge to improve AI models' ability to understand and generate code. | 1,575 |
| | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| | An action for running Codeception tests in GitHub workflows. | 15 |
| | A benchmark for evaluating reinforcement learning agent performance on procedurally generated, game-like environments. | 1,030 |