skyvern
Browser automator
Automates browser-based workflows using LLMs and computer vision to replace brittle automation solutions.
Automate browser-based workflows with LLMs and Computer Vision
11k stars
70 watching
768 forks
Language: Python
last commit: 2 months ago
Tags: api, automation, browser, browser-automation, computer, gpt, llm, playwright, python, rpa, vision, workflow
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An all-encompassing tool providing a web interface to access and utilize various AI models for tasks such as text generation, image analysis, music generation, and more. | 4,394 |
| | A framework for running AI and batch workloads on any infrastructure, offering unified execution, cost savings, and high GPU availability. | 6,905 |
| | A framework for building enterprise LLM-based applications using small, specialized models. | 8,303 |
| | A platform for automating web browser interactions. | 30,979 |
| | An efficient C#/.NET library for running Large Language Models (LLMs) on local devices. | 2,750 |
| | An open-source AI framework and API for building intelligent applications. | 5,391 |
| | A Chromium-based browser workflow for Alfred 5 that provides features such as search, bookmark management, and autofill data retrieval. | 127 |
| | A Python framework to build LLM-powered applications by setting up agents with optional components and having them collaboratively solve problems through message exchange. | 2,795 |
| | A framework for automating web browsers with a single API to enable cross-browser testing and automation. | 67,755 |
| | An interactive video and image annotation tool for computer vision. | 12,821 |
| | An extension for automating browser tasks by creating a workflow of connected blocks. | 12,581 |
| | A framework for building multi-user online games using the actor model. | 13,387 |
| | A tool for testing web applications by simulating user interactions. | 10,029 |
| | A Python library for building interactive dashboards to explain machine learning models. | 2,321 |
| | A library that provides pre-trained models and frameworks for multimodal vision-language intelligence tasks such as image captioning and visual question answering. | 10,058 |