What is UI-TARS?
UI-TARS is a free, open-source AI model for computer control and automation. It provides two agents: a browser agent, which operates inside your web browser, and a desktop agent, which can interact with the rest of your computer. Together they cover both web-based and desktop tasks.
UI-TARS is available in three sizes (2B, 7B, and 72B parameters), each trained for computer control, recognizing what is on the screen, and predicting the next action. Its vision model lets it interpret and interact with visual data, which is what makes both browser and desktop automation possible.
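The cycle described above (look at the screen, predict the next action, carry it out) can be sketched as a simple loop. This is only an illustration of the general pattern, not UI-TARS's actual code: `Action`, `run_agent`, `fake_capture`, and `make_scripted_model` are hypothetical names, and the model is replaced by a scripted stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; the real model's output format may differ."""
    kind: str           # e.g. "click", "type", "finished"
    argument: str = ""  # free-form detail such as a target or text


def run_agent(
    capture_screen: Callable[[], bytes],
    predict_action: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> prediction -> execution until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()
        action = predict_action(screenshot, history)
        history.append(action)
        if action.kind == "finished":
            break  # the model signalled that the task is complete
        execute(action)
    return history


def fake_capture() -> bytes:
    """Stand-in for a real screenshot call (assumption for this sketch)."""
    return b"fake-png-bytes"


def make_scripted_model(script: List[Action]):
    """Build a stub 'model' that replays a fixed list of actions."""
    steps = iter(script)

    def predict(screenshot: bytes, history: List[Action]) -> Action:
        return next(steps, Action("finished"))

    return predict


if __name__ == "__main__":
    performed: List[Action] = []
    model = make_scripted_model([Action("click", "Search box"), Action("type", "weather")])
    for step in run_agent(fake_capture, model, performed.append):
        print(step)
```

Passing the capture, prediction, and execution steps in as functions keeps the loop itself easy to test without a real model or a real screen.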
Overview of UI-TARS
Feature | Description |
---|---|
Model Name | UI-TARS |
Functionality | AI agent for browser and desktop automation |
Paper | arxiv.org/abs/2501.12326 |
Usage Options | Hugging Face Demo, Local Installation |
Hugging Face Space | huggingface.co/spaces/bytedance-research/UI-TARS |
GitHub Repository | github.com/bytedance/UI-TARS-desktop |
Discord | discord.gg/txAE43ps |
Key Features of UI-TARS
Vision Model
Interprets and interacts with visual data on your screen with high accuracy.
Browser Automation
Automates web-based tasks with intelligent navigation and interaction capabilities.
AI-Powered Automation
Uses the model's screen understanding and action prediction to automate repetitive tasks.
Desktop Control
Interacts with desktop applications like Microsoft Office, VS Code, and more.
Multiple Model Sizes
Available in 2B, 7B, and 72B parameters to suit different hardware capabilities.
Getting Started with UI-TARS
1. Choose Your Model
Select one of the three sizes (2B, 7B, or 72B) based on your hardware capabilities. All three are trained for computer control, detecting what is on the screen, and predicting the next step.
Automated GUI Interaction
UI-TARS can automate interactions with graphical user interfaces, making it easier to perform repetitive tasks.

2. Installation
To run UI-TARS locally, download and install it from the official GitHub repository at github.com/bytedance/UI-TARS-desktop. To try it without installing anything, use the Hugging Face demo.
UI-TARS AI Agent Capabilities
UI-TARS can automate tasks in both browsers and desktop applications. It excels in interpreting visual data and performing complex workflows.
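Turning a model's prediction into a real mouse click usually involves two small steps: reading the predicted action and mapping its coordinates onto the actual screen. The sketch below assumes a made-up text format, `click(x=0.5, y=0.25)`, with coordinates normalized to the range 0 to 1. The real UI-TARS output format may well differ, and `parse_click` and `to_pixels` are hypothetical helpers.

```python
import re
from typing import Tuple

# Hypothetical action format used only for this sketch: click(x=<0..1>, y=<0..1>)
CLICK_PATTERN = re.compile(r"click\(\s*x=(\d*\.?\d+)\s*,\s*y=(\d*\.?\d+)\s*\)")


def parse_click(text: str) -> Tuple[float, float]:
    """Extract normalized (x, y) coordinates from a predicted click action."""
    match = CLICK_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no click action found in: {text!r}")
    x, y = float(match.group(1)), float(match.group(2))
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"coordinates out of range: ({x}, {y})")
    return x, y


def to_pixels(point: Tuple[float, float], screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale normalized coordinates to pixel positions, clamped to the screen."""
    x, y = point
    width, height = screen_size
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


if __name__ == "__main__":
    point = parse_click("Action: click(x=0.5, y=0.25)")
    print(to_pixels(point, (1920, 1080)))
```

Validating the coordinates before acting on them means a malformed prediction raises an error instead of clicking somewhere unexpected.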
How to Use UI-TARS?
Step 1: Download and Install
Visit the GitHub repository to download the latest version of UI-TARS. Follow the installation instructions provided in the repository.
Step 2: Choose Your Model
Decide which model size (2B, 7B, or 72B) you want to use based on your hardware capabilities. The 7B model is generally recommended for most users.
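To get a rough sense of which size your hardware can hold, you can estimate the memory needed for the model weights alone as parameter count times bytes per parameter. The sketch below treats the size labels as exact parameter counts and uses a 20% headroom factor for runtime overhead; both are simplifying assumptions, and the function names are illustrative.

```python
from typing import Optional

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Nominal parameter counts taken from the size labels (approximate).
MODEL_PARAMS = {"2B": 2e9, "7B": 7e9, "72B": 72e9}


def weight_memory_gib(params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return params * BYTES_PER_PARAM[precision] / 2**30


def largest_model_that_fits(
    available_gib: float, precision: str = "fp16", headroom: float = 1.2
) -> Optional[str]:
    """Return the largest size whose weights (plus assumed headroom) fit, or None."""
    fitting = [
        name
        for name, params in MODEL_PARAMS.items()
        if weight_memory_gib(params, precision) * headroom <= available_gib
    ]
    return max(fitting, key=MODEL_PARAMS.get) if fitting else None


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        print(name, round(weight_memory_gib(params, "fp16"), 1), "GiB at fp16")
    print(largest_model_that_fits(24.0))
```

By this estimate the 7B weights need about 13 GiB at 16-bit precision, while the 72B weights need well over 100 GiB. Actual usage also depends on context length, image resolution, and the inference runtime, so treat these numbers as a lower bound.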
Step 3: Configure Settings
Open the UI-TARS application and configure the settings according to your preferences. Because the desktop agent needs to see the screen and control the mouse and keyboard, grant the system permissions it requests so it can function properly.
Step 4: Create Your Automation Script
Use the built-in script editor to create your automation scripts. You can write scripts to automate tasks in both browsers and desktop applications.
Step 5: Test Your Script
Before running your script on important tasks, test it in a safe environment to ensure it behaves as expected.
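One way to test safely is a dry-run mode that records what the automation would do without touching the mouse or keyboard. The class below is a hypothetical sketch of that idea, not part of UI-TARS: `DryRunExecutor` and its methods are illustrative names, and the real input backend is deliberately left unimplemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DryRunExecutor:
    """Logs intended actions; only performs them when dry_run is False."""
    dry_run: bool = True
    log: List[str] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self._perform(f"click at ({x}, {y})")

    def type_text(self, text: str) -> None:
        self._perform(f"type {text!r}")

    def _perform(self, description: str) -> None:
        prefix = "[dry-run] " if self.dry_run else ""
        self.log.append(prefix + description)
        if not self.dry_run:
            # Placeholder: a real mouse/keyboard backend would be called here.
            raise NotImplementedError("no input backend configured in this sketch")


if __name__ == "__main__":
    executor = DryRunExecutor()
    executor.click(960, 270)
    executor.type_text("hello")
    for line in executor.log:
        print(line)
```

Reviewing the log before switching off dry-run mode shows exactly which clicks and keystrokes would be sent.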
Step 6: Run Your Automation
Once you are satisfied with your script, run it to automate your tasks. Monitor the process to ensure everything is functioning correctly.
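Monitoring can be partly automated with a small watchdog that halts a run when it exceeds a step budget or keeps repeating the same action. The sketch below assumes the run's actions are available as a list of strings; `should_stop` and its thresholds are hypothetical and would need tuning for real tasks.

```python
from typing import List, Optional


def should_stop(
    history: List[str], max_steps: int = 50, max_repeats: int = 3
) -> Optional[str]:
    """Return a reason to halt the run, or None if it may continue."""
    if len(history) >= max_steps:
        return "step limit reached"
    recent = history[-max_repeats:]
    if len(recent) == max_repeats and len(set(recent)) == 1:
        # The same action several times in a row suggests the agent is stuck.
        return "agent appears stuck"
    return None


if __name__ == "__main__":
    actions = ["open browser", "click search", "click search", "click search"]
    print(should_stop(actions))
```

Calling a check like this after every action gives an unattended run a clear stopping point instead of letting it loop indefinitely.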
Pros and Cons
Pros
- Open source and free
- Multiple model sizes available
- Both browser and desktop automation
- Advanced vision processing
Cons
- 72B model requires high-end hardware
- System permissions required
- Complex automations can take effort to set up