June 25, 2026

Gemini 3.5 Flash Gains 'Computer Use' Capabilities

Google's latest Gemini model can now interact with computer interfaces, automating multi-step digital tasks. This marks a significant step towards agentic AI in professional workflows.

Via: AITECH TOKYO Editors
Dateline: Tokyo
Date: June 24, 2026
Time: 5 min read
Tagline

Gemini 3.5 Flash can now operate a computer interface.

Who & Why

For a Tokyo-based operations manager who needs to automate data entry across multiple legacy systems, this offers a path to delegate multi-step digital workflows to an AI agent.

vs. Existing

This competes with specialized RPA (Robotic Process Automation) software like UiPath or Automation Anywhere, but integrates the intelligence of an LLM to understand context and adapt to varied interfaces, rather than relying solely on pre-defined scripts.

Tokyo Take

This capability is significant globally, but its immediate impact in Tokyo depends on how reliably the model interprets Japanese-language interfaces and how well it integrates with Japan-specific software. The underlying technology is powerful, but practical use will require local adaptation for the enterprise tools and web services common in Japan. Broader adoption could plausibly take 12 to 24 months.

Google's Gemini 3.5 Flash model has gained new 'computer use' capabilities, which allow it to interpret screen content, navigate applications, and execute actions across a digital interface.

Announced by Google on June 24, 2026, this functionality extends the model's utility beyond conversational AI. Gemini 3.5 Flash, the faster and more cost-effective variant in the Gemini 3.5 lineup, can now perform multi-step operations that previously required direct human input or custom scripting.

The core mechanism involves the model perceiving visual information on a screen and translating that understanding into actionable commands. It can interact with software environments as a human user would, extracting data, filling forms, and manipulating elements within an application.
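The perceive-then-act mechanism described above can be pictured as a simple loop: observe the screen, ask the model for the next action, apply it, and repeat. The sketch below is illustrative only and does not use Google's API. `FakeForm`, `Action`, `stub_policy`, and `run_agent` are all hypothetical names, the "screen" is a plain dictionary rather than a screenshot, and a hand-written rule stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One UI step. All names here are hypothetical, not part of any real API."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # field or button name
    text: str = ""     # text to enter when kind == "type"


@dataclass
class FakeForm:
    """Stand-in for an application window: two text fields and a submit button."""
    fields: Dict[str, str] = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> Dict[str, Any]:
        # A real system would capture a screenshot; this returns a structured snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target in self.fields:
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action}")


def stub_policy(goal: Dict[str, str], observation: Dict[str, Any]) -> Action:
    """Placeholder for the model: choose the next action from the current observation."""
    for name, wanted in goal.items():
        if observation["fields"].get(name) != wanted:
            return Action(kind="type", target=name, text=wanted)
    if not observation["submitted"]:
        return Action(kind="click", target="submit")
    return Action(kind="done")


def run_agent(
    env: FakeForm,
    policy: Callable[[Dict[str, str], Dict[str, Any]], Action],
    goal: Dict[str, str],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act, and repeat until the policy says done or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, env.observe())
        if action.kind == "done":
            break
        env.apply(action)
        history.append(action)
    return history


form = FakeForm()
steps = run_agent(form, stub_policy, {"name": "Sato", "email": "sato@example.com"})
```

In this toy run the loop types into both fields and then clicks submit. In a production agent the policy would be a model call and the environment a real desktop or browser, and the `max_steps` cap is one simple way to keep a confused agent from looping indefinitely.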

For professionals, this translates into the potential for delegating complex, repetitive digital tasks. An analyst might instruct Gemini to compile specific data from various web portals into a structured report, or a marketing specialist could automate the scheduling and posting of content across multiple social media platforms.

This capability positions Gemini 3.5 Flash in direct competition with emerging AI agents and advanced workflow automation tools. It moves closer to the agentic ambitions seen in solutions like Adept or sophisticated Robotic Process Automation (RPA) platforms, but with the added contextual intelligence of a general-purpose large language model.

Google did not detail specific pricing for the new feature. Gemini 3.5 Flash is generally cheaper to use through the API than its Pro counterpart, which makes it accessible for a broader range of applications. Google's primary development remains based in the United States.

Beyond Earth, this capability offers a pathway for autonomous systems to perform complex maintenance, repair, and operational tasks on distant planetary bases or orbital stations. Imagine an AI agent monitoring environmental controls on a lunar habitat, diagnosing equipment failures, and even initiating repair protocols by interacting directly with the habitat's internal systems, all without waiting for human operators hundreds of thousands of miles away on Earth.
