Gemini 3.5 Flash Gains 'Computer Use' Capabilities
Google's latest Gemini model can now interact with computer interfaces, automating multi-step digital tasks. This marks a significant step towards agentic AI in professional workflows.
- Via: AITECH TOKYO Editors
- Dateline: Tokyo
- Date: June 24, 2026
- Time: 5 min read
Source
Hacker News Top
Tagline
Gemini 3.5 Flash can now operate a computer interface.
Who & Why
For a Tokyo-based operations manager who needs to automate data entry across multiple legacy systems, this offers a path to delegate multi-step digital workflows to an AI agent.
vs. Existing
This competes with specialized RPA (Robotic Process Automation) software like UiPath or Automation Anywhere, but integrates the intelligence of an LLM to understand context and adapt to varied interfaces, rather than relying solely on pre-defined scripts.
Tokyo Take
This capability is significant globally, but its immediate impact in Tokyo depends on robust interpretation of Japanese-language UIs and integration with Japan-specific software. The underlying technology is powerful, but practical application will require local adaptation for the enterprise tools and web services common in Japan. Broader adoption is potentially 12-24 months away.
Google's Gemini 3.5 Flash model has introduced new 'computer use' capabilities. This development allows the AI to interpret screen content, navigate applications, and execute actions across a digital interface.
Announced by Google on June 24, 2026, this functionality extends the model's utility beyond conversational AI. Gemini 3.5 Flash, a faster and more cost-effective variant, can now perform multi-step operations that previously required direct human input or custom scripting.
The core mechanism involves the model perceiving visual information on a screen and translating that understanding into actionable commands. It can interact with software environments as a human user would, extracting data, filling forms, and manipulating elements within an application.
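The perceive-then-act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Google's implementation or API: `FakeScreen`, `Action`, `stub_policy`, and `run_agent` are hypothetical names invented here, the "screen" is an in-memory form rather than a screenshot, and the model call is replaced by a hand-written rule.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One UI command proposed by the policy (hypothetical schema)."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the UI element to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Stand-in for a real application: a form with two named fields."""
    fields: Dict[str, str] = field(
        default_factory=lambda: {"name": "", "email": ""}
    )
    submitted: bool = False

    def observe(self) -> Dict[str, Any]:
        # A real agent would receive a screenshot; here the observation
        # is simply a copy of the form state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action}")


def stub_policy(goal: Dict[str, str], observation: Dict[str, Any]) -> Action:
    """Placeholder for the model call (assumption): fill the first field
    that does not match the goal, then submit, then stop."""
    current = observation["fields"]
    for name, wanted in goal.items():
        if current.get(name) != wanted:
            return Action("type", target=name, text=wanted)
    if not observation["submitted"]:
        return Action("click", target="submit")
    return Action("done")


Policy = Callable[[Dict[str, str], Dict[str, Any]], Action]


def run_agent(goal: Dict[str, str], screen: FakeScreen,
              policy: Policy = stub_policy, max_steps: int = 10) -> List[Action]:
    """Observe, decide, act; repeat until the policy says 'done'
    or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, screen.observe())
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    steps = run_agent({"name": "Aiko", "email": "aiko@example.com"}, screen)
    print([a.kind for a in steps])  # ['type', 'type', 'click', 'done']
```

The step budget (`max_steps`) is a deliberate design choice in this sketch: an agent that re-observes the screen after every action can recover from unexpected states, but it also needs a hard stop so a confused policy cannot loop indefinitely.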
For professionals, this translates into the potential for delegating complex, repetitive digital tasks. An analyst might instruct Gemini to compile specific data from various web portals into a structured report, or a marketing specialist could automate the scheduling and posting of content across multiple social media platforms.
This capability positions Gemini 3.5 Flash in direct competition with emerging AI agents and advanced workflow automation tools. It moves closer to the agentic ambitions seen in solutions like Adept or sophisticated Robotic Process Automation (RPA) platforms, but with the added contextual intelligence of a foundational large language model.
Google did not detail pricing for the new feature, but Gemini 3.5 Flash generally offers a more economical API than its Pro counterpart, which makes it accessible for a broader range of applications. Google's primary development remains based in the United States.
Beyond Earth, this capability offers a pathway for autonomous systems to perform complex maintenance, repair, and operational tasks on distant planetary bases or orbital stations. Imagine an AI agent monitoring environmental controls on a lunar habitat, diagnosing equipment failures, and even initiating repair protocols by interacting directly with the habitat's internal systems, all without intervention from human operators hundreds of thousands of miles away.