Workflow & Agents|Index 02

Gemini 3.5 Flash Gains 'Computer Use' Capabilities

Google's latest Gemini model can now interact with computer interfaces, automating multi-step digital tasks. This marks a significant step towards agentic AI in professional workflows.

Via: AITECH TOKYO Editors
Dateline: Tokyo
Date: June 24, 2026
Time: 5 min read

Source

Hacker News Top


Tagline

Gemini 3.5 Flash can now operate a computer interface.

Who & Why

For a Tokyo-based operations manager who needs to automate data entry across multiple legacy systems, this offers a path to delegate multi-step digital workflows to an AI agent.

vs. Existing

This competes with specialized RPA (Robotic Process Automation) software like UiPath or Automation Anywhere, but integrates the intelligence of an LLM to understand context and adapt to varied interfaces, rather than relying solely on pre-defined scripts.

Tokyo Take

This capability is significant globally, but its immediate impact in Tokyo depends on robust interpretation of Japanese-language UIs and integration with Japan-specific software. The underlying technology is powerful, but practical use will require local adaptation for the enterprise tools and web services common in Japan. Broader adoption is potentially 12 to 24 months away.

Google's Gemini 3.5 Flash model has introduced new 'computer use' capabilities. This development allows the AI to interpret screen content, navigate applications, and execute actions across a digital interface.

Announced by Google on June 24, 2026, this functionality extends the model's utility beyond conversational AI. Gemini 3.5 Flash, the faster and more cost-effective variant in the Gemini line, can now perform multi-step operations that previously required direct human input or custom scripting.

The core mechanism involves the model perceiving visual information on a screen and translating that understanding into actionable commands. It can interact with software environments as a human user would, extracting data, filling forms, and manipulating elements within an application.
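
The perceive-then-act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Google's implementation or the Gemini API: `FakeForm`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names invented for this example, the "screen" is a plain dictionary rather than a screenshot, and a hand-written rule stands in for the model call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One UI command proposed by the policy (hypothetical schema)."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the field or button to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeForm:
    """Stand-in for an application window; no real screen is captured."""
    fields: Dict[str, str] = field(
        default_factory=lambda: {"name": "", "email": ""}
    )
    submitted: bool = False

    def observe(self) -> Dict[str, Any]:
        # A real agent would receive a screenshot; here it is a dict snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Translate the proposed action into a change in the fake UI.
        if action.kind == "type" and action.target in self.fields:
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(goal: Dict[str, str], observation: Dict[str, Any]) -> Action:
    """Placeholder for the model: picks the next action from what it observes."""
    if observation["submitted"]:
        return Action("done")
    for name, wanted in goal.items():
        if observation["fields"].get(name) != wanted:
            return Action("type", target=name, text=wanted)
    return Action("click", target="submit")


def run_agent(
    goal: Dict[str, str],
    env: FakeForm,
    policy: Callable[[Dict[str, str], Dict[str, Any]], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act; stop when the policy says 'done' or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, env.observe())
        history.append(action)
        if action.kind == "done":
            break
        env.apply(action)
    return history


if __name__ == "__main__":
    form = FakeForm()
    steps = run_agent({"name": "Sato", "email": "sato@example.com"}, form, scripted_policy)
    for step in steps:
        print(step)
```

The step cap (`max_steps`) is a design choice of this sketch: an agent that acts on its own observations needs a hard bound so that a misread screen cannot keep it looping indefinitely.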

For professionals, this translates into the potential for delegating complex, repetitive digital tasks. An analyst might instruct Gemini to compile specific data from various web portals into a structured report, or a marketing specialist could automate the scheduling and posting of content across multiple social media platforms.

This capability positions Gemini 3.5 Flash in direct competition with emerging AI agents and advanced workflow automation tools. It moves closer to the agentic ambitions seen in solutions like Adept or sophisticated Robotic Process Automation (RPA) platforms, but with the added contextual intelligence of a foundational large language model.


Google did not detail pricing for the new feature. Gemini 3.5 Flash generally offers a more economical API than its Pro counterpart, which makes it accessible for a broader range of applications. Google's primary development remains based in the United States.

Beyond Earth, this capability offers a pathway for autonomous systems to perform complex maintenance, repair, and operational tasks on distant planetary bases or orbital stations. Imagine an AI agent monitoring environmental controls on a lunar habitat, diagnosing equipment failures, and even initiating repair protocols by interacting directly with the habitat's internal systems, all without intervention from human operators hundreds of thousands of miles away.

The Tokyo Editor's Read

What this AI story could mean for Tokyo in the years ahead.

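The news here is that Google's Gemini AI can now operate a computer much as a person would. It looks at the screen, understands what is there, and carries out a chain of actions such as clicking buttons, typing text, and working within applications, all from an instruction given in plain words. It is like gaining a diligent assistant. An easy way to picture it: a digital assistant has been given the eyes and hands it needs to handle a computer.

For readers in Tokyo, this could mean that complex clerical work in areas such as banking interfaces, administrative procedures, and internal core systems gets processed faster and more accurately. A customer support AI, for example, might not only understand an inquiry but also operate a legacy system to look up account information or update records. Routine tasks could then be handled around the clock without human involvement.

It will likely take around 12 to 24 months before this capability is in wide practical use in Tokyo. The bottlenecks are not only the underlying technology but also robust interpretation of Japanese-language UIs, secure integration with diverse enterprise systems, and compliance with Japan's data privacy regulations. Japanese companies will need to develop concrete applications and build partnerships to apply the technology to their own operating conditions.

At present, no Japanese company directly offers an LLM with this "computer use" capability. However, companies such as NTT and SoftBank are investing heavily in LLM development and enterprise AI solutions. They are likely to explore similar agent capabilities for their extensive corporate ecosystems and business customers, and can be expected to move into workflow automation by drawing on their existing strengths in networks and service integration.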

Editorial: AITECH TOKYO Editors
