Google IO 2024: Grand Promises, Sparse Deliveries - A Tech Industry Tease Show
Previously, we discussed how companies like OpenAI and Google often make grand announcements about AI developments that remain in the proof-of-concept stage for extended periods. OpenAI's Sora, for instance, is still in internal testing. In December 2023, Google introduced Gemini Ultra 1.0, its most powerful AI model, slated for the "Gemini Advanced" subscription tier. As of mid-May 2024, Gemini Ultra has not been broadly released to the public. In contrast, OpenAI impressed us yesterday by immediately shipping its new GPT-4o model and the updated ChatGPT at its product event.
At Google IO 2024, nearly 30 new products and features were announced, but only about 20% were made available to users, making it one of the most extensive tease shows in the tech industry. Before diving into the analysis, here is a summary of the major products announced at the event:
Product/Feature | Description | Availability |
---|---|---|
Gemini 1.5 Pro | Long context with 1 million tokens, multimodal capabilities, improved translation, coding, and reasoning. | Available today globally. |
Gemini 1.5 Flash | Lightweight model for faster, cost-efficient tasks with multimodal reasoning and long context capabilities. | Available today globally. |
Gemma 2 | New 27 billion parameter model optimized for next-gen GPUs and TPUs. | Available in June 2024. |
MusicFX DJ | Generative AI tool for creating music from prompts. | Demonstrated at the event, no specific release date mentioned. |
Search Generative Experience (SGE) | AI Overviews, multimodal search capabilities, real-time information processing. | Launching this week in the U.S., more countries soon. |
Ask Photos | Allows users to ask questions and search within their Google Photos. | Rolling out this summer. |
NotebookLM Audio Overviews | Generates audio discussions based on text materials, personalized and interactive. | Demonstrated at the event, no specific release date mentioned. |
Google Workspace Enhancements | Email summarization, advanced search in Gmail, automatic organization and tracking of receipts. | Rolling out to Labs users this month and in September 2024. |
Gemini-powered virtual teammates | AI assistants with specific roles and objectives integrated into Google Workspace. | Prototyping phase, no specific release date mentioned. |
Gemini App Updates | Voice interaction, dynamic UI, personalized Gems. | Gems rolling out in the coming months, trip planning in summer 2024. |
Trillium TPUs | Sixth generation TPUs with a 4.7x improvement in compute performance. | Available to cloud customers in late 2024. |
Axion CPUs and Blackwell GPUs | High-performance and energy-efficient CPUs and GPUs. | Blackwell GPUs available in early 2025. |
Android AI Enhancements | AI-powered search, context-aware Gemini assistant, on-device foundation model. | Various features rolling out in the coming months. |
LearnLM | AI models for personalized learning experiences, integrated into Search, Android, Gemini, and YouTube. | Rolling out in the coming months. |
SynthID Expansion | Watermarking for AI-generated text and video. | Available soon, with open-source release in the coming months. |
Gemma Open Models | Lightweight models for various tasks, including a new 27 billion parameter model. | Gemma 2 available in June 2024. |
Veo | High-quality, 1080p video generation from text, image, and video prompts; supports various cinematic techniques and editing features. | Features will be available to select creators through VideoFX at labs.google in the coming weeks; the waitlist is open now. |
Google Classroom Enhancements | New tools for lesson planning, customizing lessons, and meeting individual student needs using LearnLM. | Features being developed and tested, no specific release date mentioned. |
NotebookLM | New capabilities with Gemini 1.5 Pro, including personalized audio discussions and study guides. | Demonstrated at the event, no specific release date mentioned. |
Project Astra | Universal AI agent with multimodal understanding, proactive assistance, and natural interaction capabilities. | Some agent capabilities coming to Google products like the Gemini app later this year. |
Music AI Sandbox | Suite of professional music AI tools for creating new instrumental sections, transferring styles between tracks, and more. | Available now, with ongoing collaboration with musicians. |
Google Photos | Enhanced search and organization features using Gemini, allowing users to ask detailed questions and receive contextual answers. | Rolling out this summer. |
Google Search Enhancements | Multi-step reasoning, personalized AI-organized pages, and dynamic visual results. | Rolling out in the coming weeks, with expanded availability by the end of the year. |
Google AI Studio and Vertex AI | Access to Gemini 1.5 Pro and Flash models with enhanced features like video frame extraction and context caching. | Available today globally. |
New Gemini App Features | Live voice interaction, customizable personal experts (Gems), and planning capabilities. | Rolling out this summer and in the coming months. |
LearnLM in YouTube | Interactive educational videos with clarifying questions, helpful explanations, and quizzes. | Rolling out to select Android users. |
Generative AI Tools in Workspace | AI-powered assistant, automation of repetitive tasks, and advanced data analysis. | Rolling out in the coming months. |
Google DeepMind's AlphaFold | New generation model predicting the structure and interactions of nearly all of life's molecules. | Announced recently, available for scientific research. |
Project Gemini for Developers | Long context window, multimodal capabilities, and parallel function calling for AI applications. | Available today globally. |
Gemini Nano | On-device AI foundation model with multimodal capabilities for improved privacy and performance. | Rolling out later this year on Pixel and other devices. |
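Of everything in the table, the Gemini 1.5 models are the main pieces developers can try immediately through Google AI Studio and Vertex AI. As a minimal sketch using only the Python standard library, the request below follows the public Generative Language API's v1beta REST pattern; the endpoint path, the model name `gemini-1.5-flash`, and the response shape are assumptions based on that documented pattern and may change, so check the current docs before relying on them.

```python
import json
import os
import urllib.request

# Base URL of the public Generative Language API (v1beta pattern; may change).
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str):
    """Return the endpoint URL and JSON body for a generateContent call."""
    api_key = os.environ.get("GEMINI_API_KEY", "")
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = json.dumps(
        {"contents": [{"parts": [{"text": prompt}]}]}
    ).encode("utf-8")
    return url, body


def generate(model: str, prompt: str) -> str:
    """Send the request and extract the first candidate's text."""
    url, body = build_request(model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # Requires a valid GEMINI_API_KEY in the environment.
    print(generate("gemini-1.5-flash", "Summarize Google IO 2024 in one line."))
```

Note that only the request construction is shown authoritatively; the actual call requires an API key and network access.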
Google Remains a Contender
Our sources indicated yesterday that Google IO's standout product would closely resemble OpenAI's GPT-4o. Despite this, we did not alter our positions as our sources suggested, believing that large institutions had already absorbed this information and that Google's new products could not outshine GPT-4o and the updated ChatGPT. Our hypothesis was confirmed today. Google introduced the multimodal Gemini 1.5 models and showcased Project Astra, which offers real-time video and audio understanding similar to OpenAI's new features. However, aside from a longer context window, nothing particularly stands out compared to OpenAI's offerings. Models with even longer context windows are already available elsewhere: notably, Moonshot AI's Kimi.ai has offered a 2-million-token context window in production for months.
Despite no longer leading the AI industry as it once did, Google remains a significant player: near-real-time multimodal features like these are still rare.
Google Steps Up in AI Search
Yesterday, OpenAI’s new ChatGPT features, including live AI search on Bing, quietly disrupted many AI search startups. We previously worried that Bing’s lower index quality might prevent OpenAI from leading in AI search. Today, Google surprised us with enhanced AI search features of its own. While we believe Google is the best-positioned company to deliver this product, it remains uncertain how it will handle potential conflicts of interest between AI-generated summaries and the content creators they draw from. The timeline for these features to reach end users is also unclear.
Astra’s Performance Issues and Weaker Human Alignment
Project Astra, Google’s AI assistant, can analyze video and voice in near real time, similar to OpenAI's new ChatGPT features. However, Astra's voice is notably robotic and lacks the emotional, human-like alignment of GPT-4o. This raises a debate about user preferences, as some users still prefer a robotic voice; we nevertheless believe good alignment is crucial for the future of AGI. Additionally, Astra appears laggier than GPT-4o, although we lack concrete latency data to support this observation.
AI Competition Shift Towards Multimodal Models and Consumer Apps
At CTOL.digital, we unanimously agree that current LLMs may soon hit a bottleneck, or may already have. GPT-5 is still far off, and OpenAI has shifted focus to the consumer app market, with Google following suit. Hardware limitations play a role, but the larger constraint is training data: where can more high-quality data be found once existing data has been exhausted? Some experts suggest training on new answers generated by LLMs themselves, but the potential for significant improvement from such synthetic data remains uncertain. Another factor is the intrinsic limitations of the current generation of LLMs, which will take academia a long time to resolve.
This shift towards consumer apps by major tech companies is significant: it squeezes out many startups and is a vital step towards broader adoption and, eventually, AGI. A warning to VCs and startup founders: this area will see fiercer competition soon.
Only Time Will Tell
Google has a history of discontinuing products and failing to deliver on its promises; for a catalog, see Killed by Google. While the latest showcase succeeded in generating hype, at CTOL.digital we value the actual delivery of products that provide user value. Only time will tell when, and how, these products will truly benefit users.