🧠 Google DeepMind's Gemini AI (likely more advanced than ChatGPT) - Updates and Discussion

Hamartia Antidote

In the world of Artificial Intelligence (AI), Google DeepMind's recent creation, Gemini, is generating a buzz. This innovative development aims to tackle the intricate challenge of replicating human perception, particularly its ability to integrate various sensory inputs. Human perception, inherently multimodal, utilizes multiple channels simultaneously to understand the environment. Multimodal AI, drawing inspiration from this complexity, strives to integrate, comprehend, and reason about information from diverse sources, mirroring human-like perception capabilities.

The Complexity of Multimodal AI

While AI has made strides in handling individual sensory modes, achieving true multimodal AI remains a formidable challenge. Current methods involve training separate components for different modalities and stitching them together, but they often fall short in tasks requiring intricate and conceptual reasoning.

Emergence of Gemini

In the pursuit of replicating human multimodal perception, Google Gemini has emerged as a promising development. This creation offers a unique perspective into AI's potential to decode the intricacies of human perception. Gemini takes a distinctive approach, being inherently multimodal and undergoing pre-training on various modalities. Through further fine-tuning with additional multimodal data, Gemini refines its effectiveness, showing promise in understanding and reasoning about diverse inputs.

What is Gemini?

Google Gemini, introduced on December 6, 2023, is a family of multimodal AI models developed by Alphabet's Google DeepMind unit in collaboration with Google Research. Gemini 1.0 is designed to comprehend and generate content across a spectrum of data types, including text, audio, images, and video.

A standout feature of Gemini is its native multimodality, setting it apart from conventional multimodal AI models. This unique capability enables Gemini to seamlessly process and reason across diverse data types like audio, images, and text. Significantly, Gemini possesses cross-modal reasoning, allowing it to interpret handwritten notes, graphs, and diagrams for tackling complex problems. Its architecture supports the direct ingestion of text, images, audio waveforms, and video frames as interleaved sequences.

Family of Gemini

Gemini boasts a range of models tailored to specific use cases and deployment scenarios. The Ultra model, designed for highly intricate tasks, is expected to be accessible in early 2024. The Pro model prioritizes performance and scalability, suitable for robust platforms like Google Bard. In contrast, the Nano model is optimized for on-device utilization and comes in two versions—Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters. These Nano models seamlessly integrate into devices, including the Google Pixel 8 Pro smartphone.

Gemini Vs ChatGPT

According to company sources, researchers have extensively compared Gemini with ChatGPT variants, and it has outperformed GPT-3.5 in widespread testing. Gemini Ultra exceeds prior results on 30 of 32 widely used benchmarks in large language model research. Scoring 90.0% on MMLU (massive multitask language understanding), Gemini Ultra surpasses human experts on that benchmark. The MMLU covers a combination of 57 subjects, such as math, physics, history, law, medicine, and ethics, testing both world knowledge and problem-solving ability. Trained natively to be multimodal, Gemini can process various media types, setting it apart in the competitive AI landscape.

Use Cases

The emergence of Gemini has opened up a range of use cases, some of which are as follows:

  • Advanced Multimodal Reasoning: Gemini excels in advanced multimodal reasoning, simultaneously recognizing and comprehending text, images, audio, and more. This comprehensive approach enhances its ability to grasp nuanced information and excel in explaining and reasoning, especially in complex subjects like mathematics and physics.
  • Computer Programming: Gemini excels in comprehending and generating high-quality computer programs across widely-used languages. It can also be used as the engine for more advanced coding systems, as demonstrated in solving competitive programming problems.
  • Medical Diagnostics Transformation: Gemini's multimodal data processing capabilities could mark a shift in medical diagnostics, potentially enhancing decision-making processes by providing access to diverse data sources.
  • Transforming Financial Forecasting: Gemini reshapes financial forecasting by interpreting diverse data in financial reports and market trends, providing rapid insights for informed decision-making.

Challenges

While Google Gemini has made impressive strides in advancing multimodal AI, it faces certain challenges that require careful consideration. Due to its extensive data training, it's essential to approach it cautiously to ensure responsible user data use, addressing privacy and copyright concerns. Potential biases in the training data also pose fairness issues, necessitating ethical testing before any public release to minimize such biases. Concerns also exist about the potential misuse of powerful AI models like Gemini for cyber attacks, highlighting the importance of responsible deployment and ongoing oversight in the dynamic AI landscape.

Future Development of Gemini

Google has affirmed its commitment to enhance Gemini, empowering it for future versions with advancements in planning and memory. Additionally, the company aims to expand the context window, enabling Gemini to process even more information and provide more nuanced responses. As we look forward to potential breakthroughs, the distinctive capabilities of Gemini offer promising prospects for the future of AI.

The Bottom Line

Google DeepMind's Gemini signifies a paradigm shift in AI integration, surpassing traditional models. With native multimodality and cross-modal reasoning, Gemini excels in complex tasks. Despite its challenges, its applications in advanced reasoning, programming, medical diagnostics, and financial forecasting highlight its potential. As Google commits to its future development, Gemini's impact is reshaping the AI landscape, marking the beginning of a new era in multimodal capabilities.
 
Apple is in talks to build Google Gemini into the iPhone’s next operating system, powering Siri and other AI-enabled features, according to reports from Bloomberg on Monday. The iPhone maker reportedly held discussions with multiple AI companies about powering iOS 18 with a competitor’s technology, including OpenAI.


Apple has largely been absent from the AI conversation in 2023, leading to great speculation that the iPhone giant may be falling behind Google and OpenAI. However, CEO Tim Cook promised a major AI announcement this year, largely expected at WWDC in June. The company has reportedly built an AI chatbot, Apple GPT, which employees say is inferior to ChatGPT and Gemini. This report seems to confirm that Apple doesn’t have much up its sleeve, as the Cupertino giant looks to outsource the iPhone’s AI features.

Google Gemini, or potentially another competitor’s AI chatbot, is in discussion to power Siri and other apps. However, the iPhone maker is reportedly planning to build some of its own AI features into the next operating system. Apple released an open-source machine-learning framework at the end of 2023, and an AI image editor in February.

However, these discussions come as parts of Google Gemini have been paused over its handling of controversial questions. Google is currently working on a fix, but the AI chatbot refuses to answer basic questions if they involve hot-button topics such as race, politics, or gender. Apple is likely betting that Google figures out Gemini's issues in the next few months so that the iPhone maker can spare itself a similar embarrassment.

Using Gemini in the iPhone would build on Apple and Google's partnership in search, which has faced scrutiny from regulators over the past year. Google pays Apple 36% of its search revenue, roughly $18 billion a year, according to details exposed in Google's antitrust hearing. The U.S. Department of Justice alleges these two companies moved as a single entity to dominate the mobile search market.

An Apple partnership in AI would give Google a huge advantage over American competitors such as OpenAI and Anthropic. It would put Google Gemini in the face of over a billion iPhone users.

Apple is under intense pressure to come up with some jaw-dropping AI features to prove that it is not falling behind in the next wave of technology. The company scrapped plans for the Apple Car last month, marking the end of a decade-long project that was supposed to inject new growth into Apple. Instead, the iPhone maker shifted employees over to its AI initiatives, which now appear to be receiving the bulk of its investment.
 
Gemini is significantly worse than ChatGPT, at least for coding.
 

Google Gemini 2 Takes the AI Lead



Google had fallen far behind in the AI race but has taken the lead with Gemini 2.0. Google Gemini 2 is powerful, fast, and very low cost.

Google has made the updated Gemini 2.0 Flash generally available via the Gemini API in Google AI Studio and Vertex AI, so developers can now build production applications with 2.0 Flash. Google is also releasing an experimental version of Gemini 2.0 Pro, its best model yet for coding performance and complex prompts. It is available in Google AI Studio and Vertex AI, and in the Gemini app for Gemini Advanced users.

Google is also releasing a new model, Gemini 2.0 Flash-Lite, its most cost-efficient model yet, in public preview in Google AI Studio and Vertex AI.

Finally, 2.0 Flash Thinking Experimental will be available to Gemini app users in the model dropdown on desktop and mobile.

All of these models will feature multimodal input with text output on release, with more modalities ready for general availability in the coming months.
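Under the lineup above, choosing between the 2.0 tiers comes down to the model identifier passed in the API endpoint path. A minimal sketch: the `gemini-2.0-flash` and `gemini-2.0-flash-lite` identifiers follow Google's published naming, while `gemini-2.0-pro-exp` is an assumed identifier for the experimental Pro build:

```python
# Sketch: mapping the Gemini 2.0 tiers described above to API endpoints.
# "gemini-2.0-flash" and "gemini-2.0-flash-lite" follow Google's published
# naming; "gemini-2.0-pro-exp" is an assumed id for the experimental Pro.
BASE = "https://generativelanguage.googleapis.com/v1beta/models"

TIERS = {
    "flash": "gemini-2.0-flash",            # generally available
    "flash-lite": "gemini-2.0-flash-lite",  # most cost-efficient, public preview
    "pro": "gemini-2.0-pro-exp",            # experimental, best for coding (assumed id)
}

def endpoint(tier: str) -> str:
    """Return the generateContent URL for a given tier."""
    return f"{BASE}/{TIERS[tier]}:generateContent"

print(endpoint("flash"))
```

Requests to any of these endpoints use the same body shape, which is what lets an application swap tiers by changing one string.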



 

Google has announced that yet another AI model is coming to Gemini, but this time, it's more than a chatbot. The company's Veo 2 video generator is rolling out to the Gemini app and website, giving paying customers a chance to create short video clips with Google's allegedly state-of-the-art video model.

Veo 2 works like other video generators, including OpenAI's Sora—you input text describing the video you want, and a Google data center churns through tokens until it has an animation. Google claims that Veo 2 was designed to have a solid grasp of real-world physics, particularly the way humans move. Google's examples do look good, but presumably that's why they were chosen.


Prompt: Aerial shot of a grassy cliff onto a sandy beach where waves crash against the shore, a prominent sea stack rises from the ocean near the beach, bathed in the warm, golden light of either sunrise or sunset, capturing the serene beauty of the Pacific coastline.

Veo 2 will be available in the model drop-down, but Google does note it's still considering ways to integrate this feature and that the location could therefore change. However, it's probably not there at all just yet. Google is starting the rollout today, but it could take several weeks before all Gemini Advanced subscribers get access to Veo 2. Gemini features can take a surprisingly long time to arrive for the bulk of users—for example, it took about a month for Google to make Gemini Live video available to everyone after announcing its release.

When Veo 2 does pop up in your Gemini app, you can provide it with as much detail as you want, which Google says will ensure you have fine control over the eventual video. Veo 2 is currently limited to 8 seconds of 720p video, which you can download as a standard MP4 file. Video generation uses even more processing than your average generative AI feature, so Google has implemented a monthly limit. However, it hasn't confirmed what that limit is, saying only that users will be notified as they approach it.


Prompt: An animated shot of a tiny mouse with oversized glasses, reading a book by the light of a glowing mushroom in a cozy forest den.

If you don't want to wait for Veo 2 in the Gemini app, there's a way you can play with it early. Google's new video generator has also been added to Whisk, a Google Labs experiment announced late last year. Whisk allows you to generate images using both text prompts and example images.

Starting today, Whisk has an "animate" option, which uses Veo 2 to turn your still creations into 8-second video clips. Interestingly, Google lists a 100-video monthly limit for Whisk, which could mean the same ceiling applies to Veo 2 usage in Gemini. Even with the ability to refine the starting image and style, we haven't been overly impressed with Veo 2, so you might run through that allotment in search of the result you want.


The video above was supposed to show a mysterious stone monolith on Mars, the rendering of which seems good enough. But we asked to see the Martian moon Phobos crash down on the monolith and turn it to dust. The "moon" just bounces past and goes poof to reveal the same monolith. At least as far as planetary bodies go, Veo 2's understanding of physics could stand to improve.

Google says it worked hard to ensure Veo 2 is safe and won't generate anything illegal or inflammatory. The generated videos are also marked with a SynthID digital watermark to label them as AI-generated, although Veo 2 probably isn't yet at the point where its output will be confused with reality.
 
Timely post. I am on Day 3 of my Google Pixel 9a which comes with Gemini. I have not bothered to look into Gemini yet but, from this post, sounds quite interesting.
As for the geopolitics of AI: 'May the best AI win! '
 

Google Dragontail AI is Scary Good, OpenAI Quasar, xAI Grok 3.5


Gemini 2.5 Pro is currently ranked as the top released model on the leaderboards. Google has more models lined up: Nightwhisper (probably a Gemini coding model) is reportedly amazing, and so is Dragontail. There is a flood of new model releases from Google, OpenAI, xAI, DeepSeek, and others. Here we will review what has been released and what will be released over the next couple of months.




Google's upcoming models:
– Nightwhisper
– Dreamtides
– Moonhowler
– Dragontail
– Stargazer
– Shadebrook
– Riverhollow
– Lunarcall

https://twitter.com/ai_for_success/status/1911261774868869412
 

Google Gemini 2.5 Pro is Insane...




Build with Google Gemini 2.5

 
