Unleash the Power of AI: Google's Gemini 2.5 Pro, ChatGPT Updates, and More Cutting-Edge Tools

Unveil the power of AI with Google's Gemini 2.5 Pro, ChatGPT updates, MidJourney's Omni Reference, and more cutting-edge tools. Discover how to leverage these transformative technologies for your business and creative projects.

May 13, 2025

Discover the latest AI tools and use cases that can revolutionize your workflow. From Google's Gemini 2.5 Pro that can turn video recordings into applications, to Midjourney's Omni Reference feature for seamless product photography, this blog post explores the cutting-edge advancements in the world of generative AI. Dive in and unlock the power of these transformative technologies.

Gemini 2.5 Pro: Google's Flagship Model Upgrade with Video-to-App Functionality

Google has recently updated its flagship model, Gemini 2.5 Pro, with two significant improvements. Firstly, the model has become excellent at creating front-end applications and websites, rivaling the capabilities of models like Claude. Secondly, and more notably, the model can now take video recordings of applications and rebuild them.

To test this new functionality, the author recorded a 30-second screen recording of a time converter application he uses regularly. He then uploaded the video to Google AI Studio, where the Gemini 2.5 Pro model is available. Given only the instruction "recreate this web app for me," the model generated the files for a clone of the application.

While the initial output did not perfectly match the original, the author provided a follow-up prompt to improve the interface, specifically the way it displays the various time zones. The model was then able to generate three new files that more closely resembled the original application.

The author notes that this process could be even smoother if used with a tool like Cursor, which would streamline the prompting and iteration. Overall, the author is impressed with the model's ability to use the context from the video to recreate the application, though he acknowledges that it is not yet a magical solution and requires some fine-tuning and troubleshooting.

Improvements in ChatGPT: GitHub Integration and Guidance on Model Usage

First up, there's an improvement for developers: you can now connect your GitHub account to the deep research feature. This lets deep research read through an entire repository while it works, making it easier for beginners to get started on a new repo and understand the codebase.

Additionally, OpenAI has updated the ChatGPT help center with a short guide explaining when to use the various models. While I would disagree with some of the recommendations, it provides a helpful explanation straight from the source on the appropriate use cases for each model. For example, I would suggest using GPT-4.5 for any writing-related tasks and GPT-4o for quick results or image generation. For business-related tasks, I find o3 to be the most effective.

To dive deeper into using o3 for business applications, I'll be running a lecture in the AI Advantage community next week, covering various use cases, prompts, and workflows. This community provides a great opportunity to learn advanced techniques and get personalized support for applying generative AI in your own work.

Midjourney's Omni Reference: Transforming Product Photography with AI

Midjourney's latest feature, Omni Reference, allows users to upload a single image and then reference it in multiple image generations. This feature is particularly useful for product photography, as Midjourney's capabilities in recreating human faces are limited.

The Omni Reference feature enables users to take an image of a product, such as a couch or a pair of sneakers, and then generate variations of that product in different scenarios or with different patterns. This can be incredibly useful for creating product marketing materials, as it allows for a high degree of customization and experimentation without the need for extensive photoshoots.

The examples provided by Midjourney showcase the power of this feature, with images of products like Louis Vuitton-inspired UGG boots being seamlessly integrated into various backgrounds and settings. While the logos and branding may not be perfectly recreated, the overall quality and consistency of the product images are impressive.

One key advantage of using Midjourney's Omni Reference for product photography is the speed and cost-effectiveness of the process. Traditional product photography can be time-consuming and expensive, requiring specialized equipment, lighting, and studio space. With Omni Reference, users can quickly generate a wide range of product images, experimenting with different backgrounds, angles, and styling without the need for extensive setup or post-processing.

Overall, Midjourney's Omni Reference feature represents a significant advancement in the field of AI-powered product photography. By leveraging the power of generative AI, businesses and creators can now streamline their product marketing efforts, creating high-quality, customized images with greater efficiency and flexibility.

Parakeet: Nvidia's Open-Source Transcription Model for English

Nvidia has released a brand new, completely open-source transcription model called Parakeet. While the model is currently designed only for the English language, it offers impressive transcription capabilities.

To demonstrate the power of Parakeet, I performed a practical demo. I simply needed to tab over to the microphone, hit record, and then say "I'm testing this brand new model that is supposed to transcribe my speech." In less than 3.5 seconds, Parakeet provided an accurate transcription, complete with timestamps.

This open-source model allows you to run the transcription locally, without the need for a paid subscription. You can easily integrate Parakeet into your own applications, such as recording audio and instantly transcribing it. This eliminates the need for third-party transcription services and provides a seamless, cost-effective solution.

The accuracy of the transcription is excellent, and the built-in timestamps make it easy to use the transcript for further prompting or processing. Overall, Parakeet is a powerful, open-source tool that can be a game-changer for anyone working with audio transcription.
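To make the point about timestamps concrete, here is a minimal sketch of turning timestamped segments into text you could paste into a prompt. The Segment class and both function names are hypothetical and introduced only for illustration; this is not Parakeet's own output format, so you would first map whatever your transcription toolkit returns onto this shape.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span. Hypothetical shape, not Parakeet's own type."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def format_clock(seconds: float) -> str:
    """Render an offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_for_prompt(segments: list[Segment]) -> str:
    """Join segments into one timestamped line per segment."""
    lines = [
        f"[{format_clock(s.start)} - {format_clock(s.end)}] {s.text.strip()}"
        for s in segments
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only, not real model output.
    demo = [
        Segment(0.0, 1.5, "I'm testing this brand new model"),
        Segment(1.5, 3.0, "that is supposed to transcribe my speech."),
    ]
    print(transcript_for_prompt(demo))
```

Keeping one segment per line with explicit start and end times makes it easy to ask a language model about a specific moment in the recording, or to convert the same data into a subtitle format later.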

HeyGen's Avatar IV: Single-Image AI Video Avatars

HeyGen, the industry leader in turning videos of people into AI video avatars, has released a new feature that allows you to create these avatars from a single image. Previously, you needed multiple minutes of footage to train a model of yourself.

To test this out, the video editor prepared two images for the creator to run through the HeyGen avatar creation tool. The creator went to the "Avatar IV" section, used the provided images, and selected his voice.

The results were quite impressive. The first avatar generated was able to say "Don't forget to leave a like on the video" with decent lip-sync and facial animation, considering it was created from a single image. The second avatar, using a different image, was able to say "Subscribe if you haven't done that yet" with similar quality.

While the intonation and some of the animation are not perfect, the creator was impressed with the overall quality, especially considering this was done on a free plan and only took a few minutes to create two different avatars. The creator believes this is a great step forward for HeyGen and for creating AI video avatars from a single image.

Suno 4.5: Impressive AI-Generated Music Compositions

Working purely from a text prompt, one of our team members, Mal, created a song called "Pale World" using Suno 4.5.

Honestly, it could pass for the soundtrack of a major movie like Dune. I can only speak for myself, but I would be hard-pressed to tell that this is not human-made, which is both scary and impressive. People will be able to create a whole new level of films, games, and even RPG soundtracks with this level of instrumental quality.

The main change we should discuss is the menu feature that Suno has added. The context length of what you can put into Suno is now much longer, allowing you to make songs up to 8 minutes long. Additionally, the prompt adherence is much better, so if you specify a certain instrument, it will be there reliably, whereas before it was more of a gamble.

Overall, the progress in AI-generated music is truly impressive, and I'm excited to see how this technology continues to evolve and be used in creative applications.

Other Notable Releases: NotebookLM Mobile App, Windsurf Acquisition, and AI Payment Advancements

NotebookLM Mobile App

NotebookLM, Google's popular consumer app for handling long-form content, has announced the launch of a mobile app. You can pre-register for the Android version on the Google Play Store and the iOS version on the App Store. The app is expected to be released in about two weeks, giving users a way to manage extensive documents and projects on the go.

Windsurf Acquisition by OpenAI

In another significant development, OpenAI has reportedly agreed to acquire Windsurf, one of the leading AI-powered code editors, for a staggering $3 billion. Windsurf is a strong competitor to Cursor, offering a Microsoft Word-like experience for writing code. While Cursor remains my personal preference for advanced coding projects, Windsurf is a fantastic option for those looking to get into AI-powered code editing. The acquisition is expected to bring Windsurf further into OpenAI's ecosystem.

AI Payment Advancements

Both Visa and Mastercard have announced news focused on the B2B space, pioneering "agentic payment technology" to power commerce in the age of AI. These developments mark the beginning of a new era, where AI agents can independently facilitate payments, paving the way for more seamless and autonomous financial transactions.

Conclusion

As we've seen, the latest advancements in AI technology are truly remarkable. From Google's Gemini 2.5 Pro model that can recreate applications from video recordings, to Midjourney's Omni Reference feature for seamless product photography, the capabilities of these tools continue to expand.

The improvements to ChatGPT, including the ability to connect GitHub to deep research and the new guidance on when to use different models, demonstrate the ongoing refinement of these language models.

The introduction of Nvidia's open-source Parakeet transcription model, which offers high-quality transcription without the need for a subscription, is a game-changer for developers. And HeyGen's Avatar IV feature, which can create AI video avatars from a single image, showcases the impressive progress in visual AI.

Finally, the stunning AI-generated music from Suno 4.5 highlights the growing potential for AI to revolutionize creative industries. As these technologies continue to evolve, the possibilities for their practical applications are truly exciting.

Overall, this week's AI news demonstrates the rapid advancements happening in the field, and the importance of staying informed and exploring the latest tools and techniques. The future of AI is here, and it's up to us to harness its power to drive innovation and create new possibilities.

FAQ