Meet Globant’s Advanced Video Search (AVS) for Next-Level Media Exploration

November 14, 2024

Overview

Content searchability is crucial for delivering personalized user experiences in today’s rapidly evolving media landscape. Globant’s Advanced Video Search (AVS) solution, powered by Google Cloud and AI technologies like Vertex AI and Gemini Pro Vision, is designed to transform media discovery and innovation.

With decades of expertise in cutting-edge technologies, Globant offers businesses a robust, scalable AVS solution that helps them stay ahead in the cloud-first era and reach their digital transformation goals. Together, Globant’s innovation and Google Cloud’s AI bring high-performance video, image, and audio search capabilities to modern media platforms.

Key Features of Globant’s AVS for Media

Deep Customer Understanding
Globant’s AVS is built with an in-depth understanding of user needs, allowing for personalized media search experiences.
Flexible and Customizable
The solution is highly flexible, running on the customer’s tenant with their own data, ensuring a tailored and adaptable approach.
Multi-Type Asset Support
AVS supports various types of assets, including audio, images, video, and text, making it an all-encompassing tool for media search.
Advanced Search Capabilities
Utilizing state-of-the-art AI models from Google Cloud, the solution supports advanced search features, including:
- Text-based search
- Image-based search
- Audio and video metadata search
Fast Time to Market
With scalable APIs and advanced Google Cloud AI technologies, the solution allows for faster deployment and continuous improvement, making it easy for businesses to innovate quickly.

See How It’s Done: Use Case Examples

Media Streaming Platforms:
AVS can significantly improve user experience by providing highly accurate video searches based on descriptions, specific frames, or transcriptions.
Sports Analysis:
Using AVS, analysts can pinpoint key moments in sporting events by searching for specific actions, players, or match highlights.
Film and TV Productions:
AVS enables searches for actors, directors, or specific scenes based on detailed descriptions, making research faster and more efficient.

The Technology Behind Globant’s Advanced Video Search (AVS)

Globant’s AVS is built on a modern, decoupled architecture that leverages Google Cloud’s AI suite to power its advanced search capabilities. This approach ensures scalability, flexibility, and high performance. Below is a breakdown of the technical stack and processes involved:

1. Ingesting and Storing Media Assets

The first step in the AVS implementation is ingesting the media content into Google Cloud Storage. This component acts as a scalable and secure location to store raw media assets, such as:

Raw video files (movies, sports events, etc.)
Audio files (music, podcasts, etc.)
Image files (posters, thumbnails, etc.)

The decoupled architecture keeps ingestion flexible: each component can be replaced or updated as the customer’s needs change or as Google Cloud releases new services.
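One way to keep ingestion decoupled is to isolate the bucket naming scheme in a single function so that downstream stages only ever see a `gs://` URI. The sketch below assumes a hypothetical `raw/<media-type>/<asset-id>/<file-name>` layout and hypothetical helper names; only the upload call (`Client().bucket().blob().upload_from_filename()`) comes from the real `google-cloud-storage` library, and it needs credentials to run.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a media-type prefix in the bucket.
MEDIA_PREFIXES = {
    ".mp4": "video", ".mov": "video", ".mkv": "video",
    ".mp3": "audio", ".wav": "audio", ".flac": "audio",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
}


def object_name_for(local_path: str, asset_id: str) -> str:
    """Return the Cloud Storage object name for a raw asset.

    The raw/<media-type>/<asset-id>/<file-name> layout is an assumption for
    illustration, not the actual AVS bucket layout.
    """
    path = PurePosixPath(local_path)
    media_type = MEDIA_PREFIXES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported media file: {local_path}")
    return f"raw/{media_type}/{asset_id}/{path.name}"


def upload_asset(bucket_name: str, local_path: str, asset_id: str) -> str:
    """Upload one file with the google-cloud-storage client and return its URI."""
    from google.cloud import storage  # imported lazily; requires credentials at runtime

    name = object_name_for(local_path, asset_id)
    storage.Client().bucket(bucket_name).blob(name).upload_from_filename(local_path)
    return f"gs://{bucket_name}/{name}"
```

For example, `object_name_for("/data/final_match.MP4", "match-042")` yields `raw/video/match-042/final_match.MP4`. Swapping the storage layout then means changing one function rather than every pipeline stage.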

2. Pre-processing and Media Enrichment

After the assets are ingested, the pre-processing stage kicks in. This involves several operations that extract valuable metadata from the raw assets, making the content searchable and providing structure to the data. Here are the fundamental operations:

Frame Generation: The system breaks videos down into individual frames sampled at set intervals, which helps identify specific visual moments within the content.
Transcriptions: Audio content is transcribed using Google Cloud’s Speech-to-Text API, generating text that can be indexed and searched.
Description Generation: Automated descriptions are created for videos and images using Google Cloud Vision API and Google Video Intelligence API. This metadata includes information about objects, people, locations, and activities detected in the media files.
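Frame generation at set intervals mostly comes down to choosing which frame indices to decode and recording when each one appears. The sketch below computes those indices and timestamps; the function names are hypothetical, and the actual decoding would be left to a video library such as OpenCV or ffmpeg.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return the indices of frames taken every `interval_s` seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = interval_s * fps  # frames between samples; may be fractional (e.g. 29.97 fps)
    indices = []
    k = 0
    while True:
        idx = round(k * step)  # nearest whole frame to the k-th sample time
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def frame_timestamp(index: int, fps: float) -> float:
    """Seconds from the start of the video at which frame `index` is shown."""
    return index / fps
```

A 10-second clip at 30 fps (300 frames) sampled every 2 seconds gives indices `[0, 60, 120, 180, 240]`; storing `frame_timestamp(i, fps)` alongside each frame is what later lets a search hit point to a moment in the video.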

These pre-processing steps are critical because they lay the groundwork for advanced search by providing both textual and visual data that can be indexed.
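Because transcription, description, and label detection run as separate steps, their outputs have to be aligned on a shared timeline before indexing. The sketch below assumes a hypothetical `Segment` record and represents detected labels as `(label, start_s, end_s)` tuples; real services return richer structures, so treat the shapes here as illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A time-bounded piece of an asset plus the metadata found inside it."""
    asset_id: str
    start_s: float
    end_s: float
    transcript: str = ""
    labels: list[str] = field(default_factory=list)


def attach_labels(segments, label_spans):
    """Attach each (label, start_s, end_s) span to every segment it overlaps."""
    for seg in segments:
        for label, start, end in label_spans:
            overlaps = start < seg.end_s and end > seg.start_s
            if overlaps and label not in seg.labels:
                seg.labels.append(label)
    return segments


def index_text(seg: Segment) -> str:
    """Flatten a segment into one string for text indexing or embedding."""
    return f"{seg.transcript} | labels: {', '.join(seg.labels)}"
```

A label spanning the whole match attaches to every segment, while a short "goal celebration" span attaches only to the segment in which it occurs, so each indexed record carries both what was said and what was seen.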

3. Embedding Generation for Media Search

Once the pre-processing is complete, the system uses Google’s AI suite (including Vertex AI and Gemini Pro Vision) to convert the media content into embeddings. Embeddings are vector-based representations that capture the essence of the media asset, whether it be a frame of a video, a transcription, or an image.

Here’s how embeddings are used:

Video and Image Embeddings: These embeddings represent video frames or images and are crucial for image-based search. For example, if a user submits an image of a specific basketball play, the system compares the embedding of that query image with the stored frame embeddings to return relevant results.
Text and Audio Embeddings: Transcriptions from audio or text content are also converted into embeddings. This allows for highly accurate text-based searches, enabling users to search for a phrase spoken in the video or a keyword related to the content.
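To find a phrase spoken at a particular moment, a transcript is usually embedded in short, overlapping windows rather than as one block. Speech-to-Text can return word-level time offsets when they are enabled in the recognition config; the sketch below assumes those words arrive as `(word, start_s)` pairs, and the function name and window sizes are hypothetical. Each window's text would then be passed to a text-embedding model.

```python
from __future__ import annotations


def transcript_windows(words, window=8, stride=4):
    """Group (word, start_s) pairs into overlapping text windows.

    Each window keeps the start time of its first word, so a match on the
    window's embedding can be mapped back to a moment in the video.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    for i in range(0, len(words), stride):
        piece = words[i:i + window]
        chunks.append((piece[0][1], " ".join(w for w, _ in piece)))
        if i + window >= len(words):
            break  # this window already reaches the last word
    return chunks
```

The overlap (stride smaller than window) ensures that a phrase falling across a window boundary still appears intact in at least one window, at the cost of indexing some words twice.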

These embeddings are stored in a vector database, enabling vector-based search techniques, which allow for fast and precise retrieval of media content, even at scale.
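The core operation of a vector database is nearest-neighbour search: return the stored embeddings most similar to the query embedding. The sketch below is an exact, brute-force version using cosine similarity; production vector databases such as Vertex AI Vector Search rely on approximate nearest-neighbour indexes to stay fast at scale. The class name and the `asset@timestamp` ID format are hypothetical.

```python
from __future__ import annotations

import heapq
import math


class InMemoryVectorIndex:
    """Exact (brute-force) cosine-similarity search over stored embeddings."""

    def __init__(self):
        self._items = {}  # item id -> unit-length vector

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vec]

    def add(self, item_id, vector):
        """Store a vector; normalizing once makes cosine a plain dot product."""
        self._items[item_id] = self._normalize(vector)

    def search(self, query, k=3):
        """Return the k (item_id, cosine_similarity) pairs closest to `query`."""
        q = self._normalize(query)
        scores = (
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in self._items.items()
        )
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

Real embeddings have hundreds of dimensions rather than the two used in a toy test, but the ranking logic is the same: the frame whose vector points in nearly the same direction as the query scores closest to 1.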

4. Advanced Video Search

Globant’s AVS enables users to search video moments based on specific inputs, such as text or images. The system supports:

Text-based Search: Users can input a description or a keyword, and the search engine matches it against the video’s metadata (transcriptions, descriptions, tags).
Image-based Search: Users can upload or select an image, and the system compares it to the frame-based embeddings, returning results where that scene or something visually similar appears.

The search is conducted in real time, thanks to efficient indexing and embedding storage on Google Cloud’s infrastructure.
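The metadata-matching side of text-based search can be pictured as scoring each indexed segment by how many query terms its transcription, description, and tags contain. The sketch below uses plain term overlap as a deliberately simple stand-in for a real search engine’s ranking (such as BM25 or embedding similarity); the dict field names and the function name are assumptions.

```python
from __future__ import annotations

import re

TOKEN = re.compile(r"[a-z0-9']+")


def tokens(text: str) -> set[str]:
    """Lower-case word tokens of a string."""
    return set(TOKEN.findall(text.lower()))


def keyword_search(segments, query, fields=("transcript", "description", "tags")):
    """Rank segments by how many query terms appear in their metadata fields.

    `segments` is a list of dicts; the field names are assumptions.
    Returns (score, asset_id, start_s) tuples, best first, skipping non-matches.
    """
    terms = tokens(query)
    hits = []
    for seg in segments:
        text = " ".join(
            " ".join(v) if isinstance(v, list) else str(v)  # tags may be a list
            for v in (seg.get(f, "") for f in fields)
        )
        score = len(terms & tokens(text))
        if score:
            hits.append((score, seg["asset_id"], seg["start_s"]))
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))
```

Returning the segment start time with each hit is what turns a match into a playable moment rather than just a matching video.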

5. Indexing and Exposing APIs

After the embedding generation and search capabilities are set up, the media content is indexed and accessible through APIs. The APIs are designed to be scalable and secure, ensuring that media platforms can integrate Globant’s AVS seamlessly into their systems.

Exposed APIs allow media platforms to:
- Access specific media moments by querying with text, images, or metadata.
- Retrieve and interact with indexed content.
- Integrate search features directly into their existing platforms, keeping the search experience user-friendly and highly responsive.
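A search API that accepts either text or an image typically validates its request body before touching the index. The sketch below assumes a hypothetical JSON body with exactly one of `"text"` or `"image_uri"` plus an optional `"top_k"`; these field names and the result limit are illustrative, not a documented AVS interface, and the function is framework-agnostic so it could sit behind any HTTP layer.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RESULTS = 50  # assumed limit, not a documented AVS value


@dataclass(frozen=True)
class SearchRequest:
    text: str | None
    image_uri: str | None
    top_k: int


def parse_search_request(body: dict) -> SearchRequest:
    """Validate a JSON search body with exactly one of "text" or "image_uri"."""
    text = body.get("text")
    image_uri = body.get("image_uri")
    if (text is None) == (image_uri is None):
        raise ValueError('provide exactly one of "text" or "image_uri"')
    if image_uri is not None and not image_uri.startswith("gs://"):
        raise ValueError("image_uri must be a gs:// URI")
    top_k = body.get("top_k", 10)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= MAX_RESULTS:
        raise ValueError(f"top_k must be an integer between 1 and {MAX_RESULTS}")
    return SearchRequest(text=text, image_uri=image_uri, top_k=top_k)
```

Rejecting ambiguous requests up front keeps the handler simple: a text query is routed to text and transcript embeddings, an image query to frame embeddings.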

Key Components and Technologies

To achieve these capabilities, Globant’s AVS relies on a combination of Google Cloud technologies:

Google Cloud Storage: For securely storing raw media assets at scale.
Google Cloud Video Intelligence API: Used to analyze video content, extracting meaningful metadata such as objects, actions, and scene transitions.
Google Cloud Vision API: Provides image recognition and metadata generation for images and video frames.
Google Cloud Speech-to-Text API: Used for converting audio content into searchable transcriptions.
Vertex AI: Facilitates machine learning model training and deployment, particularly for generating embeddings from the media content.
Gemini Pro Vision: Enhances visual and audio media processing with cutting-edge AI, making it possible to generate high-quality embeddings for advanced search.
Multimodal Embeddings API: Used to generate embeddings for different media formats (text, images, and video) in a shared vector space, so a query in one format can be compared against assets in another.
Vector Search Databases: Store the embeddings, enabling real-time, vector-based search.

How AVS Enables Real-Time, Personalized Search

By combining the power of vector-based search with AI-generated embeddings, Globant’s AVS allows media platforms to offer:

Real-time search for specific video or audio moments using text or image-based inputs.
Highly personalized search results, with the ability to search based on character relationships, actions, or even the mood of a scene.

This combination of real-time search and personalization enhances user experience and optimizes content discovery for platforms like streaming services, sports analysis tools, and content production studios.

Building Next-Gen Media Search

Globant’s AVS provides a modern, scalable solution to the growing need for advanced media search capabilities. By leveraging Google Cloud’s AI models and Globant’s expertise in digital transformation, media companies can offer personalized, real-time search functionality that enhances user engagement and content discovery.

With the ability to search across multiple types of assets—video, audio, and images—using either text or image queries, Globant’s AVS sets the standard for media innovation in the cloud era.
