Overview / Description
Pegasus 1.5 is a video language model API that converts any video up to two hours long into time-based structured metadata. It is built for developers who need machine-readable video intelligence at scale. Unlike general-purpose models that sample frames and approximate, Pegasus 1.5 reasons continuously over the full temporal arc of a video, tracking entities, causation, and narrative across time, and returns results in a single API call. Developers define a schema for their domain, point the API at a video, and receive time-stamped, structured output they can use directly in downstream applications. The model is also multimodal: pass a reference image and Pegasus 1.5 locates every moment where that image appears in the video.
Built on TwelveLabs' video-native intelligence platform, Pegasus 1.5 outperforms Gemini 3.1 Pro on multimodal prompting benchmarks by 13.1%. The underlying infrastructure indexes video at approximately 60x real-time speed (an hour of footage in about a minute), with capacity for over 10,000 hours per day.
The API supports video segmentation, video clipping, and structured prompts with reference images, making it suitable for media archives, sports analysis, advertising compliance, public sector evidence management, and any workflow where raw video needs to become searchable, structured data. The platform is SOC 2 Type II certified with encrypted data handling, and it can be deployed in the environment a customer requires.
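The schema-in, timestamps-out workflow described above might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the schema layout, the field names (`start_sec`, `end_sec`, `event`, `player`), and the sample response are hypothetical and are not taken from the TwelveLabs API reference. No network call is made; the response is hard-coded so the sketch runs offline.

```python
import json
from dataclasses import dataclass

# Hypothetical domain schema: names and layout are assumptions, not the real API format.
HIGHLIGHT_SCHEMA = {
    "segment_type": "sports_highlight",
    "fields": {"event": "string", "player": "string"},
}

# Hypothetical response, hard-coded so the sketch runs without any API access.
SAMPLE_RESPONSE = json.dumps({
    "segments": [
        {"start_sec": 754.0, "end_sec": 771.5, "event": "goal", "player": "No. 9"},
        {"start_sec": 12.0, "end_sec": 20.0, "event": "kickoff", "player": "No. 10"},
        {"start_sec": 3605.0, "end_sec": 3590.0, "event": "foul", "player": "No. 4"},
    ]
})

MAX_VIDEO_SEC = 2 * 60 * 60  # two-hour video limit stated in the overview


@dataclass(frozen=True)
class Segment:
    start_sec: float
    end_sec: float
    metadata: dict


def parse_segments(raw, schema):
    """Validate time-stamped segments against the schema and sort them by start time."""
    required = set(schema["fields"])
    segments = []
    for item in json.loads(raw)["segments"]:
        start, end = float(item["start_sec"]), float(item["end_sec"])
        # Skip segments whose timestamps are inverted or outside the video length.
        if not (0 <= start < end <= MAX_VIDEO_SEC):
            continue
        # Skip segments that are missing any field the schema asks for.
        if not required.issubset(item):
            continue
        segments.append(Segment(start, end, {key: item[key] for key in required}))
    return sorted(segments, key=lambda seg: seg.start_sec)


def to_timecode(seconds):
    """Format a second offset as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


segments = parse_segments(SAMPLE_RESPONSE, HIGHLIGHT_SCHEMA)
for seg in segments:
    print(to_timecode(seg.start_sec), to_timecode(seg.end_sec), seg.metadata["event"])
```

Validating timestamps and required fields before use is a defensive choice for downstream pipelines; the actual response format and guarantees should be checked against the vendor's documentation.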
Used For
Video-to-structured-data extraction, media archive search and retrieval, sports highlight and clip generation, advertising brand-safety compliance scanning, public sector evidence management and incident reporting, content moderation at scale, video segmentation and clipping, reference-image-based scene location in long videos, video indexing for AI-ready datasets, enterprise video intelligence pipelines
Pricing
Pros & Cons
Pros
- Processes videos up to 2 hours long and returns time-stamped structured metadata in a single API call, eliminating multi-step pipelines.
- Multimodal reference-image input: pass an image to the API and it identifies every timestamp where that image appears in the video.
- Outperforms Gemini 3.1 Pro on multimodal prompting benchmarks by 13.1%, with indexing throughput of ~60x real-time speed and capacity for 10,000+ hours per day.
- Supports video segmentation and video clipping natively, enabling downstream content workflows without additional tooling.
- SOC 2 Type II certified infrastructure with encrypted data handling and flexible deployment options for enterprise requirements.
Cons
- API-first product aimed at developers; it offers neither a no-code nor a visual interface for non-technical users.
- The Free plan caps indexing at 600 minutes (10 hours), which is limiting for production-scale testing.
- Enterprise pricing requires a custom committed-use contract negotiated with sales, adding friction for mid-market buyers.
- Index retention on the Free plan is limited to 90 days from creation, which may disrupt longer development cycles.
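For a rough sense of how the stated limits interact, the arithmetic can be sketched as below. The constants come from figures in this listing (600 free minutes, a two-hour maximum video length, roughly 60x real-time indexing); the function names are illustrative, and because the 60x figure is approximate the results are estimates only.

```python
FREE_PLAN_MINUTES = 600   # Free plan indexing cap from the listing
MAX_VIDEO_MINUTES = 120   # two-hour per-video limit from the listing
INDEXING_SPEEDUP = 60     # approximate real-time multiple from the listing


def videos_within_free_plan(video_minutes: int) -> int:
    """Number of equal-length videos that fit in the Free plan indexing cap."""
    if not 0 < video_minutes <= MAX_VIDEO_MINUTES:
        raise ValueError("video length must be between 1 and 120 minutes")
    return FREE_PLAN_MINUTES // video_minutes


def estimated_indexing_minutes(video_minutes: float) -> float:
    """Estimated wall-clock indexing time, assuming the ~60x figure holds."""
    return video_minutes / INDEXING_SPEEDUP


print(videos_within_free_plan(120))      # maximum-length videos that fit
print(videos_within_free_plan(45))       # 45-minute videos that fit
print(estimated_indexing_minutes(60))    # minutes to index one hour of footage
```

Under these assumptions the Free plan covers five maximum-length videos, which illustrates why the cap is tight for production-scale testing.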
Alternatives
Google Gemini (video understanding), OpenAI GPT-4o (multimodal video analysis), Amazon Rekognition (video analysis), Azure AI Video Indexer, Deepgram (audio/video transcription and analysis)