Last month, I was chatting with a customer who runs a large live streaming platform using Mux (you've heard of it) — their content is user-generated, meaning their end users can sign up, create a live stream, and go live. They'd built a custom content moderation system that could detect inappropriate content, using an in-house model their moderation team had trained on their own content — the kind of thing that keeps platforms safe and advertisers happy. But they had one big gripe for me: getting the latest video frames to analyse.
"We're basically playing whack-a-mole with timing," they explained. "We have to calculate the live edge, account for DVR mode latency, reconnects, and slates, and pray our timing is right…"
Sound familiar? If you've ever tried to build visual analysis on top of a Mux live stream, you probably know the pain. You want the latest frame from a live stream, but the existing thumbnail API requires you to specify an exact time, relative to when the DVR on-demand asset was created. For a live stream that's... well, live... that's like trying to hit a moving target.
Today, we're solving that problem with a new feature: latest thumbnails for live streams.
Introducing latest thumbnails
Starting today, you can get the most recent thumbnail from any active live stream using a simple additional query parameter when making a request to our image API:
GET https://image.mux.com/{PLAYBACK_ID}/thumbnail.jpg?latest=true
That's it. No timestamp calculations, no guessing at the live edge, no hoping your math is right. Just add ?latest=true to any thumbnail request and you'll get an image that's current within 10 seconds of the live edge.
This new parameter works with all the existing Image API features you already know and love:
# Get the latest thumbnail in 1080p
GET https://image.mux.com/{PLAYBACK_ID}/thumbnail.jpg?latest=true&height=1080
# Get it in WebP format for better compression, and resize to 640px wide
GET https://image.mux.com/{PLAYBACK_ID}/thumbnail.webp?latest=true&width=640
The latest=true parameter only works with active live streams. Try to use it on a VOD asset or an inactive stream, and you'll get a helpful 400 error explaining that this feature is live stream only.
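If you're requesting thumbnails server-side, that 400 is easy to handle gracefully. Here's a minimal sketch (the fallback behaviour is just a placeholder for your own logic):

// Fetch the latest thumbnail, handling streams that aren't currently live.
async function fetchLatestThumbnail(playbackId) {
  const res = await fetch(
    `https://image.mux.com/${playbackId}/thumbnail.jpg?latest=true&width=640`
  );

  if (res.status === 400) {
    // Not an active live stream (e.g. a VOD asset or an idle live stream)
    console.warn(`No latest thumbnail for ${playbackId}; is the stream live?`);
    return null;
  }

  if (!res.ok) {
    throw new Error(`Thumbnail request failed with status ${res.status}`);
  }

  return Buffer.from(await res.arrayBuffer());
}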
Building smarter live experiences
So what can you build with this? The possibilities are pretty exciting. Here are some use cases we've been talking about with customers, and some examples that we've ✨ vibe coded ✨ up for you (just kidding, ironically getting AI coding tools to use the OpenAI SDK doesn't get great results… weird huh?).
Live content moderation with OpenAI's moderation API
Remember that customer I mentioned? They can now drop their complex timestamp calculations and send the latest image URL straight into their custom model for moderation — they simply have their model take a look at the latest thumbnail every 15 seconds.
They picked 15 seconds for a few reasons. First, it's longer than the interval at which we refresh the latest thumbnail on our side (every 10 seconds), so the likelihood of getting cached duplicate frames is low. Second, it aligns well with their internal targets around timeliness of content moderation, though it's worth noting that every product has different targets here, depending on its user base and risk profile.
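The polling loop itself is tiny. Here's a minimal sketch of that pattern, assuming a hypothetical moderateFrame function standing in for whatever model call you're making:

// Poll the latest thumbnail every 15 seconds and hand it to your moderation model.
// moderateFrame is a placeholder for whatever model call you're making.
const POLL_INTERVAL_MS = 15_000;

function startModerationLoop(playbackId, moderateFrame) {
  const timer = setInterval(async () => {
    const imageUrl = `https://image.mux.com/${playbackId}/thumbnail.jpg?latest=true&width=640`;
    try {
      await moderateFrame(imageUrl);
    } catch (error) {
      console.error(`Moderation check failed for ${playbackId}:`, error);
    }
  }, POLL_INTERVAL_MS);

  // Call the returned function when the live stream ends to stop polling
  return () => clearInterval(timer);
}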
But what if you don't have the resources to train and host a custom model for your content? Thankfully, there are lots of off-the-shelf solutions you can now use directly with the latest thumbnail URLs.
Our personal favourites are Hive's moderation tools and OpenAI's moderation API. Here's an example of using OpenAI's omni-moderation-latest model to moderate the latest frame from your live streams:
import OpenAI from "openai";

const openaiClient = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function moderateLiveStream(playbackId) {
  const imageUrl = `https://image.mux.com/${playbackId}/thumbnail.jpg?latest=true&width=640`;

  console.log(`Analyzing live stream thumbnail at: ${imageUrl}`);

  try {
    const moderation = await openaiClient.moderations.create({
      model: "omni-moderation-latest",
      input: [
        {
          type: "image_url",
          image_url: {
            url: imageUrl,
          },
        },
      ],
    });

    if (moderation.results[0].flagged) {
      console.warn("Image flagged for moderation.");
      console.log(`Sexual score: ${moderation.results[0].category_scores.sexual}, Violence score: ${moderation.results[0].category_scores.violence}`);
      // Perform your moderation actions here - usually hide or remove the live stream, and get a human to verify
    }
  } catch (error) {
    console.error("Live stream moderation failed:", error);
  }
}

const playbackId = "rYlMNPt02YwWOVcueajlx01WNdqpC58Nqp";
await moderateLiveStream(playbackId);
Let's take a look at that in action — I spun up a new live stream, hooked it up to our simple moderation flow, and added a particularly gory moment from the trailer for one of my favourite horror movies, The Cabin in the Woods.
Let's take a look at the output more closely:
Analyzing live stream thumbnail at: https://image.mux.com/rYlMNPt02YwWOVcueajlx01WNdqpC58Nqp/thumbnail.jpg?latest=true&width=640
Image flagged for moderation.
Sexual score: 0.00006166297347617125, Violence score: 0.8971665289081385
Which… if you know the scene I'm using, about 1h:15m:10s into the movie, you'll probably agree with OpenAI's moderation model here: it's certainly violent!
What you choose to do with the content at this point is your call. If you're a horror movie streaming platform, this is probably expected, but if you're a user-generated content platform, you should probably remove the content until it can undergo a human review.
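For a user-generated content platform, that "pull it until a human looks at it" step might look something like this. It's a minimal sketch using Mux's disable live stream endpoint; note that it needs the live stream ID rather than the playback ID, and queueForHumanReview is a placeholder for your own review tooling:

// A sketch of the "hide it until a human looks at it" step, using Mux's
// disable live stream endpoint. queueForHumanReview is a placeholder for
// your own review tooling, passed in by the caller.
async function quarantineLiveStream(liveStreamId, queueForHumanReview) {
  const auth = Buffer.from(
    `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
  ).toString("base64");

  const res = await fetch(
    `https://api.mux.com/video/v1/live-streams/${liveStreamId}/disable`,
    {
      method: "PUT",
      headers: { Authorization: `Basic ${auth}` },
    }
  );

  if (!res.ok) {
    throw new Error(`Failed to disable live stream ${liveStreamId}: ${res.status}`);
  }

  // Hand the stream off to your moderation team before re-enabling it
  await queueForHumanReview(liveStreamId);
}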
And here's what's super cool: the OpenAI moderation API is free, letting you add basic moderation at no extra cost.
You can find the full code for this example in our GitHub examples repository.
Live content analysis and summarisation using GPT-4.1-mini
While moderation is inevitable for any video platform of meaningful size, your content also needs to be discoverable, and discoverability often means knowing what's happening in your video. At scale, relying on a human team, or on users themselves, to classify content appropriately can be unreliable.
Over the last few years, VLMs and multi-modal LLMs have become commonplace, so rather than using a more traditional one-shot classification model to query objects or content in an image, you can now ask more complex questions about the content.
OpenAI's GPT-4.1-mini is a high-performance multi-modal model which can analyse images and produce generated text output. By sending the latest thumbnail to this model and using OpenAI's structured outputs functionality, we can get back a JSON document that conforms to our schema and contains not only a set of tags for our content, but also a text summary of the frame we sent it.
import OpenAI from "openai";
import { z } from "zod";
import { zodTextFormat } from "openai/helpers/zod";

const openaiClient = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Define a schema for the object we want back from GPT
const schema = z.object({
  keywords: z.array(z.string()).max(10),
  summary: z.string().max(500),
});

async function analyzeLiveStream(playbackId) {
  const imageUrl = `https://image.mux.com/${playbackId}/thumbnail.jpg?latest=true&width=640`;

  console.log(`Analyzing live stream thumbnail at: ${imageUrl}`);

  try {
    const response = await openaiClient.responses.parse({
      model: "gpt-4.1-mini",
      input: [
        {
          role: "system",
          content:
            "You are an image analysis tool. You will be given an image, and be expected to return structured data about the contents.",
        },
        {
          role: "user",
          content: [
            {
              type: "input_text",
              text: "What's in this image? Return a list of around 10 single words, and a summary of the image in 500 characters or less.",
            },
            {
              type: "input_image",
              image_url: imageUrl,
            },
          ],
        },
      ],
      text: {
        format: zodTextFormat(schema, "tags"),
      },
    });

    console.log(response.output_parsed);
    // Do what you want with your tags and summary
  } catch (error) {
    console.error("Live stream analysis failed:", error);
  }
}

const playbackId = "rYlMNPt02YwWOVcueajlx01WNdqpC58Nqp";
await analyzeLiveStream(playbackId);
Let's test this one out with the best movie ever made — Big Buck Bunny.
To save you zooming in, here's what that latest frame summary gave us:
The image shows an animated scene with a large white rabbit standing outside the entrance of a burrow in a grassy area. The burrow is embedded in a small hill covered with green grass, purple flowers, and some rocks. In the background, trees and blue sky with clouds are visible, suggesting a bright, clear day in a natural forest setting
Which is… pretty accurate. But what's extra cool about multi-modal models is that you're in full control over the prompt, so you can engineer it into whatever tone you want. One of my colleagues suggested making it a little more sassy, so let's try that and take another look. Here's our new prompt:
What's in this image? Return a list of around 10 single words, and a summary of the image in 500 characters or less. Be sassy in the summary.
I'll save you the zooming in again:
Meet Big Buck Bunny, the oversized grumpy rabbit who's clearly had enough of your nonsense. Standing in a sunny field by a tree, he's ready to glare down anyone who dares disturb his peace. Fluffy but fierce, this bunny doesn't mess around—someone better bring some carrots or back off. Who knew rabbits could look so hilariously done with the world?
Well, I think GPT-4.1-mini nailed it there.
Unlike the moderation API, using GPT-4.1-mini through OpenAI's API will cost you money. Each frame here takes around 500 tokens in total (90% input, 10% output) to analyse and generate the two sets of data, which at public pricing works out to around $0.00026 per frame you analyse (that's $0.0624 per stream, per hour, if you poll every 15 seconds). So be sure to balance analysis frequency, risk model, and budget carefully as you deploy and scale.
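If you want to sanity-check that against your own polling interval, the arithmetic is simple enough to keep next to your config (the per-frame figure below is just the estimate from above, not a price list):

// Back-of-the-envelope cost estimate using the per-frame figure above.
const costPerFrameUsd = 0.00026; // ~500 tokens per frame at public GPT-4.1-mini pricing
const pollIntervalSeconds = 15;

const framesPerHour = 3600 / pollIntervalSeconds;             // 240 frames
const costPerStreamHourUsd = framesPerHour * costPerFrameUsd; // ≈ $0.0624

console.log(`~$${costPerStreamHourUsd.toFixed(4)} per stream, per hour`);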
You can find the full code for this example in our GitHub examples repository.
If you're also allowing customers to upload VOD content to your platform, you can of course use the same APIs to summarise and moderate that content.
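For example, here's a rough sketch that reuses the moderation call from earlier against a VOD asset by sampling frames with the time parameter instead of latest=true (the 60-second sampling interval is arbitrary, and openaiClient is the client from the first example):

// Reuse the moderation call from earlier on a VOD asset by sampling frames
// with the `time` parameter instead of `latest=true`. The 60-second sampling
// interval is arbitrary; pick whatever makes sense for your content.
async function moderateVodAsset(playbackId, durationSeconds) {
  for (let t = 0; t < durationSeconds; t += 60) {
    const imageUrl = `https://image.mux.com/${playbackId}/thumbnail.jpg?time=${t}&width=640`;

    const moderation = await openaiClient.moderations.create({
      model: "omni-moderation-latest",
      input: [{ type: "image_url", image_url: { url: imageUrl } }],
    });

    if (moderation.results[0].flagged) {
      console.warn(`Frame at ${t}s flagged for moderation.`);
    }
  }
}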
There's lots more you can build
Beyond content moderation and analysis, latest thumbnails open up a whole world of possibilities. Here are just a few others:
- Live stream discovery pages - Show users what they'll actually see when they click on a stream, with thumbnails that refresh automatically with the latest content every 15 seconds
- Frozen frame detection - Compare the latest thumbnail with the previous one to automatically detect when a stream has stopped updating (no AI needed here; simple image comparison using a metric like SSIM will work well, and there's a rough sketch of this idea after this list)
- Black screen monitoring - Use computer vision to analyze thumbnails and alert when streams go to black, helping you catch technical issues more quickly
- Test pattern and slate detection - Automatically identify when streams are showing colour bars, test patterns, slates or other technical content that shouldn't be going out on your live stream
Some of these use cases leverage AI for image analysis, while others rely on simpler computer vision techniques or basic image comparison tools. The beauty of having direct access to the latest frame is that you can choose the right tool for each problem you're trying to solve.
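To make the non-AI options concrete, here's a rough sketch of the frozen frame and black screen checks using the sharp image library. Rather than full SSIM, it uses a simple mean pixel difference, which is often enough to catch a frame that isn't changing; the thresholds are rough starting points, not tuned values.

// A rough sketch of frozen frame and black screen checks on consecutive
// latest thumbnails, using the sharp image library.
import sharp from "sharp";

async function fetchGreyscalePixels(playbackId) {
  const res = await fetch(
    `https://image.mux.com/${playbackId}/thumbnail.jpg?latest=true&width=640`
  );
  const buffer = Buffer.from(await res.arrayBuffer());
  // Downscale and convert to greyscale so the comparison is cheap and stable
  return sharp(buffer).resize(64, 36, { fit: "fill" }).greyscale().raw().toBuffer();
}

function meanAbsoluteDifference(a, b) {
  let total = 0;
  for (let i = 0; i < a.length; i++) total += Math.abs(a[i] - b[i]);
  return total / a.length;
}

function meanLuminance(pixels) {
  let total = 0;
  for (let i = 0; i < pixels.length; i++) total += pixels[i];
  return total / pixels.length;
}

async function checkStreamHealth(playbackId, previousPixels) {
  const pixels = await fetchGreyscalePixels(playbackId);

  if (meanLuminance(pixels) < 10) {
    console.warn(`${playbackId} looks like a black screen`);
  }

  if (previousPixels && meanAbsoluteDifference(pixels, previousPixels) < 2) {
    console.warn(`${playbackId} may be frozen - the frame barely changed`);
  }

  return pixels; // keep these pixels around for the next comparison
}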
What's next?
We've been testing this feature with a few customers, and we're already hearing requests for even fresher thumbnails, low frame rate video outputs, and integration with popular AI services for out-of-the-box moderation.
The latest thumbnails update every 10 seconds by default, which works great for most use cases. If your application needs more frequent updates, just reach out to our support team and we can discuss options for your specific needs.
But for now, we're excited to see what you build. Whether you're keeping platforms safe, building smarter discovery experiences, or creating entirely new categories of live visual applications, latest thumbnails give you the building blocks to make it happen.
The feature is available today for all Mux Video customers at no additional cost. Just add ?latest=true to any thumbnail request on an active live stream and you're all set!
Check out the full documentation and the examples from this blog. As always, we'd love to hear your feedback and to see what you build.