Yesterday, I covered the Javelit project in this
article,
where I built a small frontend to create and edit images
with Google’s Nano Banana image model.
Javelit
Javelit is an open source project inspired by Streamlit from the Python ecosystem
to enable rapid prototyping and deployment of applications in Java.
Have you ever heard of Javelit?
It’s like Streamlit from the Python ecosystem, but for Java developers!
I was lucky that the project creator reached out and introduced me to this cool little tool!
Javelit is a tool to quickly build interactive app frontends in Java, particularly for data apps, but it’s not limited to them.
It helps you develop rapid prototypes with a live-reload loop, so you can experiment and see your changes in the app instantly.
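To give you a feel for the Streamlit-like programming model, here’s a minimal sketch of what such an app could look like. A word of caution: the `Jt` entry point and its widget methods below are assumptions for illustration, not Javelit’s confirmed API, so check the project’s documentation for the real method names.

```java
// Hypothetical sketch of a Streamlit-style app in Java.
// The Jt facade and its widget methods (title, textInput, button, image)
// are assumed for illustration; Javelit's actual API may differ.
public class ImagePlayground {
    public static void main(String[] args) {
        Jt.title("Image playground");                        // page title
        String prompt = Jt.textInput("Describe your image"); // input widget
        if (Jt.button("Generate")) {                         // script re-runs on click
            byte[] png = callImageModel(prompt);             // e.g. Nano Banana
            Jt.image(png);                                   // display the result
        }
    }

    static byte[] callImageModel(String prompt) {
        // Call your favorite image generation model here...
        return new byte[0];
    }
}
```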
Large Language Models (LLMs) are all becoming “multimodal”.
They can process not only text, but also other “modalities” as input, like pictures, videos, or audio files.
But models that output more than just text are less common…
Recently, I wrote about my experiments with Nano Banana 🍌 (in Java),
a Gemini chat model flavor that can create and edit images.
This is particularly handy for interactive creative tasks: for example, a marketing assistant that helps you design a new product
by describing it, further tweaking its look, showing it in different settings for marketing ads, etc.
Especially since, this week, Google announced that
Veo 3 became generally available,
with reduced pricing, a new 9:16 aspect ratio (nice for those vertical viral videos), and even resolution up to 1080p!
In today’s article, we’ll see how to create videos, in Java, with the GenAI Java SDK.
We’ll create videos either from a text prompt, or from an existing image.
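As a teaser, here’s a minimal text-to-video sketch with the google-genai Java SDK. I’m assuming the `veo-3.0-generate-001` model identifier and an API key in the `GOOGLE_API_KEY` environment variable; the polling helpers may differ slightly between SDK versions.

```java
import com.google.genai.Client;
import com.google.genai.types.GenerateVideosConfig;
import com.google.genai.types.GenerateVideosOperation;

public class VeoTeaser {
    public static void main(String[] args) throws InterruptedException {
        // Picks up the GOOGLE_API_KEY environment variable
        Client client = new Client();

        GenerateVideosConfig config = GenerateVideosConfig.builder()
            .aspectRatio("9:16") // the new vertical format
            .build();

        // Video generation is a long-running operation...
        GenerateVideosOperation operation = client.models.generateVideos(
            "veo-3.0-generate-001",
            "A hummingbird flying in slow motion through a misty forest",
            null, // no input image: pure text-to-video
            config);

        // ...so we poll until it completes
        while (!operation.done().orElse(false)) {
            Thread.sleep(10_000);
            operation = client.operations.getVideosOperation(operation, null);
        }

        // Print a reference to each generated video
        operation.response()
            .flatMap(response -> response.generatedVideos())
            .ifPresent(videos -> videos.forEach(video ->
                System.out.println(video.video())));
    }
}
```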
By now, you’ve probably all seen the incredible images generated by the Nano Banana model
(also known as Gemini 2.5 Flash Image preview).
If you haven’t, I encourage you to play with it in Google AI Studio
or in the Gemini app,
or have a look at the @NanoBanana X/Twitter account, which shares some of its greatest creations.
As a Java developer, you may be wondering how you can integrate Nano Banana into your own LLM-powered apps.
That’s what this article is about! I’ll show you how you can use this model to create new images, and to edit existing ones.
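Here’s a minimal sketch of the first case, generating an image from a prompt with the google-genai Java SDK. The exact builder methods may vary across SDK versions, and the `GOOGLE_API_KEY` environment variable is assumed to be set.

```java
import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.Part;

import java.nio.file.Files;
import java.nio.file.Paths;

public class NanoBananaTeaser {
    public static void main(String[] args) throws Exception {
        Client client = new Client();

        // Ask the model to answer with both text and an image
        GenerateContentConfig config = GenerateContentConfig.builder()
            .responseModalities("TEXT", "IMAGE")
            .build();

        GenerateContentResponse response = client.models.generateContent(
            "gemini-2.5-flash-image-preview",
            "Create a picture of a banana wearing sunglasses on a beach",
            config);

        // Image parts come back as inline data (raw bytes)
        for (Part part : response.parts()) {
            part.inlineData().flatMap(blob -> blob.data()).ifPresent(bytes -> {
                try {
                    Files.write(Paths.get("banana.png"), bytes);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }
}
```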
Like many developers, I’ve been exploring the creative potential of Large Language Models (LLMs). At the beginning of the year, I crafted a project to build an AI agent that could generate short science-fiction stories. I used LangChain4j to create a deterministic workflow to drive Gemini for the story generation, and Imagen for the illustrations. The initial results were fascinating. The model could weave narratives, describe futuristic worlds, and create characters with seemingly little effort. But as I generated more stories, a strange and familiar pattern began to emerge…
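For context, the skeleton of that workflow looked roughly like the following sketch, using LangChain4j’s Vertex AI integrations. The model names and builder options here are illustrative, and LangChain4j’s API names have shifted across versions, so treat this as the shape of the pipeline rather than the project’s exact code.

```java
import dev.langchain4j.data.image.Image;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.image.ImageModel;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.model.vertexai.VertexAiImageModel;

public class StoryWorkflow {
    public static void main(String[] args) {
        String project = System.getenv("GCP_PROJECT_ID");

        // Gemini writes the story...
        ChatLanguageModel gemini = VertexAiGeminiChatModel.builder()
            .project(project)
            .location("us-central1")
            .modelName("gemini-1.5-pro")
            .build();
        String story = gemini.generate(
            "Write a short science-fiction story about a lighthouse keeper on Europa.");

        // ...and Imagen illustrates it
        ImageModel imagen = VertexAiImageModel.builder()
            .project(project)
            .location("us-central1")
            .modelName("imagen-3.0-generate-001")
            .build();
        Response<Image> illustration =
            imagen.generate("A book cover illustration for this story:\n" + story);

        System.out.println(story);
        System.out.println(illustration.content().url());
    }
}
```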
I recently gave a talk titled “AI Agents, the New Frontier for LLMs”. The session explored how we can move beyond simple request-response interactions with Large Language Models to build more sophisticated and autonomous systems.
If you’re already familiar with LLMs and Retrieval Augmented Generation (RAG), the next logical step is to understand and build AI agents.
What makes a system “agentic”?
An agent is more than just a clever prompt. It’s a system that uses an LLM as its core reasoning engine to operate autonomously. The key characteristics that make a system “agentic” include autonomy, planning, the use of external tools, and memory.
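To make those characteristics concrete, here’s a minimal sketch of such a system in Java with LangChain4j’s AiServices: the LLM is the reasoning engine, it can autonomously decide to call a tool, and it keeps conversational memory. The `Calculator` tool and model configuration are made-up examples, not taken from the talk.

```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.service.AiServices;

public class AgentSketch {

    // A tool the LLM can decide to call on its own
    static class Calculator {
        @Tool("Multiplies two numbers")
        double multiply(double a, double b) {
            return a * b;
        }
    }

    // The agent's conversational interface
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("GCP_PROJECT_ID"))
            .location("us-central1")
            .modelName("gemini-1.5-pro")
            .build();

        Assistant assistant = AiServices.builder(Assistant.class)
            .chatLanguageModel(model)                                // reasoning engine
            .tools(new Calculator())                                 // tool use
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10)) // memory
            .build();

        // LangChain4j runs the decide / call-tool / observe loop autonomously
        // until the model produces a final answer
        System.out.println(assistant.chat("What is 123.45 multiplied by 67.89?"));
    }
}
```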
A very common question I get when presenting and talking about advanced RAG
(Retrieval Augmented Generation) techniques is
how to best index and search rich documents like PDFs (or web pages)
that contain both text and rich elements, like pictures or diagrams.
Another very frequent question that people ask me is about RAG versus long context windows.
Indeed, models with long context windows usually have a more global understanding of a document,
and can see each excerpt in its overall context. But of course,
you can’t feed all the documents of your users or customers into one single augmented prompt.
Also, RAG has other advantages, like much lower latency and generally lower cost.
In the first article of this Advanced RAG series, I talked about an approach I called
sentence window retrieval,
where we calculate vector embeddings per sentence, but the chunk of text returned
(and added to the context of the LLM) also contains the surrounding sentences,
to give more context to that embedded sentence.
Embedding a single sentence tends to yield better vector similarity matches than embedding the whole surrounding context.
It is one of the techniques I’m covering in my talk on advanced RAG techniques.
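In code, the idea boils down to “embed small, retrieve big”. Here’s a minimal indexing sketch with LangChain4j; the naive regex sentence splitter, the window size of one, and the in-memory store are simplifications for illustration, not the article’s exact implementation (package names for the embedding model also vary between LangChain4j versions).

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.Arrays;

public class SentenceWindowIndexing {
    public static void main(String[] args) {
        EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
        InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

        String document = "First sentence. Second sentence. Third sentence.";
        String[] sentences = document.split("(?<=[.!?])\\s+"); // naive splitter
        int window = 1; // number of neighboring sentences kept on each side

        for (int i = 0; i < sentences.length; i++) {
            // Embed the single sentence, for precise similarity matching...
            Embedding embedding = embeddingModel.embed(sentences[i]).content();

            // ...but store the sentence plus its neighbors as the retrievable chunk
            int from = Math.max(0, i - window);
            int to = Math.min(sentences.length, i + window + 1);
            String windowed = String.join(" ",
                Arrays.copyOfRange(sentences, from, to));
            store.add(embedding, TextSegment.from(windowed));
        }

        // At query time, similarity is scored against single sentences,
        // but the LLM receives the wider window as context.
    }
}
```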