# Quick Tip: Clearing disk space in Cloud Shell

Right in the middle of a [workshop](https://glaforge.dev/posts/2024/03/27/gemini-codelab-for-java-developers/) I was delivering, as I was launching Google Cloud console's [Cloud Shell](https://cloud.google.com/shell/docs) environment, I received the dreaded warning message: `no space left on device`. And indeed, I didn't have much space left, and Cloud Shell was reminding me it was high time I cleaned up the mess!

Fortunately, the shell gives a nice hint, with a pointer to this [documentation page](https://cloud.google.com/shell/docs/quotas-limits#clearing_disk_space) with advice on how to reclaim space. The document suggests running the following command:

```bash
du -hs $(ls -A)
```

This command shows the disk space used by each file and directory (including the hidden ones) in your home directory. Here's the output I got after having cleaned up the many caches, directories, and projects I didn't need anymore:

```
 20K    .bash_history
4.0K    .bash_logout
4.0K    .bashrc
 20M    .cache
320M    .codeoss
112K    .config
8.0K    .docker
247M    gemini-workshop-for-java-developers
4.0K    .gitconfig
341M    .gradle
 12K    .gsutil
4.0K    .lesshst
 16K    .npm
4.0K    .profile
   0    .python_history
4.0K    README-cloudshell.txt
8.0K    .redhat
4.0K    .ssh
   0    .sudo_as_admin_successful
8.0K    .vscode
```

You quickly see the directories (like `.codeoss` or my `gemini-workshop-for-java-developers` project) that fill up the most space, and you can go after each of those directories and launch some `rm -Rf some-directory` commands here and there. Of course, pay attention to what you're going to delete, as this is irreversible!

# LLMs.txt to help LLMs grok your content

Since I started my career, I've been sharing what I've learned along the way in this blog. It makes me happy when developers find solutions to their problems, or discover new things, thanks to articles I've written here. So it's important for me that readers are able to find those posts. Of course, my blog is indexed by search engines, and people usually find out about it via Google or other engines, or they discover it via the links I share on social media. But you can also make your content more easily _grokkable_ by LLM-powered tools (like Gemini, ChatGPT, Claude, etc.).

A [proposal](https://llmstxt.org/) emerged last year to add a new file to websites, `llms.txt`, which is a Markdown document that lists all the posts, pages, and articles of that website. In a way, it's similar to other kinds of indexing-related files like `robots.txt` or `sitemap.xml`. But this time, the idea is to offer the content of your documentation, website, or blog as Markdown, easily _grokkable_ by LLM-powered tools.

## Updating Hugo to generate LLMs.txt files

This blog is powered by [Hugo](https://gohugo.io/), a static site generator. I'll explain how I added 2 new files to my blog:

- [llms.txt](https://glaforge.dev/llms.txt) — which references links to all the blog posts, a bit like a sitemap (see the excerpt below).
- [llms-full.txt](https://glaforge.dev/llms-full.txt) — which contains all the blog post articles in one giant file.
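For reference, the `llms.txt` format proposed on [llmstxt.org](https://llmstxt.org/) is plain Markdown: an `H1` with the site name, a blockquote with a short summary, then sections of links. Here's an illustrative excerpt of what the generated file roughly looks like (the description and URLs below are made up for the example):

```markdown
# Blog of Guillaume Laforge

> Articles about Java, Apache Groovy, Google Cloud, and generative AI.

## All articles

### 2025

- [LLMs.txt to help LLMs grok your content](https://glaforge.dev/posts/...)
- [Pretty-print Markdown on the console](https://glaforge.dev/posts/...)
```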
In `config.toml`, I added two new output formats:

```toml
[outputFormats.llms]
baseName = "llms"
mediaType = "text/plain"
isPlainText = true

[outputFormats.llms-full]
baseName = "llms-full"
mediaType = "text/plain"
isPlainText = true

[outputs]
home = ["HTML", "RSS", "llms", "llms-full"]
```

And in my custom theme's `layouts/_default` folder, I added two templates for those two files:

- `index.llms.txt` references all the articles:

```
# Blog of {{ .Site.Title }}

> {{ .Site.Params.Description }}

You can find [more information about this site here](/about).

## All articles

{{ range .Site.RegularPages.GroupByDate "2006" }}
### {{ .Key }}

{{ range .Pages }}
- [{{- .Title -}}]({{- .Site.BaseURL -}}{{- .RelPermalink -}})
{{- end -}}
{{ end }}
```

- `index.llms-full.txt` provides the full content of all the articles:

```
{{ range .Site.RegularPages }}
# {{ .Title }}

{{ .RawContent }}
{{- end -}}
```

## Now what can you do with that?

In my recent article on [the power of large context windows for your documentation efforts]({{}}), I mentioned that you could feed all your content into an LLM prompt and have a discussion with it. This is the same idea I'm following here. Thanks to the full export of all my articles, I can ask questions about everything I've written so far. Let's go to [Google AI Studio](https://aistudio.google.com/) and ask Gemini, with its huge token context window!

![](/img/gemini/llms-txt-blog-content.png)

When I asked about the main topics of this blog, the above screenshot shows Gemini's answer, highlighting my content about [Apache Groovy](https://groovy-lang.org/), Google Cloud Platform, generative AI, Web APIs, and software development best practices.

Look at the number of tokens! **My whole blog is about 682k tokens**! Only Gemini could _grok_ all of that and make sense of it.

As a follow-up question, I also asked _"Can you detail more the articles about Generative AI?"_, and Gemini highlighted some of my recent articles and summarized them all:

> Okay, based on the file, here's a breakdown of the articles specifically about Generative AI:
>
> * **The power of large context windows for your documentation efforts:** This article explores using LLMs with large context windows (like Gemini) to chat with reference documentation, to assist in tasks like creating new modules for LangChain4j. It discusses using "gitingest" to convert a GitHub project into a large text file that can be fed to an LLM.
> * **A Generative AI Agent with a real declarative workflow:** This piece compares an imperative Java-based AI agent with a declarative approach using Google Cloud Workflows. The agent generates short sci-fi stories, and the article details the pros and cons of each method.
> * **An AI agent to generate short sci-fi stories:** This project details how to build an AI-powered short story generation agent using Java, LangChain4j, Gemini, and Imagen 3, deployed on Cloud Run jobs. It contrasts explicit vs. autonomous agent workflows.
> * **Analyzing trends and topics from Bluesky's Firehose with generative AI:** This article describes a project that analyzes the Bluesky social media stream, using vector embeddings and clustering to identify trending topics, and using Gemini to generate summaries of these topics.
> * **Let's think with Gemini Flash 2.0's experimental thinking mode and LangChain4j:** This article explores the new "thinking mode" of Gemini 2.0 Flash, using LangChain4j to solve riddles and multimodal problems.
> * **Detecting objects with Gemini 2.0 and LangChain4j:** This article discusses using Gemini 2.0's object detection capabilities with LangChain4j to recognize playing cards in images.
> * **Semantic code search for Programming Idioms with LangChain4j and Vertex AI embedding models:** Explores the use of Vertex AI embedding models for semantic code search on the Programming Idioms community website. It focuses on using code retrieval task types and LangChain4j for efficient retrieval of code examples.
> * **Redacting sensitive information when using Generative AI models:** Details the importance of redacting Personally Identifiable Information (PII) before sending user messages to LLMs, using the Google Cloud Data Loss Prevention (DLP) API.
> * **Data extraction: The many ways to get LLMs to spit JSON content:** This article reviews different techniques to make LLMs generate JSON output and extract data from text, using Gemini and LangChain4j.
>
> These articles demonstrate a clear focus on using various LLMs (primarily Gemini) with tools like LangChain4j and Google Cloud services to address different AI-related problems.

Not only did it summarize all those recent articles, but I also liked the final sentence that outlines what most of those articles have in common: they are primarily using the Gemini model, with the [LangChain4j framework](https://docs.langchain4j.dev/), and running on Google Cloud Platform.

When I asked it to dive deeper into what's so special about LangChain4j, for instance, it again highlighted the key features of the framework (that I demonstrated in my articles):

> LangChain4j appears to be a central tool for building Generative AI solutions in the provided articles, so I collected a few pieces of information to explain what's so special about it:
>
> * **Java-Centric Approach:** LangChain4j is a Java library. In a field dominated by Python, LangChain4j is a key tool for Java developers to leverage generative AI without having to switch to a different programming language or environment.
> * **LLM Orchestration:** LangChain4j facilitates the orchestration of different components in an AI pipeline, from loading documents and splitting them into chunks, to calculating their embedding vectors, storing them in vector databases, and retrieving them to be passed to the LLM, to augment the LLM prompt with context from those vectors. This reduces the complexity of AI application development.
> * **Integration with Google Cloud Services:** LangChain4j has built-in integrations with Google Cloud's Vertex AI, Google AI Gemini, and Google Cloud Storage. This allows Java developers to easily use Google's LLMs, embedding models, and data storage solutions.
> * **Extensible and Customizable:** The library offers a way to plug custom logic and algorithms into the different steps, and to define how the information should be used to interact with the LLM.
> * **Higher-Level Abstractions:** LangChain4j provides higher-level abstractions for interacting with LLMs, such as the `AiServices` abstraction that returns type-safe objects, which allows developers to manipulate real Java objects, in a type-safe manner, instead of dealing with raw JSON strings.
> * **Capabilities like Tool Calling, Context Window, and the like:** It integrates new LLM capabilities to improve the quality of the AI powered application's responses.
> * **Metadata Handling**: It eases the handling of metadata in text segments, including filtering based on metadata during the retrieval phase.
>
> In the context of the articles, LangChain4j simplifies the implementation of complex tasks like:
>
> * Creating chatbots that can answer questions based on a specific knowledge base (RAG).
> * Analyzing trends in social media data.
> * Generating structured content (e.g., JSON) from unstructured text.
> * Implementing semantic code search.
>
> LangChain4j's key value is that it allows Java developers to more easily build, experiment with, and deploy Generative AI applications using their existing skills and infrastructure, while abstracting away much of the underlying complexity.

## Thoughts and summary

For the past few months, I've been exploring various Retrieval Augmented Generation (RAG) techniques to improve the quality of answers for questions on heaps of documents and data, often chunked and stored in vector databases. However, RAG hits some limits when you want to ask more general questions, or to analyze some content more deeply, by understanding it in its entirety. That's where LLMs like Gemini with large context windows help, as they can figure out the fine print, see the relationships between various sections, and spot common patterns across articles. I believe we should take the **best of both worlds, by finding relevant documents with a hybrid search approach mixing keyword and semantic searches, but then feeding LLMs not only with extracts but with whole documents**.

The `llms.txt` initiative is interesting, and I hope it'll be adopted more widely in the future. It will make it easier for users to reason about documentation, blogs, etc., via LLMs. As LLMs get bigger context windows, the discussion with whole docs or sites becomes richer, with a full understanding of the context of the corpus of documents.

# Pretty-print Markdown on the console

With Large Language Models loving to output Markdown responses, I've been wanting to display those Markdown snippets nicely in the console, when developing some LLM-powered apps and experiments. At first, I thought I could use a Markdown parser library, and implement some kind of output formatter to display the text nicely, taking advantage of [ANSI color codes and formats](https://gist.github.com/fnky/458719343aabd01cfb17a3a4f7296797). However, it felt a bit over-engineered, so I thought _"hey, why not just use some simple regular expressions!"_ (and now you'll tell me I have a second problem with regexes).

In this blog post, I just want to share the few lines of code I've added in a utility class to output Markdown content nicely. It's not covering all the bells & whistles of the Markdown syntax (in particular GitHub-flavored extensions, like tables, etc.) but it's good enough for my use case.
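As a quick refresher on how those ANSI escape sequences work: a code like `\u001B[1m` switches the terminal into a given mode (here, bold), and `\u001B[0m` resets everything back to normal. A minimal example:

```java
public class AnsiDemo {
    public static void main(String[] args) {
        // 1 = bold, 3 = italic, 31 = red foreground, 0 = reset
        System.out.println("\u001B[1mBold\u001B[0m, \u001B[3mitalic\u001B[0m, and \u001B[31mred\u001B[0m text");
    }
}
```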
## Markdown syntax highlighting on the console

````java
public static String markdown(String md) {
    return md
        // Bold
        .replaceAll("\\*\\*(.*?)\\*\\*", "\u001B[1m$1\u001B[0m")
        // Italic
        .replaceAll("\\*(.*?)\\*", "\u001B[3m$1\u001B[0m")
        // Underline
        .replaceAll("__(.*?)__", "\u001B[4m$1\u001B[0m")
        // Strikethrough
        .replaceAll("~~(.*?)~~", "\u001B[9m$1\u001B[0m")
        // Blockquote
        .replaceAll("(> ?.*)", "\u001B[3m\u001B[34m\u001B[1m$1\u001B[22m\u001B[0m")
        // Lists (bold magenta number and bullet)
        .replaceAll("([\\d]+\\.|-|\\*) (.*)", "\u001B[35m\u001B[1m$1\u001B[22m\u001B[0m $2")
        // Block code (black on gray)
        .replaceAll("(?s)```(\\w+)?\\n(.*?)\\n```", "\u001B[3m\u001B[1m$1\u001B[22m\u001B[0m\n\u001B[57;107m$2\u001B[0m\n")
        // Inline code (black on gray)
        .replaceAll("`(.*?)`", "\u001B[57;107m$1\u001B[0m")
        // Headers (cyan bold)
        .replaceAll("(#{1,6}) (.*?)\n", "\u001B[36m\u001B[1m$1 $2\u001B[22m\u001B[0m\n")
        // Headers with a single line of text followed by 2 or more equal signs
        .replaceAll("(.*?\n={2,}\n)", "\u001B[36m\u001B[1m$1\u001B[22m\u001B[0m\n")
        // Headers with a single line of text followed by 2 or more dashes
        .replaceAll("(.*?\n-{2,}\n)", "\u001B[36m\u001B[1m$1\u001B[22m\u001B[0m\n")
        // Images (blue underlined)
        .replaceAll("!\\[(.*?)]\\((.*?)\\)", "\u001B[34m$1\u001B[0m (\u001B[34m\u001B[4m$2\u001B[0m)")
        // Links (blue underlined)
        .replaceAll("!?\\[(.*?)]\\((.*?)\\)", "\u001B[34m$1\u001B[0m (\u001B[34m\u001B[4m$2\u001B[0m)");
}
````

> This can easily be translated into other programming languages. Just be careful with the small differences in the syntax of regular expressions.

For the following Markdown text:

````markdown
# Main title

Big title

## Subtitle

Small title

# Bold and italic

Some **bold text**. Bits of _italicized text_. It's **underlined**. And ~~striked through~~.

## Links

A [link](https://www.example.com) to an article.

![alt text](image.jpg)

### Quoting

> a quote of someone famous, potentially wrapping around multiple lines.

# Lists

1. First item
2. Second item
3. Third item

- First item
- Second item
- Third item

# Code

Some inline `code` inside a paragraph. Return type is `void` and args are `String[]`.

A fenced code block:

```java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
```
````

On the console, the above Markdown document would be rendered as follows:

![](/img/misc/markdown-rendered-in-console.png)

Do you like this Markdown syntax highlighting?

## Bonus points

At first, in my utility class, I only had methods for adding some colors to my program outputs. Even if you don't use Markdown, those touches of color can be useful to differentiate key parts of your output. So I created some methods for wrapping text in ANSI codes:

```java
public static String red(String msg) {
    return "\u001B[31m" + msg + "\u001B[0m";
}

public static String green(String msg) {
    return "\u001B[32m" + msg + "\u001B[0m";
}
//...
public static String bold(String msg) {
    return "\u001B[1m" + msg + "\u001B[0m";
}

public static String italic(String msg) {
    return "\u001B[3m" + msg + "\u001B[0m";
}
//...
```

You can combine them like in `bold(green(msg))`, add more colors, or even write some fancy rainbow text! As long as the output is still readable, it's all fair game!
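Here's a quick usage sketch, assuming those static methods all live in the same utility class (the sample string is just for illustration):

```java
String answer = """
        # Answer
        Some **important** point, with `inline code` and a [link](https://example.com).
        """;
System.out.println(markdown(answer));
System.out.println(bold(green("Done!")));
```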
# Advanced RAG — Sentence Window Retrieval

Retrieval Augmented Generation (RAG) is a great way to expand the knowledge of Large Language Models to let them know about your own data and documents. With RAG, LLMs can ground their answers on the information you provide, which reduces the chances of hallucinations.

Implementing RAG is fairly trivial with a framework like [LangChain4j](https://docs.langchain4j.dev/tutorials/rag). However, the results may not be on par with your quality expectations. Often, you'll need to further tweak different aspects of the RAG pipeline, like the document preparation phase (in particular document chunking), or the retrieval phase, to find the best information in your vector database.

In this first article (hopefully of a series on advanced RAG techniques) I'd like to explore an approach that may yield better results: **sentence window retrieval**, inspired by the technique described in this [article](https://www.linkedin.com/pulse/sentence-window-retrieval-optimizing-llm-performance-rutam-bhagat-v24of/).

> I've explored many techniques in my [advanced RAG techniques]({{}}) presentation, if you feel like discovering other approaches that we'll explore in more detail in this series.

## Let's step back to naive chunking

First, why do we even split documents into smaller chunks? We split documents into chunks in RAG because:

- It's easier to find the specific, relevant piece of information within a smaller chunk than a huge document.
- Large Language Models have limited memory. Chunks allow us to feed them just the necessary context, instead of overwhelming them with the whole document.
- Smaller chunks lead to more precise retrieval, delivering more accurate answers.

The naive approach is to split into chunks of a certain number of characters. For example, on the Wikipedia page of Berlin, a 100-character split might look as follows:

![](/img/rag/naive-chunk-1.png)

If a user asks the question _"What is the population of Berlin?"_, the number of inhabitants is split across two chunks. So neither the first nor the second chunk would yield the correct information for the LLM to generate an accurate answer. An obvious improvement is to use overlapping chunks:

![](/img/rag/naive-chunk-2.png)

The red chunk and the orange chunk overlap: both contain the gray part as well. Which means that the second chunk contains the number (in full) we're interested in.

Another possible approach, to avoid splits and overlaps, is to chunk by sentences. After all, human beings write sentences for a good reason: they bear information that represents a unit of semantic meaning.

![](/img/rag/naive-chunk-3.png)

However, both the overlapping-chunk example above and the sentence split expose another flaw: notice that the pronoun `its`, in the second chunk or the second sentence, doesn't carry the information that it actually references `Berlin`. So the pronoun misses an important aspect of the sentence: this is a sentence about the population of Berlin. Not any other city.

An alternative may be to increase the size of the chunks, and/or the size of the overlap, to avoid information being split across chunks (like the population figure), and to give more context about possible links between sentences (like the link between our pronoun and the city it refers to). However, the wider the chunks, the more diluted the semantic meaning in the resulting vector embeddings. With more dilution, it's harder to have query vectors (the user prompt) match the chunks of text with high similarity values.

## Enter sentence window retrieval

The name of the technique comes from this [article](https://www.linkedin.com/pulse/sentence-window-retrieval-optimizing-llm-performance-rutam-bhagat-v24of/) I mentioned.
But maybe it's not the best name we could find. Maybe something like _wider-context-sliding-window-embedding_ would be more explicit, but that's a mouthful! Let's have a look at this approach:

![](/img/rag/naive-chunk-4.png)

The idea is as follows:

- We calculate vector embeddings for the sentence in dark green.
- But we save the surrounding sentences in light green (for example, one sentence before, and two after).

At retrieval time, the vector similarity calculation will match better with the dark green sentence (in spite of its missing _Berlin_ aspect). But the whole light + dark green context will be added to the prompt of the LLM, instead of the single sentence. The advantages are that:

- We keep on carrying meaningful units of meaning with a few sentences, thus avoiding any key information being cut between splits, and the semantic dilution of bigger chunks.
- It helps the LLM resolve links between pronouns and their related entity. The LLM knows that we're talking about Berlin here.

## The canonical RAG implementation in LangChain4j

With LangChain4j, the base approach is as follows. Let's start with the **ingestion phase**:

```java
// Load the document (the Wikipedia page about Berlin)
Document capitalDocument = Document.from(text);

// Define an embedding model to calculate vector embeddings,
// both for the text of the article, and for the user queries
var embeddingModel = VertexAiEmbeddingModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .endpoint(System.getenv("GCP_VERTEXAI_ENDPOINT"))
    .location(System.getenv("GCP_LOCATION"))
    .publisher("google")
    .modelName("text-embedding-005")
    .build();

// Store the chunks and their vectors in a vector database
// (in this example, we'll use a simple in-memory store)
var embeddingStore = new InMemoryEmbeddingStore<TextSegment>();

// Ingest the document in chunks of 100 characters
// with an overlap of 20 characters,
// use the in-memory vector store,
// and the embedding model for vector calculations
EmbeddingStoreIngestor.builder()
    .documentSplitter(DocumentSplitters.recursive(100, 20))
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .build()
    .ingest(capitalDocument);
```

This is the naive approach using chunks of 100 characters with overlap. Let's see what it looks like during the **retrieval phase**:

```java
// Declare the LLM model we want to use
VertexAiGeminiChatModel chatModel = VertexAiGeminiChatModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .location(System.getenv("GCP_LOCATION"))
    .modelName("gemini-2.0-flash-001")
    .build();

// Create an interface contract
// that LangChain4j will implement for us
interface CapitalsAssistant {
    Result<String> learnAboutCapitals(String query);
}

// AiServices implements the interface
// and binds the LLM, and a content retriever
// that links the embedding model and vector store
CapitalsAssistant assistant = AiServices.builder(CapitalsAssistant.class)
    .chatLanguageModel(chatModel)
    .contentRetriever(EmbeddingStoreContentRetriever.builder()
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build())
    .build();

// Now we can ask questions
Result<String> response = assistant.learnAboutCapitals(
    "How many inhabitants live in Berlin?");
```

We could also add a memory component, to keep track of the ongoing discussion; it's just one extra line (see the sketch below). But here, I stick to just single user questions.
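For the curious, that extra line would look something like this (a sketch using LangChain4j's built-in message window memory, keeping the last 10 messages of the conversation):

```java
CapitalsAssistant assistant = AiServices.builder(CapitalsAssistant.class)
    .chatLanguageModel(chatModel)
    // the one extra line: remember the ongoing conversation
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.builder()
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build())
    .build();
```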
## Let's implement the sentence window retrieval

Now, how can we expand the above code to implement the algorithm? We need to split the text into sentences, and keep track of the surrounding sentences, as a sliding window, to give extra context to the LLM. We can store that information as metadata of each text segment. We must then prepare the LLM prompt by inserting the surrounding context, instead of the single sentences.

At ingestion time, we can plug in a `TextSegmentTransformer` that transforms our text chunks, to compute and store the surrounding context in the text segment metadata. We need to override both the `transform()` and `transformAll()` methods, because we need to modify all the chunks together (to get the surrounding sentences):

```java
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
    .documentSplitter(new DocumentBySentenceSplitter(200, 20))
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .textSegmentTransformer(new TextSegmentTransformer() {
        @Override
        public TextSegment transform(TextSegment segment) {
            return transformAll(Collections.singletonList(segment))
                .getFirst();
        }

        @Override
        public List<TextSegment> transformAll(List<TextSegment> segments) {
            List<TextSegment> list = new ArrayList<>();
            for (int i = 0; i < segments.size(); i++) {
                TextSegment textSegment = segments.get(i);

                // Create a sliding window of sentences to gather
                // the context surrounding the embedded sentence
                // (2 sentences before, 3 after,
                // but you could make it configurable)
                String context = IntStream.rangeClosed(i - 2, i + 3)
                    .filter(j -> j >= 0 && j < segments.size())
                    .mapToObj(j -> segments.get(j).text())
                    .collect(Collectors.joining(" "));

                // Store the surrounding context as metadata
                // of the text segment (the current chunk)
                Metadata metadata = new Metadata(textSegment.metadata().toMap());
                metadata.put(METADATA_CONTEXT_KEY, context);

                list.add(TextSegment.from(textSegment.text(), metadata));
            }
            return list;
        }
    })
    .build();
```

That's a bit of code, but I hope to contribute an implementation to LangChain4j directly, so that you don't have to write this algorithm each time you want to apply it.

Let's focus on the retrieval phase now, because we need to inject the surrounding context into the LLM prompt, instead of the sentence chunk itself. We need to create a `RetrievalAugmentor`, and configure the `ContentRetriever` we used before, as well as a `ContentInjector`:

```java
CapitalsAssistant assistant = AiServices.builder(CapitalsAssistant.class)
    .chatLanguageModel(chatModel)
    .retrievalAugmentor(DefaultRetrievalAugmentor.builder()
        // the content retriever is defined
        // at the level of the retrieval augmentor
        .contentRetriever(EmbeddingStoreContentRetriever.builder()
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build())
        // We create a content injector that injects
        // the surrounding context in the LLM prompt
        .contentInjector((contents, userMessage) -> {
            // Retrieves the surrounding sentences
            // from the text segment's metadata
            String excerpts = contents.stream()
                .map(content -> content
                    .textSegment()
                    .metadata()
                    .getString(METADATA_CONTEXT_KEY))
                .collect(Collectors.joining("\n\n"));

            // Customize the prompt for our geography use case
            return PromptTemplate.from("""
                You are a helpful geography assistant
                knowing everything about the capitals of the world.

                Here's the question from the user:
                {{userMessage}}

                Answer the question using the following information:
                {{contents}}
                """).apply(Map.of(
                    "userMessage", userMessage.singleText(),
                    "contents", excerpts
                )).toUserMessage();
        })
        .build())
    .build();
```

Again, that's a bit of code, but we can easily make it reusable if needed in different contexts.
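For completeness, the `METADATA_CONTEXT_KEY` referenced in both snippets is simply a constant holding the name of the metadata entry; the actual key name doesn't matter, as long as ingestion and retrieval agree on it (the name below is mine):

```java
// Metadata key under which the surrounding sentences are stored
private static final String METADATA_CONTEXT_KEY = "sentence_window_context";
```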
## Summary

With this _sentence window retrieval_ approach, we calculate and store the vector embedding of a single sentence, but we inject a wider surrounding context (a few sentences before and after) into the context of the LLM, so it generates its response with more information than just that single sentence.

This tends to avoid the problem of key pieces of information being cut in the middle, and helps resolve references between sentences (like a pronoun pointing at a named entity defined earlier). It's a technique worth experimenting with, to see if it gives better results in your own scenario.

However, before blindly applying a particular technique, be sure to prepare some evaluations: measure the quality of your RAG pipeline before making changes. Then, measure again after having applied a new technique, to see if the answers are better. We'll have to explore the topic of evaluation another day, but in the meantime, I encourage you to read the blog posts of my colleague Mete Atamel, who covered [RAG pipeline evaluation](https://atamel.dev/posts/2025/01-09_evaluating_rag_pipelines/), the [DeepEval](https://atamel.dev/posts/2025/01-14_rag_evaluation_deepeval/) tool, and the [RAG triad metric](https://atamel.dev/posts/2025/01-21_improve-rag-with-rag-triad-metrics/).

# The power of large context windows for your documentation efforts

My colleague Jaana Dogan was [pointing](https://x.com/rakyll/status/) at Anthropic's MCP (Model Context Protocol) documentation pages, which describe [how to build MCP servers and clients](https://modelcontextprotocol.io/tutorials/building-mcp-with-llms). The interesting twist was about _preparing the documentation_ in order to have Claude _assist you_ in building those MCP servers & clients, _rather than clearly documenting how to do so_.

{{< x user="rakyll" id="1890521760690270409" >}}

No more tutorials:

- **You discuss directly with the reference documentation.**
- **Your chat with the LLM becomes the tutorial!**

LLM-powered documentation chats become your tailored tutorial, for your very own specific needs and requirements. Not only can LLMs assist you in authoring articles, documentation, and reports, but **LLMs can craft explanations that help you achieve a particular goal for which there's not already a tutorial or how-to guide available**.

Also, sometimes, you overlook some key paragraph or section when browsing through the documentation, and you miss the key information that would have helped you fix the problem at hand. This happened to me recently while using an Obsidian plugin: I needed to configure the plugin in a certain way, and I had the impression it wasn't possible, but the plugin author pointed me at the key paragraph that I had somehow glossed over. **Had I asked the question to an LLM-powered chat that has all the reference documentation in its context, maybe it would have found that paragraph I missed?**

## Here come the large context window LLMs...

As you may already know, I contribute to the [LangChain4j](https://docs.langchain4j.dev/) open source project, which provides integrations with various LLMs (like Gemini) and vector databases. I hope to start working on an additional module to integrate the new _unified_ [Gemini SDK](https://github.com/googleapis/java-genai). The advantage of this new SDK is that you can call both Gemini _flavors_: the one provided by Google Cloud's Vertex AI offering, as well as DeepMind's Google AI version. One SDK to rule them all!
I'm also interested in potentially creating a new vector store module for Google [Cloud Firestore](https://cloud.google.com/firestore/docs/vector-search), which recently added vector calculation support in its Java client library. It would be neat to be able to use Firestore for RAG (Retrieval Augmented Generation) scenarios, taking advantage of the document database, its filtering capabilities, and its ability to do vector similarity searches.

LangChain4j's documentation provides some explanations on how to contribute new integrations, but I was interested in trying this approach of chatting with the reference documentation to guide my steps in creating a new LLM module, and a new vector store implementation module.

Fortunately, my favorite large language model, [Gemini](https://gemini.google.com/app), has a **huge context window** of [up to 2 million tokens](https://developers.googleblog.com/en/gemini-2-family-expands/)! This should be enough to ingest all the sources and documentation pages of the project.

Additionally, I discovered `gitingest`, a great [online tool](https://gitingest.com/) that allows you to **convert a GitHub project and all its sources into one gigantic text file**... that you can then feed to an LLM. Basically, you take the URL of a GitHub repository, and you replace the `github.com` part with `gitingest.com`. For example, for LangChain4j, the URL becomes: `https://gitingest.com/langchain4j/langchain4j`. Then, a download button lets you get the whole content of the repository.

## The prompt

For this experiment, I didn't code anything: I simply used [Google AI Studio](https://aistudio.google.com/app/prompts/new_chat). I selected `Gemini 2.0 Pro experimental`:

![](/img/gemini/gemini-2.0-pro-exp.png)

Then, I loaded the big text file of the LangChain4j project created by gitingest. Look at the number of tokens:

![](/img/gemini/2-million-tokens.png)

Yes, the whole repository and documentation is almost 2 million tokens long! Only Gemini can ingest that many tokens! Time to ask my question:

> I'd like to create a new language module for LangChain4j. How should I do that? What are the key steps to follow to propose a new model?

## The response

{{< details summary="Click to reveal the detailed answer" >}}

Okay, let's outline the steps to create a new language model module for LangChain4j and propose its inclusion. Based on the provided file structure, you'll be focusing on creating a new module similar to the existing ones (e.g., `langchain4j-open-ai`, `langchain4j-ollama`, etc.). Here's a breakdown of the process, referencing the structure you've provided:

**Key Steps and Considerations**

1. **Understand the Abstractions and SPI:** LangChain4j, like its Python counterpart, is built around core abstractions. You need to understand these to implement your integration correctly. The core abstractions you _must_ implement are:

   - **`ChatLanguageModel` / `StreamingChatLanguageModel`:** For conversational models (like ChatGPT, Gemini). Implement `ChatLanguageModel` for synchronous responses, and `StreamingChatLanguageModel` if the model supports streaming responses token by token.
   - **`LanguageModel` / `StreamingLanguageModel`:** For models with a simpler text-in, text-out interface (less common these days).
   - **`EmbeddingModel`:** If the model provider offers embedding capabilities.
   - **`ModerationModel`:** If the model provider offers content moderation.
   - **`ScoringModel`:** If the model provider offers scoring/ranking capabilities.
   - **Builder Factories:** You'll also need to create builder factories (SPIs) for each model type you implement. These are how users will construct your model classes. See examples like `AzureOpenAiChatModelBuilderFactory`. These are registered using the Java ServiceLoader mechanism (the `META-INF/services` files).

2. **Choose a Module Structure (and Repository):**

   - **Community Repo (Preferred for new integrations):** Start your integration in the [`langchain4j-community`](https://github.com/langchain4j/langchain4j-community) repository. This is the recommended approach for new contributions. It allows for easier initial review and iteration before considering a move to the core `langchain4j` repository. Clone this repo, don't fork the main `langchain4j` repo directly.
   - **Main `langchain4j` Repo (For Core Integrations):** If your integration is with a very widely used and well-established model provider (like OpenAI, Google, etc.), _and_ you are confident in its stability and long-term maintenance, you _might_ propose it for the main repo. However, start in `langchain4j-community` first.
   - **Module Naming:** Follow the pattern: `langchain4j-{provider-name}` (e.g., `langchain4j-my-llm`).
   - **Directory Structure:** Create a directory structure mirroring the existing modules (see `langchain4j-open-ai` or `langchain4j-ollama` as good examples):

     ```
     langchain4j-{provider-name}/
         pom.xml (Your module's Maven build file)
         src/
             main/
                 java/
                     dev/langchain4j/model/{providername}/ (e.g., myllm)
                         {ProviderName}ChatModel.java (Your implementation)
                         internal/ (API client and related classes)
                         spi/ (Builder factory for your model)
                             {ProviderName}ChatModelBuilderFactory.java
                 resources/
                     META-INF/services/ (Files to register your builder factory, see examples)
             test/
                 java/
                     dev/langchain4j/model/{providername}/
                         {ProviderName}ChatModelIT.java (Integration tests)
     ```

3. **Implement the API Client:**

   - **Official SDK (Preferred):** If the LLM provider has an official Java SDK, use it. This is usually the best approach for stability, performance, and access to all features. See `langchain4j-bedrock` for an example using an official SDK.
   - **HTTP Client (If no SDK):** If there's no official SDK, use the JDK's built-in `java.net.http.HttpClient` (available since Java 11). This minimizes external dependencies. Avoid adding new dependencies unless absolutely necessary. See `http-clients/langchain4j-http-client-jdk` for how LangChain4j wraps this. Avoid using the older `okhttp3` directly if possible, prefer `langchain4j-http-client-jdk` (or `langchain4j-http-client-spring-restclient` if building a Spring Boot starter).
   - **JSON Handling:** Use Jackson for JSON serialization/deserialization, as it's already a dependency.
   - **Error Handling:** Make sure to handle HTTP errors (non-2xx responses) appropriately. Throw a `dev.langchain4j.exception.HttpException` for these.
   - **Request/Response Logging:** Implement logging for requests and responses (see `langchain4j-anthropic` for a complete example). This is _very_ helpful for debugging.

4. **Implement the Model Interface(s):**

   - Implement `ChatLanguageModel`, `StreamingChatLanguageModel`, `EmbeddingModel`, etc., as appropriate, based on the provider's capabilities.
   - Use the `Builder` pattern for your model classes to allow for flexible configuration.
   - Make sure your implementation handles request/response mapping and error handling correctly.
   - Implement `TokenCountEstimator` if possible, so the `TokenWindowChatMemory` can calculate the token usage.
   - Implement `DimensionAwareEmbeddingModel` to report the output dimension from the embedding model.

5. **Write Tests:**

   - **Unit Tests:** Create unit tests for any complex logic, utility methods, and request/response mappers.
   - **Integration Tests (ITs):** Create integration tests (e.g., `MyLlmChatModelIT.java`) that interact with the _real_ LLM provider's API. These are _crucial_ for ensuring your integration works correctly.
   - Use environment variables (e.g., `MYLLM_API_KEY`) to store API keys and other secrets. Do _not_ hardcode them.
   - Use `@EnabledIfEnvironmentVariable` to skip the tests if the required environment variables are not set.
   - Extend `AbstractChatModelIT`, `AbstractStreamingChatModelIT`, `AbstractEmbeddingModelIT`, and/or `AbstractScoringModelIT` to get a set of basic tests.
   - Test all relevant features of the model (e.g., text generation, streaming, different parameters, tool use, JSON mode).
   - Add a test for concurrent requests if possible.
   - Consider adding a test for the `Tokenizer` interface (see examples in `langchain4j-core`).
   - Add `@RetryingTest` if the model response is inconsistent.

6. **Add to BOM (Bill of Materials):** Add your new module to `langchain4j-bom/pom.xml`. This helps users manage dependencies.

7. **Documentation:**

   - **Update `README.md`:** Add your integration to the list of supported models and embedding stores.
   - **Create Markdown Documentation:** Create Markdown files in the `docs/docs/integrations/` directory, following the structure of existing integrations. You'll need:
     - A main file (e.g., `my-llm.md`).
     - An entry in `docs/docs/integrations/language-models/index.md` and in `docs/sidebars.js`.
     - An entry in the `_category_.json` files in `docs/docs/integrations/language-models` and `docs/docs/integrations/embedding-stores`.
   - **Examples (Highly Recommended):** Create a simple example in the [`langchain4j-examples`](https://github.com/langchain4j/langchain4j-examples) repository. This is _very_ helpful for users.

8. **General Guidelines (from CONTRIBUTING.md):**

   - **Java 17:** Maintain compatibility with Java 17.
   - **Minimal Dependencies:** Avoid adding new dependencies if possible. If necessary, try to use libraries already present. Run `mvn dependency:analyze` to check.
   - **Backwards Compatibility:** Avoid breaking changes. If necessary, deprecate old methods/fields instead of removing them.
   - **Naming Conventions:** Follow existing naming conventions.
   - **No Lombok:** Avoid using Lombok in new code; remove it from existing code if you touch it.
   - **Javadoc:** Add Javadoc where needed.
   - **Code Style:** Run `make lint` and `make format` before committing.
   - **Large Features:** Discuss large features with maintainers (@langchain4j) before implementation.

9. **Open a Pull Request (Draft First):**

   - Open a draft PR in the `langchain4j-community` repository.
   - Fill out all sections of the PR template.
   - Once the PR is reviewed and approved, you will be asked to finalize it (add documentation, examples, etc.).

**Example Code Snippets (Illustrative)**

```java
// MyLlmChatModel.java (in the appropriate package)
public class MyLlmChatModel implements ChatLanguageModel {

    private final MyLlmClient client; // Your custom client
    private final String modelName;

    @Builder
    public MyLlmChatModel(String baseUrl, String apiKey, String modelName, Duration timeout) {
        this.client = new MyLlmClient(baseUrl, apiKey, timeout); // Your custom client
        this.modelName = modelName;
    }

    @Override
    public Response<AiMessage> generate(List<ChatMessage> messages) {
        // 1.
        //    Convert Langchain4j messages to MyLLM's request format
        // 2. Call client.chat(...)
        // 3. Convert MyLLM's response to LangChain4j's Response
        // 4. Handle errors (throw HttpException)
        return null; // TODO: Implement
    }

    // ... other methods, builder, etc. ...
}

// MyLlmClient.java (in a suitable package, e.g., dev.langchain4j.model.myllm.internal)
class MyLlmClient {
    // ... implementation using java.net.http.HttpClient ...
}

// MyLlmChatModelBuilderFactory.java (in a suitable package, e.g., dev.langchain4j.model.myllm.spi)
public class MyLlmChatModelBuilderFactory implements Supplier<MyLlmChatModel.Builder> {
    @Override
    public MyLlmChatModel.Builder get() {
        return MyLlmChatModel.builder();
    }
}

// META-INF/services/dev.langchain4j.model.chat.spi.ChatLanguageModelBuilderFactory (resource file)
// Add a line with the fully qualified name of your factory:
// dev.langchain4j.model.myllm.spi.MyLlmChatModelBuilderFactory
```

**Key Points and Common Mistakes**

- **Thorough Testing:** Integration tests are absolutely essential. Test with real API calls.
- **Error Handling:** Handle HTTP errors (non-2xx responses) from the LLM provider's API.
- **Token Usage:** If the LLM provider gives you token usage information, include it in the `Response`.
- **Finish Reason:** If the LLM provider gives you a finish reason (e.g., "stop", "length"), include it in the `Response`.
- **Consistency:** Maintain consistency with existing LangChain4j modules in terms of naming, structure, and coding style.
- **Documentation:** Clear, concise, and accurate documentation is critical. Follow the patterns you see in existing modules.
- **SPI Registration:** Remember to create the `src/main/resources/META-INF/services/` files to register your model implementation as a service.

This comprehensive guide, combined with the provided file structure and examples, should give you a strong foundation for contributing your new language model integration to LangChain4j. Remember to start in the `langchain4j-community` repository for initial development and review. Good luck!

{{< /details >}}

## The verdict

Honestly, this is really good! It follows the advice in the [contributing guidelines](https://github.com/langchain4j/langchain4j/blob/main/CONTRIBUTING.md). But since it has access to the whole code base, it understands pretty well the various steps to follow, the classes to implement, etc.

A few nits though: for example, it confirms we shouldn't use Lombok for creating builders, but the example code shows a `@Builder` annotation from Lombok, because it's still used in several modules. There's also a reference to the `make` tool, although the project doesn't use it at all. And the code suggests implementing the `generate()` methods, whereas newer models should now implement the new `chat()` method instead, but Gemini can't really know about this, as this is fairly new and not many models implement those interfaces yet.

But other than that, it's excellent! Gemini created a contributing guide by looking at almost 2 million tokens of code and documentation. And it doesn't end here: you can continue asking other questions, to get more insight into particular aspects of the development of a new module, and Gemini will continue the discussion, guiding you and helping you in your task. I found that quite impressive.
## Let's continue the conversation with the code base and documentation

As I was interested in creating a new embedding store for Firestore, I then added in the context the [documentation](https://cloud.google.com/firestore/docs/vector-search) of the Java SDK for Firestore. And asked the following question:

> I would also like to create a new embedding store for the Google Cloud Firestore database.
> What are the steps to create a new LangChain4j embedding store module?

Let's see what it suggested:

{{< details summary="Click to reveal the detailed answer" >}}

Okay, here's a breakdown of how to create a new LangChain4j embedding store module for Google Cloud Firestore, along with the key steps and considerations, mirroring the structure of existing modules like `langchain4j-milvus`.

**Project Structure**

Your project structure should follow the established pattern. I'll create a simplified version based on the most relevant parts from the provided file listing. The full structure would be much larger (like the main `langchain4j` project), but this captures the essentials:

```
langchain4j/
└── langchain4j-embedding-store-google-firestore/ (or similar name)
    ├── pom.xml (Your module's Maven build file)
    └── src/
        ├── main/
        │   └── java/
        │       └── dev/langchain4j/store/embedding/google/firestore/
        │           ├── GoogleFirestoreEmbeddingStore.java (Main class)
        │           └── (Any other supporting classes, like request/response objects)
        └── test/
            └── java/
                └── dev/langchain4j/store/embedding/google/firestore/
                    └── GoogleFirestoreEmbeddingStoreIT.java (Integration test)
```

**Steps to Create the Module**

1. **Project Setup (Maven Module):**

   - Create a new directory named `langchain4j-embedding-store-google-firestore` (or a similar, descriptive name) within the `langchain4j` parent directory.
   - Create a `pom.xml` file inside this new directory. This file will define your module as a Maven project and specify its dependencies. Use `langchain4j-milvus/pom.xml` as a guide, but adapt it:
     - **Parent:** Set the parent to `langchain4j-parent`, as shown in your example.
     - **ArtifactId:** Use `langchain4j-embedding-store-google-firestore`.
     - **Name and Description:** Update these appropriately.
     - **Dependencies:** Include `langchain4j-core` as a compile-time dependency. Add the Google Cloud Firestore Java client library as a dependency:

       ```xml
       <dependency>
           <groupId>com.google.cloud</groupId>
           <artifactId>google-cloud-firestore</artifactId>
           <version>YOUR_VERSION_HERE</version>
       </dependency>
       ```

       Add any other necessary dependencies, such as logging (SLF4J), Lombok (if desired, but try to avoid in new code), etc. _Minimize dependencies as much as possible._
     - **Test Dependencies:** Include JUnit, AssertJ, and any mocking libraries (like Mockito) in the `test` scope.
     - **Licenses:** Ensure your module has the correct Apache 2.0 license information in the POM.

2. **Implement `EmbeddingStore`:**

   - Create `GoogleFirestoreEmbeddingStore.java` in the `dev.langchain4j.store.embedding.google.firestore` package.
   - Implement the `EmbeddingStore` interface from `langchain4j-core`. This is the _crucial_ part. You'll need to implement the following methods, mapping them to Firestore operations:
     - `add(Embedding embedding)`: Adds a single embedding to the store, generating a unique ID.
     - `add(String id, Embedding embedding)`: Adds an embedding with a specified ID.
     - `add(Embedding embedding, TextSegment textSegment)`: Adds an embedding with associated text and metadata.
     - `addAll(List<Embedding> embeddings)`: Adds multiple embeddings.
     - `addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> textSegments)`: Adds multiple embeddings with associated IDs and text segments.
     - `removeAll(Collection<String> ids)`: Deletes embeddings by ID.
     - `removeAll()`: Deletes everything.
     - `removeAll(Filter filter)`: Deletes records matching the condition filter.
     - `search(EmbeddingSearchRequest request)`: Finds related embeddings.
     - `findRelevant(Embedding referenceEmbedding, int maxResults, double minScore)`: Finds relevant embeddings.
   - **Considerations for Firestore Implementation:**
     - **Data Model:** How will you store the embeddings (as float arrays) and associated data (text, metadata)? Firestore uses a NoSQL document model. You'll likely store each `TextSegment` and its `Embedding` as a document in a collection.
     - **Metadata:** Firestore supports storing metadata as document fields. You'll need a way to map `TextSegment` metadata (which is a `Map`) to Firestore document fields. You have a few options here, mirroring what existing modules do:
       - **Individual Columns (Preferred):** Each metadata key becomes a separate field in the document. This is efficient for querying but requires knowing the metadata keys in advance. The Mariadb integration uses this approach with `MetadataColumDefinition`.
       - **Single JSON Field:** Store all metadata as a single JSON string. This is flexible but less efficient for filtering. The Chroma integration uses this.
       - **Mixed Approach:** Common metadata fields (like "source" or "document_id") could be separate fields, and a catch-all "metadata" field could store the rest as JSON.
     - **Vector Search:** Firestore now has native Vector Search, which is excellent! You'll use this for the `findRelevant` method. The key will be understanding how to map LangChain4j's `EmbeddingSearchRequest` (which includes `maxResults` and `minScore`) to the appropriate Firestore vector search query.
     - **Filtering:** Implement filtering based on metadata using Firestore's query capabilities. LangChain4j's `Filter` interface needs to be translated into a Firestore query. Look at existing implementations like `ChromaMetadataFilterMapper` for inspiration, but adapt it to Firestore's query syntax.
     - **Error Handling:** Wrap Firestore exceptions in `RuntimeException` or a more specific custom exception.
     - **Configuration:** Use a `Builder` pattern (like `OpenAiChatModel` does) to allow users to configure:
       - Firestore project ID, database, collection name
       - Credentials (API key, service account)
       - Timeout settings
       - Maximum number of results (`maxResults`)
       - Minimum relevance score (`minScore`)
       - Field names for text, embedding, and metadata (if configurable)
       - Possibly options for automatically creating the collection/indexes.
     - **Indexing:** You _must_ create the necessary index in Firestore for vector search to work. The documentation you provided explains how to do this with the `gcloud` CLI. The code should, ideally, at least _check_ if the index exists. It _could_ even attempt to create it, but this would require more permissions.
     - **Concurrency:** If there is any concurrency, the code should be thread-safe.
   - **Implement `GoogleFirestoreEmbeddingModel`:** You should also implement the `EmbeddingModel` interface, by adapting the Google API to the LangChain4j interface.

3. **Create SPI Builder Factory:**

   - Create a `GoogleFirestoreEmbeddingStoreBuilderFactory` class that implements `Supplier`.
   - Create a file named `META-INF/services/dev.langchain4j.spi.store.embedding.EmbeddingStoreFactory` in `src/main/resources`.
   - Add a single line to this file containing the fully qualified name of your factory class (e.g., `dev.langchain4j.store.embedding.google.firestore.GoogleFirestoreEmbeddingStoreBuilderFactory`).

4. **Write Integration Tests:**

   - Create `GoogleFirestoreEmbeddingStoreIT.java` (and potentially `...WithFilteringIT.java`, `...WithRemovalIT.java`, etc., mirroring the existing modules).
   - Extend `EmbeddingStoreIT` (or `EmbeddingStoreWithFilteringIT`, etc.) to inherit a basic set of tests.
   - Implement the abstract methods (like `embeddingStore()`) to provide instances of your store and a compatible embedding model.
   - Add tests specific to Google Cloud Firestore features and limitations.
   - Use `@EnabledIfEnvironmentVariable` to conditionally run the tests only when the necessary environment variables (credentials) are set. See `OllamaChatModelIT` for an example.

5. **Add to BOM (Bill of Materials):** Add your new module to `langchain4j-bom/pom.xml` to manage its version consistently.

6. **Documentation:**

   - Add the new model in all the `index.md` files.
   - Write a guide in `docs/docs/integrations/embedding-stores`.
   - Create a README file.

**Code Example (Conceptual)**

```java
// In your GoogleFirestoreEmbeddingStore.java
package dev.langchain4j.store.embedding.google.firestore;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import lombok.Builder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
// ... other imports ...

public class GoogleFirestoreEmbeddingStore implements EmbeddingStore<TextSegment> {

    private static final Logger log = LoggerFactory.getLogger(GoogleFirestoreEmbeddingStore.class);

    // ... (fields for Firestore client, collection name, etc.) ...

    @Builder
    public GoogleFirestoreEmbeddingStore(...) {
        // ... (initialize Firestore client, etc.) ...
    }

    @Override
    public String add(Embedding embedding) {
        // ... implementation ...
    }

    @Override
    public void add(String id, Embedding embedding) {
        // ... implementation ...
    }

    @Override
    public String add(Embedding embedding, TextSegment textSegment) {
        // ... implementation ...
    }

    @Override
    public List<String> addAll(List<Embedding> embeddings) {
        // ... implementation ...
    }

    @Override
    public void addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> embedded) {
        // ... implementation ...
    }

    @Override
    public List<EmbeddingMatch<TextSegment>> findRelevant(Embedding referenceEmbedding, int maxResults, double minScore) {
        // ... implementation using Firestore vector search ...
        // ... map Firestore results to EmbeddingMatch objects ...
    }

    // ... other methods ...
}

// In a separate file: GoogleFirestoreEmbeddingStoreBuilderFactory.java
package dev.langchain4j.store.embedding.google.firestore;

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.spi.store.embedding.EmbeddingStoreFactory;

public class GoogleFirestoreEmbeddingStoreBuilderFactory implements EmbeddingStoreFactory {
    @Override
    public GoogleFirestoreEmbeddingStore.Builder get() {
        return GoogleFirestoreEmbeddingStore.builder();
    }
}
```

**Key Improvements and Considerations from the examples:**

- **Error Handling:** Wrap Firestore API calls in `try-catch` blocks and throw `RuntimeException` (or a more specific custom exception) to handle potential errors.
- **Logging:** Use SLF4J (`log.debug()`, `log.info()`, `log.warn()`, `log.error()`) for logging. This is crucial for debugging and monitoring.
- **Configuration:** The `Builder` should allow configuration of all necessary parameters (endpoint, credentials, collection name, timeouts, etc.). Use `ensureNotNull` and `ensureNotBlank` to validate.
- **Dependency Management:** The `pom.xml` should correctly specify the Google Cloud Firestore client library as a dependency.
- **Testcontainers:** Consider using Testcontainers for integration tests to spin up a Firestore emulator. This will make your tests more reliable and portable. (See examples in the provided code, such as for `CassandraEmbeddingStoreDockerIT`).
- **Metadata:** Implement proper handling of metadata, including mapping it to/from Firestore document fields. Decide on your strategy (separate fields vs. JSON).
- **Filtering:** Implement filtering based on metadata using Firestore's query capabilities. You will likely need a helper class like `ChromaMetadataFilterMapper` to translate `Filter` objects into Firestore queries.
- **Service Provider Interface (SPI):** The `spi` package and `META-INF/services` file are _critical_ for LangChain4j to discover your implementation. This is how `AiServices` (and other parts of the framework) dynamically load components. Without this, your module won't be usable.
- **Return type**: The `findRelevant` method should take a {@link dev.langchain4j.store.embedding.EmbeddingSearchRequest} and return a {@link dev.langchain4j.store.embedding.EmbeddingSearchResult}. This would ensure that we have consistency through all embedding stores and allow advanced filtering and reranking features.
- **Interface segregation**: Introduce an `EmbeddingStoreWithFiltering` interface, that extends `EmbeddingStore` and adds a `removeAll(Filter filter)` method.
- **Metadata handling:** Implement the `MetadataHandler` interface, or extend one of its base classes if it's a good fit.
- **Support Vector Search API in different database modes:** Support vector search in the different modes of Cosmos DB, implementing the search in each of them.
- **Implement the `close()` method:** You should close the connection to Firestore in the `close()` method.
- **Test your embedding store against the `EmbeddingStoreWithFilteringIT` and `EmbeddingStoreWithRemovalIT`:** The provided `langchain4j-core` contains those abstract test classes that you can use to test your embedding store.
- **Add documentation:** Create Markdown files in the `docs/docs/integrations/embedding-stores` directory, following the structure of existing integrations.
- **Add an example:** to the [examples repository](https://github.com/langchain4j/langchain4j-examples).

By following these steps and adapting the provided example code, you can create a robust and well-integrated LangChain4j module for Google Cloud Firestore.

{{< /details >}}

## Second verdict

**Again, it's pretty spot on and accurate!** There's still a mention of Lombok, and a strange reference to Cosmos DB, but I generally like how it advises me to look at existing modules, telling me that this particular one may be a good inspiration to get my implementation right.

I've never implemented an embedding store before, and I think newer embedding store implementations should implement a `search()` method instead of `findRelevant()`. So I asked how to implement this method, using the documentation of the Firestore Java SDK, and a minute later, it suggested a concrete implementation. I won't copy the output again, as I don't want to make this article too long (it already is).
I haven't tried the implementation Gemini suggested, but the code and explanations seemed pretty convincing, so when I get some time, I'll try to see if it's correct.

## Conclusion

**Only Gemini offers a 2-million-token context window**. And I'm glad it does, because the LangChain4j project is close to that limit! With models with smaller context windows, I would have had to be way more selective, and send in the prompt just the right types of artifacts (i.e. just the LLM modules, or just the embedding store implementations). Thanks to the huge window, I was able to feed the whole repository in its entirety!

**Does it mean it's the end of writing proper tutorials or how-to guides? Certainly not.** But I find it very interesting that I'm able to have this kind of highly detailed conversation with the LLM, without having to understand all the tiny little details of the underlying project, as the model is able to grok it for me, and distill just the right level of information for me to do the task I asked about. What's very interesting is that **I can continue the conversation** to go in various directions, or **zoom in on some specific aspects**, which may not necessarily be covered by existing tutorials or guides. **It's as if I were pair programming with the founder of the project.**

# A Generative AI Agent with a real declarative workflow

In my [previous article]({{< ref "/posts/2025/01/27/an-ai-agent-to-generate-short-scifi-stories" >}}), I detailed how to build an AI-powered short story generation **agent** using Java, [LangChain4j](https://docs.langchain4j.dev/), Gemini, and Imagen 3, deployed on Cloud Run jobs. This approach involved writing **explicit** Java code to orchestrate the entire workflow, defining each step programmatically. This follow-up article explores an alternative, **declarative** approach using [Google Cloud Workflows](https://cloud.google.com/workflows). I've [written extensively on Workflows]({{< ref "/tags/workflows" >}}) in the past, and for AI agents that exhibit a very explicit plan and orchestration, I believe Workflows is also a great approach to building them declaratively.

## From imperative to declarative: defining the workflow

The Java-based agent employed an imperative style, where the code explicitly defined the sequence of operations. Each step, from story conception to image selection, was a method call within the `ExplicitStoryGeneratorAgent` class (you can check the [code of this class](https://github.com/glaforge/short-genai-stories/blob/main/fictionStoryAgent/src/main/java/storygen/ExplicitStoryGeneratorAgent.java)). This provided fine-grained control and allowed for parallelization.

[Cloud Workflows](https://cloud.google.com/workflows) offers a declarative approach. Instead of writing code, you define the workflow in a YAML file. This file specifies the steps, their inputs and outputs, and the order in which they should be executed. You can also easily create loops (sequential or parallel), and you can implement _human in the loop_ callbacks if needed. The workflow engine then interprets this definition and executes the steps accordingly.

> YAML can be a fair bit more cumbersome to write, compared to using a programming language, but non-developers could get a workflow definition rolling, without having to fire up an IDE to code. But for a Java developer, it's certainly simpler to write code, with the help of their favorite programming environment.

In this article, I've created a simplified variant: I removed the _LLM-as-judge_ step that picked the best images.
And I've created pictures for the whole story, not for each chapter. So it's not exactly the same agent workflow as in the preivous article. But I don't want you to drown in too much YAML! The workflow: - creates the story with **Gemini 2**, - creates a prompt (for the whole story, not for each chapter), - generates images with **Imagen 3**, - saves the result in [Cloud Firestore](https://cloud.google.com/firestore) Let's have a look at the full YAML definition, and read the comments explaining what each step does: ```yaml main: params: [input] steps: # Let's define the Gemini and Image models we want to use: - setup: assign: - GEMINI_MODEL: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/locations/us-central1/publishers/google/models/gemini-2.0-flash-exp"} - IMAGEN_MODEL: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/locations/us-central1/publishers/google/models/imagen-3.0-generate-002"} # We call Gemini to generate the story - generate_story: call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent args: model: ${GEMINI_MODEL} region: 'us-central1' body: contents: role: user parts: # Let's write a sci-fi story! - text: "Write a short science-fiction story" generationConfig: temperature: 2.0 responseMimeType: application/json # Use a JSON schema to define the format of the output responseSchema: type: OBJECT properties: title: type: STRING description: The title of the short story content: type: STRING description: The body of the story required: ['title', 'content'] # You can define system instructions systemInstruction: parts: - text: > You are a creative fiction author, and your role is to write stories. You write a story as requested by the user. A story always has a title, and is made of 5 long chapters. Each chapter has a title, is split into paragraphs, and is at least 20 sentences long. result: short_story # Assign the story, title, content into some variables - get_story: assign: - story_output: ${json.decode(short_story.candidates[0].content.parts[0].text)} - title: ${story_output.title} - content: ${story_output.content} # Let's call Gemini again, but for creating a prompt for Imagen - generate_image_prompt: call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent args: model: ${GEMINI_MODEL} region: 'us-central1' body: contents: role: user parts: - text: ${content} systemInstruction: parts: - text: | You are an expert artist who masters crafting great prompts for image generation models, to illustrate short stories. When given a short story, reply with a concise prompt that could be used to create an illustration with the Imagen 3 model. Don't use any flags like those used with MidJourney. Just answer with the short concise text prompt. Your answer MUST start with "A cartoon of ", as we want to use cartoon or comics illustrations. 
The user gives you the following image prompt for the chapter to illustrate: result: image_prompt # Retrieve the prompt from Gemini's output - assign_prompt: assign: - prompt: ${image_prompt.candidates[0].content.parts[0].text} # Time to generate the images - image_generation: call: googleapis.aiplatform.v1.projects.locations.endpoints.predict args: endpoint: ${IMAGEN_MODEL} region: us-central1 body: instances: - prompt: ${prompt} parameters: # Store images in Google Cloud Storage storageUri: 'gs://short-scifi-stories-generated-images' result: images # Utility step to create the picture data for Firestore - prepare_images_uri_list: steps: - create_empty_list: assign: - uris: [] - uris_for_firestore: [] - loop_over_images: for: value: img_object in: ${images.predictions} steps: - append_uri: assign: - uris: ${list.concat(uris, img_object.gcsUri)} - stringUriMap: stringValue: ${img_object.gcsUri} - uris_for_firestore: ${list.concat(uris_for_firestore, stringUriMap)} # Let's prepare the final output to return # as the result of the workflow execution - prepare_result: assign: - final_result: title: ${title} content: ${content} prompt: ${prompt} images: ${uris} createdAt: ${sys.now()} # Finally, let's save the story in Firestore - save_to_firestore: call: googleapis.firestore.v1.projects.databases.documents.createDocument args: collectionId: short-story parent: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/databases/(default)/documents"} query: documentId: ${uuid.generate()} body: fields: title: stringValue: ${final_result.title} content: stringValue: ${final_result.content} prompt: stringValue: ${final_result.prompt} images: arrayValue: values: ${uris_for_firestore} createdAt: timestampValue: ${time.format(final_result.createdAt,"GMT")} # Return the data - return_output: return: ${final_result} ``` This YAML file defines the entire story generation process. It calls the Gemini and Imagen APIs, extracts the necessary information from the responses, and saves the final result to Firestore. No Java code is required to manage the flow of execution. ## Key differences and trade-offs Let's zoom in on the pros and cons of both approaches. ### Imperative / programming approach: - **Pros**: - Fine-grained control over the workflow. - Explicit parallelization for performance optimization. - Familiar programming and debugging tools. - Cloud Run jobs is fully managed and scaled by Google Cloud. - Job execution can be scheduled by Cloud Scheduler. - **Cons**: - You need to be familiar with a programming language & environment. - It can potentially be challenging to maintain as the workflow evolves. - The approach used required being familiar with running & scheduling containers as jobs. ### Declarative / workflow based approach: - **Pros**: - Pretty easy-to-understand workflow definitions. - Workflows offers a visualisation of the steps (also during execution). - Parallelization can be defined explicitly (with the `parallel` keyword on iterations or step branches). - Simplified maintenance and updates. Just need to update the YAML in the console. - Workflows is scalable and reliable out of the box without extra effort. - Workflow execution can be scheduled by Cloud Scheduler. - **Cons**: - YAML authoring can be painful, if you're not familiar with the APIs you call. - Parallelization is declarative but might be limited depending on the workflow definition and Google Cloud Workflows capabilities. You would have more control with a programming language. 
  - There's no emulator to run workflows locally, so you might have to create copies and work on those, so as not to affect the production workflow.
  - Debugging relies on workflow execution logs, which might be less intuitive than traditional debugging.

## Choosing the right approach

It depends! :sweat_smile: Of course, the choice between these approaches depends on the specific project requirements. If fine-grained control and explicit parallelization are critical, the imperative programming approach might be preferable. However, for simpler workflows where ease of development and maintainability are critical, Cloud Workflows offers an interesting alternative. You can easily make a tweak to the workflow directly from the Google Cloud console if needed.

In the case of this story generation agent, the declarative approach sounds like a good fit, but the YAML authoring can be a bit painful at times, as you have to look up the various payload schemas for the APIs to invoke, to be able to make the service calls. On the flip side, that tight API integration is also a strength: pretty much all the products and services offered on Google Cloud Platform can easily be called via REST endpoints, and Workflows excels at that.

## Conclusion

Explicit declarative planning helps AI agents stay focused, and ensures a high level of predictability. My experience with agents that plan their own actions has been mixed, as sometimes the LLM hallucinates function calls, or calls functions with bogus parameters. In the [previous article]({{< ref "/posts/2025/01/27/an-ai-agent-to-generate-short-scifi-stories" >}}), I used an imperative programming approach, but in this article, I developed a simplified equivalent with a declarative workflow definition.

[Google Cloud Workflows](https://cloud.google.com/workflows) offers a powerful and convenient way to build and manage declarative AI agents — and obviously any other kind of process that needs to call APIs. By defining the workflow declaratively, you can focus on the logic of your agent rather than the details of execution. While it might not be suitable for every use case, it's definitely a valuable tool to consider when building AI-powered applications on Google Cloud!

# An AI agent to generate short sci-fi stories

This project demonstrates how to build a fully automated short story generator using Java, [LangChain4j](https://docs.langchain4j.dev/), Google Cloud's **Gemini** and **Imagen 3** models, and a serverless deployment on [Cloud Run](http://cloud.run/). Every night at midnight UTC, a new story is created, complete with AI-generated illustrations, and published via Firebase Hosting. So if you want to read a new story every day, head over to:

## → [short-ai-story.web.app](https://short-ai-story.web.app/) ←

The code of this agent is available on GitHub. So don't hesitate to **check out the code**:

## → [github.com/glaforge/short-genai-stories](https://github.com/glaforge/short-genai-stories) ←

Let's have a closer look at the architecture and workflow of this automated storytelling machine.

## The agent: the storyteller's brain

At the heart of the system lies the `ExplicitStoryGeneratorAgent`, a Java class orchestrating the entire story generation process. This agent follows a clear, multi-step workflow:

* **Story conception (Gemini)**: The agent first calls the Gemini large language model (LLM) to generate the core story elements: a title and five chapters, each with a title and content.
* **Image prompt engineering (Gemini)**: For each chapter's content, the agent again leverages Gemini to craft tailored image generation prompts. This ensures that the image prompts are relevant to the specific content of each chapter.
* **Illustration generation (Imagen 3)**: Using the generated prompts, the agent calls Imagen 3 to produce a set of image candidates (four by default) for each chapter.
* **Image selection (Gemini, self-reflection)**: In a unique _"self-reflection"_ step, the agent presents the generated images back to Gemini, asking the LLM to select the image that best visually represents the chapter's narrative. This crucial step ensures that the illustrations truly complement the story.
* **Persistence (Firestore)**: Once the story, chapter titles, content, and selected images are finalized, the agent stores them in a Firestore database (a NoSQL document database). This makes retrieving the complete data relatively straightforward from the web frontend, thanks to the [Firebase](https://firebase.google.com/) framework.

For the more visual people among us, this diagram illustrates the steps above:

![Agent workflow diagram](/img/short-ai-stories/agent-workflow.png)

**Note:** The critique step where Gemini is asked to judge the best image isn't really necessary, to be honest. Imagen generates images that adhere very closely to the given prompt. So any of them would be suitable to illustrate each chapter. But it was interesting to implement a _self-reflection_ step in this workflow.

## Digression: Explicit vs. Autonomous Agent Workflows

This project utilizes an _explicit workflow agent_, where the story generation process is meticulously defined and controlled by the Java code. This approach contrasts with fully _autonomous agents_, which rely on the LLM to plan and execute the workflow dynamically. Let's explore the key differences and trade-offs between these two approaches:

### Explicit workflow agent (code-driven planning):

* **Predictable execution**: The Java code dictates the exact sequence of steps, ensuring a highly predictable and reliable workflow. Each stage, from story conception to image selection, is explicitly programmed, leaving no room for unexpected deviations.
* **Improved performance through parallelization**: With explicit control, tasks that can be executed concurrently (such as generating images for different chapters or judging the best image for each chapter) can be easily parallelized. This significantly reduces the overall execution time.
* **Easier debugging and maintenance**: The clear, structured code makes debugging and maintenance straightforward. The flow of execution is transparent to the developer, and any errors can be readily identified and addressed.
* **Limited flexibility**: The explicit nature of the workflow could be seen as offering less flexibility. Indeed, the code needs to be updated to handle changes to the workflow. However, it's not necessarily worse than endlessly tweaking prompts to coerce an LLM into correctly planning the needed workflow changes.

### Autonomous agent (LLM-driven planning):

* **Dynamic workflow**: Autonomous agents use the LLM's capabilities to plan and execute the workflow. This allows for greater flexibility and adaptability to different story generation requirements. The LLM can theoretically decide which steps to take, in which order, and how many times.
* **Potential for hallucinations and errors**: Relying on the LLM for planning introduces the risk of hallucinations and incorrect function calls.
The LLM might generate nonsensical steps, omit crucial actions, provide incorrect parameters to functions, or execute functions in an illogical order. This can lead to unpredictable results and make it harder to catch potential errors. Even with perfect prompts, LLMs might make mistakes in function calling. This is actually the problem I encountered when I first tried this approach.
* **Debugging challenges**: Debugging autonomous agents can be more complex. The dynamic nature of the workflow makes it harder to trace the execution path and identify the source of errors. Troubleshooting often involves analyzing the logs of the LLM and the tools it requested to call, which can be challenging to interpret at times.
* **Less control over execution**: With autonomous agents, developers cede some control over the execution flow to the LLM. While this offers flexibility, it also means less fine-grained control over performance optimization. Parallelization opportunities, for example, might not be readily apparent or easily exploitable. Currently, when receiving parallel function call requests, LangChain4j doesn't yet offer the possibility to execute them in parallel.

The autonomous approach would have looked like the following diagram:

![Autonomous agent diagram](/img/short-ai-stories/autonomous-approach.png)

With this approach, the main agent generates the story, then would first call the prompt creation tool, then the image generation tool, and would finish with the tool to pick the best image. However, in my experience, in spite of a good amount of prompt engineering tweaks, I couldn't get this to work reliably. I tried with different versions of Gemini (1.5 Flash, 1.5 Pro, and 2.0 Flash experimental, from worst to best outcome), but sometimes, for example, it would request to judge images before they had been generated, or the URLs of the images would be hallucinated instead of coming from the outcome of the judge. So I preferred moving to a more explicit approach.

I invite you to read this great article from Anthropic about [building effective agents](https://www.anthropic.com/research/building-effective-agents) which also makes the distinction between _agents_ (fully autonomous planning agents) and _workflows_ (the more explicit approach with code driving the execution planning). They also recommend sticking to _workflows_ when the logic of the agent is very clear upfront: when you can draw a workflow on a sheet of paper, that's surely because you do need a workflow.

### Choosing the right approach:

The choice between explicit and autonomous workflows depends on the specific requirements of the project. Explicit workflows are best suited for applications where predictability, reliability, and performance are paramount. Autonomous agents are more appropriate when flexibility, adaptability, and dynamic planning are prioritized, even at the cost of potential errors and increased debugging complexity. This project prioritizes the former over the latter, which explains why an explicit agent is preferred for this use case. In other words, it's better to have a predictable solution, which can then be easily parallelized to reduce latency, than a non-predictable solution that is slower.

## A closer look at the code

As you'll be exploring the [code base](https://github.com/glaforge/short-genai-stories), I'd like to highlight a few points.

The `ExplicitStoryGeneratorAgent` class uses a structured and predictable approach to orchestrating the LLM.
Its core logic resides within the `main()` method, outlining a clear, step-by-step workflow:

```java
// 1️⃣ let's prepare the story
Story story = prepareStory("a science-fiction novel");

// 2️⃣ iterate over each chapter in parallel
List<Story.Chapter> newChaptersWithImages = story.chapters.stream()
    .parallel()
    .map(chapter -> {
        // 3️⃣ prepare an image prompt for each chapter
        String imagePrompt = prepareImagePromptForChapter(chapter);
        // 4️⃣ generate up to 4 images per chapter
        List<String> imagesForChapter = generateImages(imagePrompt);
        // 5️⃣ judge the best image for this chapter
        String bestImage = pickBestImageForChapter(
            chapter.chapterContent, imagesForChapter);
        return new Story.Chapter(
            chapter.chapterTitle,
            chapter.chapterContent,
            bestImage);
    }).toList();

Story newStoryWithImages = new Story(story.title, newChaptersWithImages);

// 6️⃣ save the story to Firestore
saveToFirestore(newStoryWithImages);
```

Story generation depends on _structured output_: The agent uses Gemini to generate the story's title and five chapters, each with a title and content. Crucially, it leverages Java `record`s and `responseSchema` to ensure type safety and consistent outputs. You'll notice the use of `@Description` annotations to ensure the LLM really understands what each field corresponds to:

```java
record Story(
    @Description("The title of the story") String title,
    @Description("The chapters of the story") List<Chapter> chapters) {

    record Chapter(
        @Description("The title of the chapter") String chapterTitle,
        @Description("The content of the chapter") String chapterContent,
        @Description("The Google Cloud Storage URI of the image...") String gcsURI) {
    }
}
```

To configure the model generation to use structured outputs, here's how the schema of this output is defined:

```java
var chatModel = VertexAiGeminiChatModel.builder()
    .project(GCP_PROJECT_ID)
    .location(GCP_LOCATION)
    .modelName(CHAT_MODEL_NAME)
    .temperature(1.5f)
    .responseSchema(Schema.newBuilder()
        .setType(Type.OBJECT)
        .putProperties("title", Schema.newBuilder()
            .setDescription("The title of the story")
            .setType(Type.STRING)
            .build())
        .putProperties("chapters", Schema.newBuilder()
            .setDescription("The list of 5 chapters")
            .setType(Type.ARRAY)
            .setItems(Schema.newBuilder()
                .setDescription(
                    "A chapter with a title, and its content")
                .setType(Type.OBJECT)
                .putProperties("chapterTitle", Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The title of the chapter")
                    .build())
                .putProperties("chapterContent", Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The content of the chapter, " +
                        "made of 20 sentences")
                    .build())
                .addAllRequired(
                    List.of("chapterTitle", "chapterContent"))
                .build())
            .build())
        .addAllRequired(List.of("title", "chapters"))
        .build())
    .build();
```

It's possible to simplify the schema creation by taking advantage of a helper class. This schema could have been simplified to:

```java
// ...
    .responseSchema(SchemaHelper.fromClass(Story.class))
// ...
```

To instruct the LLM at each step, I tend to use system instructions for setting the role and goal for the LLM, but I use user messages to give the more variable part, like the chapter's content, or the image prompt. Here's an example:

```java
Response<AiMessage> response = chatModel.generate(
    SystemMessage.from("""
        You are a creative fiction author, and your role is to write stories.
        You write a story as requested by the user.
        A story always has a title, and is made of 5 long chapters.
        Each chapter has a title, is split into paragraphs, \
        and is at least 20 sentences long.
"""), UserMessage.from(storyType) ); ``` The `storyType` variable in the user message contains the type of story to generate, like `"a science-fiction story"`. It's currently set in stone, but you could parameterize this to generate fantasy novels, love stories, etc. The _self-reflection_ step, where the LLM judges which is the best illustration for a chapter is taking advantage of Gemini's multimodal capabilities. Indeed, Gemini receives the instruction of picking the best image out of a few, and it is given the text of the request (and the URLs of the pictures), as well as inline references to those images (ie. the Google Cloud Storage URI, pointing at the location of the pictures). Thus, this is a multimodal request, as both text and images are passed in the prompt: ```java List imagesForChapter = generateImages(imagePrompt); String bestImage = pickBestImageForChapter(chapter.chapterContent, imagesForChapter); // Inside pickBestImageForChapter we have: List messages = new ArrayList<>(); messages.add(SystemMessage.from("...prompt to select best image...")); messages.add(UserMessage.from("...chapter content...")); imagesForChapter.forEach(imageUrl -> { // Send each URL as text and as image to the model messages.add(UserMessage.from(imageUrl + "\n")); messages.add(UserMessage.from(ImageContent.from(imageUrl))); }); Response response = chatModel.generate(messages); // ... parse best image from response ``` ## Building the application The project employs a standard Java development workflow using Maven for dependency management and building: * **Dependencies**: The `pom.xml` file defines the project's dependencies, including LangChain4j (for LLM orchestration), the Google Cloud Firestore library (for data persistence), and Google Cloud's Gemini and Imagen libraries. * **Packaging**: The Maven build process packages the application into a JAR, and its dependencies by its side. I followed the approach explained in that [article](https://medium.com/holisticon-consultants/dont-build-fat-jars-for-docker-applications-6252a5571248): to build a JAR with its dependencies on the side, instead of a _shaded / fat_ JAR. One benefit I see is that the dependencies are one container layer, while the application itself is another, so it should make Docker building layer faster, as the dependencies don't change often, and that dependency layer would be cached. * **Containerization (Docker)**: A `Dockerfile` is used to containerize the application. The container image includes the executable JAR and dependencies, as well as the Java runtime environment. I used Azul's [Zulu distroless Java 21](https://hub.docker.com/layers/azul/zulu-openjdk-distroless/21-latest/images/sha256-986ca559b15738378f4a67a563c117159136c5eff705b91ba68f0fe8883246ef?context=explore) base image. The container is finally built thanks to Cloud Build. ## Deployment and automation To automate story generation and deployment, the project leverages several Google Cloud services: * **Cloud Build**: Cloud Build automates the process of building the Docker container image. The provided `justfile` contains commands and recipes to build and submit the container image (I covered `just` in a [previous article](https://glaforge.dev/posts/2023/06/07/just-a-handy-command-line-tool/), a nifty little tool to parameterize and run common commands for the project). 
I simply followed the tutorial in the Cloud Build documentation to [submit a build via the CLI](https://cloud.google.com/build/docs/running-builds/submit-build-via-cli-api) (the `gcloud` CLI SDK), after having done some IAM setup as explained [here](https://cloud.google.com/build/docs/build-push-docker-image) to be able to push the built image to [Artifact Registry](https://cloud.google.com/artifact-registry/docs).
* **Cloud Run jobs**: The application runs as a Cloud Run job. Contrary to Cloud Run services, where incoming HTTP requests trigger the service, here, jobs are triggered and run to completion. The Cloud Run job allows for serverless execution of the story generation agent. I followed this guide to [create jobs](https://cloud.google.com/run/docs/create-jobs#gcloud). Don't forget to set up the [required environment variables](https://cloud.google.com/run/docs/configuring/jobs/environment-variables).
* **Cloud Scheduler**: Cloud Scheduler triggers the Cloud Run job every day at midnight UTC. This automation ensures that a new story is generated and published daily. To configure this, this page explains how to set up [scheduled triggers](https://cloud.google.com/run/docs/execute/jobs-on-schedule#command-line).
* **Firebase Hosting**: Firebase Hosting serves the static assets of the website (HTML, CSS, JavaScript) that displays the stories. Firebase also provides easy access to the Firestore database where the stories are stored, at the last stage of our agentic workflow.

## Further possible improvements

I'm not in the business of selling novels, so I won't really spend much more time improving this application. However, I noticed a few areas where this project could be improved.

### More creativity

When reading the short stories, you'll notice a certain lack of creativity. Somehow, the stories often happen around the year 2340, the action takes place on Xylos, and some characters appear very frequently, like Aris Thorne. Similarly, some words or concepts appear all the time, like the words _echoes_, _obsidian_, _crimson_, etc. Maybe the model has seen such novels, with such personas, locations, and time periods, in its training data. I've seen online some people getting the [same kind of stories](https://www.semanticpen.com/tools/ai-generated-superhero-origin-story-generator/female-superhero-origin-story--e894532b-b775-4fc8-82d4-e0afcb31afad), and even a [book](https://www.amazon.com/Veins-Starlight-Forbidden-Astral-Weavers-ebook/dp/B0DPKY5C4R) with the same characters or location.

I think it'd be interesting to explore how to make the stories more diverse and varied. For example, by adding more steps in the workflow to work on character creation, on different narrative arcs, on environment definitions. For science-fiction only, there are tons of [types of sci-fi stories](https://www.rachelagreco.com/30-types-of-science-fiction-every-sci-fi-lover-should-know/). My friend, Philippe Charrière, worked on [how to generate random RPG character names with LLMs](https://k33g.hashnode.dev/how-to-generate-random-rpg-character-names-with-an-llm). He shared plenty of ideas on how to guide LLMs to get more creative with personas.

### Character definition for illustration consistency

Speaking of character creation, if you look at the illustrations, you'll see that the characters often don't have the same style or appearance. Indeed, I don't give Imagen the whole context of the story when I let Gemini create the image prompts.
A possible area of improvement could be to work on proper character definitions (face characteristics, attire, etc.), and ensure that the information is passed through to Imagen. The same would apply to the setting, like the planet, the spaceship details, and more.

### Chapter legibility

Each story is split into 5 chapters, of about 20 sentences or so. I tried to make Gemini generate paragraphs, to improve legibility. However, in spite of a bit of time spent on tweaking the prompts, I failed to coerce it to create paragraphs to delineate the key sections of the chapters. When prompting can't solve this, an extra LLM call could take the chapter's content and make it more readable.

## Conclusion

The key takeaway of this experimentation is that **when you can describe your AI agent's plan of action with an explicit and predictable workflow, you should definitely follow that route, and avoid giving the LLM the freedom to handle the planning alone**. LLM autonomous planning works much better in more unpredictable cases, where steps can't be foreseen. Be sure to use the right approach!

Again, Gemini and Imagen were up to the task for this new project and gave great stories and illustrations, even if the creativity could be improved. And I'll keep using [LangChain4j](https://docs.langchain4j.dev/) as my Swiss-army knife for all my Generative AI projects, as it works reliably, and offers rich capabilities.

Knowing that I would build a _workflow_, I also explored the use of my beloved [Google Cloud Workflows](https://cloud.google.com/workflows) which I've [written a lot about](https://glaforge.dev/tags/workflows/). I'll likely write another (shorter) article where I'll show how to create such GenAI workflows with it, so stay tuned.

This project was also a good opportunity for me to use Cloud Run jobs. I love Cloud Run for all my serverless, auto-scaled, fully-managed, HTTP services, but I hadn't used a Cloud Run _job_ so far. For such _batch_ kinds of tasks, this is the right tool for the _job_ (pun intended)! There's also [Google Cloud Batch](https://cloud.google.com/batch), but it's more suited to heavier computation workloads.

So what's next? Check out the website to [read a short story every day](https://short-ai-story.web.app/), and explore the [code base](https://github.com/glaforge/short-genai-stories) to better understand how stories are baked. If you want to replicate this application, and haven't yet tried Google Cloud, feel free to use the [$300 of credits for new users](https://cloud.google.com/free).

# Analyzing trends and topics from Bluesky's Firehose with generative AI

First article of the year, so let me start by wishing you all, my dear readers, a very happy new year! And what is the subject of this new piece of content? For a while, I've been interested in analyzing trends and topics in social media streams. I recently joined Bluesky (you can follow me at [@glaforge.dev](https://bsky.app/profile/glaforge.dev)), and contrary to X, it's possible to access its Firehose (the stream of all the messages sent by its users) pretty easily, and even for free. So let's see what we can learn from the firehose! Without further ado, here's the end goal!

![](/img/bluesky/bluesky-visualization.png)

## Bluesky's Firehose — a stream of social messages

The underlying protocol used by Bluesky is the [AT Protocol](https://atproto.com/). There's an API to access Bluesky's streams via this protocol, but it's a bit cumbersome to use.
In order to reduce the quantity of data sent via the AT protocol over its "relay" network, the Bluesky team introduced [JetStream](https://docs.bsky.app/blog/jetstream), which relays all the messages via WebSockets, in JSON format, for a fraction of the size of the AT protocol payloads. You can also read about how they [shrank the payloads by 99%](https://jazco.dev/2024/09/24/jetstream/)!

The [JetStream GitHub repository](https://github.com/bluesky-social/jetstream) shares the endpoints you can use to access the firehose, and gives some details about the various types of payloads (new messages, likes, shares, etc.). It also mentions a nice little tool called [websocat](https://github.com/vi/websocat), a command line tool to connect to WebSockets — very handy for analyzing the payloads. To better understand the JSON message formats, I used websocat, as well as Simon Willison's client-side [online tool](https://simonwillison.net/2024/Nov/20/bluesky-websocket-firehose/) to access the JetStream, and see the flow of messages.

## A bird's eye view of the project

Before diving into the code, and showing how to fetch the Bluesky posts, I'd like to give you a high level overview of what we're going to implement.

![](/img/bluesky/bluesky-jetstream-trends.png)

> I used [Napkin.ai](https://www.napkin.ai/) to generate this diagram!
> Go check out this tool. You can paste your article, and for a given paragraph, it can suggest nice diagrams to represent it.

The key steps towards a topic visualization of the stream are:

1. **Data acquisition** — The first step is to subscribe to the firehose via WebSockets to gather enough data points to make it interesting to extract trends from them.
2. **Embedding messages** — In order to compare users' posts, the text of the posts should be transformed into vector embeddings via an embedding model, which represents posts in a multidimensional space in which distances can be calculated (text whose vector is closer to another one is usually semantically similar).
3. **Clustering messages** — Now that we have all the vector embeddings, a clustering algorithm is used to create groups of messages that are close to each other in vector space, and form a cluster of data points on the same topic.
4. **Generating a summary of clusters' messages** — The clustering algorithm grouped messages into different clusters. However, at that point, we don't know what all those grouped messages are about. That's where a generative AI model is called to make sense of those messages, to get a short description of them.
5. **Preparing the data for visualization** — Armed with the clusters of posts and their descriptions, the data for the visualization is prepared.
6. **Visualizing the trends** — The last step is to visualize those clusters of messages with a nice visualization. For that purpose, I decided to present the groups of messages as bubbles (the more posts in a bubble, the bigger the bubble).

## Let's get coding!

In the article, I'll show only key snippets, sometimes simplifying the code a little bit, but you'll be able to check out all the code in this [GitHub repository](https://github.com/glaforge/bluesky-topic-analysis). As usual, the code will be in Java, and I'm going to use my favorite Generative AI framework: [LangChain4j](https://docs.langchain4j.dev/).
For the large language model, my choice went for [Gemini](https://deepmind.google/technologies/gemini/), and for the embedding model, I'm calculating vectors thanks to Google Cloud Vertex AI [embedding models](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings). Clusters of messages will be created with the [Apache Commons Math](https://commons.apache.org/proper/commons-math/userguide/ml.html) library. The visualization will be implemented in JavaScript with the [D3.js](https://d3js.org/) library.

### Acquiring Bluesky messages via WebSocket

Let's kick off the project by establishing a real-time connection to the Bluesky firehose using WebSockets, thanks to JDK 11's HTTP client. This allows us to receive a constant stream of public posts as they happen. The `liveMessages()` method manages the WebSocket connection and filters incoming messages based on language:

```java
httpClient.newWebSocketBuilder().buildAsync(
    URI.create(JETSTREAM_WS_ENDPOINT),
    new WebSocket.Listener() {
        @Override
        public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
            // ... process incoming message ...
        }
    });
```

The `onText()` method within the `WebSocket.Listener` is our gateway to the firehose. Each incoming message, received as a JSON string, needs to be parsed into a usable Java object. Here's where Google's `Gson` library and Java `record`s come into play. We've defined a set of nested Java records that mirror the Bluesky message structure:

```java
record Message(Commit commit, String did) {
    record Commit(Record record, String cid) {
        record Record(String text, List<String> langs, Date createdAt) {}
    }
}
```

These records give us a strongly typed way to access message data. The `Message` record holds the actual post content (`text`), a list of languages (`langs`), and the creation timestamp (`createdAt`), nested within `Commit` and `Record` records. We use Gson to deserialize the JSON strings into these records:

```java
Message message = GSON.fromJson(String.valueOf(text), Message.class);
```

### Calculating vector embeddings for all the messages

To analyze the semantic similarity between posts, we convert each post's text into a numerical vector representation, or embedding. This is achieved using a Vertex AI embedding model, via LangChain4j's [Vertex AI module](https://docs.langchain4j.dev/integrations/embedding-models/google-vertex-ai/):

```java
EmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .location(System.getenv("GCP_LOCATION"))
    .endpoint(System.getenv("GCP_VERTEXAI_ENDPOINT"))
    .modelName("text-embedding-005")
    .publisher("google")
    .build();
```

We're using `text-embedding-005`, which is a good embedding model that understands multiple languages (important for analyzing posts coming from a hundred or so different spoken languages). As embedding all messages takes a while, we're batching the calculation in parallel:

```java
List<TextSegment> allSegments = allMessages.stream()
    .map(message -> TextSegment.from(message.commit().record().text()))
    .toList();

List<Embedding> allEmbeddings = IntStream.range(0, numberOfParallelBatches)
    .parallel()
    .mapToObj(i -> embeddingModel.embedAll(allSegments...)
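        // sketch: each parallel batch embeds its own sub-list of allSegments;
        // the "..." above elides the slicing of the list and the unwrapping of each batch result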
    .flatMap(List::stream)
    .toList();
```

### Creating clusters of posts

With embeddings in hand, we can now group similar posts together using the [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) clustering algorithm (Density-based spatial clustering of applications with noise):

```java
var clusters = new DBSCANClusterer(
        MAXIMUM_NEIGHBORHOOD_RADIUS,
        MINIMUM_POINTS_PER_CLUSTER)
    .cluster(clusterableEmbeddedMessages);
```

For 10k posts, using a minimum of 10 points per cluster sounds about right. As a rule of thumb, I got good visualizations with one cluster point per 1k messages (i.e. 10 points per cluster for 10k messages, 20 points per cluster for 20k messages). The maximum neighborhood radius at 0.5 also looked like a good value. I tried smaller and bigger values, but either the clusters are too specific and narrow with low values, or too broad and generic with higher values.

It's important to check for yourself the _hyperparameters_ of the algorithms you choose for your use case. Some values might be better than others, and they are very much use-case dependent. There are no magic numbers; you have to experiment to find the right mix for you! Using a different embedding model (like `text-multilingual-embedding-002`) and reducing the dimensionality to 128 dimensions, I had to use a max neighborhood radius of 0.2 instead, to get a good number of clusters.

### Generating a description for clusters of messages

At this point, we have topic clusters. But they're just bags of numbers without a real meaning for us, human beings. What we need is a way to make sense of those clusters, to know what topic they cover. We configure the Vertex AI Gemini model, thanks to LangChain4j's [Gemini module](https://docs.langchain4j.dev/integrations/language-models/google-vertex-ai-gemini), with a max number of tokens, to avoid situations where a topic description is too long:

```java
ChatLanguageModel chatModel = VertexAiGeminiChatModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .location(System.getenv("GCP_LOCATION"))
//    .modelName("gemini-2.0-flash-exp")
    .modelName("gemini-1.5-flash-002")
    .maxOutputTokens(25)
    .build();
```

You can use either the latest Gemini 1.5 Flash or the new 2.0 Flash experimental model. If you're hitting quota limits, as 2.0 is currently only in preview, 1.5 will give great results too.

To make the clusters more understandable, we call Gemini to generate a concise summary for each cluster, passing all the messages contained in that cluster:

```java
Response<AiMessage> modelResponse = chatModel.generate(
    SystemMessage.from("""
        Summarize the following list of social media messages
        in one simple description.
        Don't give a full sentence saying the social messages are about a topic,
        just give the topic directly in 10 words or less,
        without mentioning the messages are social media posts or reactions.
        """),
    UserMessage.from(appendedMessages)
);
```

When I was running this code on January 1st, I was seeing topics like `New Year's greetings and well wishes` or `Happy New Year 2025 wishes and hopeful sentiments for the year`. But some categories of topics often come back, like a big cluster of emojis expressing various emotions, or people sharing video links on YouTube, or pictures from Instagram. I also saw some interesting trends as they came up, like weather alerts for snow storms, or someone famous receiving congratulations for announcing some anniversary. There are also repeated posts tagging people to request funding for some cause.
Funnily, in the morning, I was often seeing people sharing in how many steps they solved the [Wordle](https://www.nytimes.com/games/wordle/index.html) word puzzle!

I filtered the messages to analyze only English messages for the purpose of this demo, but there are a bunch of users who set their language to English but post in another language. However, it's not really a problem for Gemini, which happily handles more than a hundred spoken languages.

### Preparing the data for visualization

The cluster summaries and their sizes (number of posts) are then formatted as JSON data, for ingestion by D3.js:

```javascript
const data = {
    name: "Bluesky topic clusters",
    children: [
        {name: "Summary of Cluster 1", value: 396},
        // ... other clusters
    ]
};
```

This JSON structure is ideal for consumption by D3.js, which we'll use for visualization. The `FirehoseConsumer` class writes this JSON data to the `newdata.js` file, which is integrated in the static web assets and loaded by D3.

### Visualizing the data with D3.js

Finally, the `visualisation.js` script uses D3.js to create an interactive bubble chart. Each bubble represents a cluster, with its surface area corresponding to the number of posts in that cluster. The color of the circles is also dynamically generated:

```javascript
const colorScale = d3.scaleQuantize()
    .domain([0, maxValue])
    .range(colorPalette);

// .. later, in the circle
.attr("fill", d => d.children == null ? colorScale(d.r) : "#fefef0")
```

What's more interesting in this part of the project is how the visualization is created. I was inspired by the circle packing visualization seen in this [article](https://observablehq.com/@d3/pack/2), which uses D3.js's [circle packing layout method](https://d3js.org/d3-hierarchy/pack). I borrowed heavily from this example, and tweaked it for my needs, and to my liking.

```javascript
const pack = d3.pack()
    .size([width - margin * 2, height - margin * 2])
    .padding(4);

const root = pack(d3.hierarchy(data)
    .sum(d => d.value)
    .sort((a, b) => b.value - a.value));
```

The tricky part, as well, was how to render and lay out the text of the topics, along with the number of posts per cluster, inside each circle. I got it working by appending a custom div, as a _foreign object_ in the SVG document, and by tweaking the positioning:

```javascript
node.filter(d => !d.children)
    .append("foreignObject")
    .attr("x", d => -0.8 * d.r)  // center horizontally
    .attr("y", d => -1.1*d.r)    // center vertically, manually adjusted
    .attr("width", d => 1.6 * d.r)
    .attr("height", d => 2 * d.r)
    .append("xhtml:div")
    .classed("foreignDiv", true)
    .style("font-size", d => d.r / 5.3 + "px")  // dynamic font sizing
    .html(d => "" + format(d.value) + "" + d.data.name + "
" ); ``` Lots of hard-coded values to make it look nice! To put everything together: an HTML file imports D3.js, our `newdata.js` file containing the cluster definitions, the `visualization.js` file creates the bubble chart, plus some CSS in `styles.css`. And when running the Java class, the `newdata.js` is generated and updated in the `static` web asset folder. ## Experiments, and what else to explore ### No live demo available Interesting topic visualizations happen when you have collected enough messages to analyze. Gathering about 10 thousand posts seemed to offer good results, but in spite of the 10+ million users on Bluesky, you still need about 4 or 5 minutes to store that many messages. Without mentioning the time it takes to calculate the embeddings (about 30 seconds in parallel), and the clustering algorithm (about 1 minute and a half with a runtime complexity of `n*log(n)`). So this is not ideal for a _real-time_ analysis of the current trending topics. That's why I haven't posted a demo application online, as it's too slow to wait for the result to appear on screen. What might be interesting to explore is somehow a live updating view that would be re-calculated every couple of minutes or so, over a sliding window of messages, but the clustering duration is still a problem. However, it's also something that could quickly become costly, considering the number of embedding calculations and generative summaries to generate each time. ### Different embedding models Before parallelizing / batching the vector embedding calculations (which still take half a minute), I also tried a non-cloud hosted embedding model, like a quantized version of the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) embedding model, which can run locally without a big GPU. I used it in some other projects with success, but for this clustering exercise, I found the result of poor quality, as if it wasn't knowledgeable enough to discern different topics. I paid attention to restricting the messages to only English messages, as I knew that that small model was more at ease with English, but that didn't really help. Ideally, I'd like to find a fast embedding model with good classification capabilities. But read on, for another idea on speeding up the clustering part of the equation. ### Different clustering algorithms DBSCAN isn't super fast, with a `n*log(n)` runtime complexity. Apache Commons Math also offers a [KMeans++](https://en.wikipedia.org/wiki/K-means%2B%2B) implementation that is faster (with a more linear runtime) but the `k` hyperparameter to specify is always giving a fixed number of clusters. One one hand, it's nice to have a more predictable visualization (neither too few, nor too many bubbles with small text to display), on the other hand, the fact the number of clusters si set in stone, leads the clusters to be too generic and too broad, and there's always one cluster that contains everything that couldn't be clustered in meaningful groups. In spite of its runtime complexity, I like DBSCAN for the fact it creates quite diverse but acurate clusters, as it figures itself how many clusters to create, depending on the various topics it'll come across. There's another library that I'd like to try some day, that's [Smile](https://haifengl.github.io/clustering.html). It supports even more clustering algorithms than Apache Commons Math. 
Another interesting aspect of Smile is its [dimensionality reduction algorithms](https://haifengl.github.io/manifold.html#t-sne) (which they call _manifold learning_), like [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) and [UMAP](https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Uniform_manifold_approximation_and_projection). Why am I mentioning dimensionality reduction? For one, it's super handy for visualizing the clusters in 2D or 3D. But another idea I wanted to try was that, if the reduction is fast enough, applying the clustering algorithm on lower-dimensional data might be much faster. The _projection_ (reducing the dimensionality) before clustering approach is also the one this [project from HuggingFace](https://github.com/huggingface/text-clustering) followed to cluster the [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset.

Indeed, Vertex AI embeddings generate vectors of 768 dimensions. That said, some of the Vertex AI embeddings are [_Matryoshka_ embeddings](https://huggingface.co/blog/matryoshka), so we could also calculate clusters on truncated vectors, without losing too much accuracy, without even doing dimensionality reduction! Both `text-embedding-005` and `text-multilingual-embedding-002` support reducing the vector dimension, so it's worth trying. You just need to set `outputDimensionality(128)` on the embedding model builder to reduce the dimensions down to 128. Then the clustering time can go down to 15 seconds, instead of the 80 seconds needed with full 768-dimension vectors.

### What else to try?

* In this experiment, I analyzed text, but users post hashtags, pictures, links, on their profiles. It might be interesting to look at what is trending in terms of hashtags, or analyze the sentiment of messages related to such a hashtag.
* Looking at links, maybe it'd be interesting to also see what is shared, which news articles are most popular...
* Regarding pictures, we could perhaps see which animals are the trendiest? And do some fun analysis of favorite animals in different countries...
* Another interesting analysis could be to cluster user profiles, to find users posting on the same topics.
* I'd like to think more about how to make this application more lively, and let users explore individual posts contained in each cluster.

Many more things to try out and explore!

## Summary

The generated visualization offers an intuitive and engaging way to explore the **trending topics on Bluesky**. And **generative AI** tools like **Gemini** and **Vertex AI** are here to help create such data explorations. This project combines the power of **real-time data streaming**, **AI-driven analysis**, and (not-yet-interactive) **visualization** to provide a valuable tool for understanding the ever-evolving conversations on Bluesky. It sets the stage for more sophisticated analysis, such as tracking topic evolution over time, sentiment analysis within clusters, and identification of key influencers within specific discussions.

As always, this project also confirmed that **Java** and **[LangChain4j](https://docs.langchain4j.dev/)** are my two best buddies to explore topics with generative AI approaches (no need for Python!) And I was happy to use [D3.js](https://d3js.org/) again for visualization purposes. It's not easy to master, but it's a super powerful library!
I'm also glad that [Gemini Code Assist](https://cloud.google.com/products/gemini/code-assist?e=0&hl=en) helped me work with D3.js, to develop and enhance the visualization. Finally, of course, the **Gemini chat model** and **Vertex AI embedding model** were perfect for the task, giving high quality embedding vectors, and clear synthetic summaries of social media posts. Don't hesitate to [check out the code](https://github.com/glaforge/bluesky-topic-analysis) and play with this project! # Let's think with Gemini Flash 2.0's experimental thinking mode and LangChain4j Yesterday, Google released yet another cool Gemini model update, with **Gemini 2.0 Flash [thinking mode](https://ai.google.dev/gemini-api/docs/thinking-mode)**. Integrating natively and transparently some [chain of thought](https://www.promptingguide.ai/techniques/cot) techniques, the model is able to take some more thinking time, and automatically decomposes a complex task into smaller steps, and explores various paths in its thinking process. Thanks to this approach, Gemini 2.0 Flash is able to solve more complex problems than Gemini 1.5 Pro or the recent Gemini 2.0 Flash experiment. **And of course, you can already use Gemini 2.0 Flash Thinking with [LangChain4j](https://docs.langchain4j.dev/)!** So let's _delve_ into some examples together! ## Solving a riddle I found some [riddles online](https://parade.com/947956/parade/riddles/) and thought I would try one with both the new Gemini 2.0 Flash model, and the new thinking flavor as well, in order to contrast their way of working, and the outcome. ```java var model = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY")) .modelName("gemini-2.0-flash-exp") .build(); String response = model.generate(""" The day before yesterday I was 21, and next year I will be 24. When is my birthday? """); ``` Gemini replies with: ``` This is a classic riddle! Here's how to solve it: * **"The day before yesterday I was 21"** means yesterday you were 22. * **"Next year I will be 24"** means you are currently 23. * If you were 22 yesterday and 23 today, your birthday must be **today**. Therefore, your birthday is **today**. ``` The answer is good, it's today, but it doesn't really tell exactly which date is the birthday. Now let's update the model name to use the new thinking mode: ```java var model = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY")) .modelName("gemini-2.0-flash-thinking-exp-1219") .build(); String response = model.generate(""" The day before yesterday I was 21, and next year I will be 24. When is my birthday? """); ``` Gemini advanced thinking process shows up: ``` Here's a thinking process to solve this riddle: 1. **Analyze the core statements:** * "The day before yesterday I was 21." * "Next year I will be 24." 2. **Break down the time references:** Identify the different points in time being referenced. * "The day before yesterday" refers to a past date. * "Yesterday" refers to the day after "the day before yesterday". * "Today" is the present day. * "Tomorrow" is the day after today. * "Next year" refers to the year after the current year. 3. **Assign ages to the time references:** Based on the statements, assign ages to the person at these different time points. * The day before yesterday: 21 years old. * Yesterday: Since the person's age changes by one each birthday, yesterday they were either still 21 or had just turned 22. * Today: Continuing the progression, today they are either 22 or 23. 
* Next year: 24 years old. 4. **Focus on the age jump:** The key is the jump in age from "today" to "next year." To go from being 22 or 23 today to being 24 next year *definitively* means their birthday must fall *between* now and next year. 5. **Work backward from "next year":** If they will be 24 next year, they must turn 24 sometime within that year. This means they are currently 23 *or* they are still 23 and about to turn 24 soon. 6. **Work forward from "the day before yesterday":** If they were 21 the day before yesterday, then yesterday they turned 22. This means today they are 22. 7. **Reconcile the information:** We have two possibilities for the current age: 22 (from the first statement) and potentially 23 (considering the jump to 24 next year). The only way both statements can be true is if the birthday is *very* soon. 8. **Consider the implications of the age progression:** * If today they are 22, and next year they will be 24, the birthday must be *before* the end of the current year. * Since they were 21 the day before yesterday, they must have turned 22 *yesterday*. 9. **Determine the date:** If they turned 22 yesterday, and the statement refers to "next year I will be 24," this can only happen if today is the *last day* of the year, and their birthday is *today*. 10. **Verify the solution:** Let's check if this works: * If today is December 31st: * The day before yesterday (December 29th) they were 21. * Yesterday (December 30th) they turned 22. * Today (December 31st) they are 22. * Next year (after December 31st), they will turn 23 and then 24. This works! 11. **Formalize the answer:** The birthday is on December 31st. This is a classic riddle that plays on how we perceive time. Here's how to solve it: * **"The day before yesterday I was 21"**: This means yesterday you were 22. * **"Next year I will be 24"**: This means you are currently 23. For both of these statements to be true, today must be **January 1st**. Let's break it down: * **January 1st:** You are 23. * **December 31st (yesterday):** You were 22. * **December 30th (the day before yesterday):** You were 21. * **Sometime later this year (your birthday):** You will turn 24. Therefore, your birthday is on **December 31st**. ``` Now, thanks to this advanced reasoning approach, it's able to tell me the exact date of the birthday. I've also tried the model's reasoning capabilities on coding problems (puzzles like Advent of Code ones), and it's really doing a great job at solving most of them! ## Multimodal reasoning Remember that Gemini is a **multimodal** model? That means it can accept other _modalities_ like images, in addition to text. So let's have a look at another example, with a word game from the [Reader's digest](https://www.rd.com/article/visual-brain-teasers-and-puzzles/). Only one of the rows is an anagram of a five-letter English word... ![](/img/gemini/word-puzzle.jpg) This time, it's a word game, but the information is inside an image. So we need to pass both the image, and the problem statement. ```java var model = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY")) .modelName("gemini-2.0-flash-thinking-exp-1219") .build(); Response response = model.generate(UserMessage.from( ImageContent.from( new String(Base64.getEncoder().encode(readBytes(FILE_URI))), "image/png"), TextContent.from(""" Only one of the rows in the picture is an anagram of a five-letter English word. Do you know which one it is? 
""") )); ``` If I ask that puzzle to Gemini 2.0 Flash (by changing the model name used above), it'll tell me (quickly) that: ``` Certainly! The anagram in the image is **O L C R E**, which unscrambled is **COLOR**. ``` Gemini 2.0 Flash is at the very top of the LLM [leaderboard](https://lmarena.ai/?leaderboard) but it gets the answer wrong. Notice that `COLOR` is not an anagram of `OLCRE` at all. Now let's see what the _thinking_ model says: ``` The user wants me to find the row that is an anagram of a five-letter English word. I need to go through each row and see if I can rearrange the letters to form a valid five-letter word. Row 1: L E G T N - Could be GENTL but need 5 letters. Could be GLENT, but is not a common word. Row 2: O L C R E - Could be CLORE, COLER, RECOL. RECOL is the most promising one. Row 3: T C O H E - Could be ECHOT. Row 4: L U D I B - Could be BUILD. Row 5: M E U D Q - No obvious combination. Row 6: H I N W L - Could be WHILN. Row 7: W E S O L - Could be LOWES, OWELS, SOWEL, WOLES. From the analysis, RECOL seems to be a valid word and has 5 letters. Let me double check other possibilities. Row 1: L E G T N - TENGLE? No. Row 3: T C O H E - Could be CHETO, but not a common word. Row 4: L U D I B - Could be BLUID, but not a common word. Row 6: H I N W L - Could be WHILE. Out of the possible options, WHILE and RECOL are valid five-letter words. Let me re-examine the instruction "Only one of the rows in the picture is an anagram of a five-letter English word." Row 1: L E G T N - No obvious five-letter word. Row 2: O L C R E - CLORE, COLER, RECOL Row 3: T C O H E - ECHOT Row 4: L U D I B - BUILD Row 5: M E U D Q - No obvious five-letter word. Row 6: H I N W L - WHILE Row 7: W E S O L - LOWES, OWELS, SOWEL, WOLES There seems to be more than one. Let me re-check the letter counts. All rows have 5 letters. Rechecking my anagram findings: Row 2: O L C R E - Could form RECOL Row 3: T C O H E - Could form ECHOT Row 4: L U D I B - Could form BUILD Row 6: H I N W L - Could form WHILE Row 7: W E S O L - Could form LOWES Let me think again. The instruction says *only one*. I might be making a mistake in identifying a valid word. Re-examining: Row 1: L E G T N - No common five-letter word. Row 2: O L C R E - RECOL Row 3: T C O H E - ECHOT Row 4: L U D I B - BUILD Row 5: M E U D Q - No common five-letter word. Row 6: H I N W L - WHILE Row 7: W E S O L - LOWES Perhaps some of these are not valid English words. Let me verify the words. RECOL - Yes, it's a valid word. ECHOT - No, ECHO is a word, ECHOT is not. BUILD - Yes, it's a valid word. WHILE - Yes, it's a valid word. LOWES - Yes, it's a valid word. I am still finding more than one valid word. Let me go through each row very carefully again. Row 1: L E G T N - No obvious 5-letter word Row 2: O L C R E - Could be CLORE, COLER, RECOL. Row 3: T C O H E - Could be CHOTE. No such word. Row 4: L U D I B - Could be BUILD. Row 5: M E U D Q - No obvious 5-letter word. Row 6: H I N W L - Could be WHILE. Row 7: W E S O L - Could be LOWES. Let me verify the validity of each potential word. RECOL - Valid English word. BUILD - Valid English word. WHILE - Valid English word. LOWES - Valid English word (name of a store). I am still struggling to find only one. Perhaps I am overlooking something simple. Let me reread the question carefully. "Only one of the rows in the picture is an anagram of a five-letter English word." Let me try rearranging the letters systematically for each row. Row 1: L E G T N - No five letter word. 
Row 2: O L C R E - RECOL Row 3: T C O H E - No five letter word. Row 4: L U D I B - BUILD Row 5: M E U D Q - No five letter word. Row 6: H I N W L - WHILE Row 7: W E S O L - LOWES I am still identifying multiple valid words. Could there be a constraint I am missing? The hints are just crops, so they don't add new information. Let me consider uncommon words. Row 2: O L C R E - Could be ORCEL. Row 3: T C O H E - Row 7: W E S O L - Could be OLWES. I am still finding multiple possibilities. The question states *only one*. This implies that my initial assessment of multiple possibilities is incorrect. Let me re-examine each row with more scrutiny for common five-letter words. Row 1: L E G T N - No. Row 2: O L C R E - RECOL. Row 3: T C O H E - No. Row 4: L U D I B - BUILD. Row 5: M E U D Q - No. Row 6: H I N W L - WHILE. Row 7: W E S O L - LOWES. I have identified four potential candidates. There must be something I am not considering. Is there a specific type of five-letter word? No constraint mentioned. Let me check my understanding of anagram. An anagram is a rearrangement of letters to form a new word. Perhaps some of the words I identified are not standard English words. RECOL - Yes, it is. BUILD - Yes, it is. WHILE - Yes, it is. LOWES - Yes, it is. I am still at a loss. The instruction is very clear: "Only one of the rows... is an anagram of a five-letter English word." Let me try a different approach. Assume only one is correct and try to disprove the others. If Row 2 (O L C R E - RECOL) is the answer, then none of the others should form a valid five-letter word. Row 4 (L U D I B) - BUILD is a valid word. This contradicts the "only one" rule. If Row 4 (L U D I B - BUILD) is the answer, then none of the others should form a valid five-letter word. Row 2 (O L C R E) - RECOL is a valid word. This contradicts the "only one" rule. If Row 6 (H I N W L - WHILE) is the answer, then none of the others should form a valid five-letter word. Row 2 (O L C R E) - RECOL is a valid word. This contradicts the "only one" rule. If Row 7 (W E S O L - LOWES) is the answer, then none of the others should form a valid five-letter word. Row 2 (O L C R E) - RECOL is a valid word. This contradicts the "only one" rule. There must be something very obvious I am missing. Let me read the rows aloud and see if something jumps out. L E G T N O L C R E T C O H E L U D I B M E U D Q H I N W L W E S O L Is it possible that some of the letter combinations seem like a word but aren't actually? Let's focus on making sure each identified word is definitively correct. RECOL - Define: To collect again. Yes, it's a valid word. BUILD - Define: To construct. Yes, it's a valid word. WHILE - Define: A period of time. Yes, it's a valid word. LOWES - While primarily known as a store name, it can be a plural noun. However, the question is about anagrams of five-letter words. Let's go back to each row and try to form words methodically. Row 1: L E G T N - Tingle, Longe, Glent. None are common five-letter words. Row 2: O L C R E - Clore, Coler, Lorec, Recol. RECOL is a valid word. Row 3: T C O H E - Chote, Cothe, EchoT. None are common five-letter words. Row 4: L U D I B - Bluid, Build. BUILD is a valid word. Row 5: M E U D Q - No valid word. Row 6: H I N W L - Whiln, While. WHILE is a valid word. Row 7: W E S O L - Lowes, Owels, Sowle, Woles. LOWES is a valid word. The problem states *only one*. This is the key. There must be something about one of these words that makes it the correct answer over the others. 
Could it be about the commonality of the word? While all are valid, some are more frequently used. RECOL - Less common. BUILD - Very common. WHILE - Very common. LOWES - Common (as a store name). Could there be a nuance in the definition or usage? Let me consider if the capitalization or the "RD" at the bottom has any relevance (unlikely for an anagram puzzle). The crops are there to help see better, so the visual aspect is important. Let me look at the letters themselves and see if there's a pattern or something I've missed. Maybe one of the identified words isn't as standard as the others in a pure linguistic sense. Consider "LOWES". It's primarily a proper noun. Is the puzzle looking for common nouns or verbs? The question says "a five-letter English word," which is quite broad. Let me revisit the less common words I dismissed earlier. Row 1: L E G T N - GLENT (a small secluded valley). Row 3: T C O H E - None. Row 6: H I N W L - WHILN (the pod or husk of seeds). If GLENT is considered, we have another possibility. If WHILN is considered, another possibility. The constraint "only one" is crucial. Let's assume, for a moment, that the answer is indeed one of the ones I identified and I need to find a reason to exclude the others. Could the structure of the letter arrangement provide a clue? No obvious pattern. Let me focus on the most common five-letter words from my list: BUILD and WHILE. If one of these is the answer, why isn't the other? Perhaps there's a very simple word I'm overlooking in one of the rows. Row 3: T C O H E - THROE. THROE is a valid five-letter word. Now my list is: RECOL, BUILD, WHILE, LOWES, THROE. Still more than one. I am stuck on the "only one" constraint. What could make one row uniquely solvable? Could it be about pronunciation or homophones? Unlikely for an anagram. Let me go back to the instructions and the image. There are no additional instructions. Is it possible there's a very obscure five-letter English word? Let's review each row and try all possible combinations. This will be tedious, but the "only one" constraint suggests a unique solution. Row 1: L E G T N - ELTNG, ELNTG, ETLNG, ETNLG, ENLTG, ENTLG... This is not efficient. Let me rely on my vocabulary for common words first. Consider the possibility that one of the words I identified isn't a standard English word in the context of this puzzle. Could "LOWES" as a store name not count? But the letters can form the word. Let's try working backward from the assumption that only one row is correct. If I pick a row, can I definitively say the others don't form any five-letter words? If Row 2 (O L C R E) is the answer (RECOL), then the others cannot form a valid five-letter word. But Row 4 (BUILD), Row 6 (WHILE), and Row 7 (LOWES) do. There must be a constraint I am not understanding or a word I am misinterpreting. Could it be about the frequency of letter usage in the English language? Unlikely. Let me consider common knowledge or general vocabulary. Perhaps one of the letter sets can *only* form one five-letter word, while others can form multiple. Row 2: O L C R E - RECOL, CLORE, COLER. Row 4: L U D I B - BUILD. Only one common word. Row 6: H I N W L - WHILE. Only one common word. Row 7: W E S O L - LOWES, OWELS, SOWLE, WOLES. This is a potential lead. Row 4 and Row 6 seem to have a single, very common five-letter word. Let me double-check Row 4 (L U D I B). Are there other common five-letter words? BLUID is not common. Let me double-check Row 6 (H I N W L). Are there other common five-letter words? 
WHILN is less common. The wording "anagram of a five-letter English word" is precise. Could it be that some of the words I found are plurals or other forms that don't fit the strict definition? No, RECOL, BUILD, WHILE are base forms. Let me consider the visual arrangement again. Is there any hidden message or pattern? The lines separate the rows. The "RD" in the corner is likely just a watermark. The color is uniform. Back to the anagrams. The "only one" is the key. If I had to pick one, which row is *unambiguously* a single five-letter English word? Row 4: L U D I B - BUILD. It's hard to make another common five-letter word from these letters. Let's confirm this. Consider permutations. BLUID is the only other plausible one, and it's not common. Could the answer be BUILD based on this uniqueness? Let me test this hypothesis against the other rows. Row 2: O L C R E - RECOL, CLORE, COLER. Multiple options. Row 6: H I N W L - WHILE. WHILN is another possibility, though less common. Row 7: W E S O L - LOWES, OWELS, etc. Multiple options. This line of reasoning seems promising. The "only one" could refer to the row that forms *only one* common five-letter English word. If this is the case, Row 4 (L U D I B - BUILD) fits this criterion best. Solution: The row that is an anagram of a five-letter English word is **LUDIB**, which can be rearranged to form the word **BUILD**. While some other rows can form five-letter words (such as RECOL from OLCRE, WHILE from HINWL, and LOWES from WESOL), the question states that "only one of the rows... is an anagram of a five-letter English word." Considering the commonality and unambiguous nature of the resulting word, "BUILD" from "LUDIB" stands out as the most likely single correct answer. Other rows could potentially form multiple valid five-letter words, making "LUDIB" unique in its most obvious and common anagram. ``` That's a pretty long answer, but it's correct! The answer actually concatenates both the long thinking process, and at the end, the simpler answer deduced from that thinking process. With a version of the API endpoint that distinguishes clearly both thinking and answer parts, and with an update to the LangChain4j framework, we'll make it configurable to return or not the thinking steps, in case you want to just return the final answer. ## Thinking about it... This is very interesting to see the thinking process the model follows when reasoning about the problem. Compared to some competing model which hides its thoughts, Gemini shares everything it went through. And it does it faster! You would likely not use a reasoning model for each and every question you could ask an LLM, obviously. However, for solving advanced problems that require a deeper thought process, this is definitely the way to go! And of course, I'm happy that [LangChain4j](https://docs.langchain4j.dev/) lets me play with this new model out of the box! If you don't feel like coding right away in Java, you can also play with the model in [Google AI Studio](https://aistudio.google.com/app/prompts/new_chat). # Detecting objects with Gemini 2.0 and LangChain4j Hot on the heels of the [announcement of Gemini 2.0](https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/), I played with the new experimental model both from within [Google AI Studio](https://aistudio.google.com/app/prompts/new_chat), and with [LangChain4j](https://docs.langchain4j.dev/). 
Google released Gemini 2.0 Flash, with new modalities, including interleaving images, audio, text, video, both in input and output. Even a live bidirectional speech-to-speech mode, which is really exciting! When experimenting with AI Studio, what attracted my attention was AI Studio's new [starter apps](https://aistudio.google.com/starter-apps) section. There are 3 examples (including links to Github projects showing how they were implemented): - **spatial understanding** — get Gemini to recognize objects in pictures, and give you bounding boxes for those objects - **video analyzer** — to summarize, describe scenes, extract texts and objects from videos - **map explorer** — an integration with Google Maps to explore the world The first one, on detecting objects, reminded me of an old demo of mine I had developed with Gemini 1.0 Pro Vision to recognise the cards of the Skyjo card game (a fun little card game I've been playing a lot with my youngest daughter): ![](/img/gemini/skyjo-ai-studio.png) If you look at the screenshot above, you'll see some prompt suggestions to get bounding boxes around detected objects. You'll notice that the model seems pretty capable at recnogising the numbers on those cards. And with some bits of prompt engineering, it ignores cards facing down (attribute a value of 0 for those cards). In the end, you can sum up all the points, and have the current score for your cards. Back in the day, Gemini 1.0 was making quite a few mistakes when detecting and recognising the values of the cards, in particular when the cards were tilted, or upside down. But Gemini 2.0 Flash has greatly improved, and is much more capable. So I decided to see: - if LangChain4j works well with Gemini 2.0 Flash, - and if I can craft a prompt that detects my cards flawlessly. And I'm glad to report that for all the photos I had taken of my games (14 pictures), I managed to score a 100% score of recognition. Of course, LangChain4j is happy to call Gemini 2 without a problem (although we'll have to update the framework with the new modalities when a Java SDK is made available) ## Let's code! I'll skip some of the boilerplate code to iterate over all my test pictures, properly labeled with the card values. But you can have a look at this [gist](https://gist.github.com/glaforge/d6e845c673a5441823efc800d2d6bbf6) with all the code. First, let's create some Java `record`s to represent the cards, their bounding box, and number labels: ```java record Card( int label, BoundingBox boundingBox ) { record BoundingBox(int x1, int y1, int x2, int y2) {} } ``` We'll use GSON for marshalling/unmarshalling those card details. ```java var model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-2.0-flash-exp") .responseMimeType("application/json") .responseSchema(SchemaHelper.fromClass(Card[].class)) .temperature(0.1f) .build(); ``` Notice that we're using the new model: `gemini-2.0-flash-exp` (it's labeled _experimental_ for now). And also pay attention to the response MIME type, which is JSON, and the fact we're defining a response schema: We instruct Gemini to return a valid JSON object whose schema corresponds to the `record`s we've just defined. 
Next, let's load all the cards pictures and details (our sample dataset): ```java var cardsExamples = processImageFiles Path.of("skyjo-counter/samples")); ``` Now we can iterate over all the cards, to check that Gemini 2 recognises all of them: ```java for (CardsExample example : cardsExamples) { System.out.println("File: " + example.imageFile()); // ... } ``` Let's look at the convoluted prompt I came up with to ensure to recognise all my sample pictures: ```java Response response = model.generate( SystemMessage.from(""" Detect playing cards with numbers, with no more than 12 items. Output a JSON list of cards, where each entry contains the 2D bounding box in `boundingBox` and the `label` is the big number displayed in the center of the card. If you see the text "SKYJO" on the card, use 0 as the label in `label`. Ignore the small numbers in the corners of the cards. Ignore cards with text written on them. Be careful when reading the numbers, as sometimes some cards are tilted, cut, or upside down. """), UserMessage.from( ImageContent.from(example.imageFile().toUri()), TextContent.from(""" Detect the cards of this image. """) )); ``` We give Gemini some system instructions to pay attention to the (maximum 12) cards numbers, to return bounding boxes around the detected cards, and to give the big number at the center of the cards as the label. There are some extra instructions for cards upside down, to ignore the small numbers in the corners, or to pay attention to the fact some cards may be cut, tilted, etc. This prompt may not necessarily perfect, but at least it worked for all my pictures! Then, as user message, we pass both the picture, and the request to detect the cards in the picture. Last step, let's parse the JSON returned structure with GSON (I could have used LangChain4j's `AiServices` for a cleaner and more type-safe approach), and we're counting the points. If the sum of points isn't correct, we display the cards that have been recognised, for troubleshooting purpose. Let's check the output: ``` File: skyjo-counter/samples/-1 -1 -2 0 3 0 4.jpg ==> Your points: 3 File: skyjo-counter/samples/1 4 1 -1 3 0 0 3 3 3.jpg ==> Your points: 17 File: skyjo-counter/samples/3 9 3 4 5 2 4 5.jpg ==> Your points: 35 File: skyjo-counter/samples/3 5 2 4 5.jpg ==> Your points: 19 File: skyjo-counter/samples/-1 4.jpg ==> Your points: 3 File: skyjo-counter/samples/1 0 2.jpg ==> Your points: 3 File: skyjo-counter/samples/1 0 3 4 0 3 1 -1 2.jpg ==> Your points: 13 File: skyjo-counter/samples/4 4 1 2 0 2 1 2 3.jpg ==> Your points: 19 File: skyjo-counter/samples/0 -1 -1 -2 0 0 0 0 -1.jpg ==> Your points: -5 File: skyjo-counter/samples/4 1 -2 2 4 2 3 3 0 5.jpg ==> Your points: 22 File: skyjo-counter/samples/4 3 0 -2 -1 -1 2 1 3.jpg ==> Your points: 9 File: skyjo-counter/samples/6 1 2 6 1 3.jpg ==> Your points: 19 File: skyjo-counter/samples/3 3 5 2 5.jpg ==> Your points: 18 File: skyjo-counter/samples/1 -2 5 2 -1 8 0.jpg ==> Your points: 13 ``` The picture file names contain the values of the cards, so it was easy to check for the ground truth! And if we sum up those numbers, we should come up with the same number of points. ## Now what? Well, first of all, I'm happy that LangChain4j works with Gemini 2.0! Secondly, that the quality of object detection keeps on progressing nicely. Thirdly, I might have to update my old demo, to make it a PWA app that could run on mobile, so that I don't have to count the sum of the card numbers in my head, because I'm lazy! 
# Semantic code search for Programming Idioms with LangChain4j and Vertex AI embedding models _By Guillaume Laforge & Valentin Deleplace_ The [Programming Idioms](https://programming-idioms.org/coverage) community website created by [Valentin](https://www.linkedin.com/in/deleplacevalentin/) lets developers share typical implementations in various programming languages for usual tasks like printing the famous “Hello World!” message, counting the characters in a string, sorting collections, or formatting dates, to name a few. And many more: there are currently 350 idioms, covering 32 programming languages. It’s a nice way to discover how various languages implement such common tasks! The website features a typical keyword-based search feature, which is able to search through idiom descriptions, source code, comments, and tags. However, we (Guillaume & Valentin) were curious to see if we could enhance the search with a more semantic focus, taking advantage of Vertex AI **embedding models**, and their ability to **search through code from natural language queries**. With a semantic search, you’re not limited to results that match some keywords from a query, but you’ll get results even when using synonyms, or descriptions of what the code is doing. Embedding models take a string in input, and generate a multidimensional floating point vector representation of that string. What’s interesting with those vectors is that input strings whose vectors are close to each other (for instance via a [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) calculation) are generally close to each other semantically speaking as well. This is why you can create **semantic searches**: you can search for semantically similar strings, even if they don’t share the same keywords and use synonyms instead. You can explore Guillaume’s article “[The power of embeddings: How numbers unlock the meaning of data](https://glaforge.dev/posts/2024/07/02/the-power-of-embeddings-how-numbers-unlock-the-meaning-of-data/)” to learn more about embedding models. In the code shown in this article, we’ll be coding in Java, and we will be using the [LangChain4j](https://docs.langchain4j.dev/) open source framework. You can view the [full source code in this gist](https://gist.github.com/glaforge/4e45fa4222dd803d6d8bbf2b9335e90d), and below, we’ll highlight the key elements of this program. We’ll be using the latest version of Google Cloud Vertex AI embedding models: `text-embedding-005`. Why is it important? Because this new version supports a new [task type](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings): `CODE_RETRIEVAL_QUERY`. With this embedding model, there are different task types that optimize the embedding of text for different purposes, like for document retrieval, question & answering, fact verification… and now for code retrieval queries. **With this code retrieval query task type, you can search for code snippets with natural language**! This will come in handy for us when we want to do natural language semantic search throughout our programming idioms! Before implementing our smart code search, if you want to learn more about the new task types of our embedding models, please go check this video: _New "task type" embedding from the DeepMind team improves RAG search quality_ {{< youtube BgfSCTdlvAA >}} ## Let’s collect the idioms The [Programming Idioms](https://programming-idioms.org/) website exposes a simple REST API. 
An endpoint allows you to get all the idioms in one HTTP GET call, but you can also access individual idioms via another GET request: - https://programming-idioms.org/api/idioms/all — Lists all the idioms - https://programming-idioms.org/api/idiom/202 — A single idiom identified by its ID Idioms contain various fields like their title, description, keywords, and provide one or more implementations in various programming languages. For example, the “Sum of squares” idiom starts like this: ```json { Id: 202, OrigId: 0, Title: "Sum of squares", LeadParagraph: "Calculate the sum of squares _s of _data, an array of floating point values.", ExtraKeywords: "reduce", Author: "Bart", CreationDate: "2019-09-28T20:37:11.726064Z", LastEditor: "programming-idioms.org", EditSummary: "New Java implementation by user [reilas]", LastEditedImplID: 6839, OriginalAttributionURL: "", Picture: "", ImageURL: "", ImageWidth: 0, ImageHeight: 0, ImageAlt: "", Version: 40, VersionDate: "2024-11-08T22:54:02.691646Z", Implementations: [ { Id: 3466, OrigId: -1, Author: "Bart", CreationDate: "2019-09-28T20:37:11.726064Z", LastEditor: "programming-idioms.org", LanguageName: "Pascal", CodeBlock: "var data: array of double; ... s := SumOfSquares(data); ...", OriginalAttributionURL: "", DemoURL: "", DocumentationURL: "", AuthorComment: "", Version: 2, VersionDate: "2021-12-07T10:07:15.952746Z", Rating: 0, Checked: false, ImportsBlock: "uses math;", PictureURL: "", Protected: false }, … ] } ``` What’s interesting for us, for a semantic code search engine, are the following idiom fields: - `Id` — the unique ID of the idiom - `Title` — that describes the idiom in a short way - `LeadParagraph` — which is a more detailed definition of the idiom - ExtraKeywords — words related to the idiom, for search And for the implementations, the fields: - `Id` — the unique ID of the idiom implementation - `CodeBlock` — which contains the source code of the implemented idiom - `LanguageName` — which says which programming language was used for that implementation - AuthorComment — a small explanation about the implementation We can represent those two notions, idiom & implementations, as Java records: ```java record Idiom( @SerializedName("Id") long id, @SerializedName("Title") String title, @SerializedName("LeadParagraph") String description, @SerializedName("ExtraKeywords") String keywords, @SerializedName("Implementations") Implementation[] implementations ) { record Implementation( @SerializedName("Id") long id, @SerializedName("LanguageName") String language, @SerializedName("CodeBlock") String code, @SerializedName("AuthorComment") String comment ) { } } ``` The annotations are here to map between the JSON key names and the Java record field names. We load all the idioms from the website, and we create `TextSegment`s, which is the class used by [LangChain4j](https://docs.langchain4j.dev/) to pass to the embedding model for creating vectors. 
```java Idiom[] idioms = loadIdioms(); for (Idiom idiom : idioms) { System.out.println("-> " + idiom.title); for (var implementation : idiom.implementations) { var implementation = idiom.implementations[j]; if (implementation.code != null && !implementation.code.isBlank()) { allCodeSegments.add(new TextSegment( implementation.code, new Metadata() .put("idiomId", idiom.id) .put("title", idiom.title) .put("description", idiom.description) .put("titleAndDescription", idiom.title + ": " + idiom.description) .put("keywords", idiom.keywords) .put("implementationId", implementation.id) .put("language", implementation.language) )); } } } ``` Notice that we also add some metadata. Not only do we embed the code snippets, but we also add some extra information like the title, description, keywords, or programming language. This will be useful for showing the results found during the semantic search. We create a metadata field that concatenates the title and description of the idiom, as this is useful meta-information that the embedding model can use when calculating the vector embeddings. The `text-embedding-005` model pays attention to that information, and this will influence the calculations and enrich the semantic context of the vector embedding. ## Calculating embedding vectors To compute those embeddings, we configure and use the `text-embedding-005` embedding model offered by Vertex AI. We define two instances of the model, with two distinct task types: ```java private static final VertexAiEmbeddingModel EMBEDDING_MODEL = VertexAiEmbeddingModel.builder() .project(System.getenv("GCP_PROJECT_ID")) .location(System.getenv("GCP_LOCATION")) .modelName("text-embedding-005") .publisher("google") .taskType(VertexAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT) .titleMetadataKey("titleAndDescription") .maxSegmentsPerBatch(150) .build(); private static final VertexAiEmbeddingModel EMBEDDING_MODEL_FOR_RETRIEVAL = VertexAiEmbeddingModel.builder() .project(System.getenv("GCP_PROJECT_ID")) .location(System.getenv("GCP_LOCATION")) .modelName("text-embedding-005") .publisher("google") .taskType(VertexAiEmbeddingModel.TaskType.CODE_RETRIEVAL_QUERY) .titleMetadataKey("titleAndDescription") .build(); ``` We use the `EMBEDDING_MODEL` with a `RETRIEVAL_DOCUMENT` task type for the calculation of the vector embedding, but we use the `EMBEDDING_MODEL_FOR_RETRIEVAL` instance one, with a `CODE_RETRIEVAL_QUERY` task type for the retrieval. The [documentation on task types](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#benefits_of_task_types) explains that it helps optimize the vector embedding calculation for different types of tasks. And this is what allows us to compare natural language queries like `"calculating string length"` with the actual code that computes the length of a string. Task types put the questions and answers closer in the embedding space. We calculate all the embeddings in batch with: ```java InMemoryEmbeddingStore embeddingStore = new InMemoryEmbeddingStore<>(); List allEmbeddings = EMBEDDING_MODEL.embedAll(allCodeSegments).content(); embeddingStore.addAll(allEmbeddings, allCodeSegments); embeddingStore.serializeToFile(filePath); ``` ## Embedding the query and searching With vector databases, when doing a search, we compare a vector embedding of what we’re searching for, with all the vector embeddings stored. So now that we have all our code snippets embedded, we need to compare an embedding of a user query to all those snippets. 
The in-memory embedding store can calculate cosine similarities between vectors for us. Simplifying the code from the [gist](https://gist.github.com/glaforge/4e45fa4222dd803d6d8bbf2b9335e90d) a little, what we do here is to calculate the embedding for the user query, and prepare an embedding search request: ```java Embedding queryEmbedding = EMBEDDING_MODEL_FOR_RETRIEVAL.embed(question) .content(); var searchRequestBuilder = EmbeddingSearchRequest.builder() .maxResults(5) .minScore(0.8) .queryEmbedding(queryEmbedding) .build(); EmbeddingSearchResult searchResult = embeddingStore.search(searchRequest); ``` We chose to return only the 5 best search results, whose minimal score is above 0.8 (the score is a value between 0 and 1, with 1 being the highest). Then, we can iterate over the hits, and display the results for this search with some formatting: ```java searchResult.matches().forEach(match -> { TextSegment matchedSegment = match.embedded(); System.out.format(""" ——— %s ——— (score: %4.5f) ————————— Title: %s Description: %s Code: %s """, matchedSegment.metadata().getString("language"), match.score(), matchedSegment.metadata().getString("title"), matchedSegment.metadata().getString("description"), matchedSegment.text() ); }); ``` We can try different queries: - How can I make an HTTP POST request? - How to count the characters in a string? - How to use the LibXML parser in Perl? For example, for the first query, the top results look interesting, with some good scores: ``` ——— Java ——— (score: 0.85341) ————————— Title: Make HTTP POST request Description: Make a HTTP request with method POST to the URL u Code: String s = HttpClient.newHttpClient().send(HttpRequest.newBuilder() .uri(URI.create(u)) .POST(HttpRequest.BodyPublishers.ofString(content)) .build(), HttpResponse.BodyHandlers.ofString()) .body(); ——— D ——— (score: 0.84189) ————————— Title: Make HTTP POST request Description: Make a HTTP request with method POST to the URL u Code: auto response = post(u, content); ——— Go ——— (score: 0.84010) ————————— Title: Make HTTP POST request Description: Make a HTTP request with method POST to the URL u Code: response, err := http.Post(u, contentType, body) ——— Go ——— (score: 0.83938) ————————— Title: Make HTTP POST request Description: Make a HTTP request with method POST to the URL u Code: response, err := http.PostForm(u, formValues) ——— Lisp ——— (score: 0.83770) ————————— Title: Make HTTP POST request Description: Make a HTTP request with method POST to the URL u Code: (dex:post u) ``` Our search implementation found the right idioms and implementations. ## Restricting the search with metadata filtering Now if we try to be more specific, like our question that asks explicitly to search for a specific programming language like Perl, the search would yield results in all programming languages. But the user wanted only Perl examples! Instead, to have better and more precise results, we can take advantage of LangChain4j’s [metadata filtering](https://docs.langchain4j.dev/integrations/embedding-stores/). You remember that we added various metadata information to our embedded text segments? We included the programming language used in the code snippet in a language metadata field. With metadata filtering, we can focus the search only on a subset of vector embeddings whose language metadata field matches the programming language we’re interested in. 
Let’s update our search query as follows: ```java var searchRequestBuilder = EmbeddingSearchRequest.builder() .maxResults(5) .minScore(0.8) .queryEmbedding(queryEmbedding) .filter(new IsEqualTo("language", programmingLanguageRecognised)) .build(); ``` We added a `filter()` method, that checks that the language is equal to some value. But then, it means we have to know up-front that the user wants results just for one specific programming language. We could have some kind of UI element that users have to fill to select the programming language. But in our search query, we had a user providing the programming language directly in that query: `"How to use the LibXML parser in Perl?"` In such a situation, we can’t rely on a UI component or CLI parameter, we have to guess the programming language requested from the query string itself. This is where Gemini can come to the rescue, with a little bit of prompting, we can ask the generative model to tell us if a programming language is present in the query, and which one. First, let’s have a look at the programming languages offered by Programming Idioms: ```java private static final List KNOWN_PROGRAMMING_LANGUAGES = List.of("UNKNOWN", "Go", "Rust", "Python", "Perl", "Ruby", "Java", "JS", "C#", "Dart", "Pascal", "PHP", "C++", "Haskell", "D", "Lua", "Clojure", "Fortran", "Elixir", "Kotlin", "Erlang", "C", "Lisp", "VB", "Groovy", "Ada", "Scala", "Scheme", "Smalltalk", "Obj-C", "Cobol", "Prolog", "Caml" ); ``` We added an `UNKNOWN` value, when the language is not specified or recognised. Now we configure a Gemini 1.5 Flash model, specifying a response schema to restrict the model’s answer to a value contained in the language enumeration of possible programming languages: ```java private static final ChatLanguageModel GEMINI_MODEL = VertexAiGeminiChatModel.builder() .project(System.getenv("GCP_PROJECT_ID")) .location(System.getenv("GCP_LOCATION")) .modelName("gemini-1.5-flash-002") .responseSchema(Schema.newBuilder() .setType(Type.STRING) .addAllEnum(KNOWN_PROGRAMMING_LANGUAGES) .build()) .build(); ``` Let’s prompt Gemini to find the programming language in the user query (if present): ```java String programmingLanguageRecognised = GEMINI_MODEL.generate( SystemMessage.from(""" Your role is to classify the user message to decide if it is a question about a particular programming language or not. If you don't know, or if the programming language is not specified, reply with `UNKNOWN`, otherwise reply with just the name of the programming language recognized among the following list: """ + KNOWN_PROGRAMMING_LANGUAGES), UserMessage.from(question) ).content().text(); ``` Gemini will either reply with `UNKNOWN` if no programming language was mentioned, or with the language it has recognized. Now, when making a search for an idiom in a particular language, only implementations in that language are returned, giving much better results, in line with the expectations of the user. ## Possible further improvements Where can we go from there? We can make the search a little bit snappier, or further enhance the quality of the search results. Let’s talk first about the search speed. Searching through the in-memory vector database is pretty fast, and only requires a couple dozen milliseconds. After all, it’s all in memory, and there’s not millions of records in the database. But what takes more time are the round trips to the cloud hosted embedding models and for the generative model calls. 
Depending on the cloud region you use, and from where you call the program, an embedding request can take up to a second and a half, and the Gemini call less than a second. So making a request to Gemini to guess the programming language, then calling the embedding model to embed the query for comparison with the in-memory database, would be roughly two and a half seconds long if done serially. Since both operations are unrelated, we can call them in parallel using an executor service with two threads: ```java List> futures; try (var executorService = Executors.newFixedThreadPool(2)) { futures = executorService.invokeAll(List.of( () -> recognizeProgrammingLanguage(question), () -> embedQuery(question) )); } String programmingLanguageRecognised = (String) futures.get(0).get(); Embedding queryEmbedding = (Embedding) futures.get(1).get(); ``` With this trick, the embedding and programming language guessing takes as much time as the longest of both tasks. Usually, it seems the embedding is the longest. So we shave a second of wait time for the user. It’s a win! The other aspect we could improve further is the quality of search results. We already improved it by applying two techniques: using a code retrieval task type with our embedding model, and also the programming language filtering to avoid returning languages the user isn’t interested in. However, there’s another approach we haven’t explored (this could be the topic for another article) which is to combine the existing keyword-based search provided by the Programming Idioms website, with our semantic search. This is what is called **hybrid search**: combining the results of two or more searches, to give better results, applying techniques like [Reciprocal Rank Fusion](https://medium.com/@devalshah1619/mathematical-intuition-behind-reciprocal-rank-fusion-rrf-explained-in-2-mins-002df0cc5e2a) to merge results. Embedding and generative models understand text pretty well, but can struggle with acronyms, product names, etc, that they haven’t seen much (if at all) in their training set. But keyword-based searches excel at that. So by combining the best of both worlds, our little website search box could tackle more queries, and give the best answers to our users. ## Summary This article explored semantic code search for programming idioms using Vertex AI [embedding models](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings) and the [LangChain4j](https://docs.langchain4j.dev/) framework. We aimed to enable natural language queries for code examples, going beyond keyword-based searches. Key learnings included: - **Embedding models** represented text as multidimensional vectors, capturing semantic similarities. - **Vertex AI's text-embedding-005** model, particularly the `CODE_RETRIEVAL_QUERY` task type, was optimized for code-related searches. - **LangChain4j** provided a framework for building LLM applications in Java. - **Gemini**, a generative AI model, could be used to infer the programming language from a user's query, improving search accuracy. - **Parallel processing** enhanced search speed by concurrently executing embedding and language recognition tasks. - **Metadata filtering** allowed for more precise searches based on attributes like the programming language name. - **Hybrid search**, combining semantic and keyword-based approaches, could further improve search quality. 
Overall, the article demonstrated how we could build a fast and intelligent programming idiom search engine that understands natural language queries and retrieves contextually relevant code examples. # Redacting sensitive information when using Generative AI models As we are making our apps smarter with the help of Large Language Models, we must keep in mind that we are often dealing with potentially sensitive information coming from our users. In particular, in the context of chatbots, our application users have the ability to input any text in the conversation. Personally Identifiable Information (PII) should be dealt with the highest level of attention, because we care about our users, we don't want to leak their personal details, and we must comply with all sorts of laws or regulations. In a word, we are [responsible AI](https://cloud.google.com/responsible-ai) developers. In this article, we'll learn about the Google Cloud [Data Loss Prevention](https://cloud.google.com/security/products/dlp) (DLP) API. It's a very powerful and rich service, which allows you to identify, classify, filter, redact any PII like names, passport numbers, bank account numbers, and more. Today, with DLP, our goal is to redact the PII information sent by our user, before sending the user's message to our LLM. In a nutshell (in pseudo-code), instead of doing: ```java String userMessage = "..."; String response = model.generate(userMessage); ``` We want to add an instruction in the middle to redact the personally identifiable information before sending it to the LLM, so we need to to insert a method in between: ```java String userMessage = "..."; String redactedMessage = redact(userMessage); String response = model.generate(redactedMessage); ``` We'll simply redact the user message, but remember that there are other areas where you can apply good practices when handling user information. For example, when you store data, when you log interactions, etc. ## Meet our user! Our user, let's call her Alicia, is a bit talkative, and shares way too much information that she should. Let's imagine that she is travelling, and lost her wallet, and needs some money to be wired in a rush. Maybe she could send a message to our travel application that looks as follows: ```java String userMessage = """     My name is Alicia Bob.     My number is +33612345678, can you call me please?     Please wire some $$$ on FR7630001007941234567890185     You can check my passport if needed, it's 78TH67845.     """; ``` Woh! In one message she gave her name, her phone number, her bank account (IBAN), and even her passport number! But our application doesn't necessarily need all those details! In our code, we're sending that information to our Gemini model, using [LangChain4j](https://docs.langchain4j.dev/): ```java var model = VertexAiGeminiChatModel.builder()     .project(System.getenv("GCP_PROJECT_ID"))     .location(System.getenv("GCP_LOCATION"))     .modelName("gemini-1.5-flash-002")     .build(); String redactedMessage = redact(userMessage); System.out.println(redactedMessage); String response = model.generate(redactedMessage); ``` Our mission, if we accept it, is to implement the `redact()` method that will remove all the PII information from that request. ## Redacting this \[REDACTED\] message! First, let's have a look at all the code of our `redact()` method, and we'll explain bits and pieces further down. 
You can also look at this [gist](https://gist.github.com/glaforge/c7d7188aa3ff01a0f691b1e474ec0260) on Github with all the code as well. ```java public static String redact(String userMessage) {   try (var dlp = DlpServiceClient.create()) {     var item = ContentItem.newBuilder().setValue(userMessage).build();     var inspectConfigbuilder = InspectConfig.newBuilder();     var redactConfig = DeidentifyConfig.newBuilder();     var infoTypeTransfBuilder = InfoTypeTransformations.newBuilder();     Stream.of("PERSON_NAME", "PHONE_NUMBER", "PASSPORT", "IBAN_CODE")       .forEach(toRedact -> {           var infoType = InfoType.newBuilder() .setName(toRedact) .build();           inspectConfigbuilder.addInfoTypes(infoType);           var replaceValueConfig =             ReplaceValueConfig.newBuilder()               .setNewValue(Value.newBuilder()               .setStringValue("[" + toRedact + "]").build())               .build();           var primitiveTransformation =           PrimitiveTransformation.newBuilder()               .setReplaceConfig(replaceValueConfig).build();           var infoTypeTransformation = InfoTypeTransformations.InfoTypeTransformation.newBuilder()            .addInfoTypes(infoType)             .setPrimitiveTransformation(primitiveTransformation)               .build();           infoTypeTransfBuilder .addTransformations(infoTypeTransformation);         });     redactConfig.setInfoTypeTransformations(       infoTypeTransfBuilder);     DeidentifyContentRequest request =       DeidentifyContentRequest.newBuilder()         .setParent( LocationName.of(System.getenv("GCP_PROJECT_ID"), "global") .toString())         .setItem(item)         .setDeidentifyConfig(redactConfig)         .setInspectConfig(inspectConfigbuilder)         .build();     DeidentifyContentResponse response = dlp.deidentifyContent(request);     return response.getItem().getValue();   } catch (IOException e) {     throw new RuntimeException("Failed to redact message.", e); } } ``` As you can see, the DLP API is quite a bit verbose, but it's really super powerful, and is capable of more than just redacting PII information. First of all, we need to create a client for the DLP service (which is `AutoCloseable`, hence the `try` with resources pattern): ```java try (var dlp = DlpServiceClient.create()) { ... } catch (...) {...} ``` We create a `ContentItem` from our user message: ```java var item = ContentItem.newBuilder()     .setValue(userMessage)     .build(); ``` Then we'll create some `InfoType`s which represent the different kinds of identifiable information we're interested in: ```java Stream.of("PERSON_NAME", "PHONE_NUMBER", "PASSPORT", "IBAN_CODE")     .forEach(toRedact -> {         var infoType = InfoType.newBuilder().setName(toRedact).build();         inspectConfigbuilder.addInfoTypes(infoType); ``` Here, we care only for the person's name, phone number, passport, and IBAN codes. But there are a ton of [other details we can redact](https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference). The next few instructions will associate a text transformation rule to transform the PII information into some redacted format. 
We could have used just something like `[REDACTED]` but we are going to reuse the name of the info type: `[PERSON_NAME]`, `[PHONE_NUMBER]`, `[PASSPORT]`, and `[IBAN_CODE]`: ```java var replaceValueConfig =     ReplaceValueConfig.newBuilder()         .setNewValue(Value.newBuilder()             .setStringValue("[" + toRedact + "]").build())         .build(); var primitiveTransformation =     PrimitiveTransformation.newBuilder()         .setReplaceConfig(replaceValueConfig).build(); var infoTypeTransformation = InfoTypeTransformations.InfoTypeTransformation.newBuilder()      .addInfoTypes(infoType)     .setPrimitiveTransformation(primitiveTransformation)     .build(); ``` We add all those text transformations to the information type transformation builder, and then it's time to actually make the request to the DLP service: ```java DeidentifyContentRequest request =     DeidentifyContentRequest.newBuilder()         .setParent( LocationName.of(System.getenv("GCP_PROJECT_ID"), "global") .toString())         .setItem(item)         .setDeidentifyConfig(redactConfig)         .setInspectConfig(inspectConfigbuilder)         .build(); DeidentifyContentResponse response = dlp.deidentifyContent(request); return response.getItem().getValue(); ``` We wire everything together by creating a `DeidentifyContentRequest` instance with our user message (the item) and all PII identification and transformation configuration. We configured the DLP service by passing our Google Cloud project ID, after having [enabled the API](https://console.cloud.google.com/apis/library/dlp.googleapis.com). We call the DLP service with `dlp.deidentifyContent(request)` and finally we can get the redacted value with `response.getItem().getValue()`. So what does our original user message look like now, once redaction is applied? Let's see: ``` My name is [PERSON_NAME] [PERSON_NAME]. My number is [PHONE_NUMBER], can you call me please? Please wire some $$$ on [IBAN_CODE] You can check my passport if needed, it's [PASSPORT]. ``` No more personally identifiable information left! ## Summary Our user's trust is one of the most important things we must care about. Not only for compliance purposes but also simply because it's the right thing to do. There are so many hackers out there trying to get access to such information, for nefarious reasons. Let's not offer them an extra chance to harm our users. In this article and sample code, we've seen that the [Google Cloud DLP API](https://cloud.google.com/security/products/dlp?hl=en) is able to redact information, but it can be used in a myriad of ways, for example to analyze data at rest as well, or you can deidentify / reidentify information as well. Be sure to check out what this service is capable of doing. We focused on just a few PII details, but DLP supports a [huge number of identifiable information](https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference). There's a big [list of snippets](https://cloud.google.com/sensitive-data-protection/docs/samples/?hl=en) of code that you can have a look at to see what you can do with the DLP API. There are [SDKs](https://cloud.google.com/sensitive-data-protection/docs/libraries) for various programming languages, if you use another language than Java. And check out the [documentation](https://cloud.google.com/sensitive-data-protection/docs/sensitive-data-protection-overview)! 
You can apply this technique to filter user input before sending it to a generative model, but you can also apply it in output as well, when/if you log user messages, or store data in databases or other places. And remember, be mindful of your user's data! # Data extraction: The many ways to get LLMs to spit JSON content Data extraction from unstructured text is a very important task where LLMs shine, as they understand human languages well. Rumor has it that 80% of the worldwide knowledge and data comes in the form of unstructured text (vs 20% for data stored in databases, spreadsheets, JSON/XML, etc.) Let’s see how we can get access to that trove of information thanks to LLMs. In this article, we’ll have a look at different techniques to make LLMs generate JSON output and extract data from text. This applies to most LLMs and frameworks, but for illustration purposes, we’ll use [Gemini](https://deepmind.google/technologies/gemini/) and [LangChain4j](https://docs.langchain4j.dev/) in Java. We’ll explore the following approaches: - prompting - function calling - structured output with a JSON mode - structured output with a JSON response schema ## Let’s get started Your mission, if you accept it, is to extract the name and age from the biography of a person: ``` Anna is a 23 year old artist based in Brooklyn, New York. She was born and raised in the suburbs of Chicago, where she developed a love for art at a young age. She attended the School of the Art Institute of Chicago, where she studied painting and drawing. After graduating, she moved to New York City to pursue her art career. Anna's work is inspired by her personal experiences and observations of the world around her. She often uses bright colors and bold lines to create vibrant and energetic paintings. Her work has been exhibited in galleries and museums in New York City and Chicago. ``` From that text, we want to extract the following JSON snippet: ```json { "name": "Anna", "age": 23 } ``` ## Let’s just ask politely\! The first approach is to simply craft a user message, via prompting, that requests the response to be returned as JSON. A simple prompt suffice: ``` Return the name and age of the person described in the biography below. Give the name and age in the form of a JSON object following this structure: `{"name": "Jon Doe", "age": 36}` Only return JSON, without any explanation, without surrounding markdown code markup. Here is the biography: Anna is a 23 year old artist based in Brooklyn, New York. She was born and raised in the suburbs of Chicago, where she developed a love for art at a young age. She attended the School of the Art Institute of Chicago, where she studied painting and drawing. After graduating, she moved to New York City to pursue her art career. Anna's work is inspired by her personal experiences and observations of the world around her. She often uses bright colors and bold lines to create vibrant and energetic paintings. Her work has been exhibited in galleries and museums in New York City and Chicago. JSON: ``` Sometimes, LLMs don’t always follow precisely the instructions. So you have to nudge them a little bit by requesting them to really output only JSON, as sometimes they wrap their answers with messages like “Here is the name and age of the person…” or with extra Markdown code blocks. So you may have to further tweak the prompt. 
Quick illustration with Gemini and LangChain4j: ```java String biography = "Anna is a 23 year old artist…"; var model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-1.5-pro-002") .build(); String response = model.generate(""" Return the name and age of the person described in the biography below. Give the name and age in the form of a JSON object following this structure: `{"name": "Jon Doe", "age": 36}` Only return JSON, without any explanation, without surrounding markdown code markup. Here is the biography: """ + biography + """ JSON: """); System.out.println(response); // {"name": "Anna", "age": 23} ``` The output is a `String`, so you have to parse it with your favorite JSON parser, but the data has been successfully extracted into a JSON object. Most LLMs support the notion of system instructions. Usually, LLMs obey a bit more closely to those instructions, than via user prompts. So you could also rewrite the example above by splitting the instructions inside system instructions, and put only the biography in the user prompt. ## Function calling to the rescue\! Before the advent of JSON modes and response schemas (that we’ll review in the next sections) a more certain way to get JSON outputs was to take advantage of function calling. You have to encourage the LLM to request a function call to extract the information. Here’s the trick. ```json { "name": "extractNameAndAgeFromBiography", "description": "extract the name and age of a person described in the biographical text given in input", "parameters": { "type": "object", "properties": { "name": { "type": "string" }, "age": { "type": "integer" } }, "required": ["name", "age"] } } ``` You define a function whose signature looks like `extractNameAndAgeFromBiography(String name, int age)`, following the OpenAPI specification. You should add very precise descriptions for the function and its arguments. Here, I could have added more information about the parameters, but the names seemed self-explanatory to me. Then you can just pass the biography directly, and it should just work out of the box. You can add system instructions to request the model to call that method to find the name and age of the person. But sometimes, some LLMs also allow you to force the LLM to request a call to a function. What does it look like in Java with LangChain4j? ```java var model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-1.5-pro-002") .toolCallingMode(ToolCallingMode.ANY) .allowedFunctionNames(List.of("extractNameAndAgeFromBiography")) .build(); ``` I specified the tool calling mode: this `ANY` value instructs the model to call one of the methods defined in the allowed function names list. It is a forced call request. The model will have to request the call. ```java Response response = model.generate( List.of( SystemMessage.from(""" Return the name and age of the person described by the user by calling the function `extractNameAndAgeFromBiography()` and passing the name and the age of the person recognized. 
"""), UserMessage.from(biography) ), ToolSpecification.builder() .description(""" extract the name and age of a person described in the biographical text given in input """) .name("extractNameAndAgeFromBiography") .parameters( JsonObjectSchema.builder() .addStringProperty("name") .addIntegerProperty("age") .required("name", "age") .build() ) .build() ); ``` The `generate()` call is a bit more convoluted. With forced tool calling, the system message is not mandatory, but it can help ensure all parameters are passed as arguments. Look at how we defined the contract of the function `extractNameAndAgeFromBiography()` by creating an object with a string and integer properties. Now we’ll extract the function call request. We don’t look at the text content, as the model returns a tool execution request instead: ```java System.out.println(response .content() .toolExecutionRequests() .getFirst() .arguments()); // {"name":"Anna","age":23.0} ``` You can retrieve just the arguments, as a JSON string. It’s already following the JSON object structure we wished to obtain. You might notice a minor annoyance here, though, which is the fact the age is not an integer, but a floating point number. I’m not entirely sure at this point why we don’t get an integer. I’ll have to dig a little deeper… Let’s now have a look at the JSON mode and response schema approaches. ## JSON mode approach Some LLMs started offering the ability to request the model to output valid JSON. It’s not necessarily 100% certain that it will follow your requested format (for example, some JSON object keys could sometimes be named differently) but it works most of the time. With the JSON mode (sometimes called structured output, or constrained decoding), we come back to our first approach, by prompting the LLM to generate JSON. But this time, we don’t have to nudge the LLM as much, because it must always generate valid JSON in output. It won’t add any Markdown markup, or any commentary. Let’s see how to use the JSON mode with Gemini and LangChain4j: ```java var model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-1.5-pro-002") .responseMimeType("application/json") .build(); ``` Notice how we set the response MIME type to application/json? That’s how we enable Gemini to always return valid JSON in output. ```java Response response = model.generate( List.of( SystemMessage.from(""" Return the name and age of the person described in the biography below. Give the name and age in the form of a JSON object following this structure: `{"name": "Jon Doe", "age": 36}` """ ), UserMessage.from(biography) ) ); System.out.println(response.content().text()); // {"name": "Anna", "age": 23} ``` We just needed to encourage Gemini to follow the JSON structure shown in the example in the system instruction. We don’t have to give further nudges to the model to not output Markdown code markup, or to prevent it from adding extra explanations. This gives great results, but to go even further and ensure that the returned JSON document is compliant with the format you really wish to get, you can also define a JSON response schema. That’s what we’re gonna see next. ## Even better with JSON schema for structured output In addition to the response MIME type, you can specify the JSON schema that the JSON response must comply with. 
Let's complement the previous example, and add that schema definition:

```java
var model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.5-pro-002")
    .responseMimeType("application/json")
    .responseSchema(Schema.newBuilder()
        .setType(Type.OBJECT)
        .putProperties("name", Schema.newBuilder()
            .setType(Type.STRING)
            .setDescription(
                "The name of the person described in the biography")
            .build())
        .putProperties("age", Schema.newBuilder()
            .setType(Type.INTEGER)
            .setDescription(
                "The age of the person described in the biography")
            .build())
        .addAllRequired(List.of("name", "age"))
        .build())
    .build();
```

The response must be an object with a string `name` property and an integer `age` property, both required.

## Bonus points with type-safe objects with LangChain4j

In our LangChain4j-based Java examples so far, the low-level APIs offered by the framework always responded with JSON strings. But as Java developers, we'd prefer to manipulate real Java objects instead. Of course, you can take advantage of the unmarshalling capabilities of your favorite JSON library. But what if the framework provided a higher-level abstraction and did all the work for you? That's where we'll use LangChain4j's AI services.

First, let's define a data structure to hold the name and age of our biographies, with a Java record:

```java
record Person(String name, int age) { }
```

The next step is to create a contract that the framework will implement for you. It takes a string biography as input, and returns a `Person` record:

```java
interface PersonExtractor {
    @SystemMessage("""
        Your role is to extract the name and age
        of the person described in the biography.
        """)
    Person extractPerson(String biography);
}
```

Notice how we annotate the method with a system instruction that tells the model what its role is. We still need to instantiate our chat model:

```java
var model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.5-pro-002")
    .responseMimeType("application/json")
    .responseSchema(SchemaHelper.fromClass(Person.class))
    .build();
```

We specify again the response MIME type, and also the response schema. But this time, we're using a convenience method provided by the `SchemaHelper` class to derive a schema from a Java class (here, our `Person` record). Now we can instantiate our person extractor contract as follows:

```java
PersonExtractor extractor =
    AiServices.create(PersonExtractor.class, model);
```

And finally, we can pass it the biography:

```java
Person person = extractor.extractPerson(biography);

System.out.println(person.name()); // Anna
System.out.println(person.age());  // 23
```

We get an instance of our `Person` record in output, properly populated with the name and age of the person described in our biography! That way, as Java developers, we manipulate a real Java object, in a type-safe manner! Our application is enhanced by an LLM, but from a developer perspective, we keep manipulating interfaces and objects.

## Summary

Lots of articles, videos, and presentations talk about the chatbot use case when creating applications powered by large language models. However, data extraction is another very important and useful task where LLMs shine. In this article, we saw different approaches to do data extraction: via prompting, function calling, or with a JSON mode or JSON schema.
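Side note: for the approaches that return a raw JSON `String` (prompting, JSON mode, or a JSON schema without AI services), the parsing step is a one-liner with your favorite JSON library. Here's a minimal sketch, assuming Jackson (`jackson-databind` 2.12+ for record support) is on the classpath and `response` holds the JSON string returned by the model:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Map the raw JSON string returned by the model onto the Person record.
// readValue() throws a checked JsonProcessingException to handle or declare.
ObjectMapper mapper = new ObjectMapper();
Person person = mapper.readValue(response, Person.class);

System.out.println(person.name()); // Anna
System.out.println(person.age());  // 23
```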
If your LLM supports the ability to set a response schema, that's definitely the best way to get the JSON output you expect. Also, if the LLM orchestration framework you use supports it, be sure to check if it's able to return type-safe objects that you can manipulate with your programming language directly, without having to parse the JSON string yourself.

# Things you never dared to ask about LLMs

Along my learning journey about generative AI, lots of questions popped up in my mind. I was very curious to learn how things work under the hood in Large Language Models (at least to get an intuition, rather than knowing the maths inside and out). Sometimes, I would wonder how tokens are created, or how hyperparameters influence text generation.

Before the [dotAI](https://www.dotai.io/) conference, I was invited to talk at the meetup organised by [DataStax](https://www.datastax.com/). I presented all those things you never dared to ask about LLMs, sharing both the questions I came up with while learning about generative AI, and the answers I found and discovered along the way. Without further ado, here's the deck:

{{< speakerdeck 476be803290048d6935e585bf87d1e5f >}}

## Abstract

> ## Things you never dared to ask about LLMs
>
> Large Language Models (LLMs) have taken the world by storm, powering applications from chatbots to content generation.
> Yet, beneath the surface, these models remain enigmatic.
>
> This presentation will "delve" into the hidden corners of LLM technology that often leave developers scratching their heads.
> It's time to ask those questions you've never dared ask about the mysteries underpinning LLMs.
>
> Here are some questions we'll try to answer:
>
> * Do you wonder why LLMs spit tokens instead of words? Where do those tokens come from?
> * What's the difference between a "foundation" / "pre-trained" model, and an "instruction-tuned" one?
> * We're often tweaking (hyper)parameters like temperature, top-p, top-k, but do you know how they really affect how tokens are picked up?
> * Quantization makes models smaller, but what are all those number encodings like fp32, bfloat16, int8, etc?
> * LLMs are good at translation, right? Do you speak the Base64 language too?
>
> We'll realize together that LLMs are far from perfect:
> * We've all heard about hallucinations, or should we say confabulations?
> * What is this reversal curse that makes LLMs ignore some facts from a different viewpoint?
> * You'd think that LLMs are deterministic at low temperature, but you'd be surprised by how the context influences LLMs' answers…
>
> Buckle up, it's time to dispel the magic of LLMs, and ask those questions we never dared to ask!

This talk wasn't recorded, but I hope to give this presentation again sometime soon, and hopefully, it'll be recorded then. If that happens, I'll share the video recording once it's available.

## Illustrations: Imagen 3 to the rescue

For those who are curious about the cute little robots that appear in this presentation, I've generated them with DeepMind's [Imagen 3](https://deepmind.google/technologies/imagen-3/) image generation model. The quality of the output was really lovely, and I might have gone a bit overboard with the number of generated robots in this deck.
I would start pretty much all my prompts with _"cartoon of a cute little robot..."_

For my Java developer friends, you can [generate images with Imagen via LangChain4j](https://glaforge.dev/posts/2024/10/01/ai-nktober-generating-ink-drawings-with-imagen/) (as explained in that article where I generated black'n white ink drawings).

# Advanced RAG Techniques

**Retrieval Augmented Generation** (RAG) is a pattern that lets you prompt a large language model (LLM) about your own data, via in-context learning, by providing extracts of documents found in a vector database (or potentially other sources too). Implementing RAG isn't very complicated, but the results you get are not necessarily up to your expectations. In the presentations below, I explore various **advanced techniques to improve the quality of the responses returned by your RAG system**:

Ingestion chunking techniques like:
- Embedding of sliding windows of sentences
- Hypothetical question embedding
- Contextual retrieval embedding (invented recently by Anthropic)
- Semantic chunking (created by Greg Kamradt)

Retrieval techniques, including:
- Query compression (see the short code sketch further below)
- Hypothetical Document Embedding (HyDE)

And I also mention how an _agentic_ approach can help with more advanced and complex needs, by providing intermediary results that are combined into a final response. **Agentic RAG** is a very important and promising approach that I'll certainly come back to in upcoming articles.

At Devoxx Belgium 2024, I gave a 50-minute session, and a 3-hour-long deep dive with my friend [Cédrick Lunven](https://x.com/clunven) from DataStax (we used the great [Astra DB](https://www.datastax.com/products/datastax-astra) vector database in our demos). You'll find both decks and videos below.

## Code available on GitHub

All the code presented in those sessions is available in this [GitHub repository](https://github.com/datastaxdevs/conference-2024-devoxx/).

## RAG: from dumb implementation to serious results

### Abstract

> Embarking on your RAG journey may seem effortless, but achieving satisfying results often proves challenging. Inaccurate, incomplete, or outdated answers, suboptimal document retrieval, and poor text chunking can quickly dampen your initial enthusiasm.
>
> In this session, we'll leverage LangChain4j to elevate your RAG implementations. We'll explore:
>
> - Advanced Chunking Strategies: Optimize document segmentation for improved context and relevance.
> - Query Refinement Techniques: Expand and compress queries to enhance retrieval accuracy.
> - Metadata Filtering: Leverage metadata to pinpoint the most relevant documents.
> - Document Reranking: Reorder retrieved documents for optimal result presentation.
> - Data Lifecycle Management: Implement processes to maintain data freshness and relevance.
> - Evaluation and Presentation: Assess the effectiveness of your RAG pipeline and deliver results that meet user expectations.
>
> Join us as we transform your simplistic RAG experience from one of frustration to one that delights your users with meaningful and accurate answers.
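Before diving into the decks, here's a small taste of one of those retrieval techniques, query compression, with LangChain4j. This is a minimal sketch that assumes a chat model, a content retriever, and a simple `Assistant` AI service interface are already configured; the class names come from LangChain4j's RAG API, so double-check them against the version you use:

```java
// Query compression: a chat model rewrites the user query (taking the chat
// memory into account) into a single self-contained query before retrieval.
QueryTransformer queryTransformer = new CompressingQueryTransformer(chatModel);

RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    .queryTransformer(queryTransformer)
    .contentRetriever(contentRetriever) // e.g. backed by your vector store
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(chatModel)
    .retrievalAugmentor(retrievalAugmentor)
    .build();
```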
### Presentation slide deck {{< speakerdeck a2207c4bc9b9447da5a397107da19d0f >}} ### YouTube video recording {{< youtube 6_wUUYKBdE0 >}} ## RAG: from dumb implementation to serious results ### Abstract > It’s easy to get started with Retrieval Augmented Generation, but you’ll quickly be disappointed with the generated answers: inaccurate or incomplete, missing context or outdated information, bad text chunking strategy, not the best documents returned by your vector database, and the list goes on. > > After meeting thousands of developers across Europe, we’ve explored those pain points, and will share with you how to overcome them. As part of the team building a vector database we are aware of the different flavors of searches (semantic, meta-data, full text, multimodal) and embedding model choices. We have been implementing RAG pipelines across different projects and frameworks and are contributing to LangChain4j. > > In this deep-dive, we will examine various techniques using LangChain4j to bring your RAG to the next level: with semantic chunking, query expansion & compression, metadata filtering, document reranking, data lifecycle processes, and how to best evaluate and present the results to your users. ### Presentation slide deck {{< speakerdeck 5f7120a2dbeb4ffd917102321231cbc0 >}} ### YouTube video recording {{< youtube RN7thifOmkI >}} # A Gemini and Gemma tokenizer in Java It's always interesting to know _how the sausage is made_, don't you think? That's why, a while ago, I looked at [embedding model tokenization](https://glaforge.dev/posts/2024/02/05/visualize-palm-based-llm-tokens/), and I implemented a little [visualization](https://tokens-lpj6s2duga-ew.a.run.app/) to see the tokens in a colorful manner. Yet, I was still curious to see how Gemini would tokenize text... Both LangChain4j Gemini modules (from [Vertex AI](https://docs.langchain4j.dev/integrations/language-models/google-vertex-ai-gemini) and from [Google AI Labs](https://docs.langchain4j.dev/integrations/language-models/google-ai-gemini)) can count the tokens included in a piece of text. However, both do so by calling a REST API endpoint method called `countTokens`. This is not ideal, as it requires a network hop to get the token counts, thus adding undesired extra latency. Wouldn't it be nicer if we could count tokens locally instead? Interestingly, both Gemini and the open-weights [Gemma](https://ai.google.dev/gemma) models share the same tokenizer and token vocabulary. Also, the tokenizer is based on [SentencePiece](https://github.com/google/sentencepiece), which is a tokenizer/detokenizer implementing the byte-pair-encoding (BPE) and unigram language algorithms. If you look at the [Gemma code on HuggingFace](https://huggingface.co/google/gemma-2-9b-it/tree/main), you'll see a `tokenizer.json` file that you can open to see the available tokens in the vocabulary, and a `tokenizer.model` file which is some kind of binary compressed variation. Knowing that the list of tokens supported by Gemini and Gemma is available in those files, and how they are encoded, I was curious to see if I could implement a Java tokenizer that could run locally, rather than calling a remote endpoint. The `SentencePiece` implementation from Google is a C++ library, but I didn't really feel like wrapping it myself with JNI, and fortunately, I discovered that the [DJL](https://djl.ai/) project had done the JNI wrapping job already. So let's see how to tokenize text for Gemini and Gemma, in Java! 
## Gemini and Gemma tokenization in Java with DJL

First of all, let's set up the dependency on DJL's `SentencePiece` module:

- From Maven:

```xml
<dependency>
    <groupId>ai.djl.sentencepiece</groupId>
    <artifactId>sentencepiece</artifactId>
    <version>0.30.0</version>
</dependency>
```

- From Gradle:

```groovy
implementation 'ai.djl.sentencepiece:sentencepiece:0.30.0'
```

I saved the `tokenizer.model` file locally. Note that it's a 4MB file, as Gemini/Gemma have a very large vocabulary of around a quarter of a million tokens! Now, let's instantiate an `SpTokenizer` object that loads this vocabulary file, and tokenize some text:

```java
import ai.djl.sentencepiece.SpTokenizer;
// ...

Path model = Paths.get("src/test/resources/gemini/tokenizer.model");
byte[] modelFileBytes = Files.readAllBytes(model);

try (SpTokenizer tokenizer = new SpTokenizer(modelFileBytes)) {
    List<String> tokens = tokenizer.tokenize("""
        When integrating an LLM into your application to extend it and \
        make it smarter, it's important to be aware of the pitfalls and \
        best practices you need to follow to avoid some common problems \
        and integrate them successfully. This article will guide you \
        through some key best practices that I've come across.
        """);
    for (String token: tokens) {
        System.out.format("[%s]%n", token);
    }
    System.out.println("Token count: " + tokens.size());
}
```

When running this Java class, you'll see the following output:

```
[When]
[▁integrating]
[▁an]
[▁L]
[LM]
[▁into]
[▁your]
[▁application]
...
Token count: 61
```

## Next steps

Do we need next steps? Yes, why not! My idea is to contribute a tokenizer module to LangChain4j, so that the Vertex AI Gemini and the Google AI Gemini modules can both import it, instead of relying on remote endpoint calls to count tokens.

# AI Inktober — Generating ink drawings with Imagen 3

Every year in October, the [Inktober challenge](https://inktober.com/) takes place: every day of the month, you have to do a drawing representing the word of the day. The list of _prompts_ this year is the following:

![Inktober 2024 prompts](/img/ainktober/prompts.png)

I participated in some of the daily challenges over the past few years, but I never did all of them. But this year, just for fun, I thought I could ask Google's [Imagen 3](https://deepmind.google/technologies/imagen-3/) image model to draw for me! (Or at least to draw something I could try to reproduce.)

Of course, the goal of the challenge is not to generate images with the help of an AI. On the contrary, the idea is about the pleasure you can have drawing yourself, with your own hands! However, I was curious to see how Imagen would perform on such a challenge. So I fired up my favorite Java AI framework, [LangChain4j](https://docs.langchain4j.dev/), as it supports Imagen 3 as an image model.

> **Note:** Imagen 3 is generally available on Google Cloud's Vertex AI platform, but it's behind an _allow list_.
> So you have to [request access](https://docs.google.com/forms/d/1cqt9padvfMgqn23W5FMPTqh7bW1KLkEOsC5G6uC-uuM/viewform) to be able to use it.
You will need the following dependency:

- For Gradle users:

```groovy
implementation 'dev.langchain4j:langchain4j-vertex-ai:0.35.0'
```

- For Maven users:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-vertex-ai</artifactId>
    <version>0.35.0</version>
</dependency>
```

Now let's have a look at the code:

```java
import dev.langchain4j.data.image.Image;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.vertexai.VertexAiImageModel;

import java.nio.file.Path;

public class AInktober {
    public static void main(String[] args) {
        VertexAiImageModel imagenModel = VertexAiImageModel.builder()
            .endpoint(System.getenv("GCP_VERTEXAI_ENDPOINT"))
            .location(System.getenv("GCP_LOCATION"))
            .project(System.getenv("GCP_PROJECT_ID"))
            .publisher("google")
            .modelName("imagen-3.0-fast-generate-001")
            .aspectRatio(VertexAiImageModel.AspectRatio.SQUARE)
            .negativePrompt("watercolor, gray shades")
            .persistTo(Path.of("/tmp/imagen"))
            .build();

        String prompt = """
            A black and white ink drawing of a backpack, on a fully white background
            """;

        Response<Image> imageResponse = imagenModel.generate(prompt);
        System.out.println(imageResponse.content().url());
    }
}
```

- I have set up several environment variables containing my Google Cloud project details.
- I decided to use `imagen-3.0-fast-generate-001`, which generates images faster (and cheaper!) than `imagen-3.0-generate-001`, at the cost of a slightly lower quality (but for ink drawings, that's not really visible).
- I went with square images, but you can use landscape, portrait, and wider variants too.
- I added a negative prompt, because some images looked a bit more like watercolor at times, but I wanted images that are more black and white.
- I persist all the generated images into a temporary folder.
- My prompt contains the first subject of the day, a _"backpack"_, and I specify that I want a black and white ink drawing, but I also added that I wanted a white background, as sometimes the background can be fully black, or some sepia shade.

So what does the first image look like?

![Inktober 2024's backpack](/img/ainktober/ainktober-01-backpack.png)

It definitely looks like an ink drawing of a backpack!

Don't worry, I won't post a new article each day for the new daily image prompt. Instead, I'll share the other days on my usual social media channels (see the bottom of the blog to find them). Be sure to check out Imagen 3, it's pretty good!

# Lots of new cool Gemini stuff in LangChain4j 0.35.0

While [LangChain4j](https://docs.langchain4j.dev/) 0.34 introduced my [new Google AI Gemini module](https://glaforge.dev/posts/2024/09/05/new-gemini-model-in-langchain4j/), a new 0.35.0 version is already here today, with some more cool stuff for Gemini and Google Cloud! Let's have a look at what's in store!

## Gemini 1.5 Pro 002 and Gemini 1.5 Flash 002

This week, [Google announced](https://developers.googleblog.com/en/updated-production-ready-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/) the release of the new versions of the Gemini 1.5 models:

* `gemini-1.5-pro-002`
* `gemini-1.5-flash-002`

Of course, both models are supported by LangChain4j! The Google AI Gemini module also supports the `gemini-1.5-flash-8b-exp-0924` 8-billion parameter model.

Versions `002` come with:

* much improved math and reasoning capabilities (7%-20% increase depending on the benchmark),
* 2x faster output, and 3x lower latency,
* and also roughly a 50% price cut!

## Google Cloud Storage document loader

When implementing Retrieval Augmented Generation (RAG), you must load the documents from somewhere.
You can feed the docs directly in the context, but LangChain4j comes with the notion of [document loaders](https://docs.langchain4j.dev/tutorials/rag#document-loader). There are existing document loaders for the file system, for files at remote URLs, or for source files stored in GitHub. In this release, I've implemented a **Google Cloud Storage document loader**, which lets you reference documents stored inside cloud storage buckets.

Create a GCS document loader with the new builder:

```java
var gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .build();
```

Then you can load a single document, and parse it:

```java
Document document = gcsLoader.loadDocument(
    "BUCKET_NAME", "FILE_NAME.txt", new TextDocumentParser());
```

All the documents in a bucket:

```java
List<Document> documents = gcsLoader.loadDocuments(
    "BUCKET_NAME", new TextDocumentParser());
```

Or just a list of files filtered with a _glob_ pattern:

```java
List<Document> documents = gcsLoader.loadDocuments(
    "BUCKET_NAME", "*.txt", new TextDocumentParser());
```

## Vertex AI Ranking API

When implementing Retrieval Augmented Generation (RAG), your vector database returns a certain number of results, usually sorted by vector similarity. But the vectors with the highest similarity are not necessarily the best matches to answer a user query. To mitigate this problem, there are ranking or reranking APIs and models that exist to order results according to how well they match the query.

The Vertex AI platform from Google Cloud offers a [ranking API](https://cloud.google.com/generative-ai-app-builder/docs/ranking) for that purpose, a little-known API that deserves more awareness. I implemented a `ScoringModel` for this Vertex AI Ranking API:

```java
VertexAiScoringModel scoringModel = VertexAiScoringModel.builder()
    .projectId(System.getenv("GCP_PROJECT_ID"))
    .projectNumber(System.getenv("GCP_PROJECT_NUMBER"))
    .projectLocation(System.getenv("GCP_LOCATION"))
    .model("semantic-ranker-512")
    .build();

Response<List<Double>> score = scoringModel.scoreAll(Stream.of(
        "The sky appears blue due to a phenomenon called Rayleigh " +
        "scattering. Sunlight is comprised of all the colors of " +
        "the rainbow. Blue light has shorter wavelengths than other " +
        "colors, and is thus scattered more easily.",

        "A canvas stretched across the day,\n" +
        "Where sunlight learns to dance and play.\n" +
        "Blue, a hue of scattered light,\n" +
        "A gentle whisper, soft and bright."
    ).map(TextSegment::from).collect(Collectors.toList()),
    "Why is the sky blue?");

// [0.8199999928474426, 0.4300000071525574]
```

In the example above, a user asks _why the sky is blue_. The Ranking API attempts to determine which of the two excerpts best matches this question. The first excerpt appears to be an explanation of this atmospheric phenomenon, while the second sounds more like a poem. When scoring these text fragments, we observe that the first one has a higher value (0.82 vs. 0.43).

It is also possible to score just one piece of text with the `score(text, query)` and `score(segment, query)` methods.
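For instance, scoring a single piece of text against the query looks like this (the return type should be `Response<Double>`, worth double-checking against the LangChain4j version you use):

```java
// Scoring a single piece of text against the user query
Response<Double> score = scoringModel.score(
    "The sky appears blue due to a phenomenon called Rayleigh scattering.",
    "Why is the sky blue?");

System.out.println(score.content()); // e.g. a relevance score around 0.8
```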
Now what's interesting is that this LangChain4j notion of scoring models is also well integrated in the RAG pipeline:

```java
VertexAiScoringModel scoringModel = VertexAiScoringModel.builder()
    .projectId(System.getenv("GCP_PROJECT_ID"))
    .projectNumber(System.getenv("GCP_PROJECT_NUM"))
    .projectLocation(System.getenv("GCP_LOCATION"))
    .model("semantic-ranker-512")
    .build();

ContentAggregator contentAggregator = ReRankingContentAggregator.builder()
    .scoringModel(scoringModel)
    ...
    .build();

RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    ...
    .contentAggregator(contentAggregator)
    .build();

return AiServices.builder(Assistant.class)
    .chatLanguageModel(...)
    .retrievalAugmentor(retrievalAugmentor)
    .build();
```

When creating the AI service, you specify the chat model to use. Additionally, you can integrate a _retrieval augmentor_, which allows you to configure a _content aggregator_. The content aggregator, in turn, can specify a _scoring model_. This process involves three steps, but it enables you to leverage the ranking of semantic search results when implementing RAG. This means you can prioritize the most relevant results based on their semantic relevance to the query, not solely on their vector similarity.

## New parameters for the Vertex AI embedding models

Embedding models are critical for RAG, and LangChain4j has had support for the Google Cloud Vertex AI embedding models for a long time. But there are a couple of new flags that have recently been introduced:

```java
EmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
    .project(PROJECT_ID)
    .location("us-central1")
    .publisher("google")
    .modelName(MODEL_NAME)
    .autoTruncate(true)
    .outputDimensionality(512)
    .build();
```

The `autoTruncate(true)` method automatically truncates the text to embed to a maximum of 2048 tokens. If your input is longer than this limit, you would get an error from the model. With auto-truncation, no more error, but if your text is truncated, you might miss a bit of meaning from the part that was cut off.

The other new method is `outputDimensionality(512)`. The Vertex AI embedding models usually default to 768-dimensional vectors. However, our [latest embedding models](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) are [_Matryoshka_ embedding models](https://huggingface.co/blog/matryoshka), which means that the most meaningful values in the vector come first. So when you do vector comparisons, you can make calculations quicker if you focus on those first dimensions, and with this new method, you can directly return vectors with fewer dimensions.

## Google AI embedding model

Speaking of embedding models, if you use the Google AI Gemini model instead of the Vertex AI flavor, you can now also access our embedding models without relying on the Vertex AI models, thanks to the new embedding model for Google AI:

```java
var embeddingModel = GoogleAiEmbeddingModel.builder()
    .apiKey(GOOGLE_AI_GEMINI_API_KEY)
    .modelName("embedding-001")
    .maxRetries(3)
    .logRequestsAndResponses(true)
    .titleMetadataKey("title")
    .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
    .outputDimensionality(512)
    .build();
```

This new embedding model is the same as the one available from Vertex AI, and has the same feature set.
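As a quick usage sketch of the embedding model built above (the `embed()` and `dimension()` methods are the standard LangChain4j embedding APIs, worth double-checking against the version you use):

```java
// Embed a sentence and check the size of the resulting vector
Response<Embedding> embedding = embeddingModel.embed(
    "LangChain4j makes it easy to call embedding models from Java.");

System.out.println(embedding.content().dimension()); // 512, as configured above
```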
## Google AI Gemini token count estimation and tokenizer

The Google AI Gemini model implements the `TokenCountEstimator` interface, which means you can use the `estimateTokenCount()` method to count tokens:

```java
var gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(GOOGLE_AI_GEMINI_API_KEY)
    .modelName("gemini-1.5-flash")
    .build();

int countedTokens = gemini.estimateTokenCount(
    "What is the capital of France?");
```

There is also now a `GoogleAiGeminiTokenizer` class, implementing the misnamed `Tokenizer` interface (misnamed because it's not tokenizing text, it's just counting tokens):

```java
var geminiTokenizer = GoogleAiGeminiTokenizer.builder()
    .apiKey(GOOGLE_AI_GEMINI_API_KEY)
    .modelName("gemini-1.5-flash")
    .build();

int count = geminiTokenizer.estimateTokenCountInText("Hello world!");
```

Note that both the `estimateTokenCount()` method and the `GoogleAiGeminiTokenizer` call a remote API endpoint. They don't use a local tokenizer class to count the tokens, so those calls incur some network hops.

What's interesting with `Tokenizer`s is that they can be used by document splitters to split documents according to the number of tokens, rather than by characters or other boundaries:

```java
DocumentSplitter splitter = DocumentSplitters.recursive(
    maxSegmentSizeInTokens,
    maxOverlapSizeInTokens,
    geminiTokenizer);
```

Currently, only the Google AI module provides such a `Tokenizer` implementation, but you can use it to count tokens even if you work with the Vertex AI Gemini module. Later down the road, I think I'll also implement it for the Vertex AI module.

## Chat listener support

Both the Google AI Gemini and the Vertex AI Gemini modules implement the new chat listener support.

* Vertex AI Gemini:

```java
VertexAiGeminiChatModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .location(System.getenv("GCP_LOCATION"))
    .modelName("gemini-1.5-pro-002")
    .listeners(singletonList(listener))
    .build();
```

* Google AI Gemini:

```java
GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .modelName("gemini-1.5-flash-002")
    .listeners(singletonList(listener))
    .build();
```

Let's have a look at the listener interface, which allows you to listen to model requests, responses, and errors:

```java
public interface ChatModelListener {
    default void onRequest(ChatModelRequestContext reqContext) {...}
    default void onResponse(ChatModelResponseContext respContext) {...}
    default void onError(ChatModelErrorContext errContext) {...}
}
```

The various `*Context` parameters contain various details about how the model was parameterized, what the prompt was, or what error was encountered. It might be interesting to follow the recent [OpenTelemetry GenAI recommendations](https://opentelemetry.io/docs/specs/semconv/gen-ai/) and implement a listener that directly plugs into your observability solution!

## Enum structured output

I'll finish the laundry list of features with the **enum** structured output. The Gemini models have great support for structured output. Not only can you ask for JSON outputs, but you can also specify a JSON schema so that the model follows that schema when generating its JSON response. This is of utmost importance for deterministic, parseable results that fit well with your strongly typed programming language.

Gemini lets you return arbitrary JSON objects and arrays. But for tasks like classification or sentiment analysis, it is also able to return a single enum value, rather than a JSON object that would have a property containing the value.
* Vertex AI Gemini:

```java
VertexAiGeminiChatModel model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .location(System.getenv("GCP_LOCATION"))
    .modelName(GEMINI_1_5_PRO)
    .responseSchema(Schema.newBuilder()
        .setType(Type.STRING)
        .addAllEnum(Arrays.asList("POSITIVE", "NEUTRAL", "NEGATIVE"))
        .build())
    .build();

Response<AiMessage> response = model.generate(asList(
    SystemMessage.from(
        "Your role is to analyse the sentiment of user's messages"),
    UserMessage.from(
        "This is super exciting news, congratulations!")
));

System.out.println(response.content().text()); // POSITIVE
```

* Google AI Gemini:

```java
GoogleAiGeminiChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .modelName("gemini-1.5-flash")
    .responseFormat(ResponseFormat.builder()
        .type(JSON)
        .jsonSchema(JsonSchema.builder()
            .rootElement(JsonObjectSchema.builder()
                .properties(new LinkedHashMap<String, JsonSchemaElement>() {{
                    put("sentiment", JsonEnumSchema.builder()
                        .enumValues("POSITIVE", "NEUTRAL", "NEGATIVE")
                        .build());
                }})
                .build())
            .build())
        .build())
    .build();

ChatResponse response = gemini.chat(ChatRequest.builder()
    .messages(
        SystemMessage.from(
            "Your role is to analyse the sentiment of user's messages"),
        UserMessage.from(
            "This is super exciting news, congratulations!"
        )
    )
    .build());

System.out.println(response.aiMessage().text()); // POSITIVE
```

This is particularly useful for all sorts of classification tasks!

## Documentation updates

As I often used to say when working on the Apache Groovy project:

> "A feature doesn't exist if it's not documented."
>
> — Guillaume Laforge

With that motto in mind, I thought it was high time that I expanded the documentation for the Gemini-related pages of the LangChain4j documentation:

* [Google Cloud Storage document loader](https://docs.langchain4j.dev/integrations/document-loaders/google-cloud-storage)
* [Google Cloud Ranking API](https://docs.langchain4j.dev/integrations/scoring-reranking-models/vertex-ai)
* [Vertex AI embedding models](https://docs.langchain4j.dev/integrations/embedding-models/google-vertex-ai)
* [Google AI Gemini models](https://docs.langchain4j.dev/integrations/language-models/google-ai-gemini)
* [Google Cloud Vertex AI Gemini models](https://docs.langchain4j.dev/integrations/language-models/google-vertex-ai-gemini)

## Tell me what you use LangChain4j's Gemini support for!

I'm really curious about what you are developing with LangChain4j, and even more so if you're using the various Gemini components. Don't hesitate to reach out to me via the social media platforms mentioned below! I'm also interested in the features you'd like to see prioritized and implemented.

# Some advice and good practices when integrating an LLM in your application

When integrating an LLM into your application to extend it and make it smarter, it's important to be aware of the pitfalls and best practices you need to follow to avoid some common problems and integrate it successfully. This article will guide you through some key best practices that I've come across.

## Understanding the Challenges of Implementing LLMs in Real-World Applications

One of the first challenges is that LLMs are constantly being improved. This means that the model you start using could change under the hood, and suddenly your application doesn't work as it did before. Your prompts might need adjustments to work with the newer version, or worse, they might even lead to unintended results!
Furthermore, you need to consider how to effectively manage your prompts, especially when your applications start to get more complex. Prompts can easily become convoluted and difficult to modify. Imagine having to comb through a hundred lines of code in your application to find a specific prompt just to tweak a single word. That's why **prompt externalization** (not keeping your prompts inside your source files) is going to be important, to easily find your prompts, and have a birds-eye view of all of them! Keeping track of changes and ensuring that the LLM's behavior remains consistent throughout your development process is another challenge. How can you ensure that a particular feature still functions correctly after upgrading your prompts, or even changing model versions? You need to **version your prompts** (we'll cover that in more details in a moment). Think of your prompts like code — just like your software code, prompts should have version control for easy management. Versioning ensures that you can quickly revert to previous versions if necessary, while providing a helpful audit trail to see exactly what changes have occurred in your prompt management process. ## Prompt Engineering for Consistent and Effective LLM Applications You've probably noticed that one of the main things that determines how well your application works with a Large Language Model (LLM) is the _prompt_ you use to guide it. Prompts act like a guidebook for the LLM, explaining what you expect from it and how it should format its response. You've likely heard about the importance of using good prompts, but how do you go about creating prompts that are reliable and adaptable in the long run? Think of your prompts like code artifacts. Just as you version your code to keep track of changes and ensure consistency, you should also version your prompts. This allows you to: - **Keep track of your prompt evolution:** You'll have a clear record of how your prompts have changed over time, which helps you understand the application's evolving behavior. - **Create a helpful audit trail:** Having versions of your prompts will help you understand exactly how the application behaved at specific times. This is essential for debugging, diagnosing issues, or understanding how user feedback impacted your application. - **Enable rollbacks:** If you encounter an issue or want to test different prompt versions, you can easily revert to a previous state to ensure that you can isolate problems, revert to previously working versions, or simply experiment with different phrasing. But simply versioning prompts isn't enough. Imagine you need to make a change to one particular prompt in a massive LLM-powered application. It might involve a lot of tedious code-hunting. That's where **prompt externalization**, that we mentioned earlier, comes in! Externalizing prompts is all about taking them out of your code and treating them like a separate configuration file. This way, they are: - **Easy to modify:** Changing your prompts becomes a breeze. Just go to your external prompt file, make the adjustments, and you're ready to go! No more scouring through complex code to find a single prompt in some string variables somewhere. - **More flexible:** By using externalized prompts, you can easily experiment with different versions or phrasing without rewriting your entire application. This lets you quickly adapt your prompts in response to user feedback or changes in your model. 
- **Easier to manage:** Keeping prompts in their own dedicated file makes it easy to maintain them, making sure that your prompts are consistent and up-to-date. This approach becomes increasingly valuable as your applications become more complex and you have a growing set of prompts to maintain. There are open source projects, or open formats that have emerged recently, to externalize prompts. For examples Firebase's [GenKit](https://firebase.google.com/docs/genkit) LLM framework came up with their [dotPrompt](https://firebase.google.com/docs/genkit/dotprompt) format, which not only externalizes the prompt itself, but also the name of the model, its configuration (temperature, etc.) ## Model Versioning: Preventing Surprises Now let's cover the importance of managing model versions, a critical aspect of ensuring that your LLM-powered application continues to work reliably. Imagine that you've built a great application using a particular model, and you're proud of the results. However, what happens when the LLM provider releases an updated version? It might offer performance improvements, but the updates can also change how the model responds to your prompts, potentially leading to unexpected issues or even breaking your application. To avoid these unexpected changes, the key principle is to **pin the specific version of the LLM model** that you use for your application. For example, when using Gemini 1.5 Pro, if you use the version `gemini-1.5-pro`, you're actually using the latest version of the model. Currently, it's `gemini-1.5-pro-001`. But if tomorrow Google releases `gemini-1.5-pro-002`, your application would suddenly start using that new version. So be very explicit in the model version. Here's why this is essential: - **Avoid Drifting Model Behavior:** The update to an LLM might come with subtle changes that can lead to a shift in the model's responses, and you may not always be able to anticipate these changes beforehand. This can lead to inconsistency, where a prompt that generated a certain output in one version of the model generates a completely different (and perhaps undesirable) output in a newer version. - **Maintain Application Consistency:** To keep your application performing reliably, you want to control the LLM’s responses as much as possible, and pinning the model version ensures that you can do this. If you're using a specific model, the prompts that are part of your application work in the context of that model's specific training and behaviors. Pinning the version helps you avoid unexpected changes that may interfere with your prompts' effectiveness. - **Simplify Auditing and Debugging:** In case of an unexpected issue or an unexplained change in your LLM's behavior, being able to easily trace back the specific model version that's running provides invaluable context for debugging and understanding why those changes occurred. It helps isolate issues to specific model versions, so you can resolve them quicker. While using the latest and greatest LLM version might seem tempting for its improved capabilities, remember: **the consistent performance and reliability of your application should be a top priority.** By pinpointing the model version you use, you gain better control over its behavior and maintain a smooth and predictable experience for your users. ## Optimizing for Efficiency: The Power of Response Caching Even with well-crafted prompts, pinned versions, generating responses from a Large Language Model (LLM) can still be expensive. 
This is where **response caching** comes in, offering a crucial way to improve both the performance and the cost-efficiency of your application. Models like [Gemini](https://deepmind.google/technologies/gemini/) support [context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview). Quoting the documentation: > Use context caching to reduce the cost of requests that contain repeat content with high input token counts. Cached context items, such as a large amount of text, an audio file, or a video file, can be used in prompt requests to the Gemini API to generate output. Requests that use the same cache in the prompt also include text unique to each prompt. For example, each prompt request that composes a chat conversation might include the same context cache that references a video along with unique text that comprises each turn in the chat. The minimum size of a context cache is 32,768 tokens. By **caching frequently used responses** or heavy multimodal documents, you avoid having to generate them over and over again, leading to a dramatic improvement in performance and a reduction in LLM usage costs. Imagine users frequently asking the same question, like “What are the benefits of using your app?". By caching the response to this question, you'll be able to provide users with a fast and efficient response without burdening the LLM each time. But how do you actually implement caching? You can choose different strategies for your caching system, each with its own benefits: - **Context Caching:** If your model, like Gemini, supports caching already, be sure to understand how it works, what can be cached or not, the pros and cons, or potential limitations. - **Basic Caching:** Store LLM responses based on the exact input. If you encounter a query that you've already generated, you can provide a pre-cached response, saving on processing time. You could also do some minimal string modifications to normalize whitespace, put everything in lowercase, etc, to get the chance to cache very similar prompts. - **Advanced Caching with Similarity Search:** Even with string normalization, you might find that users don't always ask the exact same question, but the query can still be extremely similar. Think of typos, minor word substitutions, synonyms, or variations in punctuation. Instead of treating every query as unique, consider **approximate nearest neighbor search** and **embedding vector similarity**. This approach helps you find queries that are nearly identical, even with minor variations. You can then leverage this functionality to serve the same cached response for queries that are semantically similar, increasing the effectiveness of your caching strategy and ensuring that you only compute distinct queries once. For vector similarity, make sure to test with different inputs, to find the right threshold to say that a new prompt is equivalent to an older cached prompt/response pair. Caching responses not only speeds up your LLM-powered application, lowering the perceived latency, but also significantly cuts down on LLM usage costs, helping you keep your application running smoothly while maximizing cost-effectiveness. ## Building Safeguards: Ensuring Robustness with Guardrails Let's shift our focus to building safety mechanisms. This is crucial for creating reliable, trustworthy applications. 
Enter the concept of **guardrails**, which are safety systems designed to protect your application and users from unexpected, unwanted, or even harmful outcomes. Think of guardrails like a protective fence, ensuring that the LLM stays within safe boundaries while performing its tasks. Imagine if someone tried to make an inappropriate request, or worse, a request that could cause harm. This is where guardrails step in. Guardrails serve two main purposes: - **Input Validation:** Guardrails can examine the user input and determine whether it's acceptable for your application and whether it aligns with your intended use case. Imagine preventing your LLM from processing prompts with malicious language or data that could cause harm to users. - **Output Filtering:** Guardrails are not only for examining the user's input but also for checking the outputs of the LLM. By analyzing the LLM's generated responses, you can filter out inappropriate content or responses that don't meet your requirements. What are the three primary types of guardrails? - **Model's safety settings:** Models have usually been fine tuned to avoid certain harmful content in both input and output. They also give you access to safety settings, with different harm categories and safety thresholds. You should test those settings and how they can be configured for your use case. For example, have a look at the available [safety filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters) for Gemini. - **Static Guardrails:** These are predefined rules that are set before the LLM begins to process any input. These can be rules that dictate certain formatting for prompts, limitations on input length, or even basic checks for prohibited terms or requests. Static guardrails offer fast processing times, since the checks are performed beforehand on the input strings, in your own code. - **Dynamic Guardrails:** These are flexible guardrails that work in conjunction with the LLM or the embedding model used for text classification, continuously adapting to changes in user input or the output of the model itself. They allow you to handle complex or unpredictable situations and perform nuanced checks to maintain the safety and integrity of your application. You might have a look at Google Cloud Natural Language Processing's [moderation endpoint](https://cloud.google.com/natural-language/docs/moderating-text), or the free [Perspective API](https://perspectiveapi.com/) used by newspapers. When you implement guardrails, it's also critical to consider performance impact. You want to make sure that these safeguards don't add too much latency and delay user experience. That's where **parallel processing** can come into play! Instead of waiting for the guardrail check to finish before starting the LLM generation, consider launching both tasks in parallel, optimizing speed and efficiency without sacrificing safety. Then, if the guardrails raise a red flag, you can stop the response generation, and reply right away to the user that the input content was problematic. For the response, unless you have a streaming kind of guardrail system, you might have to wait for the whole response to be generated before evaluating it with the guardrail, in which case, you can't really do parallel processing. Always remember: Guardrails should be continuously refined and updated as you identify new potential risks. Gathering feedback from users, giving them the ability to report a bad response is one approach. 
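Coming back to the parallel-processing idea above, here's a minimal sketch of what launching the input guardrail check and the generation concurrently could look like. The `guardrail` object and its `isAcceptable()` method are hypothetical, `model.generate()` stands for any blocking chat call, and the usual `java.util.concurrent` imports are assumed:

```java
// Minimal sketch: run the input guardrail and the LLM generation in parallel,
// and only keep the generated answer if the input is deemed acceptable.
// guardrail.isAcceptable() is a hypothetical check (e.g. a moderation API call).
ExecutorService executor = Executors.newFixedThreadPool(2);

CompletableFuture<Boolean> inputIsSafe =
    CompletableFuture.supplyAsync(() -> guardrail.isAcceptable(userMessage), executor);
CompletableFuture<String> draftAnswer =
    CompletableFuture.supplyAsync(() -> model.generate(userMessage), executor);

String reply = inputIsSafe.thenCompose(safe -> {
    if (!safe) {
        // Stop waiting for the generation (the remote call itself isn't interrupted)
        draftAnswer.cancel(true);
        return CompletableFuture.completedFuture(
            "Sorry, I can't help with that request.");
    }
    return draftAnswer;
}).join();

executor.shutdown();
```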
You should also monitor your application's LLM responses and do some random vibe-checks to ensure your application is behaving correctly.

## Evaluating and Monitoring for Consistent Performance

One of the most crucial aspects of any application, regardless of its technology, is **evaluation and monitoring**. This is essential for ensuring your LLM-powered application continues to function reliably and meets your expectations as it interacts with users in the real world.

Imagine you make an update to your application, or perhaps even a simple tweak to one of your prompts. Without proper monitoring, you won't know if those changes had unintended consequences. You could end up with an app that gives unexpected results, leads to user frustration, or even creates unforeseen safety issues. That's where a robust evaluation and monitoring framework comes into play! Your LLM-powered app needs a systematic way to ensure that everything is running smoothly and effectively. You need to:

- **Establish Evaluation Metrics:** You need clear guidelines to judge the LLM's performance. Think of key metrics like accuracy, relevance, and coherence.
  - **Accuracy:** This measures how often the LLM generates correct and factually accurate responses. This is particularly crucial if your application is designed for providing reliable information or carrying out fact-based tasks.
  - **Relevance:** You need to make sure the LLM stays focused on the core issue. It should respond to your prompts in a meaningful and helpful way, instead of giving irrelevant or off-topic responses.
  - **Coherence:** You need to check if the LLM produces well-written and logical text. Coherent responses are easily understood by users and don't leave them feeling confused or disoriented.
- **Gather User Feedback:** It's essential to go beyond just numbers. Your application's performance shouldn't just be evaluated on your own terms. Get feedback from the users, gather data on how they are using the application, and check their satisfaction with the outputs of your application. You can even ask users to provide their opinions on specific generated answers, giving you valuable insights into what resonates with them and how you can improve. Consider using tools like "thumbs up" or "thumbs down" buttons, offering an easy way for users to indicate their sentiment towards the LLM's responses, or a way to report and explain what wasn't up to the level of their expectations.
- **Build a "Golden Responses" Dataset:** Create a collection of carefully chosen inputs and their desired, accurate responses. These "golden" examples act like benchmarks, helping you measure how closely the LLM matches your expected results for specific tasks (see the short sketch after this list). By periodically checking how your LLM performs against these golden examples, you can get a clear picture of potential issues and make necessary adjustments. You can use this set as a starting point to track potential regressions and make sure the LLM's behavior is aligned with your expectations.
- **Implement Continuous Monitoring:** Monitoring shouldn't be a one-time event. It's an ongoing process, like keeping a watchful eye on your application as it functions in the real world. By monitoring in real-time, you can detect anomalies, unexpected issues, or performance regressions promptly. It allows you to address these issues before they cause significant problems for your users.
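Here's the sketch mentioned in the "golden responses" item above: a deliberately naive regression check over a couple of hypothetical golden pairs, using a plain keyword match and a generic `chatModel.generate()` call. In practice you'd rather use an LLM-as-judge or semantic similarity, but the principle is the same:

```java
// A deliberately naive regression check against a few "golden" examples.
// chatModel.generate() stands for any blocking chat model call.
record GoldenExample(String prompt, String expectedKeyword) { }

List<GoldenExample> goldenSet = List.of(
    new GoldenExample("What is the capital of France?", "Paris"),
    new GoldenExample("Who created the Java language?", "Gosling")
);

for (GoldenExample example : goldenSet) {
    String answer = chatModel.generate(example.prompt());
    boolean pass = answer.toLowerCase()
                         .contains(example.expectedKeyword().toLowerCase());
    System.out.printf("%s -> %s%n", example.prompt(), pass ? "PASS" : "FAIL");
}
```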
You can also check out the recent [OpenTelemetry guidelines for Gen AI](https://opentelemetry.io/docs/specs/semconv/gen-ai/) to observe how your system and LLM are performing live.

You can further improve your LLM-powered application by analyzing users' requests and the responses generated by the LLM, especially those flagged by users as problematic or unexpected. These can be added to your collection of golden responses, constantly refining the process of evaluation. This helps your application evolve based on real-world interactions.

## Addressing Data Privacy Concerns

Another important topic to keep in mind: **data privacy**. LLMs have access to a vast amount of text data, which makes them incredibly powerful. But this same power brings with it the responsibility of safeguarding sensitive information. If your application handles user data, you need to ensure that you're handling it with the utmost care, protecting it from unauthorized access and ensuring that you comply with relevant privacy regulations.

Think of data privacy as a trust contract. You, as the developer, are entrusted with safeguarding the sensitive information of your users. It's your responsibility to implement measures that keep this data secure and prevent breaches or misuse.

Here are some key steps to address data privacy concerns in your LLM application:

- **Implement strong security measures:** Use robust encryption methods to secure your application and data. Employ security best practices such as access controls, secure storage, and secure communication channels.
- **Stay aligned with data privacy regulations:** Comply with relevant privacy regulations like GDPR, CCPA, and HIPAA. You might need to review your data handling policies and make the necessary adjustments.
- **Ensure data anonymization:** When working with sensitive data, always strive to anonymize or pseudonymize it to the fullest extent possible. You can use techniques like differential privacy, aggregation, or removing identifying details to protect user information (with the [Google Cloud Data Loss Prevention](https://cloud.google.com/security/products/dlp) API, for example).
- **Be transparent with users:** Communicate clearly with your users about how you collect, use, and store their data. Offer users options to control their data, and provide mechanisms to update or delete their information if needed.

By prioritizing data privacy in your LLM application, you not only uphold ethical standards but also build trust with your users. They should be confident that their information is being handled with respect and care, which encourages long-term trust in your application.

## Tailoring LLMs for Specific Business Goals

Remember that LLMs are tools, and the success of your LLM application ultimately hinges on aligning its capabilities with your unique goals and your target audience. So, how do you get the most out of an LLM in your business?

**First, define your goals.** What specific tasks can an LLM help you accomplish? What pain points are you trying to solve? Once you understand the big picture, you can break down those goals into actionable tasks that the LLM can potentially assist with.

**Then, it's time to find the right LLM for the job.** Not all LLMs are created equal. Different models excel at specific tasks, have varying levels of language support, and even require different levels of computational resources. For example, if your business uses many different languages, you’ll want an LLM with multilingual support.
To select the best LLM for your needs, ask yourself:

- **What specific task does this LLM need to perform?** Different LLMs excel at different tasks like text generation, summarization, or translation.
- **How does the LLM's accuracy compare with the level of accuracy required for your application?** The model needs to generate results with the appropriate level of precision for your goals.
- **How much computational power is needed to run this LLM?** Consider your budget and available infrastructure if you plan to host the model yourself. A cloud-hosted model might be better (and cheaper) depending on your usage patterns, or if you don't want the hassle of handling your own infrastructure and GPUs.
- **What language capabilities does the LLM offer?** Is the model good at the languages you need, and are there specific domains where the model is particularly strong? It's not just about spoken languages: for code as well, some models may be better at dealing with a particular programming language than another.

You can often find models with specialized skills. You may find, for example, a model trained on scientific papers to be a good fit if your work requires processing highly technical content, or a model trained on a particular field, such as legal texts, to be highly effective in that domain.

Once you’ve chosen your LLM, the next step could be **fine-tuning**, where you’d tailor the model to your specific needs. It’s like customizing a tool to do the exact job you need it to do. For example, imagine your application is helping people book vacations. You can train the model on a massive amount of vacation-related text data so it can accurately understand and respond to vacation-specific questions, making your application highly relevant for its intended purpose. But fine-tuning is not necessarily for the faint of heart, and can be complicated to do right.

While choosing and fine-tuning are critical steps, **assessing potential risks is equally important.** Think about potential unintended consequences. LLMs, for example, might not always be factual or accurate in their responses. You'll need to find ways to manage those potential issues, often incorporating guardrails to mitigate potential harms or biases, or implementing techniques like Retrieval Augmented Generation to ground the model's responses on your own data and documents.

Ultimately, you'll want to make your application a tool that not only works reliably but also gives real value to your business. By understanding your business goals, choosing the right model, customizing it effectively, and understanding the potential risks, you’re on the right path to success!

## Looking Ahead: Emerging Trends and Future Directions

Remember that this field is constantly changing! New capabilities are emerging, and existing models are getting even smarter and more efficient. This is an exciting time to be working with LLMs because the possibilities feel endless!

While it's fantastic to get your application off the ground using the latest LLMs, it's equally important to be open to continuous improvement. What's great today may not be optimal in the future. The world of LLMs is one where ongoing development is key! Here are a few tips:

- **Embrace continuous learning.** You should always be seeking out information about the newest developments in the field, how LLMs are being enhanced, and the impact those changes could have on your applications.
Look out for improvements to existing models, new models coming out, and fresh research.
- **Think ahead.** What new features could you integrate into your application to take advantage of those advancements? Would your app benefit from a specific, task-oriented model that focuses on summarization, question answering, or code generation? Maybe there's a model out there that will significantly boost performance and help you offer a smoother, more feature-rich experience for your users!
- **Prepare for evolution.** Remember that LLMs aren’t static! Your app should be built with a framework for easy adaptation. Consider how you can adapt to model updates or new model releases in a structured way, perhaps by putting in place frameworks for incorporating new models seamlessly and managing prompt changes across models, like [LangChain4j](https://docs.langchain4j.dev/) (if you're a Java developer), which offers higher-level abstractions and lets you switch models easily.

The landscape of LLMs is evolving rapidly. Stay up-to-date with the latest developments and ensure your applications can adapt, allowing you to unlock the full potential of LLMs for your business!

# New Gemini model in LangChain4j

A new version of [LangChain4j](https://docs.langchain4j.dev/), the super powerful LLM toolbox for Java developers, was released today. In [0.34.0](https://github.com/langchain4j/langchain4j/releases), a new Gemini model has been added. This time, this is not the Gemini flavor from Google Cloud Vertex AI, but the [Google AI](https://ai.google.dev/gemini-api/) variant. It was a frequently requested feature from LangChain4j users, so I took a stab at developing a new chat model for it during my summer vacation break.

## Gemini, show me the code!

Let's dive into some code examples to see it in action! But first, you'll need an API key. So just follow the instructions to [obtain your Gemini API key](https://ai.google.dev/gemini-api/docs/api-key). I've saved mine in the `GEMINI_AI_KEY` environment variable, so that I don't have to hardcode it in my source files.

The code examples below have been compiled with Java 21. I've imported the following libraries in my build tool:

- `dev.langchain4j:langchain4j-google-ai-gemini:0.34.0`
- `dev.langchain4j:langchain4j-core:0.34.0`
- `dev.langchain4j:langchain4j:0.34.0`

### Let's be polite and say hello

My mom always told me to be polite and to say hello:

```java
ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .build();

String response = gemini.generate("Konnichiwa Gemini!");
System.out.println("Gemini> " + response);
// Gemini> Konnichiwa! It's nice to hear from you.
// What can I do for you today?
```

### Don't you like strawberries?

In the first example, I used the usual `generate()` method to send my greeting to Gemini. But LangChain4j 0.34 introduces some new signatures and classes to interact with an LLM:

- `ChatRequest`: a new class that contains your conversation messages, the tools this request can use, and a response format definition to decide what the shape of the output should be
- `ChatResponse`: this class holds the LLM's response, the token usage information, and the _finish_ reason (i.e. whether the response was cut, filtered, or generated to the end)
- `ChatResponse chat(ChatRequest req)`: this new method is added to the LLM contract to interact with it.
```java ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GEMINI_AI_KEY")) .modelName("gemini-1.5-flash") .build(); ChatResponse chatResponse = gemini.chat(ChatRequest.builder() .messages(UserMessage.from( "How many R's are there in the word 'strawberry'?")) .build()); String response = chatResponse.aiMessage().text(); System.out.println("Gemini> " + response); // Gemini> There are **three** R's in the word "strawberry". ``` ### Let's roll the JSON dice! Both Gemini 1.5 Flash and Pro allow you to specify that the output should be valid JSON. It's sometimes called the **JSON mode**. ```java ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GEMINI_AI_KEY")) .modelName("gemini-1.5-flash") .responseMimeType("application/json") .build(); String roll = gemini.generate("Roll a 6-sided dice"); System.out.println(roll); // {"roll": "3"} ``` Gemini will always reply with valid JSON structures. Here, the JSON object key is not always `roll`, and is sometimes `die`, `dice_roll`, etc. But you could tweak your prompt to ask for a specific key name. Gemini follows the instructions very precisely, but it's not guaranteed 100% that it will really use the requested key name. But fear not, there's an even more powerful solution, thanks to response formats! ### Let's cook something with our strawberries You can configure Gemini to make it generate outputs that comply with a JSON schema. It's sometimes called **controlled generation**, or **constrained decoding**. Let's say we have a schema that represents recipes! It's time to do something with our strawberries! ```java ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder() .apiKey(System.getenv("GEMINI_AI_KEY")) .modelName("gemini-1.5-flash") .responseSchema(JsonSchema.builder() .rootElement(JsonObjectSchema.builder() .properties(Map.of( "title", JSON_STRING_SCHEMA, "preparationTimeMinutes", JSON_INTEGER_SCHEMA, "ingredients", JsonArraySchema.builder() .items(JSON_STRING_SCHEMA) .build(), "steps", JsonArraySchema.builder() .items(JSON_STRING_SCHEMA) .build() )) .build()) .build()) .build(); String recipeResponse = gemini.generate( "Suggest a dessert recipe with strawberries"); System.out.println(recipeResponse); /* {"ingredients": ["1 pint fresh strawberries, hulled and sliced", "1/2 cup sugar", "1/4 cup water", "1 tablespoon lemon juice", "1/2 teaspoon vanilla extract", "1 cup heavy cream, whipped"], "preparationTimeMinutes": 30, "steps": ["In a saucepan, combine the sugar, water, and lemon juice. Bring to a boil over medium heat, stirring until the sugar is dissolved.", "Reduce the heat to low and simmer for 5 minutes, or until the syrup thickens slightly.", "Remove from heat and stir in the vanilla extract.", "Pour the syrup over the strawberries in a bowl and stir to coat.", "Refrigerate for at least 30 minutes, or until chilled.", "To serve, top the strawberries with whipped cream and enjoy!"], "title": "Strawberry Shortcake"} */ ``` Gemini strictly follows the specified JSON schema, and generates a JSON object that matches. This is particularly important when you integrate LLMs in your application. You want a deterministic format for the output that can easily be parsed and handled by your system. ### Tasty strawberries from Japan! A few months back, I had the chance to visit Japan with my family, and they have some really gorgeous and tasty strawberries there! And don't get me started on strawberry daifukus (mochis with fruits inside) we had in Osaka! 
But before tasting those lovely confections, we need to plan our trip to Japan. In the previous example, you might have found it a bit painful to describe the JSON schema. For integration in a Java application, you might have some more complex data structures to represent, so writing out the big schema that defines them can be tedious. Fortunately, there's a little trick to get the JSON schema for a Java class (or record, enum, etc.).

Let's define and describe our trip itinerary object:

```java
@Description("details of a trip itinerary")
record TripItinerary(
    String country,
    Integer numberOfPersons,
    Month month,
    @Description("key highlights when visiting the city")
    List<CityHighlights> cityHighlights
) {
    enum Month {
        JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE,
        JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER
    }

    record CityHighlights(
        String cityName,
        List<String> visitHighlights
    ) { }
}
```

The itinerary is represented by some records, enums, and lists, and the `@Description` annotation can help the LLM better understand what some elements might be about (in particular when you have cryptic field names; here it's not strictly necessary, as Gemini is smart enough to understand what each field is about).

Now let's ask for our Japan itinerary:

```java
ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .temperature(2.0)
    .responseSchema(jsonSchemaFrom(TripItinerary.class).get())
    .build();
```

We derive the JSON schema from the `TripItinerary` class. No need to tediously craft a JSON schema for it. Let's see what Gemini suggests for our visit:

```java
Response<AiMessage> tripResponse = gemini.generate(
    SystemMessage.from("You are an expert trip planner"),
    UserMessage.from("""
        Suggest an itinerary for Japan.
        Cities visited: Tokyo, Kyoto, Osaka.
        Trip for a family of 4 persons.
        Provide key highlights for each city visited.
        """)
);
```

We tell Gemini to act as an expert trip planner, and we give some details about the travellers and the cities we'd like to visit. So what's the JSON structured output for this Japan visit?
```java System.out.println(tripResponse); /* {"cityHighlights": [{"cityName": "Tokyo", "visitHighlights": ["Explore the vibrant Shibuya Crossing and the iconic Shibuya Scramble.", "Visit the Meiji Jingu Shrine, a serene oasis in the heart of Tokyo.", "Experience the fascinating world of technology at the Miraikan National Museum of Emerging Science and Innovation.", "Enjoy a traditional tea ceremony at one of Tokyo's many teahouses.", "Get lost in the eclectic streets of Harajuku and admire the unique fashion styles.", "Embark on a scenic boat trip on the Sumida River, passing by Tokyo Skytree.", "Indulge in a delightful sushi dinner at a renowned Tsukiji Fish Market.", "Discover the charm of Ueno Park, home to museums, temples, and the Ueno Zoo."]}, {"cityName": "Kyoto", "visitHighlights": ["Wander through the serene gardens of the Golden Pavilion (Kinkaku-ji).", "Immerse yourselves in the rich history of the Kiyomizu-dera Temple, famous for its wooden stage.", "Explore the ancient Gion district, known for its traditional wooden buildings and geisha houses.", "Stroll through the Fushimi Inari Shrine, famous for its thousands of red torii gates.", "Discover the treasures of the Nishiki Market, offering a diverse selection of food and crafts.", "Experience a traditional geisha performance at one of Kyoto's exclusive theaters.", "Learn the art of calligraphy at a traditional workshop in the Gion district.", "Relax in the serene atmosphere of the Ryoan-ji Zen Garden.", "Witness the beauty of the Arashiyama Bamboo Grove."]}, {"cityName": "Osaka", "visitHighlights": ["Experience the vibrant Dotonbori district, renowned for its neon lights, street food, and entertainment.", "Explore the Osaka Castle, a historic landmark and symbol of the city.", "Enjoy the breathtaking panoramic views from the Abeno Harukas, Japan's tallest skyscraper.", "Visit the Osaka Aquarium Kaiyukan, home to diverse marine life from around the world.", "Stroll through the lively Kuromon Market, known for its fresh seafood and local produce.", "Take a scenic ride on the Osaka Ferris Wheel, offering views of the cityscape.", "Indulge in the delicious okonomiyaki, Osaka's signature dish.", "Experience the unique culture of the Sumiyoshi Taisha Shrine, dedicated to the gods of seafaring."]}], "country": "Japan", "month": "MARCH", "numberOfPersons": 4} */ ``` Damn! It didn't even mention the most delicious daifukus we had in Osaka! ### Can I go outside without my umbrella in Osaka, tonight? Speaking of visiting Osaka and those great daifukus, what's the weather like there? It's been raining a lot in Paris today, so I'm curious if it's better in Osaka. This new Gemini chat model works with LangChain4j's higher-level abstractions: `AiServices`, to create some very powerful LLM based apps, like smart agents or RAG (Retrieval Augmented Generation). We'll have a look at a great use case for LLMs: **data extraction from free-form text**. 
Let's define and describe a weather forecast record:

```java
record WeatherForecast(
    @Description("minimum temperature") Integer minTemperature,
    @Description("maximum temperature") Integer maxTemperature,
    @Description("chances of rain") boolean rain
) { }
```

We'll also create an interface for our weather service contract:

```java
interface WeatherForecastAssistant {
    WeatherForecast extract(String forecast);
}
```

Let's configure Gemini, instantiate our weather assistant, and extract the weather forecast from today's newspaper:

```java
ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .build();

WeatherForecastAssistant forecastAssistant =
    AiServices.builder(WeatherForecastAssistant.class)
        .chatLanguageModel(gemini)
        .build();

WeatherForecast forecast = forecastAssistant.extract("""
    Morning: The day dawns bright and clear in Osaka, with crisp autumn air and sunny skies.
    Expect temperatures to hover around 18°C (64°F) as you head out for your morning stroll through Namba.
    Afternoon: The sun continues to shine as the city buzzes with activity.
    Temperatures climb to a comfortable 22°C (72°F).
    Enjoy a leisurely lunch at one of Osaka's many outdoor cafes,
    or take a boat ride on the Okawa River to soak in the beautiful scenery.
    Evening: As the day fades, expect clear skies and a slight chill in the air.
    Temperatures drop to 15°C (59°F).
    A cozy dinner at a traditional Izakaya will be the perfect way to end your day in Osaka.
    Overall: A beautiful autumn day in Osaka awaits, perfect for exploring the city's vibrant streets,
    enjoying the local cuisine, and soaking in the sights.
    Don't forget: Pack a light jacket for the evening and wear comfortable shoes
    for all the walking you'll be doing.
    """);

System.out.println("Gemini> " + forecast);
// Gemini> WeatherForecast[
//   minTemperature=15,
//   maxTemperature=22,
//   rain=false]
```

Awesome, no need for my umbrella! What's great here is that we're dealing with a real type-safe Java object, not JSON strings like before. So it integrates very well within our Java codebase!

### Time for a little coding quiz

Alright, after the touristic detour, let's get back to some math, and some coding. LLMs are quite good at reasoning, in particular when you encourage them to think _step by step_. But sometimes they fall short and can't really calculate results. They're language models, not calculators, right? Gemini has the ability to create Python scripts, and to execute them in a sandbox. So how can we configure Gemini to solve a little math problem?

```java
ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .allowCodeExecution(true)
    .includeCodeExecutionOutput(true)
    .build();
```

There are 2 builder methods:

- `allowCodeExecution(true)`: to let Gemini know it can do some Python coding
- `includeCodeExecutionOutput(true)`: if you want to see the actual Python script it came up with, and the output of its execution

Do you know off the top of your head what `fibonacci(22)` or `ackermann(3, 4)` is? Let's ask Gemini:

```java
Response<AiMessage> mathQuizz = gemini.generate(
    SystemMessage.from("""
        You are an expert mathematician.
        When asked a math problem or logic problem,
        you can solve it by creating a Python program,
        and execute it to return the result.
        """),
    UserMessage.from("""
        Implement the Fibonacci and Ackermann functions.
        What is the result of `fibonacci(22)` - ackermann(3, 4)?
        """)
);
```

Looks like Gemini is a Python and math wiz:

````
Code executed:
```python
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

def ackermann(m, n):
    if m == 0:
        return n + 1
    elif n == 0:
        return ackermann(m - 1, 1)
    else:
        return ackermann(m - 1, ackermann(m, n - 1))

print(fibonacci(22) - ackermann(3, 4))
```
Output:
```
17586
```
The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.
I implemented the Fibonacci and Ackermann functions in Python. Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.
````

If you don't include the script code and output, you would receive only the end of the message:

```
The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.
I implemented the Fibonacci and Ackermann functions in Python. Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.
```

I didn't encounter any snakes in Japan, but I'm happy Gemini can write some Python functions when needed!

### What about the weather in Tokyo?

Besides this Python code execution sandbox, the more traditional **function calling** mechanism also works. We heard about the weather in Osaka; now let's ask about Tokyo. Let's define a _tool_ to retrieve structured weather forecasts:

```java
record WeatherForecast(
    String location,
    String forecast,
    int temperature) {}

class WeatherForecastService {
    @Tool("Get the weather forecast for a location")
    WeatherForecast getForecast(
            @P("Location to get the forecast for") String location) {
        if (location.equals("Paris")) {
            return new WeatherForecast("Paris", "sunny", 20);
        } else if (location.equals("London")) {
            return new WeatherForecast("London", "rainy", 15);
        } else if (location.equals("Tokyo")) {
            return new WeatherForecast("Tokyo", "warm", 32);
        } else {
            return new WeatherForecast("Unknown", "unknown", 0);
        }
    }
}
```

We need a weather forecast assistant as well, that we'll instantiate and configure with our tool, thanks to `AiServices`:

```java
interface WeatherAssistant {
    String chat(String userMessage);
}

WeatherForecastService weatherForecastService =
    new WeatherForecastService();

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .temperature(0.0)
    .build();

WeatherAssistant weatherAssistant =
    AiServices.builder(WeatherAssistant.class)
        .chatLanguageModel(gemini)
        .tools(weatherForecastService)
        .build();

String tokyoWeather = weatherAssistant.chat(
    "What is the weather forecast for Tokyo?");

System.out.println("Gemini> " + tokyoWeather);
// Gemini> The weather forecast for Tokyo is warm
//         with a temperature of 32 degrees.
```

I didn't expect such warm temperatures in Tokyo! Well, of course, it's all fake, but you can imagine calling a real weather service. The beauty of LangChain4j's `AiServices` is that it handles calling the service for you. Usually, with function calling, the LLM just replies with a request saying that _you_ should call a tool or API and give it back the tool/API's answer. Here, with `AiServices`, it's all automatic and transparent.

### Of parrot pictures, text files, and multimodality

Let's finish our whirlwind tour of this Google AI Gemini model for LangChain4j with an example that highlights **Gemini's multimodal capabilities**. Gemini is a multimodal LLM: as input, in addition to text, it accepts pictures, videos, audio, PDF files, and text files. Let's ask what Gemini thinks of the cute colorful parrot mascot of LangChain4j.
```java
// README.md markdown file from LangChain4j's project GitHub repo
String base64Text = b64encoder.encodeToString(readBytes(
    "https://github.com/langchain4j/langchain4j/blob/main/README.md"));

// PNG of the cute colorful parrot mascot of the LangChain4j project
String base64Img = b64encoder.encodeToString(readBytes(
    "https://avatars.githubusercontent.com/u/132277850?v=4"));

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .build();

Response<AiMessage> response = gemini.generate(
    UserMessage.from(
        TextFileContent.from(base64Text, "text/x-markdown"),
        ImageContent.from(base64Img, "image/png"),
        TextContent.from("""
            Do you think this logo fits well
            with the project description?
            """)
    )
);

System.out.println("Gemini> " + response);
/* Gemini> The logo of a parrot drinking tea doesn't seem like a good fit
   for a project description of a Java version of LangChain.
   It's not clear how the logo relates to the project's purpose or functionality.
   A logo that better reflects the project's technical nature,
   such as a stylized representation of code or a language model,
   would be more appropriate. */
```

Ah well, looks like LLMs don't want to be compared to _stochastic parrots_, so it thinks the parrot mascot doesn't represent the project well enough! Sorry Gemini, I have to disagree, I really love this logo!

This example shows that you can craft an elaborate prompt that contains a text query, an external text file (the description of the project in Markdown format), and the picture of the parrot mascot.

## Let's wrap up — with beautiful 'furoshiki' fabric!

Throughout this journey through code examples, strawberries, daifukus, Japan itineraries and weather forecasts, you learned about the brand new LangChain4j module for Google AI's Gemini API, and its capabilities. I hope this article makes you want to try it out!

Before calling it a day or night (depending on your timezone), I'd like to mention some limitations, as it's still early days for this new module:

- Currently, there's only a `ChatLanguageModel` available, but no `StreamingChatLanguageModel` class, so you won't get streamed responses yet.
- Gemini's content caching capability is not surfaced in this implementation, so you can't use caching to save some bucks or yen.
- For multimodality, you should pass the Base64-encoded bytes of the files, and not use URLs to reference those resources, as this module doesn't yet upload files to Gemini's file service (Gemini won't download from an external URL).

Hopefully, the community will adopt this module, work with it, and provide feedback to help us improve it further! Don't hesitate to reach out with questions or to report any problems you encounter. And if you build something cool, please tell me too!

# A retryable JUnit 5 extension for flaky tests

As I work a lot with Large Language Models (LLMs), I often have to deal with flaky test cases, because LLMs are not always consistent and deterministic in their responses. Thus, sometimes, a test passes a few times in a row, but then, once in a while, it fails. Maybe some prompt tweaks will make the test pass more consistently; lowering the temperature, or using techniques like few-shot prompting, can also help the model better understand what it has to do. But in some circumstances, you can't find ways around those weird failures, and the sole solution I found was to make a test _retryable_. If a test fails, let's retry a few more times (2 or 3 times) till it passes.
But if it fails every time in spite of the retries, then it'll just fail, as expected.

I wrote JUnit _Rules_ in the past for such situations, but that was in the JUnit 4 days. Now, I'm using JUnit 5, and although it's possible to make JUnit 4 tests run under JUnit 5, I thought it was a great opportunity to try creating a JUnit 5 _extension_, which is the more powerful mechanism that replaces JUnit 4 rules.

## It all starts with a failing test case

Let's say you have a hypothetical flaky test that fails a few times in a row:

```java
private static int count = 1;

@Test
void test_custom_junit_retry_extension() {
    assertThat(count++).isEqualTo(4);
}
```

The first 3 executions will see an assertion failure, but the 4th would succeed, as the counter is then equal to `4`. I'd like to annotate this test method with a custom annotation that indicates the number of times I'm ready to retry that test:

```java
private static int count = 1;

@Test
@ExtendWith(RetryExtension.class)
@Retry(4)
void test_custom_junit_retry_extension() {
    assertThat(count++).isEqualTo(4);
}
```

The `@ExtendWith()` annotation indicates that I'm registering a JUnit 5 extension. And `@Retry(4)` is a custom annotation that I've created. Note that `@ExtendWith()` can be at the class level, but it can also live at the method level.

Let's have a look at the `@Retry` annotation:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
public @interface Retry {
    int value() default 3;
}
```

By default, I attempt the test 3 times, if no number is provided for the annotation value. Now it's time to see how the extension code works:

```java
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.TestExecutionExceptionHandler;

import java.lang.reflect.Method;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryExtension implements TestExecutionExceptionHandler {

    private final AtomicInteger counter = new AtomicInteger(1);

    private void printError(Throwable e) {
        System.err.println(
            "Attempt test execution #" + counter.get() +
            " failed (" + e.getClass().getName() + " thrown): " + e.getMessage());
    }

    @Override
    public void handleTestExecutionException(
            ExtensionContext extensionContext, Throwable throwable) throws Throwable {
        printError(throwable);

        Method method = extensionContext.getTestMethod().orElseThrow(() -> throwable);
        Retry retry = method.getAnnotation(Retry.class);
        int maxExecutions = retry != null ? retry.value() : 1;

        while (counter.incrementAndGet() <= maxExecutions) {
            try {
                extensionContext.getExecutableInvoker().invoke(
                    method, extensionContext.getRequiredTestInstance());
                return;
            } catch (Throwable t) {
                printError(t);
                if (counter.get() >= maxExecutions) {
                    throw t;
                }
            }
        }
        // No retry configured (or none left): rethrow the original failure
        throw throwable;
    }
}
```

Let's go through the code step by step:

* The extension keeps a counter of the number of test executions.
* A `printError()` method is used to report the assertion failure or exception.
* The class implements the `TestExecutionExceptionHandler` interface.
* That interface requires the `handleTestExecutionException()` method to be implemented.
* This method is invoked when a test throws an exception.
* If an exception is thrown, we check whether the method is annotated with the `@Retry` annotation,
* and we retrieve the number of attempts requested by the developer.
* Then we loop to execute the test method a few more times, until it passes or we reach the number of attempts.

## Missing standard JUnit 5 extension?
I thought a `@Retry` extension would be pretty common, and that it would be integrated in JUnit 5 directly. Or at least, that some library would provide common JUnit 5 extensions? But my search didn't yield anything meaningful. Did I overlook something? At least now, I have a solution to work around some flaky tests, thanks to this retryable extension!

## Going further

If you want to learn more about JUnit 5 extensions, a few resources helped me develop this extension. First of all, two articles from Baeldung: [Migrating from JUnit 4 to JUnit 5](https://www.baeldung.com/junit-5-migration) to understand the changes since JUnit 4, and this [Guide to JUnit 5 Extensions](https://www.baeldung.com/junit-5-extensions). And of course, the JUnit 5 documentation on [extensions](https://junit.org/junit5/docs/current/user-guide/#extensions).

> ## Update
>
> I'm glad I shared this article on Twitter, because I immediately got a response!
> Thanks [@donal_tweets](https://x.com/donal_tweets)
> for your [answer](https://x.com/donal_tweets/status/1830260408462221622)!
>
> The [JUnit Pioneer](https://junit-pioneer.org/) library provides a JUnit 5 extension pack,
> which includes a powerful [retrying extension](https://junit-pioneer.org/docs/retrying-test/).
> Replace the usual `@Test` annotation with `@RetryingTest`.
> You can specify the number of attempts, the minimum number of successes, or some wait time before retries.
>
> There's also a [rerunner](https://github.com/artsok/rerunner-jupiter) extension that is quite similar.
>
> My friend [@aheritier](https://x.com/aheritier) also suggested that
> Maven Surefire can be configured to automatically
> [retry failing tests](https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html)
> a few times, thanks to a special flag:
> ```
> mvn -Dsurefire.rerunFailingTestsCount=2 test
> ```
> In my case, I don't want to retry all failing tests, but only a specific one that I know is flaky.
>
> For those who prefer Gradle over Maven, there's a Gradle plugin as well:
> [test-retry](https://github.com/gradle/test-retry-gradle-plugin).
> You can configure the behavior in your `build.gradle` file:
> ```groovy
> test {
>     retry {
>         maxRetries = 2
>         maxFailures = 20
>         failOnPassedAfterRetry = true
>     }
> }
> ```
>
> Someone also suggested I use fuzzy assertions, but my test is very binary, as it either fails or succeeds.
> There's no threshold or value that would fit within some bounds.

# Let LLM suggest Instagram hashtags for your pictures

In this article, we'll explore another great task where Large Language Models shine: **entity and data extraction**. LLMs are really useful beyond just mere chatbots (even smart ones using Retrieval Augmented Generation). Let me tell you a little story of a handy application we could build, for wannabe Instagram influencers!

## Great Instagram hashtags, thanks to LLMs

When posting Instagram pictures, I often struggle with finding the right hashtags to engage with the community. Large Language Models are pretty creative, and they've certainly seen a bunch of Instagram pictures with their descriptions. So it's natural to try asking an AI like [Gemini](https://gemini.google.com/app) what it could suggest in terms of hashtags:

![Gemini Instagram Hashtag suggestion](/img/gemini/gemini-instagram-hashtags.png)

This is a picture taken in the port of Heraklion in Crete, a Greek island.
Here's the [conversation](https://g.co/gemini/share/476eb5dd974a) I had with Gemini, if you want to see all the tags it suggested. I think you'll agree with me that those hashtags look pretty good. Gemini was able to recognise where the picture was taken, as it had tags like `#heraklion`, `#crete`, `#greece`, `#greekisland`, etc. In another attempt, it even told me the name of the fortress of the Venetian port, and suggested other tags along those lines. We also have several tags typically found on Instagram, like `#travelgram`, `#instatravel`, and more specific tags like `#cretephotography`.

My developer mind quickly started spinning with ideas for an online tool to help users be more creative with their Instagram tags. Armed with my usual tools of the trade, Java and [LangChain4j](https://docs.langchain4j.dev/), I tried to see how I could implement such a tool.

When you want to integrate an LLM into an application, it's important to be able to use more structured outputs than plain text. And what's great with the Gemini 1.5 Flash model is that it can generate a JSON response, and Gemini 1.5 Pro can even follow a specific JSON schema (also called [controlled generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)).

## Let's implement an Instagram hashtag generator

First, let's see how the Gemini API responds to a plain prompt, without trying to return a JSON payload:

```java
var modelCreative = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .build();

List<ChatMessage> messages = new ArrayList<>();
messages.add(SystemMessage.from("""
    You are an Instagram influencer and expert.
    You master the fine art of choosing the best creative hashtags
    to share users' best pictures, and to ensure engagement
    with the Instagram community is the highest possible.
    """));
messages.add(UserMessage.from(
    ImageContent.from(
        Paths.get("src/main/resources/travel-picture.jpg").toUri()),
    TextContent.from(
        "What are the best Instagram hashtags to describe that picture?")
));

Response<AiMessage> response = modelCreative.generate(messages);
String responseText = response.content().text();
System.out.println(responseText);
```

- First, we instantiate a Vertex AI Gemini chat model.
- We prepare a list of messages: one system message to describe the task the LLM must achieve, and one compound message made of the picture and the request to generate hashtags for it. Gemini is a **multimodal** model that accepts text, but also images, audio, videos, and PDF files.
- Then we ask Gemini to generate a response, which we then print.

The Gemini LLM API will reply with a nice and creative response suggesting lots of great hashtags, along with some interesting explanations:

```
Here are some relevant hashtags for your picture:

**General Hashtags:** #greece #greekislands #crete #creteisland #cretelife #holiday #travel #vacay #instatravel #travelgram #beautifuldestinations #travelphotography #photooftheday #instagood #igdaily #instapic

**Specific Hashtags:** #heraklion #heraklioncrete #venetianharbour #harbourlife #boatlife #boatday #greekboats #greekharbour #beautifulcrete #creteharbour #cretevibes #greece_gram #crete_gram #greece_travel

**Trending Hashtags:** #wanderlust #traveltuesday #instagood #travelgram #beautifuldestinations #adventuretime

**Hashtags for Engagement:** #explorecrete #cretelove #lovegreece #greecevibes #cretephoto #creteexperience #discovercrete

Make sure to also use a mix of hashtags with different lengths and popularity.
You can also use location hashtags to target people in your area. For example, you could use hashtags like #heraklioncrete or #creteharbour to target people who are interested in visiting that specific location.

You should also consider using a mix of popular and less popular hashtags. Popular hashtags will help your post reach a wider audience, while less popular hashtags can help your post stand out from the crowd.

**Tips for Choosing Hashtags:**

* Research relevant hashtags. Look at what other users are using and see what's trending.
* Use a mix of general and specific hashtags.
* Don't use too many hashtags.
* Keep your hashtags relevant to your image.

By using a mix of relevant and engaging hashtags, you can increase your chances of getting your post seen by a wider audience and improve your engagement on Instagram.
```

Now, let's tweak this code to output the Instagram hashtags as a JSON array of hashtag strings. With LangChain4j's Gemini integration, it's pretty trivial: we can specify that we want JSON payloads in return, thanks to the `responseMimeType()` method:

```java
var modelCreative = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .responseMimeType("application/json")
    .build();
```

Let's update the system instruction to be explicit about what the JSON should look like:

````java
List<ChatMessage> messages = new ArrayList<>();
messages.add(SystemMessage.from("""
    You are an Instagram influencer and expert.
    You master the fine art of choosing the best creative hashtags
    to share users' best pictures, and to ensure engagement
    with the Instagram community is the highest possible.

    Return a JSON array containing the hashtags as strings, for example:
    ```json
    ["#beach", "#island", "#traveltahiti"]
    ```
    """));
````

Now let's see the LLM's response:

```json
[
  "#greece", "#crete", "#heraklion", "#cretelife", "#mediterraneansea",
  "#creteisland", "#greecevacations", "#greekislands", "#cretetravel",
  "#heraklionport"
]
```

On one hand, it's nice that Gemini obeyed us and generated the requested JSON array of hashtags. However, notice that there are fewer hashtags, which are also a bit less creative.

## LLMs are less creative when constrained

It's not the first time I've noticed that behavior with an LLM. I tried this with other LLMs like ChatGPT. It seems that LLMs are less creative when they are constrained to follow a stricter output. And indeed, what actually triggered the idea for this article and example was this paper I came across: [Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models](https://arxiv.org/abs/2408.02442#), published on arXiv a few days ago, which confirms my intuition that LLMs are less creative when using controlled generation:

> Structured generation, the process of producing content in standardized formats like JSON and XML,
> is widely utilized in real-world applications to extract key output information from large language models (LLMs).
> This study investigates whether such constraints on generation space impact LLMs' abilities,
> including reasoning and domain knowledge comprehension.
> Specifically, we evaluate LLMs' performance when restricted to adhere to structured formats
> versus generating free-form responses across various common tasks.
> Surprisingly, **we observe a significant decline in LLMs' reasoning abilities under format restrictions**.
> Furthermore, we find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.

## A better solution: a two-step approach with entity extraction

Since LLMs are not as good when we control their generation, we can try a slightly smarter approach:

- First, we ask the LLM to give its usual plain-text creative answer.
- Second, we ask the LLM to extract all the hashtags from that previous response, using controlled generation.

LLMs are great at various classical Natural Language Processing tasks like **entity extraction**. And here, indeed, what we want is just to extract the hashtags from the plain-text response. For such a task, controlled generation won't hinder creativity, and will accurately extract all the tags. The aforementioned paper also seemed to hint that controlled generation can actually help with some tasks like classification.

Let's have a look at our improved approach. We keep the first attempt from the beginning of this article, without using controlled generation, but we'll use a different configuration for the second step:

```java
var modelExtraction = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-pro")
    .responseSchema(SchemaHelper.fromClass(String[].class))
    .build();

List<ChatMessage> messagesForExtraction = new ArrayList<>();
messagesForExtraction.add(SystemMessage.from("""
    Your job is to extract Instagram hashtags from a given text,
    and return them as a JSON array of strings representing those hashtags.
    """));
messagesForExtraction.add(UserMessage.from("""
    Here is the text to extract Instagram hashtags from:
    """ + responseText));

Response<AiMessage> responseFromExtraction =
    modelExtraction.generate(messagesForExtraction);
String extractedTagsJson = responseFromExtraction.content().text();
System.out.println(extractedTagsJson);
```

- In this example, I used Gemini 1.5 Pro instead of Gemini 1.5 Flash to show you the use of the `responseSchema()` method, which allows you to specify the exact shape of the JSON we want to retrieve. I could have used Gemini 1.5 Flash like before, but I would have had to give a bit more prompting help to specify the JSON schema.
- This time, we use a different system message to explain the task of hashtag extraction.
- And the user message reuses the creative response from the previous LLM call to extract hashtags from it.

So what's the output like?

```json
[
  "#greece", "#greekislands", "#crete", "#creteisland", "#cretelife",
  "#holiday", "#travel", "#vacay", "#instatravel", "#travelgram",
  "#beautifuldestinations", "#travelphotography", "#photooftheday",
  "#instagood", "#igdaily", "#instapic", "#heraklion", "#heraklioncrete",
  "#venetianharbour", "#harbourlife", "#boatlife", "#boatday", "#greekboats",
  "#greekharbour", "#beautifulcrete", "#creteharbour", "#cretevibes",
  "#greece_gram", "#crete_gram", "#greece_travel", "#wanderlust",
  "#traveltuesday", "#instagood", "#travelgram", "#beautifuldestinations",
  "#adventuretime", "#explorecrete", "#cretelove", "#lovegreece",
  "#greecevibes", "#cretephoto", "#creteexperience", "#discovercrete"
]
```

Excellent! It managed to extract all the tags of the creative response!

## Conclusion & discussion

Even if researchers found that LLMs may be less creative when constrained with controlled generation, we can find workarounds to avoid suffering from this limitation, like we did with this two-step approach of making two calls. The first call is a creative one, while the second is the data extraction one.
One drawback of this approach, however, is that we had to make two calls to the LLM. So this can be a bit more costly in terms of tokens generated. And it also adds latency, because we have two calls instead of just one. So you might have to balance cost & latency with quality, depending on your use case. But it's always great to have the choice!

# Gemini Nano running locally in your browser

Generative AI use cases are usually about running large language models somewhere in the cloud. However, with the advent of smaller models and open models, you can run them locally on your machine, with projects like [llama.cpp](https://github.com/ggerganov/llama.cpp) or [Ollama](https://ollama.com/).

And what about in the browser? With [MediaPipe](https://github.com/google-ai-edge/mediapipe) and [TensorFlow.js](https://www.tensorflow.org/js), you can train and run small neural networks for tons of fun and useful tasks (like recognising hand movements through the webcam of your computer), and it's also possible to run [Gemma](https://ai.google.dev/gemma/) 2B and even 7B models.

But there's something interesting cooking these days: **built-in language models in the browser**! The Chrome developers are working on a new Web API to integrate LLMs in the browser, and are experimenting with the [Gemini Nano](https://deepmind.google/technologies/gemini/nano/) model (already integrated in some smartphones like Samsung Galaxy or Google Pixel phones) inside [Chrome Canary](https://www.google.com/chrome/canary/).

## Getting started with Gemini Nano and Chrome Canary

I'm sure you want to experiment with that too. Let's see how to proceed:

- First of all, you'll need to download [Chrome Canary](https://www.google.com/chrome/canary/).
- In `chrome://flags`, you must **enable** two experiments:
  - `Prompt API for Gemini Nano` and
  - `Enables optimization guide on device`.
- You'll have to restart the browser after having enabled those two flags.

It may take quite a bit of time to download Gemini Nano (as it's a small model, it takes only around 1.7GB of space, but you'll need about 20GB free on your hard drive at installation time), but the API will tell you if the model weights are not fully downloaded yet.

## Experimenting in the playground

Now it's time to play! Let's see what this embedded Gemini Nano can do, in the [Prompt API playground](https://chrome.dev/prompt-api-playground/). This is a simple form where you can send prompts to the model, and see its replies.

![](/img/gemini/nano-playground.png)

Looks like it's smart enough to know that no cat ever went to the moon!

## A bit of code

The [code of this demo](https://github.com/tomayac/prompt-api-playground) is available on GitHub. Let's have a look at the key lines of the **Prompt API** usage.

To know if the browser supports the Prompt API, you'll need to check the existence of the new `ai` object on `window`:

```javascript
if (!window.ai) { ... }
```

Then you'll have to create a **text session** with:

```javascript
const session = await window.ai.createTextSession();
```

Then you can either wait for the full response, or stream the tokens as they are generated. Here, let's see the streaming scenario, and how to iterate over the streamed tokens:

```javascript
const stream = await session.promptStreaming(
    "What's the name of the first cat who stepped on the moon?"
);
for await (const chunk of stream) {
    var fullResponse = chunk.trim();
    // do something with the response, like appending it to a DOM node
}
```

If you're not streaming the response, you can also do as follows, to get the response in one go once it's fully generated:

```javascript
const result = await session.prompt(
    "What's the name of the first cat who stepped on the moon?"
);
```

## But why run AI in the browser?

Maybe I should have started there, after all. Why would you want to run models locally in the browser, rather than using a cloud-hosted one? As the [documentation](https://developer.chrome.com/docs/ai/built-in) outlines:

* For **privacy reasons**: you may want to do local processing of sensitive data, to avoid sending such information over the web.
* For **latency gains**: once the model is loaded in the browser (in about 3 seconds on my machine), the model responds super fast to all subsequent requests. So you can have a very snappy experience, without the long roundtrip through the internet!
* For **lower costs**: since all the AI inference is done in the browser, it's not going to cost you anything on the server side.
* For **offline usage**: as it runs in the browser, even if you lose your internet connection, your Web UI will continue to function with all its smart AI features.

## Interesting resources

* Check out the [Prompt API playground](https://chrome.dev/prompt-api-playground/) to play with it (after having followed the instructions above).
* Have a look at the [sources](https://github.com/tomayac/prompt-api-playground) of the playground to learn how the demo is done.
* There's a nice [publication](https://medium.com/google-cloud/google-chrome-has-a-secret-ai-assistant-9accb95f1911) that shows how to use the Prompt API to summarize the content of the web page displayed in your browser.
* The HuggingFace people have an [extended article](https://huggingface.co/blog/Xenova/run-gemini-nano-in-your-browser) on how to run Gemini Nano in the browser, with some advanced details about the Prompt API.
* Read the page that explains the [goals of the built-in AI](https://developer.chrome.com/docs/ai/built-in).
* It's interesting to glance through the [explainer](https://github.com/explainers-by-googlers/prompt-api/) of the Prompt API to understand how it's been designed.
* And the best resource for last: the [user guide of the built-in AI early preview](https://docs.google.com/document/d/1VG8HIyz361zGduWgNG7R_R8Xkv0OOJ8b5C9QKeCjU0c/edit), which gives lots of details about the Prompt API.

## Summary

I've been focusing mostly on large language models in the cloud so far, in particular [Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#gemini-text-only-samples-java), but I'm excited at the prospect of the interesting use cases that built-in browser AI can enable.

Imagine, for example, a travel itinerary application that would store all the information of your trip locally (in IndexedDB or a WebAssembly-fied sqlite), and that would let you ask, offline, all the questions you want about the journey (basically, **RAG in the browser**!). No need to hunt for a public wifi network or a local SIM card.

There are also many tasks a browser extension could handle:

* When preparing my podcast episode and show notes, I could ask Gemini Nano to make a 5-bullet-point summary of the article I'm reading.
* When reading the reviews for a product, I could get a sentiment analysis signal that tells me if customers are happy with that product.
We could also think of some hybrid scenarios, as cloud-hosted and locally-running models could complement each other. I hope this Web API will become a standard, and that other browsers will support it too, and offer different models as well.

# Sentiment analysis with few-shot prompting

In a recent article, we talked about [text classification]({{}}) using [Gemini](https://deepmind.google/technologies/gemini/) and [LangChain4j](https://docs.langchain4j.dev/). A typical example of text classification is the case of **sentiment analysis**. In my LangChain4j-powered Gemini [workshop](https://github.com/glaforge/gemini-workshop-for-java-developers/), I used this use case to illustrate the classification problem:

```java
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.5-flash-001")
    .maxOutputTokens(10)
    .maxRetries(3)
    .build();

PromptTemplate promptTemplate = PromptTemplate.from("""
    Analyze the sentiment of the text below.
    Respond only with one word to describe the sentiment.

    INPUT: This is fantastic news!
    OUTPUT: POSITIVE

    INPUT: Pi is roughly equal to 3.14
    OUTPUT: NEUTRAL

    INPUT: I really disliked the pizza. Who would use pineapples as a pizza topping?
    OUTPUT: NEGATIVE

    INPUT: {{text}}
    OUTPUT:
    """);

Prompt prompt = promptTemplate.apply(
    Map.of("text", "I love strawberries!"));

Response<AiMessage> response = model.generate(prompt.toUserMessage());
System.out.println(response.content().text());
```

I used a `PromptTemplate` to craft the prompt, with a `{{text}}` placeholder value to analyze the sentiment of that particular text. Notice that I used the [few-shot prompting](https://learnprompting.org/docs/basics/few_shot) technique, with example inputs and outputs.

## Few-shot prompting with a list of messages

Somehow, I had the impression that this `INPUT/OUTPUT` notation was a bit of a _hack_ to encourage the LLM to believe this is an actual exchange between the user and the AI. I believed it would be cleaner to use a real list of messages that alternate user and AI messages. So I implemented this alternative approach, but haven't yet committed it to my workshop repository.

Meanwhile, as I was chatting with my colleague [Dan Dobrin](https://x.com/ddobrin), he pointed me at this very recent blog [post](https://blog.langchain.dev/few-shot-prompting-to-improve-tool-calling-performance/) from the LangChain people, who were investigating _few-shot prompting to improve tool-calling performance_. What's interesting in their analysis is that, overall, on this anecdata example, it seems **LLMs do better with real user/AI messages than with a big string of inputs/outputs**.

Let's see how to implement the same approach, with a real exchange of messages:

```java
List<ChatMessage> fewShotPrompts = List.of(
    SystemMessage.from("""
        Analyze the sentiment of the text below.
        Respond only with one word to describe the sentiment.
        """),
    UserMessage.from("This is fantastic news!"),
    AiMessage.from("POSITIVE"),
    UserMessage.from("Pi is roughly equal to 3.14"),
    AiMessage.from("NEUTRAL"),
    UserMessage.from("I really disliked the pizza. " +
        "Who would use pineapples as a pizza topping?"),
    AiMessage.from("NEGATIVE"),
    UserMessage.from("I love strawberries!")
);

response = model.generate(fewShotPrompts);
System.out.println(response.content().text());
```

This is not much more verbose than the previous approach, and it's still very readable. And when pulling the few-shot data from an external database, it feels cleaner than concatenating a big string.
## More type-safe few-shot prompting with messages and AiServices To further improve on the list of messages tactic, we can use LangChain4j's `AiServices` concept, which is a higher-level abstraction than using the model and prompt templates directly. ```java enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE } interface SentimentAnalysis { @SystemMessage(""" Analyze the sentiment of the text below. Respond only with one word to describe the sentiment. """) Sentiment analyze(String text); } MessageWindowChatMemory memory = MessageWindowChatMemory.withMaxMessages(10); memory.add(UserMessage.from("This is fantastic news!")); memory.add(AiMessage.from(Sentiment.POSITIVE.name())); memory.add(UserMessage.from("Pi is roughly equal to 3.14")); memory.add(AiMessage.from(Sentiment.NEUTRAL.name())); memory.add(UserMessage.from("I really disliked the pizza. " + "Who would use pineapples as a pizza topping?")); memory.add(AiMessage.from(Sentiment.NEGATIVE.name())); SentimentAnalysis analyzer = AiServices.builder(SentimentAnalysis.class) .chatLanguageModel(model) .chatMemory(memory) .build(); System.out.println(analyzer.analyze("I love strawberries!")); ``` This third and final approach may be a bit more verbose, and introduces a few more LangChain4j concepts like system messages, chat memory, and the AI service itself, but it has the advantages of being: - **more type-safe**, as we're using a `Sentiment` enum, which is easier to manipulate from code, - cleaner, because we're **using system instructions** to instruct the model about what its job is. We created: - a Java `enum` to represent the possible values of the sentiment, - a `SentimentAnalysis` interface with a clear signature: a text in input, a `Sentiment` enum value in output, - a `@SystemMessage` instruction to describe the analysis task, - a `ChatMemory` (here a `MessageWindowChatMemory`) to hold the few-shot examples. Then we bind everything together, thanks to `AiServices`: the analysis interface that LangChain4j will implement for us, the language model, and the chat memory. Finally, users just have to call the `analyze()` method, passing the text to analyze. I also like the fact that we are **coding against an interface**, and potentially later on, developers **could swap the implementation** of the sentiment analyzer, and use a different approach. ## Conclusion All three approaches are valid: a big string, a low-level list of messages, or an `AiServices` abstraction. But I have a slight preference for the approach that is more type-safe and less _stringy_. Just like LangChain4j provides a `TextClassification` class that leverages vector embeddings for text similarity, we could investigate whether it would make sense to also add a few-shot prompting classificaction solution directly in the LangChain4j project. # Analyzing video, audio and PDF files with Gemini and LangChain4j Certain models like Gemini are **multimodal**. This means that they accept more than just text as input. Some models support text and images, but **Gemini goes further and also supports audio, video, and PDF files**. So you can mix and match text prompts and different multimedia files or PDF documents. Until LangChain4j 0.32, the models could only support text and images, but since my [PR](https://github.com/langchain4j/langchain4j/pull/1464) got merged into the newly released [0.33](https://github.com/langchain4j/langchain4j/releases/tag/0.33.0) version, you can use all those files with the LangChain4j Gemini module! Let's have a look! 
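If you want to follow along, you'll need the LangChain4j Gemini module on your classpath. Assuming the usual Maven coordinates for that module (do double-check the artifact name for your setup), the dependency looks like this:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-vertex-ai-gemini</artifactId>
    <version>0.33.0</version>
</dependency>
```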
## Getting the transcription of a podcast recording Are you an avid podcast listener and want to read its transcription? Or you want to publish that transcription as show-notes of your own podcast on your website? You can ask Gemini for the transcription with the following code: ```java var model = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-pro") .build(); Response response = model.generate( UserMessage.from( AudioContent.from( "https://storage.googleapis.com/cloud-samples-data/" + "generative-ai/audio/pixel.mp3"), TextContent.from( "Write a transcription of this audio file") ) ); System.out.println(response.content().text()); ``` Above, we created an audio content object with the `AudioContent.from(...)` method. This method can take a string which can be a direct URL to a file on the web, it can be a Google Cloud Storage URL as well (like `gs://bucket/audio.mp3`). It is possible to load a local file from your file system with `AudioContent.from(Paths.get("audio.mp3").toUri())`. You can even pass the base 64 encoded content of the audio file and specify its mime type. ### What else could you do with audio files? - If you're in a hurry and don't have time to listen to this one-hour episode, instead of asking for the whole transcript, you could change the prompt to ask for a summary. That way you know if it's worth spending an hour to listen to it all. - Gemini also accepts several audio files in input, so if you are recording interviews of persons on a specific topic, you could ask Gemini to contrast the differences in those responses. ## Preparing YouTube video chaptering Let's say you're a YouTuber, and you want to do your own video chaptering, instead of relying on the the automatic chapters. How can you do that? ```java var model = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-flash") .build(); Response response = model.generate( UserMessage.from( VideoContent.from( "https://storage.googleapis.com/cloud-samples-data/" + "generative-ai/video/behind_the_scenes_pixel.mp4"), TextContent.from( "Prepare chapters for this video file, " + "using the YouTube chapter notation") ) ); System.out.println(response.content().text()); ``` For this video, the chapters generated look as follows: ``` 00:00 Making a Film with a Blind Director 00:16 Adam Morse, Filmmaker 00:28 The Film Shoot 00:48 A Blind Man & His Girlfriend 01:15 Google Pixel Phone 01:33 Guided Frame 02:06 The Technical Crew 02:32 Visual Effects 02:45 Misconceptions About Blindness 03:20 Filmmaking with a Team 03:46 Google Accessibility 04:00 One Person's Perspective 04:29 Adam's Vision 05:03 A Beautiful Position 05:19 Google Logo ``` ### What else could you do with videos? - If a video of your meeting or your conference presentation has been recorded, you could use this approach to ask Gemini for a summary of the video, to get the various sections, to write the transcript. - We often record videos of our family, our children, etc. It's not always easy to _search_ through those videos. You could ask Gemini to provide a summary of the video, that you would then index with some search engine, or just do some simple _grep_ search from the command-line. ## Asking questions about PDF documents Let's have a look at one last example: PDF documents. With LangChain4j, it's possible to use the Apache Tika-based document loader to get the text content of a PDF. 
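For reference, here's a minimal sketch of that Tika-based route, assuming the `langchain4j-document-parser-apache-tika` module is on the classpath and a local `manual.pdf` file exists (exact package names may vary slightly between LangChain4j versions):

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.apache.tika.ApacheTikaDocumentParser;

import java.nio.file.Paths;

public class PdfTextExtraction {
    public static void main(String[] args) {
        // Parse the PDF with Apache Tika and keep only its raw text content
        Document document = FileSystemDocumentLoader.loadDocument(
            Paths.get("manual.pdf"), new ApacheTikaDocumentParser());
        System.out.println(document.text());
    }
}
```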
However, you lose some important semantic information, as the layout may matter, or the figures may also convey some critical details. Fortunately, Gemini can ingest PDF documents directly, without an intermediate text transcription. This allows you to ask questions about PDF documents, and since Gemini has a very large context window, it's able to analyze very big documents, or several documents at the same time, without having to implement your own RAG system (Retrieval Augmented Generation).

```java
var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .build();

Response<AiMessage> response = model.generate(
    UserMessage.from(
        PdfFileContent.from(
            "https://proceedings.neurips.cc/paper_files/paper/2017" +
            "/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"),
        TextContent.from(
            "Give a summary of this paper")
    )
);

System.out.println(response.content().text());
```

This example analyzes the famous _"Attention is all you need"_ paper that introduced the concept of _Transformer_ neural networks:

```
This paper proposes a novel neural network architecture called the Transformer, which relies entirely on an attention mechanism and dispenses with recurrence and convolutions. The Transformer outperforms existing models on two machine translation tasks, WMT 2014 English-to-German and WMT 2014 English-to-French, while requiring significantly less training time. The authors argue that the Transformer's ability to learn global dependencies without regard to their distance in the input or output sequences, as well as its parallelizable nature, make it a promising approach for sequence modeling and transduction problems. They also present an analysis of the Transformer's different components and their effect on performance. The paper concludes by discussing potential future directions for research.
```

### What else could you do with PDF documents?

- You can implement some smart question answering solutions over your documents.
- Gemini can help make sense of the differences between two versions of your PDF paper.
- Gemini allows you to ingest multiple files at the same time, so you can pass the PDF of your dishwasher manual, along with a tutorial showing how to repair it, and then ask the LLM how to fix it.

## Summary

Multimodality is a powerful feature of Gemini, and LangChain4j is now equipped with the ability to send text, images, audio files, videos, and PDF documents, potentially all at the same time, to create some innovative multimedia integrations.

# Text classification with Gemini and LangChain4j

Generative AI has potential applications far beyond chatbots and Retrieval Augmented Generation. For example, a nice use case is **text classification**. I had the chance to meet some customers and prospects who needed to triage incoming requests, or to label existing data. In the first case, a government entity was tasked with routing citizen requests to access undisclosed information to the right governmental service that could grant or reject that access. In the second case, a company needed to sort out tons of existing internal documents that were not properly organized, and they wanted to quickly start better structuring this trove of information, by labelling each of these docs into different categories.
In both situations, the task was a **text classification** one: to put each request or document in a distinct pile, so they could more easily be sorted out, organized, and treated more rapidly. Before generative AI, text classification would be handled by data scientists who would craft and train dedicated machine learning models for that purpose. But it is now also possible to do the same with the help of large language models. That's what I'd like to explore with you in this article today. As usual, I'll be using the [Gemini model](https://deepmind.google/technologies/gemini/), and the [LangChain4j framework](https://docs.langchain4j.dev/) for implementing illustrative examples in Java. ## Text classification: putting a label on a document Before diving into the code, let's step back a short moment to clarify what text classification is about. When we classify documents, we put a label on them. For example, in a bug tracker, we could automate adding labels on new tickets that say that the bug report is related to a certain component. So we would put the name of the component as the label for that new ticket. For routing incoming document access requests, we could put the label of the service that must treat the request, etc. **Filtering** is also a text classification problem: we can filter the content of emails to state whether they are spam or not. And we can also use LLMs to filter harmful content from users' inputs, and even classify the category of harm (hateful speech, harrasment, etc.) ## Zero-shot prompting: just ask the model! What about just asking a large language model what it thinks the classification, or the label should be? And indeed, LLMs are often very smart and can figure out the correct classification, without being trained specifically for that purpose. Let's illustrate this with a very common type of text classification: **sentiment analysis**. First, we can define an `enum` representing the various sentiments that can be recognized: ```java enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE } ``` We create a `record` which will hold the result of the sentiment analysis: ```java record SentimentClassification( Sentiment sentiment ) {} ``` We will also need an `interface` to represent the type-safe Java service that the developers integrating this LLM-backed solution will call to retrieve the sentiment of the text: ```java interface SentimentClassifier { SentimentClassification classify(String text); } ``` Notice that it takes in input an unstructured `String` text, but in output, you'll manipulate a strongly typed object, not just a mere string. It's time to prepare our Gemini model: ```java var model = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-pro") .responseMimeType("application/json") .responseSchema(Schema.newBuilder() .setType(Type.OBJECT) .putProperties("sentiment", Schema.newBuilder() .setType(Type.STRING) .addAllEnum(Stream.of(Sentiment.values()) .map(Enum::name) .collect(Collectors.toList())) .build()) .build()) .build(); ``` We're taking advantage of the latest feature of Gemini and LangChain4j, which permits to specify that we want 100% valid JSON in output, and even better than this, we want the generated JSON output to comply with a JSON schema! 
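As a side note, instead of assembling the `Schema` by hand, you could likely derive it from the record itself with the `SchemaHelper.fromClass()` convenience method from the Vertex AI Gemini module. Treat this variant as a sketch, though, as the generated schema may not constrain the enum values in exactly the same way:

```java
import static dev.langchain4j.model.vertexai.SchemaHelper.fromClass;

// Equivalent model configuration, with the JSON schema derived from the record
var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-pro")
    .responseMimeType("application/json")
    .responseSchema(fromClass(SentimentClassification.class))
    .build();
```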
Now we create the sentiment analysis service:

```java
SentimentClassifier sentimentClassifier =
    AiServices.create(SentimentClassifier.class, model);
```

And we call it to retrieve the sentiment of the text we want to analyze:

```java
SentimentClassification classification =
    sentimentClassifier.classify("I am happy!");

System.out.println(classification.sentiment());
// POSITIVE
```

We didn't even need to give Gemini examples, which is why it's called _zero-shot prompting_. LLMs are usually smart enough to easily handle familiar classification tasks like sentiment analysis.

## Few-shot prompting: when the model needs a little help

A more common approach with LLMs for text classification is _few-shot prompting_. As the name implies, it's a prompting technique. You give the model a task (classifying text), and you show it examples of classifications, with a clear input/output format, to force the LLM to reply with just the expected class.

```java
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash-001")
    .maxOutputTokens(10)
    .maxRetries(3)
    .build();

PromptTemplate promptTemplate = PromptTemplate.from("""
    Analyze the sentiment of the text below.
    Respond only with one word to describe the sentiment.

    INPUT: This is fantastic news!
    OUTPUT: POSITIVE

    INPUT: Pi is roughly equal to 3.14
    OUTPUT: NEUTRAL

    INPUT: I really disliked the pizza. Who'd put pineapple toppings on it?
    OUTPUT: NEGATIVE

    INPUT: {{text}}
    OUTPUT:
    """);

Prompt prompt = promptTemplate.apply(
    Map.of("text", "I love strawberries!"));

Response<AiMessage> response = model.generate(prompt.toUserMessage());

System.out.println(response.content().text());
// POSITIVE
```

In the above approach, we use LangChain4j's `PromptTemplate`, with a placeholder value `{{text}}` that will contain the text to classify. We don't use an `enum` value though, so in the end we have to compare against a plain string. But we could also apply the same response schema handling as in our previous zero-shot example.

Let's rewrite this code a little bit differently, to _fake_ a conversation with the model. The model will see an exchange between a user and itself, will follow the same syntax, and will reply with just one word: the sentiment. We'll use system instructions, and alternating AI and user messages:

```java
List<ChatMessage> fewShotPrompts = List.of(
    SystemMessage.from("""
        Analyze the sentiment of the text below.
        Respond only with one word to describe the sentiment.
        """),
    UserMessage.from("This is fantastic news!"),
    AiMessage.from("POSITIVE"),
    UserMessage.from("Pi is roughly equal to 3.14"),
    AiMessage.from("NEUTRAL"),
    UserMessage.from("I really disliked the pizza. " +
        "Who'd put pineapple toppings on it?"),
    AiMessage.from("NEGATIVE"),
    UserMessage.from("I love strawberries!")
);

response = model.generate(fewShotPrompts);

System.out.println(response.content().text());
// POSITIVE
```

Same outcome, strawberries are yummy!

## Text classification with embedding models

In the two previous sections, we took advantage of LLMs' abilities to classify text on their own, based on their intrinsic knowledge, or with the help of a few examples. But there's another approach we can investigate: **using embedding vectors** to compare texts.

Embedding vectors are mathematical representations of words, sentences, or paragraphs, in the form of a vector of floating point values. The way those vectors are calculated by _embedding models_ makes them close to each other (in terms of distance) when the texts they represent are semantically close.
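To make that intuition more concrete, here's a small sketch comparing a few sentences with LangChain4j's `CosineSimilarity` helper; it assumes an `EMBEDDING_MODEL` configured like the Vertex AI embedding model shown a bit further below:

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.CosineSimilarity;

// ...assuming EMBEDDING_MODEL is a configured VertexAiEmbeddingModel, as shown below

Embedding pizza   = EMBEDDING_MODEL.embed("I really enjoyed this pizza").content();
Embedding pasta   = EMBEDDING_MODEL.embed("That pasta dish was delicious").content();
Embedding weather = EMBEDDING_MODEL.embed("It will probably rain tomorrow").content();

// The two food-related sentences should score noticeably higher
// than the food/weather pair (values closer to 1 mean more similar)
System.out.println(CosineSimilarity.between(pizza, pasta));
System.out.println(CosineSimilarity.between(pizza, weather));
```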
You can have a look at my recent article [introducing vector embeddings]({{}}). LangChain4j provides a `TextClassifier` interface which allows to classify text, by comparing it to sets of other texts that belong to a same class. So we give a map of possible labels, associated with lists of texts that belong to that category. In particular, there's an `EmbeddingModelTextClassifier` that uses embedding models to compare the texts with the examples of each labels. We can even tweak its internal algorithm to say whether we prefer if a text should be closer to the average of all the examples, or if we prefer if it's closer to one of the examples (by default, it's half distance to the mean, and half distance to the closest example.) So let's have a look at this solution. Instead of doing sentiment analysis, we'll go with recipe classification: our goal will be to classify a recipe, to know if it's an _appetizer_, a _main course_, or a _dessert_. First, we need to define our labels, with an `enum`: ```java enum DishType { APPETIZER, MAIN, DESSERT } ``` Because we don't have a dataset of recipes, we'll use Gemini to generate sample recipes, for each label. For that, we need to configure Gemini: ```java private static final VertexAiGeminiChatModel CHAT_MODEL = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-flash") .build(); ``` We'll also configure an embedding model to calculate the vector embeddings: ```java private static final VertexAiEmbeddingModel EMBEDDING_MODEL = VertexAiEmbeddingModel.builder() .project(PROJECT_ID) .location(LOCATION) .endpoint(ENDPOINT) .publisher("google") .modelName("text-embedding-004") .taskType(VertexAiEmbeddingModel.TaskType.CLASSIFICATION) .build(); ``` Vertex AI's embedding models are capable of handling various tasks, including: - **classification**, - semantic similarity, - clustering, - question answering, - fact verification, - query or document retrieval. Let's create a method to generate a recipe for a particular type of dish: ```java private static String recipeOf(DishType type) { return CHAT_MODEL.generate( "Write a recipe for a %s dish" .formatted(type.name().toLowerCase())); } ``` And we'll collect 3 examples of recipes for each type of dish: ```java var examplesOfRecipes = Stream.of(DishType.values()) .collect( Collectors.toMap( dishType -> dishType, dishType -> Stream.generate(() -> recipeOf(dishType)) .limit(3) .toList() ) ); ``` That way, we have our dataset ready, and we'll prepare a text classifier: ```java EmbeddingModelTextClassifier recipeClassifier = new EmbeddingModelTextClassifier<>(EMBEDDING_MODEL, examplesOfRecipes); ``` It takes a little while to calculate the initial embedding vectors of all the samples, but now our classifier is ready! Let's see if the following recipe is an _appertizer_, a _main course_, or a _dessert_: ```java List classifiedDishes = recipeClassifier.classify(""" **Classic Moist Chocolate Cake** This recipe delivers a rich, moist chocolate cake that's perfect for any occasion. Ingredients: * 1 ¾ cups all-purpose flour * 2 cups granulated sugar * ¾ cup unsweetened cocoa powder * 1 ½ teaspoons baking powder * 1 ½ teaspoons baking soda * 1 teaspoon salt * 2 large eggs * 1 cup milk * ½ cup vegetable oil * 2 teaspoons vanilla extract * 1 cup boiling water Instructions: * Preheat oven to 350°F (175°C). Grease and flour two 9-inch round cake pans. * Combine dry ingredients: In a large bowl, whisk together flour, sugar, cocoa powder, baking powder, baking soda, and salt. 
* Add wet ingredients: Beat in eggs, milk, oil, and vanilla until combined. * Stir in boiling water: Carefully stir in boiling water. The batter will be thin. * Bake: Pour batter evenly into prepared pans. Bake for 30-35 minutes, or until a toothpick inserted into the center comes out clean. * Cool: Let cakes cool in pans for 10 minutes before transferring to a wire rack to cool completely. """); System.out.println("This recipe is of type: " + classifiedDishes); // This recipe is of type: [DESSERT] ``` And voilà, we used the full power of embedding models to calculate text similarity to classify our chocolate cake recipe as a dessert! ## Conclusion Large Language Models like Gemini are great at classifying text, thanks to their general knowledge of the world that they acquired during their training. But for more specialized use cases, we might need to guide the LLM to recognize labels, because the subject is very specific to our data. That's when few-shot prompting or embedding model-based classification helps. If we have lots of samples for each label, using a few-shot prompting approach means we'll have to pass all those examples again and again in the context window of the LLM, which yields a high token count. So if you pay per tokens, it can become a bit expensive. If we use the embedding model text classifier, it might take a while to compute all the embedding vectors, but we'll do it only once, and then we can just calculate the vector embedding for the text to classify, so it's just the tokens of the text to classify that is incurred. If we have lots of samples, the classifier needs to do quite a few vector / matrix computations to calculate the distance to the samples, but it's usually quite fast (unless we really have hundreds or thousands of samples). I hope this article showed you that Generative AI is useful beyond the usual chatbots and RAG use cases. It's great at text classification as well. And LangChain4j and Gemini are well suited for that use case, and you learned how to implement different approaches to do text classification. # Latest Gemini features support in LangChain4j 0.32.0 [LangChain4j](https://docs.langchain4j.dev/) 0.32.0 was released yesterday, including my [pull request](https://github.com/langchain4j/langchain4j/pull/1278) with the support for lots of new Gemini features: - **JSON output mode**, to force Gemini to reply using JSON, without any markup, - **JSON schema**, to control and constrain the JSON output to comply with a schema, - **Response grounding** with Google Search web results and with private data in Vertex AI datastores, - Easier debugging, thanks to new builder methods to **log requests and responses**, - **Function calling mode** (none, automatic, or a subset of functions), - **Safety settings** to catch harmful prompts and responses. Let's explore those new features together, thanks to some code examples! And at the end of the article, if you make it through, you'll also discover **2 extra bonus points**. ## JSON output mode Creating LLM-powered applications means working with text, as this is what LLMs return. But to facilitate this integration between LLM responses and your code, the text format of choice is usually JSON, as it's human-readable, and easy to parse programmatically. However, LLMs are a bit chatty, and rather than sending you back a nice raw JSON document, instead, it replies with some extra sentence, and some markdown markup to wrap the piece of JSON. 
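For example, asking for a dice roll without any constraint might yield a reply along these lines (a made-up but typical illustration, often also wrapped in a Markdown code fence):

```
Sure! Here is the dice roll you asked for, formatted as JSON:

{"roll": 3}

Let me know if you'd like me to roll again!
```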
Fortunately, Gemini 1.5 (Flash and Pro) allows you to specify the response MIME type. Currently, only `application/json` is supported, but other formats may come later. To do that, when instantiating the Gemini model, use the `responseMimeType()` builder method: ```java var model = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-flash") .responseMimeType("application/json") .build(); String response = model.generate("Roll a dice"); System.out.println(response); ``` No sentence, no markdown markup, nothing, just pure JSON: ``` {"roll": 3} ``` We didn't even need to say in the prompt we wanted to get a JSON response! However, the JSON key of that document may vary from time to time, so you may still wish to be a bit more prescriptive in your prompt, and ask the model to return JSON explicitly, give it an example of the JSON output you expect, etc. That's the usual prompting approach... But now there's more! ## JSON Schema output This is quite unique in the LLM ecosystem, as I believe it's the only model out there that allows you to specify a JSON schema for constraining the JSON output. This works for Gemini 1.5 Pro only, not with Gemini 1.5 Flash. Let's have another look at our previous dice roll example, and let's update it to specify a JSON schema for the output generation: ```java import static dev.langchain4j.model.vertexai.SchemaHelper.fromClass; //... record DiceRoll(int roll) {} var model = VertexAiGeminiChatModel.builder() .project("genai-java-demos") .location("us-central1") .modelName("gemini-1.5-pro") .responseSchema(fromClass(DiceRoll.class)) .build(); String response = model.generate("Roll a dice"); System.out.println(response); ``` The generated JSON document will always contain the `roll` key ```json { "roll": 5 } ``` In this example, we used a convenience method called `fromClass()` that creates a JSON schema that corresponds to a Java type (here a Java record). But there's also another convenient method that lets us pass a JSON schema string, called `fromJsonSchema()`: ```java var model = VertexAiGeminiChatModel.builder() .project("genai-java-demos") .location("us-central1") .modelName("gemini-1.5-pro") .responseSchema(fromJsonSchema(""" { "type": "object", "properties": { "roll": { "type": "integer" } } } """)) .build(); ``` It's also possible to construct a JSON schema programmatically: ```java var model = VertexAiGeminiChatModel.builder() .project("genai-java-demos") .location("us-central1") .modelName("gemini-1.5-pro") .responseSchema(Schema.newBuilder() .setType(Type.OBJECT) .putProperties("roll", Schema.newBuilder() .setType(Type.INTEGER) .build()) .build()) .build(); ``` Now you always get consistent JSON outputs! ## Response grounding with Google Search web results and Vertex AI datastores Large Language Models are wonderful creative machines, but rather than benefiting from their high degree of creativity, we'd prefer having factual responses grounded on data and documents. Gemini offers the ability to [ground responses](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini): - against Google Search web results, - against Vertex AI search datastores. ### Use Google Search to ground responses The training of an LLM ended at a certain date: its _cut-off_ date. So it doesn't know about news that happened after that date. But you can request Gemini to use Google Search to find more up-to-date information. 
For example, if we ask Gemini about the current elections going on in France, it could reply with something like this:

```
There is no current national election happening in France right now.

The last major national election in France was the **Presidential election in April and May 2022**, where Emmanuel Macron won a second term.

There are, however, **local elections** happening regularly in different regions of France.

To stay updated on French elections, you can check the website of the **French Ministry of the Interior** or reputable news sources like **The Guardian, BBC, CNN, or Le Monde**.
```

Now, let's enable the use of Google Search web results with the `useGoogleSearch(true)` method:

```java
var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .useGoogleSearch(true)
    .build();

String response = model.generate(
    "What is the current election going on in France?");

System.out.println(response);
```

The answer will be much different, and indeed factual and up-to-date:

```
France held the first round of a parliamentary election on July 4, 2024. The second round will be on July 7, 2024. The election is significant because it could result in the first far-right government in France since World War II. The National Rally, President Emmanuel Macron’s centrist alliance, and the New Popular Front coalition are the three major political blocs competing in the election. The outcome of the election is highly uncertain, with the far-right National Rally potentially gaining a parliamentary majority. If the National Rally wins a majority, Macron would be expected to appoint Jordan Bardella, the party's president, as prime minister.
```

There's indeed a parliamentary election going on right now in France. Those elections were called only a month ago, thus after the cut-off date of the model's knowledge.

> For my French audience, don't forget to go vote next Sunday!

### Grounding with Vertex AI Search

The idea is that we want to ground responses on our own data. This is particularly important when the required knowledge is actually private information, like our internal docs, or our customers' docs.

My colleague Mete wrote a great [article explaining how to set up grounding with private data](https://atamel.dev/posts/2024/07-01_grounding_with_own_data_vertexai_search/). Below, I'll assume that we created a Vertex AI search app with a datastore backed by a Google Cloud Storage bucket that contains a fictitious document: a car manual for the _Cymbal Starlight_ car model! I'm taking the same example as in Mete's article.

This time, we specify the search location to point at the Vertex AI search datastore with `vertexSearchDatastore()`:

```java
var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .vertexSearchDatastore(String.format(
        "projects/%s/locations/%s/collections/%s/dataStores/%s",
        PROJECT_ID, "global", "default_collection",
        "cymbal-datastore_1720169982142")
    )
    .build();

String response = model.generate(
    "What is the cargo capacity of Cymbal Starlight?");

System.out.println(response);
```

It's a fictitious car that doesn't exist, but it's covered in that private document, and indeed, Gemini is now able to respond to that question:

```
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.
``` What's interesting as well is that the response returned by Gemini provides some context about the source document that helped it answer the user query (we'll see in the next section how to enable logging requests and responses): ``` grounding_metadata { 2: { 1: { 3: 66 } 2: 0x3f7deee0 } 5: { 2: { 1: "gs://genai-java-demos-documents/cymbal-starlight-2024.pdf" 2: "cymbal-starlight-2024" } } 6: { 1: { 3: 66 4: "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet." } 2: "\000" 3: { 257772: 63 } } ``` However, to be honest, I'm not quite sure what the numbers exactly mean, but this metadata mentions that the PDF uploaded in cloud storage is the one that was used to shape the answer of the LLM, and gives an excerpt of the sentence that was found in the document. ## Request and response logging To better understand what's going on under the hood, you can enable request and response logging. That way, you're able to see exactly what is sent to Gemini, and what Gemini replies. To enable logging, there are two methods we can use: - `logRequests(true)` to log the request sent to Gemini, - `logResponse(true)` to log the response received from Gemini. Let's see that in action: ```java var model = VertexAiGeminiChatModel.builder() .project(PROJECT_ID) .location(LOCATION) .modelName("gemini-1.5-flash") .logRequests(true) .logResponses(true) .build(); String response = model.generate("Why is the sky blue?"); System.out.println(response); ``` Here's what's logged: ``` [main] DEBUG dev.langchain4j.model.vertexai.VertexAiGeminiChatModel - GEMINI (gemini-1.5-flash) request: InstructionAndContent { systemInstruction = null, contents = [role: "user" parts { text: "Why is the sky blue?" } ] } tools: [] [main] DEBUG dev.langchain4j.model.vertexai.VertexAiGeminiChatModel - GEMINI (gemini-1.5-flash) response: candidates { content { role: "model" parts { text: "The sky appears blue due to a phenomenon called **Rayleigh scattering**. Here\'s a breakdown:\n\n* **Sunlight is made up of all colors of the rainbow.** When sunlight enters the Earth\'s atmosphere, it encounters tiny particles like nitrogen and oxygen molecules.\n* **These particles scatter the sunlight in all directions.** However, shorter wavelengths of light, like blue and violet, scatter more strongly than longer wavelengths, like red and orange.\n* **This preferential scattering of shorter wavelengths is called Rayleigh scattering.** As a result, we see more blue light scattered throughout the sky, making it appear blue.\n\n**Why is the sky not violet?**\n\nEven though violet light scatters even more strongly than blue, our eyes are more sensitive to blue light. This is why we perceive the sky as blue rather than violet.\n\n**Other factors that affect sky color:**\n\n* **Time of day:** The sky appears more red or orange at sunrise and sunset because the sunlight has to travel through more of the atmosphere, scattering away most of the blue light.\n* **Clouds:** Clouds are made up of larger water droplets or ice crystals, which scatter all wavelengths of light equally. This is why clouds appear white.\n* **Pollution:** Pollution particles can scatter light differently, sometimes making the sky appear hazy or even reddish.\n\nLet me know if you have any other questions about the sky! 
\n"
    }
  }
  finish_reason: STOP
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
    probability_score: 0.054802597
    severity: HARM_SEVERITY_NEGLIGIBLE
    severity_score: 0.03314852
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
    probability_score: 0.100348406
    severity: HARM_SEVERITY_NEGLIGIBLE
    severity_score: 0.06359858
  }
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
    probability_score: 0.10837755
    severity: HARM_SEVERITY_NEGLIGIBLE
    severity_score: 0.021491764
  }
  safety_ratings {
    category: HARM_CATEGORY_SEXUALLY_EXPLICIT
    probability: NEGLIGIBLE
    probability_score: 0.10338596
    severity: HARM_SEVERITY_NEGLIGIBLE
    severity_score: 0.020410307
  }
}
usage_metadata {
  prompt_token_count: 6
  candidates_token_count: 288
  total_token_count: 294
}
```

Let me give you a bit more detail about the logging. LangChain4j uses SLF4J by default for logging, and requests and responses are logged at the `DEBUG` level, so we have to configure our logger and/or logging façade accordingly. In my test project for this article, I configured the following Maven dependencies for SLF4J and its `Simple` logger:

```xml
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>2.0.13</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>2.0.13</version>
</dependency>
```

I created a properties file to configure the loggers: `src/main/resources/simplelogger.properties`, which contains the following configuration:

```
org.slf4j.simpleLogger.defaultLogLevel=debug
org.slf4j.simpleLogger.log.io.grpc.netty.shaded=info
```

I set the default logging level to `debug`. But there's also Netty, the networking library used under the hood by the Gemini Java SDK, which logs at the debug level. So I specified that the logging for this library should only be at `info` and above, otherwise the output is super chatty.

## Function calling mode

So far, when using Gemini for [function calling](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling), the model would decide on its own if a function would be useful to call, and which function to call. But Gemini introduces the ability to [control the function or tool choice](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#tool-config). There are 3 options:

- `AUTO` — The familiar and default mode, where Gemini decides on its own if a function call is necessary and which one should be made,
- `ANY` — Allows you to specify a subset of functions from all those available, but also forces the model to pick one of them (only supported by Gemini 1.5 Pro),
- `NONE` — Even if tools are defined and available, prevents Gemini from using any of those tools.

Let's have a look at this example:

```java
var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-pro")
    .logRequests(true)
    .logResponses(true)
    .toolCallingMode(ToolCallingMode.ANY)
    .allowedFunctionNames(Arrays.asList("add"))
    .build();

ToolSpecification adder = ToolSpecification.builder()
    .description("adds two numbers")
    .name("add")
    .addParameter("a", JsonSchemaProperty.INTEGER)
    .addParameter("b", JsonSchemaProperty.INTEGER)
    .build();

UserMessage message = UserMessage.from("How much is 3 + 4?");

Response<AiMessage> answer = model.generate(asList(message), adder);

System.out.println(
    answer.content().toolExecutionRequests().getFirst());
```

We specify the `ToolCallingMode.ANY` mode, and we list the allowed names of the functions that the model must pick from in order to reply to the request (with the `allowedFunctionNames()` builder method).
We describe the tool that can be called. We create a message. And when calling `generate()`, we pass the tool specification corresponding to the function we want to be called. The output shows that the model replied with the mandatory tool execution request:

```
ToolExecutionRequest { id = null, name = "add", arguments = "{"a":3.0,"b":4.0}" }
```

Now it's our turn to call the `add` function with those arguments, and then send the function execution result back to Gemini.

> **Warning**:
> Currently, it is not possible to use the `ANY` forced function calling mode when using LangChain4j's `AiServices` class.
>
> `AiServices` takes care of automatic function calling. But the process is a two-step request / response mechanism:
>
> - First, we ask the model the math question and pass the tool specification along.
> - The model replies with a `ToolExecutionRequest`.
> - Then `AiServices` makes the function call locally, and replies to the model with the function execution result. However, since the `ANY` calling mode is specified at the model level, the model still wants to reply with yet another tool execution request, even though the second call made to the model was _just_ to pass along the function execution result, not to request another tool execution.
> - So `AiServices` enters an infinite loop, as the model requests a function execution again and again, without taking into account the execution result it received.
>
> When using `AiServices`, it's better to let Gemini operate under the default `AUTO` tool mode.
> That way, it knows when it needs to request a tool execution, or when it just needs to handle the tool execution response.
>
> If you want to use the `ANY` mode with `allowedFunctionNames()`, then don't use `AiServices`, and handle the function calls on your own in your code, to avoid such infinite loop situations.

## Specify safety settings

In LLM-powered applications, where users can enter any kind of weird textual input, you may want to limit harmful content that may be ingested. To do so, you can specify some safety settings, for different categories of content, with different acceptance thresholds:

```java
import static dev.langchain4j.model.vertexai.HarmCategory.*;
import static dev.langchain4j.model.vertexai.SafetyThreshold.*;

//...

var model = VertexAiGeminiChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .safetySettings(Map.of(
        HARM_CATEGORY_DANGEROUS_CONTENT, BLOCK_LOW_AND_ABOVE,
        HARM_CATEGORY_SEXUALLY_EXPLICIT, BLOCK_MEDIUM_AND_ABOVE,
        HARM_CATEGORY_HARASSMENT, BLOCK_ONLY_HIGH,
        HARM_CATEGORY_HATE_SPEECH, BLOCK_MEDIUM_AND_ABOVE
    ))
    .build();
```

If you want to make your app safer for your end users, and to guard against malicious or ill-disposed users, that's the way to go!

## Bonus point #1: Streaming responses with lambda functions

I'll wrap up this review of Gemini-focused features with one little addition I contributed to the project: the ability to pass a lambda instead of a streaming content handler, when using a streaming model. This is not Gemini-specific, you can use it with any model!
More concretely, if you want to use Gemini or another model in streaming mode, to see the response being printed as it's generated by the model, you would usually write the following code:

```java
var model = VertexAiGeminiStreamingChatModel.builder()
    .project(PROJECT_ID)
    .location(LOCATION)
    .modelName("gemini-1.5-flash")
    .build();

model.generate("Why is the sky blue?", new StreamingResponseHandler<>() {
    @Override
    public void onNext(String aFewTokens) {
        System.out.print(aFewTokens);
    }

    @Override
    public void onError(Throwable throwable) {
        throw new RuntimeException(throwable);
    }
});
```

Using an anonymous inner class implementing the `StreamingResponseHandler` interface is quite verbose. Fortunately, I contributed a couple of static methods you can import, to make the code a little bit more concise:

```java
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onNext;
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onNextAndError;

//...

// onNext
model.generate("Why is the sky blue?", onNext(System.out::println));

// onNextAndError
model.generate("Why is the sky blue?", onNextAndError(
    System.out::println,
    ex -> {
        throw new RuntimeException(ex);
    }
));
```

Now you can stream your LLM output in a single instruction!

## Bonus point #2: Generating stunning images with Imagen v3

A second bonus point in this new LangChain4j release is that the Vertex AI image model now supports [Imagen v3](https://deepmind.google/technologies/imagen-3/), Google DeepMind's latest high-quality image generation model.

> **Warning:** To use the Imagen model, you'll still have to be allow-listed for now.
> You'll need to [fill this form](https://docs.google.com/forms/d/1cqt9padvfMgqn23W5FMPTqh7bW1KLkEOsC5G6uC-uuM/viewform)
> to request access to the model.

There are a few new parameters available that you can take advantage of when generating pictures. Let's have a look at the following image generation code:

```java
var imagenModel = VertexAiImageModel.builder()
    .project(PROJECT)
    .location(LOCATION)
    .endpoint(ENDPOINT)
    .publisher("google")
    .modelName("imagen-3.0-generate-preview-0611")
    .aspectRatio(VertexAiImageModel.AspectRatio.LANDSCAPE)
    .mimeType(VertexAiImageModel.MimeType.JPEG)
    .compressionQuality(80)
    .watermark(true) // true by default with Imagen v3
    .withPersisting()
    .logRequests(true)
    .logResponses(true)
    .build();

String prompt = """
    An oil painting close-up, with heavy brush strokes full of paint,
    of two hands shaking together, a young one, and an old one
    conveying a sense of heartfelt thanks and connection between generations
    """;

Response<Image> imageResponse = imagenModel.generate(prompt);

System.out.println(imageResponse.content().url());
```

Here's the resulting picture:

![](/img/gemini/imagen-v3-two-hands-shaking.jpg)

In the code above, you certainly noticed the new builder methods:

- `aspectRatio()` — not only square, but wide and narrow landscape and portrait modes are available,
- `mimeType()` — in addition to PNG, you can request JPEG image generation,
- `compressionQuality()` — when requesting JPEG, you can choose the level of compression for encoding the image,
- `watermark()` — to have all your generated images watermarked with [SynthID](https://deepmind.google/technologies/synthid/),
- `logRequests()` / `logResponses()` — to see what is exchanged with the model, in and out,
- `persistToCloudStorage()` — to specify that you want the image saved in a Cloud Storage bucket (not used in this example).
If you get a chance, and request access to Imagen v3, you'll notice really great quality improvements compared to v2! ## Conclusion Lots of new Gemini related features in this [release of LangChain4j](https://github.com/langchain4j/langchain4j/releases/tag/0.32.0)! I hope this article helped you learn about them, and will make you want to use them in your projects. If you want to go hands-on with Gemini with LangChain4j, don't forget to check out my self-paced codelab: [Gemini codelabg for Java developers, using LangChain4j](https://glaforge.dev/posts/2024/03/27/gemini-codelab-for-java-developers/). # The power of embeddings: How numbers unlock the meaning of data ## Prelude > As I'm focusing a lot on Generative AI, I'm curious about how things work under the hood, to better understand what I'm using in my gen-ai powered projects. > A topic I'd like to focus on more is: **vector embeddings**, to explain more clearly what they are, how they are calculated, and what you can do with them. > > A colleague of mine, [André](https://x.com/andreban), was showing me a [cool experiment](https://writer-m4n3dyfjhq-uc.a.run.app/) > he's been working on, to help people prepare an interview, with the help of an AI, to shape the structure of the resulting final article to write. > > The idea is to provide: a topic, a target audience, and to describe the goals for the audience. > Then, a large language model like [Gemini](https://deepmind.google/technologies/gemini/) prepares a list of questions (that you can update freely) on that topic. > Next, it's your turn to fill in the blanks, answer those questions, and then the LLM generates an article, > with a plan following those key questions and your provided answers. > I cheated a bit, and asked [Gemini](https://gemini.google.com/) itself those questions, and honestly, I really liked how the resulting article came to be, > and I wanted to share with you the outcome below. > > It's a great and simple introduction to vector embeddings! > I like how AI can help organize information, shape the structure and the content for an article. > **I'm not advocating for letting AI write all your articles**, far from that, but as an author, > however, I like that it can help me avoid the blank page syndrome, avoid missing key elements in my dissertation, improve the quality of my written prose. > > Generative AI, in its creative aspect, and as your assistant, can be super useful! Use it as **a tool to help drive your creativity**! > But **always use your critical sense to gauge the quality and factuality of the content**. ## Introduction: What are vector embeddings? Imagine you have a vast library filled with books on every topic imaginable. Finding a specific book can be a daunting task, especially if you only know the general subject matter. Now imagine a magical system that can understand the meaning of each book and represent it as a unique code. This code, called a vector embedding, can then be used to quickly find the most relevant books based on your search query, even if you only have a vague idea of what you're looking for. This is the power of vector embeddings. They are essentially numerical representations of complex data, like text, images, or audio, that capture the underlying meaning and relationships within the data. These numerical codes, arranged as vectors, allow computers to process and compare data in a way that mimics human understanding. 
## From Text to Numbers: The Journey of Embedding Creation Creating vector embeddings involves a multi-step process that transforms raw data into meaningful mathematical representations. The journey begins with **data preprocessing**, where the data is cleaned, normalized, and prepared for embedding generation. This might involve tasks like removing irrelevant information, standardizing data formats, and breaking text into individual words or subwords (tokenization). Next comes the heart of the process: **embedding generation**. This step leverages various techniques and algorithms, such as Word2Vec, GloVe, BERT, and ResNet, to convert each data point into a high-dimensional vector. The specific algorithm chosen depends on the type of data being embedded (text, images, or audio) and the intended application. For instance, Word2Vec uses a neural network to learn relationships between words by analyzing how they co-occur in large text corpora. This results in vector representations for words, where similar words have similar vectors, capturing semantic relationships. Similarly, for images, convolutional neural networks (CNNs) like ResNet can be used to extract features from images, resulting in vectors that represent the visual content. ## Vector Databases: The Power of Storing and Searching Embeddings Once embeddings are generated, they need a dedicated storage system for efficient retrieval and comparison. This is where **vector databases** come into play. Unlike traditional databases designed for structured data, vector databases are optimized for storing and searching high-dimensional vector data. Vector databases employ specialized indexing techniques, such as Annoy, HNSW, and Faiss, to create efficient data structures that allow for fast similarity search. This means that when a user submits a query (e.g., a search term, an image), the database can quickly find the most similar data points based on the similarity of their vector representations. ## Embeddings Empower Search: Finding the Needle in the Haystack The combination of vector embeddings and vector databases revolutionizes search by enabling **semantic search**. This means that instead of relying solely on keyword matching, search engines can understand the meaning behind the data and find relevant results even if the query doesn't use exact keywords. For example, imagine searching for "a picture of a dog with a hat." Traditional keyword-based search might struggle to find relevant images, as the search term might not match the image description. However, with vector embeddings, the search engine can understand the semantic meaning of the query and find images that contain both a dog and a hat, even if those words are not explicitly mentioned in the image description. ## Beyond Search: Expanding the Reach of Embeddings Vector embeddings are not limited to search applications. They have become essential tools in a wide range of fields, including: - **Retrieval Augmented Generation (RAG):** This technique combines the power of information retrieval and generative models to create more informative and relevant responses. Embeddings are used to find relevant information in large text corpora, which is then used to augment prompts for language models, resulting in more accurate and context-aware outputs. - **Data Classification:** Embeddings enable the classification of data points into different categories based on their similarity. 
This finds application in areas like sentiment analysis, spam detection, object recognition, and music genre classification. - **Anomaly Detection:** By representing data points as vectors, anomalies can be identified as data points that are significantly different from the majority. This technique is used in various fields, including network intrusion detection, fraud detection, and industrial sensor monitoring. ## Facing the Challenges and Shaping the Future While vector embeddings have revolutionized data analysis, they still face some challenges. These include the difficulty of capturing polysemy (multiple meanings of a word), contextual dependence, and the challenge of interpreting the meaning behind the high-dimensional vector representations. Despite these limitations, research continues to push the boundaries of vector embeddings. Researchers are exploring techniques like contextual embeddings, multilingual embeddings, knowledge graph integration, and explainable embeddings to overcome existing limitations and unlock the full potential of these powerful representations. ## Stepping into the World of Embeddings: Resources and Next Steps For those interested in diving deeper into the world of vector embeddings, a wealth of resources is available. Online courses and tutorials on platforms like Coursera, Fast.ai, and Stanford's online learning platform provide a solid foundation in the underlying concepts and techniques. Books like "Speech and Language Processing" by Jurafsky and Martin and "Deep Learning" by Goodfellow, Bengio, and Courville offer in-depth coverage of the field. Additionally, research papers and articles on platforms like arXiv and Medium offer insights into the latest advancements and applications. To gain practical experience, explore Python libraries like Gensim, spaCy, and TensorFlow/PyTorch. These libraries provide tools for creating and working with embeddings, allowing you to build your own models and experiment with various applications. The world of vector embeddings is constantly evolving, offering exciting opportunities for innovation and discovery. By understanding the power of these representations, you can unlock new possibilities for data analysis, information retrieval, and artificial intelligence applications. # Functional builders in Java with Jilt A few months ago, I shared an article about what I called Java [functional builders]({{< ref "posts/2024/01/16/java-functional-builder-approach/" >}}), inspired by an equivalent pattern found in Go. The main idea was to have builders that looked like this example: ```java LanguageModel languageModel = new LanguageModel( name("cool-model"), project("my-project"), temperature(0.5), description("This is a generative model") ); ``` Compared to the more tranditional builder approach: - You're using the `new` keyword again to construct instances. - There's no more `build()` method, which felt a bit verbose. Compared to using constructors with tons of parameters: - You have methods like in traditional builders, that say what each parameter is about (`name()`, `temperature()`...) a bit similar to named parameters in some programming languages. The approach I followed was to take advantage of lambda functions under the hood: ```java public static ModelOption temperature(Float temperature) { return model -> model.temperature = temperature; } ``` However, there were a few downsides: - Of course, it's not very conventional! So it can be a bit disturbing for people used to classical builders. 
- I didn't make the distinction between required and optional parameters (they were all optional!)
- The internal fields were not `final`, and I felt they should be.

## Discovering Jilt

When searching on this topic, I found [Adam Ruka](https://x.com/adam_ruka)'s great annotation processor library: [Jilt](https://github.com/skinny85/jilt). One of the really cool features of Jilt is its staged builder concept, which makes builders very type-safe, and forces you to call all the required property methods by chaining them. I found this approach very elegant.

Adam heard about my functional builder approach, and decided to implement this new style of builder in Jilt. There are a few differences with my implementation, but it addresses some of the downsides I mentioned. Let's have a look at what functional builders look like from a usage standpoint:

```java
LanguageModel languageModel = languageModel(
    name("cool-model"),
    project("my-project"),
    temperature(0.5),
    description("This is a generative model")
);
```

Compared to my approach, you're not using constructors (as annotation processors can't change existing classes), so you have to use a static method instead. But otherwise, inside that method call, you have the named-parameter-like methods you're used to using in builders. Here, `name()`, `project()` and `temperature()` are mandatory, and you'd get a compilation error if you forgot one of them. But `description()` is optional and can be omitted.

Let's now look at the implementation:

```java
import org.jilt.Builder;
import org.jilt.BuilderStyle;
import org.jilt.Opt;

import static jilt.testing.LanguageModelBuilder.*;
import static jilt.testing.LanguageModelBuilder.Optional.description;

//...

LanguageModel languageModel = languageModel(
    name("cool-model"),
    project("my-project"),
    temperature(0.5),
    description("This is a generative model")
);

//...

@Builder(style = BuilderStyle.FUNCTIONAL)
public record LanguageModel(
    String name,
    String project,
    Double temperature,
    @Opt String description
) {}
```

I used a Java `record`, but it could be a good old POJO. You must annotate that class with the `@Builder` annotation. The `style` parameter specifies that you want to use a _functional_ builder. Notice the use of the `@Opt` annotation to say that a parameter is not required.

## Derived instance creation

Let me close this article with another neat trick offered by Jilt: how to build new instances from existing ones:

```java
@Builder(style = BuilderStyle.FUNCTIONAL, toBuilder = "derive")
public record LanguageModel(...) {}

//...

LanguageModel derivedModel = derive(languageModel, name("new-name"));
```

By adding the `toBuilder = "derive"` parameter to the annotation, you get the ability to create new instances similar to the original one, but you can change both required and optional parameters, to derive a new instance.

## Time to try Jilt!

You can try functional builders in [Jilt 1.6](https://github.com/skinny85/jilt), which was just released a few days ago!

# Let's make Gemini Groovy!

The happy users of [Gemini Advanced](https://gemini.google.com/advanced), the powerful AI web assistant powered by the Gemini model, can execute some Python code, thanks to a built-in Python interpreter. So, for math, logic, or calculation questions, the assistant can let Gemini invent a Python script, and execute it, to give users a more accurate answer to their queries.
But wearing my [Apache Groovy](https://groovy-lang.org/) hat on, I wondered if I could get Gemini to invoke some Groovy scripts as well, for advanced math questions! ## LangChain4j based approach As usual, my tool of choice for any LLM problem is the powerful [LangChain4j](https://docs.langchain4j.dev/) framework! Interestingly, there are already some code engine integrations, - a [GraalVM Polyglot Truffle](https://www.graalvm.org/latest/reference-manual/polyglot-programming/) engine, that can execute Python and JavaScript code, - a [Judge0](https://judge0.com/) engine that uses the Judge0 online code execution system, which also supports Groovy! I haven't tried Judge0 yet, as I saw it was supporting Groovy 3 only, and not yet Groovy 4. But for math or logic questions, Groovy 3 is just fine anyway. Instead, I wanted to explore how to create my own Groovy interpreter! In the following experiment, I'm going to use the [Gemini](https://deepmind.google/technologies/gemini/) model, because it supports _function calling_, which means we can instruct the model that it can use some tools when needed. Let's walk through this step by step. First, I instantiate a Gemini chat model: ```java var model = VertexAiGeminiChatModel.builder() .project("MY_GCP_PROJECT_ID") .location("us-central1") .modelName("gemini-1.5-flash-001") .maxRetries(1) .build(); ``` Then, I create a tool that is able to run Groovy code, thanks to the `GroovyShell` evaluator: ```java class GroovyInterpreter { @Tool("Execute a Groovy script and return the result of its execution.") public Map executeGroovyScript( @P("The groovy script source code to execute") String groovyScript) { String script = groovyScript.replace("\\n", "\n"); System.err.format("%n--> Executing the following Groovy script:%n%s%n", script); try { Object result = new GroovyShell().evaluate(script); return Map.of("result", result == null ? "null" : result.toString()); } catch (Throwable e) { return Map.of("error", e.getMessage()); } } } ``` Notice the `@Tool` annotation that describes what this tool can do. And the `@P` annotation which explains what the parameter is about. I noticed that sometimes the raw script that Gemini suggested contained some `\n` strings, instead of the plain newline characters, so I'm replacing them with newlines instead. I return a map containing either a result (as a string), or an error message if one was encountered. Now it's time to create our assistant contract, in the form of an interface, but with a very carefully crafted system instruction: ```java interface GroovyAssistant { @SystemMessage(""" You are a problem solver equipped with the capability of \ executing Groovy scripts. When you need to or you're asked to evaluate some math \ function, some algorithm, or some code, use the \ `executeGroovyScript` function, passing a Groovy script \ that implements the function, the algorithm, or the code \ that needs to be run. In the Groovy script, return a value. Don't print the result \ to the console. Don't use semicolons in your Groovy scripts, it's not necessary. When reporting the result of the execution of a script, \ be sure to show the content of that script. Call the `executeGroovyScript` function only once, \ don't call it in a loop. """) String chat(String msg); } ``` This complex system instruction above tells the model what its role is, and that it should call the provided Groovy script execution function whenever it encounters the need to calculate some function, or execute some logic. 
I also instruct it to return values instead of printing results. Funnily, Gemini is a pretty decent Groovy programmer, but it insists on always adding semi-colons like in Java, so for a more _idiomatic_ code style, I suggest it to get rid of them! The final step is now to create our LangChain4j AI service with the following code: ```java var assistant = AiServices.builder(GroovyAssistant.class) .chatLanguageModel(model) .chatMemory(MessageWindowChatMemory.withMaxMessages(20)) .tools(new GroovyInterpreter()) .build(); ``` I combine the Gemini chat model, with a memory to keep track of users' requests, and the Groovy interpreter tool I've just created. Now let's see if Gemini is able to create and calculate a fibonacci function: ```java System.out.println( assistant.chat( "Write a `fibonacci` function, and calculate `fibonacci(18)`")); ``` And the output is as follows: > ```groovy > def fibonacci(n) { > if (n <= 1) { > return n > } else { > return fibonacci(n - 1) + fibonacci(n - 2) > } > } > fibonacci(18) > ``` > > The result of executing the script is: 2584. ## Discussion It took me a bit of time to find the right system instruction to get Groovy scripts that complied to my requirements. However, I noticed sometimes some internal errors returned by the model, which I haven't fully understood (and particularly why those happen at all) On some occasions, I also noticed that LangChain4j keeps sending the same script for execution, in a loop. Same thing: I still have to investigate why this rare behavior happens. So this solution is a fun experiment, but I'd call it just that, an experiment, as it's not as rock-solid as I want it to be. But if I manage to make it more bullet-proof, maybe I could contribute it back as a dedicated execution engine for LangChain4j! ## Full source code Here's the full content of my experiment: ```java import dev.langchain4j.agent.tool.P; import dev.langchain4j.agent.tool.Tool; import dev.langchain4j.memory.chat.MessageWindowChatMemory; import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel; import dev.langchain4j.service.AiServices; import dev.langchain4j.service.SystemMessage; import groovy.lang.GroovyShell; import java.util.Map; public class GroovyCodeInterpreterAssistant { public static void main(String[] args) { var model = VertexAiGeminiChatModel.builder() .project("MY_GCP_PROJECT_ID") .location("us-central1") .modelName("gemini-1.5-flash-001") .maxRetries(1) .build(); class GroovyInterpreter { @Tool("Execute a Groovy script and return the result of its execution.") public Map executeGroovyScript( @P("The groovy script source code to execute") String groovyScript) { System.err.format("%n--> Raw Groovy script:%n%s%n", groovyScript); String script = groovyScript.replace("\\n", "\n"); System.err.format("%n--> Executing:%n%s%n", script); try { Object result = new GroovyShell().evaluate(script); return Map.of("result", result == null ? "null" : result.toString()); } catch (Throwable e) { return Map.of("error", e.getMessage()); } } } interface GroovyAssistant { @SystemMessage(""" You are a problem solver equipped with the capability of \ executing Groovy scripts. When you need to or you're asked to evaluate some math \ function, some algorithm, or some code, use the \ `executeGroovyScript` function, passing a Groovy script \ that implements the function, the algorithm, or the code \ that needs to be run. In the Groovy script, return a value. Don't print the result \ to the console. Don't use semicolons in your Groovy scripts, it's not necessary. 
When reporting the result of the execution of a script, \ be sure to show the content of that script. Call the `executeGroovyScript` function only once, \ don't call it in a loop. """) String chat(String msg); } var assistant = AiServices.builder(GroovyAssistant.class) .chatLanguageModel(model) .chatMemory(MessageWindowChatMemory.withMaxMessages(20)) .tools(new GroovyInterpreter()) .build(); System.out.println( assistant.chat( "Write a `fibonacci` function, and calculate `fibonacci(18)`")); } } ``` # Grounding Gemini with Web Search results in LangChain4j The latest [release of LangChain4j](https://github.com/langchain4j/langchain4j/releases/tag/0.31.0) (version 0.31) added the capability of _grounding_ large language models with results from web searches. There's an integration with [Google Custom Search Engine](https://developers.google.com/custom-search/v1/overview), and also [Tavily](https://tavily.com/). The fact of _grounding_ an LLM's response with the results from a search engine allows the LLM to find relevant information about the query from web searches, which will likely include up-to-date information that the model won't have seen during its training, past its cut-off date when the training ended. > **Remark:** Gemini has a built-in [Google Web Search grounding](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview#ground-public) > capability, however, LangChain4j's Gemini integration doesn't yet surface this feature. > I'm currently working on a pull request to support this. ## Asking questions to your website An interesting use case for LLM web search grounding is for example if you want to search a particular website. I was interested in asking questions related to articles that I have posted on my personal website and blog. Let's see, step by step, how you can implement this. ### Creating a custom search engine First of all, as I decided to use Google Custom Search, I created a new custom search engine. I won't detail the steps involved in this process, as it's explained in the [documentation](https://developers.google.com/custom-search/docs/tutorial/creatingcse). I created a custom search searching only the content on my website: [glaforge.dev](https://glaforge.dev). But you can potentially search the whole internet if you wish, or just your company website, etc. Google Custom Search gave me an API key, as well as a Custom Search ID (csi) for my newly created custom search engine. You can test the custom search engine with that ID with this URL: [https://programmablesearchengine.google.com/controlpanel/overview?cx=YOUR_CSI_HERE](https://programmablesearchengine.google.com/controlpanel/overview?cx=YOUR_CSI_HERE). It gives you a Google Search-like interface where you can enter your queries. There's also a widget that you can integrate in your website if you wish. ### Implementation First of all, I configure the chat model I want to use. I'm using the latest and fastest Gemini model: [Gemini 1.5 Flash](https://deepmind.google/technologies/gemini/flash/). I've saved my Google Cloud project ID and locaction in environment variables. ```java VertexAiGeminiChatModel model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-1.5-flash-001") .build(); ``` Next, I configure my web search engine. Here, I'm using Google Search, but it could be Tavily as well. 
I also saved my API key and the ID of my custom web search in environment variables: ```java WebSearchEngine webSearchEngine = GoogleCustomWebSearchEngine.builder() .apiKey(System.getenv("GOOGLE_CUSTOM_SEARCH_API_KEY")) .csi(System.getenv("GOOGLE_CUSTOM_SEARCH_CSI")) // .logRequests(true) // .logResponses(true) .build(); ``` Note that you can log the requests and responses, for debugging purpose. Next, I define a _content retriever_, this is a way to let LangChain4j know that _content_ can be _retrieved_ from a particular tool or location: ```java ContentRetriever contentRetriever = WebSearchContentRetriever.builder() .webSearchEngine(webSearchEngine) .maxResults(3) .build(); ``` Now, I define the contract I want to use to interact with my Gemini model, by creating my own custom search `interface`: ```java interface SearchWebsite { String search(String query); } ``` This interface will be implemented by LangChain4j's `AiServices` system that binds several components together: the chat language model (here, Gemini), and the web search content retriever I created above: ```java SearchWebsite website = AiServices.builder(SearchWebsite.class) .chatLanguageModel(model) .contentRetriever(contentRetriever) .build(); ``` Then I can ask my question to the LLM, which will find the relevant information in my blog: ```java String response = website.search( "How can I call the Gemma model from LangChain4j?"); System.out.println("response = " + response); ``` If I comment out the line `contentRetriever(contentRetriever)`, Gemini does a best effort at answering my question, but since there's nothing in its training data (before its cut-off date) about how to call the [Gemma](https://blog.google/technology/developers/gemma-open-models/) model from LangChain4j, it is not able to provide a useful answer. But with the web search content retriever, Gemini is able to find the right material to ground its answer, as the custom search returns my article on [calling Gemma with Ollama, Testcontainers, and LangChain4j](https://glaforge.dev/posts/2024/04/04/calling-gemma-with-ollama-and-testcontainers/): ``` Based on the provided information, you can call the Gemma model from LangChain4j using the following approach: 1. **Use Ollama:** The articles highlight Ollama as a tool for interacting with Gemma. You would need to set up Ollama and ensure it has access to the Gemma model. 2. **Integrate TestContainers:** TestContainers helps you manage containerized environments for testing. You can use it to run Ollama within a container alongside LangChain4j. 3. **Utilize LangChain4j:** LangChain4j provides the framework for interacting with large language models. You would define your prompt, send it to Ollama (which runs Gemma), and receive the response back through LangChain4j. **Example Steps:** 1. **Set up Ollama:** Install Ollama and configure it to use the Gemma model. 2. **Create a Dockerfile:** Use a Dockerfile to define an image that includes Ollama and any dependencies. 3. **Run Ollama in a container using TestContainers:** Start the container using TestContainers and ensure it is accessible from your LangChain4j code. 4. **Implement LangChain4j calls:** Use LangChain4j to construct your prompt and send it to Ollama (which will pass it to Gemma). 5. **Receive and process the response:** Receive the generated response from Gemma and process it as needed in your Java application. **Note:** These steps provide a general approach. 
You will need to refer to the documentation for Ollama, TestContainers, and LangChain4j for specific implementation details. This method leverages Ollama as an intermediary to access Gemma. If you have access to Google's Gemini model directly, you might be able to integrate it with LangChain4j without the Ollama step, depending on the specific API or SDK offered by Google.
```

The LLM found that I have to use [Ollama](https://ollama.com/) and [TestContainers](https://testcontainers.com/), as explained in my article. This information wasn't part of my query, so it proves that it really found the info in the article.

## Discussion

The LLM based its answer on the _excerpts_ contained in the search results, not the whole content of the article, so some aspects of this answer are not totally correct. For instance, you don't have to _install_ Ollama or create your own _Dockerfile_. To make the response perfect, I believe we would have to combine web search results with Retrieval Augmented Generation, or pass the whole context of the article to the model, so that it could provide a more thorough and factual answer. For queries that lead to shorter answers, the response would probably be more to the point.

Another approach is to annotate our `String search(String query)` method with a `@SystemMessage` containing instructions that encourage the LLM to provide a shorter answer. But it's difficult to find the right balance between too long and too short, and of course without any sort of hallucinations! For example, you can try with the following system instruction:

```java
interface SearchWebsite {
    @SystemMessage("""
        Provide a paragraph-long answer, not a long step by step explanation.
        Reply with "I don't know the answer" if the provided information isn't relevant.
        """)
    String search(String query);
}
```

I got the following response:

```
The provided information mentions using Gemma with Ollama, TestContainers, and LangChain4j. You can use Ollama, a local LLM server, and TestContainers, which provides lightweight, disposable containers, to set up a testing environment. Then, with LangChain4j, a Java library for interacting with LLMs, you can call Gemma through the Ollama server.
```

That's shorter and more factual, without being too short either!

## What's next?

In an upcoming article, I'll show you how to use Gemini's built-in Google Search grounding, but first, I have to finish my pull request for the LangChain4j project! Or I can explore how to reply more precisely to queries that lead to complex answers like the one above, maybe by combining a RAG approach to get the full context of the article found by the web search. Also, the Tavily API seems to be able to return the raw content of the article, so maybe it can help give the LLM the full context of the article to base its answers on. So it may be worth comparing those two web search integrations too. Stay tuned!
## Full sample code For reference, here is the full sample (with the system instruction approach): ```java import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel; import dev.langchain4j.rag.content.retriever.ContentRetriever; import dev.langchain4j.rag.content.retriever.WebSearchContentRetriever; import dev.langchain4j.service.AiServices; import dev.langchain4j.service.SystemMessage; import dev.langchain4j.web.search.WebSearchEngine; import dev.langchain4j.web.search.google.customsearch.GoogleCustomWebSearchEngine; public class GroundingWithSearch { public static void main(String[] args) { VertexAiGeminiChatModel model = VertexAiGeminiChatModel.builder() .project(System.getenv("PROJECT_ID")) .location(System.getenv("LOCATION")) .modelName("gemini-1.5-flash-001") .build(); WebSearchEngine webSearchEngine = GoogleCustomWebSearchEngine.builder() .apiKey(System.getenv("GOOGLE_CUSTOM_SEARCH_API_KEY")) .csi(System.getenv("GOOGLE_CUSTOM_SEARCH_CSI")) // .logRequests(true) // .logResponses(true) .build(); ContentRetriever contentRetriever = WebSearchContentRetriever.builder() .webSearchEngine(webSearchEngine) .maxResults(3) .build(); interface SearchWebsite { @SystemMessage(""" Provide a paragraph-long answer, not a long step by step explanation. Reply with "I don't know the answer" if the provided information isn't relevant. """) String search(String query); } SearchWebsite website = AiServices.builder(SearchWebsite.class) .chatLanguageModel(model) .contentRetriever(contentRetriever) .build(); String response = website.search( "How can I call the Gemma model from LangChain4j?"); System.out.println("response = " + response); } } ``` # Gemini, Google's Large Language Model, for Java Developers As a follow-up to my talk on [generative AI for Java developers]({{< ref "/talks/2023/11/13/gen-ai-with-palm-2-and-java" >}}), I've developed a new presentation that focuses more on the [Gemini](https://deepmind.google/technologies/gemini/#introduction) large multimodal model by Google. In this talk, we cover the multimodality capabilities of the model, as it's able to ingest code, PDF, audio, video, and is able to reason about them. Another specificity of Gemini is its huge context window of up to 1 million tokens! This opens interesting perspectives, especially in multimodal scenarios. We also talk about the [Gemma](https://blog.google/technology/developers/gemma-open-models/) model, a small open-weights model in the Gemini family, which I covered recently about how to [run it locally thanks to Ollama and Testcontainers]({{< ref "/posts/2024/04/04/calling-gemma-with-ollama-and-testcontainers.md" >}}). In that presentation, I'm showing some of my past Gemini-powered demos, as well as the code examples in my [Gemini workshop for Java developers]({{< ref "/posts/2024/03/27/gemini-codelab-for-java-developers.md" >}}), using [LangChain4j](https://docs.langchain4j.dev/). {{< speakerdeck 202b1956e3b747afa85cbf5d1b40bf20 >}} # Calling Gemma with Ollama, TestContainers, and LangChain4j Lately, for my Generative AI powered Java apps, I've used the [Gemini](https://deepmind.google/technologies/gemini/#introduction) multimodal large language model from Google. But there's also [Gemma](https://blog.google/technology/developers/gemma-open-models/), its little sister model. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma is available in two sizes: 2B and 7B. 
Its weights are freely available, and its small size means you can run it on your own, even on your laptop. So I was curious to give it a run with [LangChain4j](https://docs.langchain4j.dev/).

## How to run Gemma

There are many ways to run Gemma: in the cloud, via [Vertex AI](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335) with a click of a button, or [GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm) with some GPUs, but you can also run it locally with [Jlama](https://github.com/tjake/Jlama) or [Gemma.cpp](https://github.com/google/gemma.cpp).

Another good option is to run Gemma with [Ollama](https://ollama.com/), a tool that you install on your machine, and which lets you run small models, like Llama 2, Mistral, and [many others](https://ollama.com/library). They quickly added support for [Gemma](https://ollama.com/library/gemma) as well. Once installed locally, you can run:

```bash
ollama run gemma:2b
ollama run gemma:7b
```

Cherry on the cake, the [LangChain4j](https://docs.langchain4j.dev/) library provides an [Ollama module](https://docs.langchain4j.dev/integrations/language-models/ollama), so you can plug Ollama-supported models into your Java applications easily.

## Containerization

After a great discussion with my colleague [Dan Dobrin](https://twitter.com/ddobrin), who had worked with Ollama and TestContainers ([#1](https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/blob/main/sessions/next24/books-genai-vertex-langchain4j/src/test/java/services/OllamaContainerTest.java) and [#2](https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/blob/main/sessions/next24/books-genai-vertex-langchain4j/src/test/java/services/OllamaChatModelTest.java#L37)) in his [serverless production readiness workshop](https://github.com/GoogleCloudPlatform/serverless-production-readiness-java-gcp/tree/main), I decided to try the approach below, which brings us to the last piece of the puzzle: instead of having to install and run Ollama on my computer, I decided to use Ollama within a container, handled by [TestContainers](https://testcontainers.com/).

TestContainers is not only useful for testing, but you can also use it for driving containers. There's even a specific [OllamaContainer](https://java.testcontainers.org/modules/ollama/) you can take advantage of! So here's the whole picture:

![](/img/gemini/gemma-ollama-testcontainers-langchain4j.png)

## Time to implement this approach!

You'll find the code in the Github [repository](https://github.com/glaforge/gemini-workshop-for-java-developers/blob/main/app/src/main/java/gemini/workshop/CallGemma.java) accompanying my recent [Gemini workshop](https://codelabs.developers.google.com/codelabs/gemini-java-developers).

Let's start with the easy part, interacting with an Ollama-supported model with LangChain4j:

```java
OllamaContainer ollama = createGemmaOllamaContainer();
ollama.start();

ChatLanguageModel model = OllamaChatModel.builder()
    .baseUrl(String.format("http://%s:%d", ollama.getHost(), ollama.getFirstMappedPort()))
    .modelName("gemma:2b")
    .build();

String response = model.generate("Why is the sky blue?");
System.out.println(response);
```

- You run an Ollama test container.
- You create an Ollama chat model, by pointing at the address and port of the container.
- You specify the model you want to use.
- Then, you just need to call `model.generate(yourPrompt)` as usual.

Easy, right?
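The container-creation method shown next references a `TC_OLLAMA_GEMMA_2_B` constant holding the name of the derived Docker image. Its exact value isn't shown in this excerpt (it's defined in the workshop repository); any valid image name will do, for example something like this hypothetical definition:

```java
// Hypothetical value: the tag under which the Ollama image with Gemma 2B baked in
// gets committed to the local Docker daemon (the real constant lives in the repository).
private static final String TC_OLLAMA_GEMMA_2_B = "tc-ollama-gemma-2b";
```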
Now let's have a look at the trickier part, my local method that creates the Ollama container:

```java
// check if the custom Gemma Ollama image exists already
List<Image> listImagesCmd = DockerClientFactory.lazyClient()
    .listImagesCmd()
    .withImageNameFilter(TC_OLLAMA_GEMMA_2_B)
    .exec();

if (listImagesCmd.isEmpty()) {
    System.out.println("Creating a new Ollama container with Gemma 2B image...");
    OllamaContainer ollama = new OllamaContainer("ollama/ollama:0.1.26");
    ollama.start();
    ollama.execInContainer("ollama", "pull", "gemma:2b");
    ollama.commitToImage(TC_OLLAMA_GEMMA_2_B);
    return ollama;
} else {
    System.out.println("Using existing Ollama container with Gemma 2B image...");
    // Substitute the default Ollama image with our Gemma variant
    return new OllamaContainer(
        DockerImageName.parse(TC_OLLAMA_GEMMA_2_B)
            .asCompatibleSubstituteFor("ollama/ollama"));
}
```

You need to create a derived Ollama container that pulls in the Gemma model. Either this image was already created beforehand, or if it doesn't exist yet, you create it.

Use the Docker Java client to check if the custom Gemma image exists. If it doesn't exist, notice how TestContainers lets you create an image derived from the base Ollama image, pull the Gemma model, and then commit that image to your local Docker registry.

Otherwise, if the image already exists (i.e. you created it in a previous run of the application), you're just going to tell TestContainers that you want to substitute the default Ollama image with your Gemma-powered variant.

## And voilà!

You can **call Gemma locally on your laptop, in your Java apps, using LangChain4j**, without having to install and run Ollama locally (but of course, you need to have a Docker daemon running).

Big thanks to [Dan Dobrin](https://twitter.com/ddobrin) for the approach, and to [Sergei](https://twitter.com/bsideup), [Eddú](https://twitter.com/EdduMelendez) and [Oleg](https://twitter.com/shelajev) from TestContainers for the help and useful pointers.

# Gemini codelab for Java developers using LangChain4j

No need to be a Python developer to do Generative AI! If you're a Java developer, you can take advantage of [LangChain4j](https://docs.langchain4j.dev/) to implement some advanced LLM integrations in your Java applications. And if you're interested in using [Gemini](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/), one of the best models available, I invite you to have a look at the following "codelab" that I worked on:

[Codelab — Gemini for Java Developers using LangChain4j](https://codelabs.developers.google.com/codelabs/gemini-java-developers)

In this workshop, you'll find various examples covering the following use cases, in a _crescendo_ approach:

- Making your first call to Gemini (streaming & non-streaming), as sketched below
- Maintaining a conversation
- Taking advantage of multimodality by analysing images with your prompts
- Extracting structured information from unstructured text
- Using prompt templates
- Doing text classification with few-shot prompting
- Implementing Retrieval Augmented Generation to chat with your documentation
- Doing function calling to let the LLM interact with external APIs and services

You'll find all the [code samples on Github](https://github.com/glaforge/gemini-workshop-for-java-developers).
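If you're curious what that first step looks like in practice, here's a minimal sketch of mine (not the exact codelab code), reusing the same Gemini chat model setup shown earlier in this post; the project ID and location are placeholders to replace with your own:

```java
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;

public class FirstGeminiCall {
    public static void main(String[] args) {
        // Assumption: your own Google Cloud project ID, and a region where Gemini is available
        var model = VertexAiGeminiChatModel.builder()
            .project("MY_GCP_PROJECT_ID")
            .location("us-central1")
            .modelName("gemini-1.5-flash-001")
            .build();

        // Non-streaming call: the whole answer comes back as a single String
        System.out.println(model.generate("Why is the sky blue?"));
    }
}
```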
If you're attending Devoxx France, be sure to attend the [Hands-on-Lab workshop](https://www.devoxx.fr/en/schedule/talk/?id=40285) with my colleagues [Mete Atamel](https://twitter.com/meteatamel) and [Valentin Deleplace](https://twitter.com/val_deleplace) who will guide you through this codelab. # Visualize PaLM-based LLM tokens As I was working on tweaking the Vertex AI text embedding model in [LangChain4j](https://github.com/langchain4j), I wanted to better understand how the `textembedding-gecko` [model](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings) tokenizes the text, in particular when we implement the [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) approach. The various PaLM-based models offer a `computeTokens` endpoint, which returns a list of tokens (encoded in Base 64) and their respective IDs. > **Note:** At the time of this writing, there's no equivalent endpoint for Gemini models. So I decided to create a [small application](https://tokens-lpj6s2duga-ew.a.run.app/) that lets users: - input some text, - select a model, - calculate the number of tokens, - and visualize them with some nice pastel colors. The available PaLM-based models are: - `textembedding-gecko` - `textembedding-gecko-multilingual` - `text-bison` - `text-unicorn` - `chat-bison` - `code-gecko` - `code-bison` - `codechat-bison` You can [try the application](https://tokens-lpj6s2duga-ew.a.run.app/) online. And also have a look at the [source code](https://github.com/glaforge/llm-text-tokenization) on Github. It's a [Micronaut](https://micronaut.io/) application. I serve the static assets as explained in my recent [article](https://glaforge.dev/posts/2024/01/21/serving-static-assets-with-micronaut/). I deployed the application on [Google Cloud Run](https://cloud.run/), the easiest way to deploy a container, and let it auto-scale for you. I did a source based deployment, as explained at the bottom [here]({{< ref "posts/2022/10/24/build-deploy-java-17-apps-on-cloud-run-with-cloud-native-buildpacks-on-temurin.md" >}}). And _voilà_ I can visualize my LLM tokens! # Image generation with Imagen and LangChain4j This week [LangChain4j](https://github.com/langchain4j "LangChain4j"), the LLM orchestration framework for Java developers, released version [0.26.1](https://github.com/langchain4j/langchain4j/releases/tag/0.26.1 "0.26.1"), which contains my first significant contribution to the open source project: **support for the Imagen image generation model**. **Imagen** is a text-to-image diffusion model that was [announced](https://imagen.research.google/ "announced") last year. And it recently upgraded to [Imagen v2](https://deepmind.google/technologies/imagen-2/ "Imagen v2"), with even higher quality graphics generation. As I was curious to integrate it in some of my generative AI projects, I thought that would be a great first [contribution](https://github.com/langchain4j/langchain4j/pull/456 "contribution") to LangChain4j. > **Caution:** At the time of this writing, image generation is still only for allow-listed accounts. > > Furthermore, to run the snippets covered below, you should have an account on Google Cloud Platform, > created a project, configured a billing account, enabled the Vertex AI API, > and authenticated with the gcloud SDK and the command: > `gcloud auth application-default login`. Now let's dive in how to use Imagen v1 and v2 with LangChain4j in Java! 
## Generate your first images

In the following examples, I'm using these constants to point at my project details, the endpoint, the region, etc.:

```java
private static final String ENDPOINT = "us-central1-aiplatform.googleapis.com:443";
private static final String LOCATION = "us-central1";
private static final String PROJECT = "YOUR_PROJECT_ID";
private static final String PUBLISHER = "google";
```

First, we're going to create an instance of the model:

```java
VertexAiImageModel imagenModel = VertexAiImageModel.builder()
    .endpoint(ENDPOINT)
    .location(LOCATION)
    .project(PROJECT)
    .publisher(PUBLISHER)
    .modelName("imagegeneration@005")
    .maxRetries(2)
    .withPersisting()
    .build();
```

There are 2 models you can use:

- `imagegeneration@005` corresponds to Imagen 2
- `imagegeneration@002` is the previous version (Imagen 1)

In this article, we'll use both models. Why? Because currently Imagen 2 doesn't support image editing, so we'll have to use Imagen 1 for that purpose.

The configuration above uses `withPersisting()` to save the generated images in a temporary folder on your system. If you don't persist the image files, the content of the image is available as Base 64 encoded bytes in the returned `Image` objects. You can also call `persistTo(somePath)` to specify a particular directory where you want the generated files to be saved.

Let's create our first image:

```java
Response<Image> imageResponse = imagenModel.generate(
    "watercolor of a colorful parrot drinking a cup of coffee");
```

The `Response<Image>` object wraps the created `Image`. You can get the `Image` by calling `imageResponse.content()`. And you can retrieve the URL of the image (if saved locally) with `imageResponse.content().url()`. The Base 64 encoded bytes can be retrieved with `imageResponse.content().base64Data()`.

Some other tweaks to the model configuration:

- Specify the **language** of the prompt: `language("ja")` (if the language is not officially supported, it's usually translated back to English anyway).
- Define a **negative prompt** with things you don't want to see in the picture: `negativePrompt("black feathers")`.
- Use a particular **seed** so that the same seed always generates the same image: `seed(1234L)`.

So if you want to generate a picture of a pizza with a prompt in Japanese, but you don't want to have pepperoni and pineapple, you could configure your model and generate as follows:

```java
VertexAiImageModel imagenModel = VertexAiImageModel.builder()
    .endpoint(ENDPOINT)
    .location(LOCATION)
    .project(PROJECT)
    .publisher(PUBLISHER)
    .modelName("imagegeneration@005")
    .language("ja")
    .negativePrompt("pepperoni, pineapple")
    .maxRetries(2)
    .withPersisting()
    .build();

Response<Image> imageResponse = imagenModel.generate("ピザ"); // pizza
```

## Image editing with Imagen 1

With Imagen 1, you can [edit](https://cloud.google.com/vertex-ai/docs/generative-ai/image/edit-images?hl=en "edit") existing images:

- **mask-based editing:** you specify a mask, a black & white image where the white parts mark the parts of the original image that should be edited,
- **mask-free editing:** you just give a prompt and let the model figure out what should be edited, on its own or following the prompt.
When generating and editing with Imagen 1, you can also configure the model to use a particular style (with Imagen 2, you just specify it in the prompt) with `sampleImageStyle(VertexAiImageModel.ImageStyle.photograph)`:

- `photograph`
- `digital_art`
- `landscape`
- `sketch`
- `watercolor`
- `cyberpunk`
- `pop_art`

When editing an image, you may also wish to control how strong the modification should be, with `.guidanceScale(100)`. Usually, between 0 and 20 or so the image is lightly edited, between 20 and 100 the edits become more impactful, and at 100 and above you get the maximum editing level.

Let's say I generated an image of a lush forest (I'll use that as my original image):

```java
VertexAiImageModel model = VertexAiImageModel.builder()
    .endpoint(ENDPOINT)
    .location(LOCATION)
    .project(PROJECT)
    .publisher(PUBLISHER)
    .modelName("imagegeneration@002")
    .seed(19707L)
    .sampleImageStyle(VertexAiImageModel.ImageStyle.photograph)
    .guidanceScale(100)
    .maxRetries(4)
    .withPersisting()
    .build();

Response<Image> forestResp = model.generate("lush forest");
```

Now I want to edit my forest to add a small red tree at the bottom of the image. I'm loading a black and white mask image with a white square at the bottom. And I pass the original image, the mask image, and the modification prompt, to the new `edit()` method:

```java
URI maskFileUri = getClass().getClassLoader().getResource("mask.png").toURI();

Response<Image> compositeResp = model.edit(
    forestResp.content(),              // original image to edit
    fromPath(Paths.get(maskFileUri)),  // the mask image
    "red trees"                        // the new prompt
);
```

![](/img/gemini/lush-forrest-red-tree.jpg)

Another kind of editing you can do is to upscale an existing image. As far as I know, it's only supported for Imagen v1 for now, so we'll continue with that model. In this example, we'll generate an image of 1024x1024 pixels, and we'll scale it up to 4096x4096:

```java
VertexAiImageModel imagenModel = VertexAiImageModel.builder()
    .endpoint(ENDPOINT)
    .location(LOCATION)
    .project(PROJECT)
    .publisher(PUBLISHER)
    .modelName("imagegeneration@002")
    .sampleImageSize(1024)
    .withPersisting()
    .persistTo(defaultTempDirPath)
    .maxRetries(3)
    .build();

Response<Image> imageResponse = imagenModel.generate("A black bird looking itself in an antique mirror");

VertexAiImageModel imagenModelForUpscaling = VertexAiImageModel.builder()
    .endpoint(ENDPOINT)
    .location(LOCATION)
    .project(PROJECT)
    .publisher(PUBLISHER)
    .modelName("imagegeneration@002")
    .sampleImageSize(4096)
    .withPersisting()
    .persistTo(defaultTempDirPath)
    .maxRetries(3)
    .build();

Response<Image> upscaledImageResponse = imagenModelForUpscaling.edit(imageResponse.content(), "");
```

And now you have a much bigger image!

## Conclusion

That's about it for image generation and editing with Imagen in LangChain4j today! Be sure to use LangChain4j v0.26.1, which contains that new integration. And I'm looking forward to seeing the pictures you generate with it!

# Serving static assets with Micronaut

My go-to framework when developing Java apps or microservices is [Micronaut](https://micronaut.io). For the apps that should have a web frontend, I rarely use [Micronaut Views](https://micronaut-projects.github.io/micronaut-views/latest/guide/) and its templating support. Instead, I prefer to just **serve static assets** from my resource folder, and have some JavaScript framework (usually [Vue.js](https://vuejs.org/)) populate my HTML content (often using [Shoelace](https://shoelace.style/) for its nice Web Components).
However, the [static asset documentation](https://docs.micronaut.io/latest/guide/#staticResources) is a bit light on explanations. So, since I always forget how to configure Micronaut to serve static assets, I thought it would be useful to document it here.

In `/src/main/resources/application.properties`, I'm adding the following:

```properties
micronaut.router.static-resources.default.paths=classpath:public
micronaut.router.static-resources.default.mapping=/**
micronaut.router.static-resources.default.enabled=true
micronaut.server.cors.enabled=true
```

* The first line says that my resources will live in `src/main/resources/public/`.
* The second line means the pattern will match recursively for sub-directories as well.
* The `enabled` flag is to activate static serving (not strictly needed as it's supposed to be enabled by default).
* I also enabled CORS (cross-origin resource sharing).

Then in `src/main/resources/public/`, I'll have my `index.html` file, and my `css` and `js` folders.

# Light Mode Bookmarklet

A while ago, my friend Sylvain Wallez shared a little [bookmarklet](https://twitter.com/bluxte/status/1729912211882094701 "bookmarklet") on Twitter/X that transforms a dark mode site into light mode. I know the trend is towards dark mode, but for a lot of people with certain vision issues, for example with astigmatism like me, certain dark modes can be very painful. This site about [vision](https://www.allaboutvision.com/digital-eye-strain/is-dark-mode-better-for-eyes/ "vision") (and you'll find other similar references) mentions that:

> People who have myopia or **astigmatism** also may experience **halation** (from the word “halo”).
> Halation occurs when light spreads past a certain boundary, creating a foggy or blurry appearance.

So for certain websites, often those with too strong a contrast, I'm using the following bookmarklet trick. Go to your bookmark manager, and save the following bookmarklet (I called mine "light mode"):

```javascript
javascript:(function(){document.documentElement.style.filter=document.documentElement.style.filter?%27%27:%27invert(100%)%20hue-rotate(180deg)%27})();
```

Now, to pretty print the above code and remove the URL-encoded characters, to decipher what it does:

```javascript
(function () {
  document.documentElement.style.filter =
    document.documentElement.style.filter
      ? ""
      : "invert(100%) hue-rotate(180deg)";
})();
```

Two filters are going to be applied to your current web page:

- First, it will completely **invert** all the colors, like a photographic negative.
- Second, compared to Sylvain, I also add a **hue rotation** of 180 degrees.

## Why the hue rotation

Because the color inversion is also going to shift the colors: a red will become blue, a yellow will be dark blue, a violet will turn pink, etc. With a hue rotation, we get back the right color: a red is still red, a blue is still blue, etc. The difference, however, will be in the lightness, as a light blue becomes dark, and a dark green becomes light. But at least, it's a bit more faithful to the original images. Here's a picture to highlight the differences. See how the rainbow picture is transformed:

![](/img/misc/invert-hue-roate.jpg)

## Possible improvements

Perhaps we could avoid applying the filter globally, or at least avoid applying it to the images, so that they are not affected by those filters. At least for now, that's good enough for me!

# Functional builder approach in Java

In Java, builders are a pretty classical pattern for creating complex objects with lots of attributes.
A nice aspect of builders is that they help reduce the number of constructors you need to create, in particular when not all attributes are required to be set (or when they have default values). However, I've always found builders a bit verbose with their `newBuilder()` / `build()` method combos, especially when you work with deeply nested object graphs, leading to lines of code of builders of builders of...

As I was chatting about those builders with my colleague [Valentin](https://www.linkedin.com/in/deleplacevalentin/), who is a Go developer, he told me about Golang's functional builder approach. It's not a very common implementation practice for Java builders, but it's worth revisiting!

## First, the classical builder

Let's start with an example. We want to create a builder for a class with a few attributes. Not all attributes are mandatory, some may have default values, and we don't want to create as many constructors as there are possible combinations of attributes. Let me introduce you to my `SomeModel` class:

```java
public class SomeModel {
    private String modelName;
    private Float temperature = 0.3f;
    private Integer maxOutputTokens = 100;
    // ... possibly many other attributes

    private SomeModel(String modelName, Float temperature, Integer maxOutputTokens) {
        this.modelName = modelName;
        this.temperature = temperature;
        this.maxOutputTokens = maxOutputTokens;
    }
}
```

Creating tons of constructors for the various model configurations can be painful. Furthermore, some attributes can have the same type, so from a user perspective, it's hard to know which value corresponds to which parameter. So creating a builder can reduce that toil. We could write a static builder class inside `SomeModel` along the lines of:

```java
public class SomeModelBuilder {
    private String modelName;
    private Float temperature = 0.3f;
    private Integer maxOutputTokens = 100;

    public SomeModelBuilder modelName(String modelName) {
        this.modelName = modelName;
        return this;
    }

    public SomeModelBuilder temperature(Float temperature) {
        this.temperature = temperature;
        return this;
    }

    public SomeModelBuilder maxOutputTokens(Integer maxOutputTokens) {
        this.maxOutputTokens = maxOutputTokens;
        return this;
    }

    public SomeModel build() {
        return new SomeModel(modelName, temperature, maxOutputTokens);
    }
}
```

Inside `SomeModel` you would add a method to instantiate a builder:

```java
public static SomeModelBuilder newBuilder() {
    return new SomeModelBuilder();
}
```

Then, the user would create a model instance with the builder as follows:

```java
var model = SomeModel.newBuilder()
    .modelName("gemini")
    .temperature(0.2f)
    .maxOutputTokens(300)
    .build();
```

Not too bad. There are some variations to this approach, like passing the builder in the class' constructor, using setter methods that return `this`, using or not using final fields, etc. But they are mostly stylistic variations. However, I was wondering about this idea of a functional builder...

## Existing functional approaches in Java

I haven't found much literature on this theme.
There are 2 blog posts ([here](https://medium.com/beingprofessional/think-functional-advanced-builder-pattern-using-lambda-284714b85ed5) and [there](https://www.innovect.com/advanced-builder-using-java-8-lambda)) that suggest an approach with lambda expressions and `Consumer`s, but I find it even more unconventional than the approach I'm going to describe further in this article: ```java SomeModel model = new SomeModelBuilder() .with($ -> { $.modelName = "Gemini"; $.temperature = 0.4f; }) .with($ -> $.maxOutputTokens = 100); ``` You can pass one or more lambdas in chained calls. It's the end-user who controls how the model is built, not the implementor, so I feel it's less safe. The use of the `$` sign is a bit of a syntactical hack to avoid repeating the name of the variable corresponding to the model. Finally, there's still a builder class after all, and maybe we can find a way to get rid of it. Let's see what Go has to offer instead, and if we can get some inspiration from it! ## The Go approach My colleague [Valentin](https://www.linkedin.com/in/deleplacevalentin/) pointed me at Dave Cheney's [article](https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis) on Go's functional option pattern. There's also a [video](https://www.youtube.com/watch?v=24lFtGHWxAQ) available. The idea is that the class' constructor takes function _options_ as a vararg paramter, that are able to modify the instance that's being built. Let's illustrate this with the following snippet. We create a `struct` that represents our model object like in our Java example: ```go package main import "fmt" type SomeModel struct { modelName string temperature float32 maxOutputTokens int } ``` We define a method to construct our model, which takes a vararg of options: ```go func NewModel(options ...func(*SomeModel)) (*SomeModel) { m := SomeModel{"", 0.3, 100} for _, option := range options { option(&m) } return &m } ``` Those options are actually functions that take a model object as parameter. Now we can create utility methods that create such option functions, and we pass the value for each field of the `struct` via the method parameter. So we have a method for each structure field: model name, temperature and max output tokens: ```go func modelName(name string) func(*SomeModel) { return func(m *SomeModel) { m.modelName = name } } func temperature(temp float32) func(*SomeModel) { return func(m *SomeModel) { m.temperature = temp } } func maxOutputTokens(max int) func(*SomeModel) { return func(m *SomeModel) { m.maxOutputTokens = max } } ``` Next we can create the model in the following way, by calling the utility methods that return functions that are able to modify the `struct`. ```go func main() { m := NewModel( modelName("gemini"), temperature(0.5), maxOutputTokens(100)) fmt.Println(m) } ``` Notice there's not even a `NewBuilder()` or `Build()` method! ## Let's implement our functional builder in Java! We can follow the same approach in Java. Instead of Go functions, we'll use Java's lambdas. Our lambdas will be converted into `Consumer`s of `SomeModel`. So let's recreate our `SomeModel` class, with the same fields as before. This time, however, the constructor won't be `private`, and it'll take a list of options (lambda expressions that consume instances of `SomeModel`). 
We'll iterate over all of them to execute them:

```java
import java.util.function.Consumer;

public class SomeModel {
    private String modelName;
    private Float temperature = 0.3f;
    private Integer maxOutputTokens = 100;

    public SomeModel(ModelOption... options) {
        for (ModelOption option : options) {
            option.accept(this);
        }
    }
```

> **Update #1:** A neat [suggestion](https://gist.github.com/edeandrea/27ee1c61f05e640fc6fa1e19b8fb756e)
> by Eric Deandrea on Twitter to use streams to filter the null options,
> as I was not checking them with an extra `if` before calling `accept()` on the option:
>
> ```java
> Optional.ofNullable(options)
>     .map(Stream::of)
>     .orElseGet(Stream::empty)
>     .forEach(option -> option.accept(this))
> ```

And what is this `ModelOption` class? It's just a synonym for a `Consumer<SomeModel>` (so not strictly needed, but it can help with readability). It's a nested interface:

```java
public interface ModelOption extends Consumer<SomeModel> {}
```

Next, we create similar utility methods that will update the model instance:

```java
    public static ModelOption modelName(String modelName) {
        return model -> model.modelName = modelName;
    }

    public static ModelOption temperature(Float temperature) {
        return model -> model.temperature = temperature;
    }

    public static ModelOption maxOutputTokens(Integer maxOutputTokens) {
        return model -> model.maxOutputTokens = maxOutputTokens;
    }
}
```

Now, if we want to create a model, we'll be able to call the constructor as follows:

```java
import fn.builder.SomeModel;
import static fn.builder.SomeModel.*;

//...
SomeModel model = new SomeModel(
    modelName("gemini"),
    temperature(0.5f),
    maxOutputTokens(100)
);
```

Don't forget to use a `static import` to keep the syntax short.

## Discussion

A few advantages I see with this approach:

- I like the fact that we're using a constructor to construct our model instances!
- And the constructor is super simple and short!
- It also means that the constructor won't change when there's a new parameter to handle (better backward compatibility). On the other hand, with a traditional builder, the constructor could also take the builder itself as its sole parameter.
- I'm also happy that I got rid of the verbose `newBuilder()` / `build()` combo. It feels like we don't really have a builder at play here.
- At first, I was wondering if I was opening Pandora's box, as I feared developers could provide their own lambda and potentially wreak havoc in my instance construction, but because of visibility rules, only my methods can modify the internals of the model class.
- Although we're using a constructor, passing those method calls as parameters feels a bit like having [named arguments](https://www.groovy-lang.org/objectorientation.html#_named_parameters) as in languages like Python or Groovy (which can also [create builders for you](https://www.groovy-lang.org/metaprogramming.html#xform-Builder) via AST transformations). It also looks more like the classical builder, which helps readability.
- I can pass the arguments in whichever order I want.
- I can put validation rules both in each mutator method and in the constructor after all mutators have been called.

Potential tweaks:

- I used non-final fields, because I wanted to be able to define my default values for some fields at definition time rather than in the constructor, but we could certainly tweak this implementation a bit if needed. And anyway, only my mutator methods can alter those fields, so I guess it's fine.
- I was curious if I could use Java `enum`s for storing only my allowed mutators, but I haven't found an effective and concise way of implementing this. Java `enum`s don't work like Rust's, but there's an interesting article about this [here](https://www.reddit.com/r/java/comments/135i37c/rust_like_enums_in_java/) on how to implement sum types.
- I also wondered about a mix of `sealed` `interface`s and maybe `record`s, but similarly to `enum`s, I couldn't find a nice and short syntax that I was happy with.

In the cons:

- It's a bit unconventional, as I haven't seen this approach implemented in the wild. So maybe the approach suffers in terms of readability.
- The other concern I have is with discoverability. When auto-completing code, an IDE like IntelliJ is smart enough to suggest that the mutator methods can be used inside the constructor. But it's not that clear that such mutator methods exist. It's going to be important to document the constructor to say that those mutators exist.

## Feedback

I'd be curious to hear your thoughts on this. Don't hesitate to interact with me on [Mastodon](https://uwyn.net/@glaforge/111766219413506355), [Twitter](https://twitter.com/glaforge/status/1747272263546905026), or [BlueSky](https://bsky.app/profile/glaforge.bsky.social/post/3kj47jxwseg2m).

# URL slug or how to remove accents from strings in Java

In this article, we'll figure out how to create _slugs_. Not the slobbery kind of little gastropods that crawl on the ground. Instead, we'll see how to create the short hyphenated text you can see in the URL of your web browser, and that is often a **URL-friendly variation of the title of the article**.

> Interestingly, one of the most popular posts on my blog is an almost 20-year-old article that explains
> how to [remove accents from a string]({{< ref "posts/2005/04/27/how-to-remove-accents-from-a-string" >}}).
> And indeed, in _slugs_ you would like to remove accents, among other things.

So what problem are we trying to solve today? Let's say you have an article whose title is _"L'été, où es tu ?"_ (which translates to: _"Summer, where have you been?"_). You want your blog to have a friendly URL that looks like the title, but without the punctuation, or the accents (also called diacritical marks), and you also want to replace spaces with hyphens. The final URL should then be `https://myblog.com/l-ete-ou-es-tu`.

A naive approach would be to try to replace each letter bearing a diacritical mark with its unmarked equivalent: _"é"_ with "e", and so on. But don't do that by hand: you'll likely miss some letters in some languages. A better approach is to take advantage of **Unicode normalization**.

If you are interested, you can learn more about [unicode normalization](https://unicode.org/reports/tr15/images/UAX15-NormFig4.jpg) on the [Unicode.org](https://unicode.org/reports/tr15/) website. But in a nutshell, some letters, like accented letters, are a combination of a base letter and a diacritical mark. Let's have a look at this image from the link above:

![](https://unicode.org/reports/tr15/images/UAX15-NormFig3.jpg)

The uppercase [angström](https://www.compart.com/en/unicode/U+212B) letter is composed of the _uppercase A letter_ and the _ring above_ diacritical mark. The composed letter has a Unicode value of `U+212B` but can be decomposed into `U+0041` _(uppercase A)_ and `U+030A` _(ring above)_. I'll spare you the details of the various normalization forms.
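To make that decomposition concrete, here's a small sketch of mine (not part of the original article) that uses `java.text.Normalizer` to split the angström sign into its two code points:

```java
import java.text.Normalizer;

public class AngstromDecomposition {
    public static void main(String[] args) {
        String angstrom = "\u212B"; // Å, the ANGSTROM SIGN
        String decomposed = Normalizer.normalize(angstrom, Normalizer.Form.NFD);
        // Prints U+0041 (uppercase A) followed by U+030A (combining ring above)
        decomposed.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```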
Java lets you work with the normalized forms of letters thanks to the `java.text.Normalizer` class. We'll also take advantage of Java's regex `Pattern` class to identify particular classes of characters.

> Be sure to check the Javadocs of the `Normalizer` and `Pattern` classes:
>
> - [Normalizer](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/text/Normalizer.html)
> - [Pattern](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html)
>
> The former explains how to do string normalization, and the latter will give you the list of available character classes.

Let's have a look at the following Java snippet:

```java
import java.text.Normalizer;

String title = "L'été, où es tu ?";

String slug = Normalizer.normalize(title, Normalizer.Form.NFD)
    .toLowerCase()                // "l'été, où es tu ?"
    .replaceAll("\\p{IsM}+", "")  // "l'ete, ou es tu ?"
    .replaceAll("\\p{IsP}+", " ") // "l ete ou es tu "
    .trim()                       // "l ete ou es tu"
    .replaceAll("\\s+", "-");     // "l-ete-ou-es-tu"
```

My approach is usually the following:

- First, I normalize the text into the `NFD` form _(canonical decomposition)_, so base characters and diacritical marks are now separated,
- Then, I replace all the uppercase letters with lowercase ones,
- Next, we use the `IsM` property, which selects the diacritical marks, and we remove them,
- Similarly, we look at the characters which are punctuation, with the `IsP` binary property, and replace them with spaces,
- I usually trim the string at that point, as I don't want to have spaces at the beginning or end of the string (when a punctuation mark is replaced with a space in the previous step),
- Eventually, all the space characters are replaced with hyphens.

## Slugify

Recently, I came across a Java library that takes care of creating slugs: [Slugify](https://github.com/slugify/slugify)! With Slugify, you can do a transformation similar to mine, with the following code:

```java
import com.github.slugify.Slugify;

Slugify slugify = Slugify.builder().build();
slugify.slugify("L'été, où es tu ?"); // "lete-ou-es-tu"
```

A small difference here is that the quote is removed, which leads to having the string `lete` instead of `l-ete`. I find that a bit less readable at a glance, but Slugify has various knobs you can tweak to customize its output.

There's one particular thing I like about this library: its use of the [ICU4J](https://unicode-org.github.io/icu/userguide/icu4j/) library, which supports **transliteration** _([ICU](https://icu.unicode.org/) is a well-known set of libraries for full Unicode and globalization support.)_

The problem with our examples above is that they work well for languages with Latin-like alphabets. But my examples keep characters like ideograms intact, and Slugify removes them by default. If you want to have URLs that stay within the ASCII realm, you can use [transliteration](https://en.wikipedia.org/wiki/Transliteration), which can map text in one language into readable Latin-like text that sounds like the original text.
So if I wanted to transliterate my string into ASCII-friendly text, I could use Slugify's integration of ICU:

```java
import com.github.slugify.Slugify;

import java.util.Locale;

Slugify slugify = Slugify.builder()
    .transliterator(true) // use transliteration
    .locale(Locale.ENGLISH)
    .build();

slugify.slugify("夏よ、どこにいるの?"); // "xiayo-dokoniiruno"
```

# Gemini Function Calling

A promising feature of the Gemini large language model, released recently by [Google DeepMind](https://deepmind.google/), is the support for [function calls](https://ai.google.dev/docs/function_calling). It's a way to supplement the model, by letting it know that external functions or APIs can be called. So you're not limited by the knowledge cut-off of the model: instead, in the flow of the conversation with the model, you can pass a list of functions the model will know are available, to get the information it needs to complete the generation of its answer.

For example, if you want to ask the model about the weather, it doesn't have the realtime information about the weather forecast. But we can tell it that there's a function that can be called to get the forecast for a given location. Internally, the model will acknowledge it doesn't know the answer about the weather, but it will request that you call an external function that you describe, using a specific set of parameters which correspond to the user's request.

Just days ago, I wrote about how to [get started with Gemini in Java]({{< ref "/posts/2023/12/13/get-started-with-gemini-in-java" >}}). In that article, we explored how to use the hand-written Java SDK that is available to interact with Gemini from Java. However, the Java SDK doesn't yet expose all the features of the model: in particular, function calling is missing. But not all hope is lost! Because under the hood, the SDK relies on the generated protobuf classes library, which exposes everything!

> Soon, Gemini will be supported by [LangChain4j](https://github.com/langchain4j/langchain4j),
> and the Java SDK will also provide an easier way to take care of function calling.
> But in this article, I wanted to explore the use of the internal protobuf classes, to see how to best implement its support in the SDK.

Let's go step by step! Instead of using the `GenerativeModel` API from the SDK, we'll go straight with the `PredictionServiceClient`:

```java
try (VertexAI vertexAI = new VertexAI(projectId, location)) {
    PredictionServiceClient client = vertexAI.getPredictionServiceClient();
    ...
}
```

We need to prepare a function declaration to describe the kind of functions that the LLM can ask us to call, and we'll wrap it in a `Tool`:

```java
FunctionDeclaration functionDeclaration = FunctionDeclaration.newBuilder()
    .setName("getCurrentWeather")
    .setDescription("Get the current weather in a given location")
    .setParameters(
        Schema.newBuilder()
            .setType(Type.OBJECT)
            .putProperties("location", Schema.newBuilder()
                .setType(Type.STRING)
                .setDescription("location")
                .build()
            )
            .addRequired("location")
            .build()
    )
    .build();

Tool tool = Tool.newBuilder()
    .addFunctionDeclarations(functionDeclaration)
    .build();
```

Functions are described using classes that represent a subset of the OpenAPI 3 specification.

> It's important to provide descriptions for the functions and their parameters,
> as the LLM will use that information to figure out which function to call, and which parameters should be passed.
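
Note that the model never executes anything itself: it only asks _us_ to call the function we declared. On the application side, a hypothetical implementation of that declared function (the method name and JSON shape below are just assumptions for this example) could be as simple as:

```java
// Hypothetical application-side implementation of the declared "getCurrentWeather" function.
// A real version would call an actual weather Web API.
static String getCurrentWeather(String location) {
    return """
        { "weather": "sunny", "location": "%s" }
        """.formatted(location);
}
```
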
Next, let's prepare a question asking about the weather in Paris, and configure the text generation request with that prompt and the tool defined above:

```java
String resourceName = String.format(
    "projects/%s/locations/%s/publishers/google/models/%s",
    vertexAI.getProjectId(), vertexAI.getLocation(), modelName);

Content questionContent = ContentMaker.fromString("What's the weather in Paris?");

GenerateContentRequest questionContentRequest = GenerateContentRequest.newBuilder()
    .setEndpoint(resourceName)
    .setModel(resourceName)
    .addTools(tool)
    .addContents(questionContent)
    .build();

ResponseStream<GenerateContentResponse> responseStream =
    new ResponseStream<>(new ResponseStreamIteratorWithHistory<>(
        client
            .streamGenerateContentCallable()
            .call(questionContentRequest)
            .iterator())
    );

GenerateContentResponse generateContentResponse = responseStream.stream().findFirst().get();
Content callResponseContent = generateContentResponse.getCandidates(0).getContent();
```

If you print the `callResponseContent` variable, you'll see that it contains a function call request, suggesting that you should call the predefined function with the parameter of `Paris`:

```
role: "model"
parts {
  function_call {
    name: "getCurrentWeather"
    args {
      fields {
        key: "location"
        value {
          string_value: "Paris"
        }
      }
    }
  }
}
```

At that point, as the developer, it's your turn to work a little, and make the call to that function yourself! Let's pretend I called an external Web Service that gives weather information, and that it returns some JSON payload that would look like so:

```json
{
  "weather": "sunny",
  "location": "Paris"
}
```

We now need to create a function response structure to pass that information back to the LLM:

```java
Content contentFnResp = Content.newBuilder()
    .addParts(Part.newBuilder()
        .setFunctionResponse(
            FunctionResponse.newBuilder()
                .setResponse(
                    Struct.newBuilder()
                        .putFields("weather", Value.newBuilder().setStringValue("sunny").build())
                        .putFields("location", Value.newBuilder().setStringValue("Paris").build())
                        .build()
                )
                .build()
        )
        .build())
    .build();
```

Then, since LLMs are actually stateless beasts, we need to give the model the whole context of the conversation again, passing the query, the function call the model suggested we make, as well as the response we got from the external weather service:

```java
GenerateContentRequest generateContentRequest = GenerateContentRequest.newBuilder()
    .setEndpoint(resourceName)
    .setModel(resourceName)
    .addContents(questionContent)
    .addContents(callResponseContent)
    .addContents(contentFnResp)
    .addTools(tool)
    .build();
```

And to finish, we'll invoke the `client` one last time with that whole dialog and information, and print a response out:

```java
responseStream = new ResponseStream<>(new ResponseStreamIteratorWithHistory<>(
    client
        .streamGenerateContentCallable()
        .call(generateContentRequest)
        .iterator())
);

for (GenerateContentResponse resp : responseStream) {
    System.out.println(ResponseHandler.getText(resp));
}
```

And happily, Gemini will reply:

```
The weather in Paris is sunny.
```

What a lovely way to start the holiday season with nice and sunny weather! I wish you all happy year-end festivities, and I look forward to seeing you next year. Hopefully next month, I'll be able to show you some cool new SDK features or the LangChain4j integration! Thanks for reading.
# Visualize and Inspect Workflows Executions

When using a service like Google Cloud [Workflows](https://cloud.google.com/workflows/), in particular as your workflows get bigger, it can be difficult to understand what's going on under the hood. With multiple branches, step jumps, iterations, and also parallel branches and iterations, if your workflow fails during an execution, until now, you had to check the execution status, or dig through the logs to find more details about the failed step.

I have good news for you! Workflows recently added some deeper introspection capability: you can now [view the history of execution steps](https://cloud.google.com/workflows/docs/debug-steps). From the Google Cloud console, you can see the list of steps, and see the logical flow between them. The usual workflow visualisation will also highlight in green the successful steps, and in red the failed ones. Of course, it is also possible to make a curl call to get the JSON of the [list of executed steps](https://cloud.google.com/workflows/docs/debug-steps#list-entries).

Let's have a look! In the console, when you click on an execution, in the `summary` tab, you'll see not only the failed step, but also the nice workflow graph colored green and red:

![](/img/workflows-days/step-visu-1.png)

That way, you know which path the execution followed, in a visual manner. But you can also see the actual list of steps executed, with more details, by clicking on the `steps` tab:

![](/img/workflows-days/step-visu-2.png)

From this table, the filter will let you further refine the particular types of steps you'd like to investigate, or visualise the steps of a subworkflow only, etc. This is a nice improvement to the developer experience, and for your ops team, to better understand what happens during your workflow executions! Feel free to read more about this new capability in the documentation about [viewing the history of execution steps](https://cloud.google.com/workflows/docs/debug-steps).

# Hands on Codelabs to dabble with Large Language Models in Java

Hot on the heels of the [release of Gemini](https://glaforge.dev/posts/2023/12/13/get-started-with-gemini-in-java/), I'd like to share a couple of resources I created to get your hands on large language models, using [LangChain4J](https://github.com/langchain4j/), and the [PaLM 2](https://ai.google/discover/palm2/) model. Later on, I'll also share with you articles and codelabs that take advantage of Gemini, of course.

The PaLM 2 model supports 2 modes:

- text generation,
- and chat.

For both codelabs, you'll need to have created an account on Google Cloud, and created a project. The codelabs will guide you through the steps to set up the environment, and show you how to use the Google Cloud built-in shell and code editor, to develop in the cloud. You should be a Java developer, as the examples are in Java, use the [LangChain4J](https://github.com/langchain4j/) project, and Maven for building the code.
### [Generative AI text generation in Java with PaLM and LangChain4J](https://codelabs.developers.google.com/codelabs/genai-text-gen-java-palm-langchain4j?hl=en#0)

In the first [codelab](https://codelabs.developers.google.com/codelabs/genai-text-gen-java-palm-langchain4j?hl=en#0) you can explore:

- how to make your first call to PaLM for simple question/answer scenarios
- how to extract structured data out of unstructured text
- how to use prompts and prompt templates
- how to classify text, with an example on sentiment analysis

### [Generative AI powered chat with users and docs in Java with PaLM and LangChain4J](https://codelabs.developers.google.com/codelabs/genai-chat-java-palm-langchain4j?hl=en#0)

In the second [codelab](https://codelabs.developers.google.com/codelabs/genai-chat-java-palm-langchain4j?hl=en#0) you'll use the chat model to learn:

- how to create your first chat with the PaLM model
- how to give your chatbot a personality, with an example featuring a chess player
- how to extract structured data out of unstructured text using LangChain4J's AiServices and its annotations
- how to implement Retrieval Augmented Generation (RAG) to answer questions about your own documentation

## Going further with Generative AI

If you're interested in going further with Generative AI, and learn more, feel free to [join the Google Cloud Innovators program](https://goo.gle/generativeai). Google Cloud Innovators is **free** and includes:

- live discussions, AMAs, and roadmap sessions to learn the latest directly from Googlers,
- the latest Google Cloud news right in your inbox,
- digital badge and video conference background,
- and more.

Go check what the [program offers](https://cloud.google.com/innovators?hl=en)!

# Get Started with Gemini in Java

Google announced today the availability of [Gemini](https://cloud.google.com/blog/products/ai-machine-learning/gemini-support-on-vertex-ai), its latest and most powerful Large Language Model. Gemini is **multimodal**, which means it's able to consume not only text, but also images or videos. I had the pleasure of working on the Java samples and helping with the Java SDK, with wonderful engineer colleagues, and I'd like to share some examples of **what you can do with Gemini, using Java**!

First of all, you'll need to have an account on Google Cloud and to have created a project. The Vertex AI API should be enabled, to be able to access the Generative AI services, and in particular the Gemini large language model. Be sure to check out the [instructions](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal?hl=en).

## Preparing your project build

To get started with some coding, you'll need to create a Gradle or a Maven build file that requires the Google Cloud libraries BOM, and the `google-cloud-vertexai` library. Here's an example with Maven:

```xml
...
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.29.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
...
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-vertexai</artifactId>
  </dependency>
  ...
</dependencies>
...
```

## Your first queries

Now let's have a look at our first multimodal example, mixing text prompts and images:

```java
try (VertexAI vertexAI = new VertexAI(projectId, location)) {
    byte[] imageBytes = Base64.getDecoder().decode(dataImageBase64);

    GenerativeModel model = new GenerativeModel("gemini-pro-vision", vertexAI);
    GenerateContentResponse response = model.generateContent(
        ContentMaker.fromMultiModalData(
            "What is this image about?",
            PartMaker.fromMimeTypeAndData("image/jpg", imageBytes)
        ));

    System.out.println(ResponseHandler.getText(response));
}
```

You instantiate `VertexAI` with your Google Cloud project ID, and the region location of your choice. To pass images to Gemini, you can either pass the bytes directly, or pass a URI of an image stored in a Cloud Storage bucket (like `gs://my-bucket/my-img.jpg`).

You create an instance of the model. Here, I'm using `gemini-pro-vision`. But later on, a `gemini-ultra-vision` model will also be available.

Let's ask the model to generate content with the `generateContent()` method, by passing both a text prompt and also an image. The `ContentMaker` and `PartMaker` classes are helpers to further simplify the creation of more advanced prompts that mix different modalities. But you could also just pass a simple string as argument of the `generateContent()` method. The `ResponseHandler` utility will retrieve all the text of the answer of the model.

Instead of getting the whole output once all the text is generated, you can also adopt a streaming approach:

```java
model.generateContentStream("Why is the sky blue?")
    .stream()
    .forEach(System.out::print);
```

You can also iterate over the stream with a `for` loop:

```java
ResponseStream<GenerateContentResponse> responseStream =
    model.generateContentStream("Why is the sky blue?");

for (GenerateContentResponse responsePart : responseStream) {
    System.out.print(ResponseHandler.getText(responsePart));
}
```

## Let's chat!

Gemini is a multimodal model, and it's both a text generation model and a chat model. So you can chat with Gemini, and ask a series of questions in context. There's a handy `ChatSession` utility class which simplifies the handling of the conversation:

```java
try (VertexAI vertexAI = new VertexAI(projectId, location)) {
    GenerateContentResponse response;

    GenerativeModel model = new GenerativeModel(modelName, vertexAI);
    ChatSession chatSession = new ChatSession(model);

    response = chatSession.sendMessage("Hello.");
    System.out.println(ResponseHandler.getText(response));

    response = chatSession.sendMessage("What are all the colors in a rainbow?");
    System.out.println(ResponseHandler.getText(response));

    response = chatSession.sendMessage("Why does it appear when it rains?");
    System.out.println(ResponseHandler.getText(response));
}
```

It's convenient to use `ChatSession`, as it takes care of keeping track of past questions from the user, and answers from the assistant.

## Going further

These are just a few examples of the capabilities of Gemini. Be sure to check out some of the [samples that are available on GitHub](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/main/vertexai/snippets/src/main/java/vertexai/gemini). Read [more about Gemini and Generative AI](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal?hl=en) in the Google Cloud documentation.
# Generative AI in practice: Concrete LLM use cases in Java, with the PaLM API

Large Language Models, available through easy-to-use APIs, bring powerful machine learning tools into the hands of developers. Although Python is usually seen as the _lingua franca_ of everything ML, with LLM APIs and LLM orchestration frameworks, complex tasks become easier to implement for enterprise developers.

## Abstract

> Large language models (LLMs) are a powerful new technology that can be used for a variety of tasks, including generating text, translating languages, and writing different kinds of creative content. However, LLMs can be difficult to use, especially for developers who are not proficient in Python, the lingua franca for AI. So what about us Java developers? How can we make use of Generative AI?
>
> This presentation will go through how to use LLMs in Java without the need for Python. We will use the PaLM API, provided by Google Cloud's Vertex AI services, to perform a variety of tasks, such as searching through documentation, generating kids stories, summarizing content, extracting keywords or entities, and more.
> In our journey through demos, we'll discover LangChain4J, a wonderful LLM orchestrator for Java developers that simplifies the implementation of advanced LLM use cases.

I had the chance to get this talk recorded at Devoxx Belgium:

{{< youtube ioTPfL9cd9k >}}

And you can check the slides here:

{{< speakerdeck be0c44ac898f4ce5b905a8389bb751e2 >}}

# Tech Watch #5 — November 15, 2023

- Some friends shared this article from Uwe Friedrichsen, titled [back to the future](https://www.ufried.com/blog/back_to_the_future/), that talks about this feeling of "déjà-vu", this impression that in IT we keep on reinventing the wheel. With references to mainframes, Uwe compared CICS to Lambda function scheduling, JCL to step functions, mainframe software development environments to the trendy platform engineering. There are two things I like about this article. First of all, it rings a bell with me, as we've seen the pendulum swing as we keep reinventing some patterns or rediscovering certain best practices, sometimes favoring an approach one day, and coming back to another approach the next day. But secondly, Uwe referenced Gunter Dueck who talked about spirals rather than a pendulum. I've had that same analogy in mind for years: rather than swinging from one side to the other and back, I always had this impression that we're circling and spiraling, but each time, even when passing on the same side, we've learned something along the way, and we're getting closer to an optimum, with a slightly different view angle, and hopefully with a better view and more modern practices. Last week at FooConf #2 in Helsinki, I was just talking with my friend [Venkat Subramaniam](https://agiledeveloper.com/aboutus.html) about this spiral visualisation, and I'm glad to see I'm not the only one thinking that IT is spiraling rather than swinging like a pendulum.
- [Automerge-repo, a batteries included toolkit for building local-first applications](https://automerge.org/blog/2023/11/06/automerge-repo/)\
[Automerge](https://automerge.org/) is one of the most well-known CRDT (Conflict-Free Replicated Data Type) algorithms that allows you to implement collaborative applications (think Google Docs kind of collaboration, for example). With CRDT algorithms and data structures, concurrent changes on different devices can be merged automatically without requiring a central server, and without complex merge processes.
However, having an algorithm and data structure is one thing, but putting the whole system in place is not necessarily easy. This new automerge-repo project tries to solve this problem, by offering networking and storage adapters to facilitate the communication between the peers, or with a potential sync server.
- The WebAssembly Garbage Collection proposal (WasmGC) lands in the latest Chrome version. The [V8 team dives into the details about WasmGC](https://v8.dev/blog/wasm-gc-porting). It'll be particularly useful to better support garbage-collected languages (like Java and friends) without having to ship a garbage collector in each wasm package.
- Although I'm not developing native apps for Macs, I spotted this article about an [open source implementation of Apple code signing and notarization](https://gregoryszorc.com/blog/2022/08/08/achieving-a-completely-open-source-implementation-of-apple-code-signing-and-notarization/), implemented in Rust, which can run on non-Mac hardware. With this approach, when you're building native apps for the Mac, you can integrate that step into your Linux-based CI/CD pipeline, without having a Mac box somewhere.
- [Document summarization is an area where large language models excel](https://medium.com/google-cloud/langchain-chain-types-large-document-summarization-using-langchain-and-google-cloud-vertex-ai-1650801899f6). There are different approaches to do so when your context window can't fit the whole document to summarize. In this article, different approaches are mentioned: stuffing (when it fits in the context window), Map/Reduce to split the content in sections that can be summarised, so that a summary of summaries can be made, and the more sequential Refine method where we summarize what fits in memory, and then ask to refine that first summary with the details of the following sections, till we run out of content.
- Large Language Models face two big issues: one is hallucinations and how to mitigate them by grounding answers or finding ways to assess the response's factuality, and the other one is prompt injection, as a malicious attacker can misguide an LLM into doing something other than what it was programmed for. The folks at Scott Logic developed a demo based on the idea of ImmersiveLabs' [online playground](https://prompting.ai.immersivelabs.com/) to experiment with prompt injections and techniques to counter them. There's also an [article](https://blog.scottlogic.com/2023/11/03/spy-logic.html) that talks about the project, and a [video](https://blog.scottlogic.com/2023/10/31/mitigating-prompt-injections.md.html) that shows it all in action.
- My good friend Ken Kousen dives into [the magic of AI Services with LangChain4J](https://kousenit.org/2023/11/06/the-magic-of-ai-services-with-langchain4j/). He has a nice blog post, and also a great accompanying [video](https://www.youtube.com/watch?v=Bx2OpE1nj34) on YouTube where he shows some of the powerful features of LangChain4J, in particular the AI service that allows you to decorate an interface with annotations to interact with your large language model and get plain Java types or objects in return.
- My colleague Romin Irani also [integrated LangChain4J and the PaLM 2 chat model](https://medium.com/google-cloud/integrating-langchain4j-and-palm-2-chat-bison-model-a684cefd67af), showing how to deploy a Google Cloud Function chatbot.
- Baeldung also gives an [introduction to LangChain4J](https://www.baeldung.com/java-langchain-basics), showing the basics of prompts, models, memory management, retrieval, chains, and agents.
- [LangChain4J using Redis](https://www.linkedin.com/pulse/langchain4j-using-redis-stephan-janssen-lobpe/): Stephan Janssen, the founder of Devoxx, is using [LangChain4J](https://github.com/langchain4j) inside the Devoxx CFP and schedule application. In this article on LinkedIn, he explains how he used Redis to store vector embeddings corresponding to the talks of the conference, to search for similar talks.

# Tech Watch #4 — October 27, 2023

- The [State of AI report](https://www.stateof.ai/) is pretty interesting to read (even if long!). Among the major sections: research, industry, but also politics, safety, and some predictions. You'll find an executive summary in one slide, on slide #8. \ \ On #22, **emergent capabilities of LLMs** are covered, mentioning Stanford's research on the importance of more linear and continuous measures, as otherwise capabilities look like they emerge out of the blue. \ \ On #23, they talk about the **context length of LLMs being the new parameter count**, as models try to have bigger context windows. \ \ However, on slide #24, they also talk about researchers who showed that **in long context windows, the content provided in the middle tends to be ignored** by LLMs compared to content at the beginning or end of the window. \ So be sure to **put the important bits first or last**, but not lost in the middle. \ \ Slide #26 speaks about **smaller models trained on smaller curated datasets that can rival 50x bigger models**. \ \ Slide #28 wonders if we're **running out of human-generated data**, and thus, if we're going to have our LLMs trained on... LLM generated data!
- [3D visualisation of vector embeddings from Tensorflow](https://projector.tensorflow.org/) \ As I'm working on a small application that would help visualise vector embeddings, I was looking for existing apps or articles that show how vectors can be similar, and thus how their semantics can be similar as well. And I came across this existing visualisation from the Tensorflow project, which uses the Word2Vec embedding approach. I like the fact you can use different 3D projection techniques like t-SNE or PCA, and see related vectors closer together in the 3D space, as their meaning is closer too.
- [A cron extension for PostgreSQL](https://www.citusdata.com/blog/2023/10/26/making-postgres-tick-new-features-in-pg-cron/)\
pg_cron is an extension for the PostgreSQL database that adds scheduling capabilities. It can even be scheduled to run your procedures or other SQL queries every few seconds.
- [Protomaps](https://protomaps.com/) is a free and open source map of the world, deployable as a single static file on cloud storage (including Google Cloud Storage). You can use OpenStreetMap tiles, as it's distributed with a version of OSM. It's using an efficient and open archive format for pyramids of tile data, accessible via HTTP Range requests.
- [ArtistAssistApp](https://artistassistapp.com/) is an application which can tell you which oil or watercolor paints to use and mix to create similar-looking colors for your painting, as you try to reproduce a photo. As a wannabe painter myself, I always struggle to create mixes that match real colors, and this tool is pretty clever to let you find the right mix (at least if you use some well-known paint brands).
This also reminds me of [mixbox](https://scrtwpns.com/mixbox/) which simulates color mixing as real color pigments mix in real paint, and using such an algorithm would greatly improve the real-life accuracy of color mixes in digital art painting applications.
- [Vectorizer](https://vectorizer.ai/) is an online tool to transform an image into an SVG file. As I'm playing a bit with Generative AI-based image generation, sometimes the upscalers don't suffice, and you want to transform a nice generated image into a vector format (for example, clipart-like illustrations), so it scales gracefully in slide decks or on websites.

# Tech Watch #3 — October 20, 2023

- [Stop Using char in Java. And Code Points](https://horstmann.com/unblog/2023-10-03/index.html)\
It's a can of worms when you start messing with chars and code points, and you're likely going to get it wrong in the end. As much as possible, stay away from chars and code points, and instead use the String methods like `indexOf()` / `substring()`, and some regex when you really need to find grapheme clusters.
- Paul King shared his presentations on [Why use Groovy in 2023](https://speakerdeck.com/paulk/groovy-today) and an [update on the Groovy 5 roadmap](https://speakerdeck.com/paulk/groovy-roadmap). It's interesting to see how and where Groovy goes beyond what is offered by Java, sometimes thanks to its dynamic nature, sometimes because of its compile-time transformation capabilities. When Groovy adopts the latest Java features, there's always a twist to make things even groovier in Groovy!
- [The State of WebAssembly in 2023](https://blog.scottlogic.com/2023/10/18/the-state-of-webassembly-2023.html)\
I often enjoy the articles from the folks at Scott Logic. This one is about a survey they ran on the topic of WebAssembly. Languages like Rust and JavaScript are seeing increased usage (for targeting wasm). Wasm is used a lot for web app development, but serverless seems to be the second most common use case, as well as hosting plugin environments. The survey also mentions that threads, garbage collection and the new component model are the features developers are most interested in. For WASI, all the I/O-related proposals, like HTTP, filesystem support, and sockets, are the ones developers want (although WASIX, which covers this area, received mixed reactions).
- [Tell your LLM to take a deep breath!](https://arstechnica.com/information-technology/2023/09/telling-ai-model-to-take-a-deep-breath-causes-math-scores-to-soar-in-study/)\
We tend to humanize large language models via [anthropomorphism](https://en.wikipedia.org/wiki/Anthropomorphism), much like we see human faces in anything with [pareidolia](https://en.wikipedia.org/wiki/Pareidolia), although LLMs are neither sentient nor human. So it's pretty ironic that to get a better result in some logic problem solving, we need to tell the LLM to actually take a deep breath! Are they now able to breathe?
- [Wannabe security researcher asks Bard for vulnerabilities in cURL](https://hackerone.com/reports/2199174)\
Large Language Models can be super creative; that's why we employ them to imagine new stories, create narratives, etc. And it seems wannabe security experts believe that what LLMs say is pure fact, which is probably what happened to this person, who reported that they asked Bard to find a vulnerability in cURL! And Bard indeed managed to be creative enough to craft a hypothetical exploit, even explaining where a possible integer overflow could take place.
Unfortunately, the generated exploit text contained many errors (wrong method signature, invented changelog, code that doesn't compile, etc.).
- [LLMs confabulate, they don't hallucinate](https://www.beren.io/2023-03-19-LLMs-confabulate-not-hallucinate/)\
A few times, I've seen it mentioned on social networks that we should say that LLMs confabulate, instead of hallucinate. Confabulation usually refers to a brain disorder that makes people confidently state things that may or may not be true, in a convincing fashion (they don't even know it's false or a lie). Hallucination is more of a misinterpretation of the sensory input, like having the impression of seeing a pink elephant! The article linked above explains the rationale.
- Greg Kamradt tweets about the [use cases for multimodal vision+text LLMs](https://twitter.com/GregKamradt/status/1711772496159252981)\
You'd think that you could just get a model that describes a picture as text, and then mix that description with other text snippets. But models that really fully understand both images and texts are way more powerful than this. In this tweet, Greg distinguishes different scenarios: description, interpretation, recommendation, conversion, extraction, assistance, and evaluation. For example, we could imagine transforming an architecture diagram into a proper Terraform file, or a UI mockup into a snippet of code that builds that UI for real. You could show a picture of a dish, and ask for its recipe!
- [The Story of AI Graphics at JetBrains](https://blog.jetbrains.com/blog/2023/10/16/ai-graphics-at-jetbrains-story/)\
I've always loved generative and procedural art, both for games and indeed for art. I really enjoyed this article which goes through the story of how they generate their nice splash screens and animations for the JetBrains family of products. Neural networks at play here!

# Tech Watch #2 — Oct 06, 2023

- [Generative AI exists because of the transformer\ ](https://ig.ft.com/generative-ai/)I confess I rarely read the Financial Times, but they have a really neat article with animations on how large language models work, thanks to the transformer neural network architecture, an architecture invented by Google in 2017. They talk about text vector embeddings, how self-attention makes LLMs understand the relationship between words and the surrounding context, and they don't forget to mention hallucinations, and how "grounding" and RLHF (Reinforcement Learning from Human Feedback) can help mitigate them to some extent.
- [Generative AI in practice: Concrete LLM use cases in Java, with the PaLM API](https://www.youtube.com/watch?v=ioTPfL9cd9k&t=7s) (video)\
At Devoxx Belgium this week, the biggest theme of the conference was Generative AI and Large Language Models. The audience being mainly Java-focused, there was a very strong interest for Java developers to be able to take advantage of GenAI / LLMs in Java, instead of the ubiquitous Python. And all sessions along those lines were fully packed. The conference featured Microsoft's Java SemanticKernel, the open source LangChain4J project, and Spring's experimental AI module. The link above is the video of the presentation I did on using the PaLM API, but for the Java developer, using different approaches, and also with [LangChain4J](https://github.com/langchain4j/langchain4j).
- [What "AI-Assistant for a Developer" is all about?](https://medium.com/google-cloud/what-ai-assistant-for-a-developer-is-all-about-723de644a449) and [An AI-assisted cloud?
It's a thing now, and here are six ways it's already made my cloud experience better](https://seroter.com/2023/09/28/an-ai-assisted-cloud-its-a-thing-now-and-here-are-six-ways-its-already-made-my-cloud-experience-better/) are two articles from my colleagues Romin and Richard about how AI assistants will progressively make us, developers, more productive and stay in the flow.
- [Heredoc notation in Dockerfiles\ ](https://www.docker.com/blog/introduction-to-heredocs-in-dockerfiles/)Did you know you can use the "heredoc" notation in Dockerfiles?

# Client-side consumption of a rate-limited API in Java

An IETF draft on RateLimit header fields for HTTP suggests that well-behaved Web APIs advertise their rate limits with response headers such as:

> - **RateLimit-Limit**: containing the requests quota in the time window;
> - **RateLimit-Remaining**: containing the remaining requests quota in the current window;
> - **RateLimit-Reset**: containing the time remaining in the current window, specified in seconds.

Note that the draft mentions RateLimit-\* as headers, but often in the wild, I've seen those headers always prefixed with "X-" instead. And sometimes, some APIs add a hyphen between Rate and Limit! So it's hard to create a general consumer class that could deal with [all cases](https://stackoverflow.com/questions/16022624/examples-of-http-api-rate-limiting-http-response-headers).

Those headers inform you about the quota, how much is left, and when the quota should be back to its full capacity (if you don't consume more requests). So you could certainly stage your requests accordingly --- we will talk about how to schedule your requests in Java in the second section.

Another thing to keep in mind is that the quota may be shared among API consumers. Maybe you have several parallel threads that will call the API and consume the API quota. So when you see the reset header, maybe the API will have been called by another thread already, leaving you with a smaller number of requests left in the quota.

### Exponential backoff and jitter

The API that triggered my research actually doesn't provide any rate limitation headers. So another approach is needed. A classical approach is to use an exponential backoff. It was nicely [documented](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/) on the AWS blog a while ago. The idea is that when you face an over-quota error, you're going to retry the call after, for example, one second. And if you're getting another error, you'll wait a little longer, by multiplying the interval by a constant, like doubling. On the first error, you wait 1 second before retrying, then 2 seconds, then 4 seconds, etc. You can use a fractional multiplier, of course. But as explained in the article, if all clients fail at the same time, they will retry roughly at the same time as well, after one, two, four seconds. So the idea is to add some randomness, the jitter, to more evenly spread out the retries, to avoid having new bursts of traffic at roughly the same moments. There's another good article on Baeldung about [exponential backoff and jitter using Resilience4J](https://www.baeldung.com/resilience4j-backoff-jitter) for your API consumers.

## Consuming a Java API

Back to my use case, the underlying Web API I'm using doesn't feature rate limitation headers. And since there's a Java library that wraps that API anyway, I'd rather just use that Java API for convenience. When a rate limit is hit, the API will throw an exception. So I can catch that exception, and deal with it, maybe applying the same exponential backoff + jitter strategy (a minimal sketch of such a retry helper is shown below). However, I know the rate limit of the API.
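
As a minimal illustration of that retry-with-backoff idea (the exception type caught below is just a placeholder for whatever the wrapped library actually throws when the quota is exceeded), such a helper could look like this:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

String callWithBackoffAndJitter(Callable<String> apiCall) throws Exception {
    long waitMillis = 1_000;                          // first pause: 1 second
    for (int attempt = 0; attempt < 5; attempt++) {
        try {
            return apiCall.call();
        } catch (RuntimeException overQuota) {        // placeholder for the library's "over quota" exception
            // jitter: sleep somewhere between waitMillis/2 and waitMillis,
            // so that concurrent clients don't all retry at the exact same instant
            long jittered = waitMillis / 2
                    + ThreadLocalRandom.current().nextLong(waitMillis / 2 + 1);
            Thread.sleep(jittered);
            waitMillis *= 2;                          // exponential: double the wait for the next attempt
        }
    }
    throw new IllegalStateException("Still over quota after several retries");
}
```
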
So instead of eagerly calling the API as fast as possible, getting an exception, waiting a bit, and trying again... I'd rather just call the API at the pace I'm allowed to use it. Let's say I have a hypothetical API that takes a String as argument and returns a String:

```java
public class RateLimitedApi {
    public String call(String arg) {
        return arg.toUpperCase();
    }
}
```

### Sleeping a bit...

A first, but naive, idea would be to just add a pause after each call:

```java
for (int i = 0; i < 20; i++) {
    api.call("abc");
    Thread.sleep(100);
}
```

And instead of making the same call with the same argument, you could iterate over an array or list:

```java
for (String s : args) {
    api.call(s);
    Thread.sleep(100);
}
```

Well, it works, but the API call takes some time as well, so you may have to adjust the sleep time accordingly --- it's not really ideal. The call could also be longer than the actual wait time really needed between two invocations.

### Scheduled execution

A better approach would be to use Java's scheduled executors, with a few threads, in case of long API execution times that overlap.

```java
try (var scheduler = Executors.newScheduledThreadPool(4)) {
    var scheduledCalls = scheduler.scheduleAtFixedRate(
        () -> api.call("abc"), 0, 100, TimeUnit.MILLISECONDS);
}
```

Instead of calling the API with the same argument, how would you call it for a series of different values, but then stop the scheduler once you're done with all the values? You could take advantage of some kind of queue (here, a `ConcurrentLinkedDeque`) to pop the arguments one at a time. Once you've cleared all the elements of the queue, you shut down the scheduler altogether.

```java
var args = new ConcurrentLinkedDeque<>(
    List.of("a", "b", "c", "d", "e", "f", "g", "h", ..."x", "y", "z"));

try (var scheduler = Executors.newScheduledThreadPool(4)) {
    scheduler.scheduleAtFixedRate(() -> {
        if (!args.isEmpty()) {
            api.call(args.pop());
        } else {
            scheduler.shutdown();
        }
    }, 0, 100, TimeUnit.MILLISECONDS);
}
```

### One more in the Bucket4J!

In the introduction, I mentioned some great libraries like Resilience4J and Bucket4J. Let's have a look at an approach using [Bucket4J](https://bucket4j.com/). Scheduling is fine to respect the rate limit, but you may want to get as many calls through as possible, while still respecting the rate. So a different approach is necessary. Bucket4J is based on the [token bucket algorithm](https://en.wikipedia.org/wiki/Token_bucket). It offers a very rich and fine-grained set of rate limit definitions, if you want to allow bursts, or prefer a regular flow (like our schedule earlier). Be sure to check out the [documentation](https://bucket4j.com/8.4.0/toc.html#quick-start-examples) for the details. Let's see how to define my limited consumption rate of 10 per second:

```java
var args = List.of("a", "b", "c", "d", "e", ..."x", "y", "z");

var bucket = Bucket.builder()
    .addLimit(Bandwidth.simple(10, Duration.ofSeconds(1)))
    .build();

for (String arg : args) {
    bucket.asBlocking().consumeUninterruptibly(1);
    api.call(arg);
}
```

It's pretty explicit: I create a limit that corresponds to a bandwidth of 10 tokens per second. With this simple strategy, the bucket is refilled greedily: every 100ms a new token will be available again. But it's also possible to configure it differently, to say that you want to allow another 10 calls once every second, as shown in the sketch below.
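
For instance, with Bucket4J 8.x you can combine `Bandwidth.classic()` with `Refill.intervally()` to release the 10 tokens in one go at the start of every one-second interval, instead of the greedy one-token-every-100ms refill of `Bandwidth.simple()`:

```java
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;

import java.time.Duration;

// 10 tokens, all refilled at once at the beginning of each 1-second interval
Bucket intervalBucket = Bucket.builder()
    .addLimit(Bandwidth.classic(10, Refill.intervally(10, Duration.ofSeconds(1))))
    .build();
```
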
Then I have a simple for loop to iterate over the list of arguments I must pass to the API, but I introduce an instruction that blocks until a token is available --- i.e. until I have the right to call the API again while respecting the rate limit. Also beware of API calls that take a lot of time, as here we're using a blocking call that blocks the calling thread. So if API calls take longer than the time it takes for a new token to become available in the bucket, you'll end up calling the API much less frequently than the allowed rate limit.

However, with Bucket4J, the bucket can be used in a thread-safe manner: you can have several threads consuming from the same API, using the same shared bucket, or you can make parallel calls from a single consumer as well, to use the quota to its maximum. Let's use executors to parallelize our calls:

```java
try (ExecutorService executor = Executors.newFixedThreadPool(4)) {
    for (String arg : args) {
        bucket.asBlocking().consumeUninterruptibly(1);
        executor.submit(() -> api.call(arg));
    }
}
```

Be careful though: when doing so, your API calls won't necessarily follow the exact same order as the input collection. In my case, I didn't care about the order of execution.

One last little tweak: since Java 21 was released recently, we could make use of virtual threads instead of platform threads! So let's push our example forward into the 21st century with this small change when creating our executor service:

```java
Executors.newVirtualThreadPerTaskExecutor()
```

So far, we have only called the API without taking care of the returned result. We could update the examples above with an extra line to put the argument and result in a `ConcurrentHashMap`, or to use the result immediately. Or we could also explore one last solution, using `CompletableFuture`s and/or an `ExecutorCompletionService`. But I'm not 100% satisfied with what I came up with so far. So I might update this article if I find a convenient and elegant solution later on. Time to wrap up!

## Summary

In this article, we explored the less covered topic of consuming a rate-limited API. First, we discussed approaches for consuming Web APIs that are well-behaved, exposing rate limitation headers, and those less well-behaved, using an exponential backoff and jitter approach. We then moved on to the case of Java APIs, doing a simple sleep to call the API at a cadence that respects the rate limit. We also had a look at scheduled executions. And we finished our journey with the help of the powerful Bucket4J library.

# Tech Watch #1 — Sept 29, 2023

Inspired by my super boss [Richard Seroter](https://twitter.com/rseroter) with his regular [daily reading list](https://seroter.com/2023/09/28/daily-reading-list-september-28-2023-171/), I decided to record and share my _tech watch_, every week (or so). I always take notes of interesting articles I read for my own curiosity and to remember them when I need those references later on. But also to share them with the [Les Cast Codeurs podcast](https://lescastcodeurs.com/)! So I hope it'll be interesting to my readers too!

- [LLMs Demand Observability-Driven Development\ ](https://www.honeycomb.io/llms-demand-observability-driven-development)A great op-ed from [Charity Majors](https://twitter.com/mipsytipsy) on the importance of observability-driven development, in the wake of large language models.
Developing LLM-based solutions is typically not something you can do with a classical test-driven approach, as you only really get proper test data when you have it coming from production usage. Furthermore, LLMs are pretty much unpredictable and non-deterministic. But with observability in place, you can better understand why there's latency in some scenarios, why the LLM came to certain solutions, and this will help you improve as you learn along the way.
- [How LangChain rebuilt their LLM documentation chatbot\ ](https://blog.langchain.dev/building-chat-langchain-2/)For example, the choice of docs to parse: indexing source code didn't yield great results. Citing sources lets users dive deeper into the documentation and double check the LLM didn't hallucinate. Quality evaluation is important, to assess at each step of the process the impact of each change, each tweak of your prompts, each change in the docs that are ingested. Also, how do you handle reindexing the documents, when there are changes in a document, when there are new pages to be indexed, or pages that disappear, to keep track of what has to be updated in the vector embeddings kept in the vector store? A great trick about how to rephrase questions: sometimes you ask a question, then ask a refinement, but you don't formulate a whole new question, so you can actually ask the LLM to reformulate a full question based on the conversation context, so as to find more meaningful similar text embeddings in the vector database.
- [macOS containers\ ](https://macoscontainers.org/)You can run all sorts of Linux flavors inside containers, on all platforms, and even Windows containers. But with the macOS containers project, which you can install with Homebrew, you can also install and run macOS containers. It's still early days for the project, it seems, and there is limited container support in macOS itself, but it sounds promising. Also, macOS containers only run on top of macOS itself.
- [Using PostgreSQL for queuing\ ](https://adriano.fyi/posts/2023-09-24-choose-postgres-queue-technology/)With or without extensions, I see a lot of articles mentioning using PostgreSQL for everything! With the pgVector extension, for example, you can use Postgres as a vector database for storing parsed documents for your LLM use cases. In this article, the author suggests taking advantage of its pub/sub (with notify/listen) and row locking capabilities to implement queuing, and thus replacing other dedicated queuing components in your architecture.
- [Use ControlNet with StableDiffusion's SDXL\ ](https://stable-diffusion-art.com/controlnet-sdxl/)You've probably all seen some cool images with some subliminal text appearing, or with weird square or spiral shapes, on social networks. This tutorial explains how you can guide StableDiffusion's SDXL model with ControlNet to shape particular picture generations, or to create pictures in the style of other pictures.
- [Transformer.js\ ](https://huggingface.co/docs/transformers.js/index)A JavaScript transformer implementation that allows you to load HuggingFace models, and do predictions and other LLM tasks right in the browser.
- [JVector\ ](https://github.com/jbellis/jvector/)An open source Java project for fast vector search, used in Astra DB's vector search. This project was mentioned in TheNewStack's [article](https://thenewstack.io/5-hard-problems-in-vector-search-and-how-cassandra-solves-them/) on how Astra DB solves 5 typical problems of vector search.
So for those who want to embed a Java vector store in their LLM use cases, this might be an option to look into, besides Lucene, for example.
- [Mixing LLMs and Knowledge Graphs\ ](https://www.marktechpost.com/2023/09/19/llms-knowledge-graphs/)Inherently, knowledge graphs have structured information and relationships that LLM-based projects can take advantage of. The article discusses different approaches and patterns to bind them together, to reduce hallucinations and enhance transparency and interpretability.
- [Tracing is better than logging!\ ](https://andydote.co.uk/2023/09/19/tracing-is-better/)It's often hard to figure out from logs what happened when a problem occurred. It's slightly better with structured logging, which allows some querying, though. But with tracing, you can see correlations between traces, as they are nested, and see all the attributes you can attach to those spans, not to mention the fact that you can more easily understand where time is spent, rather than just having a single point in time as with a log statement.

# Discovering LangChain4J, the Generative AI orchestration library for Java developers

When I started my journey with Generative AI and Large Language Models, I was overwhelmed by the omnipresence of Python. Tons of resources are available with Python front and center. However, I'm a Java developer (with a penchant for [Apache Groovy](https://groovy-lang.org/), of course). So what is there for me to create cool new Generative AI projects?

When I built my first experiment with the [PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart), using the integration within Google Cloud's Vertex AI offering, I called the available [REST API](https://cloud.google.com/vertex-ai/docs/reference/rest), from my [Micronaut](https://micronaut.io/) application. I used Micronaut's built-in mechanism to marshal / unmarshal the REST API constructs to proper classes. Pretty straightforward. You can learn more about this first app in my previous articles on [generating kid stories](https://glaforge.dev/posts/2023/06/08/creating-kids-stories-with-generative-ai/) and [how to get started with the PaLM API](https://glaforge.dev/posts/2023/05/30/getting-started-with-the-palm-api-in-the-java-ecosystem/).

But soon after, I discovered that the Vertex AI Java SDK, which covers all products and services of Vertex AI, added support for the PaLM API thanks to a new [prediction service client](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts#generative-ai-test-text-prompt-java) class. I was happy and decided to try it!
So here's what making a simple call to the LLM looks like from Groovy:

```groovy
@Grab('com.google.cloud:google-cloud-aiplatform:3.24.0')
import com.google.cloud.aiplatform.v1beta1.*
import com.google.protobuf.Value
import com.google.protobuf.util.JsonFormat

String instance = '''{ "prompt": "Tell me more about Large Language Models"}'''
String parameters = '''{
  "temperature": 0.2,
  "maxOutputTokens": 256,
  "topP": 0.95,
  "topK": 40
}'''

String project = "my-llm-java-demos"
String location = "us-central1"
String publisher = "google"
String model = "text-bison"

def predictionServiceSettings = PredictionServiceSettings.newBuilder()
    .setEndpoint("${location}-aiplatform.googleapis.com:443")
    .build()

def predictionServiceClient = PredictionServiceClient.create(predictionServiceSettings)
def endpointName = EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model)

def instanceValue = Value.newBuilder()
JsonFormat.parser().merge(instance, instanceValue)
def instances = [instanceValue.build()]

def parameterValueBuilder = Value.newBuilder()
JsonFormat.parser().merge(parameters, parameterValueBuilder)
def parameterValue = parameterValueBuilder.build()

def resp = predictionServiceClient.predict(endpointName, instances, parameterValue)

// resp[0].content
println resp.predictionsList.first().structValue.fieldsMap['content'].stringValue
```

You create a `PredictionServiceSettings`, then an `EndpointName`, and a `PredictionServiceClient` to call its `predict()` method. Not overly complicated to set up. However, there are two things that I really dislike about this API:

- Why are we parsing some JSON strings and creating some Protobuf structures? This isn't very developer-friendly to me.
- And then, it also returns some generic Protobuf structure response that I have to navigate through to find the relevant bits I'm interested in, instead of letting me call something like `resp[0].content`. I'd rather have a proper set of Java classes that represent my prompt, my LLM settings, the response, etc.

I was a bit disappointed and preferred the approach I took with REST marshalling / unmarshalling in my Micronaut application --- you can check the [code on GitHub](https://github.com/glaforge/bedtimestories).

![](https://avatars.githubusercontent.com/u/132277850?v=4)

## Here comes the delight, with LangChain4J!

If you're following the Generative AI field, you'll have come across the [LangChain](https://www.langchain.com/) project. It's a Python (and JavaScript) orchestrator framework to connect various building blocks: large language models, document loaders, text splitters, output parsers, vector stores to store text embeddings, tools, and prompts. With just a few lines of code, you're able to create some great integrations to implement your Generative AI use cases, like for example following the [Retrieval Augmented Generation](https://www.langchain.com/use-case/retrieval) pattern to create chatbots that talk with your documentation.

Remember that I'm a Java developer? I played a bit with the Python version of LangChain (I didn't try the JavaScript/TypeScript variant) but I wasn't at ease with Python, and I didn't want to learn a whole new ecosystem to implement my Generative AI ideas. Fortunately, that's when I discovered the open source [LangChain4J](https://github.com/langchain4j) project! This is also an AI orchestrator framework, but for Java! It's very much inspired by the original LangChain project, but independent.
So this is the perfect match for my programming language skills and Generative AI needs. Now, let's compare our protobuf-_abstruse_ example from earlier with an equivalent one based on LangChain4J (this time I used the chat model instead of the text model):

```groovy
@Grab('dev.langchain4j:langchain4j-vertex-ai:0.22.0')
import dev.langchain4j.model.vertexai.*

VertexAiChatModel vertexAiChatModel = VertexAiChatModel.builder()
    .endpoint("us-central1-aiplatform.googleapis.com:443")
    .project("my-llm-java-demos")
    .location("us-central1")
    .publisher("google")
    .modelName("chat-bison@001")
    .temperature(1.0)
    .maxOutputTokens(256)
    .topK(40)
    .topP(0.95)
    .maxRetries(3)
    .build()

def response = vertexAiChatModel.sendUserMessage(
    "What is the best Large Language Model?")

println response.text()
```

It's very declarative and straightforward! I define my endpoint and my model settings with one builder. And then I just send messages to that chat model with just strings. And the response is also a simple string. LangChain4J has won my heart!

## What's next?

I didn't stop there: I also built another Generative AI use case. I created a project that lets me ask questions about some documentation (in my case, I wanted to query the Apache Groovy documentation). I'll tell you more about that project in a forthcoming article, as we dive deeper into LangChain4J, to cover text embeddings, vector stores, and more. I'll be covering this topic on Generative AI with Java at [Devoxx Belgium](https://devoxx.be/talk/?id=4452) next week, and [Devoxx Morocco](https://devoxx.ma/talk/?id=4901) the following one.

But you can already have a look at some of the [more advanced examples](https://github.com/langchain4j/langchain4j-examples/tree/main/other-examples/src/main/java), to see how you can calculate vector embeddings locally with the all-MiniLM-L6-v2 embedding model and store the vectors in a convenient in-memory vector store ([link](http://all_minilm_l6_v2)), how to do text classification ([link](https://github.com/langchain4j/langchain4j-examples/blob/main/other-examples/src/main/java/embedding/classification/EmbeddingModelTextClassifierExample.java)), or how to chat with your documents with conversational retrieval chains ([link](https://github.com/langchain4j/langchain4j-examples/blob/main/other-examples/src/main/java/ChatWithDocumentsExamples.java)).

LangChain4J is still young, but already pretty powerful, and offers integrations with Vertex AI and OpenAI, with vector stores like [ChromaDB](https://www.trychroma.com/), [Pinecone](https://www.pinecone.io/) or [Weaviate](https://weaviate.io/) databases, and more. Be sure to [check out LangChain4J](https://github.com/langchain4j) if you want to build your next Generative AI use case with Java!

# Custom Environment Variables in Workflows

In addition to the built-in [environment variables available by default](https://cloud.google.com/workflows/docs/reference/environment-variables) in Google Cloud [Workflows](https://cloud.google.com/workflows) (like the project ID, the location, the workflow ID, etc.), it's now possible to define your own [custom environment variables](https://cloud.google.com/workflows/docs/use-environment-variables)! Why is it useful and important? It's particularly handy when you want to read information that is dependent on the deployment of your workflow, like, for example, information about the environment you're running in.
Is my workflow running in the development, staging, or production environment? You can read your custom `MY_ENVIRONMENT` variable, just like you read the existing built-in environment variables. And you define such variables at deployment time.

In our [best practices articles](https://cloud.google.com/blog/topics/developers-practitioners/workflows-patterns-and-best-practices-part-3#:~:text=Plan%20for%20multi%2Denvironment%20orchestrations), and in the more detailed article on [multi-environment service orchestration](https://cloud.google.com/blog/topics/developers-practitioners/multi-environment-service-orchestrations), my colleague [Mete](https://atamel.dev/) and I had shared ways to implement such an approach. You have a workflow that orchestrates some API calls, but you want a single workflow definition that can run in different environments. There were a few approaches to do that, each with pros and cons: passing the API URLs as parameters of the workflow execution (but that doesn't work for event-triggered workflows), replacing some special text tokens before deployment, or even doing some string replacement when deploying with Terraform.

But now, things are simpler! Let's see how to **define custom environment variables** and how to **access them**.

## Calling an API endpoint in staging or in prod

With custom environment variables, you can now deploy the exact same workflow definition, but with different variables, using the `sys.get_env()` built-in function:

```yaml
- call_a_service:
    call: http.get
    args:
      url: ${sys.get_env("SERVICE_URL")}
```

And on deployment, specify `SERVICE_URL` to point at the staging or the production URL of that service.

## Setting environment variables

Now that we've seen how to access an environment variable, let's see how you can set it. As explained in the documentation about [custom environment variables](https://cloud.google.com/workflows/docs/use-environment-variables), you can use different flags to define, update, or delete those variables.

You can specify one or more variables with `--set-env-vars`:

```bash
gcloud workflows deploy WORKFLOW_NAME \
    --set-env-vars KEY1=VALUE1,KEY2=VALUE2
```

Or use a file that contains all your custom environment variables (one `key:value` per line):

```bash
gcloud workflows deploy WORKFLOW_NAME \
    --env-vars-file FILE_PATH
```

You can also update variables with the `--update-env-vars` flag, remove some with `--remove-env-vars`, or delete them all with `--clear-env-vars`.

## Bonus tip: Use the `default()` built-in function in case the environment variable isn't defined

In case the environment variable wasn't defined at deployment time, you can use the `default()` built-in function to provide a default value, like in this example:

```yaml
- logEnv:
    call: sys.log
    args:
      text: ${default(sys.get_env("ENVIRONMENT"), "PRODUCTION")}
```

Here, we're logging the value of the environment, but if the `ENVIRONMENT` custom environment variable isn't defined, the value will default to `PRODUCTION`.

## Summary

With custom environment variables, you can parameterize your workflows to tackle different use cases. One of the most frequent ones is to distinguish between different environments, like prod or staging. But you can also use them to configure other parameters of your workflow, like defining some configurable limits (number of retries), or different endpoints or parameters for the services you call. You define your workflow once, and customize the deployment with different environment variables.
# From Bird to Elephant: Starting a New Journey on Mastodon

{{< speakerdeck 365d3694bd674bfa85d812d8c2fd32f9 >}}

At Devoxx France and Devoxx Greece, I had the pleasure to talk about my new social media journey on [Mastodon](https://joinmastodon.org/). After a quick introduction about Mastodon and the [Fediverse](https://fediverse.party/), I contrasted the key differences between Twitter and Mastodon. Then I shared some advice on how to get started, how to choose an instance, or which clients you can pick from. I moved on to important tips to get the best experience on the platform and gather a great following:

* introduce yourself with a detailed bio and a first #introduction post
* create your profile before following others
* configure your account so it can be discovered and recommended to others
* contrary to Twitter, you can verify yourself, showing that you own the personal links you share

Once your account is ready, if you are migrating from Twitter, you might want to find which of your friends have also migrated, so you can follow them in the fediverse. I advise people not to delete their Twitter account, to prevent someone from picking up their old handle and impersonating them.

Another aspect I like about the Mastodon platform is that it seems to care deeply about accessibility, and about people's possible sensitivities. Putting _alt_ text on images is highly encouraged on Mastodon. And putting content warnings on text and images helps prevent unwanted content from jumping at your eyes unexpectedly.

In the second part of the presentation, I spoke about the various standards and APIs underlying the Fediverse and Mastodon:

* [ActivityPub](https://www.w3.org/TR/activitypub/) and [ActivityStream](https://www.w3.org/TR/activitystreams-core/)
* [JSON-LD](https://json-ld.org/)
* [WebFinger](https://webfinger.net/)
* [MicroFormats](http://microformats.org/)
* [HTTP Signatures](https://oauth.net/http-signatures/) and [OAuth 2](https://oauth.net/2/)

I went through an exchange between a client and a server, mimicking the actual process when you send a message to another recipient. This is basically how to implement a basic ActivityPub server, based on Eugen Rochko's great [post](https://blog.joinmastodon.org/2018/06/how-to-implement-a-basic-activitypub-server/) on the topic.

The third and final part of the presentation was a concrete demo of how to implement your own bots on Mastodon. I showed how to create a service (a [Micronaut](https://micronaut.io/) application) that [calculates the potential reach]({{< ref "/posts/2023/01/06/calculating-your-potential-reach-on-mastodon-with-google-cloud-workflows-orchestrating-the-mastodon-apis" >}}) of your posts. You can play with the service [online](https://stootistics.web.app/) and give it your account to see how popular your recent posts are. And there's even an account ([@getmyreach@tooters.org](https://tooters.org/@getmyreach)) you can ping on Mastodon to get back the most popular of your posts. The code is available on [GitHub](https://github.com/glaforge/stootistics) if you want to check it out.
You can check the recording of the talk in English from Devoxx Greece: {{< youtube DafHAmlzWUM >}} And in French at Devoxx France: {{< youtube _BaK9BNlUHg >}} # Creating kids stories with Generative AI Last week, I wrote about how to [get started with the PaLM API in the Java ecosystem]({{< ref "/posts/2023/05/30/getting-started-with-the-PaLM-API-in-the-java-ecosystem" >}}), and particularly, how to overcome the lack of Java client libraries (at least for now) for the PaLM API, and how to properly authenticate. However, what I didn't explain was what I was building! Let's fix that today, by telling you a story, a kid story! Yes, I was using the trendy **Generative AI** approach to generate bedtime stories for kids. Without further ado, let me introduce you to my little app: [bedtime stories](https://bed-time-stories.web.app/). [![](/img/bedtime/bedtime-stories-ui.png)](https://bed-time-stories.web.app/) > If you're interested in the source code, head over to the Github [repository](https://github.com/glaforge/bedtimestories): > it's implemented in [Apache Groovy](https://groovy-lang.org/), developed with the [Micronaut](https://micronaut.io/) framework, > designed with the [Shoelace](https://shoelace.style/) web components, and deployed on Google [Cloud Run](https://cloud.run/), the serverless container runtime. ## The concept For a good story, we need 3 key ingredients: * a **character** — the main protagonist of the story whose adventures are narrated, like a princess, an astronaut, a firefighter... * a **setting** — where (and potentially when) the action takes place, like a beautiful kingdom, a faraway planet, a mysterious haunted house... * a **plot** — a rough idea of what's going to happen in the story, like an evil darkness is menacing the kingdom, huge shooting stars are menacing the planet... In the UI, there are a few options to pick from, by default, but you can actually customise them at will, or better, come up with your own characters, settings, and plots. You can play that game with your kids: *"hey, who should be the hero of our story tonight?"*. They may have an idea, or even a favorite character! Then, just click the `Generate` button, and after 20s or so, you'll have a story ready! ## Where Generative AI comes in Of course, the whole story is created thanks to Generative AI. I used the [PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview#palm-api) for that, within my Google Cloud project, using the [Vertex AI](https://cloud.google.com/vertex-ai/) suite of machine learning services. For the characters, settings, and plots, I came up with a few ideas on my own. But I felt like the choice was limited and would warrant some more creativity. So I asked [Bard](http://bard.google.com/) (powered by the PaLM API as well) to help me! It's all about the *art of prompting*, of asking the right question to your favorite generative AI: > Here are a few characters of bedtime kid stories: > - a funny little princess with a strong character > - a young astronaut exploring space > - a fearless firefighter > - a cute little cat with a long and silky fur > - a gentle dragon with a colorful skin > > Suggest a list of other possible characters: And lo and behold, I got a dozen suggestions that I then integrated in my character picker. Same thing for the settings and plots. Interestingly, not only Bard (or PaLM) would give me suggestions, but it also gave me explanations on why those characters, settings and plots made sense. 
So Generative AI is also here to help in the process of crafting your application, or your own prompts.

## The story generator prompt

The crux of this app is the main prompt that is used to generate the actual story. I wanted to have a familiar pattern or structure for my generated stories. You've probably heard about such common structures for narration, in 3 or 5 acts. I came across this [website](https://bubblecow.com/blog/importance-of-structure) about the 5-act approach, with an exposition phase, the rising action, the climax, the falling action, and the final denouement. So when crafting my prompt, I first started by telling PaLM who it was (a story teller), then I explained what a 5-act story structure looks like, and finally, I asked it to generate a story for my particular chosen trio of character, setting, and plot. Here's my final prompt:

> You are a creative and passionate story teller for kids.
> Kids love hearing about the stories you invent.
>
> Your stories are split into 5 acts:
> - Act 1 : Sets up the story providing any contextual background the reader needs, but most importantly it contains the inciting moment. This incident sets the story in motion. An incident forces the protagonist to react. It requires resolution, producing narrative tension.
> - Act 2 : On a simplistic level this is the obstacles that are placed in the way of the protagonists as they attempt to resolve the inciting incident.
> - Act 3 : This is the turning point of the story. It is the point of the highest tension. In many modern narratives, this is the big battle or showdown.
> - Act 4 : The falling action is that part of the story in which the main part (the climax) has finished and you're heading to the conclusion. This is the calm after the tension of the climax.
> - Act 5 : This is the resolution of the story where conflicts are resolved and loose ends tied up. This is the moment of emotional release for the reader.
>
> Generate a kid story in 5 acts, where the protagonist is ${character}, where the action takes place ${setting} and ${plot}.

Asking PaLM to structure the story that way also influences its textual output. Not only did it create those 5 key parts, but it also added some bold act labels in its output, which I could use to split my string story into 5 smaller chunks.

## Where to go from there?

I hope you enjoyed the journey so far, and that you got a chance to generate your own story and tell it to a happy kid! However, for now at least, this is just a concept, so I'm not sure whether I'll develop it much further, but I'd like to share possible ways to improve this application.

* As I explained, it's a story in 5 acts, so we could offer the story over 5 distinct pages that you would have to turn. Vertex AI also features an image generation service (still in preview for now), so the **stories could also be decorated with AI generated pictures**! (We can even ask PaLM to generate ideas of prompts for image generation.)
* Currently, PaLM can generate up to 1024 characters of output, but it accepts 8KB of input. We can't generate a super long story in one go, but since the story is split into 5 acts that can all fit in the input context window, we could pass PaLM the whole generated story and ask it 5 times to generate 1KB of text for each section, thus **lengthening the whole story by a factor of 5**.
* To go even further, we could use the multilingual capabilities of large language models, or at least the Translate API, to offer **translations of the app and the stories** into different languages.
* We could also imagine the app narrating the story itself, by taking advantage of **Text-to-Speech voice generation**! You might miss out on the great bonding opportunity with your kids when you tell them a story yourself, but on the other hand, kids could entertain themselves when you're busy by generating random stories on their own.
* Maybe we could also **save stories** that pleased our kids (and reshare them with others), as each generation, even with the same trio of character/setting/plot, can yield very diverse outcomes.

So there are plenty of options offered by Generative AI!

# Just a handy command-line tool

When developing new projects on my laptop, I often run some commands over and over again. Regardless of how far you've gone with your CI/CD pipelines, running commands locally, without resorting to becoming a bash ninja, can be pretty easy with... `just`!

> [`just`](https://just.systems/) is a handy way to save and run project-specific commands

It's a command-line tool that lets you define commands to run (called recipes), with a Makefile-inspired syntax. It even allows you to define dependencies between the various tasks of your `justfile`. It runs across all environments (Mac, Linux, Windows), and is quick to install. It also loads `.env` files in which you can define variables specific to your project (other developers can share the same `justfile` but with variables specific to their own setup).

Without further ado, let's see it in action. In my current project, I have the following `justfile`:

```make
set dotenv-load

alias r := run
alias b := build
alias d := deploy

default: run

run:
    ./gradlew -t run

build:
    gcloud builds submit -t $CLOUD_REGION-docker.pkg.dev/$PROJECT_ID/containers/$CONTAINER_NAME:v1

deploy: build
    gcloud run deploy bedtimestories --image=$CLOUD_REGION-docker.pkg.dev/$PROJECT_ID/containers/$CONTAINER_NAME:v1
```

* The first instruction tells `just` to load an `.env` file.
* The (optional) alias lines allow me to define shortcuts for the commands I run very often.
* There are three commands: `run`, `build`, and `deploy`:
  * `run` will run my application locally with Gradle
  * `build` will containerize my app with my `Dockerfile` on Google Cloud Build
  * `deploy` depends on `build` and will deploy my container on Google Cloud Run

And now, I _just_ run `just run`, `just deploy`, or their shortcuts `just r` or `just d`. You'll also have noticed the dollar variables, which are interpolated from my `.env` file containing the following variables:

```bash
PROJECT_ID=some-project-id
CLOUD_REGION=us-central1
CONTAINER_NAME=some-container-name
```

It's `just` a handy new little tool in my toolbox! Go check it out: [just.systems](https://just.systems/). And have a look at the [cheat sheet](https://cheatography.com/linux-china/cheat-sheets/justfile/) for more examples and syntax.

# Getting started with the PaLM API in the Java ecosystem

Large Language Models (LLMs for short) are taking the world by storm, and things like ChatGPT have become very popular and are used by millions of users daily. Google came up with its own chatbot called [Bard](https://bard.google.com/), which is powered by its ground-breaking [PaLM 2](https://ai.google/discover/palm2/) model and API.
You can also find and use the PaLM API from within Google Cloud (as part of the [Vertex AI Generative AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) products) and thus create your own applications based on that API. However, if you look at the documentation, you'll only find Python tutorials or notebooks, as well as explanations on how to make cURL calls to the API. But since I'm a Java (and Groovy) developer at heart, I was interested in seeing how to do this from the Java world.

## Micronaut + Groovy + Cloud Run

My use case was to create a simple application that generates bedtime kid stories, using the PaLM LLM. I went ahead and decided to use [Micronaut](https://micronaut.io) for my framework, and [Apache Groovy](https://groovy-lang.org) for my programming language. I containerize and deploy my application on [Cloud Run](https://cloud.run), on Google Cloud. And I use the [Cloud Run integration for Firebase](https://cloud.google.com/run/docs/integrate/firebase-hosting) to have a nice domain for my app, and to serve my static content from Firebase's CDN. I won't cover these aspects too much in this article, but I want to stress one important roadblock you might encounter: authentication.

## Let's get started!

First, you may not necessarily have access to the Generative AI services in Google Cloud yet. For that, you'll need to [sign up](https://cloud.google.com/ai/generative-ai) to join the Trusted Tester Program. But once you have access, you'll be able to use the PaLM API programmatically for your own apps.

When experimenting with prompts to the LLM, you'll notice the handy sliding panel that shows you how to interact with the API from code. But you only have the choice between Python and cURL. That said, the cURL command helps you figure out how to call the API via REST:

```bash
curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict" -d \
  $'{
  "instances": [
    { "content": "Write a kid story about an astronaut visiting another galaxy but facing problems with shooting stars" }
  ],
  "parameters": {
    "temperature": 0.5,
    "maxOutputTokens": 1000,
    "topP": 0.8,
    "topK": 40
  }
}'
```

We have the JSON structure in input, and if you run that command, you'll get an output similar to the following one:

```json
{
  "predictions": [
    {
      "safetyAttributes": {
        "scores": [
          0.10000000149011612
        ],
        "blocked": false,
        "categories": [
          "Violent"
        ]
      },
      "content": "Once upon a time, there was a young astronaut called..."
    }
  ]
}
```

Our Micronaut application will have to marshall/unmarshall those input and output JSON documents. But the tricky bit for me was authentication. From the command-line, the embedded `gcloud` command makes use of an access token, which grants you access to the PaLM API. But from my Micronaut/Groovy code, I needed to find a way to authenticate as well.

## Preparing a low-level HTTP client call

Let's craft the appropriate REST endpoint URI:

```groovy
def uri = UriBuilder
    .of("/v1/projects/${projectId}/locations/us-central1/publishers/google/models/text-bison:predict")
    .scheme("https")
    .host("us-central1-aiplatform.googleapis.com")
    .build()
```

Currently, the API is only available in the `us-central1` region, so it's hard-coded.
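As a side note, the snippets that follow use a low-level Micronaut HTTP `client`. In the actual application it's injected by the framework, but here's a minimal sketch of how such a client could be created by hand (an illustration only; the host simply matches the endpoint we just built):

```groovy
import io.micronaut.http.client.HttpClient

// Minimal sketch: create a low-level HTTP client pointing at the Vertex AI endpoint.
// In a real Micronaut application, you'd rather inject an HttpClient with the
// @Client annotation instead of instantiating it manually like this.
def client = HttpClient.create(new URL("https://us-central1-aiplatform.googleapis.com"))
```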
Then we need to prepare the request:

```groovy
def request = HttpRequest
    .POST(uri, [
        instances: [
            [ content: storyPrompt ]
        ],
        parameters: [
            temperature: 0.6,
            maxOutputTokens: 1000,
            topP: 0.8,
            topK: 40
        ]
    ])
    .bearerAuth(token)
    .accept(MediaType.APPLICATION_JSON_TYPE)
    .contentType(MediaType.APPLICATION_JSON_TYPE)
```

In a moment, we'll see how we can create the bearer `token` we use in the `bearerAuth()` call. Here, we just send the prompt, with some parameters to say how creative we want the LLM answer to be.

Finally, we make the request:

```groovy
def predictionResponse = client.toBlocking()
    .exchange(request, PredictionResponse)
    .body()

return predictionResponse.predictions.first().content
```

I created some classes to unmarshall the resulting JSON:

```groovy
import com.fasterxml.jackson.annotation.JsonProperty
import io.micronaut.serde.annotation.Serdeable

@Serdeable
class PredictionResponse {
    @JsonProperty("predictions")
    List predictions
}

@Serdeable
class Prediction {
    @JsonProperty("safetyAttributes")
    SafetyAttributes safetyAttributes

    @JsonProperty("content")
    String content
}

@Serdeable
class SafetyAttributes { }
```

## Authenticating

When running my application locally, no problem, but once deployed, I needed to have a fresh bearer token. I created a dedicated service account, with the minimum permissions needed:

* `roles/aiplatform.user` to have the rights to call the PaLM API
* `roles/logging.logWriter` as your Cloud Run app needs to write some logs back to Cloud Logging

This [article](https://medium.com/google-cloud/generative-ai-palm-2-model-deployment-with-cloud-run-54e8a398b24b) also nicely explains how to handle deployment to Cloud Run. My Cloud Run service will be deployed with that service account.

Locally, on my laptop, I used the `GOOGLE_APPLICATION_CREDENTIALS` approach, by exporting a JSON key and pointing at it via an environment variable:

```bash
GOOGLE_APPLICATION_CREDENTIALS="exported-key.json"
```

You can learn more about local development with [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev).

So locally, we use that exported key, and once deployed, we use a token generated from the restricted service account. And to generate that token, I had to use the [google-auth-library-oauth2-http](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) library. Here's the missing snippet to do so:

```groovy
def credentials = GoogleCredentials.applicationDefault
    .createScoped('https://www.googleapis.com/auth/cloud-platform')
credentials.refreshIfExpired()
def token = credentials.accessToken.tokenValue
```

To import that authentication library in my project, I declared it as a dependency in my `build.gradle` file:

```groovy
implementation('com.google.auth:google-auth-library-oauth2-http:1.17.0')
```

## Voilà!

With the right authentication client library, I was able to create the bearer token needed to authenticate against the Vertex AI PaLM API, both locally on my laptop, and once deployed on Cloud Run as well. Hopefully, when Google releases official Java client libraries, it'll be easier to interact with the PaLM API, without having to write marshalling/unmarshalling code, and it will likely be smoother to authenticate transparently. So stay tuned!

# Exploring Open Location Code

![](/img/misc/eiffel-tower-plus-code.png)

When using Google Maps, you might have seen those strange little codes, as in the screenshot above.
This is a _plus code_, or to use its more official name, an **Open Location Code**. It's a way to encode a location in a short and (somewhat) memorable form.

In countries like France, every house has an official address, so you can easily receive letters or get a parcel delivered. But there are countries where no such location system exists, so you have to resort to describing where you live (take this road, turn right after the red house, etc.) Of course, you could use GPS coordinates, but that's not very convenient to share, and nobody could remember such a precise address. So there have been several attempts at creating systems that represent any location in the world, like [GeoHash](http://geohash.org/), [MapCode](https://www.mapcode.com/), and other proprietary systems like what3words.

Out of curiosity, I wanted to play with this geo-encoding approach, and decided to spend a few minutes playing with the available Java library (available on [Maven Central](https://central.sonatype.com/artifact/com.google.openlocationcode/openlocationcode/1.0.4)), but using [Apache Groovy](https://groovy-lang.org/). You'll find more links on the topic at the end of this article.

## Playing with plus codes in Groovy

Here's a little script that shows the library in action:

```groovy
@Grab("com.google.openlocationcode:openlocationcode:1.0.4")
import com.google.openlocationcode.OpenLocationCode

// Eiffel Tower
def (lat, lon) = [48.8584, 2.29447]

def eiffelTowerPlusCode = OpenLocationCode.encode(lat, lon)
println "Eiffel Tower +code: ${eiffelTowerPlusCode}"

def decoded = OpenLocationCode.decode('8FW4V75V+9Q')
println "Original coord: ${decoded.centerLatitude}, ${decoded.centerLongitude}"
```

(You can play with and run the above script in the [Groovy Web Console](https://gwc-experiment.appspot.com/?g=groovy_4_0&gist=4176e0ad13b396001c92ab5cd584b3d8))

## More information

* The official [Open Location Code website](https://maps.google.com/pluscodes/)
* The [open-location-code project on GitHub](https://github.com/google/open-location-code) that provides implementations of the algorithm in various programming languages
* A [comparison](https://github.com/google/open-location-code/wiki/Evaluation-of-Location-Encoding-Systems) of different geocoding / geohashing systems
* The French Wikipedia [page on Open Location Code](https://fr.wikipedia.org/wiki/Open_Location_Code) shows visually how the world map is cut into smaller boxes as you zoom in, and takes the example of the Eiffel Tower, like in my script above
* For reference, the English Wikipedia [page](https://en.wikipedia.org/wiki/Open_Location_Code), which is a little less detailed and visual

# cURL's --json flag

As cURL was celebrating its [25th birthday](https://daniel.haxx.se/blog/2023/03/20/twenty-five-years-of-curl/), I was reading Daniel Stenberg's story behind the project, and discovered a neat little feature I hadn't heard of before: the `--json` flag! Daniel even [blogged](https://daniel.haxx.se/blog/2022/02/02/curl-dash-dash-json/) about it when it landed in cURL 7.82.0 last year. So what's so cool about it? If you're like me, you're used to posting JSON data with the following verbose approach:

```bash
curl --data '{"msg": "hello"}' \
     --header "Content-Type: application/json" \
     --header "Accept: application/json" \
     https://example.com
```

You have to pass the data, and also pass headers to specify the content type.
You can make it slightly shorter with the one-letter flags:

```bash
curl -d '{"msg": "hello"}' \
     -H "Content-Type: application/json" \
     -H "Accept: application/json" \
     https://example.com
```

But with the recent addition of this flag, it's much shorter, as you don't have to specify the mime types:

```bash
curl --json '{"msg": "hello"}' https://example.com
```

It's available since version 7.82.0, and on my Mac laptop, I have version 7.86.0. For reference, here's the excerpt of the manual that gives the details about the `--json` flag:

```
--json
    (HTTP) Sends the specified JSON data in a POST request to the HTTP server.
    --json works as a shortcut for passing on these three options:

      --data [arg]
      --header "Content-Type: application/json"
      --header "Accept: application/json"

    There is no verification that the passed in data is actual JSON or that
    the syntax is correct.

    If you start the data with the letter @, the rest should be a file name
    to read the data from, or a single dash (-) if you want curl to read the
    data from stdin. Posting data from a file named 'foobar' would thus be
    done with --json @foobar and to instead read the data from stdin, use
    --json @-.

    If this option is used more than once on the same command line, the
    additional data pieces will be concatenated to the previous before
    sending.

    The headers this option sets can be overridden with --header as usual.

    --json can be used several times in a command line

    Examples:
      curl --json '{ "drink": "coffe" }' https://example.com
      curl --json '{ "drink":' --json ' "coffe" }' https://example.com
      curl --json @prepared https://example.com
      curl --json @- https://example.com < json.txt
```

# Tip: Visualize output in the Groovy Console

For some scripting tasks, my favorite go-to tool is the **Groovy Console**, writing code with [Apache Groovy](https://groovy-lang.org/). Usually, you just spill some `println` calls all over the place to display some textual information. But there's a little known secret. Not really a secret though, as it's properly [documented](http://docs.groovy-lang.org/2.4.1/html/documentation/tools-groovyconsole.html#GroovyConsole-Visualizingscriptoutputresults). It's possible to display images (like `BufferedImage` or its parent `java.awt.Image`) or all sorts of rich components (from the Swing UI toolkit, like `JPanel`, `JLabel`, etc.)

For example, to display an image in the output pane of my Groovy Console, I can load it up via an `ImageIcon`:

```groovy
import javax.swing.*

def url = "https://pbs.twimg.com/profile_images/1590794600867893271/ttqX3njd_400x400.jpg".toURL()
new ImageIcon(url)
```

For that purpose, you'll have to ensure that `View > Visualize Script Results` is enabled, as shown in the picture below:

![](/img/misc/groovy-console-visualize-output.png)

As mentioned in the [documentation](http://docs.groovy-lang.org/2.4.1/html/documentation/tools-groovyconsole.html#GroovyConsole-Visualizingscriptoutputresults), you could for example display maps or lists as nice Swing `JTable`s. Or for some data visualisation, you could also use any Java library that outputs images or that can be embedded in Swing components, like the venerable JFreeChart library (i.e. there's a StackOverflow question showing that JFreeChart charts can be embedded in Swing components like `JPanel`).

# Some custom VS Code settings

I regularly use both [IntelliJ IDEA](https://www.jetbrains.com/idea/) and [Visual Studio Code](https://code.visualstudio.com/) as my environments for developing.
But like all tools, we often need to personalise them to our liking, to feel at ease, or to be more productive. As we read code more than we write it, there are certain settings in your favorite editor that can improve your reading experience. Today, I'll share some of the tweaks I've made to my VS Code settings.

You can edit some of the settings by opening the settings dialog UI, but you can also edit the `JSON` file in which those settings are saved. On a Mac, for example, the `settings.json` file is stored in `~/Library/Application Support/Code/User/`. Here are my current custom settings, compared to the default configuration:

```json
{
    "security.workspace.trust.untrustedFiles": "open",
    "window.title": "${folderPath} ${separator} ${activeEditorShort}",
    "breadcrumbs.enabled": true,
    "workbench.colorCustomizations": {
        "[Default Dark+]": {
            "editor.lineHighlightBackground": "#404020FF",
            "editorLineNumber.activeForeground": "#ffff00",
        },
        "[Default Light+]": {
            "editor.lineHighlightBackground": "#FFEEEE",
            "editorLineNumber.activeForeground": "#ff0000",
        }
    },
    "workbench.tree.renderIndentGuides": "always",
    "workbench.tree.indent": 10,
    "editor.fontSize": 12,
    "editor.fontFamily": "Fira Code",
    "editor.fontLigatures": true,
    "editor.formatOnPaste": true,
    "editor.guides.bracketPairs": true,
    "files.trimTrailingWhitespace": true,
    "editor.mouseWheelZoom": true
}
```

Let's go through them:

* `security.workspace.trust.untrustedFiles` — allows opening a file that's not part of the project without a warning
* `window.title` — for a custom window title, with the root of the project and the name of the currently opened file
* `breadcrumbs.enabled` — displays a breadcrumb to see where I'm located in the file tree, and inside the file's structure itself
* `editor.lineHighlightBackground` and `editorLineNumber.activeForeground` — I've customised the colors of the current line the cursor is on, as well as its line number in the gutter, to make it stand out more on my dark background
* `workbench.tree.renderIndentGuides` and `workbench.tree.indent` — adds the little vertical bars in the file explorer, to visually see the current level or depth in the directory structure, and specifies how far apart they should be
* `editor.fontSize` — customizes the default font size (but it may already be 12 by default)
* `editor.fontFamily` and `editor.fontLigatures` — I'm using Fira Code as my coding font, and I enable the ligatures to get nice looking operators
* `editor.formatOnPaste` — automatically formats the code that you paste, without having to manually re-indent it
* `editor.guides.bracketPairs` — draws a little vertical line that highlights the current scope of the block my cursor is in
* `files.trimTrailingWhitespace` — I like to trim the trailing whitespace at the end of lines automatically (to avoid some dummy commit because of a remaining space, for instance)
* `editor.mouseWheelZoom` — allows zooming in and out with the mouse wheel to increase / decrease the font size

# New blog location

I started blogging 20 years ago, in April 2003. My first blog engine was a PHP CMS called [Nucleus](http://nucleuscms.org/). I was hosting it on my ISP, at free.fr. Then in 2011, I wrote my own blog engine, called [Bloogaey](https://github.com/glaforge/bloogaey), which was written in [Groovy](https://groovy-lang.org/), using my little [Gaelyk](http://gaelyk.appspot.com/) web framework, and running on [App Engine](https://cloud.google.com/appengine).
As it became a bit painful to properly format my blog posts and evolve my blog engine, I decided I should move to something more static, with a static site generator that eats Markdown files: I chose the [Hugo](https://gohugo.io/) static site generator, which I had used in some previous projects. And I'm now hosting the content of my blog on [GitHub Pages](https://github.com/glaforge/glaforge.github.io), under the [glaforge.dev](https://glaforge.dev/) custom domain name.

**I'll stop blogging on the old website, and will write my new articles at that new URL.**

I'll try to redirect all the links back to this new location, and also update the RSS feed for those who are subscribed via Feedburner. But if you want to update your feed reader manually, here's the new URL for the feed:

[![](/img/misc/glaforge-dev-screenshot.png)](https://glaforge.dev/)

# Google Cloud Workflows API automation, patterns, and best practices

* Workflows at a glance, benefits, key features, use cases
* UI interface in the Google Cloud console
* Deep dive into the Workflows syntax
* Workflows connectors
* Demos
* Patterns and best practices

{{< speakerdeck 9957977f37554ccdacf91f0859592bae >}}

# Calculating your potential reach on Mastodon with Google Cloud Workflows orchestrating the Mastodon APIs

With the turmoil around Twitter, like many, I've decided to look into [Mastodon](https://joinmastodon.org/). My friend [Geert](https://uwyn.net/@gbevin) is running his own Mastodon server, and welcomed me on his instance at [uwyn.net/@glaforge](https://uwyn.net/@glaforge).

With Twitter, you can access your [analytics](https://analytics.twitter.com/about) to know how your tweets are doing, and how many views you're getting. Working in developer relations, it's always interesting to get some insights into those numbers, to figure out if what you're sharing is interesting for your community. But for various (actually good) reasons, Mastodon doesn't offer such detailed analytics. However, I wanted to see what the [Mastodon APIs](https://docs.joinmastodon.org/api/) offered.

## How to calculate your potential reach

Your _"toots"_ (i.e. your posts on Mastodon) can be "boosted" (the equivalent of a retweet on Twitter). Also, each actor on Mastodon has a certain **number of followers**. So potentially, one of your toots can reach all your followers, as well as all the followers of the actors who reshare your toot. So the maximum potential reach of one of your posts would correspond to the following equation:

```
potential_reach = me.followers_count + ∑ ( boosters[i].followers_count )
```

## Let's play with the Mastodon APIs to compute your reach

Fortunately, the Mastodon APIs allow you to get those numbers, albeit not with a single API call. Let's have a look at the interesting endpoints to get the potential reach of my most recent posts.

First of all, I'll look up my account on the Mastodon instance that hosts me:

```
GET https://uwyn.net/api/v1/accounts/lookup?acct=glaforge
```

I pass my account name as a query parameter to the `/accounts/lookup` endpoint. In return, I get a JSON document that contains various details about my account and me (I'll just show some of the interesting fields, not the whole payload):

```json
{
  id: "109314675907601286",
  username: "glaforge",
  acct: "glaforge",
  display_name: "Guillaume Laforge",
  ...
  note: "...",
  url: "https://uwyn.net/@glaforge",
  ...
  followers_count: 878,
  fields: [...]
}
```

I get two important pieces of information here: the `followers_count` field gives me, you guessed it, the number of followers my account has, and thus the potential number of persons that can see my toots. I also get the `id` of my account, which I'll need for some API calls further down.

To get the most recent statuses I've posted, I'll indeed need that account `id` for crafting the new URL I'll call:

```
GET https://uwyn.net/api/v1/accounts/109314675907601286/statuses
```

This call will return a list of statuses (again, with the less interesting parts of the payload snipped):

```json
[
  ...
  {
    id: "109620174916140649",
    created_at: "2023-01-02T14:52:06.044Z",
    ...
    replies_count: 2,
    reblogs_count: 6,
    favourites_count: 6,
    ...
    edited_at: null,
    content: "...",
    reblog: null,
    ...
  },
  ...
]
```

In each status object, you can see the number of replies, the number of times the post was reshared or favorited, or whether it's a reshared toot itself. So what's interesting here is the `reblogs_count` number. However, you don't get more details about who reshared your toot, so we'll need some extra calls to figure that out! For each of your posts, you'll have to call the following endpoint to know more about those "reblogs":

```
GET https://uwyn.net/api/v1/statuses/109620174916140649/reblogged_by
```

This time, you'll get a list of all the persons who reshared your post:

```json
[
  {
    id: "123456789",
    username: "...",
    acct: "...",
    display_name: "...",
    ...
    followers_count: 7,
    ...
  },
  ...
]
```

And as you can see, the details of those persons also include the `followers_count` field, which tells you the number of people who follow them. So now we have all the numbers we need to calculate the potential reach of our toots: your own number of followers, and the number of followers of all those who reshared! It doesn't mean that your toots will actually be viewed that many times, as people don't necessarily read each and every toot on their timelines, but at least it's an approximation of the maximum reach you can get.

## Automating the potential reach calculation with Web API orchestration

Initially, I played with both cURL and a little Apache Groovy [script](https://gwc-experiment.appspot.com/?g=groovy_4_0&codez=eJx1kl9PwjAUxd_3KRqeNjUtoAlsCTHGhAfjg0F8MoaUcQczY3dp78DF-N0tLRv_xp66c-85t_216bpARWypEDcV_9aY8xvPW0DCdms2Yjls2YtZvmelKkD5gcfMZzsIkT5UZpo6K6JCR0KU2yrnOZCQRSo2PaFJUqlBi143vB_2hmE4fOiH_cEg7HZsRiGVhsXUJJkYO94qU_ghf5_PCT8mr37AyYiB14x-Q02gxphluAWljf8QxmUcY5kTT-ryzP6zx4h1m4Sx3KBKCc68iZFLq7eZJjDPcHlmUU487rcGp08bUDWyW9YRrraExWxeHcNwA56UktUlk5PANjJNagPm2e5pdJnOY9MCMbFflraTYn9cl2s_2J9IAZUqZ5_2CSQ1vOiU5Z2r1mlR2125nj206BirqxCSzJwyPgRdOZqzFEiQU7qzyXjVOtVQvxLhfXneP1NZD_M) to better understand the Mastodon APIs and to figure out how to chain them to get to the expected result (you'll find a small sketch of such a script at the end of this post). Then I decided to automate that series of Web API calls using an API orchestrator: [Google Cloud Workflows](https://cloud.google.com/workflows).

To recap, we need to:

* Get the details of your account
* Get the recent posts for that account
* Get the followers count of each person who reshared each post

Let's have a look at this piece by piece:

```yaml
main:
  params: [input]
  steps:
    - account_server_vars:
        assign:
          - account: ${input.account}
          - server: ${input.server}
          - prefix: ${"https://" + server + "/api/v1"}
          - impact_map: {}
```

First, the workflow takes `account` and `server` arguments; in my case, that is glaforge and uwyn.net. I'm also defining a variable with the base path of the Mastodon API, and a dictionary to hold the data for each toot.
```yaml - find_account_id: call: http.get args: url: ${prefix + "/accounts/lookup"} query: acct: ${account} result: account_id_lookup - account_id_var: assign: - account_id: ${account_id_lookup.body.id} - followers_count: ${account_id_lookup.body.followers_count} ``` Above, I’m doing an account lookup, to get the id of the account, but also the followers count. ```yaml - get_statuses: call: http.get args: url: ${prefix + "/accounts/" + account_id + "/statuses"} query: limit: 100 exclude_reblogs: true result: statuses ``` We get the list of most recent toots. Now things get more interesting, as we need to iterate over all the statuses. We’ll do so in parallel, to save some time: ```yaml - iterate_statuses: parallel: shared: [impact_map] for: value: status in: ${statuses.body} ``` To parallelize the per-status calls, we just need to state it’s parallel, and that the variable we’ll keep our data in is a shared variable that needs to be accessed in parallel. Next, we define the steps for each parallel iteration: ```yaml steps: - counter_var: assign: - impact: ${followers_count} - fetch_reblogs: call: http.get args: url: ${prefix + "/statuses/" + status.id + "/reblogged_by"} result: reblogs ``` Above, we get the list of people who reshared our post. And for each of these accounts, we’re incrementing our impact counter with the number of their followers. It’s another loop, but that doesn’t need to be done in parallel, as we’re not calling any API: ```yaml - iterate_reblogs: for: value: reblog in: ${reblogs.body} steps: - increment_reblog: assign: - impact: ${impact + reblog.followers_count} - update_impact_map: assign: - impact_map[status.url]: ${impact} ``` And we finish the workflow by returning the data: ```yaml - returnOutput: return: id: ${account_id} account: ${account} server: ${server} followers: ${followers_count} impact: ${impact_map} ``` This will return an output similar to this: ```json { "account": "glaforge", "followers": 878, "id": "109314675907601286", "impact": { "https://uwyn.net/@glaforge/109422399389341013": 945, "https://uwyn.net/@glaforge/109462120695384207": 1523, "https://uwyn.net/@glaforge/109494881278500194": 121385, "https://uwyn.net/@glaforge/109495686235916646": 878, "https://uwyn.net/@glaforge/109516968335141401": 1002, "https://uwyn.net/@glaforge/109523829424569844": 878, "https://uwyn.net/@glaforge/109528949144442301": 896, "https://uwyn.net/@glaforge/109620174916140649": 1662, "https://uwyn.net/@glaforge/109621803885287542": 1523, ... }, "server": "uwyn.net" } ``` With this little workflow, I can check how my toots are doing on this new social media! As next steps, you might want to check out how to get started with API orchestration with Google Cloud Workflows, [in the cloud console](https://cloud.google.com/workflows/docs/create-workflow-console), or [from the command-line](https://cloud.google.com/workflows/docs/create-workflow-gcloud). And to go further, potentially, it might be interesting to [schedule a workflow execution]({{< ref "/posts/2022/02/09/schedule-a-workflow-execution" >}}) with [Cloud Scheduler](https://cloud.google.com/scheduler). We could also imagine storing those stats in a database (perhaps [BigQuery](https://cloud.google.com/bigquery) for some analytics, or simply [Firestore](https://cloud.google.com/firestore) or [CloudSQL](https://cloud.google.com/sql)), to see how your impact evolves over time. 
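As an aside, here's roughly what the little exploratory Groovy script mentioned earlier could look like. It's only a minimal sketch for illustration (no error handling or pagination, and it assumes the same `uwyn.net` / `glaforge` values used throughout this post); the workflow above remains the actual automation:

```groovy
import groovy.json.JsonSlurper

def server  = 'uwyn.net'   // same instance as in the article
def account = 'glaforge'
def slurper = new JsonSlurper()

// Small helper calling a Mastodon API endpoint and parsing the JSON response
def callApi = { path ->
    slurper.parse(new URL("https://${server}/api/v1${path}"))
}

// 1. Look up the account to get its id and followers count
def me = callApi("/accounts/lookup?acct=${account}")

// 2. Fetch the most recent original statuses
def statuses = callApi("/accounts/${me.id}/statuses?exclude_reblogs=true")

// 3. For each status, add the followers of everyone who boosted it
statuses.each { status ->
    def boosterFollowers =
        callApi("/statuses/${status.id}/reblogged_by").sum { it.followers_count } ?: 0
    println "${status.url} -> potential reach: ${me.followers_count + boosterFollowers}"
}
```

Compared to such a script, the workflow version runs the per-status calls in parallel and doesn't require hosting or scheduling anything yourself.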
# Workflows patterns and best practices — Part 3 This is a three-part series of posts, in which we summarize Workflows and service orchestration patterns. In this third and final post, we talk about managing workflow life cycles and the benefits of using Firestore with Workflows.  ### Use subworkflows and Terraform to manage workflow definitions If you're not careful, the workflow definitions you create with YAML or JSON can get out of hand pretty quickly. While it is possible to use subworkflows to define snippets of a workflow that can be reused from multiple workflows, Workflows does not support importing these subworkflows. Thankfully, there are other tools, such as Terraform, that can help. In our [Deploying multi-YAML Workflows definitions with Terraform](https://cloud.google.com/blog/topics/developers-practitioners/deploying-multi-yaml-workflows-definitions-terraform) post, we showed how to use Terraform to define a workflow and import it into a Terraform configuration file. We further showed how to also import a subworkflow in the same workflow definition. This makes it easier to manage workflows and subworkflow definitions. When you're defining a workflow, make sure you have a strategy to define and reuse subworkflows with Terraform or some other tool. ### GitOps your service orchestrations [GitOps](https://opengitops.dev/) takes DevOps best practices used for application development (such as version control and CI/CD) and applies them to infrastructure automation. Service orchestrations, which have their own definition files and deployment cycles, can benefit from a GitOps approach as well. In our [GitOps your service orchestrations](https://cloud.google.com/blog/topics/developers-practitioners/gitsops-service-orchestration) post, we showed how to use Cloud Build to manage an automated and staged rollout of workflow changes with tests along the way to minimize risk. ![/img/workflows-patterns/GitOps_Blog_1.max-800x800.max-800x800.png](/img/workflows-patterns/GitOps_Blog_1.max-800x800.max-800x800.png) ### Plan for multi-environment orchestrations While GitOps helps to manage the deployment lifecycle of a workflow, sometimes you need to make changes to the workflow before deploying to different environments. That means you need to design workflows with multiple environments in mind. For example, instead of hardcoding the URLs called from the workflow, replace the URLs with staging and production URLs depending on where the workflow is being deployed. In our [Multi-environment service orchestrations](https://cloud.google.com/blog/topics/developers-practitioners/multi-environment-service-orchestrations) post, we showed three different ways of replacing URLs in a workflow: passing URLs as runtime arguments, using Cloud Build to deploy multiple versions, and using Terraform to deploy multiple versions. ![/img/workflows-patterns/GOB2_-_2.max-700x700.max-700x700.png](/img/workflows-patterns/GOB2_-_2.max-700x700.max-700x700.png) ### Manage external state with Firestore You define workflows YAML/JSON as the recipe and then execute it with optional runtime arguments as an individual isolated instance of that recipe. Sometimes, you need to store some state (typically a key/value pair) in a step from one workflow execution and later read that state in another step from another workflow execution. There's no intrinsic key/value store in Workflows. However, you can use Firestore to store and read key/value pairs from Workflows. 
In our [Workflows state management with Firestore](https://medium.com/google-cloud/worklows-state-management-with-firestore-99237f08c5c5) post and its associated [sample](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/state-management-firestore), we showed a couple of subworkflows to put and get key/value pairs from Workflows with Firestore. This pattern is very useful when you need to manage some state in your workflow.

### Workflows for reliable work and Firestore for reactive UI

You can count on Workflows to perform some long-running work reliably, and as an admin, you can check the status of the workflow and the current running step using the Google Cloud console or the Workflows API. However, how do you keep end users up to date about the status of the workflow? To have this kind of reactive UI, you can have your workflow write its status to a Firestore document and have Firestore notify connected end users in real time.

At Google I/O, we demonstrated this pattern with two examples. In the [expense report application](https://github.com/GoogleCloudPlatform/smart-expenses), the UI updates the status of the approval process both for the employee and the manager. In the [translation application using callbacks](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/callback-translation), Firestore is used to show the status of an ongoing translation.

{{< youtube "l3aMs00ziYA" >}}

* * * * *

This wraps up our three-part series. For questions or feedback, or if you want to share your own best practices and patterns, feel free to reach out to us on Twitter @[meteatamel](https://twitter.com/meteatamel) and @[glaforge](https://twitter.com/glaforge).

# Turning a Website Into a Desktop Application

Probably like most of you, my dear readers, I have too many browser windows open, with tons of tabs for each window. But there are always apps I come back to very often, like my email (professional & personal), my calendar, my chat app, or even social media sites like Mastodon or Twitter. You can switch from window to window with `CTRL/CMD-Tab`, but then you potentially also have to move between tabs. So for the most common webapps or websites I'm using, I wanted to have a dedicated desktop application.

Initially, I was on the lookout for a Mac-specific approach, as I've been a macOS user for many years. And I found some Mac-specific apps that can handle that. This website mentions [5 approaches for macOS](https://www.makeuseof.com/tag/website-desktop-mac-app/), including free, freemium, and non-free apps, like Fluid, Applicationize (creating a Chrome extension), Web2Desk, or Unite. However, some of them create big hundred-megabyte wrappers. Another approach on Macs was using Automator to create a pop-up window, but that's just a pop-up, not a real app. There are also some promising open source projects like [Tauri](https://tauri.app/) and [Nativefier](https://github.com/nativefier/nativefier).

Fortunately, there's a cool feature in Chrome that should work across all OSes, not just macOS. So if you're on Linux or Windows, please read on. The websites you'll turn into applications don't even need to be PWAs (Progressive Web Apps). Here's how to proceed: first, navigate to the website you want to transform into an application with your Chrome browser. Click on the triple dots in the top right corner, then `More Tools`, and finally `Create Shortcut`:

![](/img/chrome-to-app/web2app-01-create-shortcut.png)

It will then let you customise the name of the application.
It'll reuse the favicon of the website as the icon of the application. But be sure to check `Open as window` to create a standalone application:

![](/img/chrome-to-app/web2app-02-open-as-window.png)

Then you'll be able to open the website as a standalone application:

![](/img/chrome-to-app/web2app-03-standalone-app.png)

I was curious whether a similar feature existed in other browsers, like Firefox. For the little fox, the only thing I could find was the ability to open Firefox in kiosk mode, in full-screen. But I wanted a window I could resize however I wanted, not necessarily full-screen. I hope that Firefox will add that capability at some point. But for now, I'm happy to have this solution with Chrome!

# Workflows patterns and best practices — Part 2

This is part 2 of a three-part series of posts, in which we summarize Workflows and service orchestration patterns. You can apply these patterns to better take advantage of Workflows and service orchestration on Google Cloud. In the [first post]({{< ref "posts/2022/11/22/workflows-patterns-and-best-practices-part-1" >}}), we introduced some general tips and tricks, as well as patterns for event-driven orchestrations, parallel steps, and connectors. This second post covers more advanced patterns. Let's dive in!

## Design for resiliency with retries and the saga pattern

It's easy to put together a workflow that chains a series of services, especially if you assume that those services will never fail. This is a common distributed systems fallacy, however, because of course a service will fail at some point. The workflow step calling that service will fail, and then the whole workflow will fail. This is not what you want to see in a resilient architecture. Thankfully, Workflows has building blocks to handle both transient and permanent service failures.

In our post on [Implementing the saga pattern in Workflows](https://cloud.google.com/blog/topics/developers-practitioners/implementing-saga-pattern-workflows) (and its associated e-commerce [sample](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/retries-and-saga)), we talked about how you can apply the saga pattern and take compensation steps to undo one or more previous steps with the `try/except` block for permanent service failures.

![https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_2_1Fl6i9t.max-1400x1400.png](https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_2_1Fl6i9t.max-1400x1400.png)

_Saga pattern implementation_

We also showed how to handle transient service failures by adding the default HTTP retry policy with the `try/retry` block of Workflows. If you have non-idempotent steps, you need to adjust the retry policy to `default_retry_non_idempotent`. In most cases, you need a custom retry policy with a longer backoff, because the default one waits at most ~8 seconds. When eventual consistency over a longer time period is more important than fast failure, consider a longer retry policy over several minutes or even hours with a large multiplier, which is much more likely to succeed during temporary outages.

Don't take network calls for granted. Make sure you design your orchestration with resiliency in mind, using retries and the saga pattern.

## Wait for HTTP and event callbacks instead of polling

Sometimes, your orchestration might need to wait for a long-running job or an asynchronous event in another system before it can continue with the rest of the steps.
The workflow can ask for that input by polling an endpoint or a queue, but this requires complicated polling logic, wasted polling cycles and most likely higher latency. A better approach is to use Workflows [callbacks](https://cloud.google.com/workflows/docs/creating-callback-endpoints) to wait for HTTP calls or events.   In our [Introducing Workflows callbacks](https://cloud.google.com/blog/topics/developers-practitioners/introducing-workflows-callbacks) post, we showed a workflow that waits for human input for automated machine learning translations. Similarly, the [Smarter applications with Document AI, Workflows and Cloud Functions](https://cloud.google.com/blog/topics/developers-practitioners/smarter-applications-document-ai-workflows-and-cloud-functions) post shows a document processing workflow that waits for human approval of expense reports with a callback.  ![/img/workflows-patterns/architecture-diagram_6rwVYEP.max-1500x1500.png](/img/workflows-patterns/architecture-diagram_6rwVYEP.max-1500x1500.png) _Smarter applications with Document AI, Workflows and Cloud Functions_ While both of these posts are focused on waiting for *HTTP callbacks*, in [Creating Workflows that pause and wait for events](https://medium.com/google-cloud/creating-workflows-that-pause-and-wait-for-events-4da201741f2a) post, we showed how a workflow can also wait for Pub/Sub and Cloud Storage events. You can even use Google Sheets as a quick and simple frontend for human approvals as we showed in the [Workflows that pause and wait for human approvals from Google Sheets](https://medium.com/google-cloud/workflows-that-pause-and-wait-for-human-approvals-from-google-sheets-53673ced2a81) post.  When designing a workflow, consider waiting for HTTP calls and events, rather than polling.   ## Orchestrate long-running batch jobs If you need to execute long-running jobs, Google Cloud has services such as [Batch](https://cloud.google.com/batch) and [Cloud Run jobs](https://cloud.google.com/run/docs/create-jobs) that can help. While these services are great for completing  long-running jobs on Compute Engine instances and containers, you still need to create and manage the Batch and Cloud Run job service. One pattern that works really well is to use Workflows to manage these services running batch jobs.  In the [Taking screenshots of web pages with Cloud Run jobs, Workflows, and Eventarc](https://cloud.google.com/blog/topics/developers-practitioners/taking-screenshots-web-pages-cloud-run-jobs-workflows-and-eventarc) post, we showed how Cloud Run jobs take screenshots of web pages and Workflows creates and manages parallel Cloud Run jobs tasks. Similarly, in the [Batch - prime number generator](https://github.com/GoogleCloudPlatform/batch-samples/tree/main/primegen) sample, we showed how to run prime number generator containers in parallel on Compute Engine instances using Google Batch. The lifecycle of the Batch job is managed by Workflows. ![/img/workflows-patterns/image1_2.max-2200x2200.png](/img/workflows-patterns/image1_2.max-2200x2200.png) _Take screenshots of webpages with Cloud Run jobs, Workflows and Eventarc_ Use the right services for long-running batch jobs and use Workflows to manage their life cycles. ## Treat serverful workloads as serverless with Workflows Sometimes, you really need a server due to some serverless limitation. For example, you might need to run on GPUs or execute a long-running process that lasts weeks or months. 
In those cases, Compute Engine can provide you with customized virtual machines (VMs), but you're stuck with managing those VMs yourself. In this kind of IT automation scenario, you can use Workflows to create VMs with the customizations you need, run the workloads for however long you need (Workflows executions can last up to one year), and return the result in the end. This pattern enables you to use servers but manage them as if they were serverless services, using Workflows. In our [Long-running containers with Workflows and Compute Engine](https://cloud.google.com/blog/topics/developers-practitioners/long-running-containers-workflows-and-compute-engine) post, we showed how to use Workflows to spin up a VM, start a prime number generator on the VM, run it for however long you want, and return the result. Next time you need to spin up a VM, treat it like a serverless service with Workflows. ## Run command-line tools with Workflows and Cloud Build We often use command-line tools such as `gcloud` to manage Google Cloud resources or `kubectl` to manage Kubernetes clusters. Wouldn't it be nice if we could call these tools from Workflows and orchestrate the management of resources? In the [Executing commands (gcloud, kubectl) from Workflows](https://medium.com/google-cloud/executing-commands-gcloud-kubectl-from-workflows-ad6b85eaf39c) post, we showed how to use Cloud Build to run these tools and how to create and call that Cloud Build step from Workflows using the Cloud Build connector. Keep in mind that this pattern is not limited to `gcloud` and `kubectl`. Any tool you can run in a container can potentially be a target for Workflows with the help of Cloud Build. Integrate command-line tools into your workflows when needed by calling a Cloud Build step from Workflows. * * * * * This second part of the series covered a lot of ground, but there's still more to cover! We'll wrap up the series in our [third and final post]({{< ref "posts/2022/12/06/workflows-patterns-and-best-practices-part-3" >}}), which describes how to manage the lifecycle of workflow definitions and the benefits of using Firestore. Stay tuned! For questions or feedback, feel free to reach out to us on Twitter @[meteatamel](https://twitter.com/meteatamel) and @[glaforge](https://twitter.com/glaforge). [](https://cloud.google.com/blog/topics/developers-practitioners/workflows-patterns-and-best-practices-part-1) # Workflows patterns and best practices — Part 1 For the last couple of years, we've been using [Workflows](https://cloud.google.com/workflows), Google Cloud's service orchestrator, to bring order to our serverless microservices architectures. As we used and gained more experience with Workflows and service orchestration, we shared what we had learned in conference talks, blog posts, samples, and tutorials. Along the way, some common patterns and best practices emerged. To help you take better advantage of Workflows and service orchestration on Google Cloud, we've summarized these proven patterns and best practices in a three-part series of blog posts. Let's get started with Part 1! ## Make a conscious choice on the communication style upfront Choosing a communication style is more of a task than a pattern, but it is an important one to complete before even considering service orchestration. When you have multiple services, you need to decide how these services will communicate.
The options are: - **Direct** service-to-service communication - Indirect **event-driven** communication (also known as choreography) - A central **orchestrator** (e.g. Workflows) directing the communication There's no right or wrong, only pros and cons. Direct service-to-service communication is easy to implement but creates tight coupling between services. Events enable loosely coupled services at the expense of harder monitoring and debugging when something goes wrong. An orchestrator, while less flexible, brings order to communication without the tight coupling of direct service-to-service communication and the chaos of events in choreographed architectures. In our [Orchestrating the Pic-a-Daily serverless app with Workflows](https://cloud.google.com/blog/topics/developers-practitioners/orchestrating-pic-daily-serverless-app-workflows) post, we explained how we transformed an event-driven application into an orchestrated application and the benefits of doing so. In [Choosing the right orchestrator in Google Cloud](https://cloud.google.com/blog/topics/developers-practitioners/choosing-right-orchestrator-google-cloud), we talked about which service is best suited for different orchestration needs (scheduled, service, and data).  As you design your architecture, make a conscious choice on the communication style with pros and cons in mind, and if you choose to use orchestration, be sure to use the right orchestrator for the task.  ## Keep these tips and tricks for Workflows in mind Once you decide to use Workflows for service orchestration, you'll realize that Workflows has its own strengths and idiosyncrasies. Here are some general tips and tricks that we found useful as we used Workflows: - **Avoid hard-coding URLs** for more portable workflows across environments. - **Use substeps** to collect a common series of steps in one logical unit. - **Wrap string expressions** to avoid parsing problems. - **Replace logic-less services with declarative API calls** to avoid boilerplate code. - **Store what you need, free what you can** to keep memory consumption under control. - **Use subworkflows and call external workflows** to increase reuse. Check our [post on Workflows tips and tricks](https://glaforge.appspot.com/article/workflows-tips-n-tricks) for a more detailed explanation of these points.  ### Consider event-driven orchestration The choice on the communication style is not all or nothing. You can and should combine different styles when it makes sense. For example, there is a common pattern where services that are closely related are managed by an orchestrator like Workflows but that orchestration is triggered by an event from a service like Eventarc. Similarly, we see architectures where the end of an orchestration is a Pub/Sub message to some other orchestration or service.  In our [Introducing Eventarc triggers for Workflows](https://cloud.google.com/blog/topics/developers-practitioners/introducing-eventarc-triggers-workflows) post, we showed how easy it is to route events to Workflows using Eventarc. 
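To give a feel for the receiving end, here's a minimal sketch (the Cloud Run URL and step names are illustrative, not taken from that post), assuming a Cloud Storage trigger whose payload carries the bucket and object name in `event.data`: the Eventarc trigger delivers the event as the workflow's argument, and the workflow extracts the fields it needs before calling a service.

```yaml
main:
    params: [event]
    steps:
    - extract_object_info:
        assign:
            # For a Cloud Storage trigger, Eventarc passes the object metadata in event.data
            - bucket: ${event.data.bucket}
            - object: ${event.data.name}
    - log_event:
        call: sys.log
        args:
            text: ${"New object " + object + " in bucket " + bucket}
    - process_object:
        # Call the service that should react to the event (URL is a placeholder)
        call: http.post
        args:
            url: https://image-processor-abc123-uc.a.run.app/process
            auth:
                type: OIDC
            body:
                bucket: ${bucket}
                object: ${object}
        result: processing_result
    - done:
        return: ${processing_result.body}
```

The orchestration itself stays tightly focused on the services it manages; the loose coupling comes entirely from the Eventarc trigger sitting in front of it.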
In the [Build an event-driven orchestration with Eventarc and Workflows](https://youtu.be/2SI12QE-2DU) video and its associated [codelab](https://codelabs.developers.google.com/codelabs/cloud-event-driven-orchestration?hl=en#8) and [sample](https://github.com/GoogleCloudPlatform/eventarc-samples/tree/main/processing-pipelines/image-v3), we showed how to design an image processing pipeline where the services are managed by Workflows but the orchestration is triggered in a loosely coupled way by a Cloud Storage event via Eventarc: {{< youtube "2SI12QE-2DU" >}} Mix communication styles to get the best of both worlds: use orchestration when you need tight coupling between services, but loose coupling with other orchestrations via events. ## Use connectors when you can Workflows has a rich set of [connectors](https://cloud.google.com/workflows/docs/connectors) to call other Google Cloud services. They handle the formatting of requests for you, providing methods and arguments so that you don't need to get deep into the gory details of a Google Cloud API. More importantly, connectors enable a workflow to transparently wait for long-running Cloud API calls. This saves you from the tedious work of iterating and waiting for calls to complete; connectors take care of this for you! In our [Introducing new connectors for Workflows](https://cloud.google.com/workflows/docs/connectors) post, we showed you how the Compute Engine connector simplifies creating and stopping a virtual machine. Whenever you want to call a Google Cloud API from Workflows, check to see if there's a connector for it. You'll be glad you did, and if there's no connector yet, you can always request a new one [here](https://docs.google.com/forms/d/e/1FAIpQLScdDRwFHcFuy28hvjGq0XCBMyiVuhHGq2c-7Gy_no1ZuqKAOg/viewform?resourcekey=0-0BRW58uzC8wnVntuOiM7AQ). ## Parallelize when you can When we talk about Workflows, we often talk about steps executed one after another sequentially. While Workflows is fast enough to run steps sequentially with no noticeable delay, not all steps need to run sequentially. Independent steps can actually run in parallel, and in some cases this can provide a significant speed-up for workflow executions. In the [Introducing Parallel Steps for Workflows](https://cloud.google.com/blog/topics/developers-practitioners/introducing-parallel-steps-workflows-speed-up-workflow-executions-by-running-steps-concurrently) post and its associated video, we showed how running BigQuery jobs from Workflows in parallel can speed up the workflow execution fivefold. The more independent steps you have, the more you can run those steps in parallel and the faster your workflow execution will be, especially with long-running tasks like BigQuery jobs. {{< youtube "C1Reg1u1MXY" >}} Try to keep your steps independent and make sure to take advantage of parallel steps when you can. * * * * * This initial list of patterns and tips will help you get started taking better advantage of Workflows. We cover more advanced patterns in [part 2]({{< ref "posts/2022/11/28/workflows-patterns-and-best-practices-part-2" >}}) of this series. For questions or feedback, feel free to reach out to us on Twitter @[meteatamel](https://twitter.com/meteatamel) and @[glaforge](https://twitter.com/glaforge).
[](https://cloud.google.com/blog/topics/developers-practitioners/orchestrating-pic-daily-serverless-app-workflows) # APIs, we have a Problem JSON When designing a web API, not only do you have to think about the happy path when everything is alright, but you also have to handle all the error cases: Is the payload received correct? Is there a typo in a field? Do you need more context about the problem that occurred? There’s only a limited set of status codes that can convey the kind of error you’re getting, but sometimes you need to explain more clearly what the error is about. In the past, the APIs I was designing used to follow a common JSON structure for my error messages: a simple JSON object, usually with a `message` field, and sometimes with extra info like a custom error `code`, or a `details` field that contained a longer explanation in plain English. However, it was my own convention, and it’s not necessarily one that is used by others, or understood by tools that interact with my API. So that’s why today, for reporting problems with my web APIs, I tend to use **Problem JSON**. This is actually an RFC ([RFC-7807](https://datatracker.ietf.org/doc/html/rfc7807)) whose title is “Problem Details for HTTP APIs”. Exactly what I needed: a specification for my error messages! First of all, it’s a JSON content-type. Your API should specify the content-type with: ```http Content-Type: application/problem+json ``` Content-types that end with `+json` are basically treated as `application/json`. Now, an example payload from the specification looks like: ```http HTTP/1.1 403 Forbidden Content-Type: application/problem+json Content-Language: en { "type": "https://example.com/probs/out-of-credit", "title": "You do not have enough credit.", "detail": "Your current balance is 30, but that costs 50.", "instance": "/account/12345/msgs/abc", "balance": 30, "accounts": ["/account/12345", "/account/67890"] } ``` There are some standard fields like: * **type**: a URI reference that uniquely identifies the problem type * **title**: a short readable error statement * **status**: the original HTTP status code from the origin server * **detail**: a longer text explanation of the issue * **instance**: a URI that points to the resource that has issues Then, in the example above, you also have custom fields: `balance` and `accounts`, which are specific to your application and not part of the specification. This means you can extend the Problem JSON payload to include details that are specific to your application. Note: Although I’m only covering JSON APIs, the RFC also suggests an `application/problem+xml` alternative for XML APIs. ## Icing on the cake: built-in support in Micronaut My framework of choice these days for all my apps is [Micronaut](https://micronaut.io/), as it’s super fast and memory efficient. And it’s only recently that I realized there was actually a [Micronaut extension for Problem JSON](https://micronaut-projects.github.io/micronaut-problem-json/2.5.2/guide/index.html)! So instead of returning a custom JSON payload manually, I can use the built-in integration.
Here’s an example from the Problem JSON Micronaut extension: ```java @Controller("/product") public class ProductController { @Get @Status(HttpStatus.OK) public void index() { throw Problem.builder() .withType(URI.create("https://example.org/out-of-stock")) .withTitle("Out of Stock") .withStatus(new HttpStatusType(HttpStatus.BAD_REQUEST)) .withDetail("Item B00027Y5QG is no longer available") .with("product", "B00027Y5QG") .build(); } } ``` Which will return a JSON error as follows: ```json { "status": 400, "title": "Out of Stock", "detail": "Item B00027Y5QG is no longer available", "type": "https://example.org/out-of-stock", "parameters": {"product": "B00027Y5QG"} } ``` Now, I’m happy that I can use some official standard for giving more details about the errors returned by my APIs! # Workflows Tips and Tricks Here are some general tips and tricks that we found useful as we used Google Cloud [Workflows](https://cloud.google.com/workflows): ## Avoid hard-coding URLs Since Workflows is all about calling APIs and service URLs, it's important to have some clean way to handle those URLs. You can hard-code them in your workflow definition, but the problem is that your workflow can become harder to maintain. In particular, what happens when you work with multiple environments? You have to duplicate your YAML definitions and use different URLs for the prod vs staging vs dev environments. It is error-prone and quickly becomes painful to make modifications to essentially the same workflow in multiple files. To avoid hard-coding those URLs, there are a few approaches. The first one is to externalize those URLs, and [pass them as workflow execution arguments](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/multi-env-deployment#option-1-use-urls-as-runtime-arguments). This is great for workflow executions that are launched via the CLI, via the various client libraries, or the REST & gRPC APIs. However, there's a limitation to this first approach, in the case of event-triggered workflows, where the invoker is Eventarc. In that case, that's Eventarc that decides which arguments to pass (ie. the event payload). There's no way to pass extra arguments in that case. A safer approach is then to use some placeholder replacement techniques. Just use a tool that replaces some specific string tokens in your definition file, before deploying that updated definition. We explored that approach using some [Cloud Build steps that do some string replacement](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/multi-env-deployment#option-2-use-cloud-build-to-deploy-multiple-versions). You still have one single workflow definition file, but you deploy variants for the different environments. If you're using Terraform for provisioning your infrastructure, we've got you covered, you can also employ a [similar technique with Terraform](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/multi-env-deployment#option-3-use-terraform-to-deploy-multiple-versions). There are also other possible approaches, like taking advantage of Secret Manager and the dedicated [workflow connector](https://cloud.google.com/workflows/docs/reference/googleapis/secretmanager/Overview), to store those URLs, and retrieve them. Or you can also [read some JSON file in a cloud storage bucket](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/gcs-read-write-json#load-environment-specific-variables-from-a-json-file-in-gcs), within which you would store those environment specific details. 
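To make the first approach concrete, here's a minimal sketch (the argument name, service URL, and path are placeholders): the environment-specific base URL comes in as a runtime argument of the execution, and is referenced wherever a call is made.

```yaml
main:
    params: [args]
    steps:
    - call_my_service:
        call: http.get
        args:
            # args.serviceUrl is provided as a runtime argument of the execution,
            # e.g. {"serviceUrl": "https://staging-orders-abc123-uc.a.run.app"}
            url: ${args.serviceUrl + "/api/orders"}
            auth:
                type: OIDC
        result: orders
    - return_orders:
        return: ${orders.body}
```

The same definition is then deployed unchanged across dev, staging, and production; only the execution argument changes.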
## Take advantage of sub-steps Apart from branching or looping, defining your steps is a pretty sequential process. One step happens after another. Steps are their own atomic operation. However, often, some steps really go hand-in-hand, like making an API call, logging its outcome, retrieving and assigning parts of the payload into some variables. You can actually regroup common steps into substeps. This becomes handy when you are branching from a set of steps to another set of steps, without having to point at the right atomic step. ```yaml main:     params: [input]     steps:     - callWikipedia:         steps:         - checkSearchTermInInput:             switch:                 - condition: ${"searchTerm" in input}                   assign:                     - searchTerm: ${input.searchTerm}                   next: readWikipedia         - getCurrentTime:             call: http.get             args:                 url: ...             result: currentDateTime         - setFromCallResult:             assign:                 - searchTerm: ${currentDateTime.body.dayOfTheWeek}         - readWikipedia:             call: http.get             args:                 url: https://en.wikipedia.org/w/api.php                 query:                     action: opensearch                     search: ${searchTerm}             result: wikiResult     - returnOutput:             return: ${wikiResult.body[1]} ``` ## Wrap expressions The dollar/curly brace `${}` expressions are not part of the YAML specification, so what you put inside sometimes doesn't play well with YAML's expectations. For example, putting a colon inside a string inside an expression can be problematic, as the YAML parser believes the colon is the end of the YAML key, and the start of the right-hand-side. So to be safe, you can actually wrap your expressions within quotes, like: `'${...}'` Expressions can span several lines, as well as the strings within that expression. That's handy for SQL queries for BigQuery, like in our [example](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/bigquery-parallel): ```yaml query: ${ "SELECT TITLE, SUM(views)     FROM `bigquery-samples.wikipedia_pageviews." + table + "`     WHERE LENGTH(TITLE) > 10     GROUP BY TITLE     ORDER BY SUM(VIEWS) DESC     LIMIT 100" } ``` ## Replace logic-less services with declarative API calls In our [serverless workshop](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/), in lab 1, we had a [function service](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/blob/master/functions/image-analysis/nodejs/index.js#L19) that was making a call to the Cloud Vision API, checking a boolean attribute, then writing the result in Firestore. But the Vision API can be called declaratively from Workflows. The boolean check can be done with a switch conditional expression, and even writing to Firestore can be done via a declarative API call. When rewriting our application in [lab 6](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/blob/master/workflows/workflows.yaml#L33) to use the orchestrated approach, we moved those logic-less calls into declarative API calls. There are times where Workflows lack some built-in function that you would need, so you have no choice but fork into a function to do the job. But when you have pretty logic-less code that just makes some API calls, you'd better just write this declaratively using Workflows syntax. 
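For instance, here's a simplified, hypothetical sketch (not the exact workflow from the workshop; the image URI, threshold, and step names are made up) of a safe-search check expressed as a plain authenticated HTTP call to the Vision API followed by a switch:

```yaml
main:
    params: [input]
    steps:
    - analyze_image:
        call: http.post
        args:
            url: https://vision.googleapis.com/v1/images:annotate
            auth:
                type: OAuth2
            body:
                requests:
                - image:
                    source:
                        gcsImageUri: ${input.imageUri}
                  features:
                  - type: SAFE_SEARCH_DETECTION
        result: vision_response
    - check_safety:
        switch:
        # Simplistic acceptance criterion, just for illustration
        - condition: ${vision_response.body.responses[0].safeSearchAnnotation.adult == "VERY_UNLIKELY"}
          next: accept_image
        next: reject_image
    - accept_image:
        return: "image accepted"
    - reject_image:
        return: "image rejected"
```

No dedicated service is needed for this kind of glue: the API call, the boolean check, and the branching are all expressed in the workflow definition itself.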
It doesn't mean that everything, or as much as possible, should be done declaratively in a Workflow either. Workflows is not a hammer, and it's definitely not a programming language. So when there's real logic, you definitely need to call some service that represents that business logic. ## Store what you need, free what you can Workflows keeps on granting more memory to workflow executions, but there are times, with big API response payloads, where you'd be happy to have even more memory. That's when sane memory management becomes a good practice. You can be selective in what you store into variables: don't store too much, but store just the right payload part you really need. Once you know you won't need the content of one of your variables, you can also reassign that variable to null, which should also free that memory. Also, in the first place, if the APIs allow you to filter the result more aggressively, you should do that too. Last but not least, if you're calling a service that returns a gigantic payload that can't fit in Workflows memory, you could always delegate that call to your own function that would take care of making the call on your behalf, and return to you just the parts you're really interested in. Don't forget to check the documentation on [quotas and limits](https://cloud.google.com/workflows/quotas) to know more about what's possible. ## Take advantage of sub-workflows and the ability to call external workflows In your workflows, sometimes there are some steps that you might need to repeat. That's when [subworkflows](https://cloud.google.com/workflows/docs/reference/syntax/subworkflows) become handy. Sub-workflows are like sub-routines, procedures or methods. They are a way to make a set of steps reusable in several places of your workflow, potentially parameterized with different arguments. Perhaps the sole downside is that subworkflows are local to your workflow definition, so they can't be reused in other workflows. In that case, you could actually create a dedicated reusable workflow, because you can also [call workflows from other workflows](https://cloud.google.com/workflows/docs/reference/googleapis/workflowexecutions/Overview)! The Workflows connector for Workflows is there to help. ## Summary We've covered a few tips and tricks, and we've reviewed some useful advice on how to make the best use of Workflows. There are certainly others we're forgetting about, so feel free to share them with [@meteatamel](https://twitter.com/meteatamel) and [@glaforge](https://twitter.com/glaforge) over Twitter. And don't forget to double check what's in the Workflows [documentation](https://cloud.google.com/workflows/docs). In particular, have a look at the built-in functions of the [standard library](https://cloud.google.com/workflows/docs/reference/stdlib/overview), at the [list of connectors](https://cloud.google.com/workflows/docs/reference/googleapis) that you can use, and perhaps even print the [syntax cheat sheet](https://cloud.google.com/workflows/docs/reference/syntax/syntax-cheat-sheet)! Lastly, check out all the [samples](https://cloud.google.com/workflows/docs/samples) in the documentation portal, and all the [workflow demos](https://github.com/GoogleCloudPlatform/workflows-demos) Mete and I have built and open sourced over time. # Retrieve YouTube views count with youtubeDL, JQ, and a Docker container I wanted to track the number of views, and also likes, of some YouTube videos I was featured in.
For example, when I present a talk at a conference, the video often becomes available at a later time, and I'm not the owner of the channel or video. At first, I wanted to use the [YouTube Data API](https://developers.google.com/youtube/v3), but I had the impression that I could only see the stats of videos or channels I own; however, I think I might be wrong, and should probably revisit this approach later on. My first intuition was to just scrape the web page of the video, but it's a gobbledygook of JavaScript, and I couldn't really find an easy way to consistently get the numbers in that sea of compressed JavaScript. That's when I remembered the [youtube-dl project](https://youtube-dl.org/). Some people think of this project as a way to download videos to watch offline, but it's also a useful tool that offers lots of metadata about the videos. You can actually even use the project without downloading videos at all, and just fetch the metadata. For example, if I want to get the video metadata, without downloading, I can launch the following command, after having installed the tool locally: ```bash youtube-dl -j -s https://www.youtube.com/watch?v=xJi6pldZnsw ``` The `-s` flag is equivalent to `--simulate`, which doesn't download anything on disk. And the `-j` flag is the short version of `--dump-json`, which returns a big JSON file with lots of metadata, including the view count, but also things like links to transcripts in various languages, chapters, creator, duration, episode number, and so on and so forth. Now, I'm only interested in view counts, likes, and dislikes. So I'm using [jq](https://stedolan.github.io/jq/) to filter the big JSON payload, and create a resulting JSON document with just the fields I want. ```bash jq '{"id":.id,"title":.title,"views":.view_count,"likes":(.like_count // 0), "dislikes":(.dislike_count // 0)}' ``` This long command is creating a JSON structure as follows: ```json { "id": "xJi6pldZnsw", "title": "Reuse old smartphones to monitor 3D prints, with WebRTC, WebSockets and Serverless by G. Laforge", "views": 172, "likes": 6, "dislikes": 0 } ``` The `.id`, `.title`, `.view_count`, etc., expressions look up those particular keys in the big JSON document. The `// 0` notation is there to avoid null values and return 0 if there's no such key or if the value associated with the key is null. So I always get a number --- although I noticed that sometimes the likes are not properly accounted for, but I haven't figured out why. So far so good... but if you pass the URL of a video with a playlist, or if you pass a playlist URL, it will fetch all the metadata for all the videos. This is actually useful: you can even create your own playlists for the videos you want to track. There's one odd thing happening though when using youtube-dl with such URLs: it will output one JSON document per line for each video. It's not returning an array of those documents. So I found a nice trick with jq to always put the results within an array, whether you pass a URL for a single video, or a video with a playlist: ```bash jq -n '[inputs]' ``` So I'm piping the `youtube-dl` command into the first and second `jq` commands. Rather than installing those tools locally, I decided to containerize my magic commands.
Let me first show you the whole `Dockerfile`:

```Dockerfile
FROM ubuntu:latest
RUN apt-get update && apt-get -y install wget \
    && wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl \
    && chmod a+rx /usr/local/bin/youtube-dl \
    && apt-get -y install python3-pip jq \
    && pip3 install --upgrade youtube-dl
# copy the launcher script to the root of the image
COPY ./launch-yt-dl.sh /
RUN chmod +x /launch-yt-dl.sh
ENTRYPOINT ["./launch-yt-dl.sh"]
```

And also this bash script mentioned in the Dockerfile:

```bash
#!/bin/bash
youtube-dl -j -s -- "$@" | jq '{"id":.id,"title":.title,"views":.view_count,"likes":(.like_count // 0), "dislikes":(.dislike_count // 0)}' | jq -n '[inputs]'
```

I went with the latest ubuntu image. I ran some apt-get commands to install wget (used to download the latest youtube-dl release), as well as jq and Python 3's pip, which is then used to upgrade youtube-dl. There's no recent apt package for youtube-dl, which is why those steps are grouped together. What's more interesting is why I don't have the youtube-dl and jq commands in the Dockerfile directly, but instead in a dedicated bash script. Initially I had an `ENTRYPOINT` that pointed at youtube-dl, so that arguments passed to the docker run command would be passed as arguments of that entrypoint. However, after those commands, I still have to pipe with my jq commands, and I couldn't find how to do so with `ENTRYPOINT` and `CMD`. When raising the problem on [Twitter](https://twitter.com/glaforge/status/1584800385256280064), my friends [Guillaume Lours](https://twitter.com/glours/status/1584810960136683521) and [Christophe Furmaniak](https://twitter.com/cfurmaniak/status/1584845647647506432) pointed me in the right direction with this idea of passing through a script. So I use the `$@` bash shortcut, which expands to the arguments `$1 $2 $3`, etc., in case there are several videos passed as arguments. I have the jq pipes after that shortcut. But for my `ENTRYPOINT`, it's fine: the args are passed directly to it, and it's that intermediary script that weaves the args into my longer command. Next, I just need to build my Docker container:

```bash
docker build . -t yt-video-stat
```

And then run it:

```bash
docker run --rm -it yt-video-stat "https://www.youtube.com/watch?v=xJi6pldZnsw"
```

And voilà, I have the stats for the YouTube videos I'm interested in! # Build and deploy Java 17 apps on Cloud Run with Cloud Native Buildpacks on Temurin In this article, let's revisit the topic of deploying Java apps on [Cloud Run](https://cloud.run/). In particular, I'll deploy a [Micronaut](https://micronaut.io/) app, written with [Java 17](https://jdk.java.net/17/), and built with [Gradle](https://gradle.org/). ## With a custom Dockerfile On Cloud Run, you deploy containerised applications, so you have to decide the way you want to build a container for your application. In a [previous article]({{< ref "/posts/2020/03/24/start-the-fun-with-java-14-and-micronaut-inside-serverless-containers-on-cloud-run.md" >}}), I showed an example of using your own Dockerfile, which would look as follows with OpenJDK 17, enabling the preview features of the language:

```Dockerfile
FROM openjdk:17
WORKDIR /app
COPY ./ ./
RUN ./gradlew shadowJar
EXPOSE 8080
CMD ["java", "--enable-preview", "-jar", "build/libs/app-0.1-all.jar"]
```

To further improve on that Dockerfile, you could use a multistage Docker build to first build the app in one step with Gradle, and then run it in a second step. Also, you might want to parameterize the command, as the JAR file name is hard-coded.
To build the image, you can build it locally with Docker, then push it to Container Registry, and then deploy it: ```bash # gcloud auth configure-docker # gcloud components install docker-credential-gcr docker build . --tag gcr.io/YOUR_PROJECT_ID/IMAGE_NAME docker push gcr.io/YOUR_PROJECT_ID/IMAGE_NAME gcloud run deploy weekend-service \     --image gcr.io/YOUR_PROJECT_ID/IMAGE_NAME ``` Instead of building locally with Docker, you could also let [Cloud Build](https://cloud.google.com/build) do it for you: ```bash gcloud builds submit . --tag gcr.io/YOUR_PROJECT_ID/SERVICE_NAME ``` ## With JIB Instead of messing around with Dockerfiles, you can also let [JIB](https://github.com/GoogleContainerTools/jib) create the container for you, like I wrote in [another article]({{< ref "/posts/2020/08/04/running-micronaut-serverlessly-on-google-cloud-platform.md" >}}). You configure Gradle to use the JIB plugin: ```groovy plugins {     ... id "com.google.cloud.tools.jib" version "2.8.0" } ... tasks {     jib { from { image = "gcr.io/distroless/java17-debian11"         }         to { image = "gcr.io/YOUR_PROJECT_ID/SERVICE_NAME"         }     } } ``` You specify the version of the plugin, but you also indicate that you want to use Java 17 by choosing a base image with that same version. Be sure to change the placeholders for your project ID and service name. Feel free to look up the [documentation](https://github.com/GoogleContainerTools/jib/tree/master/jib-gradle-plugin) about the JIB Gradle plugin. You can then let Gradle build the container with `./gradlew jib`, or with `./gradlew jibDockerBuild` if you want to use your local Docker daemon. ## With Cloud Native Buildpacks Now that we covered the other approaches, let's zoom in on using [Cloud Native Buildpacks](https://buildpacks.io/) instead, in particular, the [Google Cloud Native Buildpacks](https://github.com/GoogleCloudPlatform/buildpacks). With buildpacks, you don't have to bother with Dockerfiles or with building the container before deploying the service. You let Cloud Run use buildpacks to build, containerize, and deploy your application from sources. Out of the box, the buildpack actually targets Java 8 or Java 11. But I'm interested in running the latest LTS version of Java, Java 17, to take advantage of newer language features like records, sealed classes, and switch expressions, as well as preview features. In my Gradle build, I specify that I'm using Java 17, but also enable preview features: ```groovy java {     toolchain {         languageVersion.set(JavaLanguageVersion.of(17))     } } ``` Like in Cédric Champeau's [blog post](https://melix.github.io/blog/2020/06/java-feature-previews-gradle.html), to enable preview features, you should also tell Gradle you want to enable them for compilation, test, and execution tasks: ```groovy tasks.withType(JavaCompile).configureEach { options.compilerArgs.add("--enable-preview") } tasks.withType(Test).configureEach {     useJUnitPlatform()     jvmArgs("--enable-preview") } tasks.withType(JavaExec).configureEach {     jvmArgs("--enable-preview") } ``` So far so good, but as I said, the default native buildpack isn't using Java 17, and I want to specify that I use preview features. So when I tried to deploy my Cloud Run app from sources with the buildpack, simply by running the `gcloud run deploy` command, I would get an error. ```bash gcloud beta run deploy SERVICE_NAME ``` To circumvent this problem, I had to add a configuration file to instruct the buildpack to use Java 17.
I created a `project.toml` file at the root of my project: ```toml [[build.env]] name = "GOOGLE_RUNTIME_VERSION" value = "17" [[build.env]] name = "GOOGLE_ENTRYPOINT" value = "java --enable-preview -jar /workspace/build/libs/app-0.1-all.jar" ``` I specify that the runtime version must use Java 17. But I also add the `--enable-preview` flag to enable the preview features at runtime. ## Adoptium Temurin OpenJDK 17 The icing on the cake is that the build is using [Adoptium](https://adoptium.net/)'s [Temurin](https://adoptium.net/temurin/releases/) build of OpenJDK 17, as we recently [announced](https://blog.adoptium.net/2022/10/adoptium-welcomes-google/)! If you look at the build logs in Cloud Build, you should see some output mentioning it, like: ```json { "link": "https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.4.1%2B1/OpenJDK17U-jdk-sources_17.0.4.1_1.tar.gz", "name": "OpenJDK17U-jdk-sources_17.0.4.1_1.tar.gz", "size": 105784017 } ``` Way to go! A Java 17 Micronaut app, deployed on Temurin on Cloud Run thanks to Cloud Native Buildpacks! I win at buzzword bingo 🙂 # Choreography vs orchestration in microservices and best practices We went from a single monolith to a set of microservices that are small, lightweight, and easy to implement. Microservices enable reusability and make it easier to change and scale apps on demand, but they also introduce new problems. How do microservices interact with each other toward a common goal? How do you figure out what went wrong when a business process composed of several microservices fails? Should there be a central orchestrator controlling all interactions between services, or should each service work independently, in a loosely coupled way, and only interact through shared events? In this talk, we’ll explore the choreography vs orchestration question and see demos of some of the tools that can help. And we'll explore some best practices and patterns to apply when adopting an orchestration approach. {{< speakerdeck 328a826a55f447f7907db099c0aa03ab >}} # Reuse old smartphones to monitor 3D prints with WebRTC WebSockets and serverless Monitoring my 3D prints in my basement means climbing lots of stairs back and forth! So here’s my story about how I reused an old smartphone to check the status of my prints. I built a small web app that uses WebRTC to exchange video streams between my broadcasting smartphone and viewers, with WebSockets for signaling, and a serverless platform for easily deploying and hosting my containerized app. {{< speakerdeck 3e5dfb35f64c48d9ba4a211122711b7c >}} # A Cloud Run service in Go calling a Workflows callback endpoint It's all [Richard Seroter](https://seroter.com/)'s fault that I ended up dabbling with [Golang](https://go.dev/)! We were chatting about a use case using Google Cloud [Workflows](https://cloud.google.com/workflows) and a [Cloud Run](https://cloud.run/) service implemented in Go. So it was the occasion to play a bit with Go. Well, I still don't like error handling... But let's rewind the story a bit! Workflows is a fully-managed service/API orchestrator on Google Cloud. You can create some advanced business workflows using YAML syntax. I've built numerous little projects using it, and [blogged](https://cloud.google.com/blog/topics/developers-practitioners/introducing-workflows-callbacks) about it.
I particularly like its ability to pause a workflow execution, creating a [callback endpoint](https://cloud.google.com/workflows/docs/creating-callback-endpoints) that you can call from an external system to resume the execution of the workflow. With callbacks, you're able to implement human validation steps, for example in an expense report application where a manager validates or rejects an expense from someone in their team (this is what I implemented in this [article](https://cloud.google.com/blog/topics/developers-practitioners/smarter-applications-document-ai-workflows-and-cloud-functions)). For my use case with Richard, we had a workflow that was creating such a callback endpoint. This endpoint is called from a Cloud Run service implemented in Go. Let's see how to implement the workflow: ```yaml main:     params: [input]     steps: - create_callback: call: events.create_callback_endpoint args: http_callback_method: "POST" result: callback_details - log_callback_creation: call: sys.log args: text: ${"Callback created, awaiting calls on " + callback_details.url} - await_callback: call: events.await_callback args: callback: ${callback_details} timeout: 86400 result: callback_request - log_callback_received: call: sys.log args: json: ${callback_request.http_request} - return_callback_request: return: ${callback_request.http_request} ``` The above workflow definition creates a callback endpoint. The URL of the callback endpoint is returned by that first step. Then the workflow is waiting for the callback endpoint to be called externally. The execution then resumes and logs some info about the incoming call and returns. I deployed that workflow with a service account that has the Workflows Editor role, the Log Writer role (to log information), and the Service Account Token Creator role (to create OAuth2 tokens), as [explained](https://cloud.google.com/workflows/docs/creating-callback-endpoints#oauth-token) in the documentation. Now let's look at the Go service. I did a go mod init to create a new project. I created a main.go source file with the following content: ```go package main import ( metadata "cloud.google.com/go/compute/metadata" "encoding/json" "fmt" "log" "net/http" "os" "strings" ) ``` The metadata module is used to fetch an OAuth2 token from the [Cloud Run metadata server](https://cloud.google.com/run/docs/container-contract#metadata-server). ```go // OAuth2 JSON struct type OAuth2TokenInfo struct { // defining struct variables Token      string `json:"access_token"` TokenType  string `json:"token_type"` Expiration uint32 `json:"expires_in"` } ``` The metadata information in `instance/service-accounts/default/token` returns a JSON document that we map with the above struct. We're interested in the access_token field, that we use further down to make the authenticated call to the Workflows callback endpoint. ```go func main() { log.Print("Starting server...") http.HandleFunc("/", handler) // Determine port for HTTP service. port := os.Getenv("PORT") if port == "" { port = "8080" log.Printf("Defaulting to port %s", port) } // Start HTTP server. log.Printf("Listening on port %s", port) if err := http.ListenAndServe(":"+port, nil); err != nil { log.Fatal(err) } } ``` The `main()` function starts our Go service. 
Let's now see the `handler()` function in more detail: ```go func handler(w http.ResponseWriter, r *http.Request) { callbackUrl := r.URL.Query().Get("callback_url") log.Printf("Callback URL: %s", callbackUrl) ``` We retrieve the `?callback_url` query parameter that will contain our callback endpoint URL. ```go // Fetch an OAuth2 access token from the metadata server oauthToken, errAuth := metadata.Get("instance/service-accounts/default/token") if errAuth != nil { log.Fatal(errAuth) } ``` Above, we make a call to the metadata server thanks to the metadata Go module. And then we unmarshall the returned JSON document in our previously defined struct, with the following code: ```go data := OAuth2TokenInfo{} errJson := json.Unmarshal([]byte(oauthToken), &data) if errJson != nil { fmt.Println(errJson.Error()) } log.Printf("OAuth2 token: %s", data.Token) ``` Now it's time to prepare the call to our workflow callback endpoint, with a POST request: ```go workflowReq, errWorkflowReq := http.NewRequest("POST", callbackUrl, strings.NewReader("{}")) if errWorkflowReq != nil { fmt.Println(errWorkflowReq.Error()) } ``` We add the OAuth2 token as a bearer authorization via headers: ```go workflowReq.Header.Add("authorization", "Bearer "+data.Token) workflowReq.Header.Add("accept", "application/json") workflowReq.Header.Add("content-type", "application/json") client := &http.Client{} workflowResp, workflowErr := client.Do(workflowReq) if workflowErr != nil { fmt.Printf("Error making callback request: %s\n", workflowErr) } log.Printf("Status code: %d", workflowResp.StatusCode) fmt.Fprintf(w, "Workflow callback called. Status code: %d", workflowResp.StatusCode)} ``` We simply return the status code at the end of our Go service. To deploy the Go service, I simply used the source deployment approach, by running gcloud run deploy, and answering some questions (service name, region deployment, etc.) After a couple of minutes, the service is up and running. I create a new execution of the workflow from the Google Cloud console. Once it's started, it logs the callback endpoint URL. I copy its value, then I'm calling my Cloud Run service with the `?callback_url=` query string pointing at that URL. And voilà, the service resumes the execution of the workflow, and the workflow finishes. # Monitoring Website Changes with Workflows Cloud Functions and Sendgrid Every year in France, around mid-September, there's a special weekend where everyone can visit some famous places, usually closed the rest of the year. That's "[Journée du Patrimoine](https://journeesdupatrimoine.culture.gouv.fr/)". For example, you can visit places like the [Elysée Palace](https://www.elysee.fr/) or the [Matignon Palace](https://www.gouvernement.fr/le-patrimoine-de-l-hotel-de-matignon), if you want to see where the French president, or the French prime minister work. However, for some of those places, it's tricky to register online to book a slot, as there's always a high demand for them. Furthermore, you have to be there at the right time to register, and often, you don't even know when that day or time is! So I thought I could monitor the website of the Elysée Palace to see when the registration would open, by tracking changes on the Elysée website. To monitor web page or website changes, there are a ton of online services available. There are often some limitations to the number of free requests, or to the frequency of the change checks. 
Being a developer on Google Cloud, I decided to write a simple solution that would take advantage of various Google Cloud services, namely: - [Workflows](https://cloud.google.com/workflows): to define the various steps of my site change workflow, - [Cloud Scheduler](https://cloud.google.com/scheduler): to execute my workflow on a regular basis, - [Cloud Functions](https://cloud.google.com/functions): to compute a hash of the webpage, to see if the page changed, - [Cloud Storage](https://cloud.google.com/storage): to store the hashes, - [SendGrid](https://sendgrid.com/) (not a Google Cloud product): to send me an email when changes have appeared, - [Secret Manager](https://cloud.google.com/secret-manager): to store my SendGrid API key securely. Let's have a look first at the function that computes the hash of a webpage. As there's no hash function in the Workflows standard library, I decided to use a function to do that job. I used the Node.js runtime, with the crypto module, which contains a sha1 implementation: ```javascript const crypto = require('crypto'); exports.checksum = (req, res) => { const webpageBody = req.body.webpage; const shasum = crypto.createHash('sha1'); shasum.update(webpageBody); const sha1 = shasum.digest('hex'); res.status(200).send({sha1: sha1}); }; ``` The function receives the web page content from the workflow. Then I create the sha1 hash with that content, and return it in hexadecimal form, in a small JSON payload. I created a Google Cloud Storage bucket to contain my web page hashes: ![](/img/patrimoine/patrimoine-gcs.png) Since I'm using SendGrid to notify me by email on changes, I store the API key securely in Secret Manager: ![](/img/patrimoine/patrimoine-secret.png) Now let's zoom in on our workflow, piece by piece. First, I define some variables, like the name of my bucket and the name of my hashes text file, and I retrieve my SendGrid API key (see this previous [article about using Secret Manager with Workflows]({{< ref "/posts/2022/02/04/using-the-secret-manager-connector-for-workflows-to-call-an-authenticated-service.md" >}})): ```yaml main: params: [input] steps: - assignment: assign: - bucket: hash_results - file_name: hash.txt - get_email_api_key: call: googleapis.secretmanager.v1.projects.secrets.versions.accessString args: secret_id: SENDGRID_API_KEY result: EMAIL_API_KEY ``` Then I read the content of the previous hash in GCS (you can also check this article on how to [read and write JSON data to a file in a bucket from a workflow]({{< ref "/posts/2022/01/21/reading-in-and-writing-a-json-file-to-a-storage-bucket-from-a-workflow.md" >}})): ```yaml     - read_hash_from_gcs:         call: http.get         args:             url: ${"https://storage.googleapis.com/download/storage/v1/b/" + bucket + "/o/" + file_name}             auth:                 type: OAuth2             query:                 alt: media         result: hash_from_gcs ``` It's time to make a simple HTTP GET call to the website. Currently, the URL is hard-coded, but we could parameterize the workflow to get that URL from the workflow execution input parameters instead.
```yaml - retrieve_web_page: call: http.get args: url: https://evenements.elysee.fr/ result: web_page ``` Once I retrieved the content of the URL (the result of that request is stored in the `web_page` variable), I can compute my hash by calling my cloud function: ```yaml - compute_hash: call: http.post args: url: https://europe-west1-patrimoine-check.cloudfunctions.net/checksum body: webpage: ${web_page.body} result: hash_result ``` That's where we introduce some branching in the workflow. If the web page hasn't changed, we finish early, but if it has changed, then we're going to store the new hash in GCS: ```yaml     - assign_hashes:         assign:             - old_hash: ${hash_from_gcs.body.sha1}             - new_hash: ${hash_result.body.sha1}             - hash_msg: ${"Old hash = " + old_hash + " / New hash = " + new_hash}     - conditionalSwitch:         switch:         - condition: ${new_hash != old_hash}           next: write_hash_to_gcs         next: returnOutput     - write_hash_to_gcs:         call: http.post         args:             url: ${"https://storage.googleapis.com/upload/storage/v1/b/" + bucket + "/o"}             auth:                 type: OAuth2             query:                 name: ${file_name}             body:                 sha1: ${hash_result.body.sha1} ``` I log the fact that the website has changed, and I call the SendGrid API (like in this article on using SendGrid for sending emails from Workflows): ```yaml - site_changed_log: call: sys.log args: text: Website has changed - notify_by_email: call: http.post args: url: https://api.sendgrid.com/v3/mail/send headers: Content-Type: "application/json" Authorization: ${"Bearer " + EMAIL_API_KEY} body: personalizations: - to: - email: me@gmail.com from: email: you@gmail.com subject: "Elysée, page mise à jour" content: - type: text/plain value: "La page de l'Élysée a été mise à jour" - log_hashes: call: sys.log args: text: ${hash_msg} - returnOutput: return: ${hash_msg} ``` The workflow needs to be invoked at regular intervals. Workflows can be configured to be invoked on a schedule via Cloud Scheduler (again, check this article on scheduling workflow executions). I configured my workflow to be triggered every minute, with the `* * * * *` cron pattern. ![](/img/patrimoine/patrimoine-schedule.png) And voilà! I have my little workflow being invoked every minute to check if a web page has changed, and it sends me an email if so! To be honest with you, the workflow worked perfectly... but the true story is that I wasn't monitoring the right URL: I should have monitored the front page instead. Furthermore, the page I was monitoring included some dynamic JavaScript code, but the HTML fetched wasn't really changing. I missed the registration window, and all the slots filled up super rapidly before I even had the time to register my family for a visit! Shame on me; better check my URL next time, or create webpage screenshots with a headless Chrome running in Cloud Run or in Cloud Functions! Or, of course, use online services that have solved those problems with their years of experience! Hopefully, next year, I won't miss the registration! But it was fun to glue together all those useful services from Google Cloud to solve a concrete problem.
# Some beans and gems, some snakes and elephants, with Java 17, Ruby 3, Python 3.10, and PHP 8.1 in App Engine and Cloud Functions Time to spill the beans and show the gems, to our friendly snakes and elephants: we've got some great news for Java, Ruby, Python and PHP serverless developers today. Google App Engine and Cloud Functions are adding new modern runtimes, allowing you to update to the major version release trains of those programming languages. In short, here's what's new: - Access to App Engine legacy bundled services for the Java 11/17, Python 3 and Go 1.12+ runtimes is **Generally Available** - The Java 17, Ruby 3.0, Python 3.10, and PHP 8.1 runtimes come into preview in App Engine and Cloud Functions Let's have a closer look. First of all, access to App Engine legacy bundled services for second generation runtimes for [Java](https://cloud.google.com/appengine/docs/standard/java-gen2/services/access), [Python](https://cloud.google.com/appengine/docs/standard/python3/services/access) and [Go](https://cloud.google.com/appengine/docs/standard/go/services/access) is now Generally Available. In the past, for example for the Java platform, only Java 8 (a first generation runtime) could access the [built-in APIs](https://cloud.google.com/appengine/docs/standard/java-gen2/reference/services/bundled) like Memcache, Images, Mail, or Task Queues. Now, if you use the Java 11 runtime (a second generation runtime), you can access those services as well as all the [Google Cloud APIs](https://cloud.google.com/java/docs/reference). For example, you can now store transient cached data in Memcache, or send an email to users of your application, in a second generation runtime. Same thing for Python and Go developers: you can take advantage of the bundled services as well. If you're still using an old runtime version, this will further ease the transition to newer versions. Be sure to check it out and upgrade. Next, let's continue with a fresh bean and a shiny gem, mixed in with some friendly animals, with the **preview of the Java 17, Ruby 3.0, Python 3.10 and PHP 8.1 runtimes for both App Engine and Cloud Functions**. What about having a look at what's new in those language versions? ## Java Between the two Long-Term-Support versions of Java 11 and 17, a lot of new features have landed. Java developers can now write text blocks for strings spanning several lines, without having to concatenate multiple strings manually. The switch construct has evolved to become an expression, which lets you break away from the `break` keyword, and paves the way for more advanced pattern matching capabilities. Speaking of which, the `instanceof` keyword is indeed offering some pattern matching evolution, to avoid obvious but useless casts. Records allow you to create more streamlined immutable data classes, rather than writing your own Java beans for that purpose with proper `hashCode()`, `equals()` or `toString()` methods. And sealed classes give you more control over your class hierarchy, by restricting which classes can extend your classes. ## Ruby With Ruby 3.0, the big highlights were performance, static type checking, and concurrency. The goal of making Ruby 3.0 three times faster than Ruby 2.0 on some benchmarks was reached, making your code run more swiftly. Also, Ruby programs can be annotated with some typing information, which allows type checkers to take advantage of those types to provide static type checking, improving the quality of your code.
For concurrency and parallelism, a new actor-inspired concurrency primitive called Ractor helps taming multiple cores in parallel, for your demanding workloads. And a fiber scheduler is introduced for intercepting blocking operations. And beyond those headlines, many improvements to various Ruby APIs have also landed. ## Python In Python 3.10, the parser gives better and clearer error messages for syntax errors (with more accurate error location), also for indentation, attribute, and name errors, which greatly help developers to find the problems in their code. Structural pattern matching lands with a new `match` and `case` statement construct. Further PEP improvements are tackling more robust type hints for static type checkers. Parenthesized context managers have been added to make the code prettier when spanning a long collection of context managers across several lines.  ## PHP With version 8.1, PHP gets a pretty major update. First, let's start with a new `enum` syntax construct instead of creating constants in classes, and you get validation out of the box. Classes now have the ability to define final class constants. The new `readonly` properties can't be changed after initialization, which is great for value objects and DTOs. A first class callable syntax is introduced, allowing you to get a reference to any function, with a short syntax. Developers will also find further improvements to initializers, that make it possible to even have nested attributes, using objects as default parameter values, static variables, and global constants. One last nice addition we can mention is the introduction of fibers to implement lightweight cooperative concurrency. ## Your turn Gems, beans, elephants, snakes: there's something great in those new language versions for every developer. Thus, with these new runtimes in Preview, Java, Ruby, Python and PHP developers can update or develop new App Engine apps and Cloud Functions using the latest and greatest versions of their favorite languages. Be sure to check out the documentation for App Engine ([Java](https://cloud.google.com/appengine/docs/standard/java-gen2/runtime), [Ruby](https://cloud.google.com/appengine/docs/standard/ruby/runtime), [Python](https://cloud.google.com/appengine/docs/standard/python3/runtime), [PHP](https://cloud.google.com/appengine/docs/standard/php7/runtime)) and Cloud Functions ([Java](https://cloud.google.com/functions/docs/concepts/java-runtime), [Ruby](https://cloud.google.com/functions/docs/concepts/ruby-runtime), [Python](https://cloud.google.com/functions/docs/concepts/python-runtime), [PHP](https://cloud.google.com/functions/docs/concepts/php-runtime)). We're looking forward to hearing from you about how you'll take advantage of those new language runtimes. [](https://cloud.google.com/blog/products/serverless/introducing-the-next-generation-of-cloud-functions) # Schedule a Workflow Execution There are [different ways to launch the execution](https://cloud.google.com/workflows/docs/executing-workflow) of a workflow. In previous articles, we mentioned that you can [use the gcloud command-line](https://cloud.google.com/workflows/docs/quickstart-gcloud) tool to create an execution, you can also use the various [client libraries](https://cloud.google.com/workflows/docs/quickstart-client-libraries) to invoke Workflows, or use the [REST API](https://cloud.google.com/workflows/docs/reference/executions/rest). A workflow itself can also invoke other workflows! 
But today, I'd like to tell you how to schedule the execution of a workflow. For that purpose, we'll take advantage of [Cloud Scheduler](https://cloud.google.com/scheduler). The [documentation](https://cloud.google.com/workflows/docs/schedule-workflow) actually covers this topic in detail, so be sure to grab all the info there. However, I'll go quickly through the steps, and tell you about a nice new feature in the cloud console to ease the scheduling of workflows!

First, you need to have both Workflows and Cloud Scheduler enabled:

```bash
gcloud services enable cloudscheduler.googleapis.com workflows.googleapis.com
```

Cloud Scheduler will need a service account with the `workflows.invoker` role, to be allowed to call Workflows (note that service account IDs only allow lowercase letters, digits and hyphens):

```bash
gcloud iam service-accounts create workflows-caller-sa

gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member serviceAccount:workflows-caller-sa@MY_PROJECT_ID.iam.gserviceaccount.com \
  --role roles/workflows.invoker
```

Now it's time to create the cron job:

```bash
gcloud scheduler jobs create http every_5_minute_schedule \
    --schedule="*/5 * * * *" \
    --uri="https://workflowexecutions.googleapis.com/v1/projects/MY_PROJECT_ID/locations/REGION_NAME/workflows/WORKFLOW_NAME/executions" \
    --message-body="{\"argument\": \"DOUBLE_ESCAPED_JSON_STRING\"}" \
    --time-zone="America/New_York" \
    --oauth-service-account-email="workflows-caller-sa@MY_PROJECT_ID.iam.gserviceaccount.com"
```

Here, you can see that Scheduler will trigger every 5 minutes (using the cron notation), and that it's going to call the Workflows REST API to create a new execution. You can also pass an argument for the workflow input.

The cool new feature I was eager to mention today is the direct integration of scheduling into the Workflows creation flow, in the cloud console. Now, when you create a new workflow, you can select a trigger:

![](/img/schedule-workflow/scheduler-trigger-1-600.jpg)

Click on the `ADD NEW TRIGGER` button, and select `Scheduler`. A side panel on the right will show up, and you will be able to specify the schedule to create, directly integrated, instead of having to head over to the Cloud Scheduler product section:

![](/img/schedule-workflow/scheduler-trigger-2-600.png)

And there, you can specify the various details of the schedule! It's nice to see both products nicely integrated, to ease the flow of creating a scheduled workflow.
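One more detail about the `gcloud scheduler` command above: the JSON you pass in `--message-body` under the `argument` key becomes the workflow's execution argument. Here's a minimal sketch of a workflow reading it (the `message` key is just an assumption for illustration):

```yaml
# Hypothetical workflow triggered by the scheduler job above; the JSON
# passed as "argument" is available through the "input" parameter
main:
    params: [input]
    steps:
    - log_trigger:
        call: sys.log
        args:
            text: ${"Triggered by Cloud Scheduler with message: " + input.message}
    - finish:
        return: ${input}
```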
# Using the Secret Manager connector for Workflows to call an authenticated service

Workflows allows you to call APIs, whether they're hosted on Google Cloud or anywhere else out in the wild. A few days ago, for example, we saw [how to use the SendGrid API to send emails from a workflow]({{< ref "/posts/2022/02/01/sending-an-email-with-sendgrid-from-workflows.md" >}}). However, in that article, I had the API key hard-coded into my workflow, which is a bad practice. Instead, we can store secrets in [Secret Manager](https://cloud.google.com/secret-manager). Workflows has a specific [connector for Secret Manager](https://cloud.google.com/workflows/docs/reference/googleapis/secretmanager/Overview), and a useful method to access secrets.

In this article, we'll learn two things:

* How to access secrets stored in Secret Manager with the Workflows connector
* How to call an API that requires basic authentication

Let's start by accessing the two secrets needed for the basic auth call to my API:

```yaml
- get_secret_user:
    call: googleapis.secretmanager.v1.projects.secrets.versions.accessString
    args:
      secret_id: basicAuthUser
    result: secret_user
- get_secret_password:
    call: googleapis.secretmanager.v1.projects.secrets.versions.accessString
    args:
      secret_id: basicAuthPassword
    result: secret_password
```

The user login and password are now stored in variables that I can reuse in my workflow. Next, I create the Base64-encoded `user:password` string required by the authorization header:

```yaml
- assign_user_password:
    assign:
    - encodedUserPassword: ${base64.encode(text.encode(secret_user + ":" + secret_password))}
```

Equipped with my encoded `user:password` string, I can now call my API (here, a cloud function) by adding an authorization header with basic authentication (and return the output of the function):

```yaml
- call_function:
    call: http.get
    args:
        url: https://europe-west1-workflows-days.cloudfunctions.net/basicAuthFn
        headers:
            Authorization: ${"Basic " + encodedUserPassword}
    result: fn_output
- return_result:
    return: ${fn_output.body}
```

Workflows has built-in [OAuth2 and OIDC support for authenticating to Google hosted APIs, functions and Cloud Run services](https://cloud.google.com/workflows/docs/authentication#making_authenticated_requests), but it's also useful to know how to invoke other authenticated services, like those requiring basic auth, or other bearer tokens.
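The same pattern works for bearer tokens, for example. Here's a minimal sketch of it (the secret name and the target URL are placeholders I made up, not from the original article):

```yaml
# Hypothetical example: fetch an API token from Secret Manager and pass
# it as a bearer token (secret name and URL are placeholders)
- get_api_token:
    call: googleapis.secretmanager.v1.projects.secrets.versions.accessString
    args:
      secret_id: myApiToken
    result: api_token
- call_api_with_bearer:
    call: http.get
    args:
        url: https://example.com/protected/resource
        headers:
            Authorization: ${"Bearer " + api_token}
    result: api_output
```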
# Upload and use JSON data in your workflow from GCS

Following up the [article]({{< ref "/posts/2022/01/21/reading-in-and-writing-a-json-file-to-a-storage-bucket-from-a-workflow.md" >}}) on writing and reading JSON files in cloud storage buckets (GCS), we saw that we could access the data of the JSON file, and use it in our workflow. Let's have a look at a concrete use of this. Today, we'll take advantage of this mechanism to avoid hard-coding the URLs of the APIs we call from our workflow. That way, it makes the workflow more portable across environments.

Let's regroup the logic for reading and loading the JSON data in a reusable subworkflow:

```yaml
read_env_from_gcs:
    params: [bucket, object]
    steps:
    - read_from_gcs:
        call: http.get
        args:
            url: ${"https://storage.googleapis.com/download/storage/v1/b/" + bucket + "/o/" + object}
            auth:
                type: OAuth2
            query:
                alt: media
        result: env_file_json_content
    - return_content:
        return: ${env_file_json_content.body}
```

You call this subworkflow with two parameters: the bucket name, and the object or file name that you want to load. Now let's use it from the main workflow. We need a first step to call the subworkflow to load a specific file from a specific bucket. The step below will return the content of the JSON data in the `env_details` variable.

```yaml
main:
    params: [input]
    steps:
    - load_env_details:
        call: read_env_from_gcs
        args:
            bucket: workflow_environment_info
            object: env-info.json
        result: env_details
```

Imagine the JSON file contains a JSON object with a `SERVICE_URL` key, pointing at the URL of a service; then you can call the service with the `${env_details.SERVICE_URL}` expression, as shown below.

```yaml
    - call_service:
        call: http.get
        args:
            url: ${env_details.SERVICE_URL}
        result: service_result
    - return_result:
        return: ${service_result.body}
```

This is great for avoiding hardcoding certain values in your workflow definitions. However, for true environment-specific deployments, this is not yet ideal, as you would have to point to a different file in the bucket, or use a different bucket. And that information is currently hardcoded in the definition when you make the call to the subworkflow. But if you follow some naming conventions for the project and bucket names that map to environments (i.e. `PROD_bucket` vs `DEV_bucket`, or `PROD-env-info.json` vs `DEV-env-info.json`), this can work, as sketched below. Let's wait for the support of environment variables in Workflows!
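Here's a minimal sketch of that naming-convention idea (the `env` input key and the file naming scheme are my own assumptions, not part of the original example), reusing the `read_env_from_gcs` subworkflow defined above:

```yaml
# Hypothetical variation: pick the environment file from an "env" argument
# passed at execution time, e.g. {"env": "PROD"} or {"env": "DEV"}
main:
    params: [input]
    steps:
    - load_env_details:
        call: read_env_from_gcs
        args:
            bucket: workflow_environment_info
            object: ${input.env + "-env-info.json"}
        result: env_details
    - call_service:
        call: http.get
        args:
            url: ${env_details.SERVICE_URL}
        result: service_result
    - return_result:
        return: ${service_result.body}
```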
# Sending an email with SendGrid from Workflows

For notification purposes, especially in an asynchronous way, email is a great solution. I wanted to add an email notification step in [Google Cloud Workflows](https://cloud.google.com/workflows). Since GCP doesn't have an email service, I looked at the various email services available in the cloud: SendGrid, Mailgun, Mailjet, and even ran a quick Twitter [poll](https://twitter.com/glaforge/status/1488444661211533312) to see what folks in the wild are using. I experimented with SendGrid, and the [sign up](https://signup.sendgrid.com/) process was pretty straightforward, as I was able to [get started](https://docs.sendgrid.com/for-developers/sending-email/api-getting-started) quickly, by creating an API key and sending my first email with a cURL command.

Now comes the part where I needed to call that API from my workflow definition. And that's actually pretty straightforward as well. Let's see that in action:

```yaml
- retrieve_api_key:
    assign:
        - SENDGRID_API_KEY: "MY_API_KEY"
- send_email:
    call: http.post
    args:
        url: https://api.sendgrid.com/v3/mail/send
        headers:
            Content-Type: "application/json"
            Authorization: ${"Bearer " + SENDGRID_API_KEY}
        body:
            personalizations:
                - to:
                    - email: to@example.com
            from:
                email: from@example.com
            subject: Sending an email from Workflows
            content:
                - type: text/plain
                  value: Here's the body of my email
    result: email_result
- return_result:
    return: ${email_result.body}
```

In the `retrieve_api_key` step, I simply hard-coded the SendGrid API key. However, you can of course store that secret in Secret Manager, and then fetch the secret key thanks to the Workflows Secret Manager connector (that's probably worth a dedicated article!). Then, in the `send_email` step, I prepare my HTTP POST request to the SendGrid API endpoint. I specify the content type, and of course, the authorization using the SendGrid API key. Next, I prepare the body of that request, describing my email, with a `from` field with a registered email user that I defined in SendGrid, a `to` field corresponding to the recipient, an email `subject` and `body` (just plain text, here). And that's pretty much it! I just translated the JSON body sent in the [cURL example](https://app.sendgrid.com/guide/integrate/langs/curl) from SendGrid's documentation into YAML (using a handy JSON to YAML conversion [utility](https://www.json2yaml.com/)).
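Speaking of that dedicated article, here's a minimal sketch of what replacing the hard-coded key with the Secret Manager connector could look like (assuming the API key was stored in a secret; the `SENDGRID_API_KEY` secret name is my own assumption):

```yaml
# Hypothetical replacement for the retrieve_api_key step above: fetch the
# SendGrid API key from Secret Manager instead of hard-coding it
- retrieve_api_key:
    call: googleapis.secretmanager.v1.projects.secrets.versions.accessString
    args:
        secret_id: SENDGRID_API_KEY
    result: SENDGRID_API_KEY
```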
# Smarter Applications With Document AI, Workflows and Cloud Functions

At enterprises across industries, documents are at the center of core business processes. Documents store a treasure trove of valuable information, whether it's a company's invoices, HR documents, tax forms and much more. However, the unstructured nature of documents makes them difficult to work with as a data source. We call this ["dark data"](https://www.gartner.com/en/information-technology/glossary/dark-data): unstructured data that businesses collect, process and store, but do not utilize for purposes such as analytics, monetization, etc. These documents, in PDF or image formats, often trigger complex processes that have historically relied on fragmented technology and manual steps. With compute solutions on Google Cloud and [Document AI](https://cloud.google.com/document-ai), you can create seamless integrations and easy to use applications for your users. Document AI is a platform and a family of solutions that help businesses transform documents into structured data, backed by machine learning.

In this blog post, we'll walk you through how to use serverless technology to process documents with [Cloud Functions](https://cloud.google.com/functions), and how to orchestrate the microservices, API calls, and functions of a business process, thanks to [Workflows](https://cloud.google.com/workflows). At Cloud Next 2021, we presented how to build easy [AI-powered applications with Google Cloud](https://cloud.withgoogle.com/next/catalog?session=DEV202#application-development). We introduced a sample application for handling incoming expense reports, analyzing expense receipts with [Procurement Document AI](https://cloud.google.com/solutions/procurement-doc-ai), a DocAI solution for automating procurement data capture from forms including invoices, utility statements and more. We then organized the logic of the report approval process with Workflows, and used Cloud Functions as glue to invoke the workflow and to analyze the parsed documents.

![/img/smart-expense/smart-expenses-screens.max-1700x1700.png](/img/smart-expense/smart-expenses-screens.max-1700x1700.png)

We also open sourced the code in this [Github repository](https://github.com/GoogleCloudPlatform/smart-expenses), if you're interested in learning more about this application.

![/img/smart-expense/architecture-diagram.max-1500x1500.png](/img/smart-expense/architecture-diagram.max-1500x1500.png)

In the above diagram, there are two user journeys: the employee submitting an expense report where multiple receipts are processed at once, and the manager validating or rejecting the expense report. First, the employee goes to the website, powered by [Vue.js](https://vuejs.org/) for the frontend progressive JavaScript framework and [Shoelace](https://shoelace.style/) for the library of web components. The website is hosted via [Firebase Hosting](https://firebase.google.com/docs/hosting). The frontend invokes an [HTTP function](https://cloud.google.com/functions/docs/writing/http) that triggers the execution of our business [workflow](https://github.com/GoogleCloudPlatform/smart-expenses/blob/main/workflow.yaml), defined using the Workflows YAML syntax. Workflows is able to handle long-running operations without any additional code required: in our case, we are asynchronously processing a set of receipt files. Here, the Document AI [connector](https://cloud.google.com/blog/topics/developers-practitioners/better-service-orchestration-workflows) directly calls the batch processing endpoint of the service. This API returns a long-running operation: if you poll the API, the operation state will be "RUNNING" until it has reached a "SUCCEEDED" or "FAILED" state. You would have to wait for its completion.
However, Workflows' [connectors](https://cloud.google.com/workflows/docs/connectors) handle such long-running operations, without you having to poll the API multiple times till the state changes. Here's how we call the batch processing operation of the [Document AI connector](https://cloud.google.com/workflows/docs/reference/googleapis/documentai/Overview): ```yaml - invoke_document_ai: call: googleapis.documentai.v1.projects.locations.processors.batchProcess args: name: ${"projects/" + project + "/locations/eu/processors/" + processorId} location: "eu" body: inputDocuments: gcsPrefix: gcsUriPrefix: ${bucket_input + report_id} documentOutputConfig: gcsOutputConfig: gcsUri: ${bucket_output + report_id} skipHumanReview: true result: document_ai_response ``` Machine learning uses state of the art Vision and Natural Language Processing models to intelligently extract schematized data from documents with Document AI. As a developer, you don't have to figure out how to fine tune or reframe the receipt pictures, or how to find the relevant field and information in the receipt. It's Document AI's job to help you here: it will return a JSON document whose fields are: `line_item`, `currency`, `supplier_name`, `total_amount`, etc. Document AI is capable of understanding standardized papers and forms, including invoices, lending documents, pay slips, driver licenses, and more. A cloud function retrieves all the relevant fields of the receipts, and makes its own tallies, before submitting the expense report for approval to the manager. Another useful feature of Workflows is put to good use: Callbacks, that we [introduced](https://cloud.google.com/blog/topics/developers-practitioners/introducing-workflows-callbacks) last year. In the workflow definition we create a callback endpoint, and the workflow execution will wait for the callback to be called to continue its flow, thanks to those two instructions: ```yaml - create_callback: call: events.create_callback_endpoint args: http_callback_method: "POST" result: callback_details ... - await_callback: try: call: events.await_callback args: callback: ${callback_details} timeout: 3600 result: callback_request except: as: e steps: - update_status_to_error: ... ``` In this example application, we combined the intelligent capabilities of Document AI to transform complex image documents into usable structured data, with Cloud Functions for data transformation, process triggering, and callback handling logic, and Workflows enabled us to orchestrate the underlying business process and its service call logic. ## Going further  If you're looking to make sense of your documents, turning dark data into structured information, be sure to check out what [Document AI](https://cloud.google.com/document-ai) offers. You can also get your hands on a [codelab](https://codelabs.developers.google.com/codelabs/docai-form-parser-v3-node#0) to get started quickly, in which you'll get a chance at processing handwritten forms. If you want to explore [Workflows](https://cloud.google.com/workflows), [quickstarts](https://cloud.google.com/workflows/docs/quickstarts) are available to guide you through your first steps, and likewise, another [codelab](https://codelabs.developers.google.com/codelabs/cloud-workflows-intro?hl=en#0) explores the basics of Workflows. As mentioned earlier, for a concrete example, the source code of our [smart expense application](https://github.com/GoogleCloudPlatform/smart-expenses) is available on Github. 
Don't hesitate to reach out to [@glaforge](https://twitter.com/glaforge) and [@asrivas_dev](https://twitter.com/asrivas_dev) to discuss smart, scalable apps with us.

# Open sourcing the App Engine Standard Java Runtime

One year after Google App Engine was [released](http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html) in 2008, Java became the [second language runtime](http://googleappengine.blogspot.com/2009/04/seriously-this-time-new-language-on-app.html) available on the platform. Java developers were able to deploy and scale their servlet-based web applications easily, without worrying about infrastructure management. Not only did Java run there, but alternative JVM languages like [Apache Groovy](https://spring.io/blog/2009/04/08/write-your-google-app-engine-applications-in-groovy) and [Kotlin](https://kotlinlang.org/) were also part of the game. Fast forward to today: we're pleased to announce that the Java Runtime for App Engine is now available as open source, in the [GoogleCloudPlatform/appengine-java-standard](https://github.com/GoogleCloudPlatform/appengine-java-standard) repository on Github.

## What's available?

The `appengine-java-standard` repository contains the Java source code for the Java standard environment, the production runtime, the APIs, and the local SDK. Zooming in on the various directories of the project, you'll find the [APIs](https://github.com/GoogleCloudPlatform/appengine-java-standard/tree/main/api/src/main/java/com/google/appengine/api) from the `com.google.appengine.api` package for using and accessing App Engine services like Datastore to store your data, Blobstore to save your binary blobs, Taskqueue to enqueue computing tasks, Memcache to cache expensive results you don't want to recompute, or Urlfetch to call external services.

The great developer experience of Google App Engine comes in part from the fact that you can run your application locally on your development machine, thanks to the local development environment. The aforementioned services all have a local [implementation](https://github.com/GoogleCloudPlatform/appengine-java-standard/tree/main/api_dev/src/main/java/com/google/appengine/api) enabling you to start your app on your machine, rather than having to deploy to the cloud each time you make a change. To use the services in the cloud, however, you can use Google's built-in [remote APIs](https://cloud.google.com/appengine/docs/standard/java/tools/remoteapi) ([code](https://github.com/GoogleCloudPlatform/appengine-java-standard/tree/main/remoteapi/src/main/java/com/google/appengine/tools/remoteapi)) that let you transparently access App Engine services from any Java application. For example, you can use the Remote API to access a production Datastore from an app running on your local machine. You can also use the Remote API to access the Datastore of one App Engine application from a different App Engine application.

In the [runtime folder](https://github.com/GoogleCloudPlatform/appengine-java-standard/tree/main/runtime/impl/src/main/java/com/google/apphosting), you'll be able to see how your App Engine app is started, with the underlying [Jetty](https://www.eclipse.org/jetty/) servlet container, and understand how the various components or services are configured on startup.
What's not open source today are the specific layers that tie App Engine to the underlying [Borg](https://research.google.com/pubs/pub43438.html?hl=es) cluster management system, internal to the Google infrastructure, as developers can't replicate the whole Google backend easily in their own environment. ## Why make it open source? It's important for our customers to not be locked in a particular tool, environment, or vendor, and give them the liberty to run their workloads elsewhere, and understand how we actually run their code. That's why Google Cloud follows an [Open Cloud](https://cloud.google.com/open-cloud) approach, and participates actively in the open source ecosystem. Our open cloud approach enables you to develop software faster, innovate more easily, and scale more efficiently---while also reducing technology risk. Through the years, we've open sourced various technologies, like [Kubernetes](https://kubernetes.io/), based on our experience of running containers at scale, or the [gVisor](https://gvisor.dev/) sandbox, an application kernel for containers that provides efficient defense-in-depth anywhere, and which was pioneered by the App Engine environment, as gVisor was its sandbox environment. By open sourcing the App Engine Java runtime, we're making the first step towards letting you run the whole App Engine environment wherever you want: on your local development environment, on-premise in your own data center, or potentially on our other computing platforms like [Cloud Run](https://cloud.google.com/run/). We're also making a key step towards easier transitions for future runtimes based on newer Long-Term Support versions of the Java programming language.  Finally, being able to compile, test and release the Java runtime environment for Java 8, Java 11 and soon Java 17 from an open source platform is much easier than relying on the huge internal Google mono repository system which is intended to support only one single version of everything, including a single JDK toolchain. We are releasing the binary artifacts in the [Maven central repository](https://repo1.maven.org/maven2/com/google/appengine/), including Javadocs and sources to find the relevant code regarding exceptions raised during runtime execution. ## Going further If you're not familiar with App Engine yet, be sure to check out our [online resources](https://cloud.google.com/appengine) about this powerful and scalable serverless application platform on our website. And get started with the tutorials on the [Java 8](https://cloud.google.com/appengine/docs/standard/java/runtime) or [Java 11 standard environment](https://cloud.google.com/appengine/docs/standard/java11/services/access). To learn more about this open source App Engine Java runtime, please visit our [repository](https://github.com/GoogleCloudPlatform/appengine-java-standard) on Github, to discover how the serverless secret sauce is made. And if you're feeling like helping and [contributing](https://github.com/GoogleCloudPlatform/appengine-java-standard/blob/main/CONTRIBUTING.md), we're looking forward to hearing from you, by filing new [tickets](https://github.com/GoogleCloudPlatform/appengine-java-standard/issues) or by preparing some [pull requests](https://github.com/GoogleCloudPlatform/appengine-java-standard/pulls). 
# Reading in and writing a JSON file to a storage bucket from a workflow

[Workflows](https://cloud.google.com/workflows) provides several [connectors](https://cloud.google.com/workflows/docs/reference/googleapis) for interacting with various Google Cloud APIs and services. In the past, I've used for example the [Document AI connector](https://cloud.google.com/workflows/docs/reference/googleapis/documentai/Overview) to parse documents like expense receipts, or the [Secret Manager connector](https://cloud.google.com/workflows/docs/reference/googleapis/secretmanager/Overview) to store and access secrets like passwords. Another useful connector I was interested in using today was the [Google Cloud Storage connector](https://cloud.google.com/workflows/docs/reference/googleapis/storage/Overview), to store and read files stored in storage buckets. Those connectors are auto-generated from their API discovery descriptors, but there are currently some limitations that prevent you, for example, from downloading the content of a file. So instead of using the connector, I looked at the JSON API for Cloud Storage to see what it offered (the [insert](https://cloud.google.com/workflows/docs/reference/googleapis/storage/v1/objects/insert) and [get](https://cloud.google.com/storage/docs/json_api/v1/objects/get) methods). What I wanted to do was to store a JSON document, and to read a JSON document. I haven't tried other media types yet, like pictures or other binary files.

Anyhow, here's how to write a JSON file into a cloud storage bucket:

```yaml
main:
    params: [input]
    steps:
    - assignment:
        assign:
            - bucket: YOUR_BUCKET_NAME_HERE
    - write_to_gcs:
        call: http.post
        args:
            url: ${"https://storage.googleapis.com/upload/storage/v1/b/" + bucket + "/o"}
            auth:
                type: OAuth2
            query:
                name: THE_FILE_NAME_HERE
            body:
                name: Guillaume
                age: 99
```

In the file, I'm storing a JSON document that contains a couple of keys, defined in the body of that call. By default, here, a JSON media type is assumed, so the body defined at the bottom in YAML is actually written as JSON in the resulting file. Oh, and of course, don't forget to change the names of the bucket and the object in the example above.

And now, here's how you can read the content of the file from the bucket:

```yaml
main:
    params: [input]
    steps:
    - assignment:
        assign:
            - bucket: YOUR_BUCKET_NAME_HERE
            - name: THE_FILE_NAME_HERE
    - read_from_gcs:
        call: http.get
        args:
            url: ${"https://storage.googleapis.com/download/storage/v1/b/" + bucket + "/o/" + name}
            auth:
                type: OAuth2
            query:
                alt: media
        result: data_json_content
    - return_content:
        return: ${data_json_content.body}
```

This time we change the GCS URL from `upload` to `download`, and we use the `alt=media` query parameter to instruct the GCS JSON API that we want to retrieve the content of the file (not just its metadata). In the end, we return the body of that call, which contains the content.
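As a small variation on the reading example above, you can also pass the bucket and file names as workflow arguments instead of hard-coding them. Here's a minimal sketch (the input keys are my own assumption, not from the original post):

```yaml
# Hypothetical variation: read the bucket and object names from the
# workflow's input, e.g. {"bucket": "my-bucket", "name": "data.json"}
main:
    params: [input]
    steps:
    - read_from_gcs:
        call: http.get
        args:
            url: ${"https://storage.googleapis.com/download/storage/v1/b/" + input.bucket + "/o/" + input.name}
            auth:
                type: OAuth2
            query:
                alt: media
        result: data_json_content
    - return_content:
        return: ${data_json_content.body}
```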
# How to get the project ID in a Java Cloud Function As I was working with my colleague [Sara Ford](https://cloud.google.com/developers/advocates/sara-ford) on testing the Cloud Functions runtimes for the upcoming "second generation" of the product, rebased on the [Cloud Run](https://cloud.run/) platform, I wrote a few simple functions for the Java runtime. In one of those Java functions, I wanted to use Google Cloud Storage, to download a file from a bucket. I took a look at the existing [sample](https://github.com/googleapis/google-cloud-java/blob/main/google-cloud-examples/src/main/java/com/google/cloud/examples/storage/objects/DownloadObject.java) to download an object: ```java Storage storage = StorageOptions.newBuilder()     .setProjectId(projectId)     .build()     .getService(); Blob blob = storage.get(BlobId.of(bucketName, objectName)); blob.downloadTo(Paths.get(destFilePath)); ``` I know the name of the bucket, the name of the file, I'm going to store the file in the local file system. So I have all the information needed... except the project ID within which I deployed my Java cloud function. So how do I get the project ID, in Java, inside the Cloud Functions environment? A previous iteration of Cloud Functions had various useful environment variables available, which included the project ID. So you could retrieve the ID with a `System.getenv()` call. However, for various compatibility reasons between the various runtimes, with the [Knative](https://knative.dev/docs/) open source project, that variable disappeared along the road. However, I know that the project ID is also part of the internal [compute metadata](https://cloud.google.com/appengine/docs/standard/java/accessing-instance-metadata) that is accessible via a special URL: With that knowledge in mind, I thought I could simply make a quick HTTP request to get that information: ```java private String getProjectId() { String projectId = null; HttpURLConnection conn = null; try { URL url = new URL("http://metadata.google.internal/computeMetadata/v1/project/project-id"); conn = (HttpURLConnection)(url.openConnection()); conn.setRequestProperty("Metadata-Flavor", "Google"); projectId = new String(conn.getInputStream().readAllBytes(), StandardCharsets.UTF_8); conn.disconnect(); } catch (Throwable t) {} return projectId; } ``` For the call to work, it is mandatory to set the `Metadata-Flavor` header that you see above. I used Java's built-in `HttpURLConnection` for the job. There are other HTTP libraries that could've made the code simpler, but at first, I didn't want to bring another HTTP client, just for retrieving a simple project meta-information. I'm one of the developers who designed the [Functions Framework for Java](https://github.com/GoogleCloudPlatform/functions-framework-java) that is used to craft cloud functions in Java, however, I've written quite a few functions using Node.js as well. And in the Node ecosystem, there's actually an NPM module whose responsibility is to retrieve such project metadata. With the [gcp-metadata](https://www.npmjs.com/package/gcp-metadata) module, you can require it and then fetch the project ID with: ```javascript const gcpMetadata = require('gcp-metadata'); const projectId = await gcpMetadata.project('project-id'); ``` I was surprised I couldn't easily find an equivalent library in Java. It took me a while to find it, but it actually exists too! That's the [com.google.cloud:google-cloud-core](https://googleapis.dev/java/google-cloud-core/latest/index.html) library! 
And it's trivial to use:

```java
import com.google.cloud.ServiceOptions;

String projectId = ServiceOptions.getDefaultProjectId();
```

An extra dependency in my pom.xml, one import and one static method call on `ServiceOptions`, and I can get the GCP project ID! So I'm now able to pass the project ID to my `StorageOptions` builder.

But then I recalled that in some other projects I had written, I didn't really need that project ID information, as the libraries I was using were smart enough to infer it from the environment. Let's look again at the `StorageOptions` from the beginning. What if I simply omit the `setProjectId()` method call? Lo and behold... indeed, it was actually not required, and the project ID was inferred, transparently. So I didn't really need to search for how to retrieve this project ID at all! And actually, you can further simplify the creation of the `StorageOptions` down to:

```java
Storage storage = StorageOptions
    .getDefaultInstance()
    .getService();
```

At least, now, I know how to retrieve the project ID in Java, in case the libraries or the environment don't provide such details on their own!

# Introducing Workflows callbacks

With [Workflows](http://cloud.google.com/workflows), developers can easily orchestrate various services together, on Google Cloud or third-party APIs. Workflows [connectors](https://cloud.google.com/blog/topics/developers-practitioners/introducing-new-connectors-workflows) handle long-running operations of Google Cloud services till completion. And workflow executions can also wait for time to pass with the built-in [`sys.sleep`](https://cloud.google.com/workflows/docs/reference/stdlib/sys/sleep) function, till some computation finishes, or some event takes place.

But what if you need some user input or some approval in the middle of the workflow execution, like validating an automatic text translation? Or what if an external system, like a fulfillment center or an inventory system, needs to notify you that products are back in stock? Instead of using a combination of "sleep" instructions and API polling, now you'll be able to use Workflows callbacks! With callbacks, the execution of a workflow can wait until it receives a call to a specific callback endpoint.

## Case study: human validation of automated translation

Let's have a look at a concrete example! Machine learning based translations have reached an incredible level of quality, but sometimes, you want a human being to validate the translations produced. Thanks to Workflows callbacks, we can add a human, or an autonomous system, into the loop. To illustrate this case study, the following diagram shows a possible implementation of the whole process:

![/img/workflows-days/architecture-translation.max-1200x1200.png](/img/workflows-days/architecture-translation.max-1200x1200.png)

1. First, the user visits a translation web page. They fill a textarea with the text they want to translate, and click on the translate button.
2. Clicking on the button will call a Cloud Function that will launch an execution of the workflow. The text to translate is passed as a parameter of the function, and as a parameter of the workflow too.
3. The text is saved in Cloud Firestore, and the Translation API is called with the input text; it returns the translation, which is stored in Firestore as well. The translation appears on the web page in real-time thanks to the Firebase SDK.
4.
A step in the workflow creates a callback endpoint (also saved in Firestore), so that it can be called to validate or reject the automatic translation. When the callback endpoint is saved in Firestore, the web page displays validation and rejection buttons. 5. The workflow now explicitly awaits the callback endpoint to be called. This pauses the workflow execution. 6. The user decides to either validate or reject the translation. When one of the two buttons is clicked, a Cloud Function is called, with the approval status as parameter, which will in turn call the callback endpoint created by the workflow, also passing the approval status. The workflow resumes its execution, and saves the approval in Firestore. And this is the end of our workflow. ## Creating a callback and awaiting incoming calls Two new built-in functions are introduced in the standard Workflows library: - [`events.create_callback_endpoint`](https://cloud.google.com/workflows/docs/reference/stdlib/events/create_callback_endpoint) --- to create and setup the callback endpoint - [`events.await_callback`](https://cloud.google.com/workflows/docs/reference/stdlib/events/await_callback) --- to wait for the callback endpoint to be called With `events.create_callback_endpoint` you specify the HTTP method that should be used for invoking the callback endpoint, and you get a dictionary with the URL of that endpoint that you can pass to other systems. And with `events.await_callback`, you pass the callback endpoint to wait on, pass a timeout defining how long you want to wait, and when the endpoint is called, you get access to the body that was sent to the endpoint. Let's have a look at the [YAML definition](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/callback-translation/translation-validation.yaml#L73) of our workflow, where we apply those two new functions. First, we're going to create the callback: ```yaml - create_callback: call: events.create_callback_endpoint args: http_callback_method: "POST" result: callback_details ``` The callback endpoint is now ready to receive incoming requests via a `POST HTTP` method, and the details of that endpoint are stored in the `callback_details` dictionary (in particular, the url key will be associated with the URL of the endpoint). Next, we pause the workflow, and await the callback with: ```yaml - await_callback: call: events.await_callback args: callback: ${callback_details} timeout: 3600 result: callback_request ``` The `callback_details` from earlier is passed as argument, as well as a timeout in seconds to wait for the callback to be made. When the call is received, all the details of the request are stored in the `callback_request` dictionary. You then have access to the full HTTP request, including its headers or its body. In case the timeout is reached, a `TimeoutError` is raised and can be caught by a `try /` `except` block. ## Going further and calling us back! If you want to have a closer look at the above example, all the [code for this workflow](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/callback-translation/translation-validation.yaml) can be found in the [Workflows samples](https://github.com/GoogleCloudPlatform/workflows-demos/) Github repository. And you can follow the details of this [tutorial](https://cloud.google.com/workflows/docs/tutorial-callbacks-firestore) to replicate this workflow in your own project. 
As this is still a preview feature for now, please be sure to [request access to this feature](https://docs.google.com/forms/d/e/1FAIpQLSdgwrSV8Y4xZv_tvI6X2JEGX1-ty9yizv3_EAOVHWVKXvDLEA/viewform), if you want to try it on your own. For more information on callbacks, be sure to read the [documentation](https://cloud.google.com/workflows/docs/creating-callback-endpoints). To dive deeper into the example above, please checkout the Github [repository](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/callback-translation) of this translation validation sample. Don't hesitate to let us know via Twitter to [@glaforge](http://twitter.com/glaforge) what you think of this feature, and how you intend on taking advantage of it in your own workflows! [](https://cloud.google.com/blog/topics/developers-practitioners/introducing-new-connectors-workflows) # Skyrocketing Micronaut microservices into Google Cloud Instead of spending too much time on infrastructure, take advantage of readily available serverless solutions. Focus on your [Micronaut](https://micronaut.io/) code, and deploy it rapidly as a function, an application, or within a container, on Google Cloud Platform, with [Cloud Functions](https://cloud.google.com/functions), [App Engine](https://cloud.google.com/appengine), or [Cloud Run](https://cloud.google.com/run). In this presentation, you’ll discover the options you have to deploy your Micronaut applications and services on Google Cloud. With [Micronaut Launch](https://micronaut.io/launch/), it’s easy to get started with a template project, and with a few tweaks, you can then push your code to production. Thanks to its performance, its low memory consumption, and its lightning-fast startup time, Micronaut is particularly well-suited for services that run on serverless solutions. {{< speakerdeck d10a8e3b48244eab95ea9bd1c81547b5 >}} # Introducing New Connectors for Workflows [Workflows](http://cloud.google.com/workflows) is a service to orchestrate not only Google Cloud services, such as Cloud Functions,  Cloud Run, or machine learning APIs, but also external services. As you might expect from an orchestrator, Workflows allows you to define the flow of your business logic, as steps, in a YAML or JSON definition language, and provides an execution API and UI to trigger workflow executions. You can read more about the benefits of Workflows in our [previous article](https://cloud.google.com/blog/topics/developers-practitioners/better-service-orchestration-workflows). We are happy to announce new [connectors](https://cloud.google.com/workflows/docs/connectors) for Workflows, which simplify calling Google Cloud services and APIs.  The first documented connectors offered in preview when Workflows was launched in General Availability were: - Cloud Tasks - Compute Engine - Firestore - Pub/Sub - Secret Manager The newly unveiled connectors are: - BigQuery - Cloud Build - Cloud Functions - Cloud Scheduler - Google Kubernetes Engine - Cloud Natural Language API - Dataflow - Cloud SQL - Cloud Storage - Storage Transfer Service - Cloud Translation - Workflows & Workflow Executions In addition to simplifying Google Cloud service calls (no need to manually tweak the URLs to call) from workflow steps, connectors also handle errors and [retries](https://cloud.google.com/workflows/docs/connectors), so you don't have to do it yourself. 
Furthermore, they take care of APIs with [long-running operations](https://cloud.google.com/workflows/docs/connectors#long-running_operations), polling the service for a result when it's ready, with a back-off approach, again so you don't have to handle this yourself. Let's take a look at some concrete examples on how connectors help.  ## Creating a Compute Engine VM with a REST API call Imagine you want to create a Compute Engine Virtual Machine (VM) in a specified project and zone. You can do this by crafting an HTTP POST request with the proper URL, body, and OAuth2 authentication using the Compute Engine API's [instances.insert](https://cloud.google.com/compute/docs/reference/rest/v1/instances/insert) method as shown in [create-vm.yaml](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/connector-compute/create-vm.yaml): ```yaml main: params: [args] steps: - init: assign: - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")} - zone: "europe-west1-b" - machineType: "e2-small" - instanceName: "${args.instanceName}" - insert_machine: call: http.post args: url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances"} auth: type: OAuth2 body: name: ${instanceName} machineType: ${"zones/" + zone + "/machineTypes/" + machineType} disks: - initializeParams: sourceImage: "projects/debian-cloud/global/images/debian-10-buster-v20201112" boot: true autoDelete: true networkInterfaces: - network: "global/networks/default" ``` This works but it is quite error prone to construct the right URL with the right parameters and authentication mechanism. You also need to poll the instance status to make sure it's running before concluding the workflow: ```yaml - get_instance: call: http.get args: url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances/" + instanceName} auth: type: OAuth2 result: getInstanceResult - assert_running: switch: - condition: ${getInstanceResult.body.status == "RUNNING"} next: end next: sleep - sleep: call: sys.sleep # Polling through sleep args: seconds: 3 next: get_instance ``` Note that even the HTTP GET call above could fail and it'd be better to wrap the call in a retry logic.  ## Creating a Compute Engine VM with the Workflows compute connector In contrast, let's now create the same VM with the compute connector dedicated to Compute Engine as shows in [create-vm-connector.yaml](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/connector-compute/create-vm-connector.yaml): ```yaml main: params: [args] steps: - init: assign: - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")} - zone: "europe-west1-b" - machineType: "e2-small" - instanceName: "${args.instanceName}" - insert_machine: call: googleapis.compute.v1.instances.insert args: project: ${project} zone: ${zone} body: name: ${instanceName} machineType: ${"zones/" + zone + "/machineTypes/" + machineType} disks: - initializeParams: sourceImage: "projects/debian-cloud/global/images/debian-10-buster-v20201112" boot: true autoDelete: true networkInterfaces: - network: "global/networks/default" ``` The overall structure and syntax is pretty similar, but this time, we didn't have to craft the URL ourselves, nor did we have to specify the authentication method. Although it's invisible in this YAML declaration, error handling and retry logic are handled by Workflows directly, unlike the first example where you have to handle it yourself. 
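As a side note on that first, connector-less example: here's a minimal sketch (not part of the original samples) of wrapping the raw HTTP polling call in the built-in retry policy, which is essentially what the connector spares you from writing by hand:

```yaml
# Hypothetical hardening of the raw HTTP polling step with Workflows'
# built-in retry policy (the connector handles this for you)
- get_instance:
    try:
        call: http.get
        args:
            url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances/" + instanceName}
            auth:
                type: OAuth2
        result: getInstanceResult
    retry: ${http.default_retry}
```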
## Transparent waiting for long-running operations Some operations from cloud services are not instantaneous and can take a while to execute. A synchronous call to such operations will return immediately with an object that indicates the status of that long-running operation.  From a workflow execution, you might want to call a long-running operation and move to the next step only when that operation has finished. In the standard REST approach, you have to check at regular intervals if the operation has terminated or not. To save you from the tedious work of iterating and waiting, connectors take care of this for you!  Let's illustrate this with another example with Compute Engine. Stopping a VM can take a while. A [request](https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop) to the Compute Engine REST API to stop a VM returns an [object](https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop#response-body) with a status field that indicates whether the operation has completed or not. The Workflows compute connector and its [instances.stop](https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop) operation will appropriately wait for the stop of the VM -- no need for you  to keep checking its status until the VM stops. It greatly simplifies your workflow definition as shown in [create-stop-vm-connector.yaml](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/connector-compute/create-stop-vm-connector.yaml). ```yaml main: params: [args] steps: - init: assign: - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")} - zone: "europe-west1-b" - machineType: "e2-small" - instanceName: "${args.instanceName}" ... - stop_machine: call: googleapis.compute.v1.instances.stop args: instance: ${instanceName} project: ${project} zone: ${zone} # Optional connector parameters connector_params: timeout: 100 # total time is 100s polling_policy: # optional polling parameters for LRO polling. initial_delay: 1 multiplier: 1.25 - assert_terminated: call: assert_machine_status args: expected_status: "TERMINATED" project: ${project} zone: ${zone} instanceName: ${instanceName} assert_machine_status: params: [expected_status, project, zone, instanceName] steps: - get_instance: call: googleapis.compute.v1.instances.get args: instance: ${instanceName} project: ${project} zone: ${zone} result: instance - compare: switch: - condition: ${instance.status == expected_status} next: end - fail: raise: ${"Expected VM status is " + expected_status + ". Got " + instance.status + " instead."} ``` Note that we still use the [instances.get](https://cloud.google.com/compute/docs/reference/rest/v1/instances/get) operation in a subworkflow to check that the VM is indeed TERMINATED but this is nice-to-have as [instances.stop](https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop) already waits for the VM to stop before returning. In connector, users can set a timeout field, which is the total wait time for this connector call. All of the retries and polling logic is hidden. Now, compare this to [stop-vm.yaml](https://github.com/GoogleCloudPlatform/workflows-demos/blob/master/connector-compute/stop-vm.yaml) where the workflow stops the VM without the connector. You can see that the YAML is longer and the logic is more complicated with HTTP retry policy for the stop call and also the polling logic to make sure the VM is actually stopped. 
## Increased reliability through connector retries Even the best services can have momentary outages due to traffic spikes or network issues. Google Cloud Pub/Sub has an SLA of [99.95](https://uptime.is/99.95), which means no more than 43s of downtime per day on average, or under 22 minutes per month. Of course, most products routinely outperform their SLAs by a healthy margin. What if you want strong assurances your workflow won't fail if products remain within their SLAs? Since Workflows connectors retry operations over a period of several minutes, even if there is an outage of several minutes, the operation will succeed and so will the workflow. ## Let's connect! To learn more about [connectors](https://cloud.google.com/workflows/docs/connectors), have a look at some of our [workflows-samples](https://github.com/GoogleCloudPlatform/workflows-samples/tree/main/src/connectors) repo, which show you how to interact with Compute Engine, Cloud Pub/Sub, Cloud Firestore, and Cloud Tasks. You can find the samples described in this blog post in [workflows-demos/connector-compute](https://github.com/GoogleCloudPlatform/workflows-demos/tree/master/connector-compute). This is the initial set of connectors; there are many more Google Cloud products for which we will be creating dedicated connectors. We'd love to hear your thoughts about which connectors we should prioritize and focus on next (fill this [form](https://forms.gle/HKYn83bhDKWFSDQr7) to tell us). Don't hesitate to let us know via Twitter to [@meteatamel](https://twitter.com/meteatamel) and [@glaforge](http://twitter.com/glaforge)! # Serverless: panacea or not? At [DevDay Belgium](https://www.devday.be/Sessions/Details/94?slug=the-serverless-panacea-or-not), a few months ago, I had the pleasure to give a keynote on the theme of "serverless". Let me share with you this talk today! **The Serverless Panacea... Or Not?** > The term "serverless" has become a trendy buzzword: if you don't have the checkbox ticked, you're not cool anymore. Really? > > Spoiler alert: There may be servers involved in serverless solutions. It's not just about function-as-a-service. And it's actually more complicated than it may seem! > > But first, let's come back to the basics: what is serverless exactly, where does it come from, what are its characteristics? Then, beyond the definition, we'll discuss the challenges, and the risks associated with serverless architectures. Eventually, going further, we'll think about where serverless is heading in the near future. You can find the slides below: {{< speakerdeck cb3b04200d7b4b94a0b22dfd9a401d0a >}} And you can find the video below. Further down, I'll detail each slide of my keynote: {{< youtube id="EkE3UfzVO8o" t="4294s" >}} Today, I'd like to tell you about Serverless. As a Developer Advocate, for Google Cloud, that's products and the topic I'm focusing on. Serverless is a big buzzword of the day, but is it a real panacea or not? Like Obelix, I fell into the magic serverless potion a long time ago... I started playing with Google App Engine Java in 2009, even before it was officially announced by Google at the Google I/O conference. Google team reached out to me, to work together in stealth mode, to ensure that alternative JVM languages would run fine on their upcoming Java flavor for App Engine (I'm the co-founder of the Groovy language) I couldn't imagine then that I'd be starting to work for Google 7 years later. And that I would focus on those serverless solutions! 
It was still called Platform-as-a-Service, as the term "serverless" wasn't invented yet (although they are pretty similar) And I've been a big fan and big user of App Engine ever since. After this brief personal story with Obelix and Google, let's actually start with a little bit of background and history. This is my version of the story, so don't take it too seriously. At the beginning, humans created the server.  A machine on which you could run various programs and apps. Well, we also created the internet, of course, otherwise we couldn't connect our web apps to our users. If you have a few users, a single server may suffice. But you know how it goes, with the spread of the web and internet, we now have billions of users, and millions of servers. Things kinda got complicated, and we introduced lots of hard concepts around distributed microservices, replicated databases. We even coined theorems, like the CAP theorem, for Consistency, Availability, Partitioning. But you can only pick 2. Humans invented the cloud, in order to avoid dealing with the physical world. But there are still databases, servers, or virtual machines to manage. However, you don't have to get your hands dirty with the ethernet cables, changing the failing hard-drives, upgrading to the latest CPU and RAM. Usually, it's the cloud provider that has folks that wake up in the middle of the night to upgrade those things. You can sleep a little bit better at night, even if your boss may still call you at 3am because your app is misbehaving. To focus on the code, and to avoid the complexity of managing servers, provisioning clusters, configuring networking, fine-tuning databases, even in the cloud... humans came up with the concept of serverless!  Here is a picture of the latest Google datacenter for serverless! Look, no servers! Well, I'm kidding, of course, there are always servers around! Even if the word serverless wasn't there yet, it all started with Platform-as-a-Service, with App Engine and Heroku. The promise was "give us your code or your app, and we'll run it for you". The hardware management aspect was already there. Scaling was also handled by the platform. The pricing as well was proportional to the usage of the resources. You also have BaaS --- Backend as a Service It's pretty similar to PaaS actually. It comes with batteries-included. You focus on the frontend, and all the backend is provided for you. Parse and Firebase are two good examples. Facebook kinda abandoned Parse into open-source land. But Firebase is still around and integrates more and more with the various Google Cloud Platform services. So you can have hosting of static assets, a datastore to save your information, some kind of runtime environment to run your business logic. And tons of other services, for authentication, mobile crash analysis, performance testing, analytics, and more. PaaS, then BaaS, and also FaaS: Functions-as-a-service. With a FaaS solution, your unit of work, of deployment, becomes a small, granular function. This concept was popularized by Amazon Lambda. And often, even still today, people tend to confuse FaaS with Serverless. But FaaS is really just one facet of the Serverless ecosystem. Like PaaS or BaaS. Another interesting facet of serverless is the Container-as-a-Service approach, with containerized workloads. Instead deploying apps or functions, you're deploying containers. Put anything you want inside a container.  That's the approach that Google took with its Cloud Run container service. 
You don't have the complexity of Kubernetes, but you can run your container easily in the cloud, in a fully managed environment.

Right, so I described a bit of the history that led us to serverless and its various facets, and some of the serverless products that are available today, but let's take some time to give a proper definition of what serverless is. For me, serverless is the easiest way to get an idea to production in a minimal amount of time. As a developer, you work on some code, and then you deploy it. That's it! Really!

The term serverless was coined around 2010 by someone called Ken Elkabany, who created the PiCloud computing platform. Compared to Heroku and App Engine, which came earlier and focused on implementing web stacks in their cloud datacenters, PiCloud was more generic and supported different kinds of workloads, not just serving web requests. The catchy term came from the fact that they were actually selling a service, rather than selling or renting servers, machines, or VMs, to their customers.

There are two ways to think about serverless: there's the operational model, and the programming model.

Operational model:

- Fully managed
- Automatic scaling
- Pay as you go

Programming model:

- Service based
- Event driven
- Stateless

There's no provisioning of clusters, servers, instances, VMs or anything. It's all handled by the platform, for you. Just give your code, your app, your function. It's a fully managed environment. Security patches are applied automatically. You remember Spectre and Meltdown? They've been mitigated transparently and rapidly for customers by Google Cloud. No wake-up call in the night.

Your apps will scale automatically, from 0 to 1 instance, from 1 to n instances, and from n down to 1, as well as back to zero. Tracking the CPU load, memory usage, and number of incoming requests, with some magic formula, serverless platforms are able to scale your services up and down, without you having to worry about it. The cloud provider is handling that for you.

In terms of pricing, it's a pay-as-you-go cost model. It goes hand in hand with automatic scaling. If there's no traffic, you pay zero. If there's twice as much traffic as usual, you pay proportionately. And if the load goes back to zero, the instances serving your app are decommissioned, and again you pay zero.

Now onto the programming model. More and more, we're transitioning from building big monoliths to orchestrating smaller services, or microservices. It has its challenges, though, but with smaller services, your teams can develop them more independently, scale them differently, or even deploy them with different life cycles. Since you have more loosely coupled services, they tend to react to incoming events from your system or from the cloud (for example a notification of a new file in Cloud Storage, a new line in a reactive datastore like Cloud Firestore, or a message in a message bus like Pub/Sub). Your services usually communicate asynchronously, to stay properly decoupled. But the more asynchronous you are, the harder things are to operate and monitor: when business logic spans several services, you have to figure out what the current status of that workflow is across those services.

Another important aspect of the fact that services can scale up and down and back to zero is that there's no guarantee that you're going to hit the same server all the time. So you can't be certain that some data that would be cached is still there.
You have to program defensively to ensure that any fresh instance of your app is able to cope with any incoming request. State is pretty much an enemy of scaling, so the more stateless you can be, the better.

- Compute
- Data Analytics
- ML & AI
- Database & Storage
- Smart assistants & chat
- DevOps
- Messaging

We've been speaking about serverless compute, but serverless is not just about compute. You could consider that anything that is fully managed, that offers a pay-as-you-go cost model, and that is a service in the cloud is also serverless, since you don't have to worry about the infrastructure and the scaling. There are great examples of this in Google Cloud, for example BigQuery, which is a fully managed, serverless data warehouse and analytics platform. You pay proportionally to the amount of data your queries are going through! Not for the storage, not for the running servers, etc.

But let's get back to serverless compute. Serverless sounds pretty cool, right? But there are also challenges, compared to running a good old monolith on your on-premises server. We've already given some hints about some of the challenges. In particular, I'd like to spend some time telling you about four key aspects:

- The lock-in factor
- The infamous cold starts
- Cost controls
- And the mess of spaghetti microservices

PaaS or BaaS often come with batteries included. They have built-in APIs or databases, which are super convenient for developers. As a developer, you are much more productive, because you don't have to wire things up or configure third-party services. The choice is already made for you. But I'm seeing those batteries more and more being externalized, as their own standalone products. Google Cloud has externalized things like its NoSQL database, its Pub/Sub message hub, its scheduler, and its task handling capabilities. Before, those services were actually part of the Platform-as-a-Service.

However great having powerful built-in batteries is, often these are proprietary and specific to that platform. You end up being locked into the platform you're building upon. It can be a choice, as long as you are aware of it. It's a trade-off between portability and time-to-market. You might still be tied to those products, but at least you can still move the business logic around, if those services are externalized. And you can create a level of indirection to be able, some day, potentially, to move away from those service dependencies if needed.

A common issue you hear about in serverless-land is the infamous "cold start". Since you can scale to zero, it means there's currently no server, instance, or clone to serve an incoming request. So what happens? The cloud provider has to reinitialize, re-instantiate, re-hydrate some kind of server, VM, or container. Additionally, the underlying language runtime has to start up as well, initializing its internal data structures. Not only that, but your app also needs to get started too. So you'd better try to minimize the time your app needs to be ready to serve its first request, since you have control over this part.

There are workarounds, like pinging your service at regular intervals to keep it warm, but it's a bit of a hack, or even an anti-pattern. Depending on the pricing, that might mean you're potentially paying for nothing, for something that's sitting idle. Some platforms provide some knobs that you can play with, like "min instances" or "provisioned instances", usually at a lower price.
For instance, on Google Cloud Functions or Cloud Run, you can specify that you want a certain minimum number of instances that are already warm and ready to serve, and that are cheaper.

I mention a minimum number of instances, but what about the notion of a maximum number of instances? It's actually an important idea. With a platform that auto-scales transparently, and that can spin up as many instances as needed to serve increased traffic, it also means that your costs are going to increase just as much! So in order to bound your budget to a known quantity, rather than burning your money on all those hot instances, you can cap the number of instances that will serve your content. The service may be a bit degraded when you reach that limit, as latency will likely increase, but at least your budget doesn't go through the roof! That's why Google Cloud Platform introduced the notion of capping the number of instances running your functions, apps or containers in its serverless products: to have more visibility and control around costs.

The last challenge I'd like to mention is spaghetti services. It's so easy to write many functions and services on a serverless platform. One service does one thing and does it well, right? But after a while, you end up with a spaghetti of microservices. A big mess. It becomes very complicated to see what invokes what. It's hard for monitoring and observability to really figure out what happened when one microservice somehow starts to misbehave and completely ruins your clockwork architecture. And you know: monoliths aren't that bad, actually. Don't start right away with writing the smallest unit of work possible. Pay attention to how you split the big monolith into microservices. Otherwise, you'll end up with that big plate of spaghetti. There are good articles on when and how to split a monolith, but it's not a simple rule-of-thumb answer.

So what does the future hold for serverless? I believe that the highlights will be about:

- Openness
- Containers
- Glue
- Edge
- Machine Learning

Let's start with open, and openness. That's open as in open source! We want to avoid lock-in. We want portability. For instance, the platforms rely on open source software for sure, but the platforms themselves can be open source too. If you look at Google's Cloud Run, it's actually based on the Knative open source project, which runs on Kubernetes. So you're not locked into Google Cloud when you're using Cloud Run. You can move your workload, your app, to a Knative-compatible platform from another cloud provider, or even on-premises, on your own infrastructure.

I worked on the Java Cloud Functions runtime, and it is also available as open source. So you can deploy your functions in Google Cloud, but you can also run your functions elsewhere too, in a hybrid cloud scenario, or even just locally on your machine for a greater developer experience and a tighter development feedback loop.

Also, the way you build your services from sources can be made more open too. For instance, Heroku and Google Cloud partnered on Cloud Native Buildpacks, which help you transform your application source code into images that can run on any cloud. Really, it's all about portability and avoiding lock-in, by making things as open as possible.

As I'm mentioning Cloud Native Buildpacks, and the fact that they build portable containers from your app's source code, notice that we're speaking of containers. Why containers, you may ask. With things like platform- or function-as-a-service, you are pushing apps onto the platform runtime.
But you may be limited in terms of the language runtime, library, or binary that you can run or bundle there. If you're using an esoteric language, or need some special software installed, perhaps you won't be able to run your app there. Instead, if you could put everything you need in a box, and if the cloud could just run that box for you, then you could do pretty much anything. That's why we're using containers more and more. And that's also why Google Cloud released Cloud Run: to run your containers, but serverlessly, with any runtime, library or language that you want, without limitations. So I'm seeing more containers in the future.

You remember my plate of spaghetti? To orchestrate your services, to observe and monitor them, to check that they are communicating properly, asynchronously, you'll need more tools to ensure that it all runs fine in the cloud. That's why I'm seeing more tools like Google Cloud Tasks, Cloud Scheduler, and Cloud Workflows, and in the Azure and AWS worlds, you have things like Logic Apps or Step Functions. You also have various messaging buses, like Google Cloud Pub/Sub, Amazon SQS, and Azure Service Bus. And in the Kubernetes world, we've seen service meshes emerge as a key architectural pattern. A monolith is much simpler to develop and operate, but as you move to a microservice architecture, those glue services will be needed more and more. So I see more glue in the future!

Recently, Cloudflare released a product called Cloudflare Workers. It uses the V8 JavaScript engine, and its isolates concept, to run your functions in a sandboxed manner. There are two very interesting aspects to me in this new product. First of all, there's the idea of having your serverless functions run at the edge of the network, not deep in a handful of big data centers. Instead, those functions are as close to the users as possible, so the latency is really minimal. Secondly, to further reduce latency, there's a great innovation that almost completely eliminates cold starts: Cloudflare starts warming up your function as soon as the SSL handshake begins when you invoke the function over HTTPS, whereas normally the whole handshake has to complete, and the call be routed to your function, before the function really starts. So that's a really great optimization! And we'll probably see more stuff moving to the edge of the cloud infrastructure.

Lastly, looking even further into the future, I'm curious to see how machine learning will play a role in the serverless offerings of cloud providers. In particular, you still have to specify a VM or instance size, its memory or CPU. Some would say it's not very serverless, since servers are supposed to be abstracted away. In Google Cloud, for example, we have what we call a "clone scheduler" that is responsible for creating a new instance of your function or app, depending on various factors, like CPU usage, memory usage, number of incoming queries, etc. There's some magical calculation that figures out how and when to spin up a new instance. Google recently used machine learning to optimize its datacenters, reducing the energy used for cooling by 40% and improving their Power Usage Effectiveness (PUE). I can imagine a future where machine learning is used to further upsize or downsize the underlying machines running your serverless code, and provision the right amount of resources, to reduce latency, CPU usage, etc.

So let's see what the future holds for serverless!
# Orchestrating the Pic-a-Daily serverless app with workflows Over the past year, we ([Mete](https://twitter.com/meteatamel) and [Guillaume](https://twitter.com/glaforge)) have developed a picture sharing application, named Pic-a-Daily, to showcase Google Cloud serverless technologies such as Cloud Functions, App Engine, and Cloud Run. Into the mix, we've thrown a pinch of Pub/Sub for interservice communication, a zest of Firestore for storing picture metadata, and a touch of machine learning for a little bit of magic. ![/img/picadailly-workflows/1_Shqfx7L.max-1400x1400.png](https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Shqfx7L.max-1400x1400.png) We also created a [hands-on workshop](https://codelabs.developers.google.com/serverless-workshop/) to build the application, and [slides](https://speakerdeck.com/meteatamel/pic-a-daily-serverless-workshop) with explanations of the technologies used. The workshop consists of codelabs that you can complete at your own pace. All the code is open source and available in a [GitHub repository](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop).  ## Initial event-driven architecture The Pic-a-Daily application evolved progressively. As new services were added over time, a loosely-coupled, event-driven architecture naturally emerged, as shown in this architecture diagram: ![/img/picadailly-workflows/2_XMbrwvr.max-1100x1100.png](https://storage.googleapis.com/gweb-cloudblog-publish/images/2_XMbrwvr.max-1100x1100.png) To recap the event-driven flow: 1. Users upload pictures on an App Engine web frontend. Those pictures are stored in a Google Cloud Storage bucket, which triggers file creation and deletion events, propagated through mechanisms such as Pub/Sub and Eventarc.  2. A Cloud Function (Image analysis) reacts to file creation events. It calls the Vision API to assign labels to the picture, identify the dominant colors, and check if it's a picture safe to show publicly. All this picture metadata is stored in Cloud Firestore.  3. A Cloud Run service (Thumbnail service) also responds to file creation events. It generates thumbnails of the high-resolution images and stores them in another bucket.  4. On a regular schedule triggered by Cloud Scheduler, another Cloud Run service (Collage services) creates a collage from thumbnails of the four most recent pictures.  5. Last but not least, a third Cloud Run service (Image garbage collector) responds to file deletion events received through [(recently generally available) Eventarc](https://cloud.google.com/blog/products/serverless/eventarc-is-ga). When a high-resolution image is deleted from the pictures bucket, this service deletes the thumbnail and the Firestore metadata of the image. These services are loosely coupled and take care of their own logic, in a smooth choreography of events. They can be scaled independently. There's no single point of failure, since services can continue to operate even if others have failed. Event-based systems can be extended beyond the current domain at play by plugging in other events and services to respond to them. However, monitoring such a system in its entirety usually becomes complicated, as there's no centralized place to see where we're at in the current business process that spans all the services. Speaking of business processes, it's harder to capture and make sense of the flow of events and the interplay between services. 
Since there's no global vision of the processes, how do we know if a particular process or transaction is successful or not? And when failures occur, how do we deal properly and explicitly with errors, retries, or timeouts? As we kept adding more services, we started losing sight of the underlying "business flow". It became harder to isolate and debug problems when something failed in the system. That's why we decided to investigate an orchestrated approach. ## Orchestration with Workflows [Workflows](https://cloud.google.com/workflows) recently became generally available. It offered us a great opportunity to re-architect our application and use an orchestration approach, instead of a completely event-driven one. In orchestration, instead of microservices responding to events, there is an external service, such as Workflows, calling microservices in a predefined order.  After some restructuring, the following architecture emerged with Workflows: ![/img/picadailly-workflows/3_temY387.max-1000x1000.png](https://storage.googleapis.com/gweb-cloudblog-publish/images/3_temY387.max-1000x1000.png) Let's recap the orchestrated approach: - App Engine is still the same web frontend that accepts pictures from our users and stores them in the Cloud Storage bucket.  - The file storage events trigger two functions, one for the creation of new pictures and one for the deletion of existing pictures. Both functions create a workflow execution. For file creation, the workflow directly makes the call to the Vision API (declaratively instead of via Cloud Function code) and stores picture metadata in Firestore via its REST API.  - In between, there's a function to transform the useful information of the Vision API into a document to be stored in Firestore. Our initial image analysis function has been simplified: The workflow makes the REST API calls and only the data transformation part remains.  - If the picture is safe to display, the workflow saves the information in Firestore, otherwise, that's the end of the workflow.  - This branch of the workflow ends with calls to Thumbnail and Collage Cloud Run services. This is similar to before, but with no Pub/Sub or Cloud Scheduler to set up.  - The other branch of the workflow is for the picture garbage collection. The service itself was completely removed, as it mainly contained API calls without any business logic. Instead, the workflow makes these calls.  There is now a central [workflows.yaml](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/blob/master/workflows/workflows.yaml) file capturing the business flow. You can also see a visualization of the flow in Cloud Console: ![/img/picadailly-workflows/4_zpsaVq7.max-1500x1500.png](https://storage.googleapis.com/gweb-cloudblog-publish/images/4_zpsaVq7.max-1500x1500.png) The Workflows UI shows which executions failed, at which step, so we can see which one had an issue without having to dive through heaps of logs to correlate each service invocation. Workflows also ensures that each service call completes properly, and it can apply global error and retry policies. With orchestration, the business flows are captured more centrally and explicitly, and can even be version controlled. Each step of a workflow can be monitored, and errors, retries, and timeouts can be laid out clearly in the workflow definition. When using Cloud Workflows in particular, services can be called directly via REST, instead of relying on events on Pub/Sub topics. 
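To give a concrete feel for this, here is a heavily simplified sketch of what the file-creation branch of such a definition can look like. This is not the actual `workflows.yaml` of the project: the Cloud Run service URL is a placeholder and the safe-search check is reduced to a single field, but it shows the idea of declaring authenticated REST calls directly as workflow steps:

```yaml
# Simplified sketch (not the real workflows.yaml): placeholder URL, simplified logic
main:
    params: [args]
    steps:
        - imageAnalysis:
            # Call the Vision API directly from the workflow, no function code needed
            call: http.post
            args:
                url: https://vision.googleapis.com/v1/images:annotate
                auth:
                    type: OAuth2
                body:
                    requests:
                        - image:
                              source:
                                  gcsImageUri: ${args.gcsImageUri}
                          features:
                              - type: LABEL_DETECTION
                              - type: SAFE_SEARCH_DETECTION
            result: visionResponse
        - checkSafety:
            # Only continue if the picture looks safe to display
            switch:
                - condition: ${visionResponse.body.responses[0].safeSearchAnnotation.adult == "VERY_UNLIKELY"}
                  next: makeThumbnail
            next: end
        - makeThumbnail:
            # Invoke the Thumbnail Cloud Run service (placeholder URL) with an authenticated call
            call: http.post
            args:
                url: https://thumbnail-service-xxxxx.a.run.app
                auth:
                    type: OIDC
                body:
                    gcsImageUri: ${args.gcsImageUri}
```

The real definition also writes the picture metadata to Firestore, calls the collage service, and handles the deletion branch, but the overall shape is the same: a sequence of declarative, authenticated REST calls with a bit of branching logic.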
Furthermore, all the services involved in those processes can remain independent, without knowledge of what other services are doing.

Of course, there are downsides as well. If you add an orchestrator into the picture, you have one more component to worry about, and it could become the single point of failure of your architecture (fortunately, Google Cloud products come with SLAs!). Last, we should mention that relying on REST endpoints might increase coupling, with a heavier reliance on strong payload schemas compared to lighter event formats.

## Lessons learned

Working with Workflows was refreshing in a number of ways and offered us some lessons that are worth sharing.

### Better visibility

It is great to have a high-level overview of the underlying business logic, clearly laid out in the form of a [YAML declaration](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/blob/master/workflows/workflows.yaml). Having visibility into each workflow execution was useful, as it enabled us to clearly understand what worked in each execution, without having to dive into the logs to correlate the various individual service executions.

### Simpler code

In the original event-driven architecture, we had to deal with three types of events:

1. Cloud Functions' direct integration with Cloud Storage events
2. HTTP-wrapped Pub/Sub messages with Cloud Storage events for Cloud Run
3. Eventarc's CloudEvents-based Cloud Storage events for Cloud Run

As a result, the code had to cater to each flavor of events:

```javascript
// Cloud Functions provides the event directly
exports.vision_analysis = async (event, context) => {
  const filename = event.name;
  const filebucket = event.bucket;
...

// Cloud Run encodes the GCS event in Base64 in a Pub/Sub message
// and wraps the message in an HTTP request
app.post('/', async (req, res) => {
  const pubSubMessage = req.body;
  const eventType = pubSubMessage.message.attributes.eventType;
  const fileEvent = JSON.parse(
    Buffer.from(pubSubMessage.message.data, 'base64')
      .toString().trim());
...

// Eventarc encodes events with CloudEvents
// and Cloud Run wraps it in an HTTP request
app.post('/', async (req, res) => {
  const cloudEvent = HTTP.toEvent({ headers: req.headers, body: req.body });
  // logEntryData is the audit log payload decoded from the CloudEvent (decoding elided here)
  const tokens = logEntryData.protoPayload.resourceName.split('/');
  const bucket = tokens[3];
  const objectName = tokens[5];
...
```

In the orchestrated version, there's only a simple REST call and an HTTP POST body to parse:

```javascript
// Workflows calls services directly,
// no events to unwrap
app.post('/', async (req, res) => {
  // gs://picture-bucket/image.jpg
  const gcsImageUri = req.body.gcsImageUri;
  const tokens = gcsImageUri.substr(5).split('/');
  const fileEvent = {
    bucket: tokens[0],
    name: tokens[1]
  };
```

### Less code

Moving REST calls into the workflow definition as declarations (with straightforward authentication) enabled us to eliminate quite a bit of code in our services; one service was trimmed down to a simple data transformation function, and another service disappeared completely! We still needed two functions to trigger the two paths of the workflow, but with a future integration with Eventarc, they may not be required anymore.

### Less setup

In the original event-driven architecture, we had to create Pub/Sub topics, and set up Cloud Scheduler and Eventarc to wire up the services. With Workflows, all of this setup is gone.
[Workflows.yaml](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop/blob/master/workflows/workflows.yaml) is the single source of setup needed for the business flow.  ### Error handling Error handling was also simplified in a couple of ways. First, the whole flow stops when an error occurs, so we were no longer in the dark about exactly which services succeeded and which failed in our chain of calls. Second, we now have the option of applying global error and retry policies.  ### Learning curve Now, everything is not always perfect! We had to learn a new service, with its quirks and limited documentation --- it's still early, of course, and the documentation will improve over time with feedback from our customers. ### Code vs. YAML  As we were redesigning the architecture, an interesting question came up over and over: "Should we do this in code in a service or should we let Workflows make this call from the YAML definition?" In Workflows, more of the logic lands in the workflow definition file in YAML, rather than code in a service. Code is usually easier to write, test, and debug than YAML, but it also requires more setup and maintenance than a step definition in Workflows. If it's boilerplate code that simply makes a call to some API, that should be turned into YAML declarations. However, if the code also has extra logic, then it's probably better to leave it in code, as YAML is less testable. Although there is some level of error reporting in the Workflows UI, it's not a full-fledged IDE that helps you along the way. Even when working in your IDE on your development machine, you'll have limited help from the IDE, as it only checks for valid YAML syntax. ### Loss of flexibility The last aspect we'd like to mention is perhaps a loss of flexibility. Working with a loosely-coupled set of microservices that communicate via events is fairly extensible, compared to a more rigid solution that mandates a strict definition of the business process descriptions. ### Choreography or orchestration? Both approaches are totally valid, and each has its pros and cons. We mentioned this topic when [introducing Workflows](https://cloud.google.com/blog/topics/developers-practitioners/better-service-orchestration-workflows). When should you choose one approach over the other? Choreography can be a better fit if services are not closely related, or if they can exist in different bounded contexts. Whereas orchestration might be best if you can describe the business logic of your application as a clear flow chart, which can then directly be described in a workflow definition.  ## Next steps To go further, we invite you to have a closer look at [Workflows](http://cloud.google.com/workflows), and its supported [features](https://cloud.google.com/workflows#all-features), by looking at the [documentation](https://cloud.google.com/workflows/docs/overview), particularly the [reference documentation](https://cloud.google.com/workflows/docs/reference/syntax) and the [examples](https://cloud.google.com/workflows/docs/sample-workflows?hl=en). We also have a series of short articles that cover Workflows, with various [tips and tricks](https://glaforge.appspot.com/category/Google%20Cloud%20Platform), as well as introductions to Workflows, with a [first look at Workflows](https://atamel.dev/posts/2020/09-08_first_look_at_workflows/) and some thoughts on [choreography vs orchestration](https://glaforge.appspot.com/article/orchestrating-microservices-with-cloud-workflows). 
If you want to study a concrete use case, with an event-based architecture and an equivalent orchestrated approach, feel free to look into our [Serverless Workshop](https://g.co/codelabs/serverless-workshop). It offers codelabs spanning Cloud Functions, Cloud Run, App Engine, Eventarc, and Workflows. In particular, lab 6 is the one in which we converted the event-based model into an orchestration with Workflows. All the code is also available as [open source on GitHub](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop).

We look forward to hearing from you about your workflow experiments and needs. Feel free to reach out to us on Twitter at [@glaforge](https://twitter.com/glaforge) and [@meteatamel](https://twitter.com/meteatamel).

# Day 15 with Workflows — Built-in Cloud Logging function

In the two previous episodes, we saw how to [create and call subworkflows]({{< ref "/posts/2021/02/03/day-14-with-workflows-subworkflows" >}}), and we applied this technique to making a reusable routine for logging with Cloud Logging. However, there's already a built-in function for that purpose! So let's have a look at this integration.

{{< youtube V2hQekDwdRM >}}

To call the built-in logging function, just create a new step, and make a call to the `sys.log` function:

```yaml
- logString:
    call: sys.log
    args:
        text: Hello Cloud Logging!
        severity: INFO
```

This function takes a mandatory parameter, `text`, and an optional one, `severity`. The `text` parameter accepts all supported value types: not only strings, but all kinds of numbers, as well as arrays and dictionaries. Their string representation will be used as the text. The optional `severity` parameter is an enum that can take the values `DEFAULT`, `DEBUG`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `CRITICAL`, `ALERT`, `EMERGENCY`, with `DEFAULT` being... the default value if you don't specify a severity!

Here's another example with a dictionary as parameter, which will be output as text in the logs, with a severity of `WARNING`:

```yaml
- createDict:
    assign:
        - person:
            name: Guillaume
            kids: 2
- logDict:
    call: sys.log
    args:
        text: ${person}
        severity: WARNING
```

Looking at the results in the Cloud Logging console, you will see both messages appear:

![](/img/workflows-days/w15-builtin-log.png)

Don't hesitate to have a look at the [reference documentation](https://cloud.google.com/workflows/docs/reference/stdlib/sys/log) to find out more about the available built-in functions.

# Day 14 with Workflows — Subworkflows

Workflows are made of sequences of steps and branches. Sometimes, a particular sequence of steps can be repeated, and it would be a good idea to avoid error-prone repetition in your workflow definition (in particular if you change it in one place, and forget to change it in another place). You can modularize your definition by creating subworkflows, a bit like subroutines or functions in programming languages. For example, yesterday, we had a look at [how to log to Cloud Logging]({{< ref "/posts/2021/02/02/day-13-with-workflows-logging-with-cloud-logging" >}}): if you want to log in several places in your workflow, you can extract that routine into a subworkflow.
Let's see that in action in the video below, and you can read all the explanations afterwards: {{< youtube tbiFaO_LOdg >}} First things first, let's step back and look at the structure of workflow definitions. You write a series of steps, directly in the main YAML file. You can move back and forth between steps thanks to [jumps]({{< ref "/posts/2020/12/04/day-4-with-workflows-jumping-with-switch-conditions" >}}), but it wouldn't be convenient to use jumps to emulate subroutines (remember the good old days of BASIC and its gotos?). Instead, Cloud Workflows allows you to separate steps under a `main`, and subroutines under their own subroutine name. So far we had just a sequence of steps: ```yaml - stepOne: ... - stepTwo: ... - stepThree: ... ``` Those steps are implicitly under a `main` routine. And here's how to show this main routine explicitly, by having that `main` block, and `steps` underneath: ```yaml main:     steps:         - stepOne: ...         - stepTwo: ...         - stepThree: ... ``` To create a subworkflow, we follow the same structure, but with a different name than `main`, but you can also pass parameters like so: ```yaml subWorkflow:     params: [param1, param2, param3: "default value"]     steps:         - stepOne: ...         - stepTwo: ...         - stepThree: ... ``` Notice that you can pass several parameters, and that parameters can have default values when that parameter is not provided at the call site. Then in your main flow, you can call that subworkflow with a `call` instruction. Let's take a look at a concrete example, that simply concatenates two strings: ```yaml main:     steps:         - greet:             call: greet             args:                 greeting: "Hello"                 name: "Guillaume"             result: concatenation         - returning:             return: ${concatenation} greet:     params: [greeting, name: "World"]     steps:         - append:             return: ${greeting + ", " + name + "!"} ``` In the `call` instruction, we pass the `greeting` and `name` arguments, and the result will contain the output of the subworkflow call. In the subworkflow, we defined our parameters, and we have a single step just return an expression which is the desired greeting message concatenation. One last example, but perhaps more useful than concatenating strings! Let's turn yesterday's Cloud Logging integration into a reusable subworkflow. That way, you'll be able to call the log subworkflow as many times as needed in your main workflow definition, without repeating yourself: ```yaml main:   steps:     - first_log_msg:         call: logMessage         args:           msg: "First message"     - second_log_msg:         call: logMessage         args:           msg: "Second message" logMessage:   params: [msg]   steps:     - log:         call: http.post         args:             url: https://logging.googleapis.com/v2/entries:write             auth:                 type: OAuth2             body:                 entries:                     - logName: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/logs/workflow_logger"}                       resource:                         type: "audited_resource"                         labels: {}                       textPayload: ${msg} ``` And voila! We called our `logMessage` subworkflow twice in our main workflow, just passing the text message to log into Cloud Logging. # Day 13 with Workflows — Logging with Cloud Logging Time to come back to our series on Cloud Workflows. 
Sometimes, for debugging purposes or for auditing, it is useful to be able to log some information via Cloud Logging. As we saw last month, you can [call HTTP endpoints]({{< ref "/posts/2020/12/15/day-8-with-workflows-calling-an-HTTP-endpoint" >}}) from your workflow. We can actually use [Cloud Logging's REST API](https://cloud.google.com/logging/docs/reference/v2/rest/v2/entries/write) to log such messages! Let's see that in action. ```yaml - log:     call: http.post     args:         url: https://logging.googleapis.com/v2/entries:write         auth:             type: OAuth2         body:             entries:                 - logName: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/logs/workflow_logger"}                   resource:                     type: "audited_resource"                     labels: {}                   textPayload: Hello World from Cloud Workflows! ``` We call the API endpoint to write new logging entries. We authenticate via OAuth2---as long as the service account used for the workflow execution allows it to use the logging API. Then we pass a JSON structure as the body of the call, indicating the name of the logger to use, which resources it applies to, and also the textPayload containing our text message. You could also use a ${} expression to log more complex values. Once this workflow definition is done and deployed, you can execute it, and you should see in the logs your message appear: ![](/img/workflows-days/w13-cloud-logging.png) Voila! You can log messages to Cloud Logging! Let's recap in this video: {{< youtube XwzSgBB6Kq4 >}} In the next episode, we'll take advantage of subworkflows, to create a reusable set of steps that you will be able to call several times throughout your workflow definition, without repeating yourself, by turning this logging example into a subworkflow. # Day 12 with Workflows — Loops and iterations In previous episodes of this Cloud Workflows series, we've learned about [variable assignment]({{< ref "/posts/2020/12/03/day-3-with-workflows-variable-assignment-and-expressions" >}}), data structures like [arrays]({{< ref "/posts/2020/12/09/day-6-with-workflows-arrays-and-dictionaries" >}}), [jumps]({{< ref "/posts/2020/12/02/day-2-with-workflows-a-workflow-is-made-of-steps-and-jumps" >}}) and [switch conditions]({{< ref "/posts/2020/12/04/day-4-with-workflows-jumping-with-switch-conditions" >}}) to move between steps, and [expressions]({{< ref "/posts/2020/12/03/day-3-with-workflows-variable-assignment-and-expressions" >}}) to do some computations, including potentially some built-in functions. With all these previous learnings, we are now equipped with all the tools to let us create loops and iterations, like for example, iterating over the element of an array, perhaps to call an API several times but with different arguments. So let's see how to create such an iteration! {{< youtube OXhV2cuKwo >}} First of all, let's prepare some variable assignments: ```yaml - define:     assign:         - array: ['Google', 'Cloud', 'Workflows']         - result: ""         - i: 0 ``` - The `array` variable will hold the values we'll be iterating over. - The `result` variable contains a string to which we'll append each values from the array. - And the `i` variable is an index, to know our position in the array. Next, like in a for loop of programming languages, we need to prepare a condition for the loop to finish. 
We'll do that in a dedicated step:

```yaml
- checkCondition:
    switch:
        - condition: ${i < len(array)}
          next: iterate
    next: returnResult
```

We define a `switch`, with a `condition` expression that compares the current index position with the length of the array, using the built-in `len()` function. If the condition is true, we'll go to an `iterate` step. If it's false, we'll go to the ending step (called `returnResult` here).

Let's tackle the iteration body itself. Here, it's quite simple, as we're just assigning new values to the variables: we append the i-th element of the array to the result variable, and we increment the index by one. Then we go back to the `checkCondition` step.

```yaml
- iterate:
    assign:
        - result: ${result + array[i] + " "}
        - i: ${i+1}
    next: checkCondition
```

Note that if we were doing something more convoluted, for example calling an HTTP endpoint with the element of the array as argument, we would need two steps: one for the actual HTTP endpoint call, and one for incrementing the index value. However, in the example above, we're only assigning variables, so we did the whole body of the iteration in this simple assignment step.

When going through the `checkCondition` step, if the condition is not met (i.e. we've reached the end of the array), then we're redirected to the `returnResult` step:

```yaml
- returnResult:
    return: ${result}
```

This final step simply returns the value of the `result` variable.

# Day 11 with Workflows — Sleeping in a workflow

Workflows are not necessarily instantaneous, and executions can span a long period of time. Some steps may launch asynchronous operations, which might take seconds or minutes to finish, but you are not notified when the process is over. So when you want to wait for something to finish, for example before polling again to check the status of the async operation, you can introduce a sleep operation in your workflow.

{{< youtube uaW_Cv3RCxQ >}}

To introduce a [sleep operation](https://cloud.google.com/workflows/docs/reference/syntax), add a step in the workflow with a call to the built-in sleep operation:

```yaml
- someSleep:
    call: sys.sleep
    args:
        seconds: 10
- returnOutput:
    return: We waited for 10 seconds!
```

A `sys.sleep` operation takes a `seconds` argument, where you can specify the number of seconds to wait. By combining conditional jumps and sleep operations, you can easily poll a resource or an API at a regular interval, to check whether an operation has completed.

# Day 10 with Workflows — Accessing built-in environment variables

[Google Cloud Workflows](https://cloud.google.com/workflows) offers a few built-in environment variables that are accessible from your workflow executions.

{{< youtube sJQWbo49pWg >}}

There are currently [5 environment variables](https://cloud.google.com/workflows/docs/reference/environment-variables) that are defined:

- `GOOGLE_CLOUD_PROJECT_NUMBER`: The workflow project's number.
- `GOOGLE_CLOUD_PROJECT_ID`: The workflow project's identifier.
- `GOOGLE_CLOUD_LOCATION`: The workflow's location.
- `GOOGLE_CLOUD_WORKFLOW_ID`: The workflow's identifier.
- `GOOGLE_CLOUD_WORKFLOW_REVISION_ID`: The workflow's revision identifier.
Let's see how to access them from our workflow definition: ```yaml - envVars:     assign:       - projectID: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}       - projectNum: ${sys.get_env("GOOGLE_CLOUD_PROJECT_NUMBER")}       - projectLocation: ${sys.get_env("GOOGLE_CLOUD_LOCATION")}       - workflowID: ${sys.get_env("GOOGLE_CLOUD_WORKFLOW_ID")}       - workflowRev: ${sys.get_env("GOOGLE_CLOUD_WORKFLOW_REVISION_ID")} - output:     return: ${projectID + " " + projectNum + " " + projectLocation + " " + workflowID + " " + workflowRev} ``` We use the built-in `sys.get_env()` function to access those variables. We'll revisit the various existing built-in functions in later episodes. Then when you execute this workflow, you'll get an output like this: ``` "workflows-days 783331365595 europe-west4 w10-builtin-env-vars 000001-3af" ``` There's one variable I'd like to see added to this list, that would be the current execution ID. That could potentially be useful for identifying a particular execution, when looking in the logs, to reason about potential failure, or for auditing purposes. # Day 9 with Workflows — Deploying and executing Workflows from the command-line So far, in this series on [Cloud Workflows](https://cloud.google.com/workflows), we've only used the Google Cloud Console UI to manage our workflow definitions, and their executions. But it's also possible to deploy new definitions and update existing ones from the command-line, using the [GCloud SDK](https://cloud.google.com/sdk/). Let's see how to do that! {{< youtube rVTPRUuidPI >}} If you don't already have an existing service account, you should create one following these [instructions](https://cloud.google.com/workflows/docs/creating-updating-workflow#gcloud). I'm going to use the workflow-sa service account I created for the purpose of this demonstration. Our workflow definition is a simple "hello world" like the one we created for [day #1]({{< ref "/posts/2020/12/01/day-1-with-workflows-your-first-step-to-hello-world" >}}) of our exploration of Google Cloud Workflows: ```yaml - hello:     return: Hello from gcloud! ``` To deploy this workflow definition, we'll launch the following gcloud command, specifying the name of our workflow, passing the local source definition, and the service account: ```bash $ gcloud beta workflows deploy w09-new-workflow-from-cli \     --source=w09-hello-from-gcloud.yaml \     --service-account=workflow-sa@workflows-days.iam.gserviceaccount.com ``` You can also add labels with the `--labels` flag, and a description with the `--description` flag, just like in the Google Cloud Console UI. If you want to update the workflow definition, this is also the same command to invoke, passing the new version of your definition file. Time to create an execution of our workflow! ```bash $ gcloud beta workflows run w09-new-workflow-from-cli ``` You will see an output similar to this: ```yaml Waiting for execution [d4a3f4d4-db45-48dc-9c02-d25a05b0e0ed] to complete...done. argument: 'null' endTime: '2020-12-16T11:32:25.663937037Z' name: projects/783331365595/locations/us-central1/workflows/w09-new-workflow-from-cli/executions/d4a3f4d4-db45-48dc-9c02-d25a05b0e0ed result: '"Hello from gcloud!"' startTime: '2020-12-16T11:32:25.526194298Z' state: SUCCEEDED workflowRevisionId: 000001-47f ``` Our workflow being very simple, it executed and completed right away, hence why you see the result string (our Hello from gcloud! message), as well as the state as SUCCEEDED. 
However, workflows often take longer to execute, consisting of many steps. If the workflow hasn't yet completed, you'll see its status as `ACTIVE` instead, or potentially `FAILED` if something went wrong. When the workflow takes a long time to complete, you can check the status of the last execution from your shell session with: ```bash $ gcloud beta workflows executions describe-last ``` If you want to know about the ongoing workflow executions: ```bash $ gcloud beta workflows executions list your-workflow-name ``` It'll give you a list of operation IDs for those ongoing executions. You can then inspect a particular one with: ```bash $ gcloud beta workflows executions describe the-operation-id ``` There are other operations on executions, to wait for an execution to finish, or even cancel an ongoing, unfinished execution. You can learn more about workflow execution in the [documentation](https://cloud.google.com/workflows/docs/executing-workflow). And in some upcoming episodes, we'll also have a look at how to create workflow executions from client libraries, and from the Cloud Workflows REST API. # Day 8 with Workflows — Calling an HTTP endpoint Time to do something pretty handy: calling an HTTP endpoint, from your Google Cloud Workflows definitions. Whether calling GCP specific APIs such as the ML APIs, REST APIs of other products like Cloud Firestore, or when calling your own services, third-party, external APIs, this capability lets you plug your business processes to the external world! Let's see calling HTTP endpoints in action in the following video, before diving into the details below: {{< youtube jyIonG-u4eM >}} By default, when creating a new workflow definition, a default snippet / example is provided for your inspiration. We'll take a look at it for this article. There are actually two HTTP endpoint calls, the latter depending on the former: the first step (`getCurrentTime`) is a cloud function returning the day of the week, whereas the second step (`readWikipedia`) searches Wikipedia for articles about that day of the week. ```yaml - getCurrentTime:     call: http.get     args:         url: https://us-central1-workflowsample.cloudfunctions.net/datetime     result: CurrentDateTime ``` The `getCurrentTime` step contains a call attribute of type `http.get`, to make HTTP GET requests to an API endpoint. You have the ability to do either `call: http.get` or `call: http.post`. For other methods, you'll have to do call: `http.request`, and add another key/value pair under `args`, with method: `GET`, `POST`, `PATCH` or `DELETE`. Under `args`, for now, we'll just put the URL of our HTTP endpoint. The last key will be the result, which gives the name of a new variable that will contain the response of our HTTP request. Let's call Wikipedia with our day of the week search query: ```yaml - readWikipedia:     call: http.get     args:         url: https://en.wikipedia.org/w/api.php         query:             action: opensearch             search: ${CurrentDateTime.body.dayOfTheWeek}     result: WikiResult ``` Same thing with `call`, and `args.url`, however, we have a query where you can define the query parameters for the Wikipedia API. Also note how we can pass data from the previous step function invocation: `CurrentDateTime.body.dayOfTheWeek`. We retrieve the body of the response of the previous call, and from there, we get the `dayOfTheWeek` key in the resulting JSON document. We then return `WikiResult`, which is the response of that new API endpoint call. 
```yaml - returnOutput:     return: ${WikiResult.body[1]} ``` Then, the last step is here to return the result of our search. We retrieve the body of the response. The response's body is an array, with a first term being the search query, and the second item is the following array of document names, which is what our workflow execution will return: ```json [ "Monday", "Monday Night Football", "Monday Night Wars", "Monday Night Countdown", "Monday Morning (newsletter)", "Monday Night Golf", "Monday Mornings", "Monday (The X-Files)", "Monday's Child", "Monday.com" ] ``` So our whole workflow was able to orchestrate two independent API endpoints, one after the other. Instead of having two APIs that are coupled via some messaging passing mechanism, or worse, via explicit calls to one or the other, Cloud Workflows is here to organize those two calls. It's the orchestration approach, instead of a choreography of services (see my previous article on [orchestration vs choreography]({{< ref "/posts/2020/11/18/orchestrating-microservices-with-cloud-workflows" >}}), and my colleague's article on [better service orchestration](https://cloud.google.com/blog/topics/developers-practitioners/better-service-orchestration-workflows) with Cloud Workflows). To come back to the details of API endpoint calls, here's their structure: ```yaml - STEP_NAME:     call: {http.get|http.post|http.request}     args:         url: URL_VALUE [method: REQUEST_METHOD] [headers:             KEY:VALUE ...] [body:             KEY:VALUE ...] [query:             KEY:VALUE ...] [auth:             type:{OIDC|OAuth2}] [timeout: VALUE_IN_SECONDS] [result: RESPONSE_VALUE] ``` In addition to the URL, the method and query, note that you can pass headers and a body. There is also a built-in mechanism for authentication which works with GCP APIs: the authentication is done transparently. You can also specify a timeout in seconds, if you want to fail fast and not wait forever a response that never comes. But we'll come back to error handling in some of our upcoming articles. # Day 7 with Workflows — Pass an input argument to your workflow All the workflow definitions we've seen so far, in this series, were self-contained. They were not parameterized. But we often need our business processes to take arguments (the ID of an order, the details of the order, etc.), so that we can treat those input values and do something about them. That's where workflow input parameters become useful! Let's start with a simple greeting message that we want to customize with a `firstname` and `lastname`. We'd like our workflow to look something like this: ```yaml - output:     return: ${"Your name is " + person.firstname + " " + person.lastname} ``` In the example above, we have a `person` variable, on which we're requesting the fields `firstname` and `lastname`. This is actually a dictionary. But how do we let Cloud Workflows know about this variable? We need to define it somehow. Workflow arguments are global to all the steps, so they need to be defined outside the scope of the steps themselves. Actually, workflows can be structured in sub-workflows: there's a main workflow, and possibly additional sub-workflows which are like routines or internal function definitions. We'll revisit the topic of sub-workflows in a later article. 
To declare our input parameter, we'll do it at the level of the main workflow, but in a more explicit fashion, with the following notation: ```yaml main:     params: [person]     steps:         - output:             return: ${"Your name is " + person.firstname + " " + person.lastname} ``` We explicitly show the name of our main workflow. We use the `params` instruction. Note that our single argument, `person`, is surrounded by square brackets. The main workflow can only take a single dictionary parameter, however, as we'll see later, sub-workflows can take several input arguments, hence the square brackets notation to specify a list of arguments. ![](/img/workflows-days/w7-input-argument.png) How do we pass this input argument? In the execution screen, in the input pane on the left, we create a JSON object, with a firstname and lastname keys. This JSON object is the dictionary in the person variable of our workflow definition. In this video, you'll see input arguments in action: {{< youtube 3dyKx2zBiXA >}} # Day 6 with Workflows — Arrays and dictionaries So far, in this series of articles on [Cloud Workflows](https://cloud.google.com/workflows), we have used simple data types, like strings, numbers and boolean values. However, it's possible to use more complex data structures, like [arrays](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#arrays) and [dictionaries](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#dictionaries). In this new episode, we're going to use those new structures. Arrays can be defined inline (like `anArray`) or spanning over several lines (like `anotherArray`): ```yaml - assignment:     assign:         - anArray: ["a", "b", "c"]         - anotherArray:             - one             - two - output:     return: ${anArray[0] + anotherArray[1]} ``` The output step will return the string `"atwo"`. For dictionaries, you can define them as follows: ```yaml - assignment:     assign:         - person:             firstname: "Guillaume"             lastname: "Laforge"             age: 43             kids: ["Marion", "Erine"] - output:     return: ${person.firstname + " and " + person.kids[1]} ``` The output step will return the string `"Guillaume and Erine"`. Notice that we nested an array within a dictionary. So you can easily create dictionaries containing arrays, containing other dictionaries, etc, just like any JSON or YAML structures. In the example we were able to access the second kid of the person, mixing both the field (dot) and index (square brackets) notations to access fields of our dictionary, and elements of our array. This video shows both arrays and dictionaries in action: {{< youtube 9JrqlV5s11Q >}} In the coming articles, we'll see that such data structures are handy for dealing with API endpoint calls. # Day 5 with Workflows — Visualizing the structure of your workflow definition So far, in our Cloud Workflows series, we have seen some of the YAML syntax for defining workflows. However, steps are defined after each other, as a series of step definitions, but in spite of the jump instructions, the conditionals, you don't really see visually what is going to be the next potential step in a workflow execution. Fortunately, a new UI enhancement has landed in the Google Cloud Console: the ability to visualize a workflow definition with a graph, when you're editing the definition. Furthermore, the graph is updated in quasi real-time as you make updates to the definition. 
![](/img/workflows-days/w05-graph.png) Let's see this in action in the video below: {{< youtube RQ11ATLxf3I >}} Thanks to this visualization, it's easier to further understand how your workflow definition is structured, how executions operate. You can more easily track which steps follows a particular step. Enjoy! # Day 4 with Workflows — Jumping with switch conditions In the previous articles about Google [Cloud Workflows](https://cloud.google.com/workflows), we talked about how to [assign variables, create expressions]({{< ref "/posts/2020/12/03/day-3-with-workflows-variable-assignment-and-expressions" >}}), and also how to [jump from a step to another]({{< ref "/posts/2020/12/02/day-2-with-workflows-a-workflow-is-made-of-steps-and-jumps" >}}). It's time to combine both aspects to understand how we can do conditional jumps, thanks to the `switch` instruction. Let's start with a first step defining a variable, whose value we'll use in our switch condition: ```yaml - assignement:     assign:         - number: 42 ``` Then we're going to create our second step that will use a `switch` instruction, with an expression: ```yaml - evaluate:     switch:         - condition: ${number > 100}           next: highValue         - condition: ${number < 100}           next: lowValue     next: end ``` We define two conditions with two expressions, checking if the number is above or below 100, then we go to a different step (`highValue` or `lowValue` steps). If none of the conditions are met, we go to the end of the workflow (or we could return some value or raise some error). We also need our two steps to go to: ```yaml - highValue:     return: "It's high!" - lowValue:     return: "It's rather low!" ``` And we're done! If the number is `42`, like in our case, the execution of the workflow will go through the lowValue step, and return the string saying that it's a low value. {{< youtube thSKszcLWSg >}} Thanks to [switch conditionals](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#jumps), with expressions and jumps, we can have non-linear logic in our workflow definitions. In upcoming articles, we will also have a look at how to use more complex data structures like arrays and dictionaries, and how to define inputs and outputs. # Day 3 with Workflows — Variable assignment and expressions Now that we have multiple steps in our workflow definition, let's see how we can pass data around, from a step to another. ![](/img/workflows-days/w03-var-assign.png) In a step, you can assign values to variables. Those values can be ints, doubles, strings, or booleans (and also null). Use the `assign` keyword as follows: ```yaml - assignments:     assign:         - two: 2         - pi: 3.14         - message: "Hello"         - bool: True ``` Those variables are available in the whole scope of the workflow, and can be accessed in other steps. So let's see how we can do something with those variables. Let's add a second step to our workflow definition: ```yaml - twoPi:     return: ${"Twice pi is " + string(two * pi)} ``` We are using the `${}` notation to create an expression. We're multiplying two numbers, we're converting them to a string, and we're concatenating two strings together, to get our final value. Note that not all operations are allowed on all types, so you might need to do some conversions with built-in [conversion functions](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#conversion_functions) like the `string()` function in our example. 
There are all sorts of arithmetic operators and boolean logic operators as well.

{{< youtube dygoGp_tcCk >}}

For more information, you can read about [variable assignments](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#assign-step), [data types](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#data_types), and [expressions](https://cloud.google.com/workflows/docs/reference/syntax?hl=en#expressions). Next time, we'll also have a look at more complex data types.

# Day 2 with Workflows — A workflow is made of steps and jumps

Let's continue our discovery of [Google Cloud Workflows](https://cloud.google.com/workflows)! Yesterday, we discovered the UI of Workflows. We [created our first workflow]({{< ref "/posts/2020/12/01/day-1-with-workflows-your-first-step-to-hello-world" >}}). We started with a single step, returning a greeting message:

```yaml
- sayHello:
    return: Hello from Cloud Workflows!
```

A workflow definition is made of steps. But not just one! You can create several steps. In YAML, the structure of your workflow will be something like:

```yaml
- stepOne:
    # do something
- stepTwo:
    # do something else
- sayHello:
    return: Hello from Cloud Workflows!
```

By default, steps are executed in the order they appear, from top to bottom. The execution finishes when you either return a value or reach the final step. If there's no return statement, a null value is returned as the result of the workflow execution.

A small step for a workflow execution... but you can also do a jump between steps! For that, you'll use the `next` instruction:

```yaml
- stepOne:
    next: stepTwo
- stepThree:
    next: sayHello
- stepTwo:
    next: stepThree
- sayHello:
    return: Hello from Cloud Workflows!
```

Here, we jump between steps, back and forth, before going to the final step that will return a value, and thus finish the execution of our workflow.

{{< youtube BTzb1m5pDXI >}}

Of course, we can go beyond a linear series of steps, and in subsequent articles, we'll see how we can create conditional jumps and switches for more complex logic, and how we can pass data and values between steps.

# Day 1 with Workflows — Your first step to Hello World

With more and more interconnected services, making sense of their interactions becomes critical. With Google [Cloud Workflows](https://cloud.google.com/workflows), developers can orchestrate and automate such complex systems by creating serverless workflows. In this series of articles, we will learn together how to use Google Cloud Workflows, and get to know all its features, with short and easy-to-read tutorials.

For our first day, we'll discover and use the Workflows UI in the [cloud console](https://console.cloud.google.com/). We will create a simple "hello world" workflow, consisting of a single step. Going further, in the coming days, we'll learn more about advanced features. But first things first!

{{< youtube 75BekrpL-qo >}}

In the Google Cloud console UI, you can locate Workflows in the `Tools` section of the hamburger menu:

![](/img/workflows-days/wf01-01-menu.png)

You can pin this menu item to have it stick at the top of the list.

The first time you access this page, you'll be greeted with the following screen, which will ask you to enable the Workflows API.
So just click on `ENABLE`:

![](/img/workflows-days/wf01-02-enable-api.png)

Once the API is enabled, you'll be on the Workflows home screen:

![](/img/workflows-days/wf01-03-workflows-home.png)

Click on the `CREATE` button to create your first workflow definition:

![](/img/workflows-days/wf01-04-empty-state.png)

Find a name for your workflow (it should start with a letter). You can provide an optional description. Currently, only "us-central1" is available for the beta of Workflows, but more regions will be available later on.

Notice that we have to select a service account that Workflows will use to call other Google Cloud APIs; here, however, there's a warning telling us that the project requires a service account. As I created a brand-new project, I didn't have any service account yet. If you had used, for example, Cloud Functions beforehand, a default service account would have been created. If you need to create a service account, you can create one in `IAM & Admin > Service Accounts`, then use that one.

My first workflow will be called `w01-first-workflow`:

![](/img/workflows-days/wf01-05-form-filled.png)

Move on to the next section with the `NEXT` button. That's where you will define your workflow:

![](/img/workflows-days/wf01-06-definition.png)

This first workflow consists of one single step, called `sayHello`, whose sole purpose is to return a hello world message:

```yaml
- sayHello:
    return: Hello from Cloud Workflows!
```

As you can see, workflow definitions are written using the [YAML](https://yaml.org/) configuration language. Click `DEPLOY` to deploy the workflow. You will then see the details of your new workflow. In the `executions` tab, you can see past executions.

![](/img/workflows-days/wf01-07-created.png)

In the `logs` section, you can see the logging messages associated with your workflow creation, deployment and executions:

![](/img/workflows-days/wf01-08-logs.png)

And in the `definitions` section, you can see the YAML description you just created:

![](/img/workflows-days/wf01-09-definition.png)

Now click on the `EXECUTE` button. You will see the input section (we'll learn about input arguments in an upcoming article), and the YAML definition. Click the other `EXECUTE` button:

![](/img/workflows-days/wf01-10-exec-screen.png)

You will see the result of your execution (`succeeded`, with other details about the execution), as well as both the input and the output, with our greeting message:

![](/img/workflows-days/wf01-11-executed.png)

And here you are! You created your first workflow definition, and launched the first execution of this workflow! In the coming days, we will have a closer look at the structure of workflow definitions (their steps), how to define input arguments, and also how to create an execution of a workflow from the command line.

# Orchestrating microservices with Google Cloud Workflows

The trend toward splitting a monolith into fine-grained, loosely-coupled microservices has its merits. It allows us to scale parts of an application more easily, and teams become more effective within their focused perimeter. However, in a chain or graph of services interacting with each other via message buses or other eventing mechanisms, it becomes difficult to understand when things start to break. Business processes spanning those services end up in limbo, and the detective work starts to figure out how to get back on track.

**Choreography**: like a bunch of dancers on the floor composing a ballet.
Loosely-coupled microservices compose business processes without really being aware of each other, casually interacting by receiving and sending messages or events.

**Orchestration**: more like the conductor of an orchestra who directs musicians and their instruments to play each part.

Using a higher-level solution that purposefully invokes and tracks each individual service enables developers to know what the current state of a business process is.

Both approaches have their pros and cons. The loosely-coupled nature of choreography certainly enables agility, but business processes are harder to follow. Although orchestration adds a single point of failure, with its orchestrator tying all the pieces together, it brings clarity to the spaghetti of myriads of microservices.

In addition to GCP's existing messaging ([Cloud Pub/Sub](https://cloud.google.com/pubsub/)) and eventing solutions ([Eventarc](https://cloud.google.com/blog/products/serverless/build-event-driven-applications-in-cloud-run)) for your service choreography, the newly launched [Cloud Workflows](https://cloud.google.com/workflows) product is tackling the orchestration approach. Cloud Workflows is a scalable, fully-managed, serverless system that automates and coordinates services, takes care of error handling and retries on failure, and tells you if the overall process has finished.

In this short video, during the "demo derby" at Google Cloud Next OnAir, I had the chance to present a demo of Cloud Workflows, with some concrete examples:

{{< youtube id="_4fo_u5rY_8" t="7164s" >}}

In this video, I started with the proverbial Hello World, using the YAML syntax for defining workflows:

```yaml
- hello:
    return: "Hello from Cloud Workflows!"
```

I defined a `hello` step, whose sole purpose is to return a string as the result of its execution. Next, I showed that workflow definitions can take arguments, and also return values thanks to more complex expressions:

```yaml
main:
    params: [args]
    steps:
        - returnGreeting:
            return: ${"Hello " + args.first + " " + args.last}
```

Cloud Workflows is able to invoke any HTTP-based service (and supports OAuth2 and OIDC), whether in Google Cloud or outside (on-premises, or on other servers). Here, I invoke two Cloud Functions:

```yaml
- getRandomNumber:
    call: http.get
    args:
        url: https://us-central1-myprj.cloudfunctions.net/randomNumber
    result: randomNumber
- getNthPoemVerse:
    call: http.get
    args:
        url: https://us-central1-myprj.cloudfunctions.net/theCatPoem
        query:
            nth: ${randomNumber.body.number}
    result: randomVerse
- returnOutput:
    return: ${randomVerse.body}
```

The `getRandomNumber` step calls a function that returns a random number with an HTTP GET, and stores the result of that invocation in the `randomNumber` variable. The `getNthPoemVerse` step calls another function that takes a query parameter, whose value comes from the `randomNumber` variable holding the result of the previous function invocation. The `returnOutput` step then returns the resulting value.
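As a side note on the OAuth2 and OIDC support mentioned above, here is a minimal sketch of what an authenticated call could look like, reusing the same hypothetical function URL; the `auth` block asks Workflows to attach an identity token to the request (OIDC being the flavor typically used for Cloud Functions and Cloud Run):

```yaml
- getRandomNumberAuthenticated:
    call: http.get
    args:
        url: https://us-central1-myprj.cloudfunctions.net/randomNumber
        auth:
            type: OIDC
    result: randomNumber
```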
My fourth example shows variable assignment and conditional switches in action:

```yaml
- getRandomNumber:
    call: http.get
    args:
        url: https://us-central1-myprj.cloudfunctions.net/randomNumber
    result: randomNumber
- assign_vars:
    assign:
        - number: ${int(randomNumber.body.number)}
- conditionalSwitch:
    switch:
        - condition: ${number < 33}
          next: low
        - condition: ${number < 66}
          next: medium
    next: high
- low:
    return: ${"That's pretty small! " + string(number)}
- medium:
    return: ${"Hmm, okay, an average number. " + string(number)}
- high:
    return: ${"It's a big number! " + string(number)}
```

Reusing the random-number function from the previous example, notice how variables are assigned, how a switch with multiple conditions is created, and how the execution of the workflow is redirected to different steps depending on the outcome of the switch.

But there's really more to this! You can double-check the [syntax reference](https://cloud.google.com/workflows/docs/reference/syntax) to see all the constructs you can use in your workflow definitions.

### Summary

Cloud Workflows:

- Orchestrate Google Cloud and HTTP-based API services into serverless workflows
- Automate complex processes
- Fully managed service, requiring no infrastructure or capacity planning
- Fast scalability, supporting scale-down to zero and a pay-per-use pricing model

In terms of features:

- Reliable workflow execution
- Built-in error handling
- Passing variable values between workflow steps
- Built-in authentication for Google Cloud products
- Low latency of execution
- Support for external API calls
- Built-in decisions and conditional step executions
- Cloud Logging

If you want to get started with [Cloud Workflows](https://cloud.google.com/workflows), you can head over to this [hands-on codelab](https://codelabs.developers.google.com/codelabs/cloud-workflows-intro#0) from my colleague [Mete Atamel](https://twitter.com/meteatamel). Learn more by watching this [longer video](https://www.youtube.com/watch?v=Uz8G8fTwwXs) by Product Manager Filip Knapik, who dives into Cloud Workflows. In upcoming articles, we'll come back to Workflows in more detail, diving into some more advanced features, or how to migrate a choreographed example into an orchestrated one. So, stay tuned!

# The developer advocacy feedback loop

For one of the [closing keynotes](https://2020.devrel.net/speaker/guillaume-laforge/) of [DevRelCon Earth 2020](https://2020.devrel.net/), I spoke about what I call the Developer Advocacy Feedback Loop. People often think about developer relations and advocacy as just being about external outreach. However, there's more to it! Developer Advocates are here to represent users, developers, technical practitioners, to influence the roadmap and development of the services and products to suit their needs. That's the internal advocacy that loops back into improving the products.

Without further ado, let me share with you the slide deck here:

{{< speakerdeck c9aa5be6e7534443963a5b981ef0b713 >}}

Let me paraphrase what I presented in this talk. For the past 4 years, I've been a Developer Advocate for Google, focusing on Google Cloud, and especially our serverless solutions (like App Engine, Cloud Functions, Cloud Run). I fell into the magic potion of advocacy inadvertently, a long time ago, while working on an open source project: the [Apache Groovy](http://groovy-lang.org/) programming language.
I was leading the project, but at the same time, I was also evangelising it at events and through articles, and I was trying to incorporate the feedback I was getting in the field back into the project. I was doing advocacy without really realizing it, like [Mr Jourdain](https://literature.stackexchange.com/questions/11844/meaning-of-the-prose-of-monsieur-jourdain) in Molière's play who was speaking in prose without knowing it. But I really saw a loop, a feedback loop, in the process: in how you spread the word about technologies, but also in how you can listen to the feedback and improve your product.

![](/img/developer-advocacy-feedback-loop/01+-+feedback+loop.png)

If you've studied electronics, you might have seen such diagrams about the feedback loop. There's something in input, something in output, but there's a loop back that brings some of the output back into the input channel. To make the parallel with advocacy: advocacy is not just a one-way monologue, it's a conversation. You're here to tell a story to your kids, for example, but you listen to feedback from the audience on how to make your story even better. Not just how you tell the story (better intonation, pauses), but really improving the plot, the characters, the setting, everything, perhaps making up a totally different story in the end!

![](/img/developer-advocacy-feedback-loop/03+-+devrel+multitude.png)

Let me start with a short disclaimer. If you ask this room to give a definition of developer relations, or developer advocacy, or evangelism (a term I avoid because of its connotations), you'll get as many answers as there are attendees. I don't claim I have THE ultimate definitions for these concepts and approaches. I don't claim those things are the same, or that they are different. And anyway, there's not just one way to do it, there's a multitude of ways. There are many things we do the same way, but I'm sure there are many incredible things you do that I'm not even aware of, and that I'd like to learn more about! But I'll tell you how I am doing developer advocacy, and where this feedback loop comes into play.

![](/img/developer-advocacy-feedback-loop/02+-+who+are+we.png)

So, who are we? DevRel is not always the same thing everywhere, in every company. And there's not just one way to do DevRel.

Are we salespeople? Not necessarily: I don't get any bucks when I indirectly help sign a new customer deal. My metrics are more about the number of developers reached, or views on my articles or videos, or Twitter impressions.

So are we marketing people? Well, I have some similar metrics for sure, and I advertise the products or company I represent, but my goal is that my audience (the technical practitioners) be successful, even if they end up not using my technology. I want my audience to even advocate themselves for those products if possible (if the product is good and makes sense for them).

Are we engineers? In my case, yes I am, I'm even in the Engineering org chart, and to show more empathy towards our engineer users, it's easier if we're engineers ourselves. We speak the same language. We're part of the same community. We have the same tool belt. Also, as an engineer, I can even sometimes contribute to the products I talk about. But it's not because you're not an engineer that you can't succeed and be a good advocate! Empathy is really key in this role, probably more so than engineering chops.

Or are we PMs? In a previous life, in a small startup, I was actually wearing two hats: PM & DA.
But it's tough to do two jobs like these at the same time. As a DA (without being a PM), with my contributions and my feedback from the field, from the community I advocate for, I do influence the roadmap of our products, for sure. But I'm only a part of the equation. However, providing critical product feedback is super important in my job. That's the key aspect of the developer advocacy feedback loop!

Perhaps we're just international travelers, measured by the number of visa stamps in our passports? Ah well, maybe. Or maybe not: we try to be greener, and with COVID-19, things have changed recently! The pandemic redefines our job, our duties, our ways to communicate. There's lots we can do in the comfort of our home office too.

Ultimately, we're all different, but we all have myriads of ways to contribute and reach our goal. Some of us may be focusing more on awesome video tutorials, some on organizing hours-long hackathons, and others will be awesome beta-testers for our products, write crystal-clear code samples or SDKs, etc. There's not just one way to be a great Developer Advocate. You don't need to do it all. And we're a team. So we complement each other with our respective strengths. And we work with others too, like marketing, sales, consulting, tech writers, leadership.

![](/img/developer-advocacy-feedback-loop/04+-+our+goal.png)

What do we do, what's our goal? We are empowering our users to reach their goals. We want to make them successful. We're enabling customer success. We're driving mindshare in the field, in our communities. We are making our users happy!

![](/img/developer-advocacy-feedback-loop/05+-+our+tools.png)

How do we make our community, our users, our customers successful? There are many tools for that. Some of the most well-known tools we can use are outward facing: external outreach (talks, articles, videos, etc.) But to make our communities more successful, we also need to get our products improved. That's where we create the feedback loop: with our internal influence, thanks to some tools I'll enumerate, we can help make the products better, by bringing our users' feedback up the chain to the PMs, Product Leads, etc. Let me show you.

![](/img/developer-advocacy-feedback-loop/06+-+the+advocacy+loop.png)

Let me introduce you to the personas of my story, of my feedback loop.

![](/img/developer-advocacy-feedback-loop/07+-+the+personnas.png)

At the top, you have the product leadership, the PMs, CxOs, and SWEs. At the bottom, that's our users, our customers, our technical practitioners. And in the middle, in between, there's you, the Developer Advocate.

![](/img/developer-advocacy-feedback-loop/08+-+company.png)

But in a way, there are two teams. Here, in the white cloud at the top, that's your company.

![](https://glaforge.appspot.com/media/09+-+community.png)

But at the bottom, that's your community, with your users. You're not just part of the company, you're also part of the community. You are the advocate for your users, representing them to the product leadership, so that their voice is heard!

![](/img/developer-advocacy-feedback-loop/10+-+external+outreach.png)

That's the external outreach. What some call evangelism, the outward part of the loop. You're the voice of the company. You spread the word about your cool technology. You're creating great demos, code samples, polished videos. You're writing helpful articles, useful tutorials, readable documentation. You're attending and presenting at events to talk about the products.
You're helping users succeed by answering questions on social media, StackOverflow, or other forums. ![](/img/developer-advocacy-feedback-loop/11+-+internal+advocacy.png) What makes it a feedback loop is this part. It's not just a by-product of the external outreach. It's an integral part of the advocacy work. There's the obvious stuff like filing bugs, or being a customer zero by testing the product before it launches. But things like writing trip reports, friction logs, customer empathy sessions may be new to you. If you can, make it a habit to produce such artifacts. And you can list, and track, and report about all those feedback elements that you bring upstream, and check how it's being enacted or not. Often people think about us mostly for the outreach part, the arrow going downward toward our community. They can think we're just kind of marketing puppets. And I've seen conference organisers complaining they only got "evangelists" at their show, when they wanted "real engineers" instead, working on the products or projects. But frankly, they are not necessarily always the best at explaining their own projects! Folks often forget that we're here to make them successful, and report their feedback, their needs, to advocate for them, and to influence the decision makers to make better products that fill the needs of those users. Both parts are critical! And please pay attention to that feedback loop, to that arrow going back to the top of the slide, to the leadership. So let's see some concrete examples of the things you can put in place to provide feedback, and show that DevRel is important and has a strong impact. ![](/img/developer-advocacy-feedback-loop/12+-+friction+logs.png) To make developers happy, you need to remove as much friction as possible. You want the developer experience to be as smooth as possible. You might need to work with UX designers and PMs directly for that. But you can also report about your findings, where you saw friction by writing a friction log. Last week, my colleague Emma [spoke about this](https://www.youtube.com/watch?v=765wLWVcyS0) at DevRelCon Earth, and another great colleague, Aja, wrote about [friction logging](https://devrel.net/developer-experience/an-introduction-to-friction-logging) on DevRel.net a while ago. Great resources to check out! ![](/img/developer-advocacy-feedback-loop/13+-+example+friction+log.png) I'm going to show you a real friction log. There's some metadata about the environment, date, user name, scenario title, etc. You're reporting about some concrete use case you were trying to implement (an app you were building, a new API you were trying to use, etc.) You document all the steps you followed, and tell what worked or not, how you expected things to work out. This document will be shared broadly via an alias which pings most PMs, tech leads, etc. So it's very visible. ![](/img/developer-advocacy-feedback-loop/14+-+friction+log+color+coding.png) But the key thing here is the color coding aspect. You show where there's friction, where there's frustration, where you'd quit if you were a real user. But also, it's super important to highlight what worked well, what surprised you, what delighted you. It's not just about the negative things. ![](/img/developer-advocacy-feedback-loop/15+-+tagging+stakeholders.png) And the last trick to make it effective: add comments, tagging key stakeholders (PMs, Tech Leads, etc), so they really acknowledge the problem. Create associated bug requests, and track them to check if progress is made. 
![](/img/developer-advocacy-feedback-loop/16+-+friction+vlog.png)

My colleague [Zack](https://twitter.com/ZackAkil) even developed an application for creating friction vlogs: video friction logs! With a video, you can show your frustration concretely (but perhaps don't swear too much). A video shows where you struggle, where you lose time. You can navigate to various sections in the video, and annotate those sections with the green / orange / red color coding scheme. The tool also creates a classical written friction log document as well. I found that application pretty neat, to be honest.

![](/img/developer-advocacy-feedback-loop/17+-+advocacy+reporting.png)

You can apply the same approach to other kinds of reporting activities. We often write reports for our trips, events, meetups, customer engagements. In particular, although we're not sales people, we're trying to show that we also have an impact on sales. Customers love having DevRel people come and show cool stuff! And we can collect and show the feedback coming from the field to the leadership. It's not just us sharing our own impressions and ideas, it's really coming from someone else's mouth, so it has more weight in the conversation.

I'd like to highlight our internal advocacy reporting: we have someone on the team who collects all our bug reports (and includes them in bug hotlists), all our friction logs, and our trip reports, and who actively tracks how this feedback is taken into account. It's a very effective way of showing the leadership that we do have impact, beyond the usual metrics. And by the way, even those DevRel product feedback reports make use of the color coding we have in friction logs. So it's a very familiar thing for all our engineering team.

![](/img/developer-advocacy-feedback-loop/18+-+customer+empathy+sessions.png)

Another interesting thing we're running is what we call customer empathy sessions, a concept invented by my colleague [Kim](https://twitter.com/kmbannerman). Gather various PMs, SWEs, DevRel people, and potentially customers (though that's not mandatory) in the same room (or virtually nowadays), and craft some concrete scenarios of something you'd like them to build in small groups (but where you know there's going to be lots of friction). With teams of 3 or more, each one has a role: a driver, a scribe, and a searcher. Have them do the task. Then compare notes at the end.

It's a bit like creating new Critical User Journeys that have not been addressed, that exhibit a lot of friction. But this time the engineers and the PMs will really feel the very same frustration our customers can potentially feel when they can't accomplish their tasks. The various teams often work in silos, on a particular aspect, and avoid certain paths (when you know you shouldn't click somewhere, you won't do it, you'll use the other path you know works). But customer empathy sessions are here to show what our users have to go through in real scenarios, beyond a handful of critical journeys. In summary, feel the pain, and show empathy toward your customers! Really, I can't stress this enough: empathy is key here, and a real driver for positive change.

![](/img/developer-advocacy-feedback-loop/19+-+hallway+track+office+hours.png)

We can do scalable advocacy, by creating things like videos that are broadcast to many thousands of viewers, and that have a long shelf life, much less ephemeral than a conference talk.
But sometimes, it's also good to do things that actually don't scale. Helping one single person can make a big difference: at a conference, after my talk, I had a long conversation with an attendee who had a particular need. I onboarded them onto our early access program for a new product, which seemed to be what they needed. They could provide key feedback to our PMs and engineers, and they helped us get that new product ready with a real use case. The next year, that attendee was a key customer, who even came on stage to talk about the product. So I won both a new customer and a new advocate for that product. The hallway track at events is very important, and that's the kind of feedback signal I'm missing in these times of pandemic.

Another approach is office hours: you set up some time slots in your calendar, and anyone can book time with you. That's a great way to get feedback, and to see what problems users are facing. I haven't tried that myself, as I'm a bit shy, and afraid someone would ask questions on topics I don't know much about! But it's very effective, and I have several colleagues doing that who are learning along the way.

![](/img/developer-advocacy-feedback-loop/20+-+create+new+products.png)

Sometimes, your community, your users, will highlight a gap in your product portfolio. And it might give you some ideas for a product that would delight those people, who could become customers if you had that product. That's actually how some of my colleagues went on to create totally new products, for example for gaming companies, or for secret management. On another occasion, as I had strong ideas about what a new product runtime should look like, I went on to design and prototype an API that our users would use. Somehow, it's a bit like being the change you want to see in the world! And the API I designed, further improved with the engineering team, is now an API our customers are using today.

![](/img/developer-advocacy-feedback-loop/21+-+pudding.png)

Time to wrap up. Often, the proof is in the pudding. It's not just about our intuitions or our own personal experience. You need to gather feedback, in particular concrete customer feedback, to prove that you're right. And when it's a customer with some money to spend, usually product leadership listens.

![](/img/developer-advocacy-feedback-loop/22+-+roses.png)

Sometimes, it's all roses and blooms! Our feedback, ideas, features are implemented! Woohoo! Success!

![](/img/developer-advocacy-feedback-loop/23+-+brick+wall.png)

There's the ideal world where we indeed influence products, but sometimes we also hit a brick wall, a dead end: we're not at the helm, and our feedback is not taken into account. Our companies can be big, work in silos, and it's sometimes a struggle to find the right people who are able to listen to us and to get change enacted. Be resilient, don't let it affect you personally, but come back to it if you really think it's important for your community!

![](/img/developer-advocacy-feedback-loop/06+-+the+advocacy+loop.png)

Remember: we're in it together! It's a team effort. Let's make our users happy! And how do we make them happy? By making great products, with great user and developer experience. By showing empathy toward our users, wearing their shoes, listening to their feedback, and letting that feedback be heard up above to improve our products, by advocating for our users. That's where the feedback loop closes. Thanks for your attention.
# Running Micronaut serverlessly on Google Cloud Platform

Last week, I had the pleasure of presenting [Micronaut](https://micronaut.io/) in action on Google Cloud Platform, via a [webinar](https://objectcomputing.com/products/micronaut/resources/serverless-micronaut-on-google-cloud) organized by OCI. In particular, I focused on the serverless compute options available: [Cloud Functions](https://cloud.google.com/functions), [App Engine](http://cloud.google.com/appengine), and [Cloud Run](https://cloud.google.com/run). Here are the slides I presented. However, the real meat is in the demos, which are not shown in this deck! So let's have a closer look at them while we wait for the video to be published online.

{{< speakerdeck 9da7fe86bd4047508effb1ae34af5ed0 >}}

On Google Cloud Platform, you have three solutions when you want to deploy your code in a serverless fashion (i.e. hassle-free infrastructure, automatic scaling, pay-as-you-go):

- For event-oriented logic that reacts to cloud events (a new file in Cloud Storage, a change in a database document, a Pub/Sub message), you can go with a function.
- For a web frontend, a REST API, a mobile API backend, and also for serving static assets for single-page apps, App Engine is going to do wonders.
- But you can also decide to containerize your applications and run them as containers on Cloud Run, for all kinds of needs.

Both Cloud Functions and App Engine provide a Java 11 runtime (the latest LTS version of Java at the time of writing), but with Cloud Run, in a container, you can of course package whichever Java runtime environment you want. And the good news is that you can run Micronaut easily on all three environments!

## Micronaut on Cloud Functions

### HTTP functions

Of those three solutions, Cloud Functions is the one that received special treatment, as the Micronaut team worked on a [dedicated integration](https://micronaut-projects.github.io/micronaut-gcp/2.0.x/guide/#cloudFunction) with the [Functions Framework API](https://github.com/GoogleCloudPlatform/functions-framework-java) for Java. Micronaut supports both types of functions: HTTP and background functions.

For HTTP functions, you can use a plain Micronaut controller. Your usual controllers can be turned into an HTTP function.

```java
package com.example;

import io.micronaut.http.annotation.*;

@Controller("/hello")
public class HelloController {

    @Get(uri="/", produces="text/plain")
    public String index() {
        return "Micronaut on Cloud Functions";
    }
}
```

The Micronaut Launch tool even allows you to create a dedicated scaffolded project with the right configuration (i.e. the right Micronaut integration JAR and the Gradle configuration, including for running functions locally on your machine). Pick the application type in the Launch configuration, and add the `google-cloud-function` module.

In `build.gradle`, Launch will add the Functions Framework's invoker dependency, which allows you to run your functions locally on your machine (it's also the framework that is used in the cloud to invoke your functions, i.e.
the same portable and open source code):

```groovy
invoker("com.google.cloud.functions.invoker:java-function-invoker:1.0.0-beta1")
```

It adds the Java API of the Functions Framework, as `compileOnly` since it's provided by the platform when running in the cloud:

```groovy
compileOnly("com.google.cloud.functions:functions-framework-api")
```

And Micronaut's own GCP Functions integration dependency:

```groovy
implementation("io.micronaut.gcp:micronaut-gcp-function-http")
```

There's also a new task called `runFunction`, which allows you to run your function locally:

```bash
./gradlew runFunction
```

If you decide to use Maven, the same dependencies are applied to your project, and a dedicated Maven plugin is provided to run functions locally:

```bash
./mvnw function:run
```

To deploy your HTTP function, you can learn more about the topic in the [documentation](https://cloud.google.com/functions/docs/deploying). If you deploy with the gcloud command-line SDK, you will use a command similar to the following one (depending on the region or the size of the instance you want to use):

```bash
gcloud functions deploy hello \
    --region europe-west1 \
    --trigger-http --allow-unauthenticated \
    --runtime java11 --memory 512MB \
    --entry-point io.micronaut.gcp.function.http.HttpFunction
```

Note that Cloud Functions can build your functions from sources when you deploy, or it can deploy a pre-built shadow JAR (as configured by Launch).

### Background functions

For background functions, in Launch, select the Micronaut serverless function type. Launch will create a class implementing the `BackgroundFunction` interface from the Functions Framework API, and extending the `GoogleFunctionInitializer` class from Micronaut's function integration, which takes care of all the usual wiring (like dependency injection). This function by default receives a Pub/Sub message, but there are other types of events that you can receive, like when a new file is uploaded to Cloud Storage, or a document is created or changed in the Firestore NoSQL document database, etc.

```java
package com.example;

import com.google.cloud.functions.*;
import io.micronaut.gcp.function.GoogleFunctionInitializer;

import javax.inject.*;
import java.util.*;

public class PubSubFunction extends GoogleFunctionInitializer
        implements BackgroundFunction<PubSubMessage> {

    @Inject LoggingService loggingService;

    @Override
    public void accept(PubSubMessage pubsubMsg, Context context) {
        String textMessage = new String(Base64.getDecoder().decode(pubsubMsg.data));
        loggingService.logMessage(textMessage);
    }
}

class PubSubMessage {
    String data;
    Map<String, String> attributes;
    String messageId;
    String publishTime;
}

@Singleton
class LoggingService {
    void logMessage(String txtMessage) {
        System.out.println(txtMessage);
    }
}
```

When deploying, you'll define a different trigger. For example, here it's a Pub/Sub message, so you'll use a `--trigger-topic TOPIC_NAME` flag to tell the platform you want to receive messages on that topic. For deployment, the gcloud command would look as follows:

```bash
gcloud functions deploy pubsubFn \
    --region europe-west1 \
    --trigger-topic TOPIC_NAME \
    --runtime java11 --memory 512MB \
    --entry-point com.example.PubSubFunction
```

## Micronaut on App Engine

Micronaut also deploys just fine on App Engine. I already [wrote about it]({{< ref "/posts/2019/07/04/getting-started-with-micronaut-on-google-app-engine-java-11" >}}) in the past.
If you're using Micronaut Launch, just select the Application type. App Engine allows you to deploy the standalone runnable JARs generated by the configured shadow JAR plugin. But if you want to easily stage your application deliverable, run the application locally, and deploy, you can also use the Gradle App Engine plugin. For that purpose, you should add the following build script section in `build.gradle`:

```groovy
buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'com.google.cloud.tools:appengine-gradle-plugin:2.3.0'
    }
}
```

And then apply the plugin with:

```groovy
apply plugin: 'com.google.cloud.tools.appengine'
```

Before packaging the application, there's one extra step you need to go through: adding the special App Engine configuration file, `app.yaml`. You only need one line, unless you want to further configure the instance type, specify some JVM flags, point at static assets, etc. Otherwise, this single line in `src/main/appengine/app.yaml` is enough:

```yaml
runtime: java11
```

Then, stage your application deliverable with:

```bash
./gradlew appengineStage
```

Change into the staging directory, and you can deploy with the plugin or with the gcloud SDK:

```bash
cd build/staged-app/
gcloud app deploy
```

During the demonstration, I showed a controller that was accessing some data from the [Cloud Firestore](https://cloud.google.com/firestore/) NoSQL database, listing some pet names:

```java
package com.example;

import java.util.*;
import com.google.api.core.*;
import com.google.cloud.firestore.*;
import io.micronaut.http.annotation.*;

@Controller("/")
public class WelcomeController {

    @Get(uri="/", produces="text/html")
    public String index() {
        return "Hello Google Cloud!";
    }

    @Get(uri="/pets", produces="application/json")
    public String pets() throws Exception {
        StringBuilder petNames = new StringBuilder().append("[");

        FirestoreOptions opts = FirestoreOptions.getDefaultInstance();
        Firestore db = opts.getService();

        ApiFuture<QuerySnapshot> query = db.collection("pets").get();
        QuerySnapshot querySnapshot = query.get();
        List<QueryDocumentSnapshot> documents = querySnapshot.getDocuments();
        for (QueryDocumentSnapshot document : documents) {
            petNames.append("\"")
                .append(document.getString("name"))
                .append("\", ");
        }
        return petNames.append("]").toString();
    }
}
```

## Micronaut on Cloud Run

### Building a Micronaut container image with Jib

In a previous article, I talked about how to try [Micronaut with Java 14 on Google Cloud]({{< ref "/posts/2020/03/24/start-the-fun-with-java-14-and-micronaut-inside-serverless-containers-on-cloud-run" >}}). I was explaining how to craft your own `Dockerfile`, instead of the one generated by default by Micronaut Launch at the time (it now uses `openjdk:14-alpine`). But instead of fiddling with Docker, in my demos, I thought it was cleaner to use Jib. [Jib](https://github.com/GoogleContainerTools/jib) is a tool to create cleanly layered container images for your Java applications, without requiring a Docker daemon. There are plugins available for Gradle and Maven; I used the Gradle one by configuring my `build.gradle` with:

```groovy
plugins {
    ...
    id "com.google.cloud.tools.jib" version "2.4.0"
}
```

And by configuring the `jib` task with:

```groovy
jib {
    to {
        image = "gcr.io/serverless-micronaut/micronaut-news"
    }
    from {
        image = "openjdk:14-alpine"
    }
}
```

The `from`/`image` line defines the base image to use, and the `to`/`image` one points at the location in Google Cloud Container Registry where the image will be pushed. We can then point Cloud Run at this image for deployment:

```bash
gcloud config set run/region europe-west1
gcloud config set run/platform managed
./gradlew jib
gcloud run deploy news --image gcr.io/serverless-micronaut/micronaut-news --allow-unauthenticated
```

## Bonus points: Server-Sent Events

In the demo, I showed the usage of [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events). Neither Cloud Functions nor App Engine support any kind of streaming, as there's a global frontend server in the Google Cloud infrastructure that buffers requests and responses. But Cloud Run supports streaming (HTTP/2 streaming, gRPC streaming, server-sent events, and WebSocket streaming). So that was a great excuse to play with Micronaut's SSE support. I went with a slightly modified example from the documentation, to emit a few string messages a second apart:

```java
package com.example;

import io.micronaut.http.MediaType;
import io.micronaut.http.annotation.*;
import io.micronaut.http.sse.Event;
import io.micronaut.scheduling.TaskExecutors;
import io.micronaut.scheduling.annotation.ExecuteOn;
import io.reactivex.Flowable;
import org.reactivestreams.Publisher;

@Controller("/news")
public class NewsController {

    @ExecuteOn(TaskExecutors.IO)
    @Get(produces = MediaType.TEXT_EVENT_STREAM)
    public Publisher<Event<String>> index() {
        String[] ids = new String[] { "1", "2", "3", "4", "5" };
        return Flowable.generate(() -> 0, (i, emitter) -> {
            if (i < ids.length) {
                emitter.onNext(
                    Event.of("Event #" + i)
                );
                try { Thread.sleep(1000); } catch (Throwable t) {}
            } else {
                emitter.onComplete();
            }
            return ++i;
        });
    }
}
```

Then I accessed the `/news` controller and was happy to see that the response was not buffered and that the events were showing up every second. Apart from getting on board with this alpha feature of Cloud Run (via the form needed to get my GCP project whitelisted), I didn't have to do anything special to my Micronaut setup from the previous section. No further configuration required, it just worked out of the box.

## Summary

The great benefit of using Micronaut on Google Cloud Platform's serverless solutions is that, thanks to Micronaut's ahead-of-time compilation techniques, it starts and runs super fast, and consumes much less memory than other Java frameworks. Further down the road, you can also take advantage of GraalVM for even faster startup and lower memory usage. Although my examples were in Java, you can also use Kotlin or Groovy if you prefer.

# Video: Getting started with Java on Google Cloud Functions

For the 24 hours of talks by Google Cloud DevRel, I recorded my talk about the new Java 11 runtime for Google Cloud Functions.
I wrote about this runtime in this [article](http://glaforge.appspot.com/article/deploying-serverless-functions-in-groovy-on-the-new-java-11-runtime-for-google-cloud-functions), showing for example how to run [Apache Groovy](https://groovy-lang.org/) functions, and I also covered it on the [GCP blog](https://cloud.google.com/blog/products/application-development/introducing-java-11-on-google-cloud-functions) and the [Google Developers blog](https://developers.googleblog.com/2020/05/java-11-for-cloud-functions.html).

In this video, I give a quick explanation of the serverless approach and the various serverless options provided by Google Cloud, and then I dive into the various shapes Java functions can take (HTTP and background functions) and the interfaces you have to implement when authoring a function. I also do various demonstrations, deploying Java functions, Groovy functions, or Micronaut functions!

{{< youtube d7eYFI-jbcc >}}

# Introducing Java 11 on Google Cloud Functions

The Java programming language recently turned 25 years old, and it's still one of the most-used languages powering today's enterprise applications. On Google Cloud, you can already run serverless Java microservices in App Engine and Cloud Run. Today we're bringing Java 11 to Google Cloud Functions, an event-driven serverless compute platform that lets you run your code locally or in the cloud without having to provision servers. That means you can now write Cloud Functions using your favorite JVM languages ([Java](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/helloworld), [Kotlin](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/kotlin-helloworld), [Groovy](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/groovy-helloworld), [Scala](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/scala-helloworld), etc) with our [Functions Framework for Java](https://github.com/GoogleCloudPlatform/functions-framework-java), and also with [Spring Cloud Functions](https://cloud.spring.io/spring-cloud-static/spring-cloud-function/3.0.6.RELEASE/reference/html/gcp.html) and [Micronaut](https://micronaut-projects.github.io/micronaut-gcp/2.0.x/guide/#simpleFunctions)!

With Cloud Functions for Java 11, now in beta, you can use Java to build business-critical applications and integration layers, and deploy the function in a fully managed environment, complete with access to resources in a private VPC network. Java functions will scale automatically based on your load. You can write [HTTP functions](https://cloud.google.com/functions/docs/writing/http) to respond to HTTP events, and [background functions](https://cloud.google.com/functions/docs/writing/background) to process events sourced from various cloud and GCP services, such as Pub/Sub, Cloud Storage, Firestore, and more.

![/img/j11gcf/Cloud_Functions__Product_Strategy-01.jpg](/img/j11gcf/Cloud_Functions__Product_Strategy-01.jpg)

Functions are a great fit for serverless application backends, for integrating with third-party services and APIs, or for mobile or IoT backends. You can also use functions for real-time data processing systems, like processing files as they are uploaded to Cloud Storage, or handling real-time streams of events from Pub/Sub. Last but not least, functions can serve intelligent applications like virtual assistants and chat bots, or video, image and sentiment analysis.
## Cloud Functions for Java 11 example

{{< youtube UsYRKkibLPI >}}

You can develop functions using the [Functions Framework for Java](https://github.com/GoogleCloudPlatform/functions-framework-java/), an open source functions-as-a-service framework for writing portable Java functions. You can develop and run your functions locally, deploy them to Cloud Functions, or to another Java environment.

An HTTP function simply implements the [`HttpFunction`](https://javadoc.io/static/com.google.cloud.functions/functions-framework-api/1.0.1/com/google/cloud/functions/HttpFunction.html) interface:

```java
public class HelloWorld implements HttpFunction {
    @Override
    public void service(HttpRequest request, HttpResponse response)
            throws IOException {
        var writer = response.getWriter();
        writer.write("Hello world!");
    }
}
```

Add the Functions Framework API dependency to the Maven `pom.xml`:

```xml
<dependency>
    <groupId>com.google.cloud.functions</groupId>
    <artifactId>functions-framework-api</artifactId>
    <version>1.0.1</version>
    <scope>provided</scope>
</dependency>
```

Then add the [Function Maven plugin](https://github.com/GoogleCloudPlatform/functions-framework-java#running-a-function-with-the-maven-plugin) so you can run the function locally:

```xml
<plugin>
    <groupId>com.google.cloud.functions</groupId>
    <artifactId>function-maven-plugin</artifactId>
    <version>0.9.2</version>
    <configuration>
        <functionTarget>function.HelloWorld</functionTarget>
    </configuration>
</plugin>
```

Run the function locally:

```bash
mvn function:run
```

You can also use your IDE to launch this Maven target in Debugger mode to debug the function locally.

![/img/j11gcf/2_Cloud_Functions_for_Java_11.max-900x900.png](/img/j11gcf/2_Cloud_Functions_for_Java_11.max-900x900.png)

To deploy the function, you can use the gcloud command line:

```bash
gcloud beta functions deploy helloworld-function \
    --entry-point function.HelloWorld --runtime java11 --trigger-http
```

Alternatively, you can also deploy with the Function Maven plugin:

```bash
mvn function:deploy -Dfunction.deploy.name=helloworld-function
```

You can find the [full example on GitHub](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/helloworld). In addition to running this function in the fully managed Cloud Functions environment, you can also [bring the Functions Framework runtime](https://github.com/GoogleCloudPlatform/functions-framework-java#running-the-functions-framework-directly) with you to other environments, such as Cloud Run, Google Kubernetes Engine, or a virtual machine. In addition, Java 8 users can now take advantage of Java 11 features: the majority of the use cases of the Java 8 programming model are supported in Java 11.

## Third-party framework support

In addition to our Functions Framework for Java, both the [Micronaut](https://micronaut-projects.github.io/micronaut-gcp/2.0.x/guide/#simpleFunctions) framework and the [Spring Cloud Function](https://spring.io/projects/spring-cloud-function) project now have out-of-the-box support for Google Cloud Functions. You can create both HTTP functions and background functions using the respective framework's programming model, including capabilities like dependency injection.

### Micronaut

The Micronaut team implemented dedicated support for the Cloud Functions Java 11 runtime.
Instead of implementing the Functions Framework's `HttpFunction` interface directly, you can use Micronaut's programming model, such that a hello world HTTP function can simply be a [Micronaut controller](https://docs.micronaut.io/2.0.0.M2/guide/index.html#creatingServer):

```java
@Controller("/hello")
public class HelloController {

    @Get("/{name}")
    String greet(String name) {
        return "Hello " + name;
    }
}
```

You can find a [full example of Micronaut with Cloud Functions](https://github.com/micronaut-projects/micronaut-gcp/tree/master/examples/hello-world-cloud-function) and its [documentation](https://micronaut-projects.github.io/micronaut-gcp/snapshot/guide/#httpFunctions) on GitHub.

### Spring Cloud Functions

The Google Cloud Java Frameworks team worked with the Spring team to bring the [Spring Cloud GCP](https://spring.io/projects/spring-cloud-gcp) project to Spring Boot users, helping them easily leverage Google Cloud services. More recently, the team worked with the Spring Cloud Function team to bring you the [Spring Cloud Function GCP Adapter](https://cloud.spring.io/spring-cloud-static/spring-cloud-function/3.0.7.RELEASE/reference/html/gcp.html). A function can just be a vanilla Java function, so you can run a Spring Cloud Function application on Cloud Functions without having to modify your code to run on Google Cloud.

```java
@Bean
public Function<String, String> uppercase() {
    return value -> value.toUpperCase();
}
```

You can find a full example of a [Spring Cloud Function with Cloud Functions](https://github.com/spring-cloud/spring-cloud-function/tree/master/spring-cloud-function-samples/function-sample-gcp-http) on GitHub.

## JVM Languages

In addition to using the [latest Java 11 language features](https://advancedweb.hu/new-language-features-since-java-8-to-14/) with Cloud Functions, you can also use your favorite JVM languages, such as Kotlin, Groovy, Scala, and more. For example, here's a function written with Kotlin:

```kotlin
class HelloWorld : HttpFunction {
    fun helloWorld(req: HttpRequest, res: HttpResponse) {
        with(res.writer) { write("Hello Kotlin World!") }
    }
}
```

Here's the same function with Groovy:

```groovy
class HelloWorld implements HttpFunction {
    void service(HttpRequest req, HttpResponse res) {
        res.writer.write("Hello Groovy World!")
    }
}
```

You can take a [deeper dive into a Groovy example](http://glaforge.appspot.com/article/deploying-serverless-functions-in-groovy-on-the-new-java-11-runtime-for-google-cloud-functions), and otherwise find all the examples on GitHub ([Kotlin](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/kotlin-helloworld), [Groovy](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/groovy-helloworld), [Scala](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/functions/helloworld/scala-helloworld)).

## Try Cloud Functions for Java 11 today

Cloud Functions for Java 11 is now in beta, so you can try it today with your favorite JVM language and frameworks. Read the [Quick Start guide](https://cloud.google.com/functions/docs/quickstart-java), learn how to [write your first functions](https://cloud.google.com/functions/docs/first-java), and try it out with a Google Cloud Platform [free trial](https://cloud.google.com/free). If you want to dive a little bit deeper into the technical aspects, you can also read this [article on Google Developers blog](https://developers.googleblog.com/2020/05/java-11-for-cloud-functions.html).
If you're interested in the [open-source Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java) for Java, please don't hesitate to have a look at the project and potentially even contribute to it. We're looking forward to seeing all the Java functions you write!

* * *

*Special thanks to Googlers Éamonn McManus, Magda Zakrzewska‎, Sławek Walkowski, Ludovic Champenois, Katie McCormick, Grant Timmerman, Ace Nassri, Averi Kitsch, Les Vogel, Kurtis Van Gent, Ronald Laeremans, Mike Eltsufin, Dmitry Solomakha, Daniel Zou, Jason Polites, Stewart Reichling, Michael Skura, Karol Farbiś, and Vinod Ramachandran. We also want to thank the [Micronaut](https://micronaut.io/) and [Spring Cloud Function](https://spring.io/projects/spring-cloud-function) teams for working on the Cloud Functions support!*

# Sip a Cup of Java 11 for Your Cloud Functions

With the [beta of the new Java 11 runtime](https://cloud.google.com/blog/products/application-development/introducing-java-11-on-google-cloud-functions) for Google [Cloud Functions](https://cloud.google.com/functions), Java developers can now write their functions using the Java programming language (a language often used in enterprises) in addition to Node.js, Go, or Python. Cloud Functions allows you to run bits of code locally or in the cloud, without provisioning or managing servers: deploy your code, and let the platform handle scaling up and down for you. Just focus on your code: handle incoming HTTP requests, or respond to some cloud events, like messages coming from Cloud Pub/Sub or new files uploaded in Cloud Storage buckets.

In this article, let's focus on what functions look like, how you can write portable functions, and how to run and debug them locally or deploy them in the cloud or on-premises, thanks to the [Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java), an open source library that runs your functions. You will also learn about third-party frameworks that you might be familiar with, which also let you create functions using common programming paradigms.

## The shape of your functions

There are two types of functions: [HTTP functions](https://cloud.google.com/functions/docs/writing/http) and [background functions](https://cloud.google.com/functions/docs/writing/background). HTTP functions respond to incoming HTTP requests, whereas background functions react to cloud-related events.

The Java Functions Framework provides an [API](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/index.html) that you can use to author your functions, as well as an invoker that can be called to run your functions locally on your machine, or anywhere with a Java 11 environment. To get started with this API, you will need to add a dependency in your build files.
If you use Maven, add the following dependency tag in `pom.xml`:

```xml
<dependency>
  <groupId>com.google.cloud.functions</groupId>
  <artifactId>functions-framework-api</artifactId>
  <version>1.0.1</version>
  <scope>provided</scope>
</dependency>
```

If you are using Gradle, add this dependency declaration in `build.gradle`:

```groovy
compileOnly("com.google.cloud.functions:functions-framework-api:1.0.1")
```

### Responding to HTTP requests

A Java function that receives an incoming HTTP request implements the `[HttpFunction](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/HttpFunction.html)` interface:

```java
import com.google.cloud.functions.*;
import java.io.*;

public class Example implements HttpFunction {
    @Override
    public void service(HttpRequest request, HttpResponse response)
            throws IOException {
        var writer = response.getWriter();
        writer.write("Hello developers!");
    }
}
```

The `service()` method provides an `[HttpRequest](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/HttpRequest.html)` and an `[HttpResponse](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/HttpResponse.html)` object. From the request, you can get information about the HTTP headers, the payload body, or the request parameters. It's also possible to handle multipart requests. With the response, you can set a status code or headers, and define a body payload and a content-type.

### Responding to cloud events

Background functions respond to events coming from the cloud, like new [Pub/Sub](https://cloud.google.com/pubsub) messages, [Cloud Storage](https://cloud.google.com/storage) file updates, or new or updated data in [Cloud Firestore](https://cloud.google.com/firestore). There are actually two ways to implement such functions: either by dealing with the JSON payloads representing those events, or by taking advantage of object marshalling thanks to the [Gson](https://github.com/google/gson) library, which takes care of the parsing transparently for the developer.

With a [RawBackgroundFunction](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/RawBackgroundFunction.html), the responsibility is on you to handle the incoming cloud event JSON-encoded payload. You receive a JSON string, so you are free to parse it however you like, with the JSON parser of your choice:

```java
import com.google.cloud.functions.Context;
import com.google.cloud.functions.RawBackgroundFunction;

public class RawFunction implements RawBackgroundFunction {
    @Override
    public void accept(String json, Context context) {
        ...
    }
}
```

But you also have the option to write a [BackgroundFunction](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/BackgroundFunction.html) which uses Gson for unmarshalling a JSON representation into a Java class (a POJO, Plain-Old-Java-Object) representing that payload.
To that end, you have to provide the POJO as a generic argument:

```java
import com.google.cloud.functions.Context;
import com.google.cloud.functions.BackgroundFunction;
import java.util.Map;

public class PubSubFunction implements BackgroundFunction<PubSubMsg> {
    @Override
    public void accept(PubSubMsg msg, Context context) {
        System.out.println("Received message ID: " + msg.messageId);
    }
}

class PubSubMsg {
    String data;
    Map<String, String> attributes;
    String messageId;
    String publishTime;
}
```

The [Context](https://javadoc.io/doc/com.google.cloud.functions/functions-framework-api/latest/com/google/cloud/functions/Context.html) parameter contains various metadata fields like timestamps, the type of events, and other attributes.

Which type of background function should you use? It depends on how much control you need over the incoming payload, or whether the Gson unmarshalling fully fits your needs. But having the unmarshalling covered by the framework definitely streamlines the writing of your function.

## Running your function locally

Coding is always great, but seeing your code actually running is even more rewarding. The Functions Framework comes with the API we used above, but also with an invoker tool that you can use to run functions locally. For developer productivity, a direct and local feedback loop on your own computer is much more comfortable than deploying to the cloud for each change you make to your code.

### With Maven

If you're building your functions with Maven, you can add the Function Maven plugin in your `pom.xml`:

```xml
<plugin>
  <groupId>com.google.cloud.functions</groupId>
  <artifactId>function-maven-plugin</artifactId>
  <version>0.9.2</version>
  <configuration>
    <functionTarget>com.example.Example</functionTarget>
  </configuration>
</plugin>
```

On the command-line, you can then run:

```bash
$ mvn function:run
```

You can pass extra parameters like `--target` to define a different function to run (in case your project contains several functions), `--port` to specify the port to listen to, or `--classpath` to explicitly set the classpath needed by the function to run. These are the parameters of the underlying `[Invoker](https://github.com/GoogleCloudPlatform/functions-framework-java/blob/master/invoker/core/src/main/java/com/google/cloud/functions/invoker/runner/Invoker.java)` class. However, to set these parameters via the Maven plugin, you'll have to pass properties with `-Drun.functionTarget=com.example.Example` and `-Drun.port`.

### With Gradle

With Gradle, there is no dedicated plugin, but it's easy to configure `build.gradle` to let you run functions.
First, define a dedicated configuration for the invoker:

```groovy
configurations {
    invoker
}
```

In the dependencies, add the Invoker library:

```groovy
dependencies {
    invoker 'com.google.cloud.functions.invoker:java-function-invoker:1.0.0-beta1'
}
```

And then, create a new task to run the Invoker:

```groovy
tasks.register("runFunction", JavaExec) {
    main = 'com.google.cloud.functions.invoker.runner.Invoker'
    classpath(configurations.invoker)
    inputs.files(configurations.runtimeClasspath, sourceSets.main.output)
    args('--target', project.findProperty('runFunction.target') ?: 'com.example.Example',
         '--port', project.findProperty('runFunction.port') ?: 8080)
    doFirst {
        args('--classpath', files(configurations.runtimeClasspath, sourceSets.main.output).asPath)
    }
}
```

By default, the above launches the function `com.example.Example` on port 8080, but you can override those on the command-line, when running gradle or the gradle wrapper:

```bash
$ gradle runFunction -PrunFunction.target=com.example.HelloWorld \
    -PrunFunction.port=8080
```

### Running elsewhere, making your functions portable

What's interesting about the [Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java) is that you are not tied to the Cloud Functions platform for deploying your functions. As long as, in your target environment, you can run your functions with the Invoker class, you can run your functions on [Cloud Run](https://cloud.google.com/run), on [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine), on [Knative](https://cloud.google.com/knative/) environments, on other clouds when you can run Java, or more generally on any servers on-premises. It makes your functions highly portable between environments. But let's have a closer look at deployment now.

## Deploying your functions

You can [deploy](https://github.com/GoogleCloudPlatform/functions-framework-java/blob/master/invoker/function-maven-plugin/src/main/java/com/google/cloud/functions/plugin/DeployFunction.java) functions with the Maven plugin as well, with various parameters to tweak for defining regions, memory size, etc. But here, we'll focus on using the [cloud SDK](https://cloud.google.com/sdk), with its `gcloud` command-line, to deploy our functions.

For example, to deploy an HTTP function, you would type:

```bash
$ gcloud functions deploy exampleFn \
    --region europe-west1 \
    --trigger-http \
    --allow-unauthenticated \
    --runtime java11 \
    --entry-point com.example.Example \
    --memory 512MB
```

For a background function that would be notified of new messages on a Pub/Sub topic, you would launch:

```bash
$ gcloud functions deploy exampleFn \
    --region europe-west1 \
    --trigger-topic msg-topic \
    --runtime java11 \
    --entry-point com.example.PubSubFunction \
    --memory 512MB
```

Note that deployments come in two flavors as well, although the above commands are the same: functions are deployed from source with a `pom.xml` and built in Google Cloud, but when using a build tool other than Maven, you can also use the same command to deploy a pre-compiled JAR that contains your function implementation. Of course, you'll have to create that JAR first.

## What about other languages and frameworks?
So far, we looked at Java and the plain [Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java), but you can definitely use alternative JVM languages such as [Apache Groovy](http://groovy-lang.org/), [Kotlin](https://kotlinlang.org/), or [Scala](https://www.scala-lang.org/), and third-party frameworks that integrate with Cloud Functions like [Micronaut](https://micronaut.io/) and [Spring Boot](https://spring.io/projects/spring-boot)!

### Pretty Groovy functions

Without covering all those combinations, let's have a look at two examples. What would an HTTP function look like in Groovy?

The first step will be to add Apache Groovy as a dependency in your `pom.xml`:

```xml
<dependency>
  <groupId>org.codehaus.groovy</groupId>
  <artifactId>groovy-all</artifactId>
  <version>3.0.4</version>
  <type>pom</type>
</dependency>
```

You will also need the GMaven compiler plugin to compile the Groovy code:

```xml
<plugin>
  <groupId>org.codehaus.gmavenplus</groupId>
  <artifactId>gmavenplus-plugin</artifactId>
  <version>1.9.0</version>
  <executions>
    <execution>
      <goals>
        <goal>addSources</goal>
        <goal>addTestSources</goal>
        <goal>compile</goal>
        <goal>compileTests</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

When writing the function code, just use Groovy instead of Java:

```groovy
import com.google.cloud.functions.*

class HelloWorldFunction implements HttpFunction {
    void service(HttpRequest request, HttpResponse response) {
        response.writer.write "Hello Groovy World!"
    }
}
```

The same explanations regarding running your function locally or deploying it still apply: the Java platform is pretty open to alternative languages too! And the Cloud Functions builder will happily build your Groovy code in the cloud, since Maven lets you compile this code thanks to the Groovy library.

### Micronaut functions

Third-party frameworks also offer a dedicated Cloud Functions integration. Let's have a look at Micronaut.

[Micronaut](https://micronaut.io/) is a *"modern, JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications"*, as explained on its website. It supports the notion of serverless functions, web apps and microservices, and has a dedicated integration for Google Cloud Functions.

In addition to being a very efficient framework with super fast startup times (which is important, to avoid long cold starts on serverless services), what's interesting about using Micronaut is that you can use Micronaut's own programming model, including Dependency Injection, annotation-driven bean declaration, etc.

For HTTP functions, you can use the framework's own `@Controller` / `@Get` annotations, instead of the Functions Framework's own interfaces. So for example, a Micronaut HTTP function would look like:

```java
import io.micronaut.http.annotation.*;

@Controller("/hello")
public class HelloController {

    @Get(uri="/", produces="text/plain")
    public String index() {
        return "Example Response";
    }
}
```

This is the standard way in Micronaut to define a Web microservice, but it transparently builds upon the Functions Framework to run this service as a Cloud Function. Furthermore, this programming model offered by Micronaut is portable across other environments, since Micronaut runs in many different contexts.
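As a quick sanity check, once the function is running locally (for instance via the Functions Framework invoker described earlier), you can hit the controller with curl. A minimal sketch, assuming the default port 8080:

```bash
# call the Micronaut controller mapped on /hello
curl http://localhost:8080/hello
# expected output: Example Response
```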
Last but not least, if you are using the [Micronaut Launch](https://micronaut.io/launch/) project (hosted on [Cloud Run](https://cloud.google.com/run)), which allows you to scaffold new projects easily (from the command-line or from a nice UI), you can opt for adding the `google-cloud-function` support module, and even choose your favorite language, build tool, or testing framework:

[![Micronaut Launch](https://3.bp.blogspot.com/-dgW0l2rewn0/Xs1EprFntNI/AAAAAAAAI3Q/XlDOiwKN_98CGp8qGSRJkTv_ETjqynbjgCLcBGAsYHQ/s1600/micronautlaunch.png)](https://3.bp.blogspot.com/-dgW0l2rewn0/Xs1EprFntNI/AAAAAAAAI3Q/XlDOiwKN_98CGp8qGSRJkTv_ETjqynbjgCLcBGAsYHQ/s1600/micronautlaunch.png)

Be sure to check out the documentation for the [Micronaut Cloud Functions support](https://micronaut-projects.github.io/micronaut-gcp/2.0.0.M3/guide/index.html#cloudFunction), and the [Spring Cloud Function support](https://cloud.spring.io/spring-cloud-static/spring-cloud-function/3.0.6.RELEASE/reference/html/gcp.html).

## What's next?

Now it's your turn to try Cloud Functions for Java 11 today, with your favorite JVM language or third-party frameworks. Read the [getting started guide](https://cloud.google.com/functions/docs/quickstart-java), and try this for free with the Google Cloud Platform [free trial](https://cloud.google.com/free). Explore Cloud Functions' [features and use cases](https://cloud.google.com/functions), take a look at the [quickstarts](https://cloud.google.com/functions/docs/quickstarts), perhaps even contribute to the [open source Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java). And we're looking forward to seeing what functions you're going to build on this platform!

# Deploying serverless functions in Groovy on the new Java 11 runtime for Google Cloud Functions

Java celebrates its 25th anniversary! Earlier this year, the Apache Groovy team released the big [3.0 version](https://groovy-lang.org/releasenotes/groovy-3.0.html) of the programming language. [GMavenPlus](https://github.com/groovy/GMavenPlus/releases/tag/1.9.0) (the Maven plugin for compiling Groovy code) was published in version 1.9, which works with Java 14. And today, Google Cloud opens up the beta of the [Java 11 runtime for Cloud Functions](https://cloud.google.com/blog/products/application-development/introducing-java-11-on-google-cloud-functions). What about combining them all?

I've been working for a bit on the Java 11 runtime for Google Cloud Functions (that's the Function-as-a-Service platform of Google Cloud, pay-as-you-go, hassle-free / transparent scaling), and in this article, I'd like to highlight that you can also write and deploy functions with alternative JVM languages like [Apache Groovy](http://groovy-lang.org/).

So today, you're going to:

- Write a simple Groovy 3.0 function,
- Compile it with Maven 3.6 and the GMavenPlus 1.9 plugin,
- Deploy and run the function on the Cloud Functions Java 11 runtime!

Note: If you want to try this at (work from?) home, you will need an account on Google Cloud; you can easily create a [free account](https://cloud.google.com/free) and benefit from $300 of cloud credits to get started (including also free quotas for many products). You will also need to create a billing account, but for the purpose of this tutorial, you should stay within the free quota (so your credit card shouldn't be billed). Then, head over to the [console.cloud.google.com](https://console.cloud.google.com/) cloud console to create a new project.
And then navigate to the Cloud Functions section to enable the service for your project.

Let's get started! So what do we need? A `pom.xml` file, and a Groovy class!

Let's start with the `pom.xml` file, and what you should add to your build file. First of all, since I'm using Groovy as my function implementation language, I'm going to use GMavenPlus for compilation. So in the build/plugins section, I configure the plugin as follows:

```xml
<plugin>
  <groupId>org.codehaus.gmavenplus</groupId>
  <artifactId>gmavenplus-plugin</artifactId>
  <version>1.9.0</version>
  <executions>
    <execution>
      <id>groovy-compile</id>
      <phase>process-resources</phase>
      <goals>
        <goal>addSources</goal>
        <goal>compile</goal>
      </goals>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.groovy</groupId>
      <artifactId>groovy-all</artifactId>
      <version>3.0.4</version>
      <scope>runtime</scope>
      <type>pom</type>
    </dependency>
  </dependencies>
</plugin>
```

That way, when I do an `mvn compile`, my Groovy sources are compiled as part of the compilation lifecycle of Maven.

But I'm adding a second plugin, the Functions Framework plugin! That's a Maven plugin to run functions locally on your machine, before deploying into the cloud, so that you can have a local developer experience that's easy and fast. The [Functions Framework](https://github.com/GoogleCloudPlatform/functions-framework-java) is actually an open source project on Github. It's a lightweight API to write your functions with, and it's also a function runner / invoker. What's interesting is that it also means that you are not locked into the Cloud Functions platform: you can run your function locally or anywhere else where you can run a JAR file on a JVM! Great portability!

So let's configure the Functions Framework Maven plugin:

```xml
<plugin>
  <groupId>com.google.cloud.functions</groupId>
  <artifactId>function-maven-plugin</artifactId>
  <version>0.9.2</version>
  <configuration>
    <functionTarget>mypackage.HelloWorldFunction</functionTarget>
  </configuration>
</plugin>
```

I specify a configuration flag to point at the function I want to run. But we'll come back in a moment to how to run this function locally. We need to write it first!

We need two more things in our `pom.xml`: a dependency on Groovy, and one on the Functions Framework Java API.

```xml
<dependencies>
  <dependency>
    <groupId>com.google.cloud.functions</groupId>
    <artifactId>functions-framework-api</artifactId>
    <version>1.0.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.codehaus.groovy</groupId>
    <artifactId>groovy-all</artifactId>
    <version>3.0.4</version>
    <type>pom</type>
  </dependency>
</dependencies>
```

So you're all set for the build. Let's now create our function in `src/main/groovy/mypackage/HelloWorldFunction.groovy`.

There are two flavors of functions: [HTTP functions](https://cloud.google.com/functions/docs/writing/http) and [background functions](https://cloud.google.com/functions/docs/writing/background). The latter react to cloud events like a new file stored in Cloud Storage, a new data update in the Firestore database, etc. Whereas the former directly exposes a URL that can be invoked via an HTTP call. That's the one I want to create, to write a symbolic `"Hello Groovy World"` message in your browser window.

```groovy
package mypackage

import com.google.cloud.functions.*

class HelloWorldFunction implements HttpFunction {
    void service(HttpRequest request, HttpResponse response) {
        response.writer.write "Hello Groovy World!"
    }
}
```

Yes, that's all there is to it! You implement a Functions Framework interface, and its `service()` method. You have a `request` / `response` mode (a `request` and a `response` parameter are passed to your method). You can access the writer to write back to the browser or client that invoked the function.

Now it's time to run the function locally to see if it's working. Just type the following command in your terminal:

```bash
mvn function:run
```

After a moment, and some build logs further, you should see something like:

```
INFO: Serving function...
INFO: Function: mypackage.HelloWorldFunction
INFO: URL: http://localhost:8080/
```

With your browser (or curl), you can browse this local URL, and you will see the hello world message appearing. Yay!

With the Maven plugin you can also deploy, but you can instead use the [gcloud](https://cloud.google.com/sdk) command-line tool to deploy the function:

```bash
gcloud functions deploy helloFunction \
    --region europe-west1 \
    --trigger-http --allow-unauthenticated \
    --runtime java11 \
    --entry-point mypackage.HelloWorldFunction \
    --memory 512MB
```

After a little moment, the function is deployed, and you'll notice a URL has been created for your function, looking something like this:

```
https://europe-west1-myprojectname.cloudfunctions.net/helloFunction
```

The very same function now runs in the cloud! A pretty Groovy function! This function is portable: you can invoke it with the Functions Framework invoker, anywhere you can run a JVM.

Going further, I encourage you to have a look at the [Functions Framework documentation on Github](https://github.com/GoogleCloudPlatform/functions-framework-java) to learn more about it. Here you deployed the function source and the `pom.xml` file, as the function is built directly in the cloud. But it's also possible to compile and create a JAR locally and deploy that instead. That's interesting, for example, if you want to use another build tool, like Gradle. And this will be the purpose of another upcoming article!

# Video: the Pic-a-Daily serverless workshop

With my partner in crime, [Mete Atamel](https://twitter.com/meteatamel/status/1262316915642576896), we ran two editions of our "Pic-a-Daily" serverless workshop. It's an online, hands-on workshop, where developers get their hands on the [serverless products](https://cloud.google.com/serverless) provided by [Google Cloud Platform](https://cloud.google.com/):

- [Cloud Functions](https://cloud.google.com/functions) --- to develop and run functions, small units of logic glue, to react to events of your cloud projects and services
- [App Engine](https://cloud.google.com/appengine) --- to deploy web apps, for web frontends, or API backends
- [Cloud Run](https://cloud.google.com/run) --- to deploy and scale containerised services

[![](/img/pic-a-daily-workshop/picadaily.png)](https://codelabs.developers.google.com/serverless-workshop/)

The theme of the workshop is to build a simple photosharing application (hence the play on words, with a picture a day) with those serverless products, but along the way, developers also get to use other services like:

- [Pub/Sub](https://cloud.google.com/pubsub/) --- as a messaging fabric to let events flow between your services
- [Firestore](https://cloud.google.com/firestore) --- for storing picture metadata in the scalable document database
- [Cloud Storage](https://cloud.google.com/storage) --- to store the image blobs
- [Cloud Scheduler](https://cloud.google.com/scheduler) --- to run services on a schedule (ie. cron as a service)
- [Cloud Vision API](https://cloud.google.com/vision/) --- a machine learning API to make sense of what's in your pictures

[![](/img/pic-a-daily-workshop/diagram.png)](https://codelabs.developers.google.com/serverless-workshop/)

The workshop is freely accessible on our codelabs platform: ["Pic-a-Daily" serverless workshop](https://codelabs.developers.google.com/serverless-workshop/). So you can follow this hands-on workshop on your own, at your own pace.
There are 4 codelabs:

- The [first one](https://codelabs.developers.google.com/codelabs/cloud-picadaily-lab1/index.html?index=..%2F..serverless-workshop#13) lets you build a function that responds to events as new pictures are uploaded into Cloud Storage, invoking the Vision API to understand what is in the picture, and storing some picture metadata information in Firestore.
- The [second lab](https://codelabs.developers.google.com/codelabs/cloud-picadaily-lab2/index.html?index=..%2F..serverless-workshop#0) will use a Cloud Run service which also reacts to new files stored in Cloud Storage, but will create thumbnails of the pictures.
- A [third lab](https://codelabs.developers.google.com/codelabs/cloud-picadaily-lab3/index.html?index=..%2F..serverless-workshop#0) also takes advantage of Cloud Run, this time to run on a schedule, thanks to Cloud Scheduler. It creates a collage of the most recent pictures.
- Last but not least, the [fourth lab](https://codelabs.developers.google.com/codelabs/cloud-picadaily-lab4/index.html?index=..%2F..serverless-workshop#0) will let you build a web frontend and backend API on Google App Engine.

[![](/img/pic-a-daily-workshop/codelabs.png)](https://codelabs.developers.google.com/serverless-workshop/)

We have a dedicated [Github repository](https://github.com/GoogleCloudPlatform/serverless-photosharing-workshop) where you can check out the code of the various functions, apps and containers, and you can have a look at the [slide deck](https://speakerdeck.com/meteatamel/pic-a-daily-serverless-workshop) introducing the workshop and the technologies used.

And now, the videos of the first edition are also available on YouTube! The first part covers Cloud Functions and Cloud Run with the first two labs:

{{< youtube wEENQouNsGk >}}

The second part covers Cloud Run and App Engine:

{{< youtube Y9E1fQcPXP0 >}}

# Machine learning applied to music generation with Magenta

I missed this talk from Alexandre Dubreuil when attending Devoxx Belgium 2019, but I had the chance to watch it while doing my elliptical bike run, confined at home. It's about applying Machine Learning to music generation, thanks to the [Magenta project](https://magenta.tensorflow.org/), which is based on [TensorFlow](https://www.tensorflow.org/).

{{< youtube O4uBa0KMeNY >}}

I like playing music (a bit of piano & guitar) once in a while, so as a geek, I've also always been interested in computer generated music. And it's hard to generate music that actually sounds pleasant to the ear! Alexandre explains that it's hard to encode the rules a computer could follow to play music, but that machine learning is pretty interesting, as it's able to learn complex functions, thus understanding what does sound good.

He then covers the various types of music representations, like MIDI scores, which are quite light in terms of data, and audio waves, which are on the high end of data, as there are thousands of data points representing the position on the wave along the time axis. While MIDI represents a note of music, audio waves really represent the sound physically, as a wave (of data points).

Note that in the following part of the article, I'm not an ML / AI expert, so I'm just trying to explain what I actually understood :-)

For MIDI, Recurrent Neural Networks (RNN) make sense, as they work on sequences for the input and output, and also have the ability to remember past information. And that's great as you find recurring patterns in music (series of chords, main song lines, etc.)
RNNs tend to progressively forget those past events, so those networks often use Long Short-Term Memory to keep some of their memory fresh. Variational Auto-Encoders (VAE) are a pair of networks that reduce the number of dimensions compared to the input, and then re-expand back to the same size as the input. So VAEs try to generate back something that's close to what was initially given as input, and in doing so they learn to reproduce similar patterns.

For audio waves, Magenta comes with a Convolutional Neural Network (CNN) called WaveNet, that's used for example for voice generation on devices like Google Home. There are WaveNet Auto-Encoders that also generate audio waves: they can learn to generate the actual sound of instruments, create totally new instruments, or mixes of sounds. Alexandre shows some cool demos of weird instruments made of cat sounds and musical instruments.

Magenta comes with various RNNs for drums, melody, polyphony, performance, with auto-encoders for WaveNet and MIDI too. There's also a Generative Adversarial Network (GAN) for audio waves. GANs are often used for generating things like pictures, for example.

The demos in this presentation are quite cool, like creating new instruments (cat + musical instrument), or generating sequences of notes (drum score, melody score).

Alexandre ends the presentation with pointers to things like data sets of music, as neural networks further need to learn about style and performance, and networks need plenty of time to learn from existing music and instrument sounds, so as to create something nice to hear! He briefly shows some other cool demos using [TensorFlow.js](https://www.tensorflow.org/js), so that it works in the browser and you can more easily experiment with music generation.

Also, Alexandre wrote the book "[Hands-On Music Generation with Magenta](https://www.packtpub.com/eu/data/hands-on-music-generation-with-magenta)", so if you want to dive deeper, there's much to read and experiment with!

# HTML semantic tags

We all know about HTML 5, right? Well, I knew about some of the new semantic tags, like `header` / `nav` / `main` / `article` / `aside` / `footer`, but I'm still falling back to using tons of divs and spans instead. So as I want to refresh this blog at some point, it was time I revised those semantic tags. Let's take the little time we have during confinement to learn something!

There are likely plenty of videos on the topic, but this one was in my top results, so I watched:

{{< youtube kGW8Al_cga4 >}}

[HTML & CSS Crash Course Tutorial #6 - HTML 5 Semantics](https://www.youtube.com/watch?v=kGW8Al_cga4&list=WL&index=3&t=0s). It's part of a series of videos on the topic of HTML & CSS by the [Net Ninja](https://www.youtube.com/channel/UCW5YeuERMmlnqo4oq8vwUpg). This particular episode covers the topic of the semantic tags.

So you have a `main` tag that wraps the meaty content of your page (ie. not stuff like `header` / `footer` / `navigation`). Inside, you would put articles, which wrap each piece of content (a blog post, a news article, etc). Sections tend to be for grouping some other information, like a list of resources or some contact info. Asides can be related content like similar articles, or something somewhat related to your current article (perhaps a short bio of a character you're mentioning in your article?). In the header section, you'd put the title of your site and the navigation. The footer will contain your contact info.
Here's a basic structure of how those tags are organised:

![](/img/misc-learning/HTML+5+semantic+tag+structure.png)

After an explanation of those tags, the author does a live demo, building up a web page with all those tags. So it was a good refresher for me to remember how to use those tags, rather than nesting `div` after `div`!

# Modern web game development

Next in my series of videos while doing sports at home, I watched this video from my colleague [Tom Greenaway](https://twitter.com/tcmg)! It's about modern web game development, and was recorded last year at Google I/O.

{{< youtube aVTYxHL45SA >}}

There are big gaming platforms, like Sony's PlayStation, Microsoft's Xbox, the Nintendo Switch, as well as plenty of mobile games on Android and iOS. But the Web itself, within your browser, is also a great platform for developing and publishing games! There's all that's needed for good games!

Tom explains that you need a functioning game (runs well on device, looks good, sounds good). And today, most of the game engines you can use for developing games actually provide an HTML5 target. You need users, and you need to have a good monetisation strategy. The web already provides all the right APIs for nice graphics, sound mixing, etc, and it's a very open platform for spreading virally.

It was pretty interesting to hear about one of the key advantages of the web: its URLs! You can be pretty creative with URLs. A game can create a URL for a given game session, for a particular state in a game, or for inviting others to join.

In addition to game engines with a web target, Tom also mentions that it's possible to port games from C/C++, for example, to JavaScript in the browser, with a tool like [Emscripten](https://emscripten.org/). Even things like OpenGL 3D rendering can be translated into WebGL. But he also advises looking at WebAssembly, as it's really become the new approach to native performance in the browser. He mentioned [Construct](https://www.construct.net/fr), which is basically the Box2D game engine, but optimised for WebAssembly.

For 3D graphics on the web, the future lies in [WebGPU](https://gpuweb.github.io/gpuweb/), which is a more modern take on WebGL and OpenGL. For audio, there are the [Web Audio](https://www.w3.org/TR/webaudio/) APIs and worklets, which allow you to even create effects in JavaScript or WebAssembly. But there are other useful APIs for game development, like the [Gamepad](https://www.w3.org/TR/gamepad/) API, the [Gyroscope](https://www.w3.org/TR/gyroscope/) API, etc.

For getting users, ensure that your game is fun of course, but also make it fast, in particular load fast, to avoid losing users even before you actually got them to load the game! But you also need to think about the user acquisition loop: make the game load and start fast to enter the action right away, so you're really pulled into the game, and that's then a good reason for users to share this cool new game with others. Of course, being featured on game sites & libraries helps, it gives a big boost, but it's not necessarily what will make you earn the most in the long run. Tom also shares various examples of games that were successful and worked well.

# Decoding a QR code by hand

Doing sport at home on a treadmill or an elliptical bike is pretty boring when you're confined, so to make things more interesting, I'm watching some videos to learn something new while exercising. This time, I found this old video about how to decode QR codes... by hand! Have you ever thought about how these are encoded?
{{< youtube KA8hDldvfv0 >}}

This video comes from the [Robomatics](https://www.youtube.com/channel/UCnqifAqGaat6blFe8wlllOw) YouTube channel. You easily recognise QR codes thanks to the 3 big squares with the inner white line. I knew about the dotted lines that are fixed in those patterns (highlighted in purple in the video). What I didn't know, however, was that there's a mask applied to the data, to avoid QR codes potentially looking all white or all black. It was also interesting to see how the bytes are encoded: how they follow a path throughout the matrix.

However, what this video doesn't cover, for example, is how error correction works. You might have some holes or a bad picture of a QR code, but still be able to decode it with some level of data loss. So I'll have to learn how that part works some day!

# Defence against the Docker arts by Joe Kutner

Confined at home because of the corona-virus pandemic, I'm also doing sport at home. I have a small treadmill for light walks (mostly during conf calls!) and also an elliptical bike. I'd much rather run outside though, but I have to use what I have, even if I hate that stationary elliptical bike in my basement. It's so boring! So to avoid feeling like I'm wasting my time, I decided to watch videos during my sessions! Not necessarily series on Netflix. No! But interesting technical videos. So today, I'd like to share with you a series of posts on those interesting videos I'm watching while exercising.

Today, thanks to the wonderful [Joe Kutner](https://twitter.com/codefinger), from [Heroku](https://www.heroku.com/), I learned about the Defence Against the Docker Arts! It was recorded at Devoxx Belgium 2019.

{{< youtube ofH9_sE2qy0 >}}

Joe starts by clearly differentiating Docker and Dockerfiles. Docker is an ecosystem, while Dockerfiles describe docker container images. An important distinction. The first part of the video shows best practices on how to write proper Dockerfiles, and references a [post](https://www.docker.com/blog/intro-guide-to-dockerfile-best-practices/) on the Docker blog from last year on that topic:

- use official base images, rather than reinventing the wheel, as they are usually more up-to-date and secure
- remember that images are built in layers, so to speed up your builds, ensure that the base layers are the ones that change less, and keep your source file changes in the last layer as they change the most
- join several RUN commands into one by binding them together with ampersands
- be explicit about the version of base images you use
- try to choose minimal flavors of base images, as they can be pretty big
- build from source in a consistent environment, so that developers are on the same page, with the same version of their environment (build tool, runtime versions)
- fetch dependencies in a separate step, so dependencies are cached in their own layer
- use multi-stage builds to remove build dependencies not needed at runtime

That's a lot of things to know! Joe then moves on to talk about higher-level approaches, starting with the [BuildKit](https://github.com/moby/buildkit) Go library. It's more for platform developers than application developers, but it gives you lots of advanced controls on how to build docker images. Joe introduces [Jib](https://github.com/GoogleContainerTools/jib), which is a build plugin (both for Maven and Gradle) that lets developers focus on writing and building their apps, while letting the plugin create properly layered docker images for them, using minimal base images.
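For instance, with the Maven flavor of the plugin, building and pushing a layered image is reduced to a single command. A minimal sketch, assuming the `jib-maven-plugin` is configured in your `pom.xml` with a target image name:

```bash
# compile the app and let Jib assemble and push the layered image,
# without writing a Dockerfile
mvn compile jib:build
```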
You can even build without a local Docker daemon. After BuildKit and Jib, Joe talks about the new Cloud Native [Buildpacks](https://buildpacks.io/), a tool that cleverly builds OCI images from source. There are buildpacks for plenty of platforms and runtimes, not just Java. Those new cloud native buildpacks build upon years of experience from Heroku and CloudFoundry with the first version of the buildpack concept. Joe says that buildpacks are reusable, fast, modular and safe, and goes on to show the power of this approach, which allowed Heroku, for instance, to safely and rapidly upgrade Heartbleed-affected images by replacing the underlying OS with a patched / upgraded version, thanks to image rebasing.

# Start the fun with Java 14 and Micronaut inside serverless containers on Cloud Run

Hot on the heels of the [announcement](https://mail.openjdk.java.net/pipermail/jdk-dev/2020-March/004089.html) of the general availability of JDK 14, I couldn't resist taking it for a spin. Without messing up my environment --- I'll confess I'm running 11 on my machine, but I'm still not even using everything that came past Java 8! --- I decided to test this new edition within the comfy setting of a Docker container.

## Minimal OpenJDK 14 image running JShell

It's super easy to get started (assuming you have Docker installed on your machine): create a Dockerfile with the following content:

```dockerfile
FROM openjdk:14
CMD ["jshell"]
```

Only two lines: one to declare an [OpenJDK base image](https://hub.docker.com/_/openjdk) with the 14 tag, and one to launch the [JShell](https://docs.oracle.com/en/java/javase/14/jshell/introduction-jshell.html) REPL (introduced in Java 9). Build and tag the image with:

```bash
$ docker build -t 14fun .
```

I called it 14fun, because you could almost pronounce it "one for the fun"! And then you can run this container image interactively with:

```bash
$ docker run -it 14fun
```

Then you will land directly in JShell, and can try a hello world of sorts:

```
Mar 20, 2020 9:17:28 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
|  Welcome to JShell -- Version 14
|  For an introduction type: /help intro

jshell> System.out.println("Stay at home!")
Stay at home!
```

You can enter `/exit` to quit the REPL. And you certainly noticed that we're indeed on version 14, congrats!

## New and preview features

If you read the [announcement](https://mail.openjdk.java.net/pipermail/jdk-dev/2020-March/004089.html), you will have remarked that some of the new features are not necessarily generally available, but are still only in preview mode. Here's what's new:

- [improved switch expressions](https://openjdk.java.net/jeps/361) (standard),
- [pattern matching on instanceof](https://openjdk.java.net/jeps/305) (preview),
- [records](https://openjdk.java.net/jeps/359) (preview), and
- [text blocks](https://openjdk.java.net/jeps/368) (second preview).

If you want to play with those upcoming features, you have to let the Java tooling know that you want to enable them. You can do that with the `--enable-preview` flag. So let's update our Dockerfile accordingly:

```dockerfile
FROM openjdk:14
CMD ["jshell", "--enable-preview"]
```

Rebuild and rerun the docker commands.

### Text blocks

What about trying the [text blocks](https://openjdk.java.net/jeps/368)? With text blocks, don't bother with appending strings with + operations, not forgetting the \n at the end of each line.
It's now sweeter to write long strings spanning several lines, for example:

```
$ docker run -it 14fun
Mar 20, 2020 1:12:28 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
|  Welcome to JShell -- Version 14
|  For an introduction type: /help intro

jshell> var s = """
   ...> Hello everyone!
   ...> How are you doing?
   ...> """
s ==> "Hello everyone!\nHow are you doing?\n"
```

### Records

Java is often criticized for its verbosity --- think for instance how writing good POJOs can be tiresome, with proper equals() / hashCode() / toString() methods, constructors, getters and setters. Fortunately, IDEs help a lot here, but sometimes you really want some simple data holder classes without all the boilerplate. That's where [records](https://openjdk.java.net/jeps/359) come into the picture. Note however that records are not deeply immutable (unlike, for example, Apache Groovy with its [@Immutable](http://docs.groovy-lang.org/latest/html/gapi/groovy/transform/Immutable.html) transformation): if the fields in the record are mutable objects, those objects can still change.

Let's imagine a 3D point record: what would it look like?

```
jshell> record Point3D(double x, double y, double z) { }
|  created record Point3D

jshell> var p1 = new Point3D(0, 1, 2)
p1 ==> Point3D[x=0.0, y=1.0, z=2.0]

jshell> var p2 = new Point3D(0, 1, 2)
p2 ==> Point3D[x=0.0, y=1.0, z=2.0]

jshell> p1.equals(p2)
$5 ==> true
```

Notice the toString() representation, and that equals() is implemented comparing the values of each field.

### Improved switch expressions

Switch statements are... indeed statements, not expressions. It means that, so far, they didn't return any value and couldn't be passed as parameter values to methods. Times they are [a-changin](https://en.wikipedia.org/wiki/The_Times_They_Are_a-Changin%27_(Bob_Dylan_album))! Switch borrows the arrow syntax from lambdas to get a... break from break! And they can be used as values too!

```
jshell> var day = "Saturday"
day ==> "Saturday"

jshell> var isWeekend = switch (day) {
   ...>     case "Monday", "Tuesday", "Wednesday",
   ...>          "Thursday", "Friday" -> false;
   ...>     case "Saturday", "Sunday" -> true;
   ...>     default -> false;
   ...> }
isWeekend ==> true
```

### Pattern matching on instanceof

Working with Apache Groovy for years, whether with its static type checking or dynamic nature, I'm quite used to skipping the redundant cast inside if (someObj instanceof SomeObject) {} blocks, thanks to [smart type inference](http://docs.groovy-lang.org/latest/html/documentation/core-semantics.html#_instanceof_inference) and flow typing. Java 14 takes a slightly different approach to this problem with its [pattern matching on instanceof](https://openjdk.java.net/jeps/305), introducing a new local variable of the right type, rather than assuming the variable itself is of the right type. Well, it's better explained with an example:

```
jshell> String name = " Guillaume "
name ==> " Guillaume "

jshell> if (name instanceof String nameString) {
   ...>     System.out.println(nameString.trim());
   ...> } else {
   ...>     System.out.println("Not a string!");
   ...> }
Guillaume
```

## JDK 14 in a serverless container, with Cloud Run

Together we discovered the cool new syntax enhancements and constructs, and how we can play with them inside a Docker container image. But what about deploying a Java 14 powered containerized app in the cloud, in a serverless fashion? (ie. transparent scaling from 0 to 1, 1 to n, and back, paying only for what you use).
For that purpose, you can easily deploy and scale containers in the cloud thanks to [Cloud Run](https://cloud.run/). With the launch of Java / JDK 14 also came the first [2.0 milestone of the lovely and powerful Micronaut framework](https://objectcomputing.com/news/2020/03/20/micronaut-20-milestone-1-released)! Micronaut is probably the best framework for serverless microservices on the market, thanks to its awesome performance and lightness, in particular its super fast startup times. So it's the right occasion to have fun with [Micronaut](https://micronaut.io/) again. So let's build a Java 14 application, with Micronaut, running on Cloud Run.

### Create a Micronaut app

To get started, have a look at the [installation guide](https://docs.micronaut.io/latest/guide/index.html#buildCLI) for Micronaut. In a nutshell, it's using the [Sdkman](https://sdkman.io/) tool to manage versions of various SDKs. You can install Sdkman easily:

```bash
$ curl -s "https://get.sdkman.io" | bash
```

Once installed, you can also install the Micronaut command-line:

```bash
$ sdk install micronaut 2.0.0.M1
```

Next, we'll create an empty app named "app" with:

```bash
$ mn create-app app
```

The project will be created in the `app/` subdirectory; cd into it to also create a controller, and call it hello:

```bash
$ mn create-controller hello
```

You'll need to implement the controller, and tweak the `app/build.gradle` file to enable Java 14's preview features. Update this section at the bottom of the build file to add the `--enable-preview` flag:

```groovy
tasks.withType(JavaCompile) {
    options.encoding = "UTF-8"
    options.compilerArgs.add('-parameters')
    options.compilerArgs.add('--enable-preview')
}
```

Now, open the `src/main/java/app/HelloController.java` class:

```java
package app;

import io.micronaut.http.MediaType;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.annotation.Produces;

@Controller("/")
public class HelloController {

    @Produces(MediaType.TEXT_HTML)
    @Get("/{day}")
    public String index(String day) {
        var isWeekend = switch (day.toLowerCase()) {
            case "saturday", "sunday" -> true;
            case "monday", "tuesday", "wednesday", "thursday", "friday" -> false;
            default -> false;
        };
        return String.format("""
            It's %s, it is %s the weekend!
            """, day, (isWeekend ? "" : "not"));
    }
}
```

Notice how you take advantage of the improved switch expression and the text block!

### Create a Docker image

Micronaut's project template comes with a default Dockerfile, but update it to look like this:

```dockerfile
FROM openjdk:14
WORKDIR /app
COPY ./ ./
RUN ./gradlew shadowJar
EXPOSE 8080
CMD ["java", "--enable-preview", "-jar", "build/libs/app-0.1-all.jar"]
```

Then build this container image (with your name of choice) with:

```bash
docker build -t IMAGE_NAME .
```

And check that it runs fine with this docker run command:

```bash
docker run -p 8080:8080 -it IMAGE_NAME
```

Then head over to http://localhost:8080/Monday or http://localhost:8080/Saturday to see if it works fine. So you now have a working Micronaut 2.0 application, running on JDK 14, using some of the new and preview features of Java 14! Congrats!

### Scaling your container image in the cloud

Time to deploy your Java 14-powered Micronaut web application into the cloud, on Cloud Run. Why [Cloud Run](https://cloud.google.com/run/)? Because with Cloud Run, you can easily push a container to production in a matter of seconds. It abstracts away all the infrastructure, so you don't have to worry about it.
Google Cloud Platform handles it for you, so you can focus on your code instead. You pay proportionally to your usage: it's serverless, so if nobody pings your app, you won't pay anything as no container will be running. But as traffic ramps up, one or more containers will be lined up to serve your requests.

If you haven't already, you can get started on Google Cloud Platform with its [free trial](https://cloud.google.com/free) (and free quota). For this tutorial however, you need to create a billing account. Once you have an account ready, create a new GCP project in the Google Cloud [console](https://console.cloud.google.com/). Head over to the Cloud Run section, from the hamburger menu, and click on the "Enable API" button.

Last thing before heading to the command-line: install the [gcloud SDK](https://cloud.google.com/sdk/docs/quickstarts) command-line to work from your terminal. Once gcloud is installed, you can login with:

```bash
gcloud auth login
```

Set the project name to the one you created in the console:

```bash
gcloud config set project YOUR_PROJECT_ID
```

You'll be using the fully-managed version of Cloud Run:

```bash
gcloud config set run/platform managed
```

Define a default region (for me, that's going to be europe-west1):

```bash
gcloud config set run/region REGION
```

It's also possible to build container images in [Cloud Build](https://cloud.google.com/cloud-build) (see some [instructions](https://cloud.google.com/run/docs/quickstarts/build-and-deploy) that show this), but here you are using Docker locally to build your images. So let's configure the Docker integration and Container Registry access with the following commands:

```bash
gcloud auth configure-docker
gcloud components install docker-credential-gcr
```

Tag your image with the following naming convention:

```bash
docker build . --tag gcr.io/YOUR_PROJECT_ID/IMAGE_NAME
```

Let's push our image to Container Registry and deploy it to Cloud Run (changing the image and project names accordingly):

```bash
docker push gcr.io/YOUR_PROJECT_ID/IMAGE_NAME
gcloud run deploy weekend-service \
    --image gcr.io/YOUR_PROJECT_ID/IMAGE_NAME \
    --allow-unauthenticated
```

You should see output similar to this, showing the URL where you can access your app:

```
Deploying container to Cloud Run service [weekend-service] in project [YOUR_PROJECT_ID] region [europe-west1]
✓ Deploying new service... Done.
  ✓ Creating Revision...
  ✓ Routing traffic...
  ✓ Setting IAM Policy...
Done.
Service [weekend-service] revision [weekend-service-00001-xig] has been deployed and is serving 100 percent of traffic at https://weekend-service-brxby8yoda-ew.a.run.app
```

Navigate to that URL, append the name of the day, and check whether it's weekend time!

## And voilà!

Less than a minute later, your Java 14 + Micronaut container app has been deployed to Cloud Run. Automatically, you got a secured HTTPS endpoint for your app (you can also provide your own domain name), without bothering with the infrastructure and scaling aspects.

# Serverless tip #7 — Create mini APIs with Cloud Functions and Express routing

Requirements:

- an existing Google Cloud Platform account and project
- Cloud Functions should be enabled for that project

Compared to the previous tip, where we used Express' request path attribute, this time we can take advantage of Express routing.
So to support the following paths:

```
https://us-central1-myproject.cloudfunctions.net/api/customers
https://us-central1-myproject.cloudfunctions.net/api/customers/32
https://us-central1-myproject.cloudfunctions.net/api/customers/32/address
```

We can have our functions require Express by adding it in `package.json`:

```json
{
  "name": "mini-api-router",
  "version": "0.0.1",
  "dependencies": {
    "express": "^4.17.1"
  }
}
```

Then we can require that dependency in our new functions script:

```javascript
// we use express explicitly
const express = require('express');
const app = express();

// some customer data retrieved from Firestore or elsewhere
const customers = {
  "32": { name: 'Alice', address: '21 Jump Street' },
  "33": { name: 'Bob', address: '1 Main Street' }
};

// this time we can define the path easily
app.get('/', (req, res) =>
    res.send('Hello World!'));

app.get('/customers', (req, res) =>
    res.status(200).json(customers).end());

// we can also specify path variables like :id
// that we can retrieve via the request params object
app.get('/customers/:id', (req, res) =>
    res.status(200).json(customers[req.params.id]).end());

app.get('/customers/:id/address', (req, res) =>
    res.status(200).json({address: customers[req.params.id].address}).end());

// we need to export the app object for Cloud Functions to expose it
exports.api = app;
```

More information:

- [Express framework](https://expressjs.com/)
- Express [routing](http://expressjs.com/en/guide/routing.html#routing)

# Serverless tip #6 — Create a mini web API with Cloud Functions

Requirements:

- an existing Google Cloud Platform account and project
- Cloud Functions should be enabled for that project

We often use individual HTTP [Cloud Functions](https://cloud.google.com/functions/) as a single endpoint, and we pass data to the functions with either query parameters, or via a POST body payload. Although it's a good practice to keep the scope of a function small, you can easily write mini Web APIs for a given function, with different paths for different needs, like with usual Web frameworks. So instead of having just a single endpoint with:

```
https://us-central1-myproject.cloudfunctions.net/myfunction
```

You can have sub-paths below the name of your function:

```
https://us-central1-myproject.cloudfunctions.net/myapi/customers
https://us-central1-myproject.cloudfunctions.net/myapi/customers/32
https://us-central1-myproject.cloudfunctions.net/myapi/customers/32/address
```

Let's have a look at the Node functions runtime, and how you can implement this approach. The key trick here is to use the request path: `req.path`, which will give you the `/customers/32` part of the fully qualified URL.
```javascript
// some customer data retrieved from Firestore or elsewhere
const customers = {
  "32": { name: 'Alice', address: '21 Jump Street' },
  "33": { name: 'Bob', address: '1 Main Street' }
};

exports.myapi = (req, res) => {
  if (req.path.startsWith('/customers')) {
    const pathElements = req.path.split('/') // split along the slashes
      .filter(e => e)                        // remove the empty strings in the array
      .splice(1);                            // remove the first "customers" element

    // path: /customers
    if (pathElements.length == 0) {
      // return all customers
      return res.status(200).json(customers).end();
    }
    // path: /customers/32
    else if (pathElements.length == 1) {
      return res.status(200).json(customers[pathElements[0]]).end();
    }
    // path: /customers/33/address
    else if (pathElements.length == 2 && pathElements[1] == "address") {
      return res.status(200).json({address: customers[pathElements[0]].address}).end();
    }
  }
  // no route matched: return a 404
  res.status(404).send('Unknown path').end();
};
```

In the Node.js runtime, Cloud Functions uses the Express framework under the hood. We have access to the request object, which has lots of useful attributes, including the path. In this simplistic example, we are using this path attribute directly, but it's also possible to use more advanced routing capabilities, as we shall see in a forthcoming tip.

More information:

- [Express framework](https://expressjs.com/)
- Node's request [path](https://expressjs.com/en/api.html#req.path)

# Serverless tip #5 — How to invoke a secured Cloud Run service locally

Requirements:

- an existing Google Cloud Platform account with a project
- you have enabled the Cloud Run service and already deployed a container image
- your local environment's gcloud is already configured to point at your GCP project

By default, when you deploy a [Cloud Run](https://cloud.run/) service, it is secured, unless you use the `--allow-unauthenticated` flag when using the gcloud command-line (or the appropriate checkbox on the Google Cloud Console). But once deployed, if you want to call it locally from your development machine, for testing purposes, you'll have to be authenticated.

If you look at the Cloud Console, alongside the URL of the service, you can hover over the little icon next to the URL, and you'll see a pop-up showing how you can invoke that service with a curl command:

![](/img/serverless-tips/st5-cr-call-authenticated-service.png)

```bash
curl -H \
    "Authorization: Bearer $(gcloud auth print-identity-token)" \
    https://authenticated-x2rq3lgmra-uc.a.run.app
```

Note how a bearer token generated by the gcloud command is passed as a header to the curl request.

More information:

- [https://cloud.run](https://cloud.run/), the serverless container platform
- [Authentication overview documentation](https://cloud.google.com/run/docs/authenticating/overview)
- [Unauthenticated invocations](https://cloud.google.com/run/docs/authenticating/public) of your services
- [gcloud auth print-identity-token](https://cloud.google.com/sdk/gcloud/reference/auth/print-identity-token)

# 8 production-ready features you'll find in Cloud Run fully managed

Since we [launched Cloud Run](https://cloud.google.com/blog/products/serverless/announcing-cloud-run-the-newest-member-of-our-serverless-compute-stack) at Google Cloud Next in April, developers have discovered that "serverless" and "containers" run well together.
With [Cloud Run](https://cloud.google.com/run/), not only do you benefit from fully managed infrastructure, up and down auto-scaling, and pay-as-you-go pricing, but you're also able to package your workload however you like, inside a stateless container listening for incoming requests, with any language, runtime, or library of your choice. And you get all this without compromising portability, thanks to its [Knative](https://knative.dev/) open-source underpinnings.

Many Google Cloud customers already use Cloud Run in production, for example, to deploy public websites or APIs, or as a way to perform fast and lightweight data transformations or background operations.

*"Cloud Run promises to dramatically reduce the operational complexity of deploying containerized software. The ability to put an automatically scaling service in production with one command is very attractive."* - Jamie Talbot, Principal Engineer at Mailchimp.

[Cloud Run recently became generally available](https://cloud.google.com/blog/products/serverless/knative-based-cloud-run-services-are-ga), as both a fully managed platform or on [Anthos](https://cloud.google.com/anthos/), and offers a bunch of new features. What are those new capabilities? Today, let's take a look at what's new in the fully managed Cloud Run platform.

## 1. Service level agreement

With general availability, Cloud Run now comes with a [Service Level Agreement](https://cloud.google.com/run/sla) (SLA). In addition, it now offers [data location commitments](https://cloud.google.com/terms/service-terms) that allow you to store customer data in a specific region/multi-region.

## 2. Available in 9 GCP regions

In addition to South Carolina, Iowa, Tokyo, and Belgium, in the coming weeks, you'll also be able to deploy containers to Cloud Run in North Virginia, Oregon, Netherlands, Finland, and Taiwan, for a total of nine [cloud regions](https://cloud.google.com/about/locations).

![/img/8cr/Cloud_run_regions.max-2000x2000.png](/img/8cr/Cloud_run_regions.max-2000x2000.png)

## 3. Max instances

Auto-scaling can be magic, but there are times when you want to limit the maximum number of instances of your Cloud Run services, for example, to limit costs. Or imagine a backend service like a database is limited to a certain number of connections---you might want to limit the number of instances that can connect to that service. With the [max instances](https://cloud.google.com/run/docs/configuring/max-instances) feature, you can now set such a limit. Use the Cloud Console or Cloud SDK to set this limit:

```bash
gcloud run services update SERVICE-NAME --max-instances 42
```

## 4. More secure: HTTPS only

All fully managed Cloud Run services receive a stable and secure URL. Cloud Run now only accepts secure HTTPS connections and redirects any HTTP connection to the HTTPS endpoint.

But having an HTTPS endpoint does not mean that your service is publicly accessible---you are in control and can opt into [allowing public access](https://cloud.google.com/run/docs/authenticating/public) to your service. Alternatively, you can [require authentication](https://cloud.google.com/run/docs/authenticating/overview) by leveraging the "Cloud Run Invoker" IAM role.

## 5. Unary gRPC protocol support

Cloud Run now lets you deploy and run [unary gRPC](https://grpc.io/docs/guides/concepts/) services (i.e., non-streaming gRPC), allowing your microservices to leverage this RPC framework.
To learn more, read Peter Malinas' tutorial on [Serverless gRPC with Cloud Run](https://medium.com/@petomalina/%EF%B8%8Fserverless-grpc-with-cloud-run-bab3622a47da) using Go, as well as Ahmet Alp Balkan's article on [gRPC authentication on Cloud Run](https://ahmet.im/blog/grpc-auth-cloud-run/). ## 6. New metrics to track your instances Out of the box, Cloud Run integrates with [Stackdriver Monitoring](https://cloud.google.com/monitoring/). From within the Google Cloud Console, the Cloud Run page now includes a new "Metrics" tab that shows charts of key performance indicators for your Cloud Run service: requests per second, request latency, used instance time, CPU and memory. A new built-in Stackdriver metric called [`container/billable_instance_time`](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-run) gives you insights into the number of container instances for a service, with the billable time aggregated from all container instances. ![/img/8cr/billable_container_instance_time.max-1200x1200.jpg](/img/8cr/billable_container_instance_time.max-1200x1200.jpg) ## 7. Labels Like the bibs that identify the runners in a race, GCP [labels](https://cloud.google.com/run/docs/configuring/labels) can help you easily identify a set of services, break down costs, or distinguish different environments. You can set labels from the Cloud Run service list page in Cloud Console, or update labels with this command and flag: ```bash gcloud run services update SERVICE-NAME --update-labels KEY=VALUE ``` ## 8. Terraform support Finally, if you practice [Infrastructure as Code](https://cloud.google.com/solutions/infrastructure-as-code/), you'll be glad to know that [Terraform now  supports Cloud Run](https://www.terraform.io/docs/providers/google/r/cloud_run_service.html), allowing you to provision Cloud Run services from a Terraform configuration.  ```hcl resource "google_cloud_run_service" "default" { name     = "hello" location = "us-central1" template { spec { containers { image = "gcr.io/cloudrun/hello" } } } } ``` ## Ready, set, go! The baton is now in your hands. To start deploying your container images to Cloud Run, head over to our quickstart guides on [building and deploying your images](https://cloud.google.com/run/docs/quickstarts/build-and-deploy). With the [always free tier](https://cloud.google.com/free/) and the $300 credit for new GCP accounts, you're ready to take Cloud Run for a spin. To learn more, there's the [documentation](https://cloud.google.com/run/docs/) of course, as well as the [numerous samples](https://github.com/GoogleCloudPlatform/cloud-run-samples) with different language runtimes (don't miss the "Run on Google Cloud" [button](https://github.com/GoogleCloudPlatform/cloud-run-button) to automatically deploy your code). In addition, be sure to check out the community-contributed resources on the [Awesome Cloud Run](https://github.com/steren/awesome-cloudrun) github project. We're looking forward to seeing what you build and deploy! # Serverless tip #4 — Discover the full URL of your deployed Cloud Run services with gcloud format flag Requirements: - an existing Google Cloud Platform account - you have enabled the Cloud Run service and deployed already a container image One of the nice things with Cloud Run is that when you deploy your services, you get a URL like https://myservice-8oafjf26aq-ew.a.run.app/, with a certificate, on the run.app domain name, etc. 
You see the name of the service: myservice, the region shortcut where it was deployed: ew (Europe West), and then .a.run.app. However, you can't guess ahead of time what the final URL will be, as there is a randomly generated part in the URL (here: `8oafjf26aq`). Let's see how we can discover this whole URL. From my terminal, I can request the list of deployed services (here, on the fully managed Cloud Run): ```bash gcloud run services list \ --platform managed ``` It's going to show me something like the following output: ```    SERVICE      REGION        URL                                        LAST DEPLOYED BY     LAST DEPLOYED AT ✔  myservice    europe-west1  https://myservice-8oafjf26aq-ew.a.run.app  myself@foobar.com    2019-11-20T15:26:39.442Z ``` When describing the specific service (I had to specify the region as well, but you can set it by default if needed to avoid repeating yourself): ```bash gcloud run services describe myservice \ --platform managed \ --region europe-west1 ``` You'll see: ``` ✔ Service hello in region europe-west1 https://myservice-8oafjf26aq-ew.a.run.app Traffic:   100%               LATEST (currently myservice-00002-dox) Last updated on 2019-11-20T15:26:39.442Z by myself@foobar.com:   Revision myservice-00002-dox   Image:             gcr.io/my-sample-project/my-container-image:latest ``` So instead of parsing that ourselves somehow, there's a built-in way to get just the info we want, with the useful `--format` flag: ```bash gcloud run services describe myservice \ --format='value(status.url)' \ --platform managed --region europe-west1  ``` This time, in output, you'll get just the URL, which you can then export or reuse with other commands. ```bash https://myservice-8oafjf26aq-ew.a.run.app ``` The `glcoud` command provides three useful mechanisms to filter, format, or project the output and values returned. Here, we took advantage of format. More information: - [https://cloud.run](https://cloud.run/), the serverless container platform - [Filtering and formatting fun with gcloud, GCP's command line interface](https://cloud.google.com/blog/products/gcp/filtering-and-formatting-fun-with) - [gcloud filters](https://cloud.google.com/sdk/gcloud/reference/topic/filters) - [gcloud formats](https://cloud.google.com/sdk/gcloud/reference/topic/formats) - [gcloud projections](https://cloud.google.com/sdk/gcloud/reference/topic/projections) # Serverless tip #3 — Use the Cloud Run button on your Git repository to deploy your project in a click Requirements: - an existing Google Cloud Platform account - a Git or Github repository containing your project - your project can have a Dockerfile (but not mandatory) With [Cloud Run](https://cloud.run/), you can easily deploy a container image and let it scale up and down as needed, in a serverless fashion: - No need to focus on infrastructure (provisioning servers, clusters, upgrading OS, etc.) - Your application can scale transparently from 0 to 1, and from 1 to n (no need for a pager when your app is featured on Hackernews) - You pay as you go, proportionally to the usage If your project is hosted on Github, for example, how can you help users get started with your project? You usually explain all the steps needed to build a container image, or where to fetch a pre-made image from a hub, and then steps to actually deploy that image on the platform. 
But thanks to the Cloud Run button, you can add a button image on your README.md page for instance, and then users can click on it and get started with building and deploying to a GCP project automagically. ![](https://deploy.cloud.run/button.svg) If the Git repository contains a Dockerfile, it will be built using the docker build command. Otherwise, the CNCF Buildpacks (with the pack build command) will be used to build the repository. The Cloud Run button github project gives extra information on the parameterization of the deploy URL of the button, for example if you want to specify a particular branch or directory. More information - [https://cloud.run](https://cloud.run/), the serverless container platform - Cloud Run button [github project](https://github.com/GoogleCloudPlatform/cloud-run-button) - Cloud Run button [announcement](https://cloud.google.com/blog/products/serverless/introducing-cloud-run-button-click-to-deploy-your-git-repos-to-google-cloud) - [CNFC Buildpacks](https://buildpacks.io/) # Serverless tip #2 — Deploy an executable JVM application with gcloud without app.yaml or build tool plugin Requirements: - an existing Google Cloud Platform account and project - a Java or alternative language web application - a build that creates a standalone executable JAR file Usually App Engine applications in Java are deployed with the gcloud command-line interface, or via a Maven or Gradle build plugin. Either way, an `app.yaml` file to describe your application is required to let the cloud SDK know that the project at hand is an App Engine project. With the Java 11 runtime, however, it's possible to deploy a standalone executable JAR without `app.yaml`. The `gcloud app deploy` command now takes also a path to a standalone JAR file: ```bash gcloud app deploy path/to/app.jar ``` App Engine will automatically assume that you are deploying to the Java 11 runtime, using an F1 instance (256MB of RAM and 600MHz of CPU). 
So this deployment would be equivalent to having a simple app.yaml file as follows: ```yaml runtime: java11 instance_class: F1 ``` More information: - [gcloud app deploy](https://cloud.google.com/sdk/gcloud/reference/app/deploy) command details - [app.yaml](https://cloud.google.com/appengine/docs/standard/java11/config/appref) - [Announcement and example with Maven and Spring](https://cloud.google.com/blog/products/application-development/app-engine-java-11-is-ga-deploy-a-jar-scale-it-all-fully-managed) # Serverless tip #1 — Deploy a standalone JVM web app with Gradle and the App Engine plugin Requirements: - an existing Google Cloud Platform account and project - a Java or alternative language web application - a Gradle build that creates a standalone executable JAR file In youd `build.gradle` file, add the App Engine gradle plugin to your `buildscript` dependencies: ```groovy buildscript {     repositories {         jcenter()         mavenCentral()     }     dependencies {         classpath 'com.google.cloud.tools:appengine-gradle-plugin:2.+'     } } ``` Apply the plugin, to make use of it: ```groovy apply plugin: "com.google.cloud.tools.appengine-appyaml" ``` Then you can configure the `appengine` task to point at the standalone executable JAR: ```groovy appengine {     stage.artifact =              "${buildDir}/libs/${project.name}-${project.version}.jar"     deploy {         projectId = "YOUR-PROJECT-ID"         version = "1"     } } ``` You can customize the path of the artifact, specify the project ID outside, or define an App Engine version that's dependent on your project version, a git commit, etc. Note that the App Engine gradle plugin expects to find the `app.yaml` configuration file in `src/main/appengine`. You can then deploy your application with: ```bash $ ./gradlew appengineDeploy ``` More information: - [app.yaml](https://cloud.google.com/appengine/docs/standard/java11/config/appref) - [App Engine gradle plugin](https://github.com/GoogleCloudPlatform/app-gradle-plugin) # Implementing Webhooks, not as trivial as it may seem You've certainly interacted with **webhooks** at some point: with a Github commit webhook, for Slack or Dialogflow chatbots, for being notified of Stripe payments, or when you receive an SMS via Twilio. The concept is fairly well known, but there are some roadblocks along the way, whether you implement a webhook handler (the URL being called) or a webhook backend (the service notifying URLs). It's not necessarily as trivial as it may first seem. As I've been interested in Web APIs for a long time, I decided to look into this topic a bit more, by working on a new talk. ## Videos I've had the chance of giving this talk at GeeCON Prague: {{< youtube hRz38zGPSAU >}} As well as (in French) at BDX.IO: {{< youtube DRf7-dmhNHA >}} You can also watch the slide deck here: {{< speakerdeck b89328ff810c4955a8c2427d05f18bed >}} ## Summary Initially, I was focusing on the backend webhook implementation aspects, but both the sending and receiving ends of webhooks have their own challenges. Let me name the ones I encountered. On the handler / client / receiving side, your webhook handlers should: - Reply with a 200 HTTP status code, to let the service provider know that you successfully received the event notification. - Reply fast, so that the service provider doesn't have to keep as many open connections to the handlers, to let it scale more gracefully to more customers. 
So a good approach is to acknowledge the reception of the event, but treat that event asynchronously afterwards. - Ack reception and defer event handling, as mentioned above when replying fast, it's important to "ack" quickly the reception, and then you're free to do long event handling afterwards, potentially with some worker queue, with workers that can treat those events at their own pace. You can then scale your workers pool when you need to deal with more notifications. - Calls should be idempotent. Sometimes, for various reasons, it's possible you get event notifications twice for the same event. - Use IP whitelisting, when possible, to ensure that only some IP addresses can ping your handler. Since you're opening an URL to the public, better be sure that it's only the service provider that calls you. But it's not always possible to define such a whitelist, as IP addresses are not necessarily fixed for the service provider. - Check request signature, this time not to avoid a DDoS, but more to ensure the integrity of the event payload that you receive. More on signatures in the server part below. - Take advantage of serverless solutions, as sometimes, you don't get a frequent or regular flow of event notifications, why have a server running all the time? Instead, you can take advantage of serverless solutions, like [App Engine](https://cloud.google.com/appengine/) or [Cloud Run](https://cloud.google.com/run/), as you're only billed for the time used. On the service provider / server / notifier side, your webhook backend should: - Send small data payloads, instead of the whole resource that triggered the event. For instance, your service might notified handlers that there's a new video available. But you don't want to send tons of gigabytes of videos to each and every handler subscribed to that event. So just send a reference to that resource, and keep the event payload small. - Timeout if client is too slow, as you can't wait forever for a faulty handler to reply. Cut the connection if the client handler doesn't reply under a set time interval, and treat this as if the message hasn't been successfully delivered. Which means you'll have to retry sending that event later on. - Retry sending events with exponential backoff, to not overload a handler which is struggling to keep pace, in order to avoid DDoS-ing it with your retries. Instead, use exponential backoff to try to redeliver, for example, after 1s, 2s, 4s, 8s, 16s, etc.  - Keep track of non-responding handlers, after too many failed delivery attempts, mark those handlers as non-responding, and perhaps somehow notify the creator of the handler that it's not responding correctly. - Deliver messages from a work queue, as you have potentially tons of subscribers interested in your events, you don't want your event loop to be taking longer and longer to operate as the number of handlers grow, and instead, offload the work to some worker queue that you can scale independently from the work of keeping pace with ongoing events flow. - Batch events when too frequent, when there are too many event notifications to send. It might be more costly to send each and every event as they come, in real-time. If there are too many events, you can group them in one batch, so as to deliver them at an increased interval of time to your handlers. - Use a dead letter queue, for auditing purpose in particular. 
For non-responding handlers, or in case of handlers sometimes miss some events, you can push those never-received events in a dead letter queue, so that later on handler developers can check it to see if they actually missed something at some point in the flow. - Use HTTPS for secured connections, well, everyone should use HTTPS all the time these days anyone, but it's better for avoiding man-in-the-middle attacks, to avoid events replay, etc. - Sign requests with a secret, when handlers and service providers share a common secret, the provider can signe the request it sends to handlers, so that handlers can check the message is really coming from the service provider. For example, the Github API is using an HMAC signature, with a SHA-1 digest. - Use proper authentication/authorization mechanisms. This one is a bit vague, but the usual authentication/authorization best practices still apply to webhooks! Going further, I'd like to expand this presentation with more hands-on concrete demos, that put all those best practices into action, and perhaps create some animations to show what happens when handlers are flooded with notifications, when handlers don't respond rapidly enough, etc, as that would probably help visualise more concretely each of those problems or practices. Let's see how I can continue iterating and improving this presentation and topic! Resources Last but not least, there are some great resources available on the topic, that I've added at the end of my slide deck. Be sure to check them out as well: - [Crafting a great webhooks experience](https://speakerdeck.com/apistrat/crafting-a-great-webhooks-experience-by-john-sheehan) (John Sheehan) - [WebHooks: the definitive guide](https://requestbin.com/blog/working-with-webhooks/) - [WebHooks: The API Strikes Back](https://www.infoq.com/presentations/webhooks-api/) (InfoQ) - [Webhooks vs APIs](https://hackernoon.com/webhook-vs-api-whats-the-difference-8d41e6661652) - [What is a Webhooks push-style API & how does it work](https://www.programmableweb.com/news/what-webhooks-push-styled-api-and-how-does-it-work/analysis/The2017/03/28) (ProgrammableWeb) - [Webhooks do's & dont's: what we learned after integration 100+ APIs](https://restful.io/webhooks-dos-and-dont-s-what-we-learned-after-integrating-100-apis-d567405a3671) - [How & why Pusher adopted Webhooks](https://www.programmableweb.com/news/what-are-webhooks-and-how-do-they-enable-real-time-web/2012/01/30) - [Webhooks vs WebSub: Which Is Better For Real-Time Event Streaming?](https://nordicapis.com/webhooks-vs-websub-which-one-is-better-to-stream-your-events-in-real-time/) - [Webhooks, the devil is in the details](https://techblog.commercetools.com/webhooks-the-devil-in-the-details-ca7f7982c24f) - [How to design a webhook for my API](https://phalt.github.io/webhooks-in-apis/) - [Serverless webhooks to revolutionize the SaaS](https://tomasz.janczuk.org/2018/03/serverless-webhooks-to-revolutionize-the-saas.html) # App Engine 2nd generation runtimes and serverless containers with Cloud Run at Cloud Next Tokyo Last week, I was in Tokyo for the first time, to speak at the Google Cloud Next conference. During the DevDay, I spoke about Google App Engine and its 2nd generation runtimes, and I also presented Cloud Run on how to deploy and run containers in a serverless fashion. It's been awesome to visit Japan for the first time and get a chance to meet developers there. 
Here are the slides I presented: ## App Engine 2nd generation runtimes {{< speakerdeck 675bbf764343407d92eb4c2d9618de5c >}} ## Serverless containers with Cloud Run {{< speakerdeck 59137c528fc04162a7806605fd936b4e >}} # Getting started with Micronaut on Google App Engine Java 11 A [new Java runtime was announced for Google App Engine](https://cloud.google.com/blog/products/application-development/turn-it-up-to-eleven-java-11-runtime-comes-to-app-engine) standard: with Java 11. It's currently in beta, but anybody can already try it out. Another interesting announcement was the fact that the instances running your apps now get [double the memory](https://cloud.google.com/blog/products/application-development/app-engine-second-generation-runtimes-now-get-double-the-memory-plus-go-112-and-php-73-now-generally-available)! So with this double dose of great news, I decided to craft a little tutorial to show how to deploy a Micronaut application on App Engine Java 11. And because Apache Groovy is, well, groovy, I'll go ahead and use Groovy for my programming language, but of course, the same steps apply to Java workloads as well. ## Getting started on Google Cloud Platform In this article, I assume you've created an account on Google Cloud Platform already (follow the "getting started" blue buttons to create an account otherwise and benefit from the [free tier and free quota](https://cloud.google.com/free/)), and that you've downloaded and installed the [gcloud](https://cloud.google.com/sdk/gcloud/) command-line SDK. You'll be able to follow the first few steps in the [quickstart guide](https://cloud.google.com/appengine/docs/standard/java11/quickstart), to create your GCP project and make it ready for using App Engine, in particular those commands: ```bash $ gcloud projects create mn113-gae-java11 --set-as-default ``` You'll have to change the project ID from `"mn113-gae-java11"` to your own name of choice. ```bash $ gcloud app create --project=mn113-gae-java11 ``` It'll ask for a cloud region to use, I've decided to go with europe-west for this one. The above steps can as well be done from the cloud console UI as well, at [https://console.cloud.google.com](https://console.cloud.google.com/). Although your application will run for free within the free quota, we need to enable billing for our app, as it's going to use Cloud Build to build our app, and the latter requires billing to be enabled. To enable billing and the Cloud Build API, please follow the first step of the [quickstart guide](https://cloud.google.com/appengine/docs/standard/java11/quickstart) mentioned above. ## Building our Micronaut application Time to fire the Micronaut goodness! On my machine, I'm using SDKman to install my SDKs, so I've installed Java 11 and Micronaut 1.1.3 as explained in [Micronaut's getting started guide](https://docs.micronaut.io/latest/guide/index.html#buildCLI). Our first step will be to create our basic Micronaut application, thanks to the following command, with the mn command-line SDK: ```bash $ mn create-app mn113-gae-java11 --lang=groovy ``` The structure of your Micronaut project is created, with a Gradle-based build, an Application main class, an `application.yml` file to configure your application. 
As this application isn't yet doing anything useful, we're create a `"Hello World"` controller with: ```bash $ mn create-controller hello ``` We'll modify this newly created HelloController.groovy controller as follows: ```groovy package mn113.gae.java11 import io.micronaut.http.annotation.* @Controller("/hello") class HelloController {     @Get("/")     String index() {         return "Hello Micronaut!"     } } ``` | On the `/hello` path, we'll simply return a plain text response showing our greeting message. To run your application locally, to check everything is working fine, you'll simply run: ```bash $ ./gradlew run ``` And you can check that `localhost:8080/hello` returns the `Hello Micronaut` message. So far so good. ## Configure our Micronaut application for App Engine In order to deploy the App Engine, we'll use the App Engine Gradle plugin. So we need to amend our `build.gradle` a little. Let's define where Gradle will find the plugin: ```groovy buildscript {     repositories {         jcenter()         mavenCentral()     }     dependencies {         classpath 'com.google.cloud.tools:appengine-gradle-plugin:2.+'     } } ``` We'll make use of the plugin: ```groovy apply plugin: "com.google.cloud.tools.appengine-appyaml" ``` Let's configure the App Engine section: ```groovy appengine {     stage.artifact =             "${buildDir}/libs/${project.name}-${project.version}.jar"     deploy {         projectId = "mn113-gae-java11"         version = "1"     } } ``` Note that App Engine's version string is not supporting dots or underscores (only alphanumeric characters), hence why I replaced the `version` property. Furthermore a reported [issue](https://github.com/GoogleCloudPlatform/app-gradle-plugin/issues/353) prevents me from reusing the Gradle project's own project property in the `projectId` property. ## Configure the App Engine deployment App Engine has its own deployment configuration file, where you will define the App Engine runtime (in our case Java 11), and you can also decide what kind of instance will be used to run your code. Last but not least, you can customize the entry point which defines how your application should be started. In `src/main/appengine` we'll add a file named app.yaml: ```yaml runtime: java11 instance_class: F4 entrypoint: 'java -agentpath:/opt/cdbg/cdbg_java_agent.so=--log_dir=/var/log -jar mn113-gae-java11-0.1.jar' ``` ## Deploying to App Engine Now you're ready to deploy your Micronaut application on App Engine's Java 11 runtime! ```bash $ ./gradlew appengineDeploy ``` After a minute or so, and if billing and the Cloud Build API are enabled as said in the introduction, your Micronaut app should be deployed! You can then browse and get your Hello Micronaut greeting. ## What's next In upcoming articles, I'll cover some other aspects, like how to configure and optimize static asset serving, or perhaps how to integrate with databases or other services of Google Cloud Platform. So stay tuned! # Update on the recent serverless developments on GCP at DataXDay 2019 At [DataXDay 2019](https://dataxday.fr/), last week, I had the chance to present an updated version of my introductory talk on the [serverless compute options](https://cloud.google.com/serverless/) on Google Cloud Platform. There's always something new to cover! For instance, if I put my Java Champion hat on, I'd like to mention that there are new runtimes for App Engine standard, like the beta for Java 11, and there's twice the amount of memory as before. 
On Cloud Functions, we have an alpha for Java as well (currently Java 8, but it'll be soon moved to Java 11 instead, as customers are more interested in the latest LTS version) In this talk, I also covered [Cloud Run](http://cloud.run/), and Cloud Run on GKE (Google Kubernetes Engine), as well as telling a few words about the [Knative](https://knative.dev/) open source building blocks for Kubernetes, which allows to create serverless portable containers. Here's the slide deck I presented at the conference: {{< speakerdeck 620a008ef3694e93b3a43d15583ff980 >}} # Turn it up to eleven: Java 11 runtime comes to App Engine Yesterday, we [announced](https://cloud.google.com/blog/products/application-development/app-engine-second-generation-runtimes-now-get-double-the-memory-plus-go-112-and-php-73-now-generally-available) new second-generation runtimes for Go 1.12 and PHP 7.3. In addition, App Engine standard instances now run with double the memory. Today, we're happy to announce the availability of the new Java 11 second-generation runtime for App Engine standard in beta. Now, you can take advantage of the latest [Long-Term-Support version](https://www.oracle.com/technetwork/java/java-se-support-roadmap.html) of the Java programming language to develop and deploy your applications on our fully-managed serverless application platform. Based on technology from the [gVisor container sandbox](https://github.com/google/gvisor), second-generation runtimes let you write portable web apps and microservices that take advantage of App Engine's unique auto-scaling, built-in security and pay-per-use billing model---without some of App Engine's earlier runtime restrictions. Second generation-runtimes also let you build applications more idiomatically. You're free to use whichever framework or library you need for your project---there are no limitations in terms of what classes you can use, for instance. You can even use native dependencies if needed. Beyond Java, you can also use alternative JVM (Java Virtual Machine) languages like [Apache Groovy](http://groovy-lang.org/), [Kotlin](https://kotlinlang.org/) or [Scala](https://www.scala-lang.org/) if you wish. In addition to more developer freedom, you also get all the benefits of a serverless approach. App Engine can transparently scale your app up to n and back down to 0, so your application can handle the load when it's featured on primetime TV or goes viral on social networks. Likewise, it scales to zero if no traffic comes. Your bill will also be proportional to your usage, so if nobody uses your app, you won't pay a dime (there is also a free tier available). App Engine second-generation runtimes also mean you don't need to worry about security tasks like applying OS security patches and updates. Your code runs securely in a [gVisor](https://gvisor.dev/)-based sandbox, and we update the underlying layers for you. No need to provision or manage servers yourself---just focus on your code and your ideas! ## What's new? When you migrate to Java 11, you gain access to all the goodies of the most recent Java versions: you can now use advanced type inference with the new var keyword, create lists or maps easily and concisely with the new immutable collections, and simplify calling remote hosts thanks to the graduated HttpClient support. Last but not least, you can also use the JPMS module system introduced in Java 9. You'll also find some changes in the Java 11 runtime. For example, the Java 11 runtime does not provide a Servlet-based runtime anymore. 
Instead, you need to bundle a server with your application in the form of an executable JAR. This means that you are free to choose whichever library or framework you want, be it based on the Servlet API or other networking stacks like the Netty library. In other words, feel free to use [Spring Boot](https://spring.io/projects/spring-boot), [Vert.x](https://vertx.io/), [SparkJava](http://sparkjava.com/), [Ktor](https://ktor.io/), [Helidon](https://helidon.io/#/) or [Micronaut](https://micronaut.io/) if you wish! Last but not least, second-generation runtimes don't come with the built-in APIs like Datastore or memcache from the App Engine SDK. Instead, you can use the standalone services with their Google Cloud client libraries, or use other similar services of your choice. Be sure to look into our [migration guide](https://cloud.google.com/appengine/docs/standard/java11/java-differences) for more help on these moves. ## Getting started To deploy to App Engine Java 11, all you need is an app.yaml file where you specify `runtime: java11`, signifying that your application should use Java 11. That's enough to tell App Engine to use the Java 11 runtime, regardless of whether you're using an executable JAR, or a WAR file with a provided servlet-container. However, the new runtime also gives you more control on how your application starts: by specifying an extra `entrypoint` parameter in `app.yaml`, you can then customize the java command flags, like the -X memory settings. With Java 11, the `java` command now includes the ability to run single independent `*.java` files without compiling them with `javac`! For this short getting started section, we are going to use it to run the simplest hello world example with the JDK's built-in HTTP server: ```java import com.sun.net.httpserver.HttpServer; import java.io.*; import java.net.InetSocketAddress; public class Main { public static void main(String[] args) throws IOException { var server = HttpServer.create(new InetSocketAddress(8080), 0); server.createContext("/", t -> { var response = "Hello World from Java 11.".getBytes(); t.sendResponseHeaders(200, response.length); try (var os = t.getResponseBody()) { os.write(response); } }); server.start(); } } ``` Notice how our Main class uses the var keyword introduced in Java 10, and how we re-used the keyword again in the `try-`with-resources block, as `Java 11` makes possible. Now it's time to prepare our `app.yaml` file. First, specify the `java11` runtime. In addition, `entrypoint` define the actual `java` command with which to we'll be running to launch the server. The `java` command points at our single Java source file: ```yaml runtime: java11 entrypoint: java Main.java ``` Finally, don't forget to deploy your application with the `gcloud app deploy app.yaml` command. Of course, you can also take advantage of dedicated Maven and Gradle plugins for your deployments. ## Try Java 11 on App Engine standard today You can write your App Engine applications with Java 11 today, thanks to the newly released runtime in beta. Please read the documentation to get started and learn more about it, have a look at the [many samples](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/appengine-java11) that are available, and check out the [migration guide](https://cloud.google.com/appengine/docs/standard/java11/java-differences) on moving from Java 8 to 11. 
And don't forget you can take advantage of the [App Engine free tier](https://cloud.google.com/free/docs/always-free-usage-limits) while you experiment with our platform. *From the App Engine Java 11 team Ludovic Champenois, Eamonn McManus, Ray Tsang, Alexis Moussine-Pouchkine, Averi Kitsch, Lawrence Latif, Angela Funk.* # A serverless Java developer's journey Last week at the Google [Cloud Next](https://cloud.withgoogle.com/next/sf/) conference, I had the chance to speak about the Java developer's journey through the "serverless" offering of [Google Cloud Platform](https://cloud.google.com/), with my colleague Vinod Ramachandran (Product Manager on some of our serverless products): Serverless Java in 2019 is going to be ubiquitous in your favorite cloud. Well, it's actually been 10 years since you could take advantage of Java on Google App Engine. But now you can run your apps on the brand-new Java 11 runtime. Not only servlet-based apps but also executable JARs. And what about authoring functions? Until now, you could only use Node or Python, but today, Java is the third runtime available for Google Cloud Functions. We will review the various ways you can develop your Java functions. Last but not least, thanks to serverless containers, containerized Java workloads run serverlessly, without you caring for infrastructure, scaling, or paying for idle machines. Through various demos, we will look at the many ways Java developers will be able to write, build, test, and deploy code in Java on the rich serverless offering of Google Cloud Platform. Until fairly recently, our compute serverless products consisted only of [Google App Engine](https://cloud.google.com/appengine/) for deploying apps and services, and [Cloud Functions](https://cloud.google.com/functions/) for deploying functions. Furthermore, for the Java developer, the situation wasn't that great as Cloud Functions wasn't offering any Java support (only Node, Python and Go runtimes), and only App Engine provided a Java 8 runtime. Fortunately, some very important announcements were made at Cloud Next: - First of all, in addition to the Java 8 runtime, we have launched an alpha for a brand [new App Engine Java 11 runtime](https://docs.google.com/forms/d/e/1FAIpQLSf5uE5eknJjFEmcVBI6sMitBU0QQ1LX_J7VrA_OTQabo6EEEw/viewform). - We introduced a [Java 8 flavor for Cloud Functions](https://docs.google.com/forms/d/e/1FAIpQLScC98jGi7CfG0n3UYlj7Xad8XScvZC8-BBOg7Pk3uSZx_2cdQ/viewform). - Last but not least, we launched a new product, [Cloud Run](https://cloud.google.com/run/), which allows you to run containers serverlessly, and thus any Java workload that can be containerized. So you can develop Java functions, Java apps and Java-powered containers in a serverless fashion: - Scaling on demand as needed to serve incoming requests as well as down to zero when no traffic comes. - Paying proportionally to the usage. - And all of that, without having to worry with server or cluster provisioning and management. Without further ado, let me share with you the video and the slides of this presentation: Video recording: {{< youtube WnhAYX1Phxw >}} Now it's your turn! 
If you want to try out functions, apps, and containers in Java, here are a few pointers to get you started: - [Sign-up form](https://docs.google.com/forms/d/e/1FAIpQLSf5uE5eknJjFEmcVBI6sMitBU0QQ1LX_J7VrA_OTQabo6EEEw/viewform) for App Engine Java 11 - [Sign-up form](https://docs.google.com/forms/d/e/1FAIpQLScC98jGi7CfG0n3UYlj7Xad8XScvZC8-BBOg7Pk3uSZx_2cdQ/viewform) for Cloud Functions Java 8 - Cloud Run [documentation](https://cloud.google.com/run/) # On curiosity and sharing with the world At the end of December, I was contacted by someone I didn't know, who asked me some interesting questions, and that led me to quite a bit of introspection. As a Java Champion and with your career history. I wanted to ask you what you consider are the most important skills for a Java programmer to have in their toolbox, especially a Senior Java programmer? Or maybe even a better question is what skills you developed that helped you become the Java Developer/Groovy Language Developer that you are today. In a nutshell, as I answered this person, for me it all boiled down to lots of curiosity, and the desire to share my findings with the world. It's not really about knowing specific methodologies, technologies or languages, or which soft or hard skills to master. It's about the core attitudes from which all the rest will derive from. But first, a bit of background about me. ## A bit of history Alright, so if I was contacted (and actually a few others as well) with those questions, it's because I'm considered to be a visible and public person. Because I'm known for my work in the Java community and more precisely in the Apache Groovy ecosystem. I've been in the field for quite a number of years, along with my contributions in Open Source, and that makes me a senior developer. But how did I get there? You've learned a lot during your studies, but often, not much of what you learned is immediately applicable in your daily duties and tasks. So there's even more to learn to become a productive developer. I started working as a junior Java developer in 2001. I was lucky to have had a great mentor that helped me design and write better code. I also spent quite some time reading Java and development related news websites or blogs. I wanted to know what were the latest trends (new language features, frameworks), the best tools for the job, how developers were developing on their projects. So clearly, I was pretty curious to look beyond just what I was doing at work, but to see if I could become a better programmer by learning from others. There's so much great content on the web, so much information that is shared, from best practices to bug fixes explanations, that you can learn a lot. That's also more or less when I started blogging. I saw so many useful blog posts that helped me, that I thought it would be a good thing to share back things I learned that could be helpful to others as well. In 2003, at work, I needed a way to extend an app I was working on, and clearly, some kind of scripting solution was what would allow the end-users of our app to further customize and tailor the application to their needs. So I spent some time reviewing existing Java scripting solutions, but none were really ideal. Fortunately, that's when Groovy was born. It was embryonic, and not really ready for prime time though. But it was what I needed. I started playing with Groovy, but quickly I encountered tons of problems and bugs. Since the code was Open Source, I started looking at its codebase, outside of work. 
I quickly understood where some of the bugs were coming from, and found ways to fix them. Since the community was pretty open, I participated in the mailing-lists to tell about those bugs, to help other users. It was nice to feel being part of a nice, friendly and helpful community. I used the bug tracker to file bugs and feature requests, and when I could I even submitted some patches to fix these. My patches were accepted, and in a handful of months, I was asked to become an official committer on the project (which I gladly accepted). By working with the other committers, I learned a lot about Java, the JVM, or how open source projects worked. That was super interesting. Since the code was public, I really wanted all my contributions to be top-notch, perfectly well tested and commented. Somehow I had the impression that the scrutiny of my peers mandated that I had to produce even better code than at work! So I perfected my craft. A lot. Since I had already started sharing my findings on my blog (and later on on social networks), I became part of the so-called "blogosphere", and started interacting with other bloggers. I wrote about Java and Groovy, of course, but the discussions with other open source developers, allowed me to also meet them in the real world. We even started a meetup of open source developers that shared what they were working on. That's how I did my first public presentation, to show Groovy to my peers, in 2004 or so, at our local gatherings. I came to know people working for big companies like Sun or Oracle, as well as smaller actors, from freelancers, to entrepreneurs. A handful of those companies started using Groovy, and that's how one day, someone asked if I'd be ready to talk with them at a big conference. That was for JavaOne! My first big conference and presentation was in the US in front of 600 persons. Woh... That's how I started sharing more widely with the world, and also started travelling to spread the word. I spent a lot of time on Groovy and its ecosystem, and I later got the chance to both work on those technologies for a living (after doing quite a bit of consulting), as well as even creating my own company to focus on the project. At the same time, I was still continuing presenting about Groovy, and still improving the language thanks to the feedback I was getting from the many developers I was meeting all around the world. I was doing developer advocacy at the same time as product management and development. Three hats in one. And by doing developer advocacy, that's also what landed me my current job of developer advocate at Google. ## The ever changing nature of our field From the narrated history above, there's a theme that emerges: curiosity. But what lead me to being curious? Tons of people are doing 9-to-5 jobs, and that's totally fine. However, as we spend so much time in our lives at work, for me, it had to be interesting and motivating. To be interesting, work has to be somehow entertaining --- beside spending quality time with great coworkers. If it's not interesting, you get bored very easily, and you don't want to wake up every morning to go to the office. So how not to be bored? By making your job more interesting. How to make it more interesting? Well, if you're passionate about what you're doing, it's much easier to go through the day and do fun and interesting things, event for a project that could appear as not being very fancy. Programming was first a hobby, for me, as a child and teenager. 
It never really occurred to me it could become my job. Initially, I just wanted to be... an "engineer". Perhaps in aerospace, or something like this. Who hasn't dreamt of becoming an astronaut? It's only late in my studies that I thought I could actually become a developer. So my hobby, my passion, became my job. But there's a big difference between working on stuff you want, versus being asked to work on stuff for the company which hired you. In order to not be bored, be sure to push for improving the project in interesting ways both for you and the end-users. If possible, perhaps try to introduce and learn new technologies that can make the product better, and at the same time make you learn something new. Be passionate about improving both your projects and your skills. Notice also that in our field, we actually don't really have a choice but to learn. When I was a student, my current job didn't even exist. When I started working, the languages or tools I'm using today weren't available then yet. So in IT, in programming, etc, there's always a new language, a new tool, a new practice, new patterns, etc, that come to light. It's a field where we have to be in a constant learning state, in order to stay relevant. If you're not learning, your skills will rot, you'll be less employable, you'll diminish your chances of having a fantastic job. So you have to be curious and learn all the time. To not be bored, but also to get better at your craft. With all those new tools, languages, frameworks, technologies, you have to keep up with what's going on. You have to be ready to learn something new. ## Sharing is caring We talked a lot about being curious, about learning all along, but I also mentioned about sharing. As the saying goes, sharing is caring, but it's also about creating opportunities for you. Sharing what I learned or worked on was helpful for others too (who encountered similar problems, for example), but it's also how I came to meet wonderful people along the way. Even mentors and role models. If I hadn't blogged or tweeted, I wouldn't have been able to start making presentations at meetups and conferences. And many of the friends I have today are friends I met along the way, at meetups, conferences, working on open source projects together, and so on. Without sharing my code, I wouldn't have had the opportunity to meet my future employers and colleagues, as well as the co-founders of my own startup. Sharing is great to be visible, of course, but it's a wonderful way to meet new people from whom you'll learn a lot. Open source is sharing too. Working on open source projects, nurturing communities and ecosystems around those, further allowed me to meet great people around the world. And it's what lead me to get the jobs at companies I was interested in. It created great professional opportunities. ## Summary Let's try to wrap up a bit. It's really not about learning a particular tool or technology. It's all about being curious and passionate about your craft, and to share what you've learned with the world. ### Be curious! It'll make your daily job more interesting. You will learn lots of great new technologies. You'll become a better developer by learning from your peers. You'll improve your craft and expertise. It'll increase your employability. You'll even likely become an expert in your field! ### Share with the world! Write, blog, tweet, present about the things you've learned at meetups or conferences. You'll learn a lot from others along the way. 
Write and share code, and/or contribute to open source projects. You'll meet awesome peers and mentors. And will create all sorts of interesting job opportunities. ### Be curious and share with the world! # Tip: Making a Google Cloud Storage bucket or file public [Google Cloud Storage](https://cloud.google.com/storage/) is the ideal product to store your object files (binary files, pictures, audio/video assets, and more). Until recently, there was an option in the Google cloud console with a checkbox to quickly make a file or bucket public. However, and I would add "unfortunately", users tended to inadvertently clicking the checkbox, thus making potentail confidential assets public. So this risky, but easy, option, has been removed to avoid any unwanted data leak. However, of course, it's still possible to [make buckets or files stored in Cloud Storage public](https://cloud.google.com/storage/docs/access-control/making-data-public). But you can't do it without paying attention! As I never quite remember how to do that (in spite of the linked documentation easily found with a quick Google search), I decided to highlight with a few screenshots how to achieve that! I assume you already have or created a GCP project, and you also have a bucket full of assets that you want to make public, because you need to share them on the Web, for your mobile application, etc. To illustrate this tip, let's have a look at the GCP cloud console: ![](/img/public-bucket/gcs-01-file-browser-small.png) ## Making a file public First, we'll have a look at making a single file public. You'll have to click the vertical triple dot icon on the right of the screen, and click on `Edit permissions`: ![](/img/public-bucket/gcs-02-permissions-drop-down-small.png) Once you've clicked on this option, you'll be given the choice to add new permissions to members or groups of members: ![](/img/public-bucket/gcs-03-permissions-dialog-small.png) In our case, we want to allow all the users to have read access on that particular file. So I'm giving the `Group` of `allUsers` the `Reader` access level. Then, once saved, in the file browser, you should see the following icon warning you the file is now public: ![](/img/public-bucket/gcs-04-public-warning-small.png) ## Making a bucket public Instead of doing this for each individual file, you can also do the same at the bucket level, to give read access to the bucket and all its files in one go. From the object browser, click on the `Permissions` tab. You will have to add the `allUsers` members the `Storage Object Viewer` role: ![](/img/public-bucket/gcs-05-bucket-permissions-small.png) Click on the `Add members button`, type `allUsers`, select the `Storage > Storage Object Viewer` option, as follows, and click `add`: ![](/img/public-bucket/gcs-06-add-role-to-bucket-users-small.png) Now if you head back to the file browser, you'll see all the files have the little warning icon telling you the resource is publicly accessible. ## For command-line gurus I showed the visual approach from the cloud console... but there's a one-liner you can use, thanks to the [gsutil](https://cloud.google.com/storage/docs/gsutil) command. For an individual file: ```bash gsutil acl ch -u AllUsers:R gs://[BUCKET_NAME]/[OBJECT_NAME] ``` For a whole bucket: ```bash gsutil iam ch allUsers:objectViewer gs://[BUCKET_NAME] ``` (Where you replace `[BUCKET_NAME]` with your project name, and `[OBJECT_NAME]` with the file name) ## More... 
There's also a REST API that you can use to handle your buckets and file, as well as different client libraries in different languages that you can use as well. # Mais c'est quoi un Developer Advocate ? J'ai eu le plaisir d'encadrer des stagiaires de 3ème récemment chez Google. Nous accueillons des enfants, neveux, nièces, cousins d'employés de Google (donc non :-P, je ne prends pas de stagiaire, pas la peine de demander !!!) pour leur faire découvrir les différents métiers que nous exerçons dans l'entreprise. Et il y en a beaucoup ! L'un de mes stagiaires m'a interviewé lorsque je décrivais mon travail de "Developer Advocate", au sein de Google Cloud. J'ai trouvé cette interview intéressante, et je me suis dit que ça valait le coup de la partager avec vous, en Français (si, si, j'écris en Français parfois sur ce blog.) L'exercice de cet entretien est intéressant parce qu'il me permet d'essayer d'expliquer de manière simple ce que je fais... et ce n'est pas évidant, vu la technicité de cet univers, des produits sur lesquels on travaille (qui ne sont pas à la portée du grand public). Donc voici quelques unes des questions auxquelles j'ai eu à répondre. J'espère que vous trouverez cela intéressant. ## Quel est ton métier ? Je suis **Developer Advocate** pour la branche **Google Cloud** de Google. C'est un titre en anglais qui n'a pas vraiment d'équivalent en Français. [Google Cloud](https://cloud.google.com/), c'est la partie de Google qui propose des produits comme par exemple Gmail pour la messagerie électronique, Google Docs pour le traitement de texte ou de tableurs, mais qui offre également aux développeurs et ingénieurs des serveurs sur lesquels les développeurs peuvent héberger leurs applications, des bases de données pour stocker leurs données, ou bien encore des services autour de l'intelligence artificielle pour reconnaître ce qu'il y a comme objets dans des images. ## Quelles sont les principales activités dans ce métier ? Le rôle d'un Developer Advocate est d'être une sorte de représentant des développeurs (nos utilisateurs et clients qui créent des applications). Nous collectons les retours de ces développeurs pour voir comment améliorer les produits de Google, en écoutant leurs besoins, leurs demandes, les problèmes qu'ils rencontrent. Par ailleurs, une autre grande partie de ce rôle (la partie la plus visible de l'extérieur), c'est de présenter et promouvoir les produits de Google (Google Cloud dans mon cas). Il s'agit alors d'écrire des articles, de coder des exemples de programmes utilisant nos technologies, enregistrer des vidéos, préparer des présentations que nous donnons lors de conférences locales ou internationales, puis aussi tirer partie des réseaux sociaux pour faire passer notre message. Enfin, nous sommes également les utilisateurs "zéro" de nos futurs nouveaux produits. Nous sommes donc les premiers à tester ces produits, pour voir ce qui fonctionne ou pas, s'ils sont faciles à utiliser, s'ils ont des bugs, si l'on peut trouver des façons de les améliorer ou de les enrichir avant de les sortir publiquement. ## Travailles-tu plutôt seul ou en équipe ? Pourquoi ? Je travaille dans une équipe internationale, répartie dans différents pays, avec des bureaux à San Francisco, Seattle, New York, Londres, Paris ou Tokyo. Parfois je travaille effectivement en équipe, à plusieurs sur un même sujet, mais beaucoup de mes activités sont aussi solitaires, comme lorsque l'on prépare une démonstration et une présentation pour une conférence. ## Dois-tu prendre des initiatives ? 
Pourquoi ? Le poste de Developer Advocate est un travail qui demande beaucoup d'autonomie, et donc qui nécessite de prendre soi même beaucoup d'initiatives. C'est à moi de décider sur quel sujet travailler, quel article écrire, à quelle conférence je souhaite assister. Parfois, les Developer Advocates ont des demandes venant des chefs de produits qui souhaiteraient que nous représentions leur produit, mais c'est généralement par soi même que l'on décide sur quoi travailler. ## As-tu des responsabilités ? Si oui lesquelles ? Pourquoi ? Je n'encadre personne, donc je n'ai en tout cas pas de responsabilité managériales. Par contre, je me focalise sur une certaine gamme de produits, et c'est de ma responsabilité de représenter et d'améliorer ces produits là. ## As-tu des contacts avec des personnes autres que tes collègues de travail ? Pourquoi ? Comme une partie de mon travail consiste à aller présenter les produits de Google Cloud à des conférences, des "meetups", des groupes d'utilisateurs, je rencontre effectivement beaucoup de monde lors de ces événements. Egalement au travers des réseaux sociaux, j'interagis souvent avec des développeurs aux quatre coins du monde. Et j'ai également la chance de pouvoir rencontrer nos clients, pour discuter de leurs problèmes, de leurs besoins, mais aussi de leurs réussites. ## Les activités sont-elles variées ou répétitives ? Les activités sont effectivement variées, aussi bien de la programmation (pour préparer des démonstrations des produits), de la communication (pour des présentations), de l'écriture (pour les articles, la documentation que nous rédigeons), mais aussi le voyage au travers le monde permets de découvrir de nouveaux horizons et de nouvelles cultures. ## Dois-tu organiser ton travail ou plutôt exécuter des consignes ? Comme je le disais plus haut, c'est un travail qui demande beaucoup d'autonomie, donc c'est à moi d'organiser mon travail, d'autant plus que j'ai déjà un certain niveau d'ancienneté. Il arrive parfois qu'on me demande certaines choses effectivement, mais le plus souvent c'est à moi de m'organiser. ## Est-ce que le travail est fatigant (nerveusement et physiquement) ? Pourquoi ? Parfois oui. Il y a beaucoup à faire : Google Cloud possède de nombreux produits, qui sont développés rapidement, il y a donc énormément de choses à couvrir et étudier. Ce qui est également fatigant, c'est le fait de voyager au travers le monde avec les décalages horaires, les tracas associés au voyage de manière générale. Outre le voyage et la quantité de travail, il faut travailler régulièrement avec la maison mère qui est sur la côte Ouest des Etats-Unis, avec 9 heures de décalage horaire, ce qui veut dire qu'il faut pouvoir faire des conférences téléphoniques avec mes collègues outre-Atlantique après 18 ou 19 heures, voire parfois même à 21 ou 22 heures ! Autre exemple, lorsque l'on donne des présentations à des "meetups", c'est souvent le soir après le travail, donc on peut rentrer parfois très tard chez soi. ## Est-ce que cette profession permet d'obtenir une promotion ? Laquelle ? Dans ce rôle, il y a différents grades, différents niveaux. Quand on commence dans ce métier, on entrera à un certain niveau, et au fil des années, on pourra progresser et monter de niveau en niveau. L'entreprise n'attends pas la même chose de ses employés en fonction de leur niveau d'ancienneté. Nous avons une sorte d'échelle qui décrit ce qui est attendu de chacun à chaque niveau de sa progression hiérarchique. ## Quels sont les diplômes requis ? 
You generally need a master's-level degree in computer science (five years of higher education, typically a master's or an engineering school), plus some prior professional experience in the field, in order to properly understand the problems our users and customers face. ## What do the studies cover? In my case (a DESS, a French postgraduate degree in computer science), these are studies where students learn how computers work, learn programming languages, how to write algorithms, how computer networks or databases work, and so on. ## Are there other possible paths (continuing education, evening classes, others) into this profession? Although a master's-level degree in computer science is the most common path, as long as you have solid professional experience in the field, it's also possible to branch into this profession. But to my knowledge there is, strictly speaking, no specific training for the Developer Advocate role. We often ask young people "what do you want to do later on?" In my case, I wanted to be an "engineer", and I liked computers, but the funny thing is that when I was studying, the job of "Developer Advocate" didn't really exist. So who can say which jobs will exist when you enter the job market? You have to know how to adapt, because new jobs appear regularly, and there won't necessarily be a specialized curriculum or a typical path to get there. ## What qualities are required? Besides being a good software engineer (knowing how to program, etc.), you also need very good communication skills and a knack for synthesizing and explaining things simply, because we have to help our users and customers understand how to use our products. Moreover, most of my work is done in English, which is the language of headquarters, but also the international language of computing and of travel. So you absolutely need to be good at English for this job. # Machine learning APIs with Apache Groovy At [GR8Conf Europe](https://gr8conf.eu/) last year, I [talked](https://2018.gr8conf.eu/talks/610) about how to take advantage of the [Google Cloud machine learning APIs](https://cloud.google.com/products/ai/building-blocks/) using [Apache Groovy](http://groovy-lang.org/). With Groovy, you can call the Vision API that recognises what's in your pictures, or reads text. You can invoke the Natural Language API to understand the structure of your text. With the Speech-To-Text API, you can get transcriptions of what's been said in an audio stream, or with Text-To-Speech, you can also generate human-like voices from your own text. With those ready-made APIs, no need to be an expert data scientist! Just call an SDK or a REST API, and you're ready to go (see the short sketch at the end of this post). Here's the video of the presentation: {{< youtube WKK1iUoB6CE >}} And here are the slides that I showed: {{< speakerdeck c2927b99b7b64b9c83a90fe81704a9f2 >}} In a [previous article](http://glaforge.appspot.com/article/vision-recognition-with-a-groovy-twist), I also presented one of the samples that I used with the Vision API. The [Call for Papers](https://www.gr8conf.eu/cfp) for the new edition of GR8Conf Europe is still open, so don't miss the opportunity to tell the world about all the cool things you've been doing with Apache Groovy!
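As a small taste of what that looks like in practice, here's a minimal, hedged sketch of calling the Natural Language API for sentiment analysis from a Groovy script. It assumes the `google-cloud-language` Java client library (the version below is indicative) and application default credentials already set up on your machine; it's an illustration of the general approach rather than the exact code shown in the talk.

```groovy
// Hedged sketch: sentiment analysis with the Natural Language API (library version indicative)
@Grab('com.google.cloud:google-cloud-language:1.25.0')
import com.google.cloud.language.v1.*

// The client authenticates with your application default credentials
LanguageServiceClient.create().withCloseable { language ->
    // Wrap the text to analyze in a Document
    def doc = Document.newBuilder()
                      .setContent("Groovy makes calling the machine learning APIs a breeze!")
                      .setType(Document.Type.PLAIN_TEXT)
                      .build()

    // Ask for the overall sentiment of the document
    def sentiment = language.analyzeSentiment(doc).documentSentiment
    println "Score: ${sentiment.score}, magnitude: ${sentiment.magnitude}"
}
```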
# InfoQ interview in French about microservices on Google Cloud Platform For once, I'm going to talk about Google Cloud Platform in French! During the Voxxed Days Microservices conference, which I [covered recently]({{< ref "/talks/2019/01/03/new-serverless-solutions-on-google-cloud-for-functions-apps-and-containers" >}}), I had the opportunity to give an interview to InfoQ France. [Interview about microservices on Google Cloud Platform](https://www.infoq.com/fr/interviews/voxxeddays-microservices-2018-guillaume-laforge) Here's the list of questions I answered, and I'll let you listen to the answers on [InfoQ France](https://www.infoq.com/fr/interviews/voxxeddays-microservices-2018-guillaume-laforge)! - For those who don't know you, can you tell us who you are? - Where are the product teams located? - And are there many users in France? - For newcomers, what are microservices? - What's "new" compared to so-called distributed architectures, SOA, web services? - Let's talk about Google's cloud platform: where does it stand? - What is serverless, the return of the mainframe? - What's new? - And where does Google stand with regard to Java? - Which languages are you pushing the most? - What's the future of microservices? # New Serverless Solutions on Google Cloud for Functions, Apps and Containers At [Voxxed Days Microservices](https://voxxeddays.com/microservices/), in Paris, I talked about the latest developments in serverless solutions on [Google Cloud Platform](https://cloud.google.com/products/), to deploy [functions](https://cloud.google.com/functions/), [apps](https://cloud.google.com/appengine/) and even [containers](https://cloud.google.com/blog/products/gcp/bringing-the-best-of-serverless-to-you). I gave an [interview](https://voxxeddays.com/microservices/2018/09/16/guillaume-laforge-on-serverless-solutions-on-google-cloud/) on the theme of microservices and how they map to Google Cloud products. And the video of my presentation was published on YouTube: {{< youtube Ke89_rFx7tI >}} Here's the abstract of the session: > Plenty of novelties in the Serverless offering of Google Cloud Platform, whether you're developing functions, apps or containers. > > Let's get started with the new modern runtimes for the venerable Google App Engine, > sandboxed thanks to the open source gVisor container sandboxing technology. > Cloud Functions is now GA with Node.js, but also offers new languages like Python to let you implement your functions. > If you need more flexibility, you will also be able to run serverless containers: just dockerize your project and off you go! > > But the crux of the show has to be the new open source project, Knative, a collaboration of Google with key vendors like Pivotal, > IBM, Red Hat or SAP, which offers a set of portable building blocks on top of Kubernetes to build serverless platforms. > Additionally, you will be able to try out Knative on Google Kubernetes Engine thanks to a dedicated add-on. > > In this session, we'll review all the new serverless-related features of Google Cloud Platform with concrete demos, > so you can get started easily and rapidly. # Deploy a Micronaut application containerized with Jib to Google Kubernetes Engine A few weeks ago, I had the chance to be at [Devoxx Belgium](https://devoxx.be/) once again, to meet developers and learn about new things from awesome speakers.
Google Cloud Platform had its own booth on the exhibition floor, and the team was running codelabs: 10 laptops were at the disposal of attendees to go through various hands-on tutorials on several GCP products. I took the opportunity to craft my own codelab: deploying a Micronaut application, containerized with Jib, to Google Kubernetes Engine. For the impatient, follow this link: [g.co/codelabs/micronaut](https://g.co/codelabs/micronaut) Note: If you haven't got a GCP account already, know that there's a [free trial with $300 of cloud credits](https://console.cloud.google.com/freetrial) to get started. More information on the tools used: - [Micronaut](http://micronaut.io/) is a modern, JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications. Micronaut aims to deliver fast startup time and high throughput, with a minimal memory footprint. Developers can develop with Micronaut in Java, Groovy or Kotlin. - Jib is an open source tool that lets you build Docker and OCI images for your Java applications. It is available as plugins for Maven and Gradle, and as a Java library. - [Kubernetes](https://kubernetes.io/) is an open source project which can run in many different environments, from laptops to high-availability multi-node clusters, from public clouds to on-premise deployments, from virtual machines to bare metal. - [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) is Google Cloud Platform's hosted Kubernetes platform. In this codelab, you deploy a simple Groovy-based Micronaut microservice to Kubernetes running on Kubernetes Engine. The goal of this codelab is for you to run your microservice as a replicated service running on Kubernetes. You take code that you have developed on your machine, turn it into a Docker container image built with Jib, and then run and scale that image on Kubernetes Engine. # An Intro to Google Cloud Platform In a matter of a few years, Google Cloud Platform has evolved from a very small set of products or APIs to a wealth of close to a hundred products, services and APIs that developers can take advantage of. This week, at the event [Le Meilleur Dev de France](https://www.meilleurdevdefrance.com/), I gave an introduction to the whole platform, focusing on three key axes: compute, storage and machine learning. After an introduction about famous users of GCP, like Snapchat, Spotify or Pokémon GO, I also gave a few examples of big French companies as well as French startups who have decided to go to the cloud with Google. Later on, over the course of three sections, as I was covering the multiple solutions in each area (compute / storage / ML), I also tried to give concrete hints as to when to use what, depending on your application needs. Indeed, with so many solutions at your disposal comes the paradox of choice: the more options you have, the harder it is to choose. Here are the slides I presented: {{< speakerdeck 5bc87feeb5914782a2476411c01d353a >}} # Nice Series on Spring Cloud Integration for Google Cloud Platform My friend and former colleague [Josh Long](https://twitter.com/starbuxman) wrote a nice series of articles showing the Spring Cloud integration for Google Cloud Platform.
The series contains 8 articles, covering: - [the setup & authentication](https://spring.io/blog/2018/08/20/bootiful-gcp-getting-started-with-spring-cloud-for-google-cloud-platform-1-8), - how to [access relational databases](https://spring.io/blog/2018/08/23/bootiful-gcp-relational-data-access-with-spring-cloud-gcp-2-8), - how to [connect to Spanner](https://spring.io/blog/2018/08/27/bootiful-gcp-globally-consistent-data-access-with-spanner-3-8), - how to [send/receive messages with Pub/Sub](https://spring.io/blog/2018/08/30/bootiful-gcp-integration-with-google-cloud-pub-sub-4-8), - how to [use the runtime configuration](https://spring.io/blog/2018/09/03/bootiful-gcp-runtime-configuration-with-spring-cloud-gcp-runtime-config-5-8), - how to look into [Stackdriver tracing](https://spring.io/blog/2018/09/06/bootiful-gcp-supporting-observability-with-spring-cloud-gcp-stackdriver-trace-6-8), - how to [access other services like the machine learning APIs](https://spring.io/blog/2018/09/10/bootiful-gcp-use-spring-cloud-gcp-to-connect-to-other-gcp-services-7-8), - and last but not least, how to [go to production](https://spring.io/blog/2018/09/13/bootiful-gcp-to-production-8-8). So if you're using Spring and Spring Boot, this is the way to go for getting started with Google Cloud Platform. For further reference, you can go read the [documentation](https://cloud.spring.io/spring-cloud-gcp/) that covers this integration. # Building and deploying microservices with App Engine and Cloud Functions A couple of weeks ago, I had the chance to talk at [Cloud Next](https://cloud.withgoogle.com/next18/sf/) 2018, in San Francisco, with my colleague and friend [Alexis](https://twitter.com/alexismp). We talked about building and deploying microservices with Google App Engine and Cloud Functions. I've been a big fan of App Engine since 2009 when Google released the Java flavor, and have been enjoying doing a bit of Node / JavaScript on Cloud Functions since it came out in beta. So I was very happy to be able to talk about those two serverless solutions. Without further ado, let's start by sharing the video (and slides) of the talk! {{< youtube oALEthV9z_U >}} Now that I've shared the video, let me tell you a bit more about this session and the demo we built. ![](/img/microservices-with-ae-cf/gae-gcf-01.png) First, a few words about the buzzword du jour: serverless. What I particularly like about this approach (which I also liked in the good old times of another old buzzword: PaaS) is how it lets me focus on the idea I want to implement, instead of being bothered from the get-go with server or cluster provisioning, OS choice, monitoring/logging/alerting, etc. I can directly start coding something, and quickly deploy my app or function in the cloud and see how well it's working (compared to my dreamed-up idea). Additionally, besides the ops-less aspect, I also don't have to think about scaling, as it scales pretty much auto-magically for me. Last but not least, I don't have to pay big upfront costs for renting machines or VMs, as it's really pay-as-you-go, and I'm not paying for an idle server (after all, my idea might be just a quick idea not geared towards prime-time success!)
![](/img/microservices-with-ae-cf/gae-gcf-02.png) Google Cloud Platform offers various solutions that share those characteristics, not only for compute with App Engine and Cloud Functions, but also for data storage (like Datastore which I'm using in my demo as my database), or the machine learning APIs (like the Vision API that I also integrated in my app). Database-as-a-Service, Platform-as-a-Service, Function-as-a-Service, Software-as-a-Service often fall into that category for me, if you don't pay for infrastructure, if it takes care of scaling, and if it's priced proportionally to your usage. ## Cloud Functions Cloud Functions (or GCF for short) is a great fit for event-driven problems: a new picture is stored on Google Cloud Storage? A function is triggered. I get a message on Pub/Sub? Another function is invoked. It's also possible to invoke a function directly via an HTTP call, without requiring any kind of gateway to expose it. At Next, the general availability of Cloud Functions was announced, with an SLA of 99.5%, additional regions (2 in the US, 1 in Europe, 1 in APAC), and also new runtimes with Node.js 8 and Python 3.7. Further improvements are the ability to hook a function up to a VPC network in order to connect your functions with your VMs, new scaling controls to limit the number of instances serving your function, a direct connection to Cloud SQL to take advantage of GCP's great network instead of going through the wider public Internet, and the availability of environment variables to customize your deployments, for example to tackle different environments like dev vs staging vs prod. ## App Engine As I said, I've always been a big fan of App Engine, long before I actually joined Google. This blog you're reading has been running on App Engine Java for many years! GAE (for short) is really a great fit for hosting web frontends or backend APIs, which are generally more long-lived than functions. With Java 8 reaching GA last year and Node.js 8 in beta, new runtimes are also coming up: Python 3.7 and PHP 7.2. With the recently released new [instance scheduler](https://cloudplatform.googleblog.com/2018/05/Increase-performance-while-reducing-costs-with-the-new-App-Engine-scheduler.html), you have more control over your scaling, which allows you to scale faster and lower your costs too. Deployments of new versions should also be faster with better caching and diff'ing between versions. During the hallway session, I had a nice conversation with an attendee who was pretty happy with the fact that he'd be able to use Python 3.7 for both Cloud Functions and App Engine, which will give them a chance to share some code between projects. The new runtimes are running on the [gVisor sandbox container runtime](https://cloudplatform.googleblog.com/2018/05/Open-sourcing-gVisor-a-sandboxed-container-runtime.html) technology, a lightweight solution to isolate containers securely to run your payloads. A big advantage of gVisor is that App Engine runtimes are no longer limited by things like the class whitelist that prevented the use of certain classes. ![](/img/microservices-with-ae-cf/gae-gcf-03.png) Back to the topic of microservices, App Engine has this concept of services. In your GCP project, your GAE application can run several services at the same time, potentially with different runtimes (for example a Java service and a Go service), and those services can be deployed with different versions.
![](/img/microservices-with-ae-cf/gae-gcf-04.png) The last thing I'll mention here for App Engine is the traffic splitting capability. You can easily split traffic (on the command-line or in the web UI) between different versions of a particular service. So for example, if you want to do some A/B testing to see if users prefer the new feature or layout of your app, you can say that only 5% of incoming requests will be showing it, whereas the other 95% of your users will continue to see the old version. This is also useful for canary deployments or blue/green deployments. ![](/img/microservices-with-ae-cf/gae-gcf-05.png) For my demo, I developed a simple picture sharing app. The web frontend is built with Vue.js, backed by an App Engine Java backend using the lightweight SparkJava framework. When a user takes a picture, it's uploaded to Google Cloud Storage, which triggers a Cloud Function that stores the picture metadata in Datastore, calls the Vision API to get labels of the things found in the picture, checks whether the picture can be safely published (no racy, adult, spoof, or violent content), and extracts the dominant color of the image. Another function is triggered at regular intervals to compute the most frequent tags (stored in Datastore), so a snapshot of them can be displayed on a dedicated page of the app. ![](/img/microservices-with-ae-cf/gae-gcf-06.png) ## Scaling down... As we wound down towards our talk's conclusion, we also shared a few words about the upcoming serverless containers for Cloud Functions, which were unveiled at the conference and in a GCP [blog post](https://cloudplatform.googleblog.com/2018/07/bringing-the-best-of-serverless-to-you.html). For serverless compute, you can deploy functions and apps, but we're seeing units of compute in the form of containers as well, and sometimes your project might need specific native libraries or particular compute capabilities (like GPUs), or you may simply want more control over the environment your business logic runs in. So it makes sense to let you serve containers as well, in addition to functions and apps. If you're interested in trying out serverless containers, feel free to request access to the EAP program via [g.co/serverlesscontainers](http://g.co/serverlesscontainers). # In the top 20 Java influencers for 2018 ![](/img/misc/JAX_London_2018_Infografik_Influencer_Teaser_600x3700_47308_v1.jpg) Just before heading to [Google Cloud Next](https://cloud.withgoogle.com/next18/sf/), I was notified that I was listed 4th in JAX London's Top 20 Java influencers of 2018 on social media! It's an honor to be listed among famous figures like Josh Bloch, Brian Goetz, Martin Thompson, Arun Gupta, Jessica Kerr, Mario Fusco, Josh Long, Venkat Subramaniam, Charles Nutter and many others. You can see the full list of the top influencers [here](https://jaxlondon.com/blog/java-core-languages/top-20-java-influencers-of-2018/). I'll definitely continue to advocate for Java (and Apache Groovy) developers around the world, and share whatever I learn along the way through articles or conference talks. I'm looking forward to meeting you, my fellow Java/Groovy developer friends, at an event near you. {{< x user="jaxlondon" id="1021684076779065344" >}} # The Big Green Button: Automating Continuous Delivery With Chatbots Last month in sunny Napa Valley, my awesome colleague [Seth Vargo](https://www.sethvargo.com/) and I had the chance to speak at [SwampUp](https://swampup.jfrog.com/), the DevOps-focused conference organized by [JFrog](https://jfrog.com/).
Our talk & demo were focused on the topic of "ChatOps". But what is ChatOps? Here's what our abstract said: > Heard of ChatOps? It's a movement in the DevOps community to take advantage of Chatbots. > > Chatbots centralize the conversation and history of your daily operations including build status, issue management, deployment, and monitoring, so that you access all the information and actions needed at the whim of a chat message in your team communication solution. > > After a quick introduction about the principles and pros & cons of ChatOps, Guillaume Laforge & Seth Vargo will walk you through Dialogflow to design your conversational interface to your CI/CD infrastructure. With the help of demos on the Google Cloud Platform, we'll see together how chatbots can make you more productive in your DevOps activities. The video was published last week: {{< youtube gIE1gQGTc >}} # Chatbots: switching to second gear My buddy [Wassim](https://twitter.com/manekinekko) and I were back on stage together to talk about chatbots, with [Actions on Google](https://developers.google.com/actions/) and [Dialogflow](https://dialogflow.com/), at [DevFest Lille](https://devfest.gdglille.org/) and [Best of Web Paris](http://bestofweb.paris/). I'd like to share with you the slides of the presentation (the video has been recorded and will be available at a later date.) You might be interested in those two codelabs to get started on this journey: - [Build Actions for the Google Assistant - level 1](https://codelabs.developers.google.com/codelabs/actions-1/index.html?index=..%2F..%2Findex#0) - [Build Actions for the Google Assistant - level 2](https://codelabs.developers.google.com/codelabs/actions-2/index.html?index=..%2F..%2Findex#0) Here's the presentation given at DevFest Lille: {{< speakerdeck 2466bf9bc1544289ba163b1fc4f2589a >}} And at Best of Web Paris: {{< speakerdeck afa17255e9ea4db8a4138102cefeeec3 >}} # Putting a Groovy Twist on Cloud Vision Powerful machine learning APIs are at your fingertips if you're developing with [Google Cloud Platform](https://cloud.google.com/), as client libraries are available for various programming languages. Today, we're investigating the Cloud Vision API and its Java SDK, using the [Apache Groovy](http://groovy-lang.org/) programming language---a multi-faceted language for the Java platform that aims to improve developer productivity thanks to a concise, familiar and easy-to-learn syntax. At [GR8Conf Europe](https://gr8conf.eu/), in Denmark, the conference dedicated to the [Apache Groovy](http://groovy-lang.org/) ecosystem, I spoke about the machine learning APIs provided by [Google Cloud Platform](https://cloud.google.com/): Vision, Natural Language, Translate, and Speech (both recognition and synthesis). Since it's a groovy conference, we presented samples and demos using a pretty [Groovy](http://groovy-lang.org/) language. I wanted to share the underlying examples with a wider audience, so here's the first of a series of blog posts covering the demos I presented. I'll start with the Google Cloud Vision API, and I will cover the other APIs in future posts.
Google Cloud Vision API lets you: - Get labels of objects and places from your pictures - Detect faces, with precise location of facial features - Determine if the picture is a particular landmark - Check for inappropriate content - Obtain some image attributes information - Find out if the picture is already available elsewhere on the internet - Detect brand logos - Extract text that appears in your images (OCR) You can try out those features online directly at the Cloud Vision API [product page](https://cloud.google.com/vision/). Here's the output of a visibly egocentric example that I tried on the product page: ![/img/groovy-vision/groovy-vision-13lw1.max-800x800.png](/img/groovy-vision/groovy-vision-13lw1.max-800x800.png) In this first installment, I focus on two aspects: labels and OCR. But before diving in, I want to illustrate some of the use cases that all of the Vision API features enable, when you want to enhance your apps by integrating the Vision API. ## Label detection What's in the picture? That's what label detection is all about. So for example, if you're showing a vacation picture of your family and your dog on the beach, you might get labels like *People, Beach, Photograph, Vacation, Fun, Sea, Sky, Sand, Tourism, Summer, Shore, Ocean, Leisure, Coast, Girl, Happiness, Travel, Recreation, Friendship, and Family*. Those labels are also accompanied by a percentage confidence score. Example use cases: - If you run a photography website, you might want to let your users search for particular pictures on specific topics. Rather than having someone manually label each and every picture uploaded, you can store labels generated by the API as keywords for your search engine. - If you're building the next friend-finder app for animals, perhaps you want to check that the picture indeed contains a dog. Again, labels will help to tell you if that's the case without manually checking each submitted image. - If you need to find pictures that match a certain theme, you can use the labels to help with automatic categorization. ## Face detection The Cloud Vision API can spot faces in your pictures with great precision: it gives you detailed information about the location of the face (with a bounding box), plus the position of each eye, eyebrow, and ear, as well as the nose, lips, mouth, and chin. It also tells you how the face is tilted, and even at what angle! You can also learn about the sentiment of the person's expression: joy, sorrow, anger, or surprise, for example. In addition, you're told if the face is under-exposed, blurred, or sporting headwear. Example use cases: - If you are building a light-hearted app in which users add a mustache or silly hat to uploaded pictures, you now know how to initially position and size the added objects. - If you want to know (or estimate) the number of attendees in a meeting or presentation at a conference, you can use the API to provide a count of the number of faces in the picture. ## Landmark detection For pictures of famous landmarks, like say, the Eiffel Tower, Buckingham Palace, the Statue of Liberty, or the Golden Gate Bridge, the API will tell you which landmark it recognized, and will even give you the GPS location coordinates. Example use cases: - If you want users of your app or site to upload only pictures of a particular location, you could use those coordinates (provided in latitude and longitude) to automatically check that the place photographed is within the right geo-fenced bounding box.
- If you want to automatically show pictures on a map in your app, you could take advantage of the GPS coordinates, or automatically enrich tourism websites with pictures from the visitors of specific locales. ## Inappropriate content detection Vision API gives you a confidence percentage about different types of potentially inappropriate content in images, including adult content, depictions of violence, medical content (such as surgical procedures or MRIs), and spoofed pictures (with user-added text and marks). Example use case: - If you want to avoid the unwelcome surprise of inappropriate user-generated content showing up in your site or app, you can use this capability to filter images automatically. ## Image attributes and web annotations Pictures all have dominant colors, and you can use the Vision API to get a sense of which colors are represented in your image and in what proportion. The Cloud Vision API gives you a palette of colors corresponding to your picture. In addition to color information, it also suggests possible crop hints, so you can crop the picture to different aspect ratios. You also get information about whether your picture can be found elsewhere on the net, with a list of URLs with matched images as well as full and partial matches. Beyond the label detection, the API identifies "entities", returning to you IDs of those entities from the Google [Knowledge Graph](https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html). Example use cases: - If you want to make your app or website responsive, before loading the full images you may want to show colored placeholders for them. You can get that from the palette information returned by the API. - If you'd like to automatically crop pictures while keeping their essential aspects, you can use the crop hints. - If you allow photographers to upload their pictures on your website and you want to check that no one is stealing pictures and posting them without proper attribution, you can use the API to check if each picture can be found elsewhere on the web. - For the picture of me in the introduction of this post, Vision API recognized entities like "[Guillaume Laforge](http://glaforge.appspot.com/)" (me!), [Groovy](http://groovy-lang.org/) (the programming language I've been working on since 2003), [JavaOne](https://en.wikipedia.org/wiki/JavaOne) (a conference I've often spoken at), "[Groovy in Action](https://www.manning.com/books/groovy-in-action-second-edition)" (the book I've co-authored), "[Java Champions](https://community.oracle.com/community/java/java-champions)" (I've recently been nominated!), and "Software Developer" (yes, I do code!). Thanks to those entities, you are able to automatically recognize famous people---more than just me! ## Brand or logo detection In addition to text recognition (discussed immediately below), the Vision API tells you if it recognized any logos or brands. Example use case: - If you want your company's brand or products to be displayed on supermarket shelves, you might have people take pictures of those shelves and confirm automatically that your logo is being displayed. ## OCR or text recognition With OCR text detection, you can find text that is displayed in your pictures. Not only does the API give you the raw text and automatically detect the locale, but you also get all the bounding boxes for the recognized words, as well as a kind of document format, showing the hierarchy of the various blocks of text that appear.
Example use case: - When you want to automatically scan expense receipts, enter text rapidly from pictures, or tackle any of the usual use cases for OCR, you can use the API to find and extract any text identified within. ## Time to get Groovy! Now that I've provided lots of use cases for where and when you may want to use Vision API, it's time to look at some code, right? So as I said, in this post I'll highlight just two features: label detection and text detection. Using the [Apache Groovy](http://groovy-lang.org/) programming language, I will illustrate two approaches: the first one uses a REST client library like [`groovy-wslite`](https://github.com/jwagenleitner/groovy-wslite), and the second one uses the [Java client library](https://cloud.google.com/vision/docs/libraries) provided for the API. ## Prerequisites In order to use Vision API, you'll need to have an account on Google Cloud Platform (GCP). You can benefit from the $300 [free trial](https://cloud.google.com/free/) credits, as well as free quotas. For instance, for Vision API, without even consuming your credits, you can make 1000 calls for free every month. You can also take a look at the API's [pricing](https://cloud.google.com/vision/pricing) details, once you exceed the quota or your credits. Briefly, if you're not already using GCP or still don't have an account, please [register and create one](https://cloud.google.com/getting-started/). Once you've set up your GCP account, create a cloud project and [enable access to Vision API](https://cloud.google.com/vision/docs/before-you-begin) for the project. That's it---now you're ready to follow the steps detailed below. Note that after you have enabled the API, you still need to authenticate your client to use the service. There are different possibilities here. In the OCR example, I call the REST API and use an API key passed as a query parameter. In the label detection sample, I use a service account with application default credentials. You can learn more about those approaches in the [authentication](https://cloud.google.com/vision/docs/auth) documentation. Okay, let's get started! ## OCR using the Vision REST endpoint During the spring and summer allergy season, many people are interested in which pollens are likely to trigger their allergies. In France (where I live), we have a website that shows a map of the country, and when you hover over your region, it shows you a picture of the active allergens and their levels. However, this is really just a picture with the names of said allergens. So I decided to extract a list of allergens from this image using the Vision API's OCR method: ![/img/groovy-vision/groovy-vision-2yg2j.max-600x600.png](/img/groovy-vision/groovy-vision-2yg2j.max-600x600.png) In Apache Groovy, when calling REST APIs, I often use the [`groovy-wslite`](https://github.com/jwagenleitner/groovy-wslite) library. But there are other similar great libraries like [`HTTPBuilder-NG`](https://http-builder-ng.github.io/http-builder-ng/), which offer similar capabilities via a nice syntax too. 
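Purely to give a feel for that alternative, here's a rough, hedged sketch of the same Vision OCR request (built step by step with groovy-wslite in the next section) expressed with HttpBuilder-NG instead. The Grape coordinates and version are indicative and may need adjusting, and the API key and image URL are the same placeholders used in the rest of this post; the walkthrough below sticks with groovy-wslite.

```groovy
// Hedged sketch: the same Vision OCR request with HttpBuilder-NG (coordinates/version indicative)
@Grab('io.github.http-builder-ng:http-builder-ng-core:1.0.4')
import groovyx.net.http.HttpBuilder

def API_KEY = 'REPLACE_ME_WITH_REAL_API_KEY'
def imgUrl  = "http://internationalragweedsociety.org/vigilance/d%2094.gif"

def result = HttpBuilder.configure {
    request.uri = 'https://vision.googleapis.com/v1/images:annotate'
    request.uri.query = [key: API_KEY]
    request.contentType = 'application/json'
}.post {
    // The same JSON payload as in the groovy-wslite example, expressed as Groovy maps and lists
    request.body = [requests: [[
        image   : [source: [imageUri: imgUrl]],
        features: [[type: 'TEXT_DETECTION']]
    ]]]
}

// The JSON response is parsed into maps/lists, so we can navigate it directly
result.responses[0].textAnnotations.description.each { println it }
```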
To start, let's [grab](http://docs.groovy-lang.org/latest/html/documentation/grape.html) the REST client library and instantiate it: ```groovy @Grab('com.github.groovy-wslite:groovy-wslite:1.1.2') import wslite.rest.* def client = new RESTClient('https://vision.googleapis.com/v1/') ``` Here's the URL of the picture with the text I'm interested in: ```groovy def imgUrl = "http://internationalragweedsociety.org/vigilance/d%2094.gif" def API_KEY = 'REPLACE_ME_WITH_REAL_API_KEY' ``` Be sure to replace this placeholder with a real API key, which you can generate from the "APIs & services > Credentials" section of the cloud console, as explained [here](https://support.google.com/cloud/answer/6158862?hl=en). Next, I'm going to send a POST request to the `/images:annotate` path with the API key as a query parameter. My request is in JSON, and I'm using [Groovy's nice list and maps syntax](http://docs.groovy-lang.org/latest/html/documentation/working-with-collections.html) to represent that [JSON](http://groovy-lang.org/json.html) request, providing the image URL and the feature I want to use (i.e. text detection): ```groovy def response = client.post(path: '/images:annotate', query: [key: API_KEY]) { type ContentType.JSON json "requests": [ [ "image": [ "source": [ "imageUri": imgUrl ] ], "features": [ [ "type": "TEXT_DETECTION" ] ] ] ] } ``` This corresponds to the following JSON payload: ```json { "requests": [ { "image": { "source": { "imageUri": imgUrl } }, "features": [ { "type": "TEXT_DETECTION" } ] } ] } ``` Thanks to Apache Groovy's flexible nature, it's then easy to go through the returned JSON payload to get the list and `println` all the text annotations and their descriptions (which correspond to the recognized text): ```groovy response.json.responses[0].textAnnotations.description.each { println it } ``` In spite of the low quality of the image and some odd font kerning, the API was able to find allergens like "châtaignier" (with the correct circumflex accent) and "urticacées" (acute accent). On the allergen "cupressacées," the diacritical mark was missed though, and a stray space slipped in, but the font does seem to add extra space between some letters. ## Label detection with the Java client library For my second sample, I was inspired by my visit to Copenhagen for [GR8Conf Europe](https://gr8conf.eu/). I decided to see what labels the API would return for a typical picture of the lovely colorful facades of the [Nyhavn](https://en.wikipedia.org/wiki/Nyhavn) harbor. ![/img/groovy-vision/groovy-vision-39426.max-800x800.jpeg](/img/groovy-vision/groovy-vision-39426.max-800x800.jpeg) Let's [grab](http://docs.groovy-lang.org/latest/html/documentation/grape.html) the Java client library for the Vision API: ```groovy @Grab('com.google.cloud:google-cloud-vision:1.24.1') import com.google.cloud.vision.v1.* import com.google.protobuf.* ``` Here's the URL of the picture: ```groovy def imgUrl = "https://upload.wikimedia.org/wikipedia/commons/3/39/Nyhavn_MichaD.jpg".toURL() ``` Now, let's instantiate the ImageAnnotatorClient class.
It's a Closeable object, so we can use Groovy's `withCloseable{}` method: `ImageAnnotatorClient.create().withCloseable { vision ->` We need the bytes of the picture, which we obtain via the `.bytes` shortcut, and we create a ByteString object from the [protobuf](https://developers.google.com/protocol-buffers/) library used by the Vision API: ```groovy def imgBytes = ByteString.copyFrom(imgUrl.bytes) ``` To create the request with the AnnotateImageRequest builder, we can employ the Groovy [`tap{}`](http://docs.groovy-lang.org/latest/html/groovy-jdk/java/lang/Object.html#tap(groovy.lang.Closure)) method to simplify usage of the builder design pattern: ```groovy def request = AnnotateImageRequest.newBuilder().tap { addFeatures Feature.newBuilder() .setType(Feature.Type.LABEL_DETECTION) .build() setImage Image.newBuilder() .setContent(imgBytes) .build() }.build() ``` In that request, we ask for the label detection feature and pass the image bytes. Next, we call the API with our request and then iterate over the resulting label annotations and their confidence score (the final closing brace ends the `withCloseable{}` block opened above): ```groovy vision.batchAnnotateImages([request]).responsesList.each { res -> if (res.hasError()) println "Error: ${res.error.message}" res.labelAnnotationsList.each { annotation -> println "${annotation.description.padLeft(20)} (${annotation.score})" } } } ``` The labels found (and their confidence) are as follows: ``` waterway (0.97506875) water transportation (0.9240114) town (0.9142202) canal (0.8753313) city (0.86910504) water (0.82833123) harbor (0.82821053) channel (0.73568773) sky (0.73083687) building (0.6117833) ``` That looks pretty accurate to me! ## Conclusion There are tons of situations where you can benefit from [Google Cloud Vision](https://cloud.google.com/vision/)'s simple and effective image analysis and your use of the API becomes even groovier when using the [Apache Groovy](http://groovy-lang.org/) language! Be sure to try out the API directly from the website with the built-in demo, and get started with our [Cloud Vision codelabs](https://codelabs.developers.google.com/codelabs/cloud-vision-intro/index.html?index=..%2F..%2Findex#0). If you want to go further, I also encourage you to have a look at the alpha version of [Cloud AutoML Vision](https://cloud.google.com/automl/): you can extend the Vision API by training it on your own picture dataset. By doing so, you can let Cloud Vision recognize particular objects or elements in your photos with finer-grained labels that are specific to your needs. Upcoming installments in this series will cover natural language understanding (NLU), text translation, speech recognition, and voice synthesis. So stay tuned! # Automating Chrome Headless mode on App Engine with Node.JS 8 On the [Google Cloud](http://cloud.google.com/) front today, the big news is the release of the [new Node.JS 8 runtime for Google App Engine Standard](https://cloudplatform.googleblog.com/2018/06/Now-you-can-deploy-your-Node-js-app-to-App-Engine-standard-environment.html). It's been a while since a completely new runtime was added to the list of supported platforms (Python, Java, PHP, Go). You could already run anything in custom containers on App Engine Flex, including your own containerized Node app, but now you can have all the nice developer experience on the Standard environment, with fast deployment times, and 0 to 1 to n instance automatic scaling (you can see the difference between those two environments [here](https://cloud.google.com/appengine/docs/nodejs/)).
To play with this new runtime, I decided to follow the steps in this guide about [using Chrome headless with Puppeteer](https://cloud.google.com/appengine/docs/standard/nodejs/using-headless-chrome-with-puppeteer). As my readers know, I'm not really a Node person, and usually dabble more with [Apache Groovy](http://groovy-lang.org/) and Java, but this runtime was interesting to me as there's a nice integration with native packages. Let me explain. The App Engine Node runtime [includes tons of native packages out of the box](https://cloud.google.com/appengine/docs/standard/nodejs/reference/system-packages), without requiring you to install anything (except the Node modules that take advantage of those packages, of course.) For instance, if you need to do any audio / video manipulation, there's an ffmpeg package. If you want to deal with Git repositories, there's a git package. Need to manipulate images? There's ImageMagick, etc. And there are usually nice Node wrapper modules around those native components. Among those system pre-installed packages, there are all the necessary ones to run [Headless Chrome](https://developers.google.com/web/updates/2017/04/headless-chrome), i.e. running the Chrome browser without displaying its window. Furthermore, there's the [Puppeteer](https://developers.google.com/web/tools/puppeteer/) Node module, which is a library to control Chrome. With those two, you can completely automate the usage of Chrome on the server-side. What can you do with that? Well, you can: - look at / introspect / manipulate the DOM, - pre-render content for your single page apps, - take screenshots of web pages, - watch a particular page and compute diffs between different versions, etc. ## Let's get started! Without blindly recopying all the steps explained in the tutorial for [running Chrome headless](https://cloud.google.com/appengine/docs/standard/nodejs/using-headless-chrome-with-puppeteer), I'll simply highlight some of the key points. The goal is to let puppeteer take screenshots of webpages. In your package.json, you need to reference the puppeteer module, and potentially express for handling your web requests: ```json "dependencies": { "express": "^4.16.3", "puppeteer": "^1.2.0" }, ``` Taking advantage of Node 8's async capabilities, in your `app.js` file, you can instantiate puppeteer: ```javascript const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] }); ``` Then navigate to the desired page: ```javascript const page = await browser.newPage(); await page.goto(url); ``` Take a screenshot and render it back to the browser: ```javascript const imageBuffer = await page.screenshot(); res.set('Content-Type', 'image/png'); res.send(imageBuffer); ``` To deploy to the Node runtime, you also need the app.yaml deployment descriptor: ```yaml runtime: nodejs8 instance_class: F4_1G ``` We specify that we want to use the new Node runtime, but also that we want a slightly bigger instance to run our Node app, as Chrome is pretty hungry for RAM! Then deploy your app with the [gcloud CLI](https://cloud.google.com/sdk/gcloud/). Be sure to check the [whole code on GitHub](https://github.com/GoogleCloudPlatform/nodejs-docs-samples/tree/master/appengine/headless-chrome) for all the details. One quick remark: although it's not mentioned in the tutorial, when you first try to deploy the application, it'll tell you that you need to enable the Container Builder API.
The error message will be something like `Container Builder has not been used in project xyz before or it is disabled. Enable it by visiting...` You just need to follow the indicated URL to enable Container Builder. Container Builder is responsible for containerizing your application to be run on App Engine. Then I was able to navigate to my app, pass it a URL, and get back the screenshot of the web page at that URL. It's pretty handy if you want to integrate thumbnails of websites you reference in your blog posts, for example, or if you want to see if there are differences between different versions of a web page (for integration testing purposes). ## Conclusion The Java ecosystem has a wealth of libraries for various tasks, but often, there are native libraries which are more fully-featured, and Node generally provides nice wrappers for them. Chrome headless with Puppeteer is one example, but ImageMagick for image manipulation is another great one, where I could not find a good equivalent library in the Java ecosystem. So as they say, use the best tool for the job! In the age of microservices, feel free to use another tech stack that best fits the task at hand. And it's really exciting to see this [new Node 8 runtime for App Engine](https://cloudplatform.googleblog.com/2018/06/Now-you-can-deploy-your-Node-js-app-to-App-Engine-standard-environment.html) now being available so that you can take advantage of it in your projects. # Vision recognition with a Groovy twist Last week at [GR8Conf Europe](https://gr8conf.eu/), I spoke about the machine learning APIs provided by [Google Cloud Platform](https://cloud.google.com/): Vision, Natural Language, Speech recognition and synthesis, etc. Since it's GR8Conf, that means showing samples and demos using a pretty [Groovy](http://groovy-lang.org/) language, and I promised to share my code afterwards. So here's a series of blog posts covering the demos I've presented. We'll start with the Vision API. The Vision API allows you to: - Get labels of what appears in your pictures, - Detect faces, with precise location of facial features, - Tell you if the picture is a particular landmark, - Check for inappropriate content, - Give you some image attributes information, - Find if the picture is already available on the net, - Detect brand logos, - Or extract text that appears in your images (OCR). You can try out those features online directly from the Cloud Vision API product page. Here's a selfish example output: ![](/img/vision-groovy/ml-apis-01-try-out+small.png) In this first installment, I'll focus on two aspects: labels and OCR. But before diving in, I wanted to illustrate some of the **use cases** that those features enable, when you want to boost your apps by integrating the Vision API. ## Label detection What's in the picture? That's what label detection is all about. So for example, if you're showing a picture from your vacation with your family and dog on the beach, you might get labels like *"People, Beach, Photograph, Vacation, Fun, Sea, Sky, Sand, Tourism, Summer, Shore, Ocean, Leisure, Coast, Girl, Happiness, Travel, Recreation, Friendship, Family"*. Those labels are also accompanied by a percentage confidence score. ### Example use cases: - If you run a photography website, you might want your users to be able to search for particular pictures on specific topics. Rather than having someone manually label each and every picture uploaded, you can store those labels as keywords for your search engine.
- If you're building the next Instagram for animals, perhaps you want to check that the picture indeed contains a dog. Again, labels will help tell you whether that's the case. - Those labels can help with automatic categorization as well, so you can more easily find pictures along certain themes. ## Face detection Vision API can spot faces in your pictures with great precision, as it gives you detailed information about the location of the face (with a bounding box), as well as the position of each eye, eyebrow, nose, lips / mouth, ear, or chin. It also tells you how the face is tilted, and even at what angle! You can also learn about the sentiment of the person: joy, sorrow, anger, surprise. In addition, you're told if the face is under-exposed, blurred, or has some headwear. ### Example use cases: - Snapchat-style, you want to add a mustache or silly hat to pictures uploaded by your users. - Another handy aspect is that you can also count the number of faces in a picture. For example, if you want to count the number of attendees in a meeting or presentation at a conference, the number of faces recognized gives you a good estimate of the number of people. ## Landmark detection If the picture is of a famous landmark, like say, the Eiffel Tower, Buckingham Palace, the Statue of Liberty, or the Golden Gate Bridge, the API will tell you which landmark it recognized, and will even give you the GPS coordinates of the place. ### Example use cases: - If users of your app or site should only upload pictures of a particular location, you could use those coordinates to automatically check that the place photographed is within the right geo-fenced bounding box. - If you want to automatically show pictures on a map in your app, you could take advantage of the GPS coordinates, or automatically enrich tourism websites with pictures from the visitors of that place. ## Inappropriate content detection Vision API will give you a percentage confidence about whether the picture contains adult content, is a spoofed picture (with user-added text and marks), has some medical character, or displays violent or racy material. ### Example use case: - The main use case here is indeed to be able to filter images automatically, to avoid the bad surprise of user-generated content showing up in your site or app when it shouldn't. ## Image attributes and web annotations Pictures all have dominant colors, and you can get a sense of which colors best represent your image, and in what proportion. Vision API gives you a palette of colors corresponding to your picture. In addition to color information, it also suggests possible crop hints, so you could crop the picture to different aspect ratios. You get information about whether this picture can be seen elsewhere on the net, with a list of URLs with matched images, full matches, or partial matches. Beyond the label detection, the API identifies "entities", returning to you IDs of those entities from the Google [Knowledge Graph](https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html). ### Example use cases: - To make your app or website responsive, before loading the full images, you'd like to show colored placeholders for the pictures. You can get that information with the palette information returned. - You'd like to automatically crop pictures to keep their essential aspects. - Photographers upload their pictures on your website, but you want to ensure that no one steals those pictures and puts them on the net without proper attribution.
You can find out whether this picture can be seen elsewhere on the web. - For the picture of me above, Vision API recognized entities like "Guillaume Laforge" (me!), Groovy (the programming language I've been working on since 2003), JavaOne (a conference I've often spoken at), "Groovy in Action" (the book I've co-authored), "Java Champions" (I've recently been nominated!), or "Software Developer" (yes, I still code a bit) ## Brand / logo detection In addition to text recognition, which we'll mention right below, Vision API tells you if it recognized some logos or brands. ### Example use case: - If you're a brand, have some particular products, and you want your brand or products to be displayed on supermarket shelves, you might have people take pictures of those shelves and confirm automatically if your logo is displayed or not. ## OCR / text recognition You can find the text that is displayed on your pictures. Not only does it give you the raw text, but you also get all the bounding boxes for the recognized words, as well as a kind of document format, showing the various blocks of text that appear. ### Example use case: - You want to automatically scan expense receipts, enter text rapidly from pictures, etc. The usual use cases for OCR! ## Time to get Groovy! People often wonder where and when they can use the Vision API; I think I've given enough use cases for now, with detailed explanations. But it's time to show some code, right? So as I said, I'll highlight just two features: label detection and text detection. Using the Apache Groovy programming language, I wanted to illustrate two approaches: the first one using a REST client library like [groovy-wslite](https://github.com/jwagenleitner/groovy-wslite), and the second one just using the [Java SDK](https://cloud.google.com/vision/docs/libraries) provided for the API. ### Preliminary In order to use the Vision API, you'll need to have created an account on Google Cloud Platform (GCP for short). You can benefit from the $300 of credits of the [free trial](https://cloud.google.com/free/), as well as free quotas. For instance, for the Vision API, without even using your credits, you can make 1000 calls for free every month. You can also have a look at the [pricing](https://cloud.google.com/vision/pricing) of the API, once you go above the quota or your credits. Briefly, if you're not already using GCP or don't have an account yet, please register and create your account on GCP, then create a cloud project. Once the project is created, [enable access to Vision API](https://cloud.google.com/vision/docs/before-you-begin), and you're ready to follow the steps detailed hereafter. Although we've enabled the API, we still need to authenticate to use that service. There are different possibilities here. In the first sample with the OCR example, I'm calling the REST API and will be using an API key passed as a query parameter, whereas for my label detection sample, I'm using a service account with application default credentials. You can learn more about those approaches in the documentation on [authentication](https://cloud.google.com/vision/docs/auth). Okay, let's get started! ### OCR calling the Vision REST endpoint With spring and summer, allergic people might be interested in which pollens are going to cause their allergies to bother them. Where I live, in France, there's a website showing a map of the country, and you can hover over the region where you are, and see a picture of the allergens and their levels.
However, this is really just a picture with the names of said allergens. So I decided to get their list with the Vision API. ![](/img/vision-groovy/ml-apis-03-allergens-labels+small.png) In Apache Groovy, when calling REST APIs, I often use the [groovy-wslite](https://github.com/jwagenleitner/groovy-wslite) library. But there are other similar great libraries like [HTTPBuilder-NG](https://http-builder-ng.github.io/http-builder-ng/), which offer similar capabilities with nice syntax too. Let's [grab](http://docs.groovy-lang.org/latest/html/documentation/grape.html) the REST client library and instantiate it: ```groovy @Grab('com.github.groovy-wslite:groovy-wslite:1.1.2') import wslite.rest.* def client = new RESTClient('https://vision.googleapis.com/v1/') ``` Here's the URL of the picture whose text I'm interested in: ```groovy def imgUrl = "http://internationalragweedsociety.org/vigilance/d%2094.gif" def API_KEY = 'REPLACE_ME_WITH_REAL_API_KEY' ``` Then I'm going to send a post request to the `/images:annotate` path with the API key as query parameter. My request is in JSON, and I'm using Groovy's nice list & maps syntax to represent that JSON request, providing the image URL and the feature I'm interested in (ie. text detection): ```groovy def response = client.post(path: '/images:annotate', query: [key: API_KEY]) { type ContentType.JSON json "requests": [ [ "image": [ "source": [ "imageUri": imgUrl ] ], "features": [ [ "type": "TEXT_DETECTION" ] ] ] ] } ``` This corresponds to the following JSON payload: ```json { "requests": [ { "image": { "source": { "imageUri": imgUrl } }, "features": [ { "type": "TEXT_DETECTION" } ] } ] } ``` Thanks to Groovy's flexible nature, it's then easy to go through the returned JSON payload to get the list and println all the text annotations and their description (which corresponds to the recognized text): ```groovy println response.json.responses[0].textAnnotations.description ``` ### Label detection with the Java client library For my second sample, as I visited Copenhagen for [GR8Conf Europe](https://gr8conf.eu/), I decided to see what labels the API would return for a typical picture of the lovely colorful facades of the  [Nyhavn](https://en.wikipedia.org/wiki/Nyhavn) harbor. ![](/img/vision-groovy/ml-apis-02-nyhavn+small.png) Let's [grab](http://docs.groovy-lang.org/latest/html/documentation/grape.html) the Java client library for the vision API: ```groovy @Grab('com.google.cloud:google-cloud-vision:1.24.1') import com.google.cloud.vision.v1.* import com.google.protobuf.* ``` Here's the URL of the picture: ```groovy def imgUrl = "https://upload.wikimedia.org/wikipedia/commons/3/39/Nyhavn_MichaD.jpg" .toURL() ``` Let's instantiate the ImageAnnotatorClient class. 
It's a closeable object, so we can use Groovy's `withCloseable{}` method:

```groovy
ImageAnnotatorClient.create().withCloseable { vision ->
```

We need the bytes of the picture, which we obtain with the `.bytes` shortcut, and we create a `ByteString` object from the [protobuf](https://developers.google.com/protocol-buffers/) library used by the Vision API:

```groovy
def imgBytes = ByteString.copyFrom(imgUrl.bytes)
```

Now it's time to create our request, with the `AnnotateImageRequest` builder, using Groovy's [tap{}](http://docs.groovy-lang.org/latest/html/groovy-jdk/java/lang/Object.html#tap(groovy.lang.Closure)) method to make the use of the builder pattern easier:

```groovy
def request = AnnotateImageRequest.newBuilder().tap {
    addFeatures Feature.newBuilder().setType(Feature.Type.LABEL_DETECTION).build()
    setImage Image.newBuilder().setContent(imgBytes).build()
}.build()
```

In that request, we ask for the label detection feature, and pass the image bytes. Then it's time to call the API with our request, and iterate over the resulting label annotations and their confidence score:

```groovy
    vision.batchAnnotateImages([request])
            .responsesList.each { res ->
        if (res.hasError()) println "Error: ${res.error.message}"
        res.labelAnnotationsList.each { annotation ->
            println "${annotation.description.padLeft(20)} (${annotation.score})"
        }
    }
}
```

The labels found (and their confidence) are the following:

```
            waterway (0.97506875)
water transportation (0.9240114)
                town (0.9142202)
               canal (0.8753313)
                city (0.86910504)
               water (0.82833123)
              harbor (0.82821053)
             channel (0.73568773)
                 sky (0.73083687)
            building (0.6117833)
```

Pretty good job!

## Conclusion

First of all, there are tons of situations where you can benefit from image recognition thanks to [Google Cloud Vision](https://cloud.google.com/vision/). Secondly, it can get even groovier when using the [Apache Groovy](http://groovy-lang.org/) language! In the upcoming installment, we'll speak about natural language understanding, text translation, speech recognition and voice synthesis. So stay tuned!

# Machine learning APIs and AI panel discussion at QCon

Last March, I had the chance to attend and speak at [QCon London](https://qconlondon.com/london2018/schedule/london2018/tabular.html). I spoke at the event for its first edition, many moons prior, so it was fun coming back and seeing how the conference evolved.

This year, [Eric Horesnyi](https://twitter.com/erichoresnyi) of [Streamdata](https://streamdata.io/) was leading the Artificial Intelligence track, and invited me to speak about Machine Learning. First, I gave an overview of the Machine Learning offering, starting with the off-the-shelf ready-made APIs like [Vision](https://cloud.google.com/vision/), [Speech](https://cloud.google.com/speech-to-text/), [Natural Language](https://cloud.google.com/natural-language/), and [Video Intelligence](https://cloud.google.com/video-intelligence/). I also mentioned [AutoML](https://cloud.google.com/automl/), to further train existing models like the Vision model in order to recognize your own specific details in pictures. For chatbots, I also covered [Dialogflow](https://dialogflow.com/). And I said a few words about TensorFlow and [Cloud Machine Learning Engine](https://cloud.google.com/ml-engine/) for training & running your [TensorFlow](https://www.tensorflow.org/) models in the cloud.
You can watch the video by clicking on the picture below:

[![](/img/qcon-2018/qcon-ml-talk.png)](https://www.infoq.com/presentations/google-ml-services)

Eric hosted a panel discussion with all the speakers in the AI track, where we discussed many interesting topics, to demystify AI and answer questions from the audience. Click on the picture below to watch the panel discussion:

[![](/img/qcon-2018/qcon-ai-panel.png)](https://www.infoq.com/presentations/ai-panel)

# Getting started with Groovy technologies on Google Cloud Platform

Back to [GR8Conf Europe](https://gr8conf.eu/) in Denmark, for the yearly [Groovy](http://www.groovy-lang.org/) community reunion! I had the chance to present two talks. The first one was on Google's [Machine Learning APIs](https://cloud.google.com/products/machine-learning/), with samples in Groovy using vision recognition, speech recognition & generation, and natural language analysis. I'll come back to ML in Groovy in forthcoming articles. The second talk was an overview of Google Cloud Platform, focusing on the compute and storage options, with demos using Groovy frameworks ([Ratpack](https://ratpack.io/), [Gaelyk](http://gaelyk.appspot.com/), and the newly released [Micronaut](http://micronaut.io/)) and showing how to deploy apps on Compute Engine, Kubernetes Engine, and App Engine. I'll also come back to those demos in further articles, but in the meantime, I wanted to share my slide deck with you all! Without further ado, here's what I presented:

{{< speakerdeck a9b1167d2ffb418082553a879f51c869 >}}

> Time to release your marvelous project or website, powered with GR8 tech, into the wild?
> Let's see together what Google Cloud Platform has to provide for the Apache Groovy ecosystem developer.
> If you have a Grails or Ratpack app, what are the best compute options for deploying it?
> What other services are useful to your app? Databases with Cloud SQL, messaging
> with Cloud Pub/Sub, monitoring, and more, are at your disposal.
> We'll explore together what options you have to deploy and scale your next great idea in the Google cloud.

# Ten years of App Engine with a Groovy twist

The venerable Google [App Engine](https://cloud.google.com/appengine/) platform celebrated its [10th anniversary](https://cloudplatform.googleblog.com/2018/04/reflecting-on-our-ten-year-App-Engine-journey.html)! Back in 2008, it started with Python as its first runtime, but I got way more interested in App Engine when the Java runtime launched the following year. It's a bit of a special story for me, as I've been a fan of App Engine since the beginning. Over the years, I've built several apps running on App Engine. For instance, this blog you're reading now is running on App Engine, as well as my personal picture / video sharing app, a Github post-commit webhook for the [Apache Groovy](http://www.groovy-lang.org/) project, and the [Groovy Web Console](http://groovyconsole.appspot.com/) to share / edit / run Groovy scripts in the cloud. App Engine is my go-to platform for deploying and running my ideas in the cloud! I like to focus on the idea I want to implement, rather than thinking upfront about infrastructure, provisioning, ops, etc. App Engine was the pioneer of PaaS (Platform-as-a-Service) and of the now trendy [Serverless](https://cloud.google.com/serverless/) approach.
Although I ranted back in the day about the pricing changes ([once]({{< ref "/posts/2011/09/01/google-app-engine-s-new-pricing-model" >}}) and [twice]({{< ref "/posts/2011/11/25/coming-back-to-the-new-google-app-engine-pricing-policy" >}})), it led me to optimize my own apps and code. But ultimately, most of my apps run within the [free tier](https://cloud.google.com/free/) of App Engine. The "pay-as-you-go" approach is appealing: for my apps, it's been pretty much free for my use, except on those few occasions where I had big peaks of traffic and users, and then I only had to spend a few dollars to cope with the load. But I didn't even have to think about dealing with infrastructure, as App Engine transparently scaled my apps itself, without any intervention on my part.

![](/img/misc/google-app-engine-groovy.png)

But let's step back a little and let me tell you more about my story with App Engine. In 2009, thanks to my friend Dick Wall, I was contacted by Google, signed an NDA, and worked with the engineering team that was responsible for the upcoming Java runtime. As the engineering team was working on its launch, they wanted to ensure that alternative languages like [Apache Groovy](http://www.groovy-lang.org/) would run well on the platform. So we worked hand in hand, patching Groovy to be more compliant with App Engine's sandboxing mechanism (a sandbox that has since been lifted, as those past limitations are gone in the newer runtimes).

Thanks to this work on the Groovy and App Engine integration, I got the chance to present at Google I/O 2009 about running Groovy and Grails on App Engine!

{{< youtube NEnniZTdOYk >}}

And as I worked on the integration, I quickly found nice handy shortcuts thanks to the flexible nature of Groovy, and I arranged those shortcuts into a reusable library: the [Gaelyk](http://gaelyk.appspot.com/) framework.

Max Ross, Toby Reyelts, Don Schwartz, Dick Wall, Patrick Chanezon, Christian Schalk, and later on, Ludovic Champenois, Éamonn McManus, Roberto Chinnici, and many others: I'd like to say thank you, congratulations, and happy anniversary for this lovely platform! It's an honor for me today to [work for Google Cloud Platform]({{< ref "/posts/2016/06/02/joining-google-as-a-developer-advocate-for-the-google-cloud-platform" >}}) (almost 2 years already!), and to use the awesome serverless products available, and I'm looking forward to covering the [serverless](https://cloud.google.com/serverless/) area even more!

# What can we learn from millions of (Groovy) source files in Github

What can you learn from millions of (Groovy) source files stored on Github? In this presentation, I analyzed source files in the Github archives stored on BigQuery, and in particular Groovy source files, but also Gradle build files, and Grails controllers and services.

{{< speakerdeck c9876f9890d84d378b5b18c9b57ad7aa >}}

What kind of questions can we answer?

- How many Groovy files are there on Github?
- What are the most popular Groovy file names?
- How many lines of Groovy source code are there?
- What's the distribution of size of source files?
- What are the most frequent imported packages?
- What are the most popular Groovy APIs used?
- What are the most used AST transformations?
- Do people use import aliases much?
- Did developers adopt traits?

For [Gradle](https://gradle.org/), here are the questions that I answered:

- How many Gradle build files are there?
- How many Maven build files are there?
- Which versions of Gradle are being used?
- How many of those Gradle files are settings files?
- What are the most frequent build file names?
- What are the most frequent Gradle plugins?
- What are the most frequent "compile" and "test" dependencies?

And for [Grails](https://grails.org/), here's what I covered:

- What are the most used SQL databases?
- What are the most frequent controller names?
- What are the repositories with the biggest number of controllers?
- What is the distribution of the number of controllers?

You can see a version of this talk in French in the following YouTube video, recorded at the BreizhCamp conference:

{{< youtube wk2CRBRrki8 >}}

And in English at Devoxx US:

{{< youtube Aw4sgZ8kIeg >}}

# Keynote: IT holy wars

A few months ago, I had some fun delivering the keynotes of Voxxed Days Singapore and JUG Summer Camp, and I realized I didn't get a chance to share my deck. Furthermore, the videos of both events are actually available online: one in English, one in French!

I spoke about the so-called IT Holy Wars, you know, like Vi vs Emacs, tabs vs spaces, or other funny things of that kind: how developers circle from client-side to server-side, how we are polarized across strong positions on certain frameworks or practices, on clean vs ugly code, on tooling (build, front, back), on dark / light background themes for your IDE, on how you format dates, and more. I had lots of fun preparing and delivering this talk, both in English and French.

Let's start with the English deck & video:

{{< speakerdeck d3cac7c15adb45e8bf10233754b04de9 >}}

{{< youtube 4MDTBBEEyho >}}

And now "en français dans le texte":

{{< speakerdeck f15e5f8b54544fd29408ca72dfd29885 >}}

# Build apps for voice, chat, web and mobile

**ServerlessConf Paris** is in full swing this week, and I had the chance yesterday to participate in a workshop with my colleague Frank, to cover building apps for voice, chat, web and mobile, using Google's serverless solutions. In particular, for voice & chat, I spoke about Dialogflow and Google Assistant / [Actions on Google](https://developers.google.com/actions/), using [Cloud Functions](https://cloud.google.com/functions/) for my business logic, while in the afternoon we covered [Firebase](https://firebase.google.com/) in more depth. If you want to have a quick look at the deck, here are the slides that I presented:

{{< speakerdeck 14c2dc2fe2d44413be0b4697469e4311 >}}

# I'm a Java Champion!

The Java Champions recently nominated me to become a Java Champion!

{{< x user="java_champions" id="933194279412891648" >}}

There's also a [post](https://blogs.oracle.com/java/new-java-champions-in-2017) on the Java Champions' blog, and [InfoQ](https://www.infoq.com/news/2018/01/JavaChampions2017) also echoed the new nominees recently. I'm super happy and proud of this nomination, and I'm looking forward to continuing to be involved in the Java ecosystem, presenting at Java-friendly conferences, contributing to Open Source projects using Java and Apache Groovy, and writing articles here and there using my favorite languages.

# Pre-trained machine learning APIs

Last month, for the first time, I visited Riga (Latvia), for the [DevTernity conference](https://devternity.com/). I really enjoyed my time there, and wish to come back with other topics next time. The organizers took very good care of the speakers, and the presentations were very interesting.
I had the pleasure of talking about the [pre-trained machine learning APIs](https://cloud.google.com/products/machine-learning/) provided by Google Cloud Platform, and saying a few words as well about [TensorFlow](https://www.tensorflow.org/) and [Cloud ML Engine](https://cloud.google.com/ml-engine/). The talk was recorded, and I wanted to share with you the video and the slides of the presentation.

{{< youtube _5SmyANkq3o >}}

And here is the deck:

{{< speakerdeck c2927b99b7b64b9c83a90fe81704a9f2 >}}

# The 2018 countdown: a tip a day about Google Cloud Platform

A few weeks ago, I started a new blog dedicated to [Google Cloud Platform](https://cloud.google.com/), to share tips'n tricks I come across while exploring the platform, getting to know new products, or gathering experience with a particular service I've been using:

[![](/img/gcp-tips/gcp-tips-frontpage.jpg)](https://googlecloud.tips/)

With the holiday season, I went with a "2018 countdown" approach (like an "advent calendar", without the religious connotation), publishing a tip every day of the month of December. As of today, December 18th, we already have 18 tips available! Those tips span about a dozen technologies, and you can explore them via "tags", each representing a technology, service, API or product:

![](/img/gcp-tips/gcptips-tags.png)

Initially, I thought it would be a challenge to author tips every day (without even thinking of the fact that you have to publish tips during weekends, holidays or vacations), but there are actually plenty of tricks to share. Furthermore, I opened the blog to contributions: anyone who wants to contribute a tip or two is welcome, and should just share with me a quick gist describing the tip. I've already received a bunch of contributions, from 8 distinct authors. Thanks a lot to Alexandre, Bastien, Fabien, Graham, Jim, Mark, Victor, and Wassim!

Although it all started as a tip a day for the 2018 countdown, it won't stop there. Perhaps the frequency will be a bit lower (once a week? more?), but I definitely intend to continue sharing tips on a regular basis next year and beyond! If you want to help, please spread the word! Tell your friends and colleagues about the site: [googlecloud.tips](https://googlecloud.tips/). Also please follow the [@gcptips](https://twitter.com/GcpTips) Twitter account where new tips'n tricks are announced. And if you've got some time, don't hesitate to share your own tips! All help is welcome :-)

# Gradle vs Maven and Gradle in Kotlin or Groovy

Once in a while, when talking about Gradle with developers, at conferences or within the Groovy community (but with the wider Java community as well), I hear questions about Gradle: in particular, Gradle vs Maven, or whether developers adopt the Kotlin DSL for Gradle builds. In the past, I blogged several times about using BigQuery and the Github dataset to analyze open source projects hosted on Github, by running some SQL queries against that dataset. You might want to have a look at this past article on some [Gradle analysis with BigQuery]({{< ref "/posts/2016/12/03/analyzing-half-a-million-gradle-build-files" >}}). Considering those questions popped up recently, I decided to do a quick run through them with some simple queries.

## Gradle vs Maven?

First, let's look at Maven builds. We can run the following query:

```sql
SELECT count(*)
FROM [bigquery-public-data:github_repos.files]
WHERE path LIKE '%pom.xml'
```

There are 1,125,150 pom files.
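As a side note, the same count can also be written in BigQuery's standard SQL dialect (the dialect used by the repository-level query further below); here's a minimal sketch against the same public dataset:

```sql
#standardSQL
-- Count Maven build files in the public Github dataset, using the standard SQL dialect
SELECT COUNT(*)
FROM `bigquery-public-data.github_repos.files`
WHERE path LIKE '%pom.xml'
```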
![](/img/bq-groovy/gradle-builds-vs-maven-builds.png)

Then, for Gradle, I ran this query (even though projects could use different build file names):

```sql
SELECT count(*)
FROM [bigquery-public-data:github_repos.files]
WHERE path LIKE '%build.gradle'
```

There are 414,329 build.gradle files.

![](/img/bq-groovy/gradle-builds-in-groovy.png)

So that's 1 Gradle build file for every 2.7 Maven build files.

## Gradle builds in Kotlin or in Groovy?

Now for Kotlin, the convention seems to be to name your build files build.gradle.kts. So let's run the following query:

```sql
SELECT count(*)
FROM [bigquery-public-data:github_repos.files]
WHERE path LIKE '%build.gradle.kts'
```

There are only 207 Gradle build files written in Kotlin.

![](/img/bq-groovy/gradle-builds-in-kotlin.png)

Basically, Groovy-based Gradle builds are 2000 times more popular than Kotlin-based builds.

## A grain of salt

Now, all that said, remember that developers can name their build files differently, that this is only a snapshot of the projects available on Github, and furthermore, only of open source projects (at least projects that explicitly have a LICENSE file). Note as well that there are Gradle-based projects that also have a pom.xml file available, although they're not using Maven for their build. Also, perhaps it'd be more interesting to run the queries counting repositories rather than build files: perhaps Gradle users tend to split their builds into smaller build files, in a less monolithic way than with Maven? Practices and habits may vary greatly.

For the Gradle vs Maven question, at Devoxx Belgium, I ran the following (more complex) query where I look at repositories containing Gradle or Maven build files:

```sql
#standardSQL
select file, count(file) FROM (
  (SELECT 'gradle' as file, count(repo_name)
  FROM `bigquery-public-data.github_repos.files`
  WHERE path LIKE '%build.gradle'
  GROUP BY repo_name)
  UNION ALL
  (SELECT "maven" as file, count(repo_name)
  FROM `bigquery-public-data.github_repos.files`
  WHERE path LIKE '%pom.xml'
  GROUP BY repo_name)
) group by file
```

Gradle and Maven are already much closer to each other when looking at repository counts than at the pure number of build files, perhaps indeed showing a trend of Gradle users modularizing their builds more. We get 118,386 repositories using Gradle versus 143,290 repositories using Maven. So Gradle is almost at the same level as Maven from that repository perspective, though still catching up with Maven!

## Famous last words

Don't draw too big conclusions out of those figures: there are many ways to make stats, and those figures only cover a small fraction of all the projects in existence in the world... But at least they certainly exhibit a certain trend, which is still interesting to know and think about!

# The JDK built-in web server with Apache Groovy

In my timeline, I saw a tweet from Joe Walnes about the built-in HTTP server available in the JDK since Java 6. It's super convenient, starts super fast, and is easy to use, but I often forget about it. I'd probably not use it for serving planet-wide load, but it's very useful when you need to create a quick service or a little mock for testing some web app or micro-service.

Here's a little hello world, for the fun of it. I'm taking advantage of Apache Groovy's closure-to-functional-interface coercion support, as well as the `with{}` method to reuse the `HttpServer` instance for two method calls on the same instance (I could've used it for the `http` variable as well, actually).
```groovy
import com.sun.net.httpserver.HttpServer

HttpServer.create(new InetSocketAddress(8080), 0).with {
    createContext("/hello") { http ->
        http.responseHeaders.add("Content-type", "text/plain")
        http.sendResponseHeaders(200, 0)
        http.responseBody.withWriter { out ->
            out << "Hello ${http.remoteAddress.hostName}!"
        }
    }
    start()
}
```

# More voice control for Actions on Google

Today, there were some interesting [announcements for Actions on Google](https://developers.googleblog.com/2017/11/help-users-find-interact-re-engage-with.html), for building your conversational interfaces for the [Google Assistant](https://assistant.google.com/). Among the great news, one item particularly caught my attention: the improved SSML support:

> **Better SSML**:
> We recently rolled out an update to the web simulator
> which includes a new SSML audio design experience.
> We now give you more options for creating natural,
> quality dialog using newly supported SSML tags, including \,
> \, \