Large-Language-Models

Things you never dared to ask about LLMs

📅 October 24, 2024 — by Guillaume Laforge

Along my learning journey about generative AI, lots of questions popped up in my mind. I was very curious to learn how things worked under the hood in Large Language Models (at least having an intuition rather than knowing the maths in and out). Sometimes, I would wonder about how tokens are created, or how hyperparameters influence text generation.

Before the dotAI conference, I was invited to talk at the meetup organised by DataStax. I presented about all those things you never dared to ask about LLMs, sharing both the questions I came up with while learning about generative AI, and the answers I found and discovered along the way.

Advanced RAG Techniques

📅 October 14, 2024 — by Guillaume Laforge

generative-ai large-language-models java langchain4j retrieval-augmented-generation

Retrieval Augmented Generation (RAG) is a pattern to let you prompt a large language model (LLM) about your own data, via in-context learning by providing extracts of documents found in a vector database (or potentially other sources too).

Implementing RAG isn’t very complicated, but the results you get are not necessarily up to your expectations. In the presentations below, I explore various advanced techniques to improve the quality of the responses returned by your RAG system:

A Gemini and Gemma tokenizer in Java

📅 October 4, 2024 — by Guillaume Laforge

java large-language-models machine-learning langchain4j generative-ai

It’s always interesting to know how the sausage is made, don’t you think? That’s why, a while ago, I looked at embedding model tokenization, and I implemented a little visualization to see the tokens in a colorful manner. Yet, I was still curious to see how Gemini would tokenize text…

Both LangChain4j Gemini modules (from Vertex AI and from Google AI Labs) can count the tokens included in a piece of text. However, both do so by calling a REST API endpoint method called countTokens. This is not ideal, as it requires a network hop to get the token counts, thus adding undesired extra latency. Wouldn’t it be nicer if we could count tokens locally instead?

Some advice and good practices when integrating an LLM in your application

📅 September 23, 2024 — by Guillaume Laforge

large-language-models machine-learning best-practices patterns

When integrating an LLM into your applicaton to extend it and make it smarter, it’s important to be aware of the pitfalls and best practices you need to follow to avoid some common problems and integrate them successfully. This article will guide you through some key best practices that I’ve come across.

Understanding the Challenges of Implementing LLMs in Real-World Applications

One of the first challenges is that LLMs are constantly being improved. This means that the model you start using could change under the hood, and suddenly your application doesn’t work as it did before. Your prompts might need adjustments to work with the newer version, or worse, they might even lead to unintended results!

Let's make Gemini Groovy!

📅 June 3, 2024 — by Guillaume Laforge

groovy google-cloud generative-ai large-language-models java langchain4j

The happy users of Gemini Advanced, the powerful AI web assistant powered by the Gemini model, can execute some Python code, thanks to a built-in Python interpreter. So, for math, logic, calculation questions, the assistant can let Gemini invent a Python script, and execute it, to let users get a more accurate answer to their queries.

But wearing my Apache Groovy hat on, I wondered if I could get Gemini to invoke some Groovy scripts as well, for advanced math questions!

Grounding Gemini with Web Search results in LangChain4j

📅 May 28, 2024 — by Guillaume Laforge

google-cloud generative-ai large-language-models java langchain4j

The latest release of LangChain4j (version 0.31) added the capability of grounding large language models with results from web searches. There’s an integration with Google Custom Search Engine, and also Tavily.

The fact of grounding an LLM’s response with the results from a search engine allows the LLM to find relevant information about the query from web searches, which will likely include up-to-date information that the model won’t have seen during its training, past its cut-off date when the training ended.

Gemini, Google's Large Language Model, for Java Developers

📅 May 3, 2024 — by Guillaume Laforge

google-cloud generative-ai large-language-models java langchain4j

As a follow-up to my talk on generative AI for Java developers, I’ve developed a new presentation that focuses more on the Gemini large multimodal model by Google.

In this talk, we cover the multimodality capabilities of the model, as it’s able to ingest code, PDF, audio, video, and is able to reason about them. Another specificity of Gemini is its huge context window of up to 1 million tokens! This opens interesting perspectives, especially in multimodal scenarios.

Calling Gemma with Ollama, TestContainers, and LangChain4j

📅 April 3, 2024 — by Guillaume Laforge

google-cloud generative-ai large-language-models java containers langchain4j

Lately, for my Generative AI powered Java apps, I’ve used the Gemini multimodal large language model from Google. But there’s also Gemma, its little sister model.

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma is available in two sizes: 2B and 7B. Its weights are freely available, and its small size means you can run it on your own, even on your laptop. So I was curious to give it a run with LangChain4j.

Gemini codelab for Java developers using LangChain4j

📅 March 27, 2024 — by Guillaume Laforge

google-cloud generative-ai large-language-models java langchain4j

No need to be a Python developer to do Generative AI! If you’re a Java developer, you can take advantage of LangChain4j to implement some advanced LLM integrations in your Java applications. And if you’re interested in using Gemini, one of the best models available, I invite you to have a look at the following “codelab” that I worked on:

Codelab — Gemini for Java Developers using LangChain4j

In this workshop, you’ll find various examples covering the following use cases, in crescendo approach:

Visualize PaLM-based LLM tokens

📅 February 5, 2024 — by Guillaume Laforge

google-cloud generative-ai large-language-models micronaut java cloud-run

As I was working on tweaking the Vertex AI text embedding model in LangChain4j, I wanted to better understand how the textembedding-gecko model tokenizes the text, in particular when we implement the Retrieval Augmented Generation approach.

The various PaLM-based models offer a computeTokens endpoint, which returns a list of tokens (encoded in Base 64) and their respective IDs.

Note: At the time of this writing, there’s no equivalent endpoint for Gemini models.

So I decided to create a small application that lets users:

|< << 3 of 5 >> >|