❯ Guillaume Laforge

Json

Data extraction: The many ways to get LLMs to spit JSON content

Data extraction from unstructured text is a very important task where LLMs shine, as they understand human languages well. Rumor has it that 80% of the worldwide knowledge and data comes in the form of unstructured text (vs 20% for data stored in databases, spreadsheets, JSON/XML, etc.) Let’s see how we can get access to that trove of information thanks to LLMs.

In this article, we’ll have a look at different techniques to make LLMs generate JSON output and extract data from text. This applies to most LLMs and frameworks, but for illustration purposes, we’ll use Gemini and LangChain4j in Java.

Read more...

Tech Watch #2 β€” Oct 06, 2023

  • Generative AI exists because of the transformer
    I confess I rarely read the Financial Times, but they have a really neat articles with animations on how large language models work, thanks to the transformer neural network architecture, an architecture invented by Google in 2017. They talk about text vector embeddings, how the self-attention makes LLM understand the relationship between words and the surrounding context, and also doesn’t forget to mention hallucinations, how “grounding” and RLHF (Reinforcement Learning with Human Feedback) can help mitigate them to some extent.

    Read more...

Upload and use JSON data in your workflow from GCS

Following up the article on writing and reading JSON files in cloud storage buckets (GCS), we saw that we could access the data of the JSON file, and use it in our workflow. Let’s have a look at a concrete use of this.

Today, we’ll take advantage of this mechanism to avoid hard-coding the URLs of the APIs we call from our workflow. That way, it makes the workflow more portable across environments.

Read more...

Reading in and writing a JSON file to a storage bucket from a workflow

Workflows provides several connectors for interacting with various Google Cloud APIs and services. In the past, I’ve used for example the Document AI connector to parse documents like expense receipts, or the Secret Manager connector to store and access secrets like passwords. Another useful connector I was interested in using today was the Google Cloud Storage connector, to store and read files stored in storage buckets.

Those connectors are auto-generated from their API discovery descriptors, but there are some limitations currently that prevent, for example, to download the content of a file. So instead of using the connector, I looked at the JSON API for cloud storage to see what it offered (insert and get methods).

Read more...