Conversations with LangChain4j

I published a video last week, and it’s not doing as well as I’d hoped. The problem, I expect, is that neither the title nor the thumbnail, nor even the hook really gets across what the subject matter is all about.

Here’s the video:

The actual title is Expert Tips for Chat Memory on LangChain4j. That seems clear enough, right? And the words on the thumbnail pretty much say the same thing. Even so, the view count remains low.

Maybe the audience for LangChain4j isn’t that large, and if that’s the case, I’m simply ahead of the market. 🙂 LangChain4j is simply awesome, and it’s one of only two AI integration Java frameworks I recommend without hesitation (the other being Spring AI, which just went to version 1.0.0-M2). LangChain4j is stable, effective, and quite powerful, and deserves as big an audience as it can get.

With that in mind, let me give you an idea of what Chat Memory is, why it matters, and how LangChain4j makes managing it almost trivial.

Most people who use AI tools don’t realize that every request you send is independent of every other one. That’s not obvious, because the websites for ChatGPT, Claude, and Gemini don’t work like that. As long as you stay in the same conversation, the requests are all connected. The AI tool remembers what you previously said, at least until it doesn’t.

But the way it does that is fascinating. If you use the API to access the site programmatically, you can see the stateless nature of each request.

Say I load the GPT-4o-mini model:

public final ChatLanguageModel gpt4o =
        OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName(OpenAiChatModelName.GPT_4_O_MINI)
                .build();

Now I send a message that includes a name, and then ask for it back:

@Test
void stateless_demo() {
    String firstAnswer = gpt4o.generate("""
            Hello. My name is Inigo Montoya.
            You killed my father.
            Prepare to die.
            """);
    String secondAnswer =
            gpt4o.generate("What's my name?");
    // ... various assertions about the answers ...
}

What you’ll find is that GPT responds to the first query with a greeting, and then gives a generic disclaimer for the second one:

FIRST: Hello, Inigo Montoya. I'm here to listen. What would you like to discuss?
SECOND: I'm sorry, but I don't have access to personal information about individuals unless you've shared it with me in this conversation. How can I assist you today?

The way you fix this is to keep track of the previous question and answer pairs, and submit them along with any new questions. That’s what makes a conversation: you include all the previous requests and responses in the current one.
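To make that bookkeeping concrete, here’s a plain-Java sketch with no framework involved at all. The TranscriptDemo class, its Turn record, and buildPrompt are illustrative names of mine; the point is simply that the whole transcript gets flattened into every new prompt:

```java
import java.util.ArrayList;
import java.util.List;

// A framework-free sketch of conversation bookkeeping: every
// question/answer pair is recorded, and each new request carries
// the entire transcript along with the new question.
public class TranscriptDemo {
    public record Turn(String role, String text) {}

    public static final List<Turn> transcript = new ArrayList<>();

    // Flatten the whole history, then append the new question.
    public static String buildPrompt(String newQuestion) {
        StringBuilder prompt = new StringBuilder();
        for (Turn turn : transcript) {
            prompt.append(turn.role()).append(": ")
                  .append(turn.text()).append("\n");
        }
        return prompt.append("user: ").append(newQuestion).toString();
    }

    public static void main(String[] args) {
        transcript.add(new Turn("user", "My name is Bond. James Bond."));
        transcript.add(new Turn("assistant", "A pleasure, Mr. Bond."));
        // The model never "remembers" anything; the prompt does.
        System.out.println(buildPrompt("What's my name?"));
    }
}
```

Every real chat client does some version of this, which is also why long conversations eventually have to drop or summarize old messages: the transcript competes with the new question for context-window space.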

Unfortunately, managing that process by hand can be tedious, even in a tool like LangChain4j. The framework provides two implementations of an interface called ChatMemory: one is called MessageWindowChatMemory, and the other is called TokenWindowChatMemory. With the message window, you say how many messages to remember. With the token window, you specify the max number of tokens, and the framework includes as many messages as can fit.
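For comparison, here’s roughly what constructing each variant looks like. The message window needs only a count, while the token window also needs a tokenizer so it can measure messages. I’m assuming the OpenAiTokenizer that ships with the OpenAI module; check the release you’re on for the exact signature:

```java
// The message-based variant: keep the last N messages.
ChatMemory byMessages =
        MessageWindowChatMemory.withMaxMessages(10);

// The token-based variant: keep as many messages as fit in
// the token budget, counted by the supplied tokenizer.
ChatMemory byTokens =
        TokenWindowChatMemory.withMaxTokens(
                1000,
                new OpenAiTokenizer(OpenAiChatModelName.GPT_4_O_MINI));
```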

Handling all that manually looks like this:

ChatMemory memory =
        MessageWindowChatMemory.withMaxMessages(10);

// First message. Add both user message
// and AI response to memory.
memory.add(UserMessage.from("""
        My name is Bond. James Bond.
        """));
AiMessage firstResponse =
        gpt4o.generate(memory.messages()).content();
memory.add(firstResponse);
String firstAnswer = firstResponse.text();

// Second message. Add user message to memory,
// then send all messages to get the response text.
memory.add(UserMessage.from("What's my name?"));
AiMessage secondResponse =
        gpt4o.generate(memory.messages()).content();
memory.add(secondResponse);
String secondAnswer = secondResponse.text();

That works:

FIRST: Ah, Mr. Bond! A pleasure to make your acquaintance. How can I assist you today? Perhaps you have a mission in mind or need some information?
SECOND: Your name is Bond. James Bond. How can I assist you further, Mr. Bond?

Fortunately, there’s a much easier way, especially if you’re willing to use the AI Services capability in LangChain4j. That means you write an interface and tell the framework to implement it for you:

public interface Assistant {
    String chat(String message);
}

// Use AI Services to create an Assistant
// with chat memory.
private Assistant createAssistant(
        ChatLanguageModel model) {
    return AiServices.builder(Assistant.class)
            .chatLanguageModel(model)
            .chatMemory(
                    MessageWindowChatMemory.withMaxMessages(10))
            .build();
}

See the call to the chatMemory method as part of the builder code? That tells the framework to always maintain the last ten messages, assuming they fit into the overall context window.

With that in mind, the memory is managed automatically:

Assistant assistant = createAssistant(gpt4o);
// All messages added to memory and
// submitted in next request automatically.
String firstAnswer = assistant.chat("""
        My name is Maximus Decimus Meridius,
        commander of the Armies of the North,
        General of the Felix Legions, loyal servant
        to the true emperor, Marcus Aurelius.
        Father to a murdered son,
        husband to a murdered wife.
        And I will have my vengeance,
        in this life or the next.
        """);
String secondAnswer = assistant.chat(
        "What's my name?");

The result is what you want:

FIRST: That’s a powerful and iconic quote from the film "Gladiator." Maximus Decimus Meridius, portrayed by Russell Crowe, embodies themes of honor, revenge, and leadership. His quest for vengeance drives the narrative and highlights the deep personal losses he endures. If you're interested, we could discuss more about the film, its themes, or its characters!
SECOND: Your name is Maximus Decimus Meridius.

Way easier. You can also ask the framework to manage a separate chat memory instance per user. All you have to do is add a parameter to the chat method, annotated with @MemoryId (marking the message parameter with @UserMessage so the framework can tell the two apart), and use the chatMemoryProvider method in the builder:

public interface AssistantPerUser {
    String chat(@MemoryId int userId,
                @UserMessage String message);
}

private AssistantPerUser createAssistantPerUser(
        ChatLanguageModel model) {
    return AiServices.builder(AssistantPerUser.class)
            .chatLanguageModel(model)
            .chatMemoryProvider(memoryId ->
                    MessageWindowChatMemory.withMaxMessages(10))
            .build();
}

Now this code works for two separate users, where each one uses a different ID:

AssistantPerUser assistant = createAssistantPerUser(gpt4o);

String firstQuestionUser1 = """
        Now, say my name.
        Heisenberg.
        You're g***m right.
        """;

String firstQuestionUser2 = """
        My name is Shake Zula
        The mic rulah
        The old schoolah
        You wanna trip?
        I’ll bring it to ya
        """;

String secondQuestion = "What's my name?";

String firstAnswerUser1 =
        assistant.chat(1, firstQuestionUser1);
String firstAnswerUser2 =
        assistant.chat(2, firstQuestionUser2);
String secondAnswerUser1 =
        assistant.chat(1, secondQuestion);
String secondAnswerUser2 =
        assistant.chat(2, secondQuestion);

The responses now look like this:

FIRST: You are the one who knocks.
SECOND: That's a great start! You’re quoting the theme song from "Aqua Teen Hunger Force," right? It's a classic! How can I help you today?
THIRD: Your name is Heisenberg. How can I assist you further?
FOURTH: Your name is Shake Zula! How can I assist you, Shake?

To be honest, that didn’t work the first time I ran it, but it had nothing to do with the memory. GPT-4o-mini thought the song was Rapper’s Delight from the Sugar Hill Gang, instead of recognizing the ATHF theme song. Switching to the regular GPT-4o fixed that.

Finally, you can also make the memory persistent. The LangChain4j documentation has a nice example of that, and I borrowed it (or at least the database implementation code) for my demo. If you want to see the details, check out my GitHub repository.
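If you’d rather not click through, the shape of the persistence hook is worth seeing: you implement LangChain4j’s ChatMemoryStore interface and hand it to the memory builder. This map-backed version is a sketch of mine standing in for a real database table; the JSON serializer helpers are the ones the framework provides:

```java
// A sketch of persistent chat memory. ChatMemoryStore is the real
// LangChain4j interface; the ConcurrentHashMap stands in for a
// database table keyed by memory ID.
public class MapChatMemoryStore implements ChatMemoryStore {

    private final Map<Object, String> store = new ConcurrentHashMap<>();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        String json = store.get(memoryId);
        return json == null
                ? new ArrayList<>()
                : ChatMessageDeserializer.messagesFromJson(json);
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        store.put(memoryId, ChatMessageSerializer.messagesToJson(messages));
    }

    @Override
    public void deleteMessages(Object memoryId) {
        store.remove(memoryId);
    }
}

// Wire it in through the memory builder:
ChatMemory memory = MessageWindowChatMemory.builder()
        .maxMessages(10)
        .chatMemoryStore(new MapChatMemoryStore())
        .build();
```

Swap the map for JDBC calls (or whatever store you like) and the conversation survives a restart.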

See how easy that was? I figured that was worth a YouTube video, and, as it turns out, a blog post. Let me know if you agree or not. Next time I’ll try to make a more relevant title or thumbnail.

Now I just have to somehow get the Aqua Teen Hunger Force theme song out of my head. Again.

(Number One in the hood, ‘G)

All the code can be found here.
