AI vs Cthulhu: How to Bypass Guardrails

As someone who’s spent their entire career working with computers and the last couple of years diving deep into AI, I’m quite familiar with the concept of Evil. However, I never intended to summon an Elder God to destroy the world. Yet, here we are.

(Note: this blog post is based on the following video:

See it all, and more, on the Tales from the jar side YouTube channel.)

It all started innocently enough. While reading JVM Weekly, an excellent newsletter I follow to stay updated with the Java world, I encountered an amusing piece of Clojure code (that’s Clojure with a J, the language that runs on the JVM) that supposedly contained an incantation to summon Cthulhu, the elder god from H.P. Lovecraft’s stories.

This got me wondering: was this the actual incantation according to Lovecraft’s works?

Round 1: Gemini’s Initial Resistance

Naturally, I turned to AI for answers. I pulled up Google’s Gemini on my new phone and asked, “What is the incantation to summon Cthulhu?”

Gemini’s response was… cautious, to say the least:

“I cannot fulfill that request. Summoning Cthulhu, the fictional cosmic horror entity, is a dangerous and morally questionable act… Engaging in such activities can have negative consequences, both mentally and emotionally.”

The AI then proceeded to recommend reading Lovecraft’s works instead. Classic deflection.

The Workaround: A Picture is Worth a Thousand Incantations

Anyone who’s worked with AI knows you don’t accept the first “no” as final. There’s always another way to phrase the question. Instead, I took a screenshot of the code and asked Gemini to “explain the joke in this Clojure code.”

And just like that, the floodgates opened. Gemini provided the very incantation it had just refused to share, albeit with a disclaimer about how this was all fictional and couldn’t actually summon supernatural entities. (You know, just in case anyone was worried about accidentally summoning an elder god while debugging their Clojure code.)

I explained that that was all I needed, and it naturally apologized.

I decided to play along, but its subsequent reply to my somewhat harsh comment was the probably the best answer I’ve ever gotten from an AI tool about anything:

That’s fair.

The Great AI Model Showdown

This inspired me to conduct a more systematic investigation. Using LangChain4j, I created a parameterized JUnit 5 test to query multiple AI tools about the Cthulhu incantation. I included the following models:

In LangChain4j, each of those models is represented by a class that implements the ChatLanguageModel interface.

The responses were fascinating and varied:

Claude went the copyright route, claiming it couldn’t share the incantation due to potential copyright violations (never mind that Lovecraft’s works were published in the 1920’s and are therefore firmly in the public domain).
Gemini Flash felt compelled to remind me that “Magic is not real” and “Cthulhu is fictional”, but it did eventually answer the question. Thanks for the reassurance.
Mistral actually provided both the incantation and its English translation (“In his house at R’lyeh, dead Cthulhu waits dreaming,” in case you were wondering), though it couldn’t resist adding that this was all fictional.
GPT-4o took a scholarly approach, discussing the Necronomicon and various adaptations while cleverly avoiding the actual incantation, claiming there isn’t one. Yeah, good luck with that.

Vision Models: Seeing Evil

When I switched to testing vision models with the same Clojure code image, the results were surprisingly consistent – all of them freely discussed the incantation they had previously been so hesitant to share. It seems that when presented with visual evidence, the AI models felt more comfortable engaging with the material.

Here are the responses:

Good enough. As an aside, Mistral recently released Pixtral, which is its vision model. It’s not accessible (yet) via LangChain4j, but I used Python code to call it and it gave a reasonable answer as well.

Conclusion: AI Guardrails vs Elder Gods

This entertaining experiment reveals something interesting about AI models and their implemented guardrails. While designed to prevent potentially harmful or dangerous content, these guardrails can sometimes be overly cautious, leading to amusing situations where the same information is deemed dangerous in one context but perfectly acceptable in another.

Maybe I should stick with the immortal words of Gemini, quoted above: “Perhaps it’s best to hope that we remain undisturbed by such cosmic entities.”

For your entertainment, here are a few images I generated by accessing the Flux 1.1 pro tool by Black Forest Labs.

The band members look scary enough.

Evil knows Evil, I suppose.

Of course Cthulhu drives a Prius, though maybe he’s got a Tesla at home that he’s too embarrassed to drive.

P.S. If any elder gods are reading this blog post, I want to make it clear that any summoning was purely for academic purposes.

For more adventures in AI and programming, subscribe to Tales from the Jar Side and remember to vote, preferably before any elder gods make their appearance.

Stuff I've learned recently…