In the rapidly evolving landscape of AI APIs, Perplexity AI has emerged with a compelling proposition: real-time web search capabilities combined with AI-powered summarization, all accessible through OpenAI-compatible endpoints. Even better? It comes at a fraction of the cost of traditional AI APIs.

Why Perplexity API Matters
While many developers are familiar with Perplexity’s consumer-facing search engine, the company’s new API opens up possibilities for building your own applications. Three key features make it particularly interesting:
- Real-time web access: Unlike many AI models that rely on static training data, Perplexity’s API can search and synthesize current information from the internet.
- Source citations: Results come with references, adding credibility and traceability to your AI-powered applications.
- OpenAI compatibility: If you’ve worked with OpenAI’s API, you already know how to use Perplexity’s API.
Getting Started
Integration is straightforward. In Python, you can use the existing OpenAI library:
import os
from openai import OpenAI

# Point the standard OpenAI client at Perplexity's endpoint
client = OpenAI(
    api_key=os.getenv("PERPLEXITY_API_KEY"),
    base_url="https://api.perplexity.ai"
)

response = client.chat.completions.create(
    model="llama-3.1-sonar-small-128k-online",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "How many stars are in the universe?"}
    ]
)

print(response.choices[0].message.content)
For Java developers, LangChain4j provides a clean integration, again taking advantage of the OpenAI-compatible API. Just change the API key, the base URL, and the model name:
ChatLanguageModel perplexityModel = OpenAiChatModel.builder()
        .apiKey(System.getenv("PERPLEXITY_API_KEY"))
        .baseUrl("https://api.perplexity.ai")
        .modelName("llama-3.1-sonar-small-128k-online")
        .build();

String answer = perplexityModel.generate(
        "How many r's are in the word 'strawberry'?");
System.out.println(answer);
The response was: “The word ‘strawberry’ contains three R’s: one R in ‘straw’ and two more in ‘berry’ (A-W-B-E-R-R). This is confirmed by multiple sources, including the interactions with AI models like ChatGPT and other large language models (LLMs) which often mistakenly count only two R’s due to their processing architecture and tokenization methods.”
A bit wordy, but reassuring for a small model.
Understanding the Cost Structure
Perplexity’s pricing model combines a fixed per-request fee with a variable per-token charge:
- Fixed cost: $5 per 1,000 requests (half a penny per request)
- Variable cost: Approximately half a penny (or less) per 1,000 tokens
If the smaller models are sufficient for your purpose, this makes Perplexity more affordable than many alternatives, especially for applications that make frequent API calls.
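As a back-of-the-envelope check, here is a small sketch that plugs in those published rates. The token count per request is an illustrative assumption, not a measured figure; adjust it for your own prompts and responses.
# Rough cost estimate using the rates above
fixed_per_request = 5.00 / 1_000      # $5 per 1,000 requests
rate_per_1k_tokens = 0.005            # ~half a penny per 1,000 tokens (small model)
tokens_per_request = 800              # illustrative assumption: prompt + response size

variable_per_request = rate_per_1k_tokens * tokens_per_request / 1_000
cost_per_request = fixed_per_request + variable_per_request

print(f"Estimated cost per request: ${cost_per_request:.4f}")
print(f"Estimated cost per 10,000 requests: ${cost_per_request * 10_000:.2f}")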
Available Models
Perplexity offers several models, all based on the Llama 3.1 family from Meta:
- Sonar Online Models: Available in small (8B parameters), large (70B), and huge (405B) versions
- Chat Models: Optimized for conversational applications
The difference lies in their primary functionality: online models are specifically designed for real-time web access and information retrieval, while chat models excel at multi-turn conversations.
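As a quick illustration of that split, here is a tiny sketch that picks a model name based on whether a request needs live web results. The chat-model name shown is the one Perplexity documented alongside the online models at the time of writing; check the current model list before relying on it.
def choose_model(needs_web_search: bool) -> str:
    """Pick a Perplexity model name based on whether live web results are needed."""
    if needs_web_search:
        return "llama-3.1-sonar-small-128k-online"  # real-time search with citations
    return "llama-3.1-sonar-small-128k-chat"        # assumed chat variant for multi-turn use

print(choose_model(True))    # current-events question
print(choose_model(False))   # ordinary conversational turn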
When to Choose Perplexity API
Consider Perplexity API when you need:
- Real-time information access in your applications
- Source-cited responses for better accountability
- Cost-effective AI API access
- Drop-in replacement for OpenAI’s API
Learning More
To dive into implementing Perplexity API in your applications, check out the full video tutorial on the Tales from the Jar Side YouTube channel. The tutorial covers:
- Python and Java implementations
- Detailed pricing analysis
- Model comparison and selection
- Real-world usage examples
Looking Ahead
As AI APIs continue to evolve, Perplexity’s offering stands out for its combination of real-time web access, familiar developer experience, and competitive pricing. Whether you’re building a search-enhanced chatbot or need current information in your applications, it’s worth adding to your AI toolkit.
Ken Kousen is the author of Tales from the Jar Side, where he shares expert advice on Java, Kotlin, Spring, AI, and related topics. Subscribe to the channel for more technical tutorials and insights.
