In the rapidly evolving landscape of AI APIs, Perplexity AI has emerged with a compelling proposition: real-time web search capabilities combined with AI-powered summarization, all accessible through OpenAI-compatible endpoints. Even better? It comes at a fraction of the cost of traditional AI APIs.

Why Perplexity API Matters
While many developers are familiar with Perplexity’s consumer-facing search engine, the company’s new API opens up possibilities for building your own applications. Three key features make it particularly interesting:
- Real-time web access: Unlike many AI models that rely on static training data, Perplexity’s API can search and synthesize current information from the internet.
- Source citations: Results come with references, adding credibility and traceability to your AI-powered applications.
- OpenAI compatibility: If you’ve worked with OpenAI’s API, you already know how to use Perplexity’s API.
Getting Started
Integration is straightforward. In Python, you can use the existing OpenAI library:
import os
from openai import OpenAI

# Point the standard OpenAI client at Perplexity's endpoint
client = OpenAI(
    api_key=os.getenv("PERPLEXITY_API_KEY"),
    base_url="https://api.perplexity.ai"
)

response = client.chat.completions.create(
    model="llama-3.1-sonar-small-128k-online",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "How many stars are in the universe?"}
    ]
)

print(response.choices[0].message.content)
For Java developers, LangChain4j provides a clean integration, again taking advantage of the OpenAI-compatible API. Just change the API key, the base URL, and the model name:
ChatLanguageModel perplexityModel = OpenAiChatModel.builder()
        .apiKey(System.getenv("PERPLEXITY_API_KEY"))
        .baseUrl("https://api.perplexity.ai")
        .modelName("llama-3.1-sonar-small-128k-online")
        .build();

String answer = perplexityModel.generate(
        "How many r's are in the word 'strawberry'?");
System.out.println(answer);
The response was: “The word ‘strawberry’ contains three R’s: one R in ‘straw’ and two more in ‘berry’ (A-W-B-E-R-R). This is confirmed by multiple sources, including the interactions with AI models like ChatGPT and other large language models (LLMs) which often mistakenly count only two R’s due to their processing architecture and tokenization methods.”
A bit wordy, but reassuring for a small model.
Understanding the Cost Structure
Perplexity’s pricing model combines a fixed per-request fee with a variable per-token charge:
- Fixed cost: $5 per 1,000 requests (half a penny per request)
- Variable cost: Approximately half a penny (or less) per 1,000 tokens
If the smaller models are sufficient for your purpose, this makes Perplexity more affordable than many alternatives, especially for applications that make frequent API calls.
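As a back-of-the-envelope check, here is a small sketch that plugs in those published rates. The token count per request is an illustrative assumption, not a measured figure; adjust it for your own prompts and responses.
# Rough cost estimate using the rates above
fixed_per_request = 5.00 / 1_000      # $5 per 1,000 requests
rate_per_1k_tokens = 0.005            # ~half a penny per 1,000 tokens (small model)
tokens_per_request = 800              # illustrative assumption: prompt + response size

variable_per_request = rate_per_1k_tokens * tokens_per_request / 1_000
cost_per_request = fixed_per_request + variable_per_request

print(f"Estimated cost per request: ${cost_per_request:.4f}")
print(f"Estimated cost per 10,000 requests: ${cost_per_request * 10_000:.2f}")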
Available Models
Perplexity offers several models, all based on the Llama 3.1 family from Meta:
- Sonar Online Models: Available in small (8B parameters), large (70B), and huge (405B) versions
- Chat Models: Optimized for conversational applications
The difference lies in their primary functionality: online models are specifically designed for real-time web access and information retrieval, while chat models excel at multi-turn conversations.
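As a quick illustration of that split, here is a tiny sketch that picks a model name based on whether a request needs live web results. The chat-model name shown is the one Perplexity documented alongside the online models at the time of writing; check the current model list before relying on it.
def choose_model(needs_web_search: bool) -> str:
    """Pick a Perplexity model name based on whether live web results are needed."""
    if needs_web_search:
        return "llama-3.1-sonar-small-128k-online"  # real-time search with citations
    return "llama-3.1-sonar-small-128k-chat"        # assumed chat variant for multi-turn use

print(choose_model(True))    # current-events question
print(choose_model(False))   # ordinary conversational turn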
When to Choose Perplexity API
Consider Perplexity API when you need:
- Real-time information access in your applications
- Source-cited responses for better accountability
- Cost-effective AI API access
- Drop-in replacement for OpenAI’s API
Learning More
To dive into implementing Perplexity API in your applications, check out the full video tutorial on the Tales from the Jar Side YouTube channel. The tutorial covers:
- Python and Java implementations
- Detailed pricing analysis
- Model comparison and selection
- Real-world usage examples
Looking Ahead
As AI APIs continue to evolve, Perplexity’s offering stands out for its combination of real-time web access, familiar developer experience, and competitive pricing. Whether you’re building a search-enhanced chatbot or need current information in your applications, it’s worth adding to your AI toolkit.
Ken Kousen is the author of Tales from the Jar Side, where he shares expert advice on Java, Kotlin, Spring, AI, and related topics. Subscribe to the channel for more technical tutorials and insights.
