Skip to content
· LangHire Team

OpenAI-Compatible Providers: Connect LangHire to Together AI, Groq, DeepSeek, or Any Local Server

LangHire now supports any OpenAI-compatible API endpoint. Here's how to connect to Together AI, Groq, DeepSeek, LM Studio, vLLM, and more — with cost comparisons per job application.

guideprovidersself-hosting

LangHire has always let you choose your AI provider — OpenAI, Anthropic, AWS Bedrock, or Ollama for fully local inference. But many users wanted something in between: the ability to point LangHire at any API that speaks the OpenAI format.

Today we’re shipping exactly that.

What’s an OpenAI-Compatible API?

The OpenAI chat completions format (/v1/chat/completions) has become a de facto standard. Dozens of providers and local inference tools implement the same API shape, which means any client built for OpenAI can talk to them with just a URL change.

This includes:

  • Together AI — affordable hosted inference for Llama, Mistral, Qwen, and more
  • Groq — ultra-fast inference on custom LPU hardware
  • DeepSeek — strong reasoning models at low cost
  • Fireworks AI — optimized inference with function calling support
  • LM Studio — desktop app for running models locally with a GUI
  • vLLM — high-throughput self-hosted inference server
  • text-generation-webui — popular local model runner with OpenAI-compatible extension
  • Ollama (via /v1) — if you want the OpenAI-compatible interface instead of Ollama’s native one

How to Set It Up

  1. Go to Settings → LLM in LangHire
  2. Select “OpenAI-Compatible” as your provider
  3. Enter three things:
    • Base URL — the API endpoint (e.g. https://api.together.xyz/v1)
    • API Key — optional for local servers, required for cloud providers
    • Model name — whatever your server expects (e.g. meta-llama/Llama-3-70b-chat-hf)
  4. Click Test Connection to verify

That’s it. LangHire will use this endpoint for all AI operations — job collection, form filling, screening questions, and resume tailoring.

Provider-by-Provider Setup

Together AI

Together offers 100+ open models at competitive prices. Great balance of cost and quality.

FieldValue
Base URLhttps://api.together.xyz/v1
API KeyYour Together API key (get one here)
Modelmeta-llama/Llama-3.3-70B-Instruct-Turbo

Cost per application: ~$0.002–0.005 (significantly cheaper than OpenAI)

Groq

Groq’s custom hardware delivers extremely fast inference — often 10–20x faster than GPU-based providers. Great for batch applications where speed matters.

FieldValue
Base URLhttps://api.groq.com/openai/v1
API KeyYour Groq API key (get one here)
Modelllama-3.3-70b-versatile

Cost per application: ~$0.002–0.004 (and blazing fast)

DeepSeek

DeepSeek offers strong reasoning capabilities at very low cost. Good for complex screening questions.

FieldValue
Base URLhttps://api.deepseek.com/v1
API KeyYour DeepSeek API key
Modeldeepseek-chat

Cost per application: ~$0.001–0.003 (one of the cheapest options)

LM Studio (Local)

Run models on your own machine with a user-friendly GUI. Zero cost, zero data leaving your computer.

FieldValue
Base URLhttp://localhost:1234/v1
API Key(leave empty)
ModelWhatever you have loaded (e.g. llama-3.1-8b-instruct)

Cost per application: $0 (your electricity only)

vLLM (Self-Hosted)

For users with GPU servers who want high-throughput inference.

FieldValue
Base URLhttp://your-server:8000/v1
API Key(leave empty or set if configured)
ModelThe model you launched vLLM with

Cost per application: $0 beyond hardware/cloud GPU costs

Cost Comparison Per Job Application

A typical job application involves 3–8 LLM calls (page analysis, form filling, question answering). Here’s what that costs across providers:

ProviderModelCost/ApplicationSpeed
OpenAIGPT-4o~$0.01–0.03Fast
AnthropicClaude Sonnet~$0.01–0.03Fast
Together AILlama 3.3 70B~$0.002–0.005Moderate
GroqLlama 3.3 70B~$0.002–0.004Very fast
DeepSeekdeepseek-chat~$0.001–0.003Moderate
LM StudioLlama 3.1 8B$0Depends on hardware
OllamaAny local model$0Depends on hardware

If you’re applying to 50 jobs a day, the difference between $0.03/app (OpenAI) and $0.002/app (Together) is $1.50 vs $0.10 daily. Over a month-long job search, that’s $45 vs $3.

Which Provider Should You Choose?

Best quality (money no object): OpenAI GPT-4o or Anthropic Claude Sonnet — these remain the strongest general-purpose models for understanding complex forms and writing natural answers.

Best value: Together AI or Groq with Llama 3.3 70B — 90%+ of the quality at 1/10th the cost. This is the sweet spot for most users.

Cheapest cloud option: DeepSeek — remarkable quality for the price, especially on reasoning-heavy screening questions.

Maximum privacy: LM Studio or Ollama — nothing leaves your machine. Use at least a 7B parameter model; 70B+ if your hardware can handle it.

Fastest: Groq — if you’re running parallel workers and want applications submitted as quickly as possible.

Tips for Best Results

  1. Use at least a 70B model for form filling — smaller models (7B–13B) work but make more mistakes on complex multi-step ATS forms
  2. Test before batch applying — use the Test Connection button, then try one manual application to verify quality
  3. Local models need context — set a minimum 8K context window if your provider supports it; LangHire’s prompts can be lengthy with memory context included
  4. API key is truly optional — local servers (LM Studio, vLLM, Ollama) don’t need one. LangHire sends “not-needed” as a placeholder

Your Data, Your Choice

This is what local-first means in practice. You can now run LangHire with:

  • A local model (Ollama or LM Studio) → zero external API calls
  • A privacy-focused cloud provider → your choice who sees the prompts
  • A self-hosted GPU server → enterprise-grade inference under your control

No matter which option you pick, your resume, profile, application history, and learned memories never leave your machine. Only the LLM inference calls go out — and now you control exactly where they go.


Update LangHire to the latest version to see the new OpenAI-Compatible provider option in Settings → LLM. Download here or pull the latest from GitHub.