When I was building ScoreResume, I hit a wall that every engineer working with LLMs eventually hits: timeouts.
A resume analysis job involves chunking the resume, chunking the job description, running semantic retrieval with pgvector, and then making multiple LLM calls for multi-dimensional scoring. The whole thing can take 30–90 seconds. That’s way beyond any reasonable HTTP timeout.
The solution? An async processing pipeline using BullMQ and Redis. Here’s how I built it.
The Problem
When a user uploads their resume, they expect results — not a spinning loader that eventually times out. The naive approach is to call the LLM synchronously in the request handler:
```typescript
// ❌ This will time out
async function analyzeResume(resumeId: string, jdId: string) {
  const resume = await getResume(resumeId);
  const jd = await getJobDescription(jdId);

  const chunks = await chunkContent(resume, jd);
  const context = await retrieveRelevantChunks(chunks);

  // This call alone can take 30-60 seconds
  const score = await llm.score(context);

  await saveScore(resumeId, score);
  return score;
}
```

The user's request hangs. The load balancer kills it at 30 seconds. The user sees a 502. Game over.
The Solution: Queue Everything
The fix is to decouple the AI analysis from the user’s request. The user uploads, gets an immediate “processing” response, and we handle the heavy lifting in the background.
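Stripped of Redis, the pattern is a producer that only records work and a consumer that performs it later. The sketch below is a toy in-process model of that split, not BullMQ: `InMemoryQueue`, `handleUpload`, and `slowAnalysis` are hypothetical names, and a 50 ms timer stands in for the real 30-90 second analysis.

```typescript
// Toy in-process model of the producer/consumer split (NOT BullMQ).
// All names here are hypothetical, for illustration only.
type AnalysisJob = { resumeId: string; jdId: string };
type AnalysisResult = { resumeId: string; score: number };

class InMemoryQueue<T> {
  private pending: T[] = [];

  // Enqueueing just records the job; it never waits for the work itself.
  add(job: T): void {
    this.pending.push(job);
  }

  // A "worker" pass: process pending jobs one at a time, in FIFO order.
  async drain<R>(processJob: (job: T) => Promise<R>): Promise<R[]> {
    const results: R[] = [];
    while (this.pending.length > 0) {
      const job = this.pending.shift()!;
      results.push(await processJob(job));
    }
    return results;
  }
}

const queue = new InMemoryQueue<AnalysisJob>();
let analysesRun = 0;

// Request handler: enqueue and answer immediately with a "processing" status.
function handleUpload(resumeId: string, jdId: string): { status: "processing" } {
  queue.add({ resumeId, jdId });
  return { status: "processing" };
}

// Stand-in for chunking + retrieval + LLM scoring; the 50 ms sleep fakes the slow part.
async function slowAnalysis(job: AnalysisJob): Promise<AnalysisResult> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  analysesRun++;
  return { resumeId: job.resumeId, score: 0 };
}

const response = handleUpload("resume-1", "jd-1");
console.log(response, analysesRun); // { status: 'processing' } 0
```

The handler's latency no longer depends on how long analysis takes. A real queue such as BullMQ keeps pending jobs in Redis instead of an array, so they survive restarts and can be processed by separate worker processes.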
Step 1: Define the Queue
```typescript
import { Queue, QueueEvents } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

export const resumeQueue = new Queue("resume-analysis", {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 5000,
    },
    removeOnComplete: 100,
    removeOnFail: 200,
  },
});

export const resumeQueueEvents = new QueueEvents("resume-analysis", {
  connection,
});
```

Key decisions here:
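- `maxRetriesPerRequest: null`: BullMQ requires this on the Redis connection.
- `attempts: 3`: LLMs fail. Network blips happen. Retry.
- `exponential` backoff: the wait doubles after each failure, starting at 5s. With `attempts: 3` that means two retries, after 5s and then 10s (a fourth attempt would add a 20s wait). Don't hammer the API.
- `removeOnComplete: 100`: keep the last 100 completed jobs for debugging, then auto-clean.
- `removeOnFail: 200`: the same idea for failed jobs, with a larger window of 200.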
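To double-check the retry timeline, here is a small helper that lists the waits between attempts. It assumes BullMQ's documented exponential formula, `delay * 2^(attemptsMade - 1)` milliseconds, and ignores any jitter; `exponentialBackoffSchedule` is a hypothetical name, not part of BullMQ.

```typescript
// Waits (in ms) between attempts for { attempts, backoff: { type: "exponential", delay } },
// assuming BullMQ's documented formula: delay * 2^(attemptsMade - 1).
function exponentialBackoffSchedule(delayMs: number, attempts: number): number[] {
  const waits: number[] = [];
  // A wait only happens before a retry, so `attempts` tries means `attempts - 1` waits.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    waits.push(Math.round(Math.pow(2, attemptsMade - 1) * delayMs));
  }
  return waits;
}

console.log(exponentialBackoffSchedule(5000, 3)); // [ 5000, 10000 ]
console.log(exponentialBackoffSchedule(5000, 4)); // [ 5000, 10000, 20000 ]
```

With `attempts: 3` the job waits 5s and then 10s; the 20s wait only appears once `attempts` is 4 or more.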
Step 2: Define the Worker
```typescript
import { Worker } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

const worker = new Worker(
  "resume-analysis",
  async (job) => {
    // Any error thrown here fails this attempt; BullMQ then retries
    // according to the queue's attempts/backoff settings.
    const { resumeId, jdId, userId } = job.data;

    console.log(`[Worker] Processing resume ${resumeId} for user ${userId}`);

    const resume = await getResume(resumeId);
    const jd = await getJobDescription(jdId);

    const chunks = await chunkContent(resume, jd);
    const context = await retrieveRelevantChunks(chunks);
    const score = await llm.score(context);

    await saveScore(resumeId, score);
    await notifyUser(userId, "resume-ready", { resumeId });

    return { success: true, scoreId: score.id };
  },
  { connection, concurrency: 5 },
);

worker.on("failed", (job, err) => {
  console.error(`[Worker] Job ${job?.id} failed:`, err.message);
});
```

Step 3: Enqueue from the API
```diff
- // Old synchronous approach
- const score = await analyzeResume(resumeId, jdId);
- return Response.json(score);
+ // New async approach
+ await resumeQueue.add("analyze", {
+   resumeId,
+   jdId,
+   userId: session.user.id,
+ });
+ return Response.json({
+   status: "processing",
+   message: "Your resume is being analyzed. We'll notify you when it's ready.",
+ });
```

The user gets an instant response. The work happens in the background. No timeouts.
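Because the API enqueues and the worker consumes in different places, it can help to pin the job payload to one shared type. The sketch below assumes a hypothetical `ResumeAnalysisJobData` interface matching the fields passed to `resumeQueue.add`, plus a runtime guard, since `job.data` reaches the worker as JSON deserialized from Redis and the compiler cannot check it. BullMQ's `Queue` and `Worker` classes also accept a type parameter for job data, which covers the compile-time half.

```typescript
// Hypothetical shared contract for the "analyze" job payload.
// Fields mirror the object passed to resumeQueue.add in Step 3.
interface ResumeAnalysisJobData {
  resumeId: string;
  jdId: string;
  userId: string;
}

// Runtime guard for payloads deserialized from Redis. Call it at the top of
// the worker's processor before destructuring job.data.
function isResumeAnalysisJobData(value: unknown): value is ResumeAnalysisJobData {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.resumeId === "string" &&
    typeof candidate.jdId === "string" &&
    typeof candidate.userId === "string"
  );
}

console.log(isResumeAnalysisJobData({ resumeId: "r1", jdId: "j1", userId: "u1" })); // true
console.log(isResumeAnalysisJobData({ resumeId: "r1", jdId: "j1" })); // false (missing userId)
```

A malformed payload will never succeed on retry, so in the worker it is reasonable to throw BullMQ's `UnrecoverableError` for it, which fails the job without spending the remaining attempts.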
Dead-Letter Queue Handling
After 3 failed attempts, the job moves to the failed set, where `removeOnFail: 200` eventually prunes it. But I don't want to lose those jobs: they're valuable for debugging. So I added a dead-letter queue:
```typescript
const dlqQueue = new Queue("resume-dlq", { connection });

resumeQueueEvents.on("failed", async ({ jobId, failedReason }) => {
  const job = await resumeQueue.getJob(jobId);
  if (!job) return;

  // Copy the job data into the dead-letter queue, along with the failure reason
  await dlqQueue.add("failed-analysis", {
    ...job.data,
    originalJobId: jobId,
    failureReason: failedReason,
    failedAt: new Date().toISOString(),
  });

  // Alert the team
  await sentry.captureException(new Error(failedReason), {
    extra: { jobId, jobData: job.data },
  });
});
```

Why This Matters
The async pipeline transformed ScoreResume’s reliability:
- Zero timeout failures: users never see a 502 from analysis.
- Automatic retries: transient LLM failures self-heal.
- Observable: every job is traceable in a BullMQ dashboard.
- Scalable: bump `concurrency` to handle more parallel jobs.
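What `concurrency` controls can be modeled without Redis: at most N jobs are in flight at once, and each slot picks up the next job as soon as it frees up. The function below is a toy model of that behavior, not BullMQ's implementation; `runWithConcurrency` is a hypothetical name, and it assumes `limit >= 1`.

```typescript
// Toy model of a worker's `concurrency` option (NOT BullMQ's implementation):
// run async tasks with at most `limit` in flight at any moment. Assumes limit >= 1.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each lane claims the next unstarted task until none remain. Claiming is
  // safe because JavaScript runs this synchronous code without interleaving.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const lanes = Array.from({ length: Math.min(limit, tasks.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

In BullMQ the setting applies per `Worker` instance, so running more worker processes also adds capacity. If the LLM provider enforces rate limits, BullMQ's `limiter` worker option (`{ max, duration }`) can cap how many jobs start per time window.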
If you’re building anything with LLMs that takes more than a few seconds, queue it. Your users (and your error budget) will thank you.