Building Async AI Pipelines with BullMQ and Redis

When I was building ScoreResume, I hit a wall that every engineer working with LLMs eventually hits: timeouts.

A resume analysis job involves chunking the resume, chunking the job description, running semantic retrieval with pgvector, and then making multiple LLM calls for multi-dimensional scoring. The whole thing can take 30–90 seconds. That’s way beyond any reasonable HTTP timeout.

The solution? An async processing pipeline using BullMQ and Redis. Here’s how I built it.

The Problem

When a user uploads their resume, they expect results — not a spinning loader that eventually times out. The naive approach is to call the LLM synchronously in the request handler:

// ❌ This will time out
async function analyzeResume(resumeId: string, jdId: string) {
  const resume = await getResume(resumeId);
  const jd = await getJobDescription(jdId);

  const chunks = await chunkContent(resume, jd);
  const context = await retrieveRelevantChunks(chunks);

  // This call alone can take 30-60 seconds
  const score = await llm.score(context);

  await saveScore(resumeId, score);
  return score;
}

The user’s request hangs. The load balancer kills it at 30 seconds. The user sees a 502. Game over.

The Solution: Queue Everything

The fix is to decouple the AI analysis from the user’s request. The user uploads, gets an immediate “processing” response, and we handle the heavy lifting in the background.

Step 1: Define the Queue

import { Queue, QueueEvents } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

export const resumeQueue = new Queue("resume-analysis", {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 5000,
    },
    removeOnComplete: 100,
    removeOnFail: 200,
  },
});

export const resumeQueueEvents = new QueueEvents("resume-analysis", {
  connection,
});

Key decisions here:

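maxRetriesPerRequest: null is required by BullMQ on connections that issue blocking commands (workers and queue events), so I set it on the shared connection.
attempts: 3 allows up to three tries per job. LLMs fail. Network blips happen. Retry.
Exponential backoff with delay: 5000 waits about 5s before the second attempt and 10s before the third. Don’t hammer the API.
removeOnComplete: 100 keeps the last 100 completed jobs for debugging, then auto-cleans. removeOnFail: 200 does the same for the last 200 failed jobs.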
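
To make the retry schedule concrete, here is a small sketch that computes the wait before each retry. It assumes BullMQ's built-in exponential strategy waits delay × 2^(attemptsMade − 1) before each retry, which is worth confirming against the version you run. The retryDelays helper is my own name for illustration, not a BullMQ API.

```typescript
// Hypothetical helper (not part of BullMQ): lists the wait before each retry,
// assuming the exponential strategy uses delay * 2^(attemptsMade - 1).
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // `attempts` counts the first try too, so there are attempts - 1 retries.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(baseDelayMs * Math.pow(2, attemptsMade - 1)));
  }
  return delays;
}

// With the options from Step 1 (attempts: 3, delay: 5000):
console.log(retryDelays(3, 5000)); // [ 5000, 10000 ]
```

Under that assumption, three attempts means two waits (5s and 10s). A 20s wait only shows up if you raise attempts to 4.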

Step 2: Define the Worker

import { Worker } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

const worker = new Worker(
  "resume-analysis",
  async (job) => {
    const { resumeId, jdId, userId } = job.data;

    console.log(`[Worker] Processing resume ${resumeId} for user ${userId}`);

    const resume = await getResume(resumeId);
    const jd = await getJobDescription(jdId);

    const chunks = await chunkContent(resume, jd);
    const context = await retrieveRelevantChunks(chunks);
    const score = await llm.score(context);

    await saveScore(resumeId, score);
    await notifyUser(userId, "resume-ready", { resumeId });

    return { success: true, scoreId: score.id };
  },
  { connection, concurrency: 5 },
);

worker.on("failed", (job, err) => {
  console.error(`[Worker] Job ${job?.id} failed:`, err.message);
});
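
One thing the worker takes on trust is the shape of job.data. Here is a minimal sketch of a runtime check you could run at the top of the processor. The ResumeJobData type and the isResumeJobData guard are names I made up for illustration, based on the three fields the worker destructures.

```typescript
// Hypothetical payload type matching the fields the worker reads from job.data.
interface ResumeJobData {
  resumeId: string;
  jdId: string;
  userId: string;
}

// Type guard: true only when every expected field is a non-empty string.
function isResumeJobData(data: unknown): data is ResumeJobData {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  return ["resumeId", "jdId", "userId"].every(
    (key) => typeof record[key] === "string" && record[key] !== "",
  );
}

// Inside the processor, before doing any expensive work:
//   if (!isResumeJobData(job.data)) throw new Error("Malformed job payload");
```

BullMQ's Queue and Worker classes also accept a data type parameter (for example, new Worker<ResumeJobData>(...)), which gives you compile-time checking. The guard covers the cases types can't, such as payloads enqueued by older code.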

Step 3: Enqueue from the API

- // Old synchronous approach
- const score = await analyzeResume(resumeId, jdId);
- return Response.json(score);
+ // New async approach
+ await resumeQueue.add("analyze", {
+   resumeId,
+   jdId,
+   userId: session.user.id,
+ });
+ return Response.json({
+   status: "processing",
+   message: "Your resume is being analyzed. We'll notify you when it's ready.",
+ });

The user gets an instant response. The work happens in the background. No timeouts.
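
If the client also needs to poll for progress (say, the notification gets missed), one option is to return the job's id from the enqueue call and expose a status endpoint. The sketch below shows only the mapping step, from a BullMQ job state string to the status the API reports. The ClientStatus values and the toClientStatus name are my own assumptions, and the state names are the ones I understand job.getState() to return, so check them against your BullMQ version.

```typescript
// Statuses exposed to the frontend (hypothetical; pick whatever your UI needs).
type ClientStatus = "processing" | "done" | "failed" | "not_found";

// Maps a job state string (as returned by job.getState()) to a client status.
// Pass undefined when the job lookup found nothing, e.g. it was auto-removed.
function toClientStatus(state: string | undefined): ClientStatus {
  switch (state) {
    case undefined:
    case "unknown":
      return "not_found";
    case "completed":
      return "done";
    case "failed":
      return "failed"; // the job has landed in the failed set
    default:
      // waiting, active, delayed (including a retry in backoff), and so on
      return "processing";
  }
}

// In a status route (sketch, assuming the resumeQueue from Step 1):
//   const job = await resumeQueue.getJob(jobId);
//   const state = job ? await job.getState() : undefined;
//   return Response.json({ status: toClientStatus(state) });
```

A job waiting out its backoff delay still reports "processing" here, so the user only sees "failed" once the retries are used up.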

Dead-Letter Queue Handling

After the third failed attempt, the job moves to the failed set. I don’t want to lose those jobs: they’re valuable for debugging, and removeOnFail: 200 means older failures eventually get trimmed. So I added a dead-letter queue:

const dlqQueue = new Queue("resume-dlq", { connection });

resumeQueueEvents.on("failed", async ({ jobId, failedReason }) => {
  const job = await resumeQueue.getJob(jobId);
  if (!job) return;

  // Move to dead-letter queue with the failure reason
  await dlqQueue.add("failed-analysis", {
    ...job.data,
    originalJobId: jobId,
    failureReason: failedReason,
    failedAt: new Date().toISOString(),
  });

  // Alert the team (assumes `sentry` is a Sentry SDK client initialized elsewhere)
  await sentry.captureException(new Error(failedReason), {
    extra: { jobId, jobData: job.data },
  });
});
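
Not every failure deserves three attempts before it reaches the dead-letter queue. A malformed request will fail the same way every time, while a rate limit or an upstream outage usually clears up. Here is a hedged sketch of that split, assuming your LLM client surfaces an HTTP status code on its errors. isRetryableStatus is a hypothetical helper, and the status-code policy is a common convention rather than anything specific to ScoreResume.

```typescript
// Hypothetical helper: decides whether an LLM API failure is worth retrying,
// based on the HTTP status code (undefined = no response, e.g. a network error).
function isRetryableStatus(status: number | undefined): boolean {
  if (status === undefined) return true; // network blip or client-side timeout
  if (status === 408 || status === 429) return true; // request timeout, rate limited
  return status >= 500 && status <= 599; // upstream server errors
}

// In the worker, a non-retryable failure can skip the remaining attempts by
// throwing BullMQ's UnrecoverableError instead of a plain Error
// (`err.status` is an assumption about your LLM client's error shape):
//   import { UnrecoverableError } from "bullmq";
//   if (!isRetryableStatus(err.status)) throw new UnrecoverableError(err.message);
```

That way a bad payload lands in the dead-letter queue right away instead of burning two retries and their backoff delays first.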

Why This Matters

The async pipeline transformed ScoreResume’s reliability:

Zero timeout failures: users never see a 502 from analysis.
Automatic retries: transient LLM failures self-heal.
Observable: every job can be traced by its ID and state, and a dashboard such as Bull Board can sit on top of the queue.
Scalable: bump concurrency or run more worker processes to handle more parallel jobs.
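
For a rough sense of how far concurrency: 5 goes, here is a back-of-the-envelope estimate using the 30 to 90 second job duration from earlier. It assumes every slot stays busy and ignores rate limits on the LLM API, so treat the numbers as an upper bound. jobsPerHour is just an illustrative helper.

```typescript
// Illustrative upper bound: jobs finished per hour if every slot stays busy.
function jobsPerHour(concurrency: number, workers: number, avgJobSeconds: number): number {
  return Math.floor((concurrency * workers * 3600) / avgJobSeconds);
}

console.log(jobsPerHour(5, 1, 90)); // 200 (one worker, slow jobs)
console.log(jobsPerHour(5, 1, 30)); // 600 (one worker, fast jobs)
console.log(jobsPerHour(5, 3, 90)); // 600 (three worker processes, slow jobs)
```

In practice the LLM provider's rate limit tends to become the ceiling before Redis or the workers do.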

If you’re building anything with LLMs that takes more than a few seconds, queue it. Your users (and your error budget) will thank you.