Overview
CareerLens is an AI-native career intelligence platform that helps professionals understand their career trajectory, skill gaps, and market positioning. It is not a chatbot that answers career questions — it is a structured agent system that reasons, reflects, retrieves, and produces auditable analysis.
This is the system I am actively building. The architecture section reflects current design decisions that are in production or actively being deployed.
Problem
Existing career tools fall into two categories:
- Static assessments — personality tests, rigid skill matrices, rule-based recommendations
- LLM chatbots — general-purpose models that hallucinate job market data and give superficially plausible but unreliable advice
Neither approach is reliable enough for someone making a significant career decision. The problem requires grounded retrieval (real job market data), structured reasoning (not just token prediction), and quality control (reflection and validation before output).
Constraints
- Groundedness: Claims about job markets must be retrievable from real data, not generated
- Structured output: Analysis must be parseable by downstream systems (not free text)
- Latency: Multi-agent processing can be slow; progressive output UX required
- Cost: GPT-4o is expensive; only use where reasoning quality justifies cost
- Reliability: Agent loops can fail mid-way; must be resumable
Architecture
Agent Graph (LangGraph)
User Input (resume + career goals)
         │
         ▼
┌─────────────────────────────────────────────┐
│              CareerLens Graph               │
│                                             │
│  Profile Parser ──→ Skill Extractor         │
│        │                  │                 │
│        ▼                  ▼                 │
│  Job Market Retriever ←── Skill Gap Analyzer│
│        │                                    │
│        ▼                                    │
│  Analysis Synthesizer                       │
│        │                                    │
│        ▼                                    │
│  Reflection Node ──→ [if low confidence]    │
│        │             └── Re-retrieve        │
│        │                                    │
│        ▼                                    │
│  Output Formatter (Structured JSON)         │
└─────────────────────────────────────────────┘
         │
         ▼
Structured Career Report
Node Responsibilities
| Node | Model | Responsibility |
|---|---|---|
| Profile Parser | GPT-4o | Extract structured profile from resume (JSON) |
| Skill Extractor | GPT-4o + Structured Outputs | Taxonomy-mapped skill extraction |
| Job Market Retriever | OpenSearch (RAG) | Retrieve relevant job postings and market data |
| Skill Gap Analyzer | GPT-4o | Compare profile skills vs. market requirements |
| Analysis Synthesizer | GPT-4o | Produce initial career analysis |
| Reflection Node | GPT-4o | Evaluate analysis quality; request re-retrieval if weak |
| Output Formatter | GPT-4o mini | Format final output to JSON schema |
Technology Decisions
| Decision | Choice | Why |
|---|---|---|
| Agent framework | LangGraph | Explicit state graph with conditional edges; debuggable; supports cycles for reflection |
| Primary LLM | GPT-4o | Best reasoning quality for multi-step career analysis |
| Retrieval | OpenSearch + custom embeddings | Existing infra; good BM25 hybrid capability |
| Structured outputs | OpenAI Structured Outputs | Guarantees JSON schema compliance; eliminates parsing failures |
| State persistence | PostgreSQL | Agent state checkpointed per step; resumable on failure |
| Response streaming | SSE | Progressive output; user sees analysis building in real-time |
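As a rough illustration of the SSE row above, here is a minimal sketch of how per-node progress might be framed as Server-Sent Events. The `format_sse` and `stream_progress` helpers, the `node_progress` and `done` event names, and the payload fields are hypothetical and not taken from the CareerLens codebase. Only the wire format (an `event:` line, a `data:` line, then a blank line) follows the SSE specification.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, payload: dict) -> str:
    """Frame one Server-Sent Event: an event line, a data line, a blank line."""
    # json.dumps escapes newlines inside strings, so one data: line suffices.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_progress(node_names: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per completed node (hypothetical event shape)."""
    for step, name in enumerate(node_names, start=1):
        yield format_sse("node_progress", {"node": name, "step": step})
    # A final frame tells the client the stream is complete.
    yield format_sse("done", {})


frames = list(stream_progress(["parse_profile", "extract_skills"]))
```

In a web framework, a generator like `stream_progress` would typically back a streaming response served with the `text/event-stream` content type.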
Implementation
LangGraph State Schema
State is explicitly typed — no magic dictionaries:
from langgraph.graph import StateGraph, END
from pydantic import BaseModel
from typing import Optional


class CareerAnalysisState(BaseModel):
    # Inputs
    resume_text: str
    career_goals: str

    # Intermediate
    structured_profile: Optional[dict] = None
    extracted_skills: Optional[list[str]] = None
    retrieved_jobs: Optional[list[dict]] = None
    skill_gaps: Optional[list[dict]] = None

    # Analysis
    initial_analysis: Optional[str] = None
    reflection_score: Optional[float] = None  # 0-1 quality score
    reflection_feedback: Optional[str] = None

    # Output
    final_report: Optional[dict] = None

    # Control
    retry_count: int = 0
    max_retries: int = 3
Reflection Loop
The reflection node is where quality control happens. It evaluates the initial analysis and either approves it or triggers a re-retrieval cycle:
from openai import AsyncOpenAI
from pydantic import BaseModel

# Reads OPENAI_API_KEY from the environment.
openai_client = AsyncOpenAI()
# CareerAnalysisState is the state schema above; REFLECTION_SYSTEM_PROMPT
# is assumed to be defined elsewhere in the project.


class ReflectionResult(BaseModel):
    quality_score: float  # 0.0 to 1.0
    issues: list[str]
    needs_more_data: bool
    specific_gaps: list[str]


async def reflection_node(state: CareerAnalysisState) -> CareerAnalysisState:
    """
    Evaluate analysis quality and trigger retry if below threshold.
    Uses structured output to get a consistent quality score.
    """
    result = await openai_client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": REFLECTION_SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": f"""
Analysis: {state.initial_analysis}
Retrieved data: {len(state.retrieved_jobs or [])} job postings
Claimed skill gaps: {state.skill_gaps}
Evaluate if this analysis is specific, grounded, and actionable.
""",
            },
        ],
        response_format=ReflectionResult,
    )
    reflection = result.choices[0].message.parsed
    return state.model_copy(update={
        "reflection_score": reflection.quality_score,
        "reflection_feedback": "\n".join(reflection.issues),
        # Count this reflection pass so should_retry can enforce max_retries
        # (assumes no other node increments retry_count).
        "retry_count": state.retry_count + 1,
    })


def should_retry(state: CareerAnalysisState) -> str:
    """Conditional edge: retry retrieval or proceed to output."""
    if (
        state.reflection_score is not None
        and state.reflection_score < 0.7
        and state.retry_count < state.max_retries
    ):
        return "retry_retrieval"
    return "format_output"
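To make the bound on the loop concrete, here is a minimal stand-alone simulation of the reflect-and-retry cycle, with no LangGraph or LLM involved. The `LoopState` dataclass, the `run_loop` driver, and the scripted score sequences are illustrative assumptions. The 0.7 threshold and the cap of 3 mirror the values above. The sketch assumes the retry counter is incremented once per reflection pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    reflection_score: Optional[float] = None
    retry_count: int = 0
    max_retries: int = 3


def should_retry(state: LoopState) -> str:
    """Routing rule as above, plus a guard for a missing score."""
    if (
        state.reflection_score is not None
        and state.reflection_score < 0.7
        and state.retry_count < state.max_retries
    ):
        return "retry_retrieval"
    return "format_output"


def run_loop(scores: list[float]) -> LoopState:
    """Drive reflect -> route with one scripted score per reflection pass."""
    state = LoopState()
    for score in scores:
        # Reflection pass: record the score and count the pass.
        state.reflection_score = score
        state.retry_count += 1
        if should_retry(state) == "format_output":
            break
    return state


stuck = run_loop([0.2, 0.3, 0.1, 0.2, 0.2])  # never reaches the threshold
improved = run_loop([0.4, 0.8, 0.9])         # passes on the second reflection
```

Even when the score never reaches the threshold, the loop exits after the third reflection pass, which is the behavior the `max_retries` guard is meant to guarantee.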
Graph Assembly
from langgraph.graph import StateGraph, END
from langgraph.graph.state import CompiledStateGraph


def build_career_graph() -> CompiledStateGraph:
    graph = StateGraph(CareerAnalysisState)

    # One node per processing stage
    graph.add_node("parse_profile", parse_profile_node)
    graph.add_node("extract_skills", extract_skills_node)
    graph.add_node("retrieve_jobs", retrieve_jobs_node)
    graph.add_node("analyze_gaps", analyze_gaps_node)
    graph.add_node("synthesize", synthesize_node)
    graph.add_node("reflect", reflection_node)
    graph.add_node("format_output", format_output_node)

    # Linear path from parsing to reflection
    graph.set_entry_point("parse_profile")
    graph.add_edge("parse_profile", "extract_skills")
    graph.add_edge("extract_skills", "retrieve_jobs")
    graph.add_edge("retrieve_jobs", "analyze_gaps")
    graph.add_edge("analyze_gaps", "synthesize")
    graph.add_edge("synthesize", "reflect")

    # Reflection either loops back to retrieval or proceeds to output
    graph.add_conditional_edges(
        "reflect",
        should_retry,
        {
            "retry_retrieval": "retrieve_jobs",  # Loop back
            "format_output": "format_output",
        },
    )
    graph.add_edge("format_output", END)

    # PostgresCheckpointer is not shown in this write-up (assumed to be a
    # project-level class). LangGraph's packaged Postgres checkpointer is
    # PostgresSaver in langgraph.checkpoint.postgres.
    return graph.compile(checkpointer=PostgresCheckpointer())
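Per-step checkpointing is what makes a run resumable. The sketch below shows the idea in isolation, using an in-memory SQLite table as a stand-in for PostgreSQL and plain functions as stand-ins for graph nodes. The `checkpoints` table, `run_pipeline`, and the step functions are all hypothetical. It illustrates the resume pattern only, not how LangGraph's checkpointer works internally.

```python
import json
import sqlite3
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(db: sqlite3.Connection, run_id: str,
                 steps: list[tuple[str, Step]], state: dict) -> dict:
    """Run steps in order, persisting state after each; skip completed steps."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(run_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    row = db.execute(
        "SELECT next_step, state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    start = 0
    if row is not None:
        # Resume from the last persisted step instead of restarting.
        start, state = row[0], json.loads(row[1])
    for index in range(start, len(steps)):
        _name, step = steps[index]
        state = step(state)
        db.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (run_id, index + 1, json.dumps(state)),
        )
        db.commit()
    return state


calls: list[str] = []
fail_once = {"armed": True}


def parse(state: dict) -> dict:
    calls.append("parse")
    return {**state, "profile": "parsed"}


def flaky_retrieve(state: dict) -> dict:
    calls.append("retrieve")
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise TimeoutError("simulated LLM timeout")
    return {**state, "jobs": 15}


db = sqlite3.connect(":memory:")
steps = [("parse", parse), ("retrieve", flaky_retrieve)]
try:
    run_pipeline(db, "run-1", steps, {"resume_text": "sample resume"})
except TimeoutError:
    pass  # the first attempt dies mid-run
final = run_pipeline(db, "run-1", steps, {"resume_text": "sample resume"})
```

The second call picks up at the retrieval step, so `parse` runs only once. With LangGraph, the analogous behavior comes from invoking the compiled graph again with the same thread identifier in its config.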
Failures & Lessons
Failure 1: Free-text LLM output for structured data
Early versions generated analysis as free text, then tried to parse it. JSON parsing failed ~15% of the time on complex nested structures. OpenAI's Structured Outputs API (with a Pydantic model as response_format) reduced the parse-failure rate to 0%.
Failure 2: Unbounded reflection loops
Without a max_retries guard, a poor retrieval result could trigger infinite reflection-retry cycles. Added an explicit retry counter to the state, capped at 3 iterations.
Failure 3: No state persistence
Multi-agent processing takes 15–30 seconds. Without state checkpointing, any failure (network error, LLM timeout) required a complete restart. Added a PostgreSQL-backed LangGraph checkpointer to persist state after each node completes.
Failure 4: Retrieval context window overflow
Initially retrieved 50 job postings per query. This filled the GPT-4o context window, and analysis quality degraded significantly. Implemented relevance-ranked truncation: only the top 15 postings are kept, with key fields extracted to minimize tokens.
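The relevance-ranked truncation from Failure 4 can be sketched roughly as follows. The field names (`score`, `title`, `skills`, `description`) and the `compact_postings` helper are assumptions for illustration, since the actual posting schema and ranking signal are not shown in this write-up. Only the top-15 cutoff comes from the text.

```python
from typing import Any

KEY_FIELDS = ("title", "skills")  # hypothetical fields kept for the prompt


def compact_postings(postings: list[dict[str, Any]],
                     top_k: int = 15) -> list[dict[str, Any]]:
    """Keep the top_k postings by relevance score, reduced to key fields."""
    ranked = sorted(postings, key=lambda p: p["score"], reverse=True)
    return [
        {field: posting[field] for field in KEY_FIELDS if field in posting}
        for posting in ranked[:top_k]
    ]


# 50 synthetic postings with increasing relevance scores.
postings = [
    {
        "title": f"Role {i}",
        "skills": ["python"],
        "description": "long text " * 200,
        "score": i / 50,
    }
    for i in range(50)
]
compact = compact_postings(postings)
```

Dropping long free-text fields such as the description is where most of the token savings would come from; which fields count as "key" is a judgment call that depends on what the gap analysis actually reads.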
Future Improvements
- Fine-tuned skill extraction — Domain-specific fine-tuned model for skill taxonomy classification; current GPT-4o approach has inconsistencies with specialized technical skills
- Market trend grounding — Add real-time salary data and job posting velocity as retrieval sources
- Personalized follow-up questions — Agentic clarification before processing when input is ambiguous
- Streaming agent output — Real-time node progress updates via WebSocket (partially implemented)
Key Takeaways
- LangGraph's explicit state graph is significantly more debuggable than implicit agent frameworks — you always know what state caused what behavior
- Structured Outputs (Pydantic response_format) is not optional for multi-step agents that need to pass data between nodes
- Reflection loops genuinely improve output quality, but must be bounded to prevent runaway processing
- State persistence is a first-class requirement for any agent system where processing takes more than a few seconds