Methodology · 4 min read

How a thousand papers become a single map.

The AI Research Lab continuously ingests papers from arXiv, embeds them into a vector database, and runs a coordinated team of specialist agents to extract structured insights — contradictions, emerging benchmarks, research frontiers, cross-paper connections.

01

What this is

The AI Research Lab is an agent-powered observatory for the AI literature. The goal is to surface what matters across hundreds of papers without requiring you to read them all.

02

Data sources

Papers originate on arXiv and are fetched through the Semantic Scholar API, which supplies richer metadata (citation counts, open-access status, author affiliations). Quality signals come from Hugging Face, Papers with Code, and OpenReview.

cs.AI · Artificial Intelligence
cs.CL · Computation & Language
cs.LG · Machine Learning
cs.MA · Multiagent Systems
cs.CV · Computer Vision
stat.ML · Statistics / ML

Papers are filtered by recency (default 6 months), abstract length, and minimum relevance score — then sorted by influence signals to surface high-signal work early.
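The filter-then-sort step can be sketched as follows. The field names, thresholds, and the use of citation count as the influence signal are illustrative assumptions, not the project's actual schema:

```typescript
// Hypothetical paper shape; field names are illustrative.
interface Paper {
  id: string;
  publishedAt: Date;
  abstract: string;
  relevanceScore: number; // 0..1, topic match (assumed scale)
  citationCount: number;  // stand-in for the influence signal
}

const SIX_MONTHS_MS = 1000 * 60 * 60 * 24 * 182;

function selectPapers(
  papers: Paper[],
  opts = { minAbstractChars: 200, minRelevance: 0.5, windowMs: SIX_MONTHS_MS },
  now = Date.now(),
): Paper[] {
  return papers
    // recency window (default ~6 months)
    .filter(p => now - p.publishedAt.getTime() <= opts.windowMs)
    // drop stub abstracts
    .filter(p => p.abstract.length >= opts.minAbstractChars)
    // minimum relevance score
    .filter(p => p.relevanceScore >= opts.minRelevance)
    // sort by influence so high-signal work surfaces first
    .sort((a, b) => b.citationCount - a.citationCount);
}
```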

03

Ingestion pipeline

Each ingestion run follows four sequential steps.

01 · Search

Query the Semantic Scholar Graph API with the topic's search terms. Paginate the results, filter by category and abstract length, and deduplicate against stored paper IDs.

02 · Embed

Each paper's title and abstract are chunked into ~500-token segments with a 50-token overlap, then embedded via Gemini text-embedding-001 into 768-dimensional vectors.

03 · Store

Metadata goes into PostgreSQL; embeddings go into a pgvector column indexed with HNSW (cosine distance) for sub-millisecond approximate-nearest-neighbor lookup.

04 · Link

Papers are joined to their topic, and the topic's paper count and last-sync timestamp are updated atomically.
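The chunking in the embed step can be sketched like this. Real token counts come from the embedding model's tokenizer; here whitespace-separated words stand in for tokens:

```typescript
// Split text into ~size-token chunks where consecutive chunks share
// `overlap` trailing/leading tokens, so no sentence is cut without context.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const stride = size - overlap; // advance 450 tokens per chunk by default
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + size).join(" "));
    if (start + size >= tokens.length) break; // final (possibly short) chunk
  }
  return chunks;
}
```

Each chunk would then be sent to the embedding model and stored as one 768-dimensional row alongside its paper ID.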

04

Agent system

Five specialist agents process the ingested papers. Each receives the full corpus for the topic and produces a structured artifact that powers a dashboard tab.

Paper Analyzer

Extracts the core problem, the approach, the main result, and a plain-language takeaway for each paper. Powers the Papers tab.

Trend Mapper

Tracks how research intensity has shifted across sub-topics over time. Surfaces emerging and declining areas. Drives the Topic Evolution chart.

Contradiction Finder

Surfaces papers making conflicting empirical or methodological claims. Also identifies areas of consensus and open debates across the collection.

Benchmark Extractor

Pulls benchmark names, metrics, and scores. Warns when papers report incomparable numbers (different datasets, splits, or metrics).

Frontier Detector

Identifies paradigm shifts, breakthroughs, and underexplored gaps, prioritizing genuinely new directions over incremental improvements.
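The structured artifacts the agents emit could be typed roughly as below. These interfaces are illustrative assumptions based on the descriptions above, not the project's actual schema:

```typescript
// Illustrative artifact shapes for the five agents; all field names are assumed.
interface PaperAnalysis {
  paperId: string;
  problem: string;   // the core problem the paper tackles
  approach: string;
  mainResult: string;
  takeaway: string;  // plain-language summary for the Papers tab
}

interface TrendPoint {
  subTopic: string;
  period: string;      // e.g. "2024-Q3"
  paperCount: number;  // research intensity for the Topic Evolution chart
}

interface Contradiction {
  claim: string;
  paperIds: [string, string];           // the conflicting pair
  kind: "empirical" | "methodological";
}

interface BenchmarkEntry {
  benchmark: string;
  metric: string;
  score: number;
  comparable: boolean; // false when datasets, splits, or metrics differ
}

interface FrontierSignal {
  direction: string;
  evidencePaperIds: string[];
  novelty: "paradigm-shift" | "breakthrough" | "gap";
}
```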

05

Three-phase execution

Agents are orchestrated in three phases to manage dependencies and maximize parallelism.

Phase I

Parallel foundation

Paper Analyzer · Trend Mapper

No dependencies. Run in parallel against the raw corpus.

Phase II

Builds on Phase I

Contradiction Finder · Benchmark Extractor

Use the structured summaries from the Paper Analyzer. Run in parallel with each other.

Phase III

Synthesis

Frontier Detector

Synthesizes outputs from all four prior agents. Runs last with full context.
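The three phases above can be sketched as a short orchestration function. The agent functions here are stand-ins (the real agents call an LLM with the corpus as context), but the dependency structure matches the description:

```typescript
type Corpus = string[]; // simplified: one string per ingested paper

// Placeholder agents; signatures and return values are illustrative.
const paperAnalyzer = async (c: Corpus) => c.map(p => `analysis:${p}`);
const trendMapper = async (c: Corpus) => c.map(p => `trend:${p}`);
const contradictionFinder = async (a: string[]) => a.filter(s => s.includes("vs"));
const benchmarkExtractor = async (a: string[]) => a.length;
const frontierDetector = async (x: {
  analyses: string[]; trends: string[]; contradictions: string[]; benchmarks: number;
}) => `frontier report over ${x.analyses.length} analyses`;

async function runPipeline(corpus: Corpus) {
  // Phase I: no dependencies — run in parallel against the raw corpus.
  const [analyses, trends] = await Promise.all([
    paperAnalyzer(corpus),
    trendMapper(corpus),
  ]);
  // Phase II: both consume the Paper Analyzer's summaries; parallel to each other.
  const [contradictions, benchmarks] = await Promise.all([
    contradictionFinder(analyses),
    benchmarkExtractor(analyses),
  ]);
  // Phase III: the Frontier Detector synthesizes everything with full context.
  return frontierDetector({ analyses, trends, contradictions, benchmarks });
}
```

`Promise.all` gives the intra-phase parallelism; the two `await` boundaries enforce the inter-phase dependencies.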

06

Stack

Next.js 15

App Router

Drizzle ORM

PostgreSQL

pgvector

HNSW ANN index

Gemini

text-embedding-001

Instruct LLM

Agent reasoning

Recharts

Trend & landscape charts

07

Open source

The full source code — ingestion, agents, API routes, frontend — is on GitHub. Issues and pull requests are welcome.

AI Research Lab · built by Abhi Das