Review Intelligence: Multi-Agent Review Collection Pipeline

A six-agent system that collects, processes, and manages product reviews and competitive intelligence across multiple marketplaces. Features progressive fallback web scraping, multi-level deduplication, and automated Google Sheets insertion.

Architecture

Master Orchestration Agent (coordinates sequential execution)
    ↓
┌────────────────────────────────────────────────────┐
│ Data Collection Layer                              │
│ ├── Product Scraper (5 marketplaces, ALL ratings)  │
│ ├── Competitor Scraper (marketplace sources, 1-3★) │
│ └── Alternative Sources Scraper (Capterra/Reddit)  │
└────────────────────────────────────────────────────┘
    ↓
Data Cleaning Agent (normalization + deduplication)
    ↓
Google Sheets Insertion Agent (batch insert + duplicate detection)
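
A minimal sketch of the orchestrator's control loop, assuming each agent is exposed as a callable returning review records (all names here are illustrative, not the actual implementation). The point is the partial-result acceptance: a failing stage is recorded and skipped rather than aborting the pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    reviews: list = field(default_factory=list)
    failed_stages: list = field(default_factory=list)

def run_pipeline(stages):
    """stages: ordered (name, callable) pairs; each callable returns
    a list of review dicts or raises on failure."""
    result = PipelineResult()
    for name, stage in stages:
        try:
            result.reviews.extend(stage())
        except Exception as exc:
            # Partial-result acceptance: record the failure, keep going.
            result.failed_stages.append((name, str(exc)))
    return result
```

Downstream stages (cleaning, insertion) then operate on whatever the collection layer managed to gather.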

Pipeline Stages

Stage 1: Product Review Collection

Stage 2: Competitor Review Collection

Stage 3: Alternative Source Collection

Stage 4: Data Cleaning

Stage 5: Multi-Level Deduplication

Stage 6: Google Sheets Insertion
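
The exact three dedup levels are not spelled out above; one plausible sketch, assuming the levels are (1) exact review-link match, (2) normalized-content hash, and (3) fuzzy near-duplicate text:

```python
import hashlib
from difflib import SequenceMatcher

def dedupe(reviews, fuzzy_threshold=0.9):
    """Three-level deduplication over review dicts with 'link' and 'content'."""
    seen_links, seen_hashes, kept = set(), set(), []
    for r in reviews:
        link = r.get("link", "").strip().lower()
        if link and link in seen_links:
            continue                                  # level 1: same URL
        norm = " ".join(r.get("content", "").lower().split())
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen_hashes:
            continue                                  # level 2: identical text
        if any(SequenceMatcher(
                None, norm,
                " ".join(k.get("content", "").lower().split())).ratio()
                >= fuzzy_threshold for k in kept):
            continue                                  # level 3: near-duplicate
        seen_links.add(link)
        seen_hashes.add(digest)
        kept.append(r)
    return kept
```

Running the same function over the combined product and competitor sets would also give the cross-source deduplication described under Key Decisions.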

Agent Specs

| Agent | Role | Key Capability |
|-------|------|----------------|
| Master Orchestrator | Pipeline coordinator | Sequential execution, error handling, partial-result acceptance |
| Product Scraper | Own-product review collection | 5-marketplace coverage, all ratings |
| Competitor Scraper | Competitive intelligence | 10+ tools, negative reviews only (1-3★) |
| Alt Sources Scraper | Non-marketplace collection | Capterra, SoftwareAdvice, Reddit sentiment analysis |
| Data Cleaner | Normalization + dedup | 3-level deduplication, date/rating/content standardization |
| Sheets Inserter | Data persistence | Batch insert, row-2 strategy, duplicate prevention |

Progressive Fallback Tool Strategy

Tier 1 (Cost-Effective): Exa Search + Firecrawl
    ↓ (if blocked/restricted)
Tier 2 (Browser Automation): Browserbase
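
The tier escalation reduces to a small wrapper; the sketch below uses stand-in callables for the Exa/Firecrawl and Browserbase clients (names and the `Blocked` signal are illustrative assumptions, not real SDK APIs):

```python
class Blocked(Exception):
    """Raised when the cheap tier is refused (403, CAPTCHA, bot wall)."""

def fetch_with_fallback(url, tier1_fetch, tier2_fetch):
    """Try the cost-effective tier first; escalate to browser
    automation only when the source blocks the cheap scraper."""
    try:
        return tier1_fetch(url)      # Tier 1: search/crawl API
    except Blocked:
        return tier2_fetch(url)      # Tier 2: full browser session
```

Because Tier 1 handles the large majority of sources, the expensive browser tier is only paid for on the restricted platforms that actually need it.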

Data Schema

Product Reviews (7 columns)

| Column | Type |
|--------|------|
| Review Date | YYYY-MM-DD |
| Rating | 1-5 |
| Review Content | Text |
| Reviewer Name | Text |
| Reviewer Location/Company | Text |
| Marketplace | Standardized name |
| Review Link | URL |

Competitor Reviews (8 columns)

| Column | Type |
|--------|------|
| Review Date | YYYY-MM-DD |
| Rating | 1-3 only |
| Review Content | Text |
| Reviewer Name | Text |
| Reviewer Location/Company | Text |
| Marketplace | Standardized name |
| Tool | Competitor tool name |
| Review Link | URL |
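
The two schemas can be captured as dataclasses; field names below mirror the column tables (this is a sketch of the record shape, not code from the project). `CompetitorReview` adds the Tool column and enforces the 1-3 star restriction:

```python
from dataclasses import dataclass

@dataclass
class ProductReview:
    review_date: str        # YYYY-MM-DD
    rating: int             # 1-5
    content: str
    reviewer_name: str
    reviewer_location: str  # location or company
    marketplace: str        # standardized name
    link: str               # URL

@dataclass
class CompetitorReview(ProductReview):
    tool: str = ""          # competitor tool name

    def __post_init__(self):
        if not 1 <= self.rating <= 3:
            raise ValueError("competitor reviews are 1-3 stars only")
```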

Results

Stack

Key Decisions

  1. Negative reviews only for competitors — Collecting 1-3 star reviews provides competitive intelligence on weaknesses without overwhelming data volume.
  2. Progressive fallback over single tool — Exa/Firecrawl handle 80% of cases cheaply; Browserbase reserved for restricted platforms.
  3. Row-2 insertion strategy — Newest reviews always land at the top for visibility; avoids append-to-bottom, which buries recent data.
  4. Reddit sentiment mapping — Converts freeform community complaints into structured 1-3 star ratings using keyword analysis.
  5. Cross-source deduplication — Prevents the same review from appearing in both product and competitor sheets.

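The Reddit sentiment mapping (decision 4) reduces to keyword buckets; the lists below are illustrative placeholders, since the actual keywords are not documented here:

```python
# Hypothetical keyword buckets mapping complaint severity to stars.
SEVERE = {"broken", "scam", "unusable", "refund", "cancelled"}
MODERATE = {"buggy", "slow", "expensive", "confusing", "crashes"}

def sentiment_to_rating(text: str) -> int:
    """Map a freeform complaint to a 1-3 star competitor rating."""
    words = set(text.lower().split())
    if words & SEVERE:
        return 1    # strongly negative complaint
    if words & MODERATE:
        return 2    # moderate frustration
    return 3        # mild criticism, still kept as negative intel
```

The output stays within the 1-3 range, so Reddit-sourced rows slot directly into the competitor schema alongside marketplace reviews.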
Visual Direction

Diagram: Layered architecture showing collection layer (3 scrapers) feeding into cleaning layer (normalization + 3-level dedup), then insertion layer (batch processing + row-2 strategy). Side panel shows progressive fallback decision tree (Exa → Firecrawl → Browserbase). Color-coded by marketplace source.