Automated content publishing pipeline that transforms structured HTML drafts into WordPress posts with custom blocks, taxonomy, and ACF field data.
Parses HTML files containing structured content (titles, metadata, FAQ blocks, TLDR lists, data tables, embedded media) and publishes them to a WordPress site via the REST API. Handles concurrent extraction with sequential publishing, image resolution from the media library, and file lifecycle management (draft → published).
Built for a B2B SaaS company’s content team to publish ~4,800 playbook articles at scale without manual WordPress entry.

    ┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
    │ HTML Drafts     │────▶│ Extract Workers  │────▶│ Publish Workers   │
    │ (structured)    │     │ (concurrent x4)  │     │ (sequential)      │
    └─────────────────┘     └──────────────────┘     └───────────────────┘
                                     │                         │
                                     ▼                         ▼
                                ┌──────────┐            ┌──────────────┐
                                │  JSDOM   │            │  WordPress   │
                                │  Parser  │            │  REST API    │
                                └──────────┘            └──────────────┘

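
The flow in the diagram, concurrent extraction followed by one-at-a-time publishing, could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `mapWithLimit`, `runPipeline`, `extractDraft`, and `publishPost` are hypothetical names, and the two callbacks stand in for the JSDOM parsing step and the WordPress REST API call.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never share an index
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`
}

// `extractDraft` and `publishPost` are placeholders supplied by the caller.
async function runPipeline(files, extractDraft, publishPost, concurrency = 4) {
  // Phase 1: extraction runs concurrently (x4 in the diagram).
  const drafts = await mapWithLimit(files, concurrency, extractDraft);

  // Phase 2: publishing is sequential, one request at a time, in file order.
  const published = [];
  for (const draft of drafts) {
    published.push(await publishPost(draft));
  }
  return published;
}
```

In this sketch a single failed extraction rejects the whole run. A real pipeline would more likely record the failure per file and continue.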
Drafts are read as `.html` files and moved to the `published/` directory once they are published. The inline tag syntax (`post-tag:`, `block:`, `content:image:`) lets authors embed metadata inline without switching tools.

The pipeline accepts HTML with custom semantic tags:
| Tag | Format | Purpose |
|---|---|---|
| `<h1>` | `<h1>Title</h1>` | Post title (required) |
| `<h3>` | `post-tag:author:Name` | Author metadata |
| `<h3>` | `post-tag:url:slug` | URL slug |
| `<h3>` | `post-tag:description:...` | Archive description |
| `<h3>` | `post-tag:tags:` + `<ul>` | Taxonomy tags |
| `<h3>` | `post-tag:categories:` + `<ul>` | Taxonomy categories |
| `<h3>` | `post-tag:schema:{json}` | Schema.org markup |
| `<h3>` | `block:tldr` + `<ul>` | TLDR step list |
| `<h3>` | `block:faq` + `<table>` | FAQ Q&A pairs |
| `<h3>` | `block:objects-reports` + `<table>` | Two-column data table |
| `<h3>` | `block:youtube-embed:URL` | YouTube embed |
| `<h3>` | `content:image:name:alt` | WordPress media image |
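
To make the tag formats listed above concrete, here is a minimal sketch of how the text of an `<h3>` might be classified once it has been read from the DOM. The function names and the returned object shape are assumptions for illustration, not the pipeline's actual API. Only the prefixes (`post-tag:`, `block:`, `content:image:`) come from the table.

```javascript
// Split on the first colon only, so values that contain colons
// (URLs, JSON) are kept intact. Hypothetical helper.
function splitFirst(text) {
  const sep = text.indexOf(":");
  return sep === -1
    ? [text, ""]
    : [text.slice(0, sep), text.slice(sep + 1).trim()];
}

// Classify the text content of an <h3>. Returns null for ordinary headings.
// The { kind, name, value } shape is an assumption made for this sketch.
function parseDirective(text) {
  const raw = text.trim();

  if (raw.startsWith("content:image:")) {
    const [name, alt] = splitFirst(raw.slice("content:image:".length));
    return { kind: "image", name, alt };
  }
  if (raw.startsWith("post-tag:")) {
    const [name, value] = splitFirst(raw.slice("post-tag:".length));
    return { kind: "post-tag", name, value };
  }
  if (raw.startsWith("block:")) {
    const [name, value] = splitFirst(raw.slice("block:".length));
    return { kind: "block", name, value };
  }
  return null;
}

// Usage: the YouTube URL keeps its own colon because only the first one splits.
const embed = parseDirective("block:youtube-embed:https://www.youtube.com/watch?v=abc");
console.log(embed.name, embed.value);
```

Directives such as `post-tag:tags:` or `block:faq` come back with an empty `value`. In that case the data lives in the `<ul>` or `<table>` that follows the heading, which the extraction step would have to read separately.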
For designer: Pipeline flow diagram showing HTML files entering the extraction workers (parallel), converging into a sequential publish queue, with the WordPress REST API as the output. Include the tag parsing step as a sub-process within extraction. Color-code the three phases: discovery (gray), extraction (blue), publishing (green).