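What is NLWeb?

NLWeb (Natural Language Web) is an open project from Microsoft that aims to make website content easier for AI agents to discover and understand. It builds on formats sites already publish, such as RSS feeds and Schema.org JSON-LD. This post pairs them with llms.txt, a separate proposal from llmstxt.org, to provide an AI-friendly content index. More than 12 companies, including Shopify, Snowflake, and O'Reilly Media, are listed as early adopters.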

What is the difference between llms.txt and llms-full.txt?

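llms.txt is a static entry-point file that gives an overview of the site's structure (a bit like a plain-text sitemap). llms-full.txt is a complete content index generated at build time: a build script regenerates it automatically, listing every post, project, and tag in detail.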

Why upgrade from RSS 2.0 to Atom 1.0?

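Atom 1.0 defines standard elements for richer metadata (author, updated time, categories) and declares explicitly how HTML content is encoded (<content type="html">). RSS 2.0 conventionally puts only a summary in <description> and relies on extensions such as content:encoded for full HTML. That makes Atom the easier format for AI agents that want to parse complete articles.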

Why does Schema.org JSON-LD matter?

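JSON-LD is the structured-data format Google recommends, and it helps search engines understand what a page contains. Schemas such as BlogPosting, WebSite, and Organization support SEO and make pages eligible for Rich Results. They also make it easier for AI agents to extract article metadata.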

NLWeb Semantic Layer Tutorial: llms.txt + Atom RSS + Schema.org

Why NLWeb?

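While researching the Agent Discovery Protocol, I noticed something interesting. There are various complex protocols built specifically for AI agents (ANP, A2A, and others), yet 12+ companies (including Shopify, Snowflake, and O'Reilly Media) have already adopted a more pragmatic approach: the NLWeb semantic layer.

The core idea is simple:

You don't need a brand-new protocol. You only need to make your existing site structure easier for AI to read.

When people browse a website, they start from the homepage navigation, an RSS subscription, or a sitemap. AI agents need similar entry points to understand a site's content quickly, and NLWeb is designed to provide them.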

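What is NLWeb?

NLWeb (Natural Language Web) is an open project Microsoft announced at Build 2025 (2025-05-19). The semantic layer built in this post has four components. llms.txt comes from the separate llmstxt.org proposal; the feed and the Schema.org markup are formats NLWeb itself builds on: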

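Component	Purpose	Example URL
llms.txt	Entry point describing the site structure	`/llms.txt`
llms-full.txt	Full content index (optional)	`/llms-full.txt`
RSS/Atom Feed	Post subscription with full content	`/rss.xml`
Schema.org JSON-LD	Structured data markup	`<script type="application/ld+json">` on each page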

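Together, these components form a complete "semantic layer":

llms.txt tells AI: "these are the site's important sections"
llms-full.txt provides: "the complete list of posts and projects"
The Atom feed provides: "the full HTML of the latest posts"
JSON-LD provides: "structured metadata for each post"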

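Components in Detail

1. llms.txt - Site Structure Overview

File location: site/public/llms.txt

This is a static plain-text file, something like a "human-readable sitemap". My implementation contains: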

# Powei Lee - AI & Automation Blog

> Learning notes starting from Kinmen...

## 📚 Content Guide

### Blog Posts
- All posts: https://pwlee.dev/blog
- RSS Feed (Atom): https://pwlee.dev/rss.xml

### Projects
- Project list: https://pwlee.dev/projects

## 🏷️ Key Topics

### AI & Automation
- AI development and applications: https://pwlee.dev/tags/ai
- Automation workflows: https://pwlee.dev/tags/automation

...(other topic categories)

## 👤 About the Author
- Bio: https://pwlee.dev/about
- GitHub: https://github.com/pwlee

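Design principles:

✅ Clear hierarchy: Markdown headings separate the sections
✅ Absolute links: every URL is a full address (no relative paths)
✅ Language tag: states explicitly that the content is Traditional Chinese (zh-Hant)
✅ Tech stack notes: tell AI what technology the site is built with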
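The design principles above can also be enforced when the file is generated. What follows is a minimal sketch, not the author's llms.txt. It renders the layout described by the llmstxt.org proposal: one H1, a blockquote summary, then H2 sections whose list items take the form `- [name](url): notes`, which differs slightly from the `label: URL` lines shown above. It also refuses relative links. `LlmsLink`, `LlmsSection`, and `renderLlmsTxt` are hypothetical names.

```typescript
// Hypothetical types for one llms.txt link and one H2 section.
interface LlmsLink {
  name: string;
  url: string;
  notes?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Throws on relative URLs: new URL() without a base only accepts absolute URLs.
function assertAbsolute(url: string): void {
  try {
    new URL(url);
  } catch {
    throw new Error(`llms.txt links must be absolute, got: ${url}`);
  }
}

// Renders "# title", "> summary", then "## heading" sections of "- [name](url): notes" items.
function renderLlmsTxt(title: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      assertAbsolute(link.url);
      const notes = link.notes ? `: ${link.notes}` : "";
      lines.push(`- [${link.name}](${link.url})${notes}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Example using URLs from the llms.txt shown above.
const llmsTxt = renderLlmsTxt("Powei Lee - AI & Automation Blog", "Learning notes starting from Kinmen", [
  {
    heading: "Blog Posts",
    links: [
      { name: "All posts", url: "https://pwlee.dev/blog" },
      { name: "RSS Feed (Atom)", url: "https://pwlee.dev/rss.xml", notes: "full-content feed" },
    ],
  },
]);
```

Throwing on a relative link at generation time turns the "absolute links" principle into a check that cannot be forgotten.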

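2. llms-full.txt - Auto-generated Full Index

File location: site/public/llms-full.txt (generated automatically at build time)

llms.txt provides the "table of contents"; llms-full.txt provides the "complete list". This file contains detailed information about every post:

## 📝 Blog Posts

6 posts in total

### 1. Implementing the NLWeb Semantic Layer: Making Your Website Readable by AI Agents
- URL: https://pwlee.dev/blog/nlweb-semantic-layer
- Published: 2025-11-22
- Type: tutorial
- Author: Claude Code
- Tags: AI, NLWeb, SEO, Schema.org, RSS, Astro
- Summary: A complete implementation of the NLWeb semantic layer...

### 2. [Next post]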
...

Generation script (site/scripts/generate-llms-full.ts):

This TypeScript script:

Reads every markdown file in site/src/content/blog/ and site/src/content/projects/
Parses the frontmatter with gray-matter
Filters out entries marked draft: true
Sorts the entries by publish date
Writes the formatted text file

Build integration (package.json):
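The last three steps can be sketched as a pure function that takes already-parsed frontmatter and returns the file text. In the real script that data would come from gray-matter (`matter(raw).data`). This is a hypothetical reconstruction, not the author's generate-llms-full.ts: the `PostMeta` field names are assumptions modeled on the sample output above, and newest-first ordering is assumed.

```typescript
// Hypothetical frontmatter fields, modeled on the sample llms-full.txt output above.
interface PostMeta {
  title: string;
  slug: string;
  publishedAt: string; // "YYYY-MM-DD"
  type: string;
  author: string;
  tags: string[];
  summary: string;
  draft?: boolean;
}

function formatLlmsFull(posts: PostMeta[], siteUrl: string): string {
  // Drop drafts, then sort newest first; "YYYY-MM-DD" strings compare chronologically.
  const published = posts
    .filter((p) => !p.draft)
    .sort((a, b) => (a.publishedAt < b.publishedAt ? 1 : a.publishedAt > b.publishedAt ? -1 : 0));

  const lines: string[] = ["## 📝 Blog Posts", "", `${published.length} posts in total`];
  published.forEach((p, i) => {
    lines.push(
      "",
      `### ${i + 1}. ${p.title}`,
      `- URL: ${siteUrl}/blog/${p.slug}`,
      `- Published: ${p.publishedAt}`,
      `- Type: ${p.type}`,
      `- Author: ${p.author}`,
      `- Tags: ${p.tags.join(", ")}`,
      `- Summary: ${p.summary}`,
    );
  });
  return lines.join("\n") + "\n";
}

// Example: the draft is skipped and the newer post is listed first.
const llmsFull = formatLlmsFull(
  [
    { title: "Older post", slug: "older-post", publishedAt: "2025-10-01", type: "note", author: "Powei Lee", tags: ["AI"], summary: "..." },
    { title: "Unfinished", slug: "unfinished", publishedAt: "2025-12-01", type: "note", author: "Powei Lee", tags: [], summary: "...", draft: true },
    { title: "Implementing the NLWeb Semantic Layer", slug: "nlweb-semantic-layer", publishedAt: "2025-11-22", type: "tutorial", author: "Claude Code", tags: ["AI", "NLWeb"], summary: "A complete implementation..." },
  ],
  "https://pwlee.dev",
);
```

One caveat when wiring this to gray-matter: its YAML parser turns unquoted dates such as `2025-11-22` into `Date` objects. The real script would therefore need to normalize them to strings before comparing.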

{
  "scripts": {
    "generate:llms": "tsx scripts/generate-llms-full.ts",
    "prebuild": "npm run generate:llms",
    "build": "astro build"
  }
}

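Every time you run npm run build, npm runs the prebuild script first, so llms-full.txt is always up to date.

3. Atom 1.0 RSS Feed - Full Post Content

File location: site/src/pages/rss.xml.ts

The feed was upgraded from the original RSS 2.0 format to Atom 1.0 and now includes the full HTML of each post:

Main improvements: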

Item	Old feed (RSS 2.0)	New feed (Atom 1.0)
Content	Summary only	Full HTML (`<content type="html">`)
Author	❌ None	✅ `<author><name>`
Tags	❌ None	✅ `<category term="...">`
Updated time	❌ None	✅ `<published>` + `<updated>`
Number of posts	20	50

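Technical challenge: converting Markdown to HTML

In Astro Content Collections, render() gives you an Astro component (`Content`) rather than an HTML string, so you cannot take the HTML from it directly. The workaround: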

import { readFileSync, readdirSync } from "fs";
import { join } from "path";
import matter from "gray-matter";
import { marked } from "marked";

function getPostHtmlContent(slug: string): string {
  // 1. List the markdown files in the blog content directory
  const contentDir = join(process.cwd(), "src", "content", "blog");
  const files = readdirSync(contentDir).filter((f) => /\.mdx?$/.test(f));

  for (const f of files) {
    // 2. Read each file once and split the frontmatter (data) from the body (content)
    const { data, content } = matter(readFileSync(join(contentDir, f), "utf-8"));
    if (data.slug === slug) {
      // 3. Convert the markdown body to HTML with marked (synchronous mode)
      return marked.parse(content, { async: false }) as string;
    }
  }
  throw new Error(`No blog post found with slug "${slug}"`);
}

Atom 1.0 format example (RFC 4287 requires a feed-level <id> as well as <title> and <updated>):

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-Hant">
  <title>水溝神先知日記</title>
  <subtitle>Learning notes starting from Kinmen...</subtitle>
  <id>https://pwlee.dev/</id>
  <link href="https://pwlee.dev" />
  <link href="https://pwlee.dev/rss.xml" rel="self" type="application/atom+xml" />
  <updated>2025-11-22T16:00:00.000Z</updated>
  <author>
    <name>Powei Lee</name>
    <uri>https://pwlee.dev</uri>
  </author>

  <entry>
    <title>Implementing the NLWeb Semantic Layer</title>
    <link href="https://pwlee.dev/blog/nlweb-semantic-layer" />
    <id>https://pwlee.dev/blog/nlweb-semantic-layer</id>
    <published>2025-11-22T00:00:00.000Z</published>
    <updated>2025-11-22T00:00:00.000Z</updated>
    <author>
      <name>Claude Code</name>
    </author>
    <category term="AI" />
    <category term="NLWeb" />
    <summary>A complete implementation of the NLWeb semantic layer...</summary>
    <content type="html"><![CDATA[
      <!-- full HTML content -->
      <h2>Why NLWeb?</h2>
      <p>While researching the Agent Discovery Protocol...</p>
    ]]></content>
  </entry>
</feed>
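A hand-built feed like the one above has two escaping pitfalls. First, plain-text fields such as `<title>` need `&`, `<`, `>` and quotes escaped. Second, a CDATA section ends at the first `]]>`, so post HTML containing that sequence (in a code sample, for instance) must be split across two CDATA sections. Here is a minimal sketch of an entry serializer that follows both rules. `AtomEntry`, `escapeXml`, `cdata`, and `renderAtomEntry` are hypothetical names, not part of @astrojs/rss or the author's rss.xml.ts.

```typescript
// Hypothetical input shape for one feed entry.
interface AtomEntry {
  title: string;
  url: string; // also used as the entry's <id>
  published: Date;
  updated: Date;
  author: string;
  categories: string[];
  summary: string;
  html: string; // full post HTML, e.g. the output of marked
}

// Escape text for XML element content and double-quoted attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap text in CDATA. Each "]]>" is split across two sections so it cannot end the first one early.
function cdata(text: string): string {
  return `<![CDATA[${text.split("]]>").join("]]]]><![CDATA[>")}]]>`;
}

function renderAtomEntry(e: AtomEntry): string {
  const url = escapeXml(e.url);
  return [
    "  <entry>",
    `    <title>${escapeXml(e.title)}</title>`,
    `    <link href="${url}" />`,
    `    <id>${url}</id>`,
    // toISOString() gives UTC with uppercase "T" and "Z", the form Atom date constructs require.
    `    <published>${e.published.toISOString()}</published>`,
    `    <updated>${e.updated.toISOString()}</updated>`,
    `    <author><name>${escapeXml(e.author)}</name></author>`,
    ...e.categories.map((c) => `    <category term="${escapeXml(c)}" />`),
    `    <summary>${escapeXml(e.summary)}</summary>`,
    `    <content type="html">${cdata(e.html)}</content>`,
    "  </entry>",
  ].join("\n");
}

// Example: "&" in the title is escaped, and the "]]>" inside the code sample is split safely.
const entryXml = renderAtomEntry({
  title: "llms.txt & Atom notes",
  url: "https://pwlee.dev/blog/nlweb-semantic-layer",
  published: new Date("2025-11-22T00:00:00Z"),
  updated: new Date("2025-11-22T00:00:00Z"),
  author: "Claude Code",
  categories: ["AI", "NLWeb"],
  summary: "A complete implementation...",
  html: "<code>a[b[0]]>1</code>",
});
```

The split CDATA pair decodes back to the original text, so feed readers still see `a[b[0]]>1`.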

4. Schema.org JSON-LD Improvements

Schemas implemented:

4.1 WebSite Schema (`site/src/layouts/Layout.astro`)

const webSiteSchema = siteUrl ? {
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Powei Lee",
  "url": siteUrl,
  "inLanguage": "zh-Hant",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": `${siteUrl}/blog?q={search_term_string}`,
    },
    "query-input": "required name=search_term_string",
  },
} : null;

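New properties:

✅ inLanguage: "zh-Hant" - language tag
✅ potentialAction - SearchAction (declares that the site has a search function; Google retired its sitelinks search box in late 2024, so this markup no longer triggers that feature)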
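Objects like `webSiteSchema` end up inside a `<script type="application/ld+json">` tag. `JSON.stringify` does not escape `<`, so a string value containing `</script>` (in a post title or summary, say) would end the tag early. A common safeguard is to replace `<` with its JSON escape `\u003c`, which JSON parsers decode back to the same character. This sketch uses a hypothetical `toJsonLdScript` helper.

```typescript
// Hypothetical helper: serialize a JSON-LD object into a script tag for the page <head>.
// Returns "" for null/undefined, e.g. when a schema is built as `siteUrl ? {...} : null`.
function toJsonLdScript(schema: object | null | undefined): string {
  if (schema == null) return "";
  // "\u003c" is the JSON escape for "<": parsers read the same data back,
  // but the HTML parser can no longer find a "</script>" inside the JSON.
  const json = JSON.stringify(schema).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Example: a headline that happens to contain "</script>".
const tag = toJsonLdScript({
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "Why </script> in a title breaks naive JSON-LD",
});
```

In an Astro layout, the resulting string could be emitted with `<Fragment set:html={toJsonLdScript(webSiteSchema)} />`.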

4.2 Organization Schema

const organizationSchema = {
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Powei Lee",
  "url": siteUrl,
  "founder": {
    "@type": "Person",
    "name": "Powei Lee",
    "url": `${siteUrl}/about`,
  },
  "foundingDate": "2024-01-01",
  "sameAs": [
    "https://github.com/pwlee",
    "https://linkedin.com/in/powei-lee",
  ],
};

New properties:

✅ founder - founder information
✅ foundingDate - the site's founding date

4.3 BlogPosting Schema (`site/src/pages/blog/[slug].astro`)

const articleStructuredData = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "url": canonicalUrl,
  "headline": post.title,
  "description": post.summary,
  "datePublished": publishedTimeIso,
  "dateModified": modifiedTimeIso,
  "inLanguage": "zh-Hant", // ✅ added
  "wordCount": post.readingTime * 200, // ✅ added (estimated)
  "author": {
    "@type": "Person",
    "name": post.author ?? "Powei Lee",
    "url": `${siteUrl}/about`, // ✅ added
  },
  "publisher": {
    "@type": "Person",
    "name": "Powei Lee",
    "url": siteUrl, // ✅ added
  },
  "image": [post.coverImage],
  "keywords": post.tags.join(", "),
  "mainEntityOfPage": canonicalUrl,
};

New properties:

✅ inLanguage - the article's language
✅ wordCount - word count (estimated from readingTime at 200 words per minute)
✅ Full URLs for the author and publisher
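Estimating `wordCount` as `readingTime * 200` is coarse. If readingTime is rounded to whole minutes, the estimate moves in steps of 200, and for Chinese text a "word" is ambiguous anyway. An alternative is to count directly from the markdown body. The sketch below assumes that each CJK character counts as one word and each run of Latin letters or digits counts as one word; this is a common convention, not a Schema.org rule. `estimateWordCount` is a hypothetical helper.

```typescript
// Hypothetical helper: count "words" in mixed Chinese/English markdown.
// Convention (an assumption): each CJK ideograph counts as one word,
// and each run of Latin letters/digits (allowing internal ' or -) counts as one word.
function estimateWordCount(markdown: string): number {
  // Remove fenced code blocks first, then inline code, so code tokens are not counted.
  const text = markdown.replace(/```[\s\S]*?```/g, " ").replace(/`[^`]*`/g, " ");
  // Roughly the CJK Unified Ideographs (incl. Extension A) plus Compatibility Ideographs.
  const cjk = text.match(/[\u3400-\u9FFF\uF900-\uFAFF]/g) ?? [];
  const latin = text.match(/[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g) ?? [];
  return cjk.length + latin.length;
}
```

The result is always an integer, which is what the `wordCount` property expects.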

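Validation and Testing

Once everything is implemented, validate it with the following tools:

1. RSS Feed Validation

Tool: W3C Feed Validator

URL: https://validator.w3.org/feed/
Test: https://pwlee.dev/rss.xml

Checklist:

✅ Valid Atom 1.0 format
✅ <content type="html"> contains the full HTML
✅ <category> tags are correct
✅ <author> information is complete
✅ <updated> timestamps use RFC 3339 format (the ISO 8601 profile Atom requires)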
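Atom (RFC 4287, section 3.3) requires date constructs to follow RFC 3339's date-time production, with an uppercase "T" separator and an uppercase "Z" when there is no numeric offset. Date-only values such as `2025-11-22` and RFC 822 dates like RSS 2.0's pubDate are therefore not allowed. Below is a minimal regex pre-check, written as a hypothetical `isAtomDate` helper, that catches these before you run the W3C validator.

```typescript
// RFC 3339 date-time with Atom's extra rule (RFC 4287, section 3.3):
// uppercase "T" between date and time, uppercase "Z" unless a numeric offset is given.
const ATOM_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Shape check only: it does not reject impossible values such as month 13.
function isAtomDate(value: string): boolean {
  return ATOM_DATE.test(value);
}

// Date.prototype.toISOString() output (UTC, ending in "Z") has this shape for years 0000-9999.
const generated = new Date("2025-11-22T16:00:00Z").toISOString();
```

Generating every timestamp with `toISOString()` avoids most of these errors in the first place.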

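Common error and fix:

❌ Error: HTML inside <content type="html"> is neither entity-escaped nor wrapped in <![CDATA[...]]>
✅ Fix: wrap all HTML content in <![CDATA[ ... ]]> (entity-escaping it also works)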

2. Schema.org JSON-LD Validation

Tool 1: Google Rich Results Test

URL: https://search.google.com/test/rich-results
Test: any blog post page (e.g. https://pwlee.dev/blog/nlweb-semantic-layer)

Expected results:

✅ BlogPosting schema detected
✅ BreadcrumbList schema detected
✅ Author, publish date, and modified date shown correctly
✅ No errors or warnings

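Tool 2: Schema.org Validator

URL: https://validator.schema.org/
Test: paste the page URL or the JSON-LD code

Checklist:

✅ @type: "BlogPosting" is correct
✅ Key properties present (headline, author, datePublished); Schema.org marks no property as required, and Google lists these as recommended
✅ inLanguage follows the BCP 47 standard (zh-Hant)
✅ wordCount is an integer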
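The same checklist can run in a unit test or at build time, before any page is deployed. Here is a rough sketch. `checkBlogPosting` is hypothetical, and the property list mirrors the checklist above (Google-recommended, not Schema.org-required). The language-tag regex only approximates BCP 47 shape; it does not validate against the IANA registry.

```typescript
type JsonLd = Record<string, unknown>;

// Loose BCP 47 shape: a 2-3 letter primary language plus optional subtags ("zh-Hant", "en-US").
const LANG_TAG = /^[a-zA-Z]{2,3}(-[a-zA-Z0-9]{1,8})*$/;

function checkBlogPosting(schema: JsonLd): string[] {
  const problems: string[] = [];
  if (schema["@type"] !== "BlogPosting") problems.push('@type should be "BlogPosting"');
  for (const key of ["headline", "author", "datePublished"]) {
    const value = schema[key];
    if (value === undefined || value === null || value === "") problems.push(`missing ${key}`);
  }
  const lang = schema["inLanguage"];
  if (typeof lang !== "string" || !LANG_TAG.test(lang)) {
    problems.push("inLanguage should be a BCP 47 tag such as zh-Hant");
  }
  const wc = schema["wordCount"];
  if (wc !== undefined && !Number.isInteger(wc)) {
    problems.push("wordCount should be an integer");
  }
  return problems;
}
```

Running it over every page's schema object gives a quick signal before you submit URLs to the Rich Results Test.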

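3. llms.txt Readability Check

Manual checks:

✅ Open https://pwlee.dev/llms.txt directly in a browser
✅ Check that the format is clear and easy to read
✅ All links use absolute URLs (not relative paths)
✅ The heading hierarchy is clear (H1, H2, H3)

AI agent test:

You can ask an AI assistant (such as Claude or ChatGPT) directly:

“Please read https://pwlee.dev/llms.txt and tell me what this site is mainly about.”

If an assistant with web access correctly understands and summarizes the site structure, your llms.txt is working.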
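Part of this manual checklist can be automated. Below is a rough sketch, assuming the file follows Markdown conventions like the example earlier. The hypothetical `lintLlmsTxt` checks that the file starts with a single H1 and flags link targets that are not absolute http(s) URLs. It recognizes both `[name](url)` links and bare `- label: URL` lines as in the author's file. For bare lines it only flags path-like targets, so plain values such as `- Type: tutorial` are ignored.

```typescript
// Hypothetical linter for the manual llms.txt checks: one leading H1 and absolute link targets.
function lintLlmsTxt(text: string): string[] {
  const problems: string[] = [];
  const lines = text.split(/\r?\n/);

  // The title should be the first non-empty line and the only H1 ("# " at line start).
  const h1Count = lines.filter((l) => /^# /.test(l)).length;
  const firstLine = lines.find((l) => l.trim() !== "") ?? "";
  if (h1Count !== 1 || !/^# /.test(firstLine)) {
    problems.push("expected exactly one H1, on the first line");
  }

  for (const line of lines) {
    // Markdown link targets "[name](target)" must be absolute http(s) URLs.
    const linkRe = /\]\(([^)\s]+)\)/g;
    let m: RegExpExecArray | null;
    while ((m = linkRe.exec(line)) !== null) {
      if (!/^https?:\/\//.test(m[1])) problems.push(`not an absolute URL: ${m[1]}`);
    }
    // Bare "- label: target" items; only path-like targets ("/x", "./x", "../x") are flagged.
    const bare = /^\s*-\s+[^\[\]]*?:\s+(\S+)$/.exec(line);
    if (bare && /^\.{0,2}\//.test(bare[1])) {
      problems.push(`not an absolute URL: ${bare[1]}`);
    }
  }
  return problems;
}
```

A lint like this can run in CI, leaving only the "is it readable?" judgment to a human or an AI assistant.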

4. Build Script Test

# Test llms-full.txt generation
cd site
npm run generate:llms

# Check the output
head -50 public/llms-full.txt

# Full build test
npm run build

# Inspect the build output
ls -lh dist/
ls -lh .vercel/output/static/

Expected output:

🤖 Generating llms-full.txt...
✅ llms-full.txt generated successfully
   Output: /path/to/site/public/llms-full.txt
   Content length: 2145 characters

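Technical Challenges and Solutions

Challenge 1: Astro Content Collections can't output HTML directly

Problem: Astro's render() returns an Astro component, which can't be used in a plain Node.js script.

Solution: read the markdown files directly and convert them with the marked package: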

import { marked } from "marked";
import matter from "gray-matter";

// markdownFile: the raw text of one post's .md file (e.g. read with readFileSync(path, "utf-8"))
const { content } = matter(markdownFile);
const html = marked.parse(content, { async: false });

Challenge 2: llms-full.txt must update automatically at build time

Problem: updating it by hand means new posts get missed.

Solution: use npm's prebuild hook:

{
  "scripts": {
    "prebuild": "tsx scripts/generate-llms-full.ts",
    "build": "astro build"
  }
}

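Every npm run build runs prebuild automatically, so llms-full.txt stays current.

Challenge 3: Schema.org has many properties, and it's easy to miss some

Solution: keep a checklist and let TypeScript type checking enforce it: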

interface BlogPostingSchema {
  "@context": "https://schema.org";
  "@type": "BlogPosting";
  url: string;
  headline: string;
  inLanguage: string; // ✅ required (by our checklist)
  wordCount?: number; // ✅ recommended
  author: {
    "@type": "Person";
    name: string;
    url: string; // ✅ required (by our checklist)
  };
  // ...other properties
}
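An interface only helps if the object is actually checked against it. One way to get that check is to build the JSON-LD in a function whose return type is the interface, so the compiler reports any missing required property. The `PostInfo` input type and the `buildBlogPosting` function below are hypothetical, and the interface is a trimmed, self-contained copy of the one above.

```typescript
interface BlogPostingSchema {
  "@context": "https://schema.org";
  "@type": "BlogPosting";
  url: string;
  headline: string;
  inLanguage: string;
  wordCount?: number;
  author: { "@type": "Person"; name: string; url: string };
}

// Hypothetical subset of a post's frontmatter.
interface PostInfo {
  slug: string;
  title: string;
  author?: string;
  wordCount?: number;
}

function buildBlogPosting(post: PostInfo, siteUrl: string): BlogPostingSchema {
  // Omitting any required property here is a compile-time error, not a silent gap in the markup.
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    url: `${siteUrl}/blog/${post.slug}`,
    headline: post.title,
    inLanguage: "zh-Hant",
    // Leave wordCount out entirely rather than emitting a missing or non-integer value.
    ...(Number.isInteger(post.wordCount) ? { wordCount: post.wordCount } : {}),
    author: { "@type": "Person", name: post.author ?? "Powei Lee", url: `${siteUrl}/about` },
  };
}
```

With TypeScript 4.9 or later, annotating an inline object literal with `satisfies BlogPostingSchema` gives the same compile-time check without a wrapper function.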

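ROI Analysis

Development cost

Time: about 7.5-11.5 hours (full implementation)
- llms.txt: 0.5h
- llms-full.txt script: 2-3h
- RSS feed upgrade: 4-6h
- JSON-LD improvements: 1-2h
Maintenance cost: very low (automated in the build)
Infrastructure cost: $0 (fully static)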

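Benefits

SEO:

✅ Eligibility for Google Rich Results (enhanced search listings)
✅ Complete JSON-LD (author, dates, publisher) helps convey E-E-A-T signals
✅ The Atom feed carries more content and metadata than the old RSS 2.0 feed

AI-agent friendliness:

✅ Follows the NLWeb approach taken by the 12+ adopting companies
✅ LLMs can understand the site structure quickly
✅ Leaves room for future agent discovery protocols

Content distribution:

✅ RSS readers receive full posts (a better subscriber experience)
✅ AI tools can quote the full content directly
✅ Easier for search engines to index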

References

Official specifications

NLWeb GitHub: https://github.com/nlweb-ai/NLWeb
llms.txt proposal: https://llmstxt.org/
Atom 1.0 RFC: https://datatracker.ietf.org/doc/html/rfc4287
Schema.org BlogPosting: https://schema.org/BlogPosting
Schema.org WebSite: https://schema.org/WebSite

Validation tools

W3C Feed Validator: https://validator.w3.org/feed/
Schema.org Validator: https://validator.schema.org/
Google Rich Results Test: https://search.google.com/test/rich-results
Lighthouse: https://pagespeed.web.dev/

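Enterprise case studies

Agent Discovery Protocol research report: docs/agent-discovery-research-2025.md
12+ adopting companies: Shopify, Snowflake, O’Reilly Media, Tripadvisor, Eventbrite, Chicago Public Media, Common Sense Media, DDM, Hearst, Milvus, Qdrant, Inception Labs

Conclusion

The core value of the NLWeb semantic layer: no complex new protocol is needed. You only need to make your existing site structure easier for AI to read.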

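This implementation taught me that a technical standard succeeds not by how advanced it is but by how practical it is. ANP, A2A, and similar protocols are technically thorough, yet I found little sign of them on ordinary content websites. NLWeb, by contrast, is simple and practical, and 12+ companies have already adopted it.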

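If you want your website to be friendly to AI agents, start with llms.txt: in about 30 minutes, your site can join the ranks of the "AI-readable".

Implementation code: GitHub - pw-astro. Related research: Agent Discovery Protocol research report.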
