Enrichment & Provenance
Enrichment is the process of adding AI-generated metadata to an ACO — summary, tags, entities, token counts. ACP’s enrichment model has two defining properties:
- Every auto-generated field carries provenance — which model generated it, when, and at what confidence.
- Human-authored and machine-generated fields are always distinguishable — the presence or absence of a provenance record is the canonical signal.
Per-Field Provenance
The `provenance` object on an ACO records which model generated each auto-generated field. It is a flat object where each key is a field name and each value is a provenance record.
```yaml
provenance:
  summary:
    model: "claude-haiku-4-5"
    version: "20251001"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.91
  tags:
    model: "claude-haiku-4-5"
    version: "20251001"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.88
  key_entities:
    model: "gpt-4o-mini"
    version: "2024-07-18"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.95
```

| Subfield | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier used for generation. |
| `version` | string | No | Model version or checkpoint. |
| `timestamp` | string (ISO 8601) | Yes | When the field was generated. |
| `confidence` | float 0.0–1.0 | No | Model's self-assessed accuracy for the generated value. |
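The subfield table above can be expressed as a small validator. This is a non-normative sketch; `validate_provenance_record` and its error messages are hypothetical names, not part of ACP:

```python
from datetime import datetime

# Required and optional subfields, per the table above.
REQUIRED = {"model", "timestamp"}
OPTIONAL = {"version", "confidence"}

def validate_provenance_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    missing = REQUIRED - record.keys()
    if missing:
        errors.append(f"missing required subfields: {sorted(missing)}")
    unknown = record.keys() - REQUIRED - OPTIONAL
    if unknown:
        errors.append(f"unknown subfields: {sorted(unknown)}")
    if "model" in record and not isinstance(record["model"], str):
        errors.append("model must be a string")
    if "timestamp" in record:
        try:
            # ISO 8601; normalize a trailing 'Z' for older fromisoformat versions
            datetime.fromisoformat(str(record["timestamp"]).replace("Z", "+00:00"))
        except ValueError:
            errors.append("timestamp must be an ISO 8601 string")
    conf = record.get("confidence")
    if conf is not None and not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
        errors.append("confidence must be a float in [0.0, 1.0]")
    return errors
```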
The provenance signal
- A field with a `provenance` entry is machine-generated.
- A field without one is human-authored.

This is the canonical distinction. You do not need a separate flag or field to know whether a tag was typed by a human or generated by an AI — you check whether `provenance.tags` exists.
If a human edits a machine-generated field, the provenance entry SHOULD be removed. The field is now human-authored.
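The signal and the human-edit rule can be sketched in a few lines, assuming the ACO is modeled as a plain dict with a top-level `provenance` map (the helper names here are illustrative, not part of ACP):

```python
def is_machine_generated(aco: dict, field: str) -> bool:
    """A field is machine-generated iff it has a provenance entry."""
    return field in aco.get("provenance", {})

def record_human_edit(aco: dict, field: str, value) -> None:
    """Apply a human edit: set the new value and drop the provenance
    entry, so the field reads as human-authored from now on."""
    aco[field] = value
    aco.get("provenance", {}).pop(field, None)
```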
Dual Confidence Model
ACP distinguishes two kinds of confidence scores that coexist on an object but measure fundamentally different things.
ACO-level confidence
```yaml
confidence: 0.82
```

A float from 0.0 to 1.0 representing the behavioral relevance of this object — how reliable it has proven to be as a reference source, computed from engagement signals: saves, shares, comments, recency of interaction, collection membership, and cross-referencing frequency.
This is NOT a model accuracy score. It is a signal about the object’s utility to consumers, computed by the implementation from usage patterns.
Per-field provenance confidence
```yaml
provenance:
  summary:
    confidence: 0.91
  tags:
    confidence: 0.88
```

Each provenance record carries the generating model's self-assessed accuracy for that specific field. This reflects how confident the model was in its output at generation time — not how useful the object has proven to be over time.
Why they are different
| | ACO-level confidence | Provenance confidence |
|---|---|---|
| What it measures | Behavioral relevance and utility | Model accuracy at generation time |
| Who sets it | Implementation (engagement-based) | Generating model |
| When it changes | Over time as usage data accumulates | Only if the field is regenerated |
| Cross-model comparability | Implementation-specific | NOT guaranteed to be comparable across models |
An object can be highly saved and referenced (high behavioral confidence) while having low-confidence auto-generated tags. Or an object can have high-confidence AI-generated fields but low usage over time. These are independent signals.
Guidance: Implementations SHOULD surface enrichments with per-field provenance confidence below 0.7 for human review. Implementations MAY define minimum thresholds below which auto-generated fields are not displayed.
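One way an implementation might apply this guidance is a triage pass over the provenance map. This is a sketch under assumptions: `triage_enrichments` is a hypothetical helper, the 0.3 display threshold is an invented example of an implementation-chosen minimum, and fields lacking a confidence value are conservatively flagged for review:

```python
REVIEW_THRESHOLD = 0.7   # SHOULD surface below this, per the guidance above
DISPLAY_THRESHOLD = 0.3  # hypothetical implementation-defined minimum (MAY)

def triage_enrichments(aco: dict) -> tuple[list[str], list[str]]:
    """Split auto-generated fields into (needs_review, suppressed) by
    per-field provenance confidence; remaining fields display normally."""
    needs_review, suppressed = [], []
    for field, record in aco.get("provenance", {}).items():
        conf = record.get("confidence")
        if conf is None:
            needs_review.append(field)      # no self-assessment: review it
        elif conf < DISPLAY_THRESHOLD:
            suppressed.append(field)        # below minimum: do not display
        elif conf < REVIEW_THRESHOLD:
            needs_review.append(field)      # display, but surface for review
    return needs_review, suppressed
```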
Enrichment Pipelines
The four core enrichments ACP is designed to support:
Summary
A concise human-readable description of the content body. Max 500 characters recommended. Useful for agents to preview content before deciding whether to fetch the full body.
```yaml
summary: "Analysis of how tokenizer divergence across models affects context window planning for AI agents."
provenance:
  summary:
    model: "claude-haiku-4-5"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.91
```

Tags

Classification tags for search, filtering, and clustering. Lowercase recommended.

```yaml
tags: ["tokenizers", "context-window", "ai-agents", "llm-infrastructure"]
provenance:
  tags:
    model: "claude-haiku-4-5"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.88
```

Key Entities
Typed named entities extracted from the content body. Enables structured queries: "show me all ACOs mentioning Anthropic with confidence > 0.9."
```yaml
key_entities:
  - type: "organization"
    name: "Anthropic"
    confidence: 0.98
  - type: "technology"
    name: "Claude"
    confidence: 0.97
  - type: "concept"
    name: "tokenization"
    confidence: 0.93
provenance:
  key_entities:
    model: "claude-haiku-4-5"
    timestamp: "2026-02-23T10:31:00Z"
    confidence: 0.95
```

Note: Entity-level confidence values inherit their model identity from `provenance.key_entities`. Per-entity provenance is not carried individually — the batch record covers all entities.
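The structured query quoted above ("all ACOs mentioning Anthropic with confidence > 0.9") might look like this in an implementation; `acos_mentioning` is a hypothetical helper operating on ACOs modeled as dicts:

```python
def acos_mentioning(acos, name: str, min_confidence: float = 0.9):
    """Yield ACOs whose key_entities include `name` at or above the
    given entity-level confidence."""
    for aco in acos:
        for entity in aco.get("key_entities", []):
            if entity.get("name") == name and entity.get("confidence", 0.0) >= min_confidence:
                yield aco
                break  # one qualifying entity is enough; move to next ACO
```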
Token Counts
Per-tokenizer token counts. These are typically computed deterministically (not probabilistically), so provenance records for `token_counts` are not required — but they are valid if your implementation computes them with a model.
```yaml
token_counts:
  cl100k: 2847
  claude: 2791
  approximate: 2830
```

Idempotency Rules
Enrichment pipelines SHOULD follow these rules:
- Skip if provenance exists. If `provenance.summary` already exists, do not regenerate the summary unless the caller passes a `force` flag.
- Force to overwrite. A `force` flag regenerates and overwrites the field and updates the provenance record.
- Never overwrite human-authored fields. If a field has no provenance record, it is human-authored. Do not overwrite it automatically — even with `force`. Require explicit human confirmation.
- Recompute on content change. If the content body changes (detectable via `content_hash`), all auto-generated fields SHOULD be flagged as stale. The provenance record remains until regeneration.
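The four rules can be sketched as a single enrichment entry point plus a staleness check. This is a non-normative sketch: `enrich_field`, `generate` (a callable returning a value and its provenance record), and the `stale` flag are all hypothetical names, not part of ACP:

```python
def enrich_field(aco: dict, field: str, generate, force: bool = False,
                 human_confirmed: bool = False) -> dict:
    prov = aco.setdefault("provenance", {})
    if field in aco and field not in prov:
        # Rule 3: no provenance record means human-authored; never
        # overwrite automatically, even with force.
        if not human_confirmed:
            return aco
    elif field in prov and not force:
        # Rule 1: a provenance entry already exists, so skip.
        return aco
    # Rule 2: (re)generate, overwrite, and refresh the provenance record.
    value, record = generate(aco)
    aco[field] = value
    prov[field] = record
    return aco

def flag_stale_on_content_change(aco: dict, new_content_hash: str) -> None:
    # Rule 4: on a content change, mark every auto-generated field stale;
    # provenance records remain in place until regeneration.
    if aco.get("content_hash") != new_content_hash:
        aco["content_hash"] = new_content_hash
        for record in aco.get("provenance", {}).values():
            record["stale"] = True
```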
Enrichment Cost Reference
From the ACP research synthesis (non-normative):
- Enrichment cost: approximately $0.002 per article using GPT-4o-mini or Claude Haiku
- Latency: 0.8–2.2 seconds per ACO
- YAML frontmatter is approximately 18% more efficient than JSON for metadata
These figures are reference points. Actual costs and latency depend on content length, model selection, and implementation.