P4 Batch 5 — Ingest Report
Date: 2026-05-13
Handles processed: 198 (food-matrix-filtered P4, 2025 papers, positions 400–598)
Source pages created: 2
False positives / missing from filesystem: 196
Summary
Batch 5 reveals a major structural obstacle: a ~400-handle corpus gap spanning roughly FM_12483157 through FM_12519940. These handles appear in the food-matrix filtered list as P4 candidates but are absent from raw/markdown/. The gap corresponds to PDFs in the untracked raw 2/ directory that have not yet been Marker-converted. Groups 1–3 were almost entirely blocked by it. Group 4 cleared the gap and found 2 relevant pages.
Strategic pivot for batch 6: Generate a filesystem-verified handle list — only process handles where raw/markdown/<handle>/ actually exists. This eliminates wasted agent cycles on the ~400 missing handles.
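A minimal sketch of that filesystem check, assuming each handle maps directly to a directory name under raw/markdown/ (as the raw/markdown/<handle>/ pattern suggests). `load_handles` and `split_by_markdown` are hypothetical helpers, not existing pipeline functions.

```python
from pathlib import Path


def load_handles(path):
    """Read one handle per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def split_by_markdown(handles, markdown_root="raw/markdown"):
    """Partition handles into (present, missing) by checking for <markdown_root>/<handle>/.

    Order is preserved so the verified list can still be batched by position.
    Hypothetical helper: assumes one Marker output directory per handle.
    """
    root = Path(markdown_root)
    present, missing = [], []
    for handle in handles:
        # Only a handle with a Marker output directory is worth an agent cycle.
        (present if (root / handle).is_dir() else missing).append(handle)
    return present, missing
```

For example, `present, missing = split_by_markdown(load_handles("p4-candidates.txt"))`, where the candidate filename is hypothetical; batch 6 positions would then be drawn from `present` only.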
Source Pages Created
hadi2025-dried-fruits-heavy-metals-iraq (FM_12752863)
AAS measurement of Pb, Cd, and total Cr in 15 dried fruit samples (Iranian origin) from Iraqi markets, Feb 2025. Key findings: Cd exceeded the FAO/WHO limit (50 ppb) in 10 of 15 samples, including apple (897 ppb) and raisins (331 ppb); Pb mean 642 ppb, with all samples below the 10 mg/kg (10,000 ppb) limit; total Cr mean 354 ppb (peach 1,289 ppb). A single sample per commodity limits statistical weight. Evidence tier: A. Metals: Cd, Pb, Cr. Matrices: dried-fruit. Jurisdictions: IQ.
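The Pb comparison mixes units (a mean in ppb against a limit in mg/kg), so limits are easiest to check after converting both sides to one unit. A minimal sketch, assuming "ppb" here means µg/kg by mass; the table and function names are hypothetical, not part of any ingest tooling.

```python
# Assumption: "ppb" is a mass fraction (µg/kg), so 1 mg/kg = 1,000 ppb.
PPB_PER_UNIT = {"ppb": 1, "ug/kg": 1, "ppm": 1_000, "mg/kg": 1_000}


def to_ppb(value, unit):
    """Convert a mass-fraction concentration to ppb (µg/kg)."""
    return value * PPB_PER_UNIT[unit]


def exceeds_limit(value, unit, limit, limit_unit):
    """True if the measured value is strictly above the limit once both are in ppb."""
    return to_ppb(value, unit) > to_ppb(limit, limit_unit)


# Figures as summarized for hadi2025:
print(exceeds_limit(897, "ppb", 50, "ppb"))    # Cd in apple vs 50 ppb limit → True
print(exceeds_limit(642, "ppb", 10, "mg/kg"))  # Pb mean vs 10 mg/kg (10,000 ppb) → False
```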
li2025-ratiometric-fluorescent-sensor-al-cu-food (FM_12682671)
Methods paper: ratiometric fluorescent sensor built from dual-emission graphene quantum dot / gold nanocluster composites (GQDs@AuNCs) for Al³⁺ (LOD 0.66 µM) and Cu²⁺ (LOD 0.44 µM). Validated by spiked recovery in fried dough products and shellfish (scallops, Sinonovacula constricta) from Shenyang, China. The paper reports no ambient concentration data; it is included as a methods contribution for Al detection in Chinese fried foods, where alum leavening agents are common. Evidence tier: A. Metals: Al. Matrices: fried-dough, shellfish.
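Spike-recovery validation is usually reported as percent recovery of a known added amount. The paper's exact formula and acceptance range are not captured in this summary, so the sketch below shows only the conventional calculation, with hypothetical numbers that do not come from li2025.

```python
def spike_recovery_pct(found_spiked, found_unspiked, added):
    """Conventional percent recovery: (C_spiked - C_unspiked) / C_added * 100.

    All three concentrations must share one unit (e.g. µM in the sample extract).
    """
    if added <= 0:
        raise ValueError("spike amount must be positive")
    return (found_spiked - found_unspiked) / added * 100.0


# Hypothetical: 2.0 µM native Al³⁺, 10.0 µM spike added, 11.6 µM measured afterwards.
print(round(spike_recovery_pct(11.6, 2.0, 10.0), 1))  # → 96.0
```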
Corpus Gap Finding
The FM_12483157–FM_12519940 range (~400 handles) is absent from both raw/markdown/ and raw/manifest/triage-manifest.csv. These handles appear only in the food-matrix filtered list because the triage manifest was built against a superset corpus including files not yet Marker-converted. The raw 2/ directory (currently untracked) contains the corresponding PDFs.
Impact: Groups 1–3 encountered 39, 50, and 50 missing handles respectively (139 total). Group 4 still hit 40 missing handles but ran past the end of the gap, finding 2 relevant pages among the 8 readable handles in its range. Across the batch, 179 of 198 handles were missing.
Remediation path: Convert raw 2/ PDFs with Marker and place outputs in raw/markdown/. Until then, batch processing should skip FM numbers in the gap range.
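The interim skip rule can be expressed as a numeric range check on the FM number. A sketch, assuming handles follow an `FM_<digits>` pattern and treating the report's approximate bounds as inclusive; since those bounds are only approximate, checking that raw/markdown/<handle>/ exists remains the safer filter. All names here are hypothetical.

```python
import re

# Approximate gap bounds from this report, treated as inclusive (assumption).
GAP_START = 12483157
GAP_END = 12519940

_FM_RE = re.compile(r"^FM_(\d+)$")


def in_corpus_gap(handle, start=GAP_START, end=GAP_END):
    """True if an FM_ handle's number falls inside the unconverted gap range."""
    m = _FM_RE.match(handle)
    if m is None:
        # Handles without an FM_ number are not covered by the gap rule.
        return False
    return start <= int(m.group(1)) <= end


def skip_gap(handles):
    """Drop handles inside the gap range, preserving the original order."""
    return [h for h in handles if not in_corpus_gap(h)]
```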
New-Page Proposals from Background Routing Triage
A background routing-unresolved agent completed 7 passes and identified 35 entries as new-page proposals awaiting Karen’s approval. Five ingredient slugs meet the 5-source threshold:
- ingredients/freshwater-fish: 9 contributing sources
- ingredients/cereals: 8 contributing sources
- ingredients/breastmilk: 6 contributing sources
- ingredients/shellfish: 6 contributing sources
- ingredients/quinoa: 6 contributing sources
These proposals are documented in wiki/lint/2026-05-12-routing-triage.md.
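The 5-source threshold amounts to counting distinct contributing sources per slug. A minimal sketch, assuming each unresolved routing entry can be reduced to a (slug, source page) pair; that data shape and the function name are hypothetical, and the routing agent's actual logic is not described in this report.

```python
from collections import defaultdict


def slugs_meeting_threshold(routes, threshold=5):
    """Return {slug: n_sources} for slugs with at least `threshold` distinct sources.

    `routes` is an iterable of (slug, source_page) pairs (hypothetical shape).
    """
    sources = defaultdict(set)
    for slug, source in routes:
        # A set, so repeated routes from the same source page count once.
        sources[slug].add(source)
    counts = {slug: len(s) for slug, s in sources.items() if len(s) >= threshold}
    # Highest support first; ties broken alphabetically for stable output.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
```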
Batch Commits
- a825533 (group 1): 0 pages, 11 readable FPs, 39 missing handles
- (group 2: no commit; all 50 handles missing from filesystem)
- dcf68b9 (group 3): 0 pages, 50 missing handles, 21 adjacent FPs
- f5c6a76 (group 4): 2 pages, 40 missing handles, 6 readable FPs
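The header totals can be reconciled against the per-group commit counts. A sketch, assuming group 3's 21 adjacent FPs were reviewed outside positions 400–598 and are therefore excluded from the 198; `GROUPS` and `batch_totals` are illustrative names.

```python
# Per-group counts transcribed from the commit list; group 2 had no commit.
GROUPS = {
    1: {"pages": 0, "readable_fps": 11, "missing": 39},
    2: {"pages": 0, "readable_fps": 0, "missing": 50},
    # Assumption: group 3's 21 adjacent FPs lie outside the batch positions.
    3: {"pages": 0, "readable_fps": 0, "missing": 50},
    4: {"pages": 2, "readable_fps": 6, "missing": 40},
}


def batch_totals(groups):
    """Sum each counter across groups and derive the header figures."""
    keys = ("pages", "readable_fps", "missing")
    totals = {k: sum(g[k] for g in groups.values()) for k in keys}
    totals["handles"] = totals["pages"] + totals["readable_fps"] + totals["missing"]
    totals["fp_or_missing"] = totals["readable_fps"] + totals["missing"]
    return totals


print(batch_totals(GROUPS))
```

Under that assumption the counts sum to 198 handles, 2 pages, and 196 false positives or missing handles (17 readable FPs plus 179 missing), matching the header.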