P4 Batch 2 — Ingest Report
Date: 2026-05-12
Handles processed: 200 (FM_12964337 to FM_11848250; P4 tier, year-descending, pre-food-matrix filter)
Source pages created: 7
False positives: 193 (96.5%)
Summary
P4 batch 2 confirms the pattern found in batch 1: the unfiltered P4 year-descending sort yields ~94–97% false positives. OCR year artifacts push handles dated 2026–2029 to the top of the sort, and most of those are materials science or clinical case-report papers in which metal names refer to semiconductor dopants (PbI₂, CdS, As in DFT) or surgical hardware rather than food contaminants. The 200 handles span two processing waves: groups 1 and 4 yielded 0 relevant pages each; groups 2 and 3 yielded 3 and 4 respectively.
Strategic pivot for batch 3: Switch to food-matrix-filtered P4 processing. The manifest’s text_mined_ingredients field identifies 2,506 P4 papers with food terms; these are processed first, year-descending from 2025. Batch 3 starts with the top 200 of those 2,506.
Source Pages Created
Group 2 (FM_13008638 – FM_13055536)
ozkutlu2026-wheat-cd-zinc-mitigation (FM_13038746 area — actual handle FM in batch)
Zinc fertilization study, Turkey/Jordan wheat. Zn application reduces grain Cd by up to 70% at low ambient soil Cd but is limited at high contamination. Moderate evidence for mitigation strategy. Evidence tier: A. Metals: Cd. Matrices: wheat-grain.
chaura2026-phaseolus-multiomics-ionomics (cacao/bean multiomics study)
Phaseolus bean ionomics panel; most accessions below 50 ppb Pb in grain, but one outlier at 3,311 ppb Pb in a high-Pb-soil trial, which highlights a genotype × soil interaction. Metals: Pb, Cd, Ni. Matrices: legume-grain.
jaramillo-mazo2026-cacao-cd-bacteria (cacao rhizosphere microbiology)
Colombian cacao plots; Flavobacterium sp. tolerates and partially sequesters Cd in the rhizosphere; bacterial Cd dynamics under different soil management. Indirect food safety relevance (supply chain / mitigation). Metals: Cd. Matrices: cacao-soil.
Group 3 (FM_11741060 – FM_11761173)
kim2026-mixed-pb-mehg-cd-hippocampus (mixture neurotoxicity)
In vitro hippocampal neuronal toxicity study, Pb + MeHg + Cd mixture. Synergistic cytotoxicity at low individual concentrations; mixture IC50 significantly below additive prediction. Relevant for cumulative exposure assessment and health pages. Evidence tier: A. Metals: Pb, MeHg, Cd. Matrices: in-vitro.
lawluvi2026-maternal-geophagy-ghana (geophagy clay safety)
Survey of geophagic clays consumed by pregnant women in Ghana. Clays contain As, Cd, Cr, Pb, U, tHg at concentrations exceeding safety limits in multiple cases. Direct food safety relevance for exposed population. Evidence tier: A. Metals: As, Cd, Cr, Pb, U, tHg. Matrices: clay-geophagy.
auzier-guimaraes2025-mercury-tapajos-fish (PRISMA systematic review, Tapajós)
Systematic review of 36 studies, 14,113 individual fish from the Tapajós River basin, Brazil. 89% of species have target hazard quotient ≥1 for mercury via fish consumption; 19 of 21 species exceed MeHg guidance thresholds for vulnerable populations. High-quality synthesis. Evidence tier: A. Metals: tHg, MeHg. Matrices: freshwater-fish.
naz2025-trace-elements-punjnad-fish (Pakistan freshwater fish)
Five commercial freshwater species from Punjnad, Pakistan. Pb up to 46.03 mg/kg in Wallago attu liver (extreme outlier; liver, not fillet). Fillet values lower, but Pb and Cd elevated. Evidence tier: A. Metals: Pb, Cd, Cr, Ni. Matrices: freshwater-fish.
False Positives by Category
Of 193 false positives across all four groups:
- Perovskite/photovoltaic materials science: ~35 handles. RSC Advances, Chemical Science, ACS AMI papers on CsPbI₃ solar cells, halide double perovskites, CdS quantum dots, SnO monolayers, graphdiyne photovoltaics, Ga₂O₃ dopants. Metal vocabulary (Pb, Cd, As, Sn) is entirely for semiconductor fabrication.
- Medical case reports and clinical medicine: ~40 handles. Cureus and similar journals — Chagas disease, ICD/pacemaker leads, neurocysticercosis, anasarca, CMV viremia, SIRT1 dermal fibroblast, Wernicke encephalopathy, rhabdomyolysis, amyloidosis, Yamaguchi syndrome. No food pathway.
- Electrochemical sensors and analytical chemistry (non-food): ~25 handles. Sensors for water treatment, wastewater, environmental monitoring without food-matrix validation.
- Environmental/agricultural remediation without food occurrence: ~30 handles. Arsenic removal from groundwater, soil stabilization, phytoremediation, constructed wetlands.
- Pharmacology, biomedicine, veterinary: ~20 handles. Drug-metal interactions, anti-cancer metal complexes, veterinary poultry/livestock management without contamination data.
- Other off-topic: ~43 handles. Seismics/metamaterials/THz devices, IRB administrative papers, academic performance surveys, ecological modeling, ocean plastic degradation.
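As a quick consistency check, the approximate category counts above can be summed and compared with the batch total of 193. This is a minimal Python sketch; the dictionary name and the shortened labels are illustrative and are not fields from the manifest.

```python
# Approximate false-positive counts per category, copied from the list above.
fp_categories = {
    "perovskite/photovoltaic materials science": 35,
    "medical case reports and clinical medicine": 40,
    "electrochemical sensors (non-food)": 25,
    "environmental/agricultural remediation": 30,
    "pharmacology, biomedicine, veterinary": 20,
    "other off-topic": 43,
}

total_fp = sum(fp_categories.values())

# Share of all false positives per category, largest first.
shares = sorted(
    ((name, count / total_fp) for name, count in fp_categories.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(f"total false positives: {total_fp}")
for name, share in shares:
    print(f"{name}: {share:.1%}")
```

The counts reconcile with the 193 reported, with the catch-all "other" bucket and clinical case reports as the two largest groups.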
Key Findings
The highest-value pages from this batch are:
- auzier-guimaraes2025 — PRISMA-quality Tapajós mercury systematic review; 14,113 fish observations; strong anchor for freshwater fish MeHg exposure.
- kim2026-mixed-pb-mehg-cd-hippocampus — mixture synergy at sub-effect-level concentrations; important for cumulative exposure framing.
- lawluvi2026-maternal-geophagy-ghana — geophagy clays as overlooked exposure pathway; routing flagged unresolved (clay-geophagy ingredient page does not yet exist).
Routing Audit Notes
- routing_unresolved.csv gained entries for clay-geophagy (lawluvi2026): the page does not exist; proposed to Karen as ingredients/clay-geophagy.md but awaiting approval.
- routing_malformed.csv had 5 entries from the group 2 commit (resolved in a subsequent commit).
- No issues with groups 1 or 4 (0 source pages, no routing entries).
New-Page Proposals
- ingredients/clay-geophagy.md: lawluvi2026 plus at least 2 other sources in earlier batches document heavy metals in geophagic clay as a dietary exposure pathway for pregnant women in sub-Saharan Africa and diaspora populations. Threshold not yet met (need 5 sources) but flagging early. Karen to decide.
Strategic Note: P4 Processing Pivot
Manifest analysis (run 2026-05-12) confirms 2,506 P4 papers have food matrix terms in text_mined_ingredients. These have substantially higher yield than the unfiltered year-descending sort. P4 batch 3 and all subsequent P4 batches process this food-matrix-filtered subset first (sorted year-descending, 2025 first), then return to the non-food-matrix P4 set.
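The selection rule described above could be sketched roughly as follows. This is a hedged illustration, not the pipeline's actual code: only text_mined_ingredients is named in this report, while the `handle`, `tier`, and `year` keys, the list-of-dicts manifest shape, and the function name are assumptions. Pushing OCR-artifact years (above 2025) to the end of the queue is also an assumed reading of "2025 first".

```python
MAX_PLAUSIBLE_YEAR = 2025  # assumption: later years are OCR artifacts


def select_food_matrix_batch(manifest, batch_size=200):
    """Return handles for the next food-matrix-filtered P4 batch.

    `manifest` is assumed to be a list of dicts with hypothetical keys
    "handle", "tier", "year", and "text_mined_ingredients".
    """
    # Keep only P4 rows whose ingredient field is non-empty.
    candidates = [
        row for row in manifest
        if row.get("tier") == "P4" and row.get("text_mined_ingredients")
    ]

    def sort_key(row):
        year = row["year"]
        is_artifact = year > MAX_PLAUSIBLE_YEAR
        # Plausible years first (newest to oldest), artifact years last.
        return (is_artifact, -year)

    candidates.sort(key=sort_key)
    return [row["handle"] for row in candidates[:batch_size]]


# Toy manifest with made-up handles, for illustration only.
example_manifest = [
    {"handle": "FM_A", "tier": "P4", "year": 2027, "text_mined_ingredients": ["rice"]},
    {"handle": "FM_B", "tier": "P4", "year": 2025, "text_mined_ingredients": ["fish"]},
    {"handle": "FM_C", "tier": "P4", "year": 2024, "text_mined_ingredients": []},
    {"handle": "FM_D", "tier": "P3", "year": 2025, "text_mined_ingredients": ["milk"]},
    {"handle": "FM_E", "tier": "P4", "year": 2023, "text_mined_ingredients": ["wheat"]},
]
example_batch = select_food_matrix_batch(example_manifest, batch_size=2)
```

In the toy manifest, the 2027 row is deferred, the row with no ingredient terms and the non-P4 row are dropped, and the batch is filled from 2025 downward.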
Food-matrix P4 breakdown by year: 2025=650, 2024=519, 2023=347, 2022=318, 2021=229, 2020=266, plus 11 OCR-artifact 2026–2029. Top ingredient terms: rice (442), fish (434), fruit (217), wheat (204), meat (201), milk (166).
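The breakdown can be reconciled against the 2,506 total with a few lines of Python. The sketch below uses only the figures quoted in this report; the variable names are illustrative, and the batch estimate assumes the batch size stays at 200.

```python
import math

food_matrix_total = 2506
by_year = {2025: 650, 2024: 519, 2023: 347, 2022: 318, 2021: 229, 2020: 266}
ocr_artifact_count = 11  # handles with OCR-artifact years 2026-2029

# Papers covered by the itemized breakdown above.
itemized = sum(by_year.values()) + ocr_artifact_count

# Papers in the 2,506 that the breakdown does not itemize.
not_itemized = food_matrix_total - itemized

# Batches of 200 needed to clear the 2025 papers alone.
batches_for_2025 = math.ceil(by_year[2025] / 200)

print(itemized, not_itemized, batches_for_2025)
```

The itemized years account for 2,340 papers, so 166 of the 2,506 fall outside the listed breakdown; the report does not say which years those are. At 200 handles per batch, the 2025 papers alone take four batches.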
Source Pages Inventory
| cite_key | metals | matrices | tier |
|---|---|---|---|
| ozkutlu2026-wheat-cd-zinc-mitigation | Cd | wheat-grain | A |
| chaura2026-phaseolus-multiomics-ionomics | Pb, Cd, Ni | legume-grain | A |
| jaramillo-mazo2026-cacao-cd-bacteria | Cd | cacao-soil | A |
| kim2026-mixed-pb-mehg-cd-hippocampus | Pb, MeHg, Cd | in-vitro | A |
| lawluvi2026-maternal-geophagy-ghana | As, Cd, Cr, Pb, U, tHg | clay-geophagy | A |
| auzier-guimaraes2025-mercury-tapajos-fish | tHg, MeHg | freshwater-fish | A |
| naz2025-trace-elements-punjnad-fish | Pb, Cd, Cr, Ni | freshwater-fish | A |
Batch Commits
- ff0cfa6: group 1, 0 pages, 50 FP
- da722b6: group 2, 3 pages, 47 FP
- ca21145: group 3, 4 pages, 46 FP
- ecb357e: group 4, 0 pages, 50 FP
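The per-commit tallies can be rolled up to confirm the header figures (7 source pages, 193 false positives, 96.5%). A minimal Python sketch, using the commit hashes and counts listed above; the data structure is illustrative.

```python
# (commit, group, source pages, false positives) as listed above.
commits = [
    ("ff0cfa6", 1, 0, 50),
    ("da722b6", 2, 3, 47),
    ("ca21145", 3, 4, 46),
    ("ecb357e", 4, 0, 50),
]

total_pages = sum(pages for _, _, pages, _ in commits)
total_fp = sum(fp for _, _, _, fp in commits)
total_handles = total_pages + total_fp
fp_rate = total_fp / total_handles

# False-positive rate within each group of handles.
group_fp_rates = {
    group: fp / (pages + fp) for _, group, pages, fp in commits
}

print(f"{total_pages} pages, {total_fp} FP, {fp_rate:.1%} of {total_handles} handles")
```

Each group covers 50 handles, and the totals match the header: 7 pages and 193 false positives out of 200 handles.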