P4 Batch 2 — Ingest Report

Date: 2026-05-12
Handles processed: 200 (FM_12964337 to FM_11848250, P4 tier, year-descending, pre-food-matrix-filter)
Source pages created: 7
False positives: 193 (96.5%)

Summary

P4 batch 2 confirms the pattern discovered in batch 1: the unfiltered P4 year-descending sort yields ~94–97% false positives. OCR year artifacts surface 2026–2029 handles first, and most of those are materials science or clinical case-report papers in which metal names appear as semiconductor dopants (PbI₂, CdS, As in DFT) or as surgical hardware rather than as food contaminants. The 200 handles span two processing waves: groups 1 and 4 yielded 0 relevant pages each; groups 2 and 3 yielded 3 and 4 respectively.

Strategic pivot for batch 3: Switch to food-matrix-filtered P4 processing. The manifest’s text_mined_ingredients field identifies 2,506 P4 papers with food terms; these are processed first, year-descending from 2025. Batch 3 starts with the top 200 of those 2,506.

Source Pages Created

Group 2 (FM_13008638 – FM_13055536)

ozkutlu2026-wheat-cd-zinc-mitigation (FM_13038746 area; actual handle FM in batch) Zinc fertilization study, Turkey/Jordan wheat. Zn application reduces grain Cd by up to 70% at low ambient soil Cd, but the effect is limited at high contamination. Moderate evidence for mitigation strategy. Evidence tier: A. Metals: Cd. Matrices: wheat-grain.

chaura2026-phaseolus-multiomics-ionomics (cacao/bean multiomics study) Phaseolus bean ionomics panel. Most accessions are below 50 ppb Pb in grain, but one outlier reached 3,311 ppb Pb in a high-Pb-soil trial, which highlights the genotype × soil interaction. Evidence tier: A. Metals: Pb, Cd, Ni. Matrices: legume-grain.

jaramillo-mazo2026-cacao-cd-bacteria (cacao rhizosphere microbiology) Colombian cacao plots. Flavobacterium sp. tolerates and partially sequesters Cd in the rhizosphere; the study also covers bacterial Cd dynamics under different soil management. Indirect food safety relevance (supply chain / mitigation). Evidence tier: A. Metals: Cd. Matrices: cacao-soil.

Group 3 (FM_11741060 – FM_11761173)

kim2026-mixed-pb-mehg-cd-hippocampus (mixture neurotoxicity) In vitro hippocampal neuronal toxicity study, Pb + MeHg + Cd mixture. Synergistic cytotoxicity at low individual concentrations; mixture IC50 significantly below additive prediction. Relevant for cumulative exposure assessment and health pages. Evidence tier: A. Metals: Pb, MeHg, Cd. Matrices: in-vitro.

lawluvi2026-maternal-geophagy-ghana (geophagy clay safety) Survey of geophagic clays consumed by pregnant women in Ghana. Clays contain As, Cd, Cr, Pb, U, tHg at concentrations exceeding safety limits in multiple cases. Direct food safety relevance for exposed population. Evidence tier: A. Metals: As, Cd, Cr, Pb, U, tHg. Matrices: clay-geophagy.

auzier-guimaraes2025-mercury-tapajos-fish (PRISMA systematic review — Tapajós) Systematic review of 36 studies, 14,113 individual fish from the Tapajós River basin, Brazil. 89% of species have target hazard quotient ≥1 for mercury via fish consumption; 19 of 21 species exceed MeHg guidance thresholds for vulnerable populations. High-quality synthesis. Evidence tier: A. Metals: tHg, MeHg. Matrices: freshwater-fish.

naz2025-trace-elements-punjnad-fish (Pakistan freshwater fish) Five commercial freshwater species from Punjnad, Pakistan. Pb up to 46.03 mg/kg in Wallago attu liver (extreme outlier: liver, not fillet). Fillet values are lower, but Pb and Cd are still elevated. Evidence tier: A. Metals: Pb, Cd, Cr, Ni. Matrices: freshwater-fish.

False Positives by Category

Of the 193 false positives across all four groups (category counts are approximate):

  • Perovskite/photovoltaic materials science: ~35 handles. RSC Advances, Chemical Science, ACS AMI papers on CsPbI₃ solar cells, halide double perovskites, CdS quantum dots, SnO monolayers, graphdiyne photovoltaics, Ga₂O₃ dopants. Metal vocabulary (Pb, Cd, As, Sn) is entirely for semiconductor fabrication.
  • Medical case reports and clinical medicine: ~40 handles. Cureus and similar journals — Chagas disease, ICD/pacemaker leads, neurocysticercosis, anasarca, CMV viremia, SIRT1 dermal fibroblast, Wernicke encephalopathy, rhabdomyolysis, amyloidosis, Yamaguchi syndrome. No food pathway.
  • Electrochemical sensors and analytical chemistry (non-food): ~25 handles. Sensors for water treatment, wastewater, environmental monitoring without food-matrix validation.
  • Environmental/agricultural remediation without food occurrence: ~30 handles. Arsenic removal from groundwater, soil stabilization, phytoremediation, constructed wetlands.
  • Pharmacology, biomedicine, veterinary: ~20 handles. Drug-metal interactions, anti-cancer metal complexes, veterinary poultry/livestock management without contamination data.
  • Other off-topic: ~43 handles. Seismics/metamaterials/THz devices, IRB administrative papers, academic performance surveys, ecological modeling, ocean plastic degradation.
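
As a quick consistency check, the approximate category counts above can be tallied against the 193 total. This is a minimal sketch: the dictionary keys are shorthand labels chosen here for illustration, not identifiers from the pipeline, and the printed shares inherit the approximation of the counts.

```python
# Approximate false-positive counts per category, copied from the list above.
# The short keys are illustrative labels, not pipeline identifiers.
fp_by_category = {
    "perovskite_pv_materials": 35,
    "clinical_case_reports": 40,
    "non_food_sensors": 25,
    "remediation_no_food_occurrence": 30,
    "pharma_biomed_vet": 20,
    "other_off_topic": 43,
}

total_fp = sum(fp_by_category.values())
assert total_fp == 193  # matches the false-positive count in the report header

# Print each category with its share of all false positives, largest first.
for name, count in sorted(fp_by_category.items(), key=lambda kv: -kv[1]):
    print(f"{name:32s} {count:3d}  {100 * count / total_fp:5.1f}%")
```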

Key Findings

The highest-value pages from this batch are:

  1. auzier-guimaraes2025 — PRISMA-quality Tapajós mercury systematic review; 14,113 fish observations; strong anchor for freshwater fish MeHg exposure.
  2. kim2026-mixed-pb-mehg-cd-hippocampus — mixture synergy at sub-effect-level concentrations; important for cumulative exposure framing.
  3. lawluvi2026-maternal-geophagy-ghana — geophagy clays as overlooked exposure pathway; routing flagged unresolved (clay-geophagy ingredient page does not yet exist).

Routing Audit Notes

  • routing_unresolved.csv gained entries for clay-geophagy (lawluvi2026) because the page does not exist yet. It has been proposed to Karen as ingredients/clay-geophagy.md and is awaiting approval.
  • routing_malformed.csv had 5 entries from the group 2 commit (resolved in a subsequent commit).
  • No issues with groups 1 or 4 (0 source pages, no routing entries).

New-Page Proposals

  • ingredients/clay-geophagy.md — lawluvi2026 plus at least 2 other sources in earlier batches document heavy metals in geophagic clay as a dietary exposure pathway for pregnant women in sub-Saharan Africa and diaspora populations. Threshold not yet met (need 5 sources) but flagging early. Karen to decide.

Strategic Note: P4 Processing Pivot

Manifest analysis (run 2026-05-12) confirms 2,506 P4 papers have food matrix terms in text_mined_ingredients. These have substantially higher yield than the unfiltered year-descending sort. P4 batch 3 and all subsequent P4 batches process this food-matrix-filtered subset first (sorted year-descending, 2025 first), then return to the non-food-matrix P4 set.
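
The ordering described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the `ManifestRow` shape, the `order_p4_queue` and `next_batch` helpers, and the `MAX_PLAUSIBLE_YEAR` cutoff are all hypothetical, and only the `text_mined_ingredients` field name comes from the manifest. The sketch also assumes that OCR-artifact years are pushed to the end of each subset so that 2025 comes first.

```python
from dataclasses import dataclass, field

# Assumption: any year after this is treated as an OCR artifact.
MAX_PLAUSIBLE_YEAR = 2025


@dataclass
class ManifestRow:
    """Hypothetical manifest row; only text_mined_ingredients is a real field name."""
    handle: str
    tier: str
    year: int
    text_mined_ingredients: list[str] = field(default_factory=list)


def order_p4_queue(rows: list[ManifestRow]) -> list[ManifestRow]:
    """Order P4 rows: food-matrix papers first, then the rest.

    Within each subset, plausible years come first (newest to oldest),
    followed by rows whose year looks like an OCR artifact.
    """
    p4_rows = [r for r in rows if r.tier == "P4"]

    def sort_key(r: ManifestRow) -> tuple[bool, bool, int]:
        has_food_terms = bool(r.text_mined_ingredients)
        is_artifact_year = r.year > MAX_PLAUSIBLE_YEAR
        # False sorts before True, so food-matrix rows and plausible years lead.
        return (not has_food_terms, is_artifact_year, -r.year)

    return sorted(p4_rows, key=sort_key)


def next_batch(rows: list[ManifestRow], size: int = 200) -> list[ManifestRow]:
    """Take the first `size` rows of the ordered queue."""
    return order_p4_queue(rows)[:size]
```

With this key, a food-matrix paper dated 2024 is queued ahead of a non-food-matrix paper dated 2025, which is the point of the pivot.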

Food-matrix P4 breakdown by year: 2025=650, 2024=519, 2023=347, 2022=318, 2021=229, 2020=266, plus 11 OCR-artifact 2026–2029. Top ingredient terms: rice (442), fish (434), fruit (217), wheat (204), meat (201), milk (166).
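
A short arithmetic check on the breakdown above, as a sketch. It shows how much of the 2,506-paper set the listed years account for and how many 200-handle batches the set implies; the assumption that the batch size stays at 200 is ours, and the variable names are illustrative.

```python
import math

# Year counts and totals copied from the breakdown above.
food_matrix_by_year = {2025: 650, 2024: 519, 2023: 347, 2022: 318, 2021: 229, 2020: 266}
ocr_artifact_count = 11       # handles with OCR-artifact years 2026-2029
total_food_matrix = 2506      # P4 papers with food terms in text_mined_ingredients
batch_size = 200              # assumption: later batches keep the batch 2 size

listed = sum(food_matrix_by_year.values()) + ocr_artifact_count
not_broken_out = total_food_matrix - listed
batches_needed = math.ceil(total_food_matrix / batch_size)

print(listed)          # 2340
print(not_broken_out)  # 166
print(batches_needed)  # 13
```

The 166 papers not covered by the listed years are not broken out in this report.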

Source Pages Inventory

| cite_key | metals | matrices | tier |
|---|---|---|---|
| ozkutlu2026-wheat-cd-zinc-mitigation | Cd | wheat-grain | A |
| chaura2026-phaseolus-multiomics-ionomics | Pb, Cd, Ni | legume-grain | A |
| jaramillo-mazo2026-cacao-cd-bacteria | Cd | cacao-soil | A |
| kim2026-mixed-pb-mehg-cd-hippocampus | Pb, MeHg, Cd | in-vitro | A |
| lawluvi2026-maternal-geophagy-ghana | As, Cd, Cr, Pb, U, tHg | clay-geophagy | A |
| auzier-guimaraes2025-mercury-tapajos-fish | tHg, MeHg | freshwater-fish | A |
| naz2025-trace-elements-punjnad-fish | Pb, Cd, Cr, Ni | freshwater-fish | A |

Batch Commits

  • ff0cfa6 — group 1: 0 pages, 50 FP
  • da722b6 — group 2: 3 pages, 47 FP
  • ca21145 — group 3: 4 pages, 46 FP
  • ecb357e — group 4: 0 pages, 50 FP
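
The per-commit figures can be reconciled with the header totals using a small tally. This is a minimal sketch that assumes every handle ends up as either one source page or one false positive; the variable names are illustrative.

```python
# (pages, false positives) per group commit, copied from the list above.
commits = {
    "ff0cfa6": (0, 50),  # group 1
    "da722b6": (3, 47),  # group 2
    "ca21145": (4, 46),  # group 3
    "ecb357e": (0, 50),  # group 4
}

pages = sum(p for p, _ in commits.values())
false_positives = sum(fp for _, fp in commits.values())
handles = pages + false_positives  # assumption: one outcome per handle
fp_rate = 100 * false_positives / handles

print(pages, false_positives, handles, fp_rate)  # 7 193 200 96.5
```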