P2 Batch 1 Ingest Report
Date: 2026-05-12
Tier: P2 (LOQ source candidates; 488 handles in manifest)
Sub-batches processed: p2-sub1 (50), p2-sub2 (50), p2-sub3 (50), p2-sub4 (50), remaining-group1 (55), remaining-group2 (55), remaining-group3 (55)
1. Summary
| Category | Count |
|---|---|
| P2 handles in manifest | 488 |
| Sub-batch handles attempted (all 7 groups) | ~365 unique |
| Files accessible in raw/markdown | ~210 |
| Files missing from filesystem (raw 2/ not yet Marker-converted) | ~175 |
| Source pages created | 74 |
| False positives skipped (out of scope) | ~132 |
| Food concentration papers (P1-grade finds in P2 tier) | 4 primary |
| Analytical method / LOD-LOQ papers (true P2 content) | ~45 |
| Environmental / exposure-context papers | ~25 |
Source page total (all tiers, cumulative through P2 batch 1): 264
2. P1-Grade Food Concentration Finds
Four papers that the manifest text-mining heuristic classified as P2 contain primary food concentration data meeting HMT&C Path A criteria. All four were ingested. Structured-evidence rows were added to data/evidence/values.jsonl for three of them; the fourth (Chiutula 2025) is queued, as noted in its entry.
FM_11125852 — cantoral2024-lead-levels-mexican-foods
Reclassify: P2 → P1. Cantoral et al. 2024, Toxics 12(5):318. First systematic Pb monitoring across 103 foods and beverages in Mexico City retail. GF-AAS; LOQ 0.0025 mg/kg; duplicate analysis per sample. Key results:
- Infant rice cereal (Brand 2): Pb 1,005 ppb wet weight — 5× FAO/WHO ML of 200 ppb; highest single-sample Pb value for that matrix in current wiki coverage.
- Soy infant formula (Brand 2): Pb 35 ppb — 3.5× FAO/WHO ML of 10 ppb; 3 of 5 formula brands tested were <LOQ.
- Pre-cooked rice: Pb 276 ppb — exceeds FAO/WHO ML 200 ppb.
- Spices with detectable Pb: black pepper 239 ppb, turmeric 176 ppb, paprika 92 ppb (no FAO/WHO MLs for these matrices).
- Overall: 19 of 103 items with detectable Pb; 4 exceeded FAO/WHO MLs.
- Note: single retail purchase per brand; not a distribution estimate.
3 values.jsonl rows added: p2-infant-rice-cereal-cantoral2024-pb-single, p2-infant-formula-soy-cantoral2024-pb-single, p2-rice-grain-cantoral2024-pb-single.
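The exceedance multiples quoted for Cantoral 2024 are plain ratios of measured concentration to the applicable maximum level. A minimal sketch using only the three values and MLs listed above; the function name and dictionary layout are illustrative and not part of the wiki tooling:

```python
# Illustrative only: names and structure are assumptions, not wiki tooling.
# Each entry is (measured Pb in ppb, FAO/WHO ML in ppb) as quoted in this report.
samples = {
    "infant rice cereal (Brand 2)": (1005, 200),
    "soy infant formula (Brand 2)": (35, 10),
    "pre-cooked rice": (276, 200),
}


def exceedance_ratio(measured_ppb: float, ml_ppb: float) -> float:
    """Return the measured concentration as a multiple of the maximum level."""
    return measured_ppb / ml_ppb


ratios = {name: exceedance_ratio(c, ml) for name, (c, ml) in samples.items()}

for name, ratio in ratios.items():
    status = "exceeds ML" if ratio > 1 else "within ML"
    print(f"{name}: {ratio:.1f}x ML ({status})")
```

A ratio above 1 marks an exceedance. Because each value comes from a single retail purchase, the ratios describe individual samples and not a distribution.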
FM_11617688 — tian2024-voltammetric-ias-rice
Reclassify: P2 → P1. Tian et al. 2024, Food Chemistry (LC-ICP/MS arsenic speciation in Chinese commercial rice). Anion exchange HPLC confirms iAs speciation: As(III) at 2.5 min, As(V) at 8.0 min. 36 samples from 5 Chinese provinces.
- Mean iAs: 188 ppb dry weight
- Max iAs: 345 ppb (Sample 22) — 1.7× China GB 2762 ML of 200 ppb
- P90: 267 ppb dry weight
- Range: 101–345 ppb; all 36 samples exceeded EU ML for polished rice (100 ppb); 3 samples exceeded China GB 2762.
- Primary method: LC-ICP/MS (speciation confirmed); voltammetric method used for comparison only — iAs classification based on chromatographic speciation, not total As.
3 values.jsonl rows added: p2-rice-tian2024-ias-cn-mean, p2-rice-tian2024-ias-cn-max, p2-rice-tian2024-ias-cn-p90.
FM_11009735 — wehmeier2023-ias-rice-cola-field-method
Reclassify: P2 → P1. Wehmeier et al. 2023, Communication — iAs in 30 Austrian market rice products by HPLC-ICP-MS vs. field-deployable Cola extraction method.
- Range: 60–249 ppb iAs dry weight across 30 products
- Highest (R27, unpolished rice): 249 ppb, just below the EU ML of 250 ppb for unpolished rice
- 22 of 30 samples would fail if the EU ML of 100 ppb for infant rice cereal were applied to all products
- Method validation: Cola extraction compared to reference HPLC-ICP-MS for field deployment
2 values.jsonl rows added: p2-rice-wehmeier2023-ias-at-range-min, p2-rice-wehmeier2023-ias-at-range-max.
FM_12652890 — chiutula2025-wastewater-vegetables-malawi
Chiutula et al. 2025 — Cd, total Cr, Pb in wastewater-irrigated vegetables at Blantyre, Malawi. ICP-OES. Multiple FAO/WHO ML exceedances:
- Total Cr up to 4,650 ppb (FAO/WHO ML 2,300 ppb for vegetables)
- Cd up to 310 ppb (FAO/WHO ML 200 ppb for leafy vegetables)
- Pb up to 4,090 ppb (FAO/WHO ML 300 ppb)
- Note: total Cr only; paper does not speciate Cr-VI.
No values.jsonl rows added this pass (wastewater-irrigated matrix is geographically specific; queued for ingredient-level update when wastewater irrigation sub-profile is developed).
3. Analytical Method Papers (True P2 LOQ Content)
The P2 tier correctly identified approximately 45 analytical chemistry papers with validated LOD/LOQ data. These cover the main detection methods used in heavy metals food analysis:
Electrochemical sensors: Bozkurt 2025 (Pb, drinking water, portable voltammetric), Doan 2025 (MOF/rGO Pb sensor), Godja 2025 (Ni electrochemical), Jia 2025 (Au nanocluster simultaneous Pb/Cd), Ngok 2025 (ZnO-iron oxide-Au As(V)), Wang 2025 (MnO2-biochar Cd in rice), various others.
Optical / fluorescence sensors: Dhawale 2025 (benzidine chemosensor Hg in vegetable juice), Islam 2025 (AgNP colorimetric Hg), Chen 2024 (BODIPY Hg fluorescence in milk), Fei 2024 (Cd off-on fluorescence milk), Luo 2024 (Cd ADA-VBB food sensor), Zhao 2024 (ZnO-Si Cd fluorescence), various others.
SERS: Wang 2025 (nanogap SERS simultaneous Hg/Pb/Cd), Chepak 2023 (light-harvesting Hg nanoprobe), various others.
Mercury speciation (validated methods): Carter 2025 (FDA-validated TDA-AAS/SALLE for MeHg/tHg in finfish; LOD 3.8 ppb wet weight), Wu 2026 (whole-cell biosensor MeHg; LOD 0.04 nM), Yamashita 2024 (LAEP-OES Hg speciation in tuna), Qin 2025 (comparison AFS vs CAAS for Hg in soil).
Chromium-VI specific: Seesuan 2025 (DES-EDTA Cr-VI colorimetric), Wang 2025 (MFC Cr-VI sensor wastewater), Zandi 2025 (carbon quantum dots Cr-VI in coffee), Zheng 2025 (Pueraria carbon dots Cr-VI), Liu 2024 (Cr nanohorn Cr-VI water sensor), Ngok 2025 (As(V) sensor).
LOD highlights:
- Carter 2025 (FDA TDA-AAS MeHg): LOD 3.8 ppb, LOQ 27 ppb (wet weight fish tissue) — government-validated method
- Wu 2026 (whole-cell biosensor MeHg): LOD 0.04 nM (~8 ng/L as Hg, about 0.008 ppb) in pure solution
- Kayani 2025 (ratiometric Hg sensor): LOD 0.83 nM in water
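Molar LODs can be put on the same mass basis as the ppb figures by multiplying by the molar mass. A sketch, assuming the LOD is expressed as elemental Hg (200.59 g/mol) in a dilute aqueous solution where 1 µg/L is approximately 1 ppb; the function name is illustrative:

```python
HG_MOLAR_MASS = 200.59  # g/mol, elemental Hg (assumption: LOD expressed as Hg)


def nm_to_ppb(conc_nm: float, molar_mass: float = HG_MOLAR_MASS) -> float:
    """Convert a nanomolar concentration to µg/L (~ppb in dilute water)."""
    # nmol/L * g/mol gives ng/L; dividing by 1000 gives µg/L.
    return conc_nm * molar_mass / 1000.0


wu_lod_ppb = nm_to_ppb(0.04)      # Wu 2026 whole-cell biosensor
kayani_lod_ppb = nm_to_ppb(0.83)  # Kayani 2025 ratiometric sensor

print(f"Wu 2026 LOD:     {wu_lod_ppb:.4f} ppb")
print(f"Kayani 2025 LOD: {kayani_lod_ppb:.3f} ppb")
```

These are solution-phase figures, so they are not directly comparable with the wet-weight tissue LOD reported for Carter 2025.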
Testing methodology pages flagged for creation (wiki/testing/): ICP-MS principles, arsenic speciation methods, mercury speciation methods. These stub pages would aggregate the LOD/LOQ data captured here.
4. Environmental and Exposure-Context Papers
Papers not reporting food concentrations but contributing supply-chain, environmental, or exposure context:
- Bousquet 2024 (FM_11120698): Pb in drinking water at UNC-CH; 5,954 fixture tests; 8.43% >1 ppb LOD; max 1,100 ppb. Relevant for formula reconstitution water exposure pathway.
- Zuhlke 2026 (FM_12947684 equivalent): Pb in US water kiosks; 15/20 kiosks >0.05 µg/L.
- Rusko 2026 (FM_12984848 equivalent): Hg speciation in Latvian fish; risk-benefit framework.
- Lepak 2025: MeHg correction factors for sport fish (Colorado); mercury method comparison.
- Wang 2024 (FM_10970330): Environmental Pb from cemetery waste — soil pathway context.
- Rodriguez-Rodriguez 2026: Sargassum biofertilizer and trace elements in tomato — supply-chain pathway (fertilizer → soil → crop).
- Gundogdu 2025: Al in albumin infusion solutions — pharmaceutical exposure, not food; no product page update.
- Kanazawa 2024: Hg speciation in ASGM communities (Kenya); artisanal gold mining supply chain context.
5. Additional Food Matrix Papers
Papers with food concentration data not meeting P1 threshold criteria but contributing evidence:
- Wysok 2025: Pb mean 77 µg/kg, tAs mean 36 µg/kg in Polish sheep casings (n not specified; A-tier analytical chemistry journal; useful for meat-derivatives context).
- Altunay 2023: Cd in food samples, Turkey; voltammetric method validation with real samples.
- Brzezinska-Rojek 2023: Heavy metals in beetroot supplements; dietary supplement safety context.
- Silva 2023: Rice iAs and co-occurring mycotoxins in Portuguese market rice; iAs and Ochratoxin A co-contamination.
- Sirisangarunroj 2023: Heavy metals in Thai fish; health risk assessment.
- Kovacik 2024: Heavy metals in grass carp muscle; fish matrix.
- Dogruyol 2024: Heavy metals in Mediterranean mussels; seafood matrix; health risk assessment.
- Valizadeh 2023: Heavy metals in canned beans (Iran); legume matrix.
- Naccari 2025: Heavy metals in honey; n unspecified; Al, As, Cd, Pb, multi-element.
- Zhang 2024 (MIP-Pb): Pb detection in honey (Cyprus); sensor validation with food matrix.
- Kim 2024: Metal migration from food containers (Korea); packaging pathway.
- Wang 2025 (MOF-Bi-Cd): Cd soil-to-cup pathway in tea; combines soil and beverage measurement.
- Yang 2024 (LIBS Cd Panax): Cd in Panax notoginseng (Chinese traditional medicine plant); not a consumer food product.
6. False Positives — Not Ingested
~132 papers skipped as out of scope for heavy metals in food. Common categories:
- Bacteria / pathogen detection sensors (most common false positive): papers testing aptasensors, immunosensors, or colorimetric sensors for Bacillus cereus, Salmonella, Listeria, E. coli, etc. that mention metals only in sensor fabrication.
- Mycotoxin sensors: Aflatoxin B1, Ochratoxin A, fumonisin detection (metal mentioned as electrode material).
- Pesticide and herbicide sensors: Fenitrothion, chloramphenicol, organophosphate detection.
- Bisphenols and plasticizers: BPA, BPS detection sensors.
- Non-food water quality: Groundwater arsenic removal, wastewater chromium treatment (no food measurement).
- Pharmaceutical / clinical: Drug metabolite sensors with no food connection.
- Materials science only: SERS substrate fabrication, photocatalysis, no analytical application to food.
The manifest’s P2 text-mining heuristic catches LOD/LOQ language but has a ~35–40% precision rate for true heavy-metals-in-food papers. This is expected given how broadly “LOQ” language appears in analytical chemistry literature.
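The precision range can be checked against the Summary counts. A back-of-envelope sketch, treating every ingested source page as a true positive (an assumption; the report does not define precision formally) and using the approximate counts as given:

```python
# Counts taken from the Summary table; the false-positive and
# accessible-file figures are approximate in the report.
source_pages_created = 74
false_positives_skipped = 132
files_accessible = 210

# Precision over papers that were actually reviewed and classified.
precision_vs_reviewed = source_pages_created / (
    source_pages_created + false_positives_skipped
)

# Precision over all files that were accessible in raw/markdown.
precision_vs_accessible = source_pages_created / files_accessible

print(f"vs reviewed:   {precision_vs_reviewed:.1%}")
print(f"vs accessible: {precision_vs_accessible:.1%}")
```

Both denominators land inside the stated ~35–40% band, so the estimate does not depend on which one is used.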
7. Missing Handles
~175 P2 handles from the manifest are absent from raw/markdown/. These handles appear to correspond to papers in the untracked raw 2/ directory, which holds PDFs that have not yet been converted with Marker. Until Marker conversion is run on raw 2/, these papers cannot be ingested.
Action required (Karen): Run Marker conversion on raw 2/ → raw/markdown/. After conversion, re-run P2 remaining handles against the updated filesystem.
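After conversion, the remaining handles can be re-checked against the filesystem before the re-run. A minimal sketch, assuming each converted paper is stored as `<handle>.md` directly under raw/markdown/ (the naming scheme and the function name are assumptions, not confirmed by the manifest):

```python
from pathlib import Path


def find_missing_handles(manifest_handles, markdown_dir):
    """Return manifest handles with no matching Markdown file, sorted.

    Assumes (hypothetically) one file per handle named '<handle>.md'.
    """
    present = {p.stem for p in Path(markdown_dir).glob("*.md")}
    return sorted(h for h in manifest_handles if h not in present)


# Example call; the handle list would come from the P2 manifest.
# missing = find_missing_handles(["FM_11125852", "FM_11617688"], "raw/markdown")
```

An empty result for the P2 handle list would indicate that every remaining paper is available for ingest.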
8. New-Page Proposals
No new ingredient, product, or regulation pages are proposed from P2 batch 1. Source-page frontmatter links to existing pages. The wiki/testing/ stub pages for ICP-MS, arsenic speciation, and mercury speciation remain unresolved targets and are surfaced here as a standing proposal for Karen’s approval.
9. Values.jsonl Additions
8 new rows added (lines 550–557):
- cantoral2024: Pb in infant rice cereal (MX, 1005 ppb), soy infant formula (MX, 35 ppb), rice grain (MX, 276 ppb)
- tian2024: iAs in Chinese commercial rice — mean 188 ppb, max 345 ppb, p90 267 ppb
- wehmeier2023: iAs in Austrian market rice — range min 60 ppb, range max 249 ppb
Total values.jsonl rows after P2 batch 1: 557 (was 549 after P1)
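The row total can be verified directly against the file. A sketch, assuming data/evidence/values.jsonl holds one JSON object per non-blank line (the helper name is illustrative):

```python
import json
from pathlib import Path


def count_rows(path):
    """Count non-blank lines in a JSONL file, parsing each to catch bad rows."""
    n = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed row
                n += 1
    return n


# 549 rows after P1, plus 3 (cantoral2024) + 3 (tian2024) + 2 (wehmeier2023).
expected_total = 549 + 3 + 3 + 2

# Example check against the real file:
# assert count_rows("data/evidence/values.jsonl") == expected_total
```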
10. Commits
- ffb0c50 - P2 sub-batch 1: 26 analytical-method source pages (2025–2026 sensor/biosensor)
- 7dd9201 - P2 sub-batch 2: Kayani 2025 Hg ratiometric sensor
- 38d88bd - P2 sub-batch 3: cantoral2024, bousquet2024, atanasov2024
- c4878be - P2 sub-batch 4: 7 analytical-method source pages
- 1d941b6 - P2 remaining-group1: chiutula2025 + 4 sensors
- 3030042 - P2 remaining-group2: 23 source pages (sensors + food matrices)
- 3f107ba - P2 remaining-group3: 9 source pages (food matrix papers)
- (this commit) - P2 batch 1 close: values.jsonl +8 rows, batch report, log entry
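As a cross-check, the per-commit page counts can be tallied against the 74 source pages reported in the Summary. The counts below are read off the commit messages, with named papers counted one page each (an assumption about how the messages map to pages):

```python
# Source pages per commit, inferred from the commit messages above.
pages_per_commit = {
    "ffb0c50": 26,  # sub-batch 1
    "7dd9201": 1,   # sub-batch 2: Kayani 2025
    "38d88bd": 3,   # sub-batch 3: cantoral2024, bousquet2024, atanasov2024
    "c4878be": 7,   # sub-batch 4
    "1d941b6": 5,   # remaining-group1: chiutula2025 + 4 sensors
    "3030042": 23,  # remaining-group2
    "3f107ba": 9,   # remaining-group3
}

total_pages = sum(pages_per_commit.values())
print(f"Source pages across commits: {total_pages}")
```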