Category 5 Plant-Milk Corpus Pilot Audit
Trigger
The raw marker/PyTorch corpus contains 23,260 markdown documents under /Users/karenpendergrass/Desktop/heavy-metal-index/raw/markdown. This pilot tests whether ChatGPT can use the corpus without flattening it into unsourced prose or breaking the wiki structure.
Corpus Handling
- Raw corpus files remain immutable.
- Machine triage output is generated under index and
data/corpus/. - Load-bearing papers are promoted to index before product pages cite them.
- Finished-product beverage values stay on product pages and occurrence tables.
- Ingredient pages receive links and routing notes only unless the values are ingredient-only.
Sources Promoted
- milani2023-trace-elements-soy-based-beverages
- damato2026-inorganic-arsenic-rice-based-beverages
- marques2021-trace-elements-milks-plant-based-drinks
Product Pages Updated
- plant-milks-soy-based
- plant-milks-rice-based
- plant-milks-non-soy-non-rice
- regulatory-crosswalk-field-findings
Critical Comparison Layer
The strongest direct comparison is plant-milks-rice-based for iAs: eu2023-arsenic-rice-based-drinks gives 30 ug/kg and damato2026-inorganic-arsenic-rice-based-beverages reports N=25, mean 15 ug/kg, median 15 ug/kg, range 7-24 ug/kg.
The soy row now has useful field findings from milani2023-trace-elements-soy-based-beverages, but compliance comparisons are blocked until Brazilian/MERCOSUR legal rows are loaded directly and unit/basis conversion is reviewed.
The non-soy/non-rice row remains an evidence gap. marques2021-trace-elements-milks-plant-based-drinks supports routing and identifies an oat-drink Pb signal, but numeric comparison is blocked pending source-table review.