Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. The construction is direct: for each non-member of the target class, build a clause that excludes it using feature-value differences. The conjunction of all such clauses is a CNF formula that accepts exactly the target class.
Snake implements this constructively: oppose() finds literals that separate classes, minimize_clause() removes redundant literals, and the loop terminates when all non-members are covered. No backtracking, no exponential search. Total complexity: O(L × n × b × m) where L = layers, n = samples, b = bucket size, m = features.
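The construction above can be sketched in a few lines. This is a hypothetical stand-in for Snake's `oppose()` and `minimize_clause()` — the real implementations are not shown in this document — assuming samples are tuples of discrete feature values:

```python
def true_on(sample, clause):
    """A clause is a set of (feature, value) literals read as 'feature != value'."""
    return any(sample[i] != v for (i, v) in clause)

def oppose(non_member, members, n_features):
    """Clause that is false on `non_member` but true on every member:
    one 'feature i != non_member[i]' literal per feature where some member differs."""
    return {(i, non_member[i]) for i in range(n_features)
            if any(m[i] != non_member[i] for m in members)}

def minimize_clause(clause, members):
    """Greedily drop literals while the clause stays true on all members."""
    clause = set(clause)
    for lit in sorted(clause):
        trial = clause - {lit}
        if trial and all(true_on(m, trial) for m in members):
            clause = trial
    return clause

def train(members, non_members, n_features):
    """One exclusion clause per non-member; the CNF accepts exactly the members."""
    return [minimize_clause(oppose(x, members, n_features), members)
            for x in non_members]

def accepts(sample, cnf):
    """The CNF accepts a sample when every clause has a true literal."""
    return all(true_on(sample, clause) for clause in cnf)
```

Each `oppose()` clause is true on every member (every member differs from the non-member in at least one included feature) and false on the non-member itself, so the conjunction separates the two sets without any search.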
The benchmark is a 10-dimensional classification problem. A monolithic model would need to predict compound labels like "Optimal with outlier prix and rising trend and high risk" — a combinatorial explosion across 3 × 2 × 2 × 2 × 4 × 3 × 3 × 3 × 3 × 4 = 31,104 possible class combinations.
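The contrast with decomposition is easy to quantify from those per-dimension class counts:

```python
import math

classes = [3, 2, 2, 2, 4, 3, 3, 3, 3, 4]  # classes per dimension
monolithic = math.prod(classes)  # compound labels one model must separate
decomposed = sum(classes)        # total classes across 10 independent models
print(monolithic, decomposed)    # 31104 vs 29
```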
Instead, we decompose into 10 independent Snake models:
| Model | Classes | AUROC | Accuracy |
|---|---|---|---|
| supplier_score | Optimal / Acceptable / Outlier | 0.9715 | 85.6% |
| outlier_prix | Normal / Outlier | 0.9517 | 95.7% |
| outlier_delai | Normal / Outlier | 0.9385 | 94.0% |
| outlier_moq | Normal / Outlier | 0.9412 | 94.8% |
| tendance_prix | Stable / Hausse / Baisse / Volatile | 0.9841 | 96.7% |
| fiabilite | High / Medium / Low | 0.9634 | 97.3% |
| conditions_rating | Favorable / Standard / Defavorable | 0.9731 | 96.3% |
| competitivite | Competitive / Average / Uncompetitive | 0.9333 | 95.7% |
| risque_approvisionnement | Low / Medium / High | 0.9552 | 96.0% |
| recommandation | Maintenir / Developper / Negocier / Exclure | 0.9734 | 94.5% |
| Average | | 0.9585 | 94.7% |
Each model has 2-4 classes. Each is easy to train (3000 samples, 15 layers, <15s). The 10 predictions compose into a complete supplier profile: the buyer sees one table with 10 independent assessments, each with its own probability and audit trail.
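Composing the 10 predictions is plain dictionary assembly. A minimal sketch with a stub model — the `predict` interface here is an assumption for illustration, not Snake's actual API:

```python
class StubModel:
    """Stand-in for a trained Snake model (hypothetical interface)."""
    def __init__(self, label, proba):
        self.label, self.proba = label, proba

    def predict(self, features):
        return self.label, self.proba

def build_profile(models, features):
    """One independent (label, probability) pair per dimension."""
    return {name: {"label": label, "proba": proba}
            for name, (label, proba) in
            ((n, m.predict(features)) for n, m in models.items())}

models = {"supplier_score": StubModel("Optimal", 0.97),
          "outlier_prix": StubModel("Normal", 0.95)}
profile = build_profile(models, {"score_prix": 0.85})
```

Because the models are independent, a failure or retrain of one dimension never touches the other nine.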
This formula was used to generate training labels for the supplier_score model. But Snake doesn't know about these weights. It learns the classification boundary from data, building SAT clauses that approximate the weighted sum without being told the weights.
The resulting clauses are interpretable. A Snake audit trail reads like: "if score_prix > 0.8 and score_delai > 0.6 → Optimal." The weights emerge from clause structure, not from a parameter vector. This is the explainability payoff of SAT-based classification: the model's decision is a boolean formula you can read.
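Rendering such a trail is mechanical once the minimized literals are available. A hypothetical formatter — the (feature, operator, threshold) triple representation is an assumption, not Snake's serialisation:

```python
def render_rule(conditions, label):
    """Format a minimized clause as a human-readable audit line."""
    body = " and ".join(f"{feat} {op} {thr}" for feat, op, thr in conditions)
    return f"if {body} → {label}"

rule = render_rule([("score_prix", ">", 0.8), ("score_delai", ">", 0.6)], "Optimal")
# rule == "if score_prix > 0.8 and score_delai > 0.6 → Optimal"
```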
A procurement manager doesn't think in composite scores. They think: "this supplier is cheap but slow." The spider chart (radar) renders 5 independent axes — and behind each axis, a dedicated Snake model validates the assessment. The composite score is for ranking. The individual axes are for understanding.
The benchmark classifier also yields the best example of iterative improvement. Three models had broken minority classes in V1:
| Model | Problem (V1) | Fix (V2) | Result |
|---|---|---|---|
| outlier_prix | 6% Outlier class → 0% recall | Rebalanced to 40% | 93.9% recall |
| recommandation | 2% Exclure class → 0% recall | Oversampled to 20% | 92.5% recall |
| risque_appro. | Fuzzy features → 84% accuracy | Sharper risk formula | 96.0% accuracy |
The fix was mechanical: generate more samples of the minority class with deliberate feature separation. outlier_prix went from 96 Outlier samples out of 2000 (5%) to 1200 out of 3000 (40%). The model immediately learned the boundary. Snake doesn't need massive data — it needs representative data.
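The rebalancing step can be sketched as follows. Plain duplication here stands in for generating new samples with deliberate feature separation; the function and its parameters are illustrative, not the benchmark's actual generator:

```python
import random

def rebalance(samples, labels, minority, target_share, seed=0):
    """Oversample `minority` until it makes up `target_share` of the data."""
    rng = random.Random(seed)
    pool = [s for s, l in zip(samples, labels) if l == minority]
    while labels.count(minority) / len(labels) < target_share:
        samples.append(rng.choice(pool))
        labels.append(minority)
    return samples, labels
```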
The benchmark service handles multiple suppliers, each with their own product codes and potentially their own language:
- GlassCorp: "VS-FEU442" → "Feuillete 44.2 LowE One"
- VitroSupply: "VTS-4420-LE" → "Feuillete 44.2 Low-E"
- EuroVerre: "EV-FEU-44.2-LE" → "Laminated 44.2 Low-E coated"
These are the same product, and the harmonisation pipeline resolves them to a single internal reference. Cross-language article matching is solved by the LLM layer. Snake works on the normalised refs — it never sees the raw multilingual descriptions. The LLM handles ambiguity, Snake handles classification. Clean separation of concerns.
```
POST /classify
  → extraction.py: Claude Haiku harmonises N supplier refs
  → articles_harmonises (normalised, matched to internal refs)
  → classification.py: 10 Snake models per offer
      → supplier_score, outlier_prix, outlier_delai, outlier_moq
      → tendance_prix, fiabilite, conditions_rating
      → competitivite, risque_approvisionnement, recommandation
  → routes.py: assembles benchmark response
      → ranking + outliers + spider + xai audit
```
All 10 Snake models load at startup from JSON. Inference is ~50ms per model per offer. The bottleneck is Claude extraction (~1.5-2s). Total latency: ~2.4s.
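A sketch of the startup load. The file layout and JSON schema are assumptions for illustration; the actual serialisation format is not shown in this document:

```python
import json
import pathlib

MODEL_NAMES = ["supplier_score", "outlier_prix", "outlier_delai",
               "outlier_moq", "tendance_prix", "fiabilite",
               "conditions_rating", "competitivite",
               "risque_approvisionnement", "recommandation"]

def load_models(model_dir):
    """Read each model's clauses from disk once at startup; inference
    then runs in memory with no further disk or network access."""
    model_dir = pathlib.Path(model_dir)
    return {name: json.loads((model_dir / f"{name}.json").read_text())
            for name in MODEL_NAMES}
```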
/comprendre defaults to anthropic: false. In this mode, no LLM is called — the text is parsed by regex, each product line becomes its own article (no cross-supplier matching), and the 10 Snake models run on the raw extracted features.
```
POST /comprendre (anthropic=false, default)
  → regex parser: extracts prices, delays, MOQ from text
  → passthrough harmonise: 1 line = 1 article (no matching)
  → 10 Snake models: score, outlier, trend, risk, recommendation
  → full benchmark response, mode: "regex", quality: 0.45
```
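The regex step can be sketched as below. The patterns, field names, and the assumed line shape are illustrative assumptions, not the service's actual parser:

```python
import re

# One offer line → structured fields. Assumed input shape:
#   "VS-FEU442 : 42,50 EUR, delai 15 jours, MOQ: 100"
LINE = re.compile(
    r"(?P<ref>[A-Z][A-Z0-9.-]+).*?"
    r"(?P<prix>\d+(?:[.,]\d+)?)\s*(?:€|EUR).*?"
    r"(?P<delai>\d+)\s*(?:jours?|days?|j\b).*?"
    r"MOQ\s*:?\s*(?P<moq>\d+)",
    re.IGNORECASE)

def parse_line(text):
    """Extract ref, price, lead time and MOQ from one free-text offer line."""
    m = LINE.search(text)
    if m is None:
        return None
    return {"ref": m.group("ref"),
            "prix": float(m.group("prix").replace(",", ".")),
            "delai_jours": int(m.group("delai")),
            "moq": int(m.group("moq"))}
```

Lines that do not match simply return `None`, which is why the regex mode reports a lower quality score than the Haiku path.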
```
POST /comprendre (anthropic=true)
  → Claude Haiku: structures text into JSON offres
  → Claude Haiku: harmonises refs across suppliers
  → 10 Snake models: same pipeline
  → full benchmark response, mode: "haiku", quality: 0.89
```
The anthropic flag is explicit — the caller decides when to spend the ~$0.001 Haiku call. This is intentional: procurement tools run on internal networks where LLM calls may be restricted. Snake-only mode respects that constraint while still delivering actionable scoring.
See /genesis for the complete /comprendre specification.