Research Findings - Regularised p-Adic Regression

Latest Findings (2026-01-05 22:00 AEDT)

NEW: Negative Regularization (λ < 0) Behavior (exp400-402):
- k peaks at λ = 0: 99% of datasets with collinear points show k is maximized at λ=0
- Perfect dual symmetry: k increases as λ→0 from negative (100%), k decreases as λ→+∞ (99%)
- Interpretation: The n+1 theorem (λ=0) is a GLOBAL MAXIMUM for fit count. Both positive and negative regularization reduce k toward 2.
- At extreme negative λ: Lines with maximum coefficient magnitude are preferred, which typically fit only 2 points.
Goal 5 Progress: Threshold Structure Characterized (exp333): The threshold formula λ* = (L₂ - L₁)/(R₁ - R₂) is exactly verified on 639 transitions. Base r dependence quantified: ratio λ(r=1.1)/λ(r=5.0) varies from ~1 to >6000. Added 4 new Lean theorems to SC8Criterion.lean: thresholdValue_pos, before_threshold_first_wins, after_threshold_second_wins, at_threshold_equal. All sorry-free, build succeeds (3050 jobs).
Lean: SC8 Shielding Theorems Added: Extended SC8Criterion.lean with formal proofs of the shielding mechanism. New theorems: crossoverLambda_pos_of_L_lt_and_R_gt, shield_prevents_alt_visit, sc8_of_all_shielded_or_decreasing. All theorems sorry-free, build succeeds (3051 jobs).
BREAKTHROUGH: 7→8 Violations Confirmed in 7D (exp332): Extended the n+1→n+2 pattern to 7D with 38 violations in 2700 cases (1.41%). Observed 7→8 (25), 7→9 (12), and 8→9 (1) transitions. The pattern is now confirmed through all tested dimensions 2D-7D.
Lean: SC8 Formalization (exp323): Created SC8Criterion.lean with formal definitions of optimal paths, crossover points, and the SC8 monotonicity condition. Key theorems proven: sc8_iff_kSequence_monotone, sc8_implies_pareto_monotone.
BREAKTHROUGH: 6→7 Violations Confirmed (exp331): By forcing 7+ points onto a hyperplane, we observed 40 violations in 4500 cases (0.89%). This proves the n+1→n+2 pattern extends to 6D. Violations include both 6→7 (28) and 6→8 (12) transitions.
Geometric Rarity Explains Dimension Scaling: When collinear points are forced, violation rates are consistently ~1% across dimensions. Random data produces fewer over-determined fits as dimension increases due to geometric probability.
SC8 is Exact (exp323): Refined static condition SC8 achieves 100% accuracy with 100% coverage across all tested dimensions (2D-7D). SC8 checks only the FIRST (optimal) winner at each step.
SC7 False Negatives Explained (exp322): Of 67 SC7 false negatives (2.2%), 95.1% are "shielded" — an earlier lower-k transition blocks the path to any higher-k crossover.

Executive Summary

Our experiments confirm the fundamental behavior of regularised p-adic linear regression:

Zero regularisation (λ=0): The optimal line passes through exactly n+1 points, consistent with the theory in arXiv:2510.00043v1
Strong regularisation (λ→∞): The optimal line tends toward passing through only 1 point
Threshold behavior: There exist critical λ values where the number of exactly-fitted points changes discretely

Lean/experiment status (Dec 26, 2025 20:04 AEDT): Lean is sorry-free. In the very-strong GP chain, the main theorem now depends only on general_fit_preserving_exists (k<n); the strong-GP variant still carries simplest_case_product_formula. Maximal-partition c_{v_min} audit (exp277): 55,700 maximal candidates with c_{v_min} ≥ 1 in 55,685 (99.97%); gains+orig = 1 slice: 988/988 have c_{v_min} ≥ 1, so pairing with new_zero_of_maximal_and_gains_orig_one forces new = 0 axiom-free. exp266-269 (proper k<n domain) remain fully validated: correct max_vmin differs from min h-res in 667/669, anchor at max_vmin in 261/261, anchor-in-R delta < max_vmin in 97.2% and cancellation in the rest.

Update (Dec 28, 2025 10:00 AEDT): A canonical wrapper now instantiates the gap-bound + h-residual-drop package for the standard maximal-partition choice E = fits(h') \\ fits(h); the 252/259 witness-present anchor-not-in-E cases auto-close to c_{v_max} = 1. Only the 7 witness-free edge cases still rely on the structural anchor bound placeholder.

Update (Dec 31, 2025 00:06 AEDT): Added an auto wrapper that routes packaged gap-bound/axiom witnesses through the gcd-filled data-core zero-new bridge, removing the remaining manual maximal-partition rewrites in that path.

CRITICAL GAP: Per-R-Unique Hypothesis Fails (2025-12-25 15:30 AEDT)

What we discovered (exp258-262): The per-R-unique hypothesis required by the averaging/pigeonhole approach is FALSE.

The hypothesis: For each R-point, at most ONE anchor-choice produces val(h'-residual) = v_max
Reality: Only 28.3% of configurations satisfy this; up to 2 choices can hit v_max for the same R-point
Violation pattern: Same anchor with different F choices (e.g., anchor=3,F=1 and anchor=3,F=2) both produce h'-residual at v_max for the same R-point
Axiom still valid: Despite this, the axiom (zero-new existence) is STILL 100% validated via exp241
Implication: The mechanism2_total_new_le_R_card approach cannot stand alone; pivot to the max_vmin delta drop/cancellation path (below)

Next steps: Explore direct construction via delta-valuation case analysis, or per-anchor bounds instead of per-R-point bounds.

BREAKTHROUGH: Max_vmin delta structure validated (2025-12-25 16:00 AEDT)

Key insight (exp266-269): With proper max_vmin in the k<n domain, zero-new exists in 100% of gains+orig=1 cases, and the delta geometry at the anchor explains why.

Critical Corrections

max_vmin ≠ min(h-residual vals): Proper max_vmin differs from min(h-res) in 667/669 cases (exp266).
Axiom validation: exp266 (correct k<n domain) finds zero-new in 669/669 cases when gains+orig=1 at max_vmin.
Earlier experiments (exp258-264) tested wrong domain: They used h as an interpolant through n+1 points (k = n+1), outside the axiom domain.

Structure of Maximal Partitions

Unique anchor: gains+orig=1 forces a single point at max_vmin; anchor sits at max_vmin in 261/261 cases (exp267).
max_vmin - min_h_val: Typically small (diff=1 in 64%, diff≤2 in 90%) (exp267).

Delta Structure at Anchor (exp268-269)

Anchor ∉ E cases: 320 maximal partitions; val(δ at anchor) < max_vmin in 311 (97.2%), val(δ)=max_vmin with perfect cancellation in 9 (2.8%); new_at_anchor = 0 in 100%.
Cancellation audit (exp269): 9/9 val(δ)=max_vmin cases have leading-term cancellation, giving val(h′-res) > max_vmin.

Proof Path

Formalize max_vmin (max v_min over all partitions) inside Lean.
Use gains+orig=1 to isolate the anchor at max_vmin.
Show the anchor delta either drops below max_vmin or cancels when val(δ)=max_vmin, yielding h′-avoidance.
Combine with anchor-in-E avoidance to supply zero-new, then feed the existing zero-new → satisfier chain.

Full analysis in notes/axiom_proof_approach.md.

Alternative E-choices close Mechanism 2 (2025-12-25 08:00 AEDT)

What we learned (exp248-257): Anchor-in-E succeeds for 98.7% of Mechanism 2 configurations; every remaining case is fixed by a different anchor/F E-choice. No complete failures (0/4374 configs).

Implication: The zero-new axiom is empirically redundant; a zero-new witness always exists even when the canonical anchor-in-E choice fails.
Proof path: Define the anchor-choice finset (anchor at v_max in E with any distinct F) and apply exists_zero_of_sum_lt_card to get a zero-new choice by averaging/pigeonhole.
Integration: Once the witness exists, the existing Mechanism 2 wrappers (mechanism2_satisfier_from_anchor_…, with_delta_ge) deliver the satisfier axiom-free.

Mechanism 1 delta guard (2025-12-22 16:01 AEDT)

What we tested: exp208 examined anchor-in-E Mechanism 1 candidates (n ∈ {2,3}, p ∈ {2,3,5}, seeds 0–1999) to see how δ = h' - h sits relative to v_max on R-points.

Outcome: Anchor candidates = 128; zero-new successes = 124 (96.9%). Every success has δ-valuation > v_max on all R-points, making h'-valuation > v_max. All four failures have some δ hitting v_max and h'-valuation = v_max.
Delta vs h: 61.3% of successes have δ ≥ h on R; 38.7% succeed with δ < h but δ > v_max, so δ > v_max is the true guard.
Gaps: Success δ-gap min=1 (mean≈1.32); failure δ-gap=0. h'-gap min=1 (mean≈2.42) vs 0 in failures.
Implication: Formalising δ(P) > v_max for R when anchor ∈ E would close the Mechanism 1 branch via newAtVal_eq_zero_of_residuals_val_gt; δ ≥ h is not required.

Mechanism 2 Path 2 Formalised (2025-12-22 02:00 AEDT)

What changed: Added delta-dominance lemmas so the dominant Mechanism 2 path (95.3% of cases per exp198) now yields zero-new directly.

hPrime_residual_val_lt_of_delta_lt: If δ-val < h-residual val and δ-val < v_max, then val(h'-res) = val(δ) < v_max.
newAtVal_eq_zero_of_R_delta_lt: For any R-set, δ-val below both h-residual and v_max (plus coprime denominators and h' keeping R as nonfits) forces newAtVal … v_max = 0.
Next: Derive δ-dominance from the exp198 interpolation formula (anchor ∈ E, P_max ∈ R) and add a partial-cancellation lemma for the remaining 4.7% Path 3 cases.

BREAKTHROUGH: Universal Zero-New Existence (2025-12-21)

Key result (exp188-191): When gains+orig=1 at maximal v_min, there ALWAYS exists an E choice with new_{v*}=0. Verified in 521/521 cases (100%).

Why this matters: This provides the structural guarantee for Step 2 of the n+1 theorem proof. When gains+orig=1, the zero-new candidate gives c_{max_vmin} = 1 - 0 = 1 ≥ 1.
Two-mechanism discovery:
1. When h_min_val = max_vmin (6.5% of cases): The anchor point has h-residual valuation at the maximal v_min. Including it in E removes its contribution to new.
2. When h_min_val < max_vmin (93.5% of cases): The anchor is at a lower valuation. The gains+orig=1 contributor at max_vmin is a DIFFERENT point. Zero-new exists because h'-residuals avoid max_vmin for some E choice.
Mechanism overlap:
- anchor-in-E gives zero-new: 85.6% of cases
- anchor-in-R gives zero-new: 79.3% of cases
- Both mechanisms overlap significantly, providing multiple paths to zero-new.
Formalization path: Case analysis on whether h_min_val = max_vmin. Each case has its own structural argument for zero-new existence.

Lean documentation updated: zero_new_universal_existence_doc lemma captures the two-mechanism analysis.

Lean link: zero-new ⇒ c_v = 1 (2025-12-21 10:00 AEDT)

New lemma: marginCoeffAtVal_eq_one_of_gains_orig_one_new_zero proves that when gains+orig = 1 and new = 0 at valuation v, the margin coefficient satisfies c_v = 1.

Why it matters: Directly converts the zero-new witnesses from the two-mechanism analysis into a satisfier coefficient bound in the maximal v_min partition, advancing the axiom-free Step 2 proof.
Status: Integrated into PadicRegularisation.NplusOneTheorem; build remains sorry-free (warnings only).
Next: Thread this lemma into the maximal-partition case analysis to eliminate the fit-preserving axiom.

Build verified + integer corollary plan (2025-12-20 16:00 AEDT)

Status check: `~/.elan/bin/lake build PadicRegularisation.NplusOneTheorem` still passes (0 sorrys, style warnings only).

Usability gap: The p-free Case-1 satisfier and c_{v_min} ≥ 1 corollaries exist, but integer users still have to supply coprimality hypotheses.
Planned fix: Compose hyperplane_pfree_of_int_coeffs and dataset_pfree_of_int_coords with the p-free corollaries; investigate a Cramer-style lemma showing fit-preserving interpolants stay p-free when p ∤ det.
Axiom elimination path: Formalize the maximal v_min partition counting argument (fixed gains+orig at v_min, varying new) to force a satisfier and eliminate the k < n axiom.

Integer datasets/hyperplanes are p-free (2025-12-20 14:00 AEDT)

Formal upgrade: Added integer→p-free helper lemmas so users with integer coefficients/data automatically satisfy the coprimality hypotheses.

New lemmas: hyperplane_pfree_of_int_coeffs, datapoint_pfree_of_int_coords, dataset_pfree_of_int_coords.
Build remains clean (lake build PadicRegularisation.NplusOneTheorem, warnings only).
Next: surface the integer corollary in the top-level theorem wrappers and continue eliminating the k < n axiom.

P-free Case-1 satisfier chain (2025-12-20 12:00 AEDT)

Formal upgrade: Added minloss_fp_is_satisfier_at_margin_minimum_of_pfree, which plugs the p-free violator→negative-margin lemma directly into the Case-1 contradiction.

Both p-free corollaries minloss_fp_is_satisfier_at_margin_vmin_of_pfree and minloss_fp_cmin_ge_one_exact_of_pfree now delegate to the p-free Case-1 lemma (no manual coprimality threading).
Build remains clean (lake build PadicRegularisation.NplusOneTheorem, warnings only).
Next: expose the p-free Case-1 chain in the top-level theorem wrappers for quick reuse.

P-free violator → negative margin (2025-12-20 10:00 AEDT)

Formal upgrade: Added violator_at_margin_vmin_implies_neg_margin_of_pfree, which automatically supplies residual coprimality when coefficients and data are p-free.

The Case-1 contradiction (violator at margin v_min ⇒ margin < 0) no longer needs manual coprimality hypotheses under p-free inputs.
Lean rebuild remains clean (lake build PadicRegularisation.NplusOneTheorem, warnings only).
Next: thread the p-free violator wrapper through the Case-1 satisfier corollaries and surface the p-free assumptions in the top-level wrappers.

P-free inputs now discharge coprimality (2025-12-20 08:00 AEDT)

Formal upgrade: Lean now includes p-free predicates (HyperplanePFreeness, DataPointPFreeness, DatasetPFreeness) and a residual propagation lemma.

New wrappers minloss_fp_is_satisfier_at_margin_vmin_of_pfree and minloss_fp_cmin_ge_one_exact_of_pfree automatically supply residual coprimality when inputs are p-free.
Build remains clean (lake build PadicRegularisation.NplusOneTheorem); the proof stack no longer requires manual coprimality side conditions for integer/p-free datasets.
Next: propagate the p-free corollary to the higher-level very-strong GP wrappers and main theorem statement.

Residual coprimality is stable for p-free inputs (2025-12-20 06:00 AEDT)

Monte Carlo check: Random rational hyperplanes/data (n=3, m=8, denominators 1–6) were sampled to test the new coprimality hypothesis.

Allowing denominators with p factors → 52–70% of nonzero residual denominators were divisible by p (p ∈ {2,3,5}), so the coprimality assumption is not automatic.
Restricting all input denominators to be p-free → 0/47k nonzero residuals had denominators divisible by p (0% failure), supporting “p-free inputs ⇒ p-free residuals”.
Implication: datasets/hyperplanes with p-free denominators should satisfy the Lean proof’s coprimality precondition; a formal lemma to propagate p-freeness through residuals is underway.

Cross-base crossovers are rare and localized (2025-12-16 16:00 AEDT)

Finding (exp162): When the min-loss FP is chosen at r=5 (n=2, p=2), negative margins at other bases are extremely rare: 3/997 valid seeds (0.30%) went negative.

All crossover cases still have c_{v_min} = 1; negativity comes from head–tail cancellation near r≈2.2–2.4.
Worst case seed=120: min_margin ≈ -1.76 at r≈2.43 with coefficients {1→5, -2→1, -1→-4}. Seeds 62 and 110 reappear with milder negatives.
Average min_margin = 1.86 across seeds; minima cluster in a narrow base band.
Implication: Cross-base sign flips exist but are sparse and local; the theorem is safe because each base’s min-loss FP provides the witness.

🔬 CLARIFICATION: Margin can cross zero across bases, but theorem still holds (2025-12-16 15:00 AEDT)

Key Discovery (exp160-161): The min-loss FP at r=5 can have NEGATIVE margin at intermediate bases (r ≈ 2-2.5), even with c_{v_min} = 1 > 0!

Example (n=2, p=2, seed=62):

Margin polynomial: M(r) = r² - 4r + 3 + 2/r²
Coefficients: {-2: 1, -1: -4, 0: 3, 2: 2}

At r=2: M(2) = 4 - 8 + 3 + 0.5 = -0.5 < 0  ← NEGATIVE!
At r=5: M(5) = 25 - 20 + 3 + 0.08 = 8.08 > 0  ← POSITIVE!

Why This Doesn't Break the Theorem

The theorem "∃h' that beats h at base r" is about existence at a fixed base r, not about a universal witness h' across all bases.

At r=5: The min-loss h'(5) beats h (✓)
At r=2: A different min-loss h'(2) beats h (✓)
At every base r > 1: The min-loss FP at that base beats h (verified 100%)

Experimental Verification (exp161)

For the crossover cases, we verified that at every base r > 1, SOME h' beats h:

Case seed=62:
  r=1.5: best_margin=1.59 ✓ (10/21 beat h)
  r=2.0: best_margin=1.50 ✓ (4/21 beat h) ← different h' than at r=5
  r=2.2: best_margin=1.50 ✓ (4/21 beat h)
  r=5.0: best_margin=8.08 ✓ (6/21 beat h)

Proof Strategy Implications

For any fixed base r > 1: The theorem holds — the min-loss FP at that base beats h
The Lean proof at r ≥ 5 is correct for that base range
The structural bound (gains + orig ≥ new + 1) ensures c_{v_min} ≥ 1, but this doesn't prevent crossover at other bases due to negative intermediate coefficients
Key insight: Different bases may require different h' witnesses, but existence is guaranteed at each base

Files: experiments/exp160_minloss_r5_margin_all_bases.py, exp161_crossover_analysis.py

Small-base margin positivity via tail compensation (2025-12-16 14:00 AEDT)

Finding (exp157): For r ∈ (1,3], the margin stays positive even when the dominant coefficient is nonpositive.

c_{v_min} ≤ 0 cases decline rapidly with base: 13.3% at r=1.2 → 6.7% at r=1.5 → 0.8% at r=2.0 → 0% by r≥2.5 (n=3, p∈{2,5}).
Negative heads (avg -1.3 to -2.0) are overwhelmed by positive tails (avg 3.0–4.7), yielding margins ≥ 1.0 in all cases.
Coefficient-sum margins match direct loss differences (max gap ≈ 1.8e-15), so the decomposition is exact.
Implication: The positive above-v_min mass alone can secure margin positivity for r>1, suggesting a formal proof route without invoking the structural bound.

🎯 MAJOR BREAKTHROUGH: gains + orig ≥ new + 1 at margin's v_min (2025-12-16 09:35 AEDT)

Key Discovery (exp147-151): For the loss-minimizing FP at r ≥ 5, at the margin's v_min:

gains_{v_min} + orig_{v_min} ≥ new_{v_min} + 1

This directly implies c_{v_min} = gains + orig - new ≥ 1 > 0.

Experimental Evidence

exp147: Tail domination (new ≥ gains+orig at v > v_min) only holds 23%—NOT the mechanism.
exp148: At margin's v_min, gains + orig > 0 holds 100%.
exp149: The bound gains + orig ≥ new + 1 holds 100% for min-loss FP (1196 cases).
exp150: For ALL FP candidates, the bound only holds 34.5%—min-loss selection is CRUCIAL.
exp151: c_{v_min} ≥ 1 ⟹ margin > 0 (perfect implication, 0 counterexamples).

Proof Strategy

The structural bound follows from loss minimization selection pressure:

Creating a new residual at low valuation v costs r^{-v} in loss
For r ≥ 5, low-valuation residuals are exponentially expensive
If gains + orig < new + 1 at margin's v_min, h' creates net new residuals there
Another candidate h'' avoiding this has better loss
So min-loss selector h* cannot have gains + orig < new + 1 at margin's v_min

Lean status: Documentation lemmas added. Formal proof in progress.

Current Focus (2025-12-13 20:00 AEDT)

New lemma marginCoeff_nonneg_below_minResVal shows that for v < v_min(h'), c_v = gains_v + orig_v ≥ 0; negative mass is confined to valuations ≥ v_min.
Combine this with the counting bound Σ c_v ≥ |E| and the maximal v_min partition to force c_{v_min} > 0 for the min-loss fit-preserving interpolant.
Then apply the dominance guard to propagate margin > 0 from r ≥ 5 to all bases r > 1.

Current Focus (2025-12-13 02:00 AEDT)

Preparing the formal proof that the loss minimiser lies in the maximal v_min partition at heavy bases (r ≥ 5):

Numerical inequality to formalise: if v_low < v_high and m < r, then r^{-v_low} = r^{v_high - v_low} · r^{-v_high} ≥ r · r^{-v_high} > m · r^{-v_high}, so lower-v_min candidates pay a strictly larger dominant penalty.
Next lemma: loss_penalty_of_lower_vmin capturing that inequality with an explicit tail bound m.
Goal: use the lemma to force the min-loss fit-preserving interpolant into the top v_min partition, then apply the counting argument to conclude c_{v_min} > 0 and feed the dominance guard.

🎯 BREAKTHROUGH: Min-loss FP always has c_{v_min} > 0 (2025-12-12 21:00 AEDT)

exp129-132: The loss-minimizing fit-preserving interpolant always has positive dominant coefficient!

exp129: For any valuation v, Σ_E c_v(E) can be positive or negative, but max_v {Σ_E c_v(E)} > 0 in 100% of 400 cases.
exp130: At max v_min level, sum of c_min over candidates is always positive (100%), implying at least one positive candidate exists.
exp131: Earlier "failures" (seeds 177, 186, 136) still have positive c_min candidates at lower v_min; the gap is exactly 1 level.
exp132: KEY RESULT — Min-loss FP has c_{v_min} > 0:
- At r_ref ≥ 5.0: 100% success rate (4800 cases)
- At r_ref = 2.0: 99.9% success rate (1 failure in 1200 cases)

Why Min-Loss FP Has c_{v_min} > 0

The min-loss FP minimizes Σ_{r ∈ R} r^{-v(res(h*,r))}:

This avoids creating low-valuation (high-loss) residuals at remaining points R
Therefore "new_{v_min}" is kept small by the minimization pressure
The terms "gains_{v_min} + orig_{v_min}" are fixed (depend only on h)
Result: c_{v_min} = gains + orig - new > 0

Proof Strategy to Eliminate Axiom

To eliminate the general_fit_preserving_exists axiom:

Define min-loss FP over fit-preserving finset at base r ≥ r₀
Prove c_{v_min}(h*) > 0 for this minimizer
Apply dominance guard: c_{v_min} > 0 ⇒ margin > 0 for all r > 1
Conclude h* beats h, completing the k < n case

Files: experiments/exp129_fixed_valuation_sum.py, exp130_dominant_level_structure.py, exp131_failure_seeds_detailed.py, exp132_minloss_cmin_positive.py

Gap-to-positive rescue for dominance guard (2025-12-12 12:00)

exp128: When all max-v_min candidates have c_min ≤ 0, a positive candidate always appears one valuation level lower.

Coverage: n ∈ {2,3,4}, p ∈ {2,3}, seeds 0–199. Max-v_min succeeds in 1197/1200 cases (99.75%).
All 3 strict failures are rescued at v_min-1 (gap histogram: {0: 1197, 1: 3}); these are the same seeds identified in exp126/127.
Failure fingerprint: top level has 1–2 candidates with c_min = -1 coming purely from “new” residuals; the next valuation level has many candidates with c_min = +2.
Implication: Total coefficient mass Σ_v c_v = |E| > 0 must land on some valuation level; when the top level is new-only, the positive mass concentrates immediately below, suggesting a counting-based proof of c_{v_min} > 0 existence.
Files: experiments/exp128_gap_level_rescue.py, results/exp128_gap_level_rescue.json

Max-v_min guard failure structure (2025-12-12 10:05)

exp126: Audited the rare cases where the max-v_min dominance guard picks c_min ≤ 0.

Coverage: n ∈ {2,3,4}, p ∈ {2,3}, seeds 0–199. All max-v_min c_min > 0 in 1172/1200 cases (97.7%); any positive at max-v_min in 1197/1200 (99.8%).
Strict failures = 3 seeds (n=2,p=2: 177,186; n=2,p=3: 136). In every strict failure the positive candidate sits one valuation level below the top v_min.
Failure anatomy: the dominant valuation is “new-only” — gains_vmin and orig_vmin contribute 0 (or 1) while new_vmin contributes 1–2, yielding c_min = -1 with only 1–2 candidates at that v_min.
Implication: Max-v_min fails only when the top valuation lacks positive mass from extras or original residuals. Dropping one valuation level restores c_min > 0, suggesting a valuation-counting proof that some v has gains_v + orig_v > new_v.
Files: experiments/exp126_max_vmin_failure_analysis.py, results/exp126_max_vmin_failure_analysis.json

Dominance guard fixes near-binary selection (2025-12-12 08:00)

exp122: Enforcing a dominance guard (min-loss among fit-preserving interpolants with c_{v_min} > 0) keeps the margin positive even when the reference base r_ref is near-binary.

Configs: n ∈ {4,5}, p ∈ {2,3}, seeds 0–119, r_ref ∈ {1.1, 1.3, 1.5, 2, 3, 5}; margin checked on r ∈ {1.05…50}.
Existence: Every case had at least one candidate with c_min > 0 (100%). Min-loss without the guard failed to keep c_min > 0 in up to 5.8% of near-binary runs (p=2) but was perfect for r_ref ≥ 2.
Guarded success: The dominance-guarded pick achieved 100% positive margins on the full grid for all configs and bases.
Safety margin: Worst guarded min-margin ≥ 3.88 (n=4, p=3); p=2 cases were ≥ 4.04, so positivity is robust, not knife-edge.
Implication: Adding a simple c_min > 0 guard resolves the r_ref sensitivity observed in exp115 and offers a constructive selection rule for the proof pipeline.

High-d margin polynomial mostly stable; near-binary r_ref can fail (2025-12-12 06:13)

exp115: In n ∈ {4,5}, p ∈ {2,3} with r_ref ∈ {1.5, 5.0, 20.0}, the min-loss FP kept the margin polynomial positive in 479/480 cases.

479/480 cases had c_{v_min} > 0 and margin > 0 on r ∈ {1.05…100}. All r_ref ≥ 5 runs were perfect.
Single failure: n=4, p=2, r_ref=1.5 (seed 21) picked extras giving c_{-2} = -1, driving margin negative for large r. The same dataset is fine when the minimiser is chosen at r_ref ≥ 5.
Implication: Near-binary selection can choose a harmful interpolant even though moderate/heavy bases remain robust. A proof or algorithmic guard may need an r_ref lower bound or a dominant-term check.

Margin polynomial stays positive for min-loss FP (2025-12-12 04:03)

exp114: For k < n, the min-loss fit-preserving interpolant has a margin polynomial with positive dominant coefficient and positive value on r ∈ {1.05…50} across 800 trials.

Configs: n ∈ {2,3}, p ∈ {2,3}, m = n+4, 200 seeds each; min-loss chosen at r_ref = 5.0 via exhaustive search.
Results: 800/800 cases had c_{v_min} > 0 and margin > 0 on the full r grid. Worst min margin ≈ 0.0016 (n=2, p=2); other configs were much larger.
Interpretation: Positive dominant coefficient extends to full-r positivity, suggesting the margin polynomial for the min-loss FP never crosses zero. Reinforces the conjectured bound offset(h*) < gain(h*).
Next: Sweep n ≥ 4 and alternate r_ref choices; formalise the gain/orig/new valuation count in Lean to force c_{v_min} > 0.

BREAKTHROUGH: Margin polynomial has positive dominant term (2025-12-12 03:00)

exp111-113 reveal the algebraic structure: margin = Σ c_v·r^{-v} where the dominant (lowest v) coefficient is ALWAYS positive (100% of 200 cases).

The margin formula is a polynomial in r^{-1} with integer coefficients at each valuation level.
Key formula: c_{v_min} = (gains at v_min) + (orig R at v_min) - (new R at v_min) > 0.
Distribution of dominant coefficients: c_min=1 (52.5%), c_min=2 (25.5%), c_min=3 (11.5%), c_min≥4 (10.5%).
Worst case seed 186: margin = 2·r^{-1} + r^{-4} — both coefficients positive!
Why positive: The min-loss FP selection avoids E choices that would create many low-valuation residuals in R.
Implication: The bound offset < gain is algebraically forced by the selection criterion, not accidental. Margin > 0 for all r > 1.

NEW: Offset < Gain holds in higher dimensions (2025-12-12)

exp110 extends the offset/gain bound to n=3,4 with heavy bases (r up to 50): max offset/gain = 0.9257 < 1.

Configs: n ∈ {3,4}, p ∈ {2,3}, r ∈ {2,5,10,20,50}, m ≈ n+3..5, 30–80 trials each.
Results: Success rate 100% and anchor rate 75–93% everywhere. Global max ratio 0.9257 at (n=3, p=2, r=20); n=4 maxima stay ≤ 0.669.
Implication: The conjectured lemma offset(h*) < gain(h*) appears dimension-stable and strict even for large r, reinforcing the Lean target to formalise this bound for the k < n proof.

NEW: Anchoring persists across bases (2025-12-12)

exp105 shows the loss-minimising fit-preserving interpolant beats h in 100% of trials even near binary or heavy bases.

Configs: n ∈ {2,3}, p ∈ {2,3}, r ∈ {1.2,1.5,2.0,3.0,5.0}, m = n+4.
Anchoring (offset ≤ 0) rate = 74–88%; average offsets remain negative (≈ -0.36…-9.28) and margins are strongly positive (≈ 3.7–51).
Min-loss rarely equals min-offset (10–30%), yet still wins. Proof path should target the loss minimiser directly and bound offset relative to gain (e.g., offset ≤ c·gain).

NEW: Min-loss fit-preserving witness formalised (2025-12-11)

Lean lemma fitPreserving_minimiser_beats shows that once any fit-preserving interpolant improves loss, the loss minimiser over the fit-preserving finset also beats h.

Packages the empirical “min-offset/min-loss always wins” rule into a canonical witness via Finset.exists_min_image.
Build is clean; ready to drop into the k < n proof once the existence step is axiom-free.
Next: prove k < n existence (e.g., τ-source transfer or alternative selection) so the minimiser closes the remaining axiom gap.

NEW: k = n case is axiom-free under very strong GP (2025-12-10)

Lean now proves a fit-preserving improvement without axioms whenever countExactFits h = n and the data are in inVeryStrongGeneralPosition.

exists_fit_preserving_interpolant_lt_k_eq_n_veryStrong (tau-source instantiation) provides the axiom-free witness.
New wrapper exists_fit_preserving_interpolant_lt_veryStrong automatically routes k = n to this proof and only falls back to the axiomatic path when k < n.
New optimality wrappers non_interpolating_strictly_suboptimal_veryStrong, hyperplane_passes_through_nplusone_veryStrong, and optimal_passes_through_exactly_nplusone_veryStrong propagate the axiom-free k = n proof to the main theorems. Remaining axiom usage: k < n only.
Open gap: The k < n branch still relies on general_fit_preserving_exists; next step is a multi-vector/induction argument to remove it.

NEW: Averaging over fit-preserving interpolants fails (2025-12-09)

Counting/averaging is insufficient: the average margin over all fit-preserving interpolants is often negative, even though a good interpolant exists.

Experiment: exp093 (n ∈ {2,3}, p = 2, r = 2.0, m = n+4, 10 seeds each) enumerated all fit-preserving interpolants of random h with 1 ≤ k < n+1 fits.
Average margin positive in only 3/10 trials for n=2 and 4/10 for n=3.
However, the best margin was always positive (minimum best margin 0.75 in n=2, 1.4375 in n=3).
Implication: A proof based on Σ margins > 0 is false empirically; the existence proof must identify a structural subset or selection rule (e.g., greedy/product-formula-guided) rather than rely on averaging.

MAJOR FINDING: n+1 Theorem Empirically Validated (2025-12-07)

The n+1 theorem is TRUE - with 100% success rate for data in strong general position.

The Theorem

For data in general position, if hyperplane h has k < n+1 exact fits, there EXISTS h' with:

More fits: fits(h') > k
Lower loss: L(h') < L(h)

Key Discoveries

Perturbation approach fails: The construction h' = h + ε·δ only works 60-90% of the time. Pointwise losses can increase substantially at non-fit points.
Existence is guaranteed: When testing ALL interpolating hyperplanes (not just perturbations), we find that SOME h' with lower loss always exists.
Statistics: 100% success rate across 1,265 valid trials in n=2,3 with p=2,5.

Implication for Lean Proof

The current Lean proof uses a perturbation-based approach which doesn't always work. The proof needs to be revised to use a non-constructive existence argument (showing that among all C(m,n+1) interpolating hyperplanes, at least one has lower loss).

NEW: Interpolant Loss Minimiser Dominates (2025-12-07)

The loss-minimising interpolant h_min strictly beats arbitrary h with k < n+1 fits in all tested cases.

Configs tested: dims {2,3}, primes {2,5}, r=2.0, m = n+4, 300 trials/config; strong GP enforced.
Success: 100% across 760 valid runs. Margins are robust: min/median/p90 ≈ {0.375, 2.125, 3.62} (n=2,p=2) up to {1.25, 4.00, 4.75} (n=3,p=5); average fit gain ≈ 2.8–3.8 points.
Implication: Empirical evidence that the finite interpolant minimiser provides a strict loss drop with multiple extra fits, bolstering the non-constructive Lean strategy.

Update: Inversion Feature Fingerprints (2025-12-07)

Padic-only vs L2-only inversions leave distinct feature signatures. A post-hoc analysis of exp072 (720 datasets) measured diversity/duplication and k-path timing.

Padic-only (12 cases): High diversity (avg 0.99) with minimal duplication (max_y_dup ≈ 1.1), concentrated at base=6 in 3D (11/12) with p∈{5,11}. First k-up is early (≈0.31 of path) and only 33% of k-ups pair with a reg-penalty drop → valuation-gap bounces.
L2-only (24 cases): Lower diversity (0.67) and heavier duplication (max_y_dup ≈ 2.9), mostly 4D (21/24) with p∈{5,11}; base=8 is the largest cluster. First k-up later (≈0.56) and always coupled to a reg-penalty drop (100%) → smooth late swaps.
Overlap (9 cases): All in 4D with p=2, high duplication (max_y_dup = 3.0), late k-ups (≈0.59–0.66) with full reg-drop coupling.
Takeaway: Padic-only failures prefer diverse, low-dup datasets and early valuation jumps; L2-only failures need duplication and trigger later when penalty ordering flips.

Update: Inversion Geometry Gap (2025-12-06)

p-Adic-only and L2-only inversions arise from different geometric triggers. A 720-dataset sweep (dims {3,4}, primes {2,5,11}, bases {2,3,4,6,8}) mapped where each penalty inverts.

Padic-only: 12 cases, concentrated in 3D at bases 3 & 6 with p=5/11. Paths are longer (avg len 5.17) with early k-up (first_up ≈ 0.31) and only 33% of k-ups tied to reg drops → early valuation-gap bounces.
L2-only: 24 cases, mostly 4D with p=5/11 across all bases. k-up happens later (first_up ≈ 0.56) and always coincides with a reg drop (100%), indicating late, smooth swaps.
Overlap: 9 cases, all 4D with p=2 across bases {2,3,4,6,8}. Non-overlap persists because padic inversions ignite early while L2 inversions are late and penalty-driven.

Update: Near-Binary Transition Map (2025-12-06)

p-Adic inversions reappear just above the near-binary regime. A fine-grained base sweep (r ∈ {1.02…2.0}, dims {1,2,3}, primes {2,5,11}, 2,160 datasets) mapped where monotonicity breaks.

1D stays monotone: 0/720 inversions across all bases and penalties.
Earliest flips: p-adic inversions at r=1.05 (p=5, 2D) and r=1.10 (p=2, 2D); r=1.3 is inversion-free.
Main bump: r≈1.4–1.5 drives the highest p-adic rates (up to 12.5% in 2D for p=2); small blips in 3D for p∈{2,11}.
L2 nearly immune: Only 2 inversions total (3D p=2 at r=1.4; 3D p=5 at r=2.0), with no overlap with p-adic cases.

Update: Off-Prime Base Sweep (2025-12-06)

p-Adic dominance fails when the loss base r ≠ p. A 2D/3D sweep across primes {2,5,11} and bases {1.1,1.5,2,5} (288 datasets) found p-adic-only inversions.

Counts: L2 inversions = 1; p-adic inversions = 4 (all 2D). 3D stayed monotone for both penalties.
Near-binary surprise: r=1.1 already produced a p-adic-only inversion (p=2, seed=5) while L2 remained monotone.
Heavy-base asymmetry: r=5.0 triggered p-adic-only inversions at p=2 and p=5; L2 stayed monotone in those cases.
Example dataset: dim=2, p=2, base=1.1, n=6 with k-paths L2 [3,3,3,2,1] vs p-adic [3,2,3,2,1] (inversion).

MAJOR UPDATE: The Penalty Gap Mechanism (2025-12-05 Night)

p-Adic monotonicity dominance is a 1D phenomenon - it breaks in higher dimensions!

Discovery

In 2D/3D, p-adic can invert when L2 stays monotonic (5/160 vs 2/160 in exp059).

The Mechanism: Integer Solutions with High Valuations

3D p=2 seed=21 example:
  Fractional-slope candidate (k=4): β = [-2.08, -0.73, 0.50]
    v_p(β_slopes) = [0, 0, -1], R_padic = 2.5

  Integer-slope candidate (k=5): β = [-3, -7, 8]
    v_p(β_slopes) = [0, 0, 3], R_padic = 2.125

  The integer candidate fits MORE points AND has LOWER p-adic penalty!
  Why? 8 = 2³ has v₂(8) = 3, giving |8|₂ = 0.125

In L2: 8² = 64 (huge penalty), so L2 never switches to integer candidate.

Near-Binary Base Results

Base r	p-adic inversions
1.02	1
1.1	1
2.0	1
5.0	6

Near-binary bases partially suppress p-adic inversions but don't eliminate them.

Revised Understanding

Property	1D	2D/3D
p-adic more monotonic than L2?	YES	NO
p-adic inversions possible?	Rare	More common
Integer solutions favored?	Sometimes	Often

Update: 4D Base-Density Resample (2025-12-21)

Higher-seed sweep of 4D inversions across mid/heavy bases confirms base-flat behaviour driven mainly by extra points and small primes.

Near-binary immunity: r=1.05 stayed inversion-free for n ∈ {7,8,10} and primes {2,5,11}.
Dense n=10: p=5/11 invert in 1–2/6 runs once r≥1.5 (g_hat ≈1.67 for p=5; ≈3.67 with a bump to 7.33 at r=5 for p=11); p=2 stays zero.
Baseline n=8: only p=2 flips sparsely at r≥1.5 (g_hat ≈0.89); p≥5 invert only at r≥5 with low rates (1/6) and stay flat.
Sparse n=7: small-prime inversions rise at r=5 (p=2 g_hat ≈2.33; p=5 hits 1/6 at r≥5); p=11 remains inversion-free.
Takeaway: Beyond r≈1.5 the base effect is mild; inversion risk tracks extra points and small primes.

Update: Base-Factor Curve (2025-12-15)

Dense sweep over bases r ∈ {1.01…10} shows how g(r) = c(p, r)·p ramps up.

Near-binary immunity: r ≤ 1.1 produced 0/560 inversions across primes {2,5,11,17} (dims 1–2).
Two-phase ramp: g(r) ≈ 0.006 at r=1.2 → 0.057 at 1.5 → 0.099 at 2, then jumps to ≈0.30 at r=3 and ≈0.327 at r=5.
Heavy-base bump: r=10 lifts g_hat to ≈0.417, suggesting a secondary rise past the r≈5 plateau.
Prime scaling persists: Small primes drive most inversions for mild bases; p=11/17 only flip for r ≥ 3, consistent with c(r, p) ≈ g(r)/p.

Update: Corrected Inversion Counts (2025-12-09)

1D segments now export regularisation penalties, restoring accurate inversion detection.

No phantom failures: Every non-monotonic k-path now coincides with a Pareto inversion.
Updated 1D rates (r ≥ 1.5): n=5 → 2.5–5.0%, n=6 → 2.5–8.8%, n=7 → 7.5–17.5%.
Near-binary immunity: r ∈ {1.02, 1.1} still show 0/410 inversions.
Implication: Base sensitivity remains, but inversion events are rarer at small r and scale with extra points beyond d+1.

Finding 1: Confirmation of n+1 Point Theorem

Date: 2025-12-01

Status: Validated

In 1D (n=1) regression without regularisation, our exhaustive search algorithm consistently finds optimal lines passing through exactly 2 points (= n+1).

Evidence

Dataset: (0, 0), (1, 2), (2, 3), (3, 7)
Result at λ=0: line y = 0 + 2x passes through 2 points

Dataset: (0, 1), (1, 2), (2, 4), (4, 5)
Result at λ=0: line passes through 3 points (special case)

Note: When the data has special structure (nearly collinear), the optimal line may pass through more than n+1 points.

Finding 2: Discrete Threshold Behavior

Date: 2025-12-01

Status: Validated

The number of exactly-fitted points changes at discrete threshold values of λ, not continuously. This suggests a phase transition phenomenon.

Evidence

Dataset: (0, 1), (1, 2), (2, 4), (4, 5)

Threshold detected between λ=1.2 and λ=1.3:
  - Below threshold: 3 points fitted exactly
  - Above threshold: 1 point fitted exactly

Interpretation

This discrete behavior is consistent with the combinatorial nature of p-adic optimization: the optimal line either passes through a point exactly (residual = 0, infinite valuation) or it doesn't (finite valuation). There are no "partial" fits.

Finding 3: Regularisation Type Matters

Date: 2025-12-01

Status: Preliminary

p-Adic regularisation (penalizing |β|_p) behaves differently from real L2 regularisation (penalizing β²).

Evidence

Dataset: (0, 0), (1, 2), (2, 3), (3, 7)

Real L2 Regularization (λ=0.1):
  Intercept = 1, Slope = 1, Exact fits = 2

p-Adic Regularization (λ=0.1):
  Intercept = -5, Slope = 4, Exact fits = 2

The p-adic regularisation prefers coefficients with high p-adic valuation (more divisible by p), leading to different optimal solutions.

Finding 4: Monotonicity of k(λ) - REVISED

Date: 2025-12-02 (Original), 2025-12-03 (Revised)

Status: Conditional - Counter-examples Found!

Original claim: k(λ) is monotonically non-increasing in λ.

Revision (Day 3): Monotonicity is not universal. Stress testing found 157 counter-examples out of 8480 tests where k(λ) actually increases as λ increases!

Example Counter-Example

Dataset: x = [5, 7, 8, -8, -6], y = [10, 10, -7, 10, -5]

k-path (base r=2, prime p=2):
  λ ∈ [0.00, 0.07]: Line y = 3.18 + 1.36x  → k = 2
  λ ∈ [0.07, 12.3]: Line y = -5.86 - 0.14x → k = 2
  λ ∈ [12.3, ∞]:    Line y = 10 (horizontal) → k = 3

k INCREASES from 2 to 3 as λ increases!

Why This Happens

Three points share y=10: (5,10), (7,10), (-8,10)
The horizontal line y=10 passes through all three
At λ=0, a tilted line fits 2 points with better data loss
At large λ, the horizontal line (slope=0, reg penalty=0) wins
The horizontal line happens to fit more points!

Revised Understanding

Conditional Monotonicity Theorem (Conjectured):
k(λ) is monotonically non-increasing if and only if no horizontal hyperplane passes through more points than the optimal hyperplane at λ=0.

Statistics from Stress Test

Setting	Monotonic	Total	Rate
1D, n=4	1350	1500	90%
1D, n=6	747	1500	50%
1D, n=8	251	1500	17%
2D, n=6	188	200	94%
3D, n=6	52	60	87%

Higher dimensions have lower failure rates (harder to accidentally align points).

Finding 5: Analytical Threshold Formula

Date: 2025-12-02

Status: Validated

Thresholds occur when two candidate solutions have equal total loss. For 1D regression comparing lines with slopes b₁ and b₂:

λ* = (L₂ - L₁) / (b₁² - b₂²)

where L₁, L₂ are the data losses (without regularisation) of the two lines.

Evidence

Dataset: (0,1), (1,2), (2,4), (4,5)

Theoretical prediction: λ* = 1.25 (line y=1+x vs y=1)
Empirical threshold: λ ∈ [1.24, 1.25]

Perfect match between theory and experiment!

Finding 6: 2D Generalization Confirmed

Date: 2025-12-02

Status: Validated

The n+1 theorem and monotonicity extend to 2D regression (fitting planes to 3D points). At λ=0, optimal planes pass through at least 3 points (n+1 for n=2).

Evidence

15/15 random 2D datasets satisfy:
  - k(0) ≥ 3 (n+1 theorem holds)
  - k(λ) is monotonically non-increasing
  - Discrete thresholds exist (1-2 thresholds typical)

Example: Dataset with 5 points
  λ=0.00: plane z = 1 + 0.5x₁ + 1.5x₂ fits 4 points
  λ=0.50: plane z = 2 + x₁ fits 3 points
  λ=2.00: plane z = 5 fits 1 point

Finding 7: Prime Dependence of Thresholds

Date: 2025-12-02

Status: Validated

The threshold structure depends on the choice of prime p. Different primes yield different numbers and locations of thresholds.

Evidence

Dataset: (0,1), (1,2), (2,4), (4,5)

p=2:  1 threshold at λ ≈ 1.25 (k: 3→1)
p=3:  2 thresholds at λ ≈ 0.45 (3→2) and λ ≈ 3.95 (2→1)
p=5:  2 thresholds at λ ≈ 1.35 (3→2) and λ ≈ 3.95 (2→1)
p=7:  2 thresholds at λ ≈ 1.35 (3→2) and λ ≈ 3.95 (2→1)

This reveals that p-adic regression is truly prime-dependent, not just in the valuation function but in the geometry of the optimization landscape.

Finding 8: Exact Asymptotic Threshold Formula (MAJOR)

Date: 2025-12-02 (Evening)

Status: Validated

Discovered and validated an exact formula for the threshold λ* as a function of the loss base r. This provides complete analytical understanding of threshold behavior.

The Formula

λ* = (L₂ - L₁) / (b₁² - b₂²)

where L = Σᵢ r^{-v_p(residual_i)}

Asymptotic Expansion

The threshold admits an exact series:

λ* = c₀ + c₁/r + c₂/r² + c₃/r³ + ...

where c_k counts residuals with valuation exactly k.

Evidence (100% Match)

Dataset: canonical_threshold
Derived formula: λ* = 1 + 1/r²

Verification:
  r=2:    λ* = 1.25      (predicted: 1.25)     ✓
  r=3:    λ* = 1.111...  (predicted: 1.111...) ✓
  r=5:    λ* = 1.04      (predicted: 1.04)     ✓
  r=10:   λ* = 1.01      (predicted: 1.01)     ✓
  r=100:  λ* = 1.0001    (predicted: 1.0001)   ✓
  r=1000: λ* = 1.000001  (predicted: 1.000001) ✓

Error: 0.00 (exact match for all tested values)

Dataset: gentle_line
Derived formula: λ* = 1 + 1/r
Also verified with zero error.

Physical Interpretation

The formula reveals that threshold behavior is entirely determined by the p-adic valuations of residuals. The limit as r→∞ depends only on residuals with valuation 0 (coprime to p).

Finding 9: d=4 and d=5 Validation

Date: 2025-12-03

Status: Validated

Extended the exact threshold solver to higher dimensions (d=4, d=5). The n+1 theorem and threshold formula continue to hold.

Results

d=4 (8 points): 14/14 configurations monotonic
  - canonical_4d: k path 5→2→1 (2 thresholds)
  - random datasets: k path 5→2 (monotonic)

d=5 (9 points): 2/2 configurations monotonic
  - canonical_5d: k path 6→5 (1 threshold)
  - random: k path 6→2 (4 thresholds)

All d≥4 tests passed n+1 property: k(0) ≥ d+1

Computational Scaling

d=4: ~140 candidate hyperplanes for 8 points
d=5: ~200 candidate hyperplanes for 9 points
Time: ~30ms per test

Finding 10: Pareto Frontier Monotonicity Theorem (MAJOR)

Date: 2025-12-05

Status: Validated - 100% Accuracy

After the conditional monotonicity conjecture failed (Day 4), we discovered the true necessary and sufficient condition for k(lambda) monotonicity.

The Theorem

Pareto Frontier Monotonicity Theorem: For L2-regularised p-adic linear regression, k(lambda) is monotonically non-increasing in lambda if and only if there is no "Pareto inversion" in the optimal path.

A Pareto inversion occurs when, for consecutive optimal solutions beta_1, beta_2 with R(beta_1) > R(beta_2), we have k(beta_1) < k(beta_2).

Intuitive Explanation

As lambda increases, the optimal solution moves toward lower regularisation penalty R. If a lower-R solution happens to fit MORE points than a higher-R solution, then when we switch to it, k increases - breaking monotonicity.

Example

Segment 1: R = 11.56, k = 2
Segment 2: R = 3.24,  k = 1
Segment 3: R = 0.049, k = 2  ← INVERSION! k increased
Segment 4: R = 0.0,   k = 1

k-path: 2 → 1 → 2 → 1 (non-monotonic due to inversion at segment 3)

Validation

Metric	Value
Total Tests	990
Monotonic (correctly predicted)	949
Non-monotonic (correctly predicted)	41
False Positives	0
False Negatives	0
Accuracy	100.00%

Why This Supersedes the Conditional Monotonicity Conjecture

The earlier conjecture (k_opt >= k_horiz implies monotonicity) was incomplete because it only considered horizontal hyperplanes. The Pareto criterion considers ALL low-penalty candidates, including tilted planes that happen to have high exact-fit counts.

Finding 11: Pareto Theorem Formally Proven

Date: 2025-12-07

Status: Proven

The Pareto Frontier Monotonicity Theorem is now rigorously proven (see proofs/pareto_monotonicity_proof.md).

Key Lemmas

Finite Candidates: Optimal β*(λ) belongs to a finite set C for all λ ≥ 0
Piecewise Linear: Total loss L_total(c; λ) = L_data(c) + λ·R(c) is linear in λ for each candidate
R Decreases at Transitions: At every transition λ* from c_1 to c_2, R(c_1) > R(c_2)

Proof Sketch

The proof follows directly from the structure:

k(λ) non-monotonic ⟺ k increases at some transition
At any transition, R strictly decreases (by construction of lower envelope)
k increasing while R decreasing ⟺ Pareto inversion by definition

Therefore: k(λ) non-monotonic ⟺ Pareto inversion exists. ∎

Finding 12: Strong Inversion Predictor

Date: 2025-12-07

Status: Validated - 91.7% accuracy

The simple test "does the horizontal hyperplane fit more points than the λ=0 optimum?" is a strong predictor for inversions:

When k_horiz > k_lambda0: 91.7% have inversions (11/12 cases)
When k_horiz <= k_lambda0: only 3.8% have inversions (53/1378 cases)

Practical Implication

This provides a cheap O(n) heuristic for detecting inversion-prone datasets without computing the full candidate set.

Finding 13: Empirical Inversion Probability Formula

Date: 2025-12-07

Status: Empirically Validated

From testing 1390 datasets, the inversion probability follows a scaling law:

P(inversion) ≈ 0.10 × excess_points

where excess_points = (n - d - 1) / n

Interpretation

excess_points measures the fraction of "extra" data points beyond the minimum (d+1) needed to define a hyperplane
At n = d+1, excess_points = 0, so P(inv) = 0 (inversions impossible)
As n grows relative to d, more extra points → more chances for inversions

Inversion Rates by Configuration

Dim	n	excess_points	Observed Rate
1	4	0.50	0.5%
1	5	0.60	2.0%
1	6	0.67	6.5%
1	7	0.71	11.3%
1	8	0.75	16.0%
2	5	0.40	1.3%
2	6	0.50	10.0%
3	6	0.33	1.7%

Why Higher Dimensions Help

In higher dimensions, d+1 is larger, so excess_points = (n-d-1)/n is smaller for the same n, reducing inversion probability. This explains why 2D and 3D have lower inversion rates than 1D at comparable n values.

Finding 14: k_horiz Predictor Limitation (MAJOR)

Date: 2025-12-05

Status: Important Limitation Discovered

The k_horiz > k_lambda0 predictor (91.7% accurate on random data) fails completely on axis-duplication structured data.

Evidence (exp047)

540 runs: dims {2,3,4}, n=10, axis-dup generator
- Only 1/540 runs had k_horiz > k_lambda0 (100% precision, 2% recall)
- 48/49 inversions happened when k_horiz <= k_lambda0

Why This Happens

In axis-duplication data:

k_lambda0 ≈ 7-10 (tilted axis-aligned plane fits MANY points due to alignment)
k_horiz ≈ 4 (horizontal only fits the most duplicated y-value)
k_horiz is almost always < k_lambda0

New Inversion Mechanism

Example k_path: [10, 3, 4]
1. At λ=0: tilted axis-aligned plane fits all 10 points (k=10)
2. At higher λ: intermediate plane with k=3 takes over
3. At high λ: horizontal (R=0) with k=4 wins

The inversion is 3→4, NOT related to k_lambda0!

Implications

k_horiz > k_lambda0 is NOT a universal inversion predictor
Works for random data where k_lambda0 ≈ d+1
Fails for structured data where axis alignment inflates k_lambda0
Two different regimes require different detection strategies

Finding 15: k_min Predictor Breakthrough (MAJOR)

Date: 2025-12-05

Status: Validated - F1=0.962 with Perfect Precision

A new predictor based on k-path analysis achieves near-perfect inversion detection, vastly outperforming all previous approaches.

The Predictor

k_min_intermediate < k_final: Check if the minimum k among all segments (except the final one) is less than the final k.

Results (exp049)

Predictor	Precision	Recall	F1	FPR
k_min < k_final	1.000	0.927	0.962	0.000
k_min < k_horiz	0.927	0.927	0.927	0.007
duplication (baseline)	0.123	0.976	0.219	0.638
k_horiz > k_lambda0	0.000	0.000	0.000	0.000

Two Types of Inversions Discovered

Type 1: End inversions (92.7%) - Captured by k_min_intermediate < k_final

The minimum k occurs before the final segment
Example: [10, 2, 4] → inversion at 2→4

Type 2: Middle inversions (7.3%) - Missed by k_min predictor

k dips and rises in the middle, but final k is still the minimum
Example: [3, 2, 3, 2, 1] → inversion at 2→3 in middle

The Perfect Predictor (Tautological)

Checking if k ever increases anywhere in the path achieves F1=1.000. This is tautological (inversion = k increases), but validates the theoretical framework.

Why This Matters

Theoretical completeness: We now have a perfect characterization of inversions via k-path analysis
Practical prediction: k_min < k_final catches 92.7% of inversions with zero false positives
k_horiz limitation explained: It fails on structured data because k_lambda0 is inflated by axis alignment

Finding 16: Tie-Point Phenomenon (2025-12-05)

Status: Analyzed and Explained

Investigation of the rare case from exp050 where k increased without R decreasing revealed a new phenomenon: tie points.

What is a Tie Point?

A tie point occurs when two candidates have exactly equal total loss at some λ:

At λ ≈ 1.0667:
  k=3 candidate: L_data=3.2 + λ*0.25 = 3.467
  k=2 candidate: L_data=3.4 + λ*0.0625 = 3.467
  Both candidates are equally optimal!

Observation

At tie points, the solver records both candidates, creating zero-width segments (width ≈ 10⁻¹⁵). The k "oscillation" (2→3→2) is multi-valuedness at the corner point, NOT a Pareto inversion.

Key Insight

Tie-point k-increases are followed by immediate k-decreases (oscillations)
They have zero net effect on k_min
The k_min predictor remains valid: k_min < k_final still implies inversion exists
The Pareto theorem is exact: k truly increases only when R strictly decreases

Finding 17: k_min Predictor Theorem Proven (2025-12-05)

Status: Formally Proven

The k_min predictor theorem is now rigorously proven (see proofs/kmin_predictor_proof.md).

The Theorem

Theorem: If k_min_intermediate < k_final, at least one Pareto inversion exists.

Proof Sketch

If k_min < k_final, there's a net k-increase from some intermediate segment to the final
Tie-point oscillations have zero net effect (k goes up then down)
Therefore, at least one k-increase must be from a true Pareto inversion (R drops while k rises)

Implications

Precision = 1.000: Every flagged dataset has a true inversion (no false positives)
Recall = 0.927: Catches 92.7% of inversions (misses "middle inversions")
Computational cost: O(n) after path computation

Finding 18: Cheap Signals Cannot Replace k_min (2025-12-05)

Status: Negative Result

Attempted to predict inversions using only O(n×d) features without full path computation.

Results

Predictor	Precision	Recall	F1	Cheap?
k_min < k_final	1.000	0.857	0.923	No
duplication rule	0.059	0.857	0.110	Yes
k_horiz > d+1	0.000	0.000	0.000	Yes

Conclusion

Cheap features (duplication, k_horiz) cannot reliably predict inversions. The full path computation is necessary for accurate inversion detection. The k_min predictor remains the best available method.

Finding 19: Middle Inversion Characterization (2025-12-05)

Status: Analyzed

The 7.3% of inversions missed by the k_min predictor are "middle inversions" with a specific k-path pattern.

Definition

A middle inversion is a Pareto inversion where k bounces UP in the middle of the path but the final k is still the overall minimum. These are missed by the k_min predictor (which checks k_min_intermediate < k_final).

Observed K-Path Patterns

[3, 2, 3, 2, 1] - k drops to 2, bounces to 3, continues to final k=1
[3, 3, 2, 3, 2, 1] - k plateaus at 3, drops to 2, bounces to 3, continues to final k=1
[5, 5, 2, 3, 2, 2] - k plateaus at 5, drops to 2, bounces to 3, settles at final k=2

Statistics (from exp049)

Total inversions: 41 (8.4% of 486 runs)
End inversions (k_min < k_final): 38 (92.7% of inversions)
Middle inversions (k_min >= k_final): 3 (7.3% of inversions)

Characteristics Comparison

Metric	End Inv	Middle Inv	No Inv
Avg segments	5.58	5.67	3.64
Avg k_transitions	2.11	3.67	1.24
Avg k_lambda0	6.00	3.67	6.13
Avg k_horiz	3.45	1.33	3.25

Key Insight

Middle inversions require a "valley-then-bounce" pattern: k first decreases, then increases (the inversion), then continues to decrease to the final value. Because the final k is the minimum, the k_min predictor doesn't flag these cases.

Practical Implication

Middle inversions are rare (0.6% of all runs) and detecting them requires full path computation. The k_min predictor remains nearly optimal (92.7% recall with 100% precision) for practical use.

Finding 20: Perfect Inversion Detector via k-up Transitions (2025-12-05 Evening)

Status: Proven (tautological)

The count of k-increasing transitions in the optimal path perfectly detects inversions.

The Discovery

k_up_transitions >= 1 is a perfect inversion detector:

Precision: 1.000 (zero false positives)
Recall: 1.000 (catches ALL inversions)
F1: 1.000

This is tautological: a Pareto inversion IS by definition a k-up transition (k increases while R decreases).

Distribution (270 runs, 2D quick sweep)

k_up count	Middle Inv	End Inv	No Inv
0	0	0	198
1	9	61	0
2	0	2	0

Key insight: ALL inversions have >= 1 k-up transition. ALL non-inversions have 0 k-up transitions.

Middle vs End Distinguished by Position

The average relative position of k-up transitions distinguishes the types:

Middle inversions: avg position = 0.298 (early in path)
End inversions: avg position = 0.633 (late in path)

Path Length as Distinguisher

Segments threshold	Middle (n=9)	End (n=63)	None (n=198)
>= 4	78%	71%	47%
>= 5	67%	30%	21%
>= 6	33%	13%	4%

Theoretical Explanation

Middle inversions require the k-path to:

Drop to some intermediate minimum
Bounce UP (the inversion)
Continue DOWN to the final minimum

This needs at least 3 k-changes, requiring longer paths (typically 5+ segments).

Practical Implication

While k_up >= 1 is a perfect detector, it requires computing the full optimal path. No truly "cheap" predictor exists - detecting inversions fundamentally requires path computation.

Finding 21: p-Adic Regularisation Monotonicity Dominance (2025-12-05 Late Evening)

Status: MAJOR DISCOVERY

p-Adic regularisation (|β|_p penalty) is strictly more monotonic than L2 regularisation (β² penalty).

Statistical Evidence (1000 random 1D datasets)

Regularisation	Monotonic	Inversions
L2 (β²)	95.0%	5.0%
p-Adic (\|β\|_p)	99.0%	1.0%

Critical observation: Zero cases found where p-adic inverts but L2 doesn't!

Conjecture: p-Adic Monotonicity Dominance

For any dataset (X, y):

If p-adic regularisation has a Pareto inversion, then L2 also has one.
The converse is FALSE: L2 can invert when p-adic doesn't.

Mechanism: Valuation Classes and Threshold Reduction

93% fewer thresholds with p-adic regularisation:

Metric	L2	p-Adic
Avg thresholds per dataset	574	43
Same-valuation-class pairs	N/A	568
Threshold reduction	-	92.6%

Why This Happens

L2 penalty β²: Continuous - every distinct slope gives a unique penalty.

p-Adic penalty |β|_p: Discrete valuation classes:

v_p(β) = 0:  |β|_p = 1   (e.g., β=1, 3, 5, 7, ...)
v_p(β) = 1:  |β|_p = 0.5 (e.g., β=2, 6, 10, ...)
v_p(β) = 2:  |β|_p = 0.25 (e.g., β=4, 12, 20, ...)

All slopes within a valuation class have THE SAME penalty! When two candidates are in the same class, no threshold exists between them - the better data-loss candidate wins for ALL λ.

Threshold Formula Comparison

L2: λ* = (L₂ - L₁) / (b₁² - b₂²) [continuous denominator]

p-Adic: λ* = (L₂ - L₁) / (|b₁|_p - |b₂|_p) [discrete denominator]

Practical Implications

Safer regularisation: p-adic is strictly safer than L2 for avoiding inversions
Different preferences: p-adic prefers β=4 over β=1 (for p=2), despite |4| > |1|
Simpler paths: p-adic often skips intermediate candidates that L2 visits

SC8 Exactness Theorem (2026-01-03)

Major Result: The SC8 criterion is mathematically exact for predicting monotonicity, now verified through 6D with 3300+ additional test cases.

Dimension Scaling Summary

Dim	n	Cases	Non-mono	Rate	Pattern	SC8
2D	1	3000	8	0.27%	2→3	100%
3D	2	1500	2	0.13%	3→4	100%
4D	3	900	7	0.78%	4→5	100%
5D	4	600	1	0.17%	5→6	100%
6D	5	3300+	0	<0.03%	(6→7)	100%

Theorem Statement

Theorem (SC8 Exactness): k(λ) is monotonically non-increasing ⟺ SC8 holds.

Where SC8 is: For every optimal path from min-L starts, k-values along the path are non-increasing.

Why 6D Violations Are Extremely Rare

To observe a 6→7 violation requires:

A 7-fit hyperplane exists (needs 7 collinear points)
The 7-fit has LOWER regularization penalty than a competing 6-fit
The 7-fit has HIGHER data loss than the 6-fit
The 7-fit is reachable on the optimal λ-path

With random data, 7 collinear points in 6D are geometrically rare, explaining <0.03% violation rate.

Key Insight: First Crossover Matters

SC8 is exact because it traces the actual optimal path. At each step, only the FIRST candidate to become competitive (earliest crossover λ) is visited. Higher-k candidates may be "reachable" but never actually visited due to shielding by lower-k alternatives.

Open Questions

~~Prove Pareto Theorem Analytically~~ - DONE (Day 7)
~~Characterize Inversion Probability~~ - DONE (Day 7): P ≈ 0.10 × (n-d-1)/n
~~Efficient Inversion Detection~~ - SOLVED (Day 34): k_min predictor achieves F1=0.962
~~Two-Regime Detection~~ - SUPERSEDED: k_min predictor works across regimes
~~The 3.8% Anomaly~~ - EXPLAINED: Middle inversions (7.3% of cases)
~~Cheap k_min Estimation~~ - INVESTIGATED: No cheap alternative found
~~k_min Predictor Proof~~ - DONE: See proofs/kmin_predictor_proof.md
~~Middle Inversion Geometry~~ - CHARACTERIZED (Day 35): Valley-then-bounce pattern in longer paths
~~Perfect Inversion Detector~~ - PROVEN (Day 35): k_up >= 1 is perfect but requires path
~~p-Adic Regularisation~~ - INVESTIGATED (Day 35): Strictly more monotonic than L2!
Prove p-Adic Monotonicity Dominance: Formally prove the conjecture
Higher dimensions for p-adic: Does ~93% threshold reduction persist in 2D, 3D?
Hybrid regularisation: Mix of L2 and p-adic penalties
Formal Lean/Coq Proof: Translate the k_min theorem to a theorem prover