Research Findings

Key discoveries and validated results

Latest Findings (2026-01-05 22:00 AEDT)

Executive Summary

Our experiments confirm the fundamental behavior of regularised p-adic linear regression:

Lean/experiment status (Dec 26, 2025 20:04 AEDT): Lean is sorry-free. In the very-strong GP chain, the main theorem now depends only on general_fit_preserving_exists (k<n); the strong-GP variant still carries simplest_case_product_formula. Maximal-partition c_{v_min} audit (exp277): 55,700 maximal candidates with c_{v_min} ≥ 1 in 55,685 (99.97%); gains+orig = 1 slice: 988/988 have c_{v_min} ≥ 1, so pairing with new_zero_of_maximal_and_gains_orig_one forces new = 0 axiom-free. exp266-269 (proper k<n domain) remain fully validated: correct max_vmin differs from min h-res in 667/669, anchor at max_vmin in 261/261, anchor-in-R delta < max_vmin in 97.2% and cancellation in the rest.

Update (Dec 28, 2025 10:00 AEDT): A canonical wrapper now instantiates the gap-bound + h-residual-drop package for the standard maximal-partition choice E = fits(h') \ fits(h); the 252/259 witness-present anchor-not-in-E cases auto-close to c_{v_max} = 1. Only the 7 witness-free edge cases still rely on the structural anchor bound placeholder.

Update (Dec 31, 2025 00:06 AEDT): Added an auto wrapper that routes packaged gap-bound/axiom witnesses through the gcd-filled data-core zero-new bridge, removing the remaining manual maximal-partition rewrites in that path.

CRITICAL GAP: Per-R-Unique Hypothesis Fails (2025-12-25 15:30 AEDT)

What we discovered (exp258-262): The per-R-unique hypothesis required by the averaging/pigeonhole approach is FALSE.

Next steps: Explore direct construction via delta-valuation case analysis, or per-anchor bounds instead of per-R-point bounds.

BREAKTHROUGH: Max_vmin delta structure validated (2025-12-25 16:00 AEDT)

Key insight (exp266-269): With proper max_vmin in the k<n domain, zero-new exists in 100% of gains+orig=1 cases, and the delta geometry at the anchor explains why.

Critical Corrections

Structure of Maximal Partitions

Delta Structure at Anchor (exp268-269)

Proof Path

  1. Formalize max_vmin (max v_min over all partitions) inside Lean.
  2. Use gains+orig=1 to isolate the anchor at max_vmin.
  3. Show the anchor delta either drops below max_vmin or cancels when val(δ)=max_vmin, yielding h′-avoidance.
  4. Combine with anchor-in-E avoidance to supply zero-new, then feed the existing zero-new → satisfier chain.

Full analysis in notes/axiom_proof_approach.md.

Alternative E-choices close Mechanism 2 (2025-12-25 08:00 AEDT)

What we learned (exp248-257): Anchor-in-E succeeds for 98.7% of Mechanism 2 configurations; every remaining case is fixed by a different anchor/F E-choice. No complete failures (0/4374 configs).

Mechanism 1 delta guard (2025-12-22 16:01 AEDT)

What we tested: exp208 examined anchor-in-E Mechanism 1 candidates (n ∈ {2,3}, p ∈ {2,3,5}, seeds 0–1999) to see how δ = h' - h sits relative to v_max on R-points.

Mechanism 2 Path 2 Formalised (2025-12-22 02:00 AEDT)

What changed: Added delta-dominance lemmas so the dominant Mechanism 2 path (95.3% of cases per exp198) now yields zero-new directly.

BREAKTHROUGH: Universal Zero-New Existence (2025-12-21)

Key result (exp188-191): When gains+orig=1 at maximal v_min, there ALWAYS exists an E choice with new_{v*}=0. Verified in 521/521 cases (100%).

Lean documentation updated: zero_new_universal_existence_doc lemma captures the two-mechanism analysis.

Lean link: zero-new ⇒ c_v = 1 (2025-12-21 10:00 AEDT)

New lemma: marginCoeffAtVal_eq_one_of_gains_orig_one_new_zero proves that when gains+orig = 1 and new = 0 at valuation v, the margin coefficient satisfies c_v = 1.

Build verified + integer corollary plan (2025-12-20 16:00 AEDT)

Status check: `~/.elan/bin/lake build PadicRegularisation.NplusOneTheorem` still passes (0 sorrys, style warnings only).

Integer datasets/hyperplanes are p-free (2025-12-20 14:00 AEDT)

Formal upgrade: Added integer→p-free helper lemmas so users with integer coefficients/data automatically satisfy the coprimality hypotheses.

P-free Case-1 satisfier chain (2025-12-20 12:00 AEDT)

Formal upgrade: Added minloss_fp_is_satisfier_at_margin_minimum_of_pfree, which plugs the p-free violator→negative-margin lemma directly into the Case-1 contradiction.

P-free violator → negative margin (2025-12-20 10:00 AEDT)

Formal upgrade: Added violator_at_margin_vmin_implies_neg_margin_of_pfree, which automatically supplies residual coprimality when coefficients and data are p-free.

P-free inputs now discharge coprimality (2025-12-20 08:00 AEDT)

Formal upgrade: Lean now includes p-free predicates (HyperplanePFreeness, DataPointPFreeness, DatasetPFreeness) and a residual propagation lemma.

Residual coprimality is stable for p-free inputs (2025-12-20 06:00 AEDT)

Monte Carlo check: Random rational hyperplanes/data (n=3, m=8, denominators 1–6) were sampled to test the new coprimality hypothesis.

Cross-base crossovers are rare and localized (2025-12-16 16:00 AEDT)

Finding (exp162): When the min-loss FP is chosen at r=5 (n=2, p=2), negative margins at other bases are extremely rare: 3/997 valid seeds (0.30%) went negative.

🔬 CLARIFICATION: Margin can cross zero across bases, but theorem still holds (2025-12-16 15:00 AEDT)

Key Discovery (exp160-161): The min-loss FP at r=5 can have NEGATIVE margin at intermediate bases (r ≈ 2-2.5), even with c_{v_min} = 1 > 0!

Example (n=2, p=2, seed=62):

Margin polynomial: M(r) = r² - 4r + 3 + 2/r²
Coefficients: {-2: 1, -1: -4, 0: 3, 2: 2}

At r=2: M(2) = 4 - 8 + 3 + 0.5 = -0.5 < 0  ← NEGATIVE!
At r=5: M(5) = 25 - 20 + 3 + 0.08 = 8.08 > 0  ← POSITIVE!
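To make the crossover arithmetic easy to re-check, here is a minimal Python sketch (illustrative, not one of the experiment scripts) that evaluates M(r) = Σ_v c_v·r^{-v} from the valuation-indexed coefficients above:

  def margin(coeffs, r):
      # Evaluate M(r) = sum_v c_v * r**(-v) from a {valuation: coefficient} dict
      return sum(c * r ** (-v) for v, c in coeffs.items())

  coeffs = {-2: 1, -1: -4, 0: 3, 2: 2}   # M(r) = r² - 4r + 3 + 2/r² (seed=62)
  print(margin(coeffs, 2.0))   # -0.5: negative at r=2 (the crossover)
  print(margin(coeffs, 5.0))   #  8.08: positive at r=5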

Why This Doesn't Break the Theorem

The theorem "∃h' that beats h at base r" is about existence at a fixed base r, not about a universal witness h' across all bases.

Experimental Verification (exp161)

For the crossover cases, we verified that at every base r > 1, SOME h' beats h:

Case seed=62:
  r=1.5: best_margin=1.59 ✓ (10/21 beat h)
  r=2.0: best_margin=1.50 ✓ (4/21 beat h) ← different h' than at r=5
  r=2.2: best_margin=1.50 ✓ (4/21 beat h)
  r=5.0: best_margin=8.08 ✓ (6/21 beat h)

Proof Strategy Implications

  1. For any fixed base r > 1: The theorem holds — the min-loss FP at that base beats h
  2. The Lean proof at r ≥ 5 is correct for that base range
  3. The structural bound (gains + orig ≥ new + 1) ensures c_{v_min} ≥ 1, but this doesn't prevent crossover at other bases due to negative intermediate coefficients
  4. Key insight: Different bases may require different h' witnesses, but existence is guaranteed at each base

Files: experiments/exp160_minloss_r5_margin_all_bases.py, exp161_crossover_analysis.py

Small-base margin positivity via tail compensation (2025-12-16 14:00 AEDT)

Finding (exp157): For r ∈ (1,3], the margin stays positive even when the dominant coefficient is nonpositive.

🎯 MAJOR BREAKTHROUGH: gains + orig ≥ new + 1 at margin's v_min (2025-12-16 09:35 AEDT)

Key Discovery (exp147-151): For the loss-minimizing FP at r ≥ 5, at the margin's v_min:

gains_{v_min} + orig_{v_min} ≥ new_{v_min} + 1

This directly implies c_{v_min} = gains + orig - new ≥ 1 > 0.
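Written out, the implication is a single rearrangement of the coefficient decomposition:

c_{v_min} = gains_{v_min} + orig_{v_min} - new_{v_min} ≥ (new_{v_min} + 1) - new_{v_min} = 1 > 0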

Experimental Evidence

Proof Strategy

The structural bound follows from loss minimization selection pressure:

  1. Creating a new residual at low valuation v costs r^{-v} in loss
  2. For r ≥ 5, low-valuation residuals are exponentially expensive
  3. If gains + orig < new + 1 at margin's v_min, h' creates net new residuals there
  4. Another candidate h'' avoiding this has better loss
  5. So min-loss selector h* cannot have gains + orig < new + 1 at margin's v_min

Lean status: Documentation lemmas added. Formal proof in progress.

Current Focus (2025-12-13 20:00 AEDT)

Current Focus (2025-12-13 02:00 AEDT)

Preparing the formal proof that the loss minimiser lies in the maximal v_min partition at heavy bases (r ≥ 5):

🎯 BREAKTHROUGH: Min-loss FP always has c_{v_min} > 0 (2025-12-12 21:00 AEDT)

exp129-132: The loss-minimizing fit-preserving interpolant always has positive dominant coefficient!

Why Min-Loss FP Has c_{v_min} > 0

The min-loss FP minimizes the remaining-point loss Σ_{x ∈ R} r^{-v(res(h*, x))}, where R is the set of remaining points and r is the loss base (a sketch follows the list below):

  1. This avoids creating low-valuation (high-loss) residuals at remaining points R
  2. Therefore "new_{v_min}" is kept small by the minimization pressure
  3. The terms "gains_{v_min} + orig_{v_min}" are fixed (depend only on h)
  4. Result: c_{v_min} = gains + orig - new > 0
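A minimal Python sketch of this loss under the notation just stated; vp and the residual list are illustrative stand-ins for the experiment code:

  from fractions import Fraction

  def vp(x, p):
      # p-adic valuation of a nonzero rational (illustrative helper)
      x = Fraction(x)
      v, num, den = 0, x.numerator, x.denominator
      while num % p == 0:
          num //= p; v += 1
      while den % p == 0:
          den //= p; v -= 1
      return v

  def remaining_loss(residuals, r, p):
      # Loss over the remaining points R; exact fits (residual 0) contribute nothing
      return sum(r ** (-vp(res, p)) for res in residuals if res != 0)

  # At r = 5 (p = 2), one valuation-0 residual costs 1.0 while a valuation-2 residual costs 0.04,
  # so the minimiser is pushed away from creating low-valuation residuals on R.
  print(remaining_loss([1, 4], r=5, p=2))   # 1.04
  print(remaining_loss([4, 4], r=5, p=2))   # 0.08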

Proof Strategy to Eliminate Axiom

To eliminate the general_fit_preserving_exists axiom:

  1. Define min-loss FP over fit-preserving finset at base r ≥ r₀
  2. Prove c_{v_min}(h*) > 0 for this minimizer
  3. Apply dominance guard: c_{v_min} > 0 ⇒ margin > 0 for all r > 1
  4. Conclude h* beats h, completing the k < n case

Files: experiments/exp129_fixed_valuation_sum.py, exp130_dominant_level_structure.py, exp131_failure_seeds_detailed.py, exp132_minloss_cmin_positive.py

Gap-to-positive rescue for dominance guard (2025-12-12 12:00)

exp128: When all max-v_min candidates have c_min ≤ 0, a positive candidate always appears one valuation level lower.

Max-v_min guard failure structure (2025-12-12 10:05)

exp126: Audited the rare cases where the max-v_min dominance guard picks c_min ≤ 0.

Dominance guard fixes near-binary selection (2025-12-12 08:00)

exp122: Enforcing a dominance guard (min-loss among fit-preserving interpolants with c_{v_min} > 0) keeps the margin positive even when the reference base r_ref is near-binary.

High-d margin polynomial mostly stable; near-binary r_ref can fail (2025-12-12 06:13)

exp115: In n ∈ {4,5}, p ∈ {2,3} with r_ref ∈ {1.5, 5.0, 20.0}, the min-loss FP kept the margin polynomial positive in 479/480 cases.

Margin polynomial stays positive for min-loss FP (2025-12-12 04:03)

exp114: For k < n, the min-loss fit-preserving interpolant has a margin polynomial with positive dominant coefficient and positive value on r ∈ {1.05…50} across 800 trials.

BREAKTHROUGH: Margin polynomial has positive dominant term (2025-12-12 03:00)

exp111-113 reveal the algebraic structure: margin = Σ c_v·r^{-v} where the dominant (lowest v) coefficient is ALWAYS positive (100% of 200 cases).

NEW: Offset < Gain holds in higher dimensions (2025-12-12)

exp110 extends the offset/gain bound to n=3,4 with heavy bases (r up to 50): max offset/gain = 0.9257 < 1.

NEW: Anchoring persists across bases (2025-12-12)

exp105 shows the loss-minimising fit-preserving interpolant beats h in 100% of trials even near binary or heavy bases.

NEW: Min-loss fit-preserving witness formalised (2025-12-11)

Lean lemma fitPreserving_minimiser_beats shows that once any fit-preserving interpolant improves loss, the loss minimiser over the fit-preserving finset also beats h.

NEW: k = n case is axiom-free under very strong GP (2025-12-10)

Lean now proves a fit-preserving improvement without axioms whenever countExactFits h = n and the data are in inVeryStrongGeneralPosition.

NEW: Averaging over fit-preserving interpolants fails (2025-12-09)

Counting/averaging is insufficient: the average margin over all fit-preserving interpolants is often negative, even though a good interpolant exists.

MAJOR FINDING: n+1 Theorem Empirically Validated (2025-12-07)

The n+1 theorem is TRUE - with 100% success rate for data in strong general position.

The Theorem

For data in general position, if hyperplane h has k < n+1 exact fits, there EXISTS h' with:

Key Discoveries

Implication for Lean Proof

The current Lean proof uses a perturbation-based approach which doesn't always work. The proof needs to be revised to use a non-constructive existence argument (showing that among all C(m,n+1) interpolating hyperplanes, at least one has lower loss).

NEW: Interpolant Loss Minimiser Dominates (2025-12-07)

The loss-minimising interpolant h_min strictly beats arbitrary h with k < n+1 fits in all tested cases.

Update: Inversion Feature Fingerprints (2025-12-07)

Padic-only vs L2-only inversions leave distinct feature signatures. A post-hoc analysis of exp072 (720 datasets) measured diversity/duplication and k-path timing.

Update: Inversion Geometry Gap (2025-12-06)

p-Adic-only and L2-only inversions arise from different geometric triggers. A 720-dataset sweep (dims {3,4}, primes {2,5,11}, bases {2,3,4,6,8}) mapped where each penalty inverts.

Update: Near-Binary Transition Map (2025-12-06)

p-Adic inversions reappear just above the near-binary regime. A fine-grained base sweep (r ∈ {1.02…2.0}, dims {1,2,3}, primes {2,5,11}, 2,160 datasets) mapped where monotonicity breaks.

Update: Off-Prime Base Sweep (2025-12-06)

p-Adic dominance fails when the loss base r ≠ p. A 2D/3D sweep across primes {2,5,11} and bases {1.1,1.5,2,5} (288 datasets) found p-adic-only inversions.

MAJOR UPDATE: The Penalty Gap Mechanism (2025-12-05 Night)

p-Adic monotonicity dominance is a 1D phenomenon - it breaks in higher dimensions!

Discovery

In 2D/3D, p-adic can invert when L2 stays monotonic (5/160 vs 2/160 in exp059).

The Mechanism: Integer Solutions with High Valuations

3D p=2 seed=21 example:
  Fractional-slope candidate (k=4): β = [-2.08, -0.73, 0.50]
    v_p(β_slopes) = [0, 0, -1], R_padic = 2.5

  Integer-slope candidate (k=5): β = [-3, -7, 8]
    v_p(β_slopes) = [0, 0, 3], R_padic = 2.125

  The integer candidate fits MORE points AND has LOWER p-adic penalty!
  Why? 8 = 2³ has v₂(8) = 3, giving |8|₂ = 0.125

In L2: 8² = 64 (huge penalty), so L2 never switches to integer candidate.
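A small sketch of the valuation arithmetic in this example, assuming R_padic is the sum of the p-adic absolute values |β_i|_p = p^(-v_p(β_i)) of the slope coefficients (this reproduces the integer candidate's reported 2.125):

  def v_p(n, p):
      # p-adic valuation of a nonzero integer
      v = 0
      while n % p == 0:
          n //= p; v += 1
      return v

  def padic_abs(n, p):
      return p ** (-v_p(n, p))

  p = 2
  integer_slopes = [-3, -7, 8]
  print([v_p(abs(b), p) for b in integer_slopes])           # [0, 0, 3]
  print(sum(padic_abs(abs(b), p) for b in integer_slopes))  # 1 + 1 + 0.125 = 2.125
  print(sum(b ** 2 for b in integer_slopes))                # 122; the slope 8 alone costs 64 under L2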
            

Near-Binary Base Results

Base r    p-adic inversions
1.02      1
1.1       1
2.0       1
5.0       6

Near-binary bases partially suppress p-adic inversions but don't eliminate them.

Revised Understanding

Property                         1D          2D/3D
p-adic more monotonic than L2?   YES         NO
p-adic inversions possible?      Rare        More common
Integer solutions favored?       Sometimes   Often

Update: 4D Base-Density Resample (2025-12-21)

Higher-seed sweep of 4D inversions across mid/heavy bases confirms base-flat behaviour driven mainly by extra points and small primes.

Update: Base-Factor Curve (2025-12-15)

Dense sweep over bases r ∈ {1.01…10} shows how g(r) = c(p, r)·p ramps up.

Update: Corrected Inversion Counts (2025-12-09)

1D segments now export regularisation penalties, restoring accurate inversion detection.

Finding 1: Confirmation of n+1 Point Theorem

Date: 2025-12-01

Status: Validated

In 1D (n=1) regression without regularisation, our exhaustive search algorithm consistently finds optimal lines passing through exactly 2 points (= n+1).

Evidence

Dataset: (0, 0), (1, 2), (2, 3), (3, 7)
Result at λ=0: line y = 0 + 2x passes through 2 points

Dataset: (0, 1), (1, 2), (2, 4), (4, 5)
Result at λ=0: line passes through 3 points (special case)
            

Note: When the data has special structure (nearly collinear), the optimal line may pass through more than n+1 points.

Finding 2: Discrete Threshold Behavior

Date: 2025-12-01

Status: Validated

The number of exactly-fitted points changes at discrete threshold values of λ, not continuously. This suggests a phase transition phenomenon.

Evidence

Dataset: (0, 1), (1, 2), (2, 4), (4, 5)

Threshold detected between λ=1.2 and λ=1.3:
  - Below threshold: 3 points fitted exactly
  - Above threshold: 1 point fitted exactly
            

Interpretation

This discrete behavior is consistent with the combinatorial nature of p-adic optimization: the optimal line either passes through a point exactly (residual = 0, infinite valuation) or it doesn't (finite valuation). There are no "partial" fits.

Finding 3: Regularisation Type Matters

Date: 2025-12-01

Status: Preliminary

p-Adic regularisation (penalizing |β|p) behaves differently from real L2 regularisation (penalizing β²).

Evidence

Dataset: (0, 0), (1, 2), (2, 3), (3, 7)

Real L2 Regularization (λ=0.1):
  Intercept = 1, Slope = 1, Exact fits = 2

p-Adic Regularization (λ=0.1):
  Intercept = -5, Slope = 4, Exact fits = 2
            

The p-adic regularisation prefers coefficients with high p-adic valuation (more divisible by p), leading to different optimal solutions.

Finding 4: Monotonicity of k(λ) - REVISED

Date: 2025-12-02 (Original), 2025-12-03 (Revised)

Status: Conditional - Counter-examples Found!

Original claim: k(λ) is monotonically non-increasing in λ.

Revision (Day 3): Monotonicity is not universal. Stress testing found 157 counter-examples out of 8480 tests where k(λ) actually increases as λ increases!

Example Counter-Example

Dataset: x = [5, 7, 8, -8, -6], y = [10, 10, -7, 10, -5]

k-path (base r=2, prime p=2):
  λ ∈ [0.00, 0.07]: Line y = 3.18 + 1.36x  → k = 2
  λ ∈ [0.07, 12.3]: Line y = -5.86 - 0.14x → k = 2
  λ ∈ [12.3, ∞]:    Line y = 10 (horizontal) → k = 3

k INCREASES from 2 to 3 as λ increases!
            

Why This Happens

  1. Three points share y=10: (5,10), (7,10), (-8,10)
  2. The horizontal line y=10 passes through all three
  3. At λ=0, a tilted line fits 2 points with better data loss
  4. At large λ, the horizontal line (slope=0, reg penalty=0) wins
  5. The horizontal line happens to fit more points!

Revised Understanding

Conditional Monotonicity Theorem (Conjectured):
k(λ) is monotonically non-increasing if and only if no horizontal hyperplane passes through more points than the optimal hyperplane at λ=0.

Statistics from Stress Test

Setting    Monotonic   Total   Rate
1D, n=4    1350        1500    90%
1D, n=6    747         1500    50%
1D, n=8    251         1500    17%
2D, n=6    188         200     94%
3D, n=6    52          60      87%

Higher dimensions have lower failure rates (harder to accidentally align points).

Finding 5: Analytical Threshold Formula

Date: 2025-12-02

Status: Validated

Thresholds occur when two candidate solutions have equal total loss. For 1D regression comparing lines with slopes b₁ and b₂:

λ* = (L₂ - L₁) / (b₁² - b₂²)

where L₁, L₂ are the data losses (without regularisation) of the two lines.

Evidence

Dataset: (0,1), (1,2), (2,4), (4,5)

Theoretical prediction: λ* = 1.25 (line y=1+x vs y=1)
Empirical threshold: λ ∈ [1.24, 1.25]

Perfect match between theory and experiment!
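A minimal sketch reproducing this check at p = 2, r = 2, using the two lines named above; exact-fit residuals contribute nothing to the data loss and the regularisation penalty is the squared slope:

  from fractions import Fraction

  def vp(x, p=2):
      # p-adic valuation of a nonzero rational
      x = Fraction(x)
      v, num, den = 0, x.numerator, x.denominator
      while num % p == 0:
          num //= p; v += 1
      while den % p == 0:
          den //= p; v -= 1
      return v

  def data_loss(points, intercept, slope, r=2, p=2):
      # L = sum_i r^(-v_p(residual_i)), skipping exact fits
      total = 0.0
      for x, y in points:
          res = Fraction(y) - (Fraction(intercept) + Fraction(slope) * x)
          if res != 0:
              total += r ** (-vp(res, p))
      return total

  pts = [(0, 1), (1, 2), (2, 4), (4, 5)]
  L1, b1 = data_loss(pts, 1, 1), 1   # y = 1 + x: one valuation-0 residual
  L2, b2 = data_loss(pts, 1, 0), 0   # y = 1: residuals 1, 3, 4
  print((L2 - L1) / (b1 ** 2 - b2 ** 2))   # 1.25, matching λ* = 1 + 1/r² at r = 2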
            

Finding 6: 2D Generalization Confirmed

Date: 2025-12-02

Status: Validated

The n+1 theorem and monotonicity extend to 2D regression (fitting planes to 3D points). At λ=0, optimal planes pass through at least 3 points (n+1 for n=2).

Evidence

15/15 random 2D datasets satisfy:
  - k(0) ≥ 3 (n+1 theorem holds)
  - k(λ) is monotonically non-increasing
  - Discrete thresholds exist (1-2 thresholds typical)

Example: Dataset with 5 points
  λ=0.00: plane z = 1 + 0.5x₁ + 1.5x₂ fits 4 points
  λ=0.50: plane z = 2 + x₁ fits 3 points
  λ=2.00: plane z = 5 fits 1 point
            

Finding 7: Prime Dependence of Thresholds

Date: 2025-12-02

Status: Validated

The threshold structure depends on the choice of prime p. Different primes yield different numbers and locations of thresholds.

Evidence

Dataset: (0,1), (1,2), (2,4), (4,5)

p=2:  1 threshold at λ ≈ 1.25 (k: 3→1)
p=3:  2 thresholds at λ ≈ 0.45 (3→2) and λ ≈ 3.95 (2→1)
p=5:  2 thresholds at λ ≈ 1.35 (3→2) and λ ≈ 3.95 (2→1)
p=7:  2 thresholds at λ ≈ 1.35 (3→2) and λ ≈ 3.95 (2→1)
            

This reveals that p-adic regression is truly prime-dependent, not just in the valuation function but in the geometry of the optimization landscape.

Finding 8: Exact Asymptotic Threshold Formula (MAJOR)

Date: 2025-12-02 (Evening)

Status: Validated

Discovered and validated an exact formula for the threshold λ* as a function of the loss base r. This provides complete analytical understanding of threshold behavior.

The Formula

λ* = (L₂ - L₁) / (b₁² - b₂²)

where L = Σᵢ r^(-v_p(residualᵢ)) is the data loss of a candidate line at base r.

Asymptotic Expansion

The threshold admits an exact series:

λ* = c₀ + c₁/r + c₂/r² + c₃/r³ + ...

where c_k counts residuals with valuation exactly k.

Evidence (100% Match)

Dataset: canonical_threshold
Derived formula: λ* = 1 + 1/r²

Verification:
  r=2:    λ* = 1.25      (predicted: 1.25)     ✓
  r=3:    λ* = 1.111...  (predicted: 1.111...) ✓
  r=5:    λ* = 1.04      (predicted: 1.04)     ✓
  r=10:   λ* = 1.01      (predicted: 1.01)     ✓
  r=100:  λ* = 1.0001    (predicted: 1.0001)   ✓
  r=1000: λ* = 1.000001  (predicted: 1.000001) ✓

Error: 0.00 (exact match for all tested values)

Dataset: gentle_line
Derived formula: λ* = 1 + 1/r
Also verified with zero error.
            

Physical Interpretation

The formula reveals that threshold behavior is entirely determined by the p-adic valuations of residuals. The limit as r→∞ depends only on residuals with valuation 0 (coprime to p).

Finding 9: d=4 and d=5 Validation

Date: 2025-12-03

Status: Validated

Extended the exact threshold solver to higher dimensions (d=4, d=5). The n+1 theorem and threshold formula continue to hold.

Results

d=4 (8 points): 14/14 configurations monotonic
  - canonical_4d: k path 5→2→1 (2 thresholds)
  - random datasets: k path 5→2 (monotonic)

d=5 (9 points): 2/2 configurations monotonic
  - canonical_5d: k path 6→5 (1 threshold)
  - random: k path 6→2 (4 thresholds)

All d≥4 tests passed n+1 property: k(0) ≥ d+1
            

Computational Scaling

d=4: ~140 candidate hyperplanes for 8 points
d=5: ~200 candidate hyperplanes for 9 points
Time: ~30ms per test
            

Finding 10: Pareto Frontier Monotonicity Theorem (MAJOR)

Date: 2025-12-05

Status: Validated - 100% Accuracy

After the conditional monotonicity conjecture failed (Day 4), we discovered the true necessary and sufficient condition for k(λ) monotonicity.

The Theorem

Pareto Frontier Monotonicity Theorem: For L2-regularised p-adic linear regression, k(λ) is monotonically non-increasing in λ if and only if there is no "Pareto inversion" in the optimal path.

A Pareto inversion occurs when, for consecutive optimal solutions β₁, β₂ with R(β₁) > R(β₂), we have k(β₁) < k(β₂).

Intuitive Explanation

As λ increases, the optimal solution moves toward lower regularisation penalty R. If a lower-R solution happens to fit MORE points than a higher-R solution, then when we switch to it, k increases, breaking monotonicity.

Example

Segment 1: R = 11.56, k = 2
Segment 2: R = 3.24,  k = 1
Segment 3: R = 0.049, k = 2  ← INVERSION! k increased
Segment 4: R = 0.0,   k = 1

k-path: 2 → 1 → 2 → 1 (non-monotonic due to inversion at segment 3)
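A minimal sketch of the inversion check on a path represented as consecutive (R, k) segments, as in the example above:

  def pareto_inversions(segments):
      # Indices i where the path moves to strictly lower R but strictly higher k
      return [i for i in range(1, len(segments))
              if segments[i][0] < segments[i - 1][0] and segments[i][1] > segments[i - 1][1]]

  path = [(11.56, 2), (3.24, 1), (0.049, 2), (0.0, 1)]
  print(pareto_inversions(path))   # [2]: the inversion at segment 3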
            

Validation

Metric                                Value
Total Tests                           990
Monotonic (correctly predicted)       949
Non-monotonic (correctly predicted)   41
False Positives                       0
False Negatives                       0
Accuracy                              100.00%

Why This Supersedes the Conditional Monotonicity Conjecture

The earlier conjecture (k_opt >= k_horiz implies monotonicity) was incomplete because it only considered horizontal hyperplanes. The Pareto criterion considers ALL low-penalty candidates, including tilted planes that happen to have high exact-fit counts.

Finding 11: Pareto Theorem Formally Proven

Date: 2025-12-07

Status: Proven

The Pareto Frontier Monotonicity Theorem is now rigorously proven (see proofs/pareto_monotonicity_proof.md).

Key Lemmas

  1. Finite Candidates: Optimal β*(λ) belongs to a finite set C for all λ ≥ 0
  2. Piecewise Linear: Total loss L_total(c; λ) = L_data(c) + λ·R(c) is linear in λ for each candidate
  3. R Decreases at Transitions: At every transition λ* from c_1 to c_2, R(c_1) > R(c_2)

Proof Sketch

The proof follows directly from these lemmas: k(λ) can only change at a transition between candidates, and by Lemma 3 every transition strictly decreases R, so a k-increase at a transition is by definition a Pareto inversion (lower R, higher k).

Therefore: k(λ) non-monotonic ⟺ Pareto inversion exists. ∎

Finding 12: Strong Inversion Predictor

Date: 2025-12-07

Status: Validated - 91.7% accuracy

The simple test "does the horizontal hyperplane fit more points than the λ=0 optimum?" is a strong predictor for inversions:

When k_horiz > k_lambda0: 91.7% have inversions (11/12 cases)
When k_horiz <= k_lambda0: only 3.8% have inversions (53/1378 cases)
            

Practical Implication

This provides a cheap O(n) heuristic for detecting inversion-prone datasets without computing the full candidate set.
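A sketch of the cheap half of this heuristic: the best horizontal hyperplane fits the most common y value, so k_horiz is just the modal multiplicity of y, while k_lambda0 still has to be supplied by the exact solver:

  from collections import Counter

  def k_horiz(y_values):
      # Exact-fit count of the best horizontal hyperplane = multiplicity of the modal y
      return max(Counter(y_values).values())

  def inversion_prone(y_values, k_lambda0):
      return k_horiz(y_values) > k_lambda0

  # Dataset from Finding 4: three points share y = 10, but the λ=0 optimum fits only 2
  y = [10, 10, -7, 10, -5]
  print(k_horiz(y), inversion_prone(y, k_lambda0=2))   # 3 True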

Finding 13: Empirical Inversion Probability Formula

Date: 2025-12-07

Status: Empirically Validated

From testing 1390 datasets, the inversion probability follows a scaling law:

P(inversion) ≈ 0.10 × excess_points

where excess_points = (n - d - 1) / n

Interpretation

Inversion Rates by Configuration

Dim   n   excess_points   Observed Rate
1     4   0.50            0.5%
1     5   0.60            2.0%
1     6   0.67            6.5%
1     7   0.71            11.3%
1     8   0.75            16.0%
2     5   0.40            1.3%
2     6   0.50            10.0%
3     6   0.33            1.7%

Why Higher Dimensions Help

In higher dimensions, d+1 is larger, so excess_points = (n-d-1)/n is smaller for the same n, reducing inversion probability. This explains why 2D and 3D have lower inversion rates than 1D at comparable n values.

Finding 14: k_horiz Predictor Limitation (MAJOR)

Date: 2025-12-05

Status: Important Limitation Discovered

The k_horiz > k_lambda0 predictor (91.7% accurate on random data) fails completely on axis-duplication structured data.

Evidence (exp047)

540 runs: dims {2,3,4}, n=10, axis-dup generator
- Only 1/540 runs had k_horiz > k_lambda0 (100% precision, 2% recall)
- 48/49 inversions happened when k_horiz <= k_lambda0
            

Why This Happens

In axis-duplication data:

New Inversion Mechanism

Example k_path: [10, 3, 4]
1. At λ=0: tilted axis-aligned plane fits all 10 points (k=10)
2. At higher λ: intermediate plane with k=3 takes over
3. At high λ: horizontal (R=0) with k=4 wins

The inversion is 3→4, NOT related to k_lambda0!
            

Implications

Finding 15: k_min Predictor Breakthrough (MAJOR)

Date: 2025-12-05

Status: Validated - F1=0.962 with Perfect Precision

A new predictor based on k-path analysis achieves near-perfect inversion detection, vastly outperforming all previous approaches.

The Predictor

k_min_intermediate < k_final: Check if the minimum k among all segments (except the final one) is less than the final k.

Results (exp049)

Predictor                Precision   Recall   F1      FPR
k_min < k_final          1.000       0.927    0.962   0.000
k_min < k_horiz          0.927       0.927    0.927   0.007
duplication (baseline)   0.123       0.976    0.219   0.638
k_horiz > k_lambda0      0.000       0.000    0.000   0.000

Two Types of Inversions Discovered

Type 1: End inversions (92.7%) - Captured by k_min_intermediate < k_final

Type 2: Middle inversions (7.3%) - Missed by k_min predictor

The Perfect Predictor (Tautological)

Checking if k ever increases anywhere in the path achieves F1=1.000. This is tautological (inversion = k increases), but validates the theoretical framework.

Why This Matters

Finding 16: Tie-Point Phenomenon (2025-12-05)

Status: Analyzed and Explained

Investigation of the rare case from exp050 where k increased without R decreasing revealed a new phenomenon: tie points.

What is a Tie Point?

A tie point occurs when two candidates have exactly equal total loss at some λ:

At λ ≈ 1.0667:
  k=3 candidate: L_data=3.2 + λ*0.25 = 3.467
  k=2 candidate: L_data=3.4 + λ*0.0625 = 3.467
  Both candidates are equally optimal!
            

Observation

At tie points, the solver records both candidates, creating zero-width segments (width ≈ 10⁻¹⁵). The k "oscillation" (2→3→2) is multi-valuedness at the corner point, NOT a Pareto inversion.

Key Insight

Finding 17: k_min Predictor Theorem Proven (2025-12-05)

Status: Formally Proven

The k_min predictor theorem is now rigorously proven (see proofs/kmin_predictor_proof.md).

The Theorem

Theorem: If k_min_intermediate < k_final, at least one Pareto inversion exists.

Proof Sketch

  1. If k_min < k_final, there's a net k-increase from some intermediate segment to the final
  2. Tie-point oscillations have zero net effect (k goes up then down)
  3. Therefore, at least one k-increase must be from a true Pareto inversion (R drops while k rises)

Implications

Finding 18: Cheap Signals Cannot Replace k_min (2025-12-05)

Status: Negative Result

Attempted to predict inversions using only O(n×d) features without full path computation.

Results

Predictor          Precision   Recall   F1      Cheap?
k_min < k_final    1.000       0.857    0.923   No
duplication rule   0.059       0.857    0.110   Yes
k_horiz > d+1      0.000       0.000    0.000   Yes

Conclusion

Cheap features (duplication, k_horiz) cannot reliably predict inversions. The full path computation is necessary for accurate inversion detection. The k_min predictor remains the best available method.

Finding 19: Middle Inversion Characterization (2025-12-05)

Status: Analyzed

The 7.3% of inversions missed by the k_min predictor are "middle inversions" with a specific k-path pattern.

Definition

A middle inversion is a Pareto inversion where k bounces UP in the middle of the path but the final k is still the overall minimum. These are missed by the k_min predictor (which checks k_min_intermediate < k_final).

Observed K-Path Patterns

[3, 2, 3, 2, 1] - k drops to 2, bounces to 3, continues to final k=1
[3, 3, 2, 3, 2, 1] - k plateaus at 3, drops to 2, bounces to 3, continues to final k=1
[5, 5, 2, 3, 2, 2] - k plateaus at 5, drops to 2, bounces to 3, settles at final k=2
            

Statistics (from exp049)

Characteristics Comparison

Metric              End Inv   Middle Inv   No Inv
Avg segments        5.58      5.67         3.64
Avg k_transitions   2.11      3.67         1.24
Avg k_lambda0       6.00      3.67         6.13
Avg k_horiz         3.45      1.33         3.25

Key Insight

Middle inversions require a "valley-then-bounce" pattern: k first decreases, then increases (the inversion), then continues to decrease to the final value. Because the final k is the minimum, the k_min predictor doesn't flag these cases.

Practical Implication

Middle inversions are rare (0.6% of all runs) and detecting them requires full path computation. The k_min predictor remains nearly optimal (92.7% recall with 100% precision) for practical use.

Finding 20: Perfect Inversion Detector via k-up Transitions (2025-12-05 Evening)

Status: Proven (tautological)

The count of k-increasing transitions in the optimal path perfectly detects inversions.

The Discovery

k_up_transitions >= 1 is a perfect inversion detector:

This is tautological: a Pareto inversion IS by definition a k-up transition (k increases while R decreases).
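On the same k-path representation, counting k-up transitions is a one-liner (a sketch, not the experiment code):

  def k_up_transitions(k_path):
      # Number of consecutive transitions where k increases
      return sum(1 for a, b in zip(k_path, k_path[1:]) if b > a)

  print(k_up_transitions([10, 3, 4]))        # 1: end inversion detected
  print(k_up_transitions([3, 2, 3, 2, 1]))   # 1: middle inversion also detected
  print(k_up_transitions([5, 2, 2, 1]))      # 0: monotone path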

Distribution (270 runs, 2D quick sweep)

k_up count   Middle Inv   End Inv   No Inv
0            0            0         198
1            9            61        0
2            0            2         0

Key insight: ALL inversions have >= 1 k-up transition. ALL non-inversions have 0 k-up transitions.

Middle vs End Distinguished by Position

The average relative position of k-up transitions distinguishes the types:

Path Length as Distinguisher

Segments threshold   Middle (n=9)   End (n=63)   None (n=198)
>= 4                 78%            71%          47%
>= 5                 67%            30%          21%
>= 6                 33%            13%          4%

Theoretical Explanation

Middle inversions require the k-path to:

  1. Drop to some intermediate minimum
  2. Bounce UP (the inversion)
  3. Continue DOWN to the final minimum

This needs at least 3 k-changes, requiring longer paths (typically 5+ segments).

Practical Implication

While k_up >= 1 is a perfect detector, it requires computing the full optimal path. No truly "cheap" predictor exists - detecting inversions fundamentally requires path computation.

Finding 21: p-Adic Regularisation Monotonicity Dominance (2025-12-05 Late Evening)

Status: MAJOR DISCOVERY

p-Adic regularisation (|β|_p penalty) is strictly more monotonic than L2 regularisation (β² penalty).

Statistical Evidence (1000 random 1D datasets)

Regularisation    Monotonic   Inversions
L2 (β²)           95.0%       5.0%
p-Adic (|β|_p)    99.0%       1.0%
Critical observation: Zero cases found where p-adic inverts but L2 doesn't!

Conjecture: p-Adic Monotonicity Dominance

For any dataset (X, y):

Mechanism: Valuation Classes and Threshold Reduction

93% fewer thresholds with p-adic regularisation:

Metric                       L2    p-Adic
Avg thresholds per dataset   574   43
Same-valuation-class pairs   N/A   568
Threshold reduction          —     −92.6%

Why This Happens

L2 penalty β²: Continuous - every distinct slope gives a unique penalty.

p-Adic penalty |β|_p: Discrete valuation classes:

v_p(β) = 0:  |β|_p = 1   (e.g., β=1, 3, 5, 7, ...)
v_p(β) = 1:  |β|_p = 0.5 (e.g., β=2, 6, 10, ...)
v_p(β) = 2:  |β|_p = 0.25 (e.g., β=4, 12, 20, ...)
            

All slopes within a valuation class have THE SAME penalty! When two candidates are in the same class, no threshold exists between them - the better data-loss candidate wins for ALL λ.
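A quick check of the class structure, assuming the penalty is |β|_p = p^(-v_p(β)): slopes in the same valuation class get identical penalties, so the p-adic threshold denominator |b₁|_p − |b₂|_p vanishes and no crossover λ exists between them:

  def v2(n):
      # 2-adic valuation of a nonzero integer
      v = 0
      while n % 2 == 0:
          n //= 2; v += 1
      return v

  def abs2(n):
      return 2 ** (-v2(n))

  print([abs2(b) for b in (1, 3, 5, 7)])   # [1, 1, 1, 1]      class v=0
  print([abs2(b) for b in (2, 6, 10)])     # [0.5, 0.5, 0.5]   class v=1
  print(abs2(6) - abs2(10))                # 0.0: no threshold between same-class slopes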

Threshold Formula Comparison

L2: λ* = (L₂ - L₁) / (b₁² - b₂²) [continuous denominator]

p-Adic: λ* = (L₂ - L₁) / (|b₁|_p - |b₂|_p) [discrete denominator]

Practical Implications

SC8 Exactness Theorem (2026-01-03)

Major Result: The SC8 criterion is mathematically exact for predicting monotonicity, now verified through 6D with 3300+ additional test cases.

Dimension Scaling Summary

Dim   n   Cases   Non-mono   Rate     Pattern   SC8
2D    1   3000    8          0.27%    2→3       100%
3D    2   1500    2          0.13%    3→4       100%
4D    3   900     7          0.78%    4→5       100%
5D    4   600     1          0.17%    5→6       100%
6D    5   3300+   0          <0.03%   (6→7)     100%

Theorem Statement

Theorem (SC8 Exactness): k(λ) is monotonically non-increasing ⟺ SC8 holds.

Where SC8 is: For every optimal path from min-L starts, k-values along the path are non-increasing.

Why 6D Violations Are Extremely Rare

To observe a 6→7 violation requires:

  1. A 7-fit hyperplane exists (needs 7 collinear points)
  2. The 7-fit has LOWER regularization penalty than a competing 6-fit
  3. The 7-fit has HIGHER data loss than the 6-fit
  4. The 7-fit is reachable on the optimal λ-path

With random data, 7 collinear points in 6D are geometrically rare, explaining <0.03% violation rate.

Key Insight: First Crossover Matters

SC8 is exact because it traces the actual optimal path. At each step, only the FIRST candidate to become competitive (earliest crossover λ) is visited. Higher-k candidates may be "reachable" but never actually visited due to shielding by lower-k alternatives.

Open Questions

  1. Prove Pareto Theorem Analytically - DONE (Day 7)
  2. Characterize Inversion Probability - DONE (Day 7): P ≈ 0.10 × (n-d-1)/n
  3. Efficient Inversion Detection - SOLVED (Day 34): k_min predictor achieves F1=0.962
  4. Two-Regime Detection - SUPERSEDED: k_min predictor works across regimes
  5. The 3.8% Anomaly - EXPLAINED: Middle inversions (7.3% of cases)
  6. Cheap k_min Estimation - INVESTIGATED: No cheap alternative found
  7. k_min Predictor Proof - DONE: See proofs/kmin_predictor_proof.md
  8. Middle Inversion Geometry - CHARACTERIZED (Day 35): Valley-then-bounce pattern in longer paths
  9. Perfect Inversion Detector - PROVEN (Day 35): k_up >= 1 is perfect but requires path
  10. p-Adic Regularisation - INVESTIGATED (Day 35): Strictly more monotonic than L2!
  11. Prove p-Adic Monotonicity Dominance: Formally prove the conjecture
  12. Higher dimensions for p-adic: Does ~93% threshold reduction persist in 2D, 3D?
  13. Hybrid regularisation: Mix of L2 and p-adic penalties
  14. Formal Lean/Coq Proof: Translate the k_min theorem to a theorem prover