Research Hypotheses - Regularised p-Adic Regression

Validated Hypotheses

H1: The n+1 Point Property Holds at λ=0

Statement: At zero regularisation, the optimal hyperplane in n-dimensional p-adic regression passes through at least n+1 points.

Status: VALIDATED (2025-12-01)

Evidence: All 1D experiments at λ=0 show 2 or more exact fits. Extended to 2D: 15/15 datasets show k(0) ≥ 3.
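
A minimal sketch of how exact fits can be counted in the 1D case at λ=0. It assumes the data loss is the sum of p-adic absolute values of the residuals and that the search is restricted to lines through pairs of data points; that restriction is an illustrative assumption in the spirit of H1, not a description of the actual experiments. The helper names (`vp`, `padic_abs`, `best_pair_line`) and the data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def vp(x: Fraction, p: int) -> int:
    """p-adic valuation of a NONZERO rational x (callers must exclude 0)."""
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def padic_abs(x: Fraction, p: int) -> Fraction:
    """|x|_p = p^(-v_p(x)), with |0|_p = 0."""
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))

def best_pair_line(points, p):
    """Among lines through two data points, return (loss, k, slope, intercept)
    for the one with minimal p-adic loss; k counts exactly fitted points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not a function of x
        slope = Fraction(y2 - y1, x2 - x1)
        intercept = y1 - slope * x1
        residuals = [y - (slope * x + intercept) for x, y in points]
        loss = sum(padic_abs(r, p) for r in residuals)
        k = sum(r == 0 for r in residuals)
        if best is None or loss < best[0]:
            best = (loss, k, slope, intercept)
    return best

# Hypothetical data: the first three points lie on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 4), (4, 10)]
loss, k, slope, intercept = best_pair_line(points, p=2)
print(f"loss={loss}, exact fits k={k}, line: y = {slope}*x + {intercept}")
```

Because every candidate passes through two data points by construction, this sketch can only illustrate the counting of k, not test H1 itself; a real test would need an unrestricted optimiser.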

H2: Strong Regularisation Reduces Exact Fits

Statement: As λ→∞, the number of exactly-fitted points decreases toward 1.

Status: VALIDATED (2025-12-01)

Evidence: Experiments show a transition from 2-3 exact fits at λ=0 to 1 exact fit at large λ.

H3: Discrete Phase Transitions

Statement: The function k(λ) giving the number of exactly-fitted points is a step function with finitely many discontinuities.

Status: VALIDATED (2025-12-02)

Evidence: All experiments show discrete jumps in k(λ). Threshold formula derived for two competing candidates with data losses L₁, L₂ and slopes b₁, b₂: λ* = (L₂ - L₁) / (b₁² - b₂²).
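
The threshold formula can be read as the λ at which two candidates have equal objective, L₁ + λb₁² = L₂ + λb₂². The sketch below assumes the penalty is the squared slope (as the b² terms suggest) and plugs in the first two candidates of the Seed 304 example listed under H6; the function name `threshold` is hypothetical.

```python
from fractions import Fraction

def threshold(L1, b1, L2, b2):
    """λ* at which L1 + λ·b1² equals L2 + λ·b2² (requires b1² != b2²)."""
    return (L2 - L1) / (b1**2 - b2**2)

# Seed 304, r=5: slope -1 (L=0.0416) versus slope 5/11 (L=0.0432).
lam = threshold(Fraction("0.0416"), Fraction(-1),
                Fraction("0.0432"), Fraction(5, 11))
print(lam, float(lam))
```

The result is close to the lower end of the interval [0.002, 0.05] quoted for the slope-5/11 candidate in that example.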

H4: Threshold Depends on Data Geometry

Statement: The critical λ values depend on the p-adic distances between data points and their geometric configuration.

Status: VALIDATED (2025-12-02)

Evidence: Different datasets show different threshold values. The thresholds also depend on the choice of prime p.

H8: Higher-Dimensional Generalization

Statement: In n dimensions with regularisation, the optimal hyperplane passes through k points where 1 ≤ k ≤ n+1, with k depending on λ.

Status: VALIDATED for n=2 (2025-12-02)

Evidence: 2D experiments confirm: k(0) ≥ 3, k decreases with λ in the tested datasets (monotonicity does not hold in general; see H6), discrete thresholds exist.

H14: p-free inputs keep residual denominators p-free

Statement: If all coefficients/features/targets have denominators coprime to p, then every nonzero residual also has denominator coprime to p.

Status: VALIDATED (2025-12-20)

Evidence: Monte Carlo across p ∈ {2,3,5} with p-free denominators produced 0 of ~47k nonzero residuals with a denominator divisible by p. Lean proof complete: residual_den_coprime_of_pfree and residuals_coprime_all_indices_of_pfree prove that p-freeness propagates through the residual computation. The wrappers minloss_fp_is_satisfier_at_margin_vmin_of_pfree and minloss_fp_cmin_ge_one_exact_of_pfree automatically discharge the coprimality hypotheses for p-free data.
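
A small randomized check in the spirit of the Monte Carlo evidence, assuming the residual has the form y − Σ βᵢxᵢ over the rationals. This is an illustrative sketch, not the original experiment or the Lean development; all function names and sampling ranges are hypothetical.

```python
from fractions import Fraction
import random

def is_pfree(x: Fraction, p: int) -> bool:
    """True if the reduced denominator of x is coprime to the prime p."""
    return x.denominator % p != 0

def random_pfree(rng, p, max_den=20):
    """Random rational whose denominator is not divisible by p."""
    dens = [d for d in range(1, max_den + 1) if d % p != 0]
    return Fraction(rng.randint(-30, 30), rng.choice(dens))

def residual(beta, x, y):
    """Residual y - sum(beta_i * x_i) of a linear model without intercept."""
    return y - sum(b * xi for b, xi in zip(beta, x))

def count_violations(p, trials=2000, n=3, seed=0):
    """Count nonzero residuals whose denominator is divisible by p."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        beta = [random_pfree(rng, p) for _ in range(n)]
        x = [random_pfree(rng, p) for _ in range(n)]
        y = random_pfree(rng, p)
        r = residual(beta, x, y)
        if r != 0 and not is_pfree(r, p):
            bad += 1
    return bad

for p in (2, 3, 5):
    print(p, count_violations(p))
```

The count is zero by construction: sums and products of rationals with p-free denominators have a denominator dividing the product of the input denominators, which stays coprime to p.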

Active Hypotheses (Under Investigation)

H5: p-Adic Regularisation Prefers High-Valuation Coefficients

Statement: When using p-adic regularisation (penalty = |β|_p), the optimal coefficients tend to have high p-adic valuations (divisible by higher powers of p).

Status: UNDER INVESTIGATION

Evidence: Preliminary experiment shows different optimal slopes for p-adic vs real L2 regularisation.

H7: Generalized Base r Interpolation

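Statement: Using r^(-v(t)) instead of p^(-v(t)) as the per-residual loss, where v(t) is the p-adic valuation of residual t and r is a general base: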

As r→1: Solution approaches binary (nearest neighbor) selection
As r→∞: Solution approaches minimax (minimize maximum residual)

Status: PROPOSED
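
A minimal sketch of the two limits, assuming the total loss is the sum of r^(-v(t)) over nonzero residuals t and that an exact fit contributes 0. The helper names and the residual values are hypothetical.

```python
from fractions import Fraction

def vp(t: Fraction, p: int) -> int:
    """p-adic valuation of a NONZERO rational t."""
    num, den, v = abs(t.numerator), t.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def loss(residuals, p, r):
    """Sum of r^(-v_p(t)) over nonzero residuals; exact fits cost nothing."""
    return sum(r ** (-vp(t, p)) for t in residuals if t != 0)

# Hypothetical residuals: one exact fit, then 2-adic valuations 2, 0, -1.
residuals = [Fraction(0), Fraction(4), Fraction(3), Fraction(1, 2)]

near_one = loss(residuals, p=2, r=1.001)  # every nonzero residual costs about 1
large = loss(residuals, p=2, r=1000.0)    # lowest-valuation residual dominates
print(near_one, large)
```

Near r = 1 the loss approximately counts the non-fitted points, which matches the "binary" description; for large r the total is dominated by the residual with the smallest valuation, i.e. the largest p-adic size, which matches the minimax description.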

H9: Prime Dependence Structure

Statement: The number and location of thresholds depend systematically on the choice of prime p.

Status: UNDER INVESTIGATION

Evidence: For the same dataset, p=2 gives 1 threshold while p=3,5,7 give 2 thresholds at different locations.

H15: Pareto Frontier Characterization of Achievable k-Values [STRONGLY SUPPORTED]

Statement: A value k is achievable (i.e., k(λ) = k for some λ ≥ 0) if and only if the Pareto frontier of (data-loss, regularization-penalty) pairs includes a candidate hyperplane with exactly k exact fits.

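Rationale: The optimal hyperplane at any λ must lie on the Pareto frontier. If no k-fit hyperplane lies on the frontier, k is skipped. If a k-fit hyperplane is Pareto-optimal in the (L, R) tradeoff and also lies on the lower convex hull of the frontier, there exists a λ range where it minimizes L + λR; a frontier point strictly above that hull is never selected by a linear tradeoff.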

Status: STRONGLY SUPPORTED (2026-01-02)

Evidence:

2D (exp303): 0/90 violations across r ∈ {1.1, 2, 5}
3D (exp304): 0/90 violations — H15 holds in higher dimensions
Analytical 2D (exp305): Exact λ-breakpoints show skipped k is mathematically excluded (not a grid artifact)
Analytical 3D (exp306): Achievable k matches frontier k exactly; skipping occurs in only ~3% of cases (1/30 seeds at r=2, 1/30 at r=5)

Note: This replaces the falsified H12 and provides a geometric characterization of which k-values can be achieved through regularization. Skip frequency is low with exact analysis (~2-4% in 2D, ~0-3% in 3D; earlier higher 3D rates were λ-grid artifacts).
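
A minimal sketch of the frontier computation behind H15: keep the candidates that are not dominated in (L, R) and read off their k-values. The first four candidates are the Seed 304 values listed under H6; the fifth is a made-up dominated candidate added purely for illustration. The function name and the dict layout are hypothetical.

```python
def pareto_frontier(cands):
    """Candidates not dominated in (L, R): no other candidate is at least as
    good in both coordinates and strictly better in one."""
    front = []
    for c in cands:
        dominated = any(
            o["L"] <= c["L"] and o["R"] <= c["R"]
            and (o["L"] < c["L"] or o["R"] < c["R"])
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front

cands = [
    {"name": "slope=-1",     "k": 3, "L": 0.0416, "R": 1.0},
    {"name": "slope=5/11",   "k": 2, "L": 0.0432, "R": 0.21},
    {"name": "slope=1/3",    "k": 3, "L": 0.048,  "R": 0.11},
    {"name": "horizontal",   "k": 1, "L": 2.24,   "R": 0.0},
    {"name": "hypothetical", "k": 2, "L": 0.05,   "R": 0.5},  # made up, dominated
]

front = pareto_frontier(cands)
frontier_ks = sorted({c["k"] for c in front})
print([c["name"] for c in front], frontier_ks)
```

The quadratic scan is enough for a handful of candidates; sorting by L and sweeping R would be the usual choice for larger candidate sets.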

H16: Exact Reachability Criterion for Monotonicity [VERIFIED]

Statement: k(λ) is monotonically non-increasing if and only if for every optimal path traced from a minimal-L candidate following the λ-trajectory, the k-values along the path are non-increasing.

Status: VERIFIED 100% in 2D-6D (2026-01-03)

Evidence:

exp317 (2D): 6000/6000 (100%) match between reachability criterion and exact path tracing
exp324 (3D): SC8 exact: TP=1498, FP=0, TN=2, FN=0 (100% accuracy)
exp325 (4D): SC8 exact: TP=893, FP=0, TN=7, FN=0 (100% accuracy)
exp326 (5D): SC8 exact: TP=599, FP=0, TN=1, FN=0 (100% accuracy)
exp327-330 (6D): SC8 exact: 3300+ cases, 100% accuracy, 0 violations found
Comparison: Simple Pareto k-ordering is only 99.82% accurate (11 mismatches in 2D)

Key Insight (exp316): The Pareto frontier criterion fails because:

False positives: Multiple candidates can tie for minimal L, including off-frontier points. Paths from off-frontier starts can visit higher-k off-frontier candidates before reaching the frontier.
False negatives: Some frontier points with higher k are unreachable from minimal-L starts. The optimal λ-path "jumps over" them to lower-R alternatives with the same or lower k.

Mathematical Formulation:

  Define: L_min = min{L(C) : C is a candidate hyperplane}
          MinL = {C : L(C) = L_min}
          Path(C) = sequence of k-values along optimal λ-trajectory from C

  THEOREM: k(λ) is monotone ⟺ ∀C ∈ MinL, Path(C) is non-increasing

H17: Static Sufficient Condition for Monotonicity [REFINED]

Statement: Two nested conditions for predicting monotonicity:

SC7 (conservative): If no higher-k candidate is reachable from any minimal-L start, then k(λ) is guaranteed monotone. Coverage: ~97% with 0 false positives.
SC8 (exact): If every FIRST (optimal) transition at each step has non-increasing k, then k(λ) is monotone. Coverage: 100% with 0 false positives across all tested dimensions.

Status: SC8 EXACT 100% in 2D-6D (2026-01-03)

Evidence:

2D (exp323): SC8: 100% accuracy (3000 cases); SC7: 97.77% coverage
3D (exp324): SC8: 100% accuracy (1500 cases); SC7: 98.20% coverage
4D (exp325): SC8: 100% accuracy (900 cases); SC7: 97.09% coverage
5D (exp326): SC8: 100% accuracy (600 cases); SC7: 96.83% coverage
6D (exp327-330): SC8: 100% accuracy (3300+ cases); SC7: 100% (no violations)

Key Insight (exp322-323): SC7 is conservative because it flags ANY reachable higher-k candidate, even when the optimal path never visits it. SC8 only checks the FIRST (earliest crossover) winner at each step, which exactly matches the actual optimal path. The 67 SC7 false negatives all have optimal paths that "shield" the higher-k candidate by transitioning to a lower-k candidate first.

Definitions:

  SC7: ∀ minimal-L start C, ∀ step along path from C:
       max{k(C') : C' is a winner} ≤ k(current)

  SC8: ∀ minimal-L start C, ∀ step along path from C:
       k(first winner) ≤ k(current)

Implication: Empirically, SC8 coincides with H16 (the exact reachability criterion), restated as a step-by-step check. SC7 is more conservative: it has lower coverage but the same computational cost, since both conditions trace the same path. The practical value of SC7 is that its check at each step never produces false positives.
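
A minimal sketch of the SC8 check, assuming the objective is L + λR and that the "first winner" from the current candidate is the lower-R candidate with the smallest crossover λ (ties broken by lower R). SC7 is not implemented here because the precise set of "winners" it ranges over is not spelled out above. The function names are hypothetical; the candidates are the Seed 304 values listed under H6.

```python
def trace_path(start, cands):
    """Follow first (earliest-crossover) winners from `start` until no
    candidate with a smaller penalty R remains."""
    path, cur = [start], start
    while True:
        lower = [c for c in cands if c["R"] < cur["R"]]
        if not lower:
            return path
        # Crossover λ solves cur.L + λ·cur.R = c.L + λ·c.R.
        cur = min(
            lower,
            key=lambda c: ((c["L"] - cur["L"]) / (cur["R"] - c["R"]), c["R"]),
        )
        path.append(cur)

def sc8_holds(cands):
    """SC8: along the path from every minimal-L start, k never increases."""
    l_min = min(c["L"] for c in cands)
    for start in (c for c in cands if c["L"] == l_min):
        ks = [c["k"] for c in trace_path(start, cands)]
        if any(b > a for a, b in zip(ks, ks[1:])):
            return False
    return True

seed304 = [
    {"k": 3, "L": 0.0416, "R": 1.0},
    {"k": 2, "L": 0.0432, "R": 0.21},
    {"k": 3, "L": 0.048,  "R": 0.11},
    {"k": 1, "L": 2.24,   "R": 0.0},
]
path_ks = [c["k"] for c in trace_path(seed304[0], seed304)]
print(path_ks, sc8_holds(seed304))
```

On this data the traced path visits k = 3, 2, 3, 1, so SC8 fails, in line with the non-monotone behaviour reported for that seed; removing the k=2 candidate leaves a non-increasing path.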

H18: Higher-Dimensional Monotonicity Violations [CONFIRMED through 7D]

Statement: Monotonicity violations generalize to higher dimensions, following the pattern n+1 → n+2 (one extra fit).

Status: CONFIRMED through 7D including forced collinear tests (2026-01-03 07:00 AEDT)

Evidence:

2D (n=1): Violations are 2→3 (exp318: 8/3000 = 0.27%)
3D (n=2): Violations are 3→4 (exp324: 2/1500 = 0.13%)
4D (n=3): Violations are 4→5 (exp325: 7/900 = 0.78%)
5D (n=4): Violations are 5→6 (exp326: 1/600 = 0.17%)
6D (n=5) random: 0/3300+ violations (<0.03%) with random points
6D (n=5) forced collinear (exp331): 40/4500 = 0.89% violations. 6→7 and 6→8 violations confirmed!
7D (n=6) forced collinear (exp332): 38/2700 = 1.41% violations! 7→8 (25), 7→9 (12), 8→9 (1) transitions observed.

Key Finding (exp331-332): The apparent rarity of high-D violations is due to geometric probability, not structural impossibility. When n+2 points are forced onto a hyperplane:

6D: 6→7 violations (28 cases), 6→8 violations (12 cases)
7D: 7→8 violations (25 cases), 7→9 violations (12 cases), 8→9 (1 case)
Violation rate (~1%) is consistent across dimensions with forced collinearity

Pattern: Non-monotonicity occurs when a hyperplane with n+2 fits has lower regularization penalty than one with n+1 fits, allowing it to become optimal at intermediate λ. The n+1 → n+2 pattern appears in every tested dimension; the forced collinear tests additionally show larger jumps (6→8, 7→9) and one 8→9 transition.

Theoretical Insight: Violations require over-determined fits (n+2 or more points on a common hyperplane). As dimension increases, the probability that random points satisfy this decreases exponentially. This explains the apparent dimension scaling without implying structural impossibility.

Proposed Hypotheses (To Be Tested)

H13: Negative Regularisation Behaviour

Statement: With negative regularisation strength λ < 0, the optimal hyperplane behaviour changes qualitatively—potentially favouring larger coefficient norms rather than smaller ones.

Questions to investigate:

Does a well-defined optimum still exist for λ < 0?
If so, how does k(λ) behave for λ < 0?
Does negative regularisation increase the number of exact fits beyond n+1 (impossible for points in general position, but what does the loss landscape look like)?
Is there a critical negative λ below which the optimisation problem becomes unbounded?

Status: PROPOSED (2025-12-11)

H10: Threshold Count Upper Bound

Statement: For a dataset of m points in n dimensions, the number of thresholds is at most C(m, n+1) - 1.

Rationale: There are C(m, n+1) ways to choose n+1 points to determine a hyperplane. Each threshold corresponds to a switch between candidate hyperplanes.
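
A one-line sketch of the proposed bound, assuming m ≥ n+1 and that each candidate hyperplane becomes optimal at most once as λ grows; the function name is hypothetical.

```python
from math import comb

def threshold_upper_bound(m: int, n: int) -> int:
    """Proposed H10 bound: at most C(m, n+1) - 1 thresholds."""
    return comb(m, n + 1) - 1

# Five points: 1 feature (lines through pairs) and 2 features (planes through triples).
print(threshold_upper_bound(5, 1), threshold_upper_bound(5, 2))
```

Both calls give 9 because C(5,2) = C(5,3) = 10, which shows how loose the bound is next to the 1 to 3 thresholds reported in the experiments above.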

H11: Universal Threshold at Strong Regularisation

Statement: For sufficiently large λ, the optimal solution is always the horizontal hyperplane through the "p-adic median" of the y-values.

Rationale: When slopes are too expensive, the optimum is forced to zero slope, and among intercept-only solutions the one that minimizes the data loss wins.

Invalidated Hypotheses

H6: Monotonic Decrease in k(λ) [INVALIDATED]

Statement: The number of exact fits k(λ) is monotonically non-increasing in λ.

Status: INVALIDATED (2026-01-02)

Counterexamples:

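exp307: 36 cases (1.2%) with k-increases across 1000 seeds, all k: 2 → 3. The 1.2% rate matches 36/3000, i.e. 1000 seeds at three r values each; 36/1000 would be 3.6%.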
exp308: Even with unique y-values, 8/3000 cases (0.27%) violate monotonicity
exp314: Full path analysis finds violations in 82/3000 cases (2.7%) when paths are started from any minimal-L candidate

Primary Mechanism (exp309): The violation occurs when two candidates with different k-values compete on the Pareto frontier:

A k_low-fit candidate has minimal L (wins at λ=0) but higher R
A k_high-fit candidate has higher L but LOWER R
As λ increases, the low-R candidate eventually wins, causing k to INCREASE

Example (Seed 304, r=5):

  k=3 at slope=-1:  L=0.0416, R=1.0    (wins at λ∈[0,0.002])
  k=2 at slope=5/11: L=0.0432, R=0.21  (wins at λ∈[0.002,0.05])
  k=3 at slope=1/3:  L=0.048,  R=0.11  (wins at λ∈[0.05,19.7])
  k=1 horizontal:    L=2.24,   R=0.0   (wins at λ>19.7)
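
The example can be checked directly by evaluating L + λR for the four listed candidates at one λ inside each quoted interval. This is a sketch that assumes the objective is exactly L + λR with the rounded L and R values above; the sample λ values and the helper name `winner` are arbitrary choices.

```python
# (k, L, R) for the four Seed 304 candidates listed above.
cands = [
    {"k": 3, "L": 0.0416, "R": 1.0},   # slope = -1
    {"k": 2, "L": 0.0432, "R": 0.21},  # slope = 5/11
    {"k": 3, "L": 0.048,  "R": 0.11},  # slope = 1/3
    {"k": 1, "L": 2.24,   "R": 0.0},   # horizontal
]

def winner(lam):
    """Candidate minimizing the regularised objective L + λ·R."""
    return min(cands, key=lambda c: c["L"] + lam * c["R"])

# One sample λ from each interval: 0, [0.002, 0.05], [0.05, 19.7], > 19.7.
ks = [winner(lam)["k"] for lam in (0.0, 0.01, 1.0, 100.0)]
print(ks)
```

The sequence of winning k-values goes 3, 2, 3, 1, so k rises from 2 to 3 as λ crosses roughly 0.05, which is the violation of monotonicity.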

Structural Condition (refined): Simple Pareto k-ordering is only approximate (exp315 finds 2 false positives, 3 false negatives in 3000 cases).

EXACT Criterion (H16, exp316-317): k(λ) is monotone ⟺ every optimal path from minimal-L candidates has non-increasing k. This is 100% accurate (6000/6000 verified). See H16 above for the full theorem.

H12: Intermediate Fit Counts Are Achievable [INVALIDATED]

Statement: For an n-dimensional dataset in general position, for every k with 1 ≤ k ≤ n+1, there exists a regularisation strength λ_k such that the optimal hyperplane passes through exactly k points.

Status: INVALIDATED (2026-01-02)

Evidence: exp301 and exp302 found 2D counterexamples where k(λ) jumps from 3 to 1, skipping k=2 entirely. In seed 4, the dataset (-2,-1)→-4, (1,2)→-3, (-4,-4)→-5, (1,3)→-1, (-5,-2)→3 transitions directly from k=3 to k=1 at λ≈3.35.

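Mechanism: For an intermediate k to be optimal at some λ, we need (L_{k+1} − L_k)/(R_k − R_{k+1}) < λ < (L_{k−1} − L_k)/(R_k − R_{k−1}), where L_j and R_j are the data loss and regularisation penalty of the j-fit candidate. When this interval is empty, k is never optimal and gets "skipped".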
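
The two bounds come from comparing the k-fit candidate against its neighbours under the objective L + λR. The derivation below is a sketch that assumes R_{k+1} > R_k > R_{k−1}, i.e. candidates with more exact fits carry a larger penalty; if that ordering fails, the inequality directions change.

```latex
\begin{align*}
L_k + \lambda R_k < L_{k+1} + \lambda R_{k+1}
  &\iff \lambda > \frac{L_k - L_{k+1}}{R_{k+1} - R_k}
   = \frac{L_{k+1} - L_k}{R_k - R_{k+1}}, \\
L_k + \lambda R_k < L_{k-1} + \lambda R_{k-1}
  &\iff \lambda < \frac{L_{k-1} - L_k}{R_k - R_{k-1}}.
\end{align*}
```

The k-fit candidate is optimal for some λ only if the lower bound is strictly below the upper bound; otherwise the (k+1)-fit candidate hands over directly to the (k−1)-fit one.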

Implication: The achievable k-values depend on the Pareto frontier structure in the (data-loss, regularization-penalty) space. Not all intermediate values are guaranteed.