Noise Sensitivity on

Equivalence Relations

Omar Elshinawy

A Thesis submitted in partial fulfillment
of the requirements for the degree of

Bachelor of Science in

Mathematics

May 20, 2025


Abstract

Noise sensitivity was first introduced by Benjamini, Kalai and Schramm in their seminal work on Boolean functions. We propose a construction of a similar flavour for binary relations. By flipping each relation independently with a small probability p, a natural question about recoverability arises, to which we give a positive answer in the case of an equivalence relation (X, ∼). We prove that equivalence relations are noise-stable under the prescribed model. In particular, we propose a simple reconstruction algorithm, and show that it achieves an asymptotically zero misclassification error.

Through this notebook, we warmly invite the reader to breathe the living mathematics of this thesis.


Constructor University  |  School of Computer Science and Engineering

Supervised by Prof. Dr. Keivan Mallahi-Karai


Preamble

The preamble carries all the functions and libraries being used. One may consult

> noise_sensitivity.py

to access the library source code. Alternatively, a static reference to the same functions is given at the end of this document.

In [1]:
from noise_sensitivity import *

Statement

For this discussion, let $(X, \sim)$ be an equivalence relation on the elements $1, 2, \dots, n.$ We write $X = \bigsqcup_{i=1}^k X_i,$ where $n_i = |X_i|$ denotes the size of the $i$-th equivalence class, so that $\sum_{i=1}^k n_i = n.$ The only restriction is $n_i \geq \epsilon n,$ a lower bound that rules out sparse classes.

In [2]:
n = 25; p=1/n; epsilon=0.3
In [3]:
X = equivalence(n, epsilon)
graph(X,"Graph of $(X, \sim)$", equivalence=True)
print(f"n = {[len(X_i) for X_i in X]} are the sizes of each equivalence class.\n")
[Figure: Graph of $(X, \sim)$]
N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.

n = [8, 8, 9] are the sizes of each equivalence class.


As observed above, it is indeed natural to model an equivalence relation as a graph. We shall borrow some graph-theoretic constructs as we move on; nevertheless, we continue to work within the equivalence relation framework.

$\textbf{Definition}~\text{(Noisy Relation.)}$ For each pair $i \neq j,$ let $X_{ij}=X_{ji}$ be an independent Bernoulli random variable with parameter $p.$ Set the condition

$$ i \sim' j \iff \begin{cases} i \sim j ~\land ~ X_{ij}=0\\ i \not \sim j ~\land~ X_{ij} = 1 \end{cases} $$

Then, $\sim'$ is said to be a $\textit{noisy}$ relation.

In words, $\sim'$ flips the relation between $i$ and $j$ with probability $p.$ The relation $\sim'$ is symmetric by definition. Further, it inherits reflexivity from $\sim,$ since there is no noise when $i=j$. Under noise, transitivity (in its full extent) is in general lost. We conjecture nonetheless that the transitive property is highly resilient to noise, and that we may thereby recover the original relation $\sim.$
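For concreteness, the following sketch transcribes the definition directly on a 0/1 adjacency matrix. The name noisy_from_definition is illustrative; the library's own noisy function, used in the example below, implements the same model.

def noisy_from_definition(A, p, rng=None):
    """Flip the relation on each pair i < j with probability p; the diagonal (reflexivity) is never touched."""
    rng = np.random.default_rng(rng)
    A_noisy = A.copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                   # X_ij = 1
                A_noisy[i, j] = 1 - A_noisy[i, j]  # i ~' j exactly when i was not related to j
                A_noisy[j, i] = A_noisy[i, j]      # keep the relation symmetric
    return A_noisy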


$\textbf{Example }\text{(Noisy Relation.)}$

In [4]:
X_ = noisy(X, p)
graph(X_, f"Noisy Equivalence Relation $(X, \sim')$ with $p={p}$", equivalence=False)
[Figure: Noisy Equivalence Relation $(X, \sim')$ with $p=0.04$]
N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.

The structure is indeed perturbed, yet its main features survive. The central question of our discussion is: how does one formalise this intuition about what each equivalence class should be?

Motivation

Given a perturbed adjacency matrix $\tilde A,$ the goal is to restore it to its original state. This requires a specific language, which we prescribe below.

$\textbf{Definition}~\text{(Score.)}$ For $i, j \in X,$ define

$$ s(i, j) = \sum_{w \in X\backslash\{i, j\}} \mathbf 1[i \sim w] \cdot \mathbf 1 [w\sim j] $$

to be the number of witnesses $w$ related to both $i$ and $j$; in graph terms, the number of common neighbours of $i$ and $j.$

$\textit{Remark.}$ This measure exploits transitivity to its full extent: every $w$ related to both $i$ and $j$ is evidence that, under transitivity, $i$ and $j$ should be related.

$\textbf{Definition}~\text{(Score Matrix.)}$ Define

$$ S_X := \begin{pmatrix} s(1, 1) & s(1, 2) & \dots & s(1, n)\\ s(2, 1)& s(2, 2) & \dots & s(2, n)\\ \vdots & \dots & \ddots & \vdots\\ s(n, 1) & \dots & s(n, n-1) & s(n, n) \end{pmatrix} $$

to store the scores $s(i, j)$ for every $i, j\in X.$

$\textbf{Example}~\text{(Equivalence Relation.)}$ In this case, it should be clear that $s(x, x) =\deg{x}.$ This is a useful consequence of our definition. For $x, i \in X_i$ with $x \neq i,$ and $j$ in a different class $X_j,$ one also obtains $s(x, i) = n_i-2$ and $s(x,j) = 0.$
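To make these formulas concrete, here is a minimal sketch that counts witnesses directly from the definition and verifies the claims above on a toy relation with classes $\{1,2,3\}$ and $\{4\}$; the names score_naive and A_demo are illustrative and not part of noise_sensitivity.py.

def score_naive(A, i, j):
    """Direct transcription of the definition: count witnesses w != i, j with i ~ w and w ~ j."""
    n = A.shape[0]
    return sum(A[i, w] * A[w, j] for w in range(n) if w != i and w != j)

# Toy relation with classes {1, 2, 3} and {4} (0-based indices in code).
A_demo = np.array([[1, 1, 1, 0],
                   [1, 1, 1, 0],
                   [1, 1, 1, 0],
                   [0, 0, 0, 1]])

assert score_naive(A_demo, 0, 0) == 2  # s(x, x) = deg x     (class of size 3, minus x itself)
assert score_naive(A_demo, 0, 1) == 1  # s(x, i) = n_i - 2   (the only witness is element 3)
assert score_naive(A_demo, 0, 3) == 0  # s(x, j) = 0 across classes

The library's score function computes the same quantities for all pairs at once via matrix multiplication.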

In [5]:
S_X = score(X, equivalence=True)
matrix(S_X, title="Equivalence Relation Score Matrix", width=3)
Equivalence Relation Score Matrix
  8  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  7  7  0  0  7  7  0  7  
  0  7  6  0  0  0  0  0  6  6  6  6  0  6  0  0  0  0  0  0  6  0  0  0  0  
  0  6  7  0  0  0  0  0  6  6  6  6  0  6  0  0  0  0  0  0  6  0  0  0  0  
  0  0  0  7  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  6  0  
  0  0  0  6  7  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  6  0  
  0  0  0  6  6  7  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  6  0  
  0  0  0  6  6  6  7  6  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  6  0  
  0  0  0  6  6  6  6  7  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  6  0  
  0  6  6  0  0  0  0  0  7  6  6  6  0  6  0  0  0  0  0  0  6  0  0  0  0  
  0  6  6  0  0  0  0  0  6  7  6  6  0  6  0  0  0  0  0  0  6  0  0  0  0  
  0  6  6  0  0  0  0  0  6  6  7  6  0  6  0  0  0  0  0  0  6  0  0  0  0  
  0  6  6  0  0  0  0  0  6  6  6  7  0  6  0  0  0  0  0  0  6  0  0  0  0  
  7  0  0  0  0  0  0  0  0  0  0  0  8  0  0  7  7  7  7  0  0  7  7  0  7  
  0  6  6  0  0  0  0  0  6  6  6  6  0  7  0  0  0  0  0  0  6  0  0  0  0  
  0  0  0  6  6  6  6  6  0  0  0  0  0  0  7  0  0  0  0  6  0  0  0  6  0  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  8  7  7  7  0  0  7  7  0  7  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  8  7  7  0  0  7  7  0  7  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  8  7  0  0  7  7  0  7  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  7  8  0  0  7  7  0  7  
  0  0  0  6  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  7  0  0  0  6  0  
  0  6  6  0  0  0  0  0  6  6  6  6  0  6  0  0  0  0  0  0  7  0  0  0  0  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  7  7  0  0  8  7  0  7  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  7  7  0  0  7  8  0  7  
  0  0  0  6  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  0  0  7  0  
  7  0  0  0  0  0  0  0  0  0  0  0  7  0  0  7  7  7  7  0  0  7  7  0  8  

$\textbf{Example}~\text{(Noisy Relation.)}$

In [6]:
S_X_ = score(X_)
matrix(S_X_, title="Noisy Score Matrix", width=3)
Noisy Score Matrix
 10  3  3  0  0  0  0  1  1  3  2  1  7  2  1  7  7  7  7  0  2  7  7  0  7  
  3  7  5  0  0  0  0  0  5  7  5  5  1  5  0  1  2  1  0  0  5  1  1  0  1  
  3  5  8  0  0  0  0  0  6  5  6  6  1  6  0  1  0  1  3  0  6  1  1  0  1  
  0  0  0  7  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  1  1  6  0  
  0  0  0  6  7  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  1  1  6  0  
  0  0  0  6  6  7  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  1  1  6  0  
  0  0  0  6  6  6  7  6  0  0  0  0  0  0  6  0  0  0  0  6  0  1  1  6  0  
  1  0  0  6  6  6  6  8  0  0  0  0  1  0  6  1  1  1  1  6  0  0  2  6  1  
  1  5  6  0  0  0  0  0  8  5  6  7  1  6  0  1  2  1  3  0  6  1  1  0  1  
  3  7  5  0  0  0  0  0  5  7  5  5  1  5  0  1  2  1  0  0  5  1  1  0  1  
  2  5  6  0  0  0  0  0  6  5  7  6  0  6  0  0  1  0  2  0  6  0  0  0  0  
  1  5  6  0  0  0  0  0  7  5  6  8  1  6  0  1  2  1  3  0  6  1  1  0  1  
  7  1  1  0  0  0  0  1  1  1  0  1  8  0  1  7  7  7  7  0  0  7  7  0  7  
  2  5  6  0  0  0  0  0  6  5  6  6  0  7  0  0  1  0  2  0  6  0  0  0  0  
  1  0  0  6  6  6  6  6  0  0  0  0  1  0  8  1  1  1  1  6  0  2  0  6  1  
  7  1  1  0  0  0  0  1  1  1  0  1  7  0  1  8  7  7  7  0  0  7  7  0  7  
  7  2  0  0  0  0  0  1  2  2  1  2  7  1  1  7  9  7  7  0  1  7  7  0  7  
  7  1  1  0  0  0  0  1  1  1  0  1  7  0  1  7  7  8  7  0  0  7  7  0  7  
  7  0  3  0  0  0  0  1  3  0  2  3  7  2  1  7  7  7 10  0  2  7  7  0  7  
  0  0  0  6  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  7  0  1  1  6  0  
  2  5  6  0  0  0  0  0  6  5  6  6  0  6  0  0  1  0  2  0  7  0  0  0  0  
  7  1  1  1  1  1  1  0  1  1  0  1  7  0  2  7  7  7  7  1  0  9  7  1  7  
  7  1  1  1  1  1  1  2  1  1  0  1  7  0  0  7  7  7  7  1  0  7  9  1  7  
  0  0  0  6  6  6  6  6  0  0  0  0  0  0  6  0  0  0  0  6  0  1  1  7  0  
  7  1  1  0  0  0  0  1  1  1  0  1  7  0  1  7  7  7  7  0  0  7  7  0  8  

Let us order the scores of each vertex with respect to other vertices.

In [7]:
s = np.sort(S_X_, axis=1)[:, ::-1]
matrix(M1=s, title="Noisy Sorted Score Matrix\n", width=3)
Noisy Sorted Score Matrix

 10  7  7  7  7  7  7  7  7  3  3  3  2  2  2  1  1  1  1  0  0  0  0  0  0  
  7  7  5  5  5  5  5  5  3  2  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  8  6  6  6  6  6  5  5  3  3  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  8  6  6  6  6  6  6  6  2  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  8  7  6  6  6  6  5  5  3  2  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  
  7  7  5  5  5  5  5  5  3  2  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  5  5  2  2  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  8  7  6  6  6  6  5  5  3  2  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  
  8  7  7  7  7  7  7  7  7  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  5  5  2  2  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  8  6  6  6  6  6  6  6  2  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  8  7  7  7  7  7  7  7  7  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
  9  7  7  7  7  7  7  7  7  2  2  2  2  1  1  1  1  1  0  0  0  0  0  0  0  
  8  7  7  7  7  7  7  7  7  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  
 10  7  7  7  7  7  7  7  7  3  3  3  2  2  2  1  1  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  7  6  6  6  6  6  5  5  2  2  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  9  7  7  7  7  7  7  7  7  2  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  
  9  7  7  7  7  7  7  7  7  2  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  
  7  6  6  6  6  6  6  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  
  8  7  7  7  7  7  7  7  7  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  

The score gaps are $\textit{mostly}$ visible!

Strategy

With the goal of determining the thresholds, let us first compute the minimum class size, that is, the smallest integer greater than or equal to $\epsilon n.$

In [8]:
n_min = int(np.ceil(epsilon * n)) - 1 # minus 1 uses 0-based indexing
print(f"The minimum class size is {n_min+1}.")
The minimum class size is 8.

We initialise a threshold array. This will store the score thresholds for each row of the score matrix.

In [9]:
tau = np.zeros(n)

Fixing $x \in X$, let us see how one might correctly sieve its scores.

In [10]:
x = 0
In [11]:
print(f"Scores of {x+1} : {s[x, :]}") # 0-based indexing
Scores of 1 : [10.  7.  7.  7.  7.  7.  7.  7.  7.  3.  3.  3.  2.  2.  2.  1.  1.  1.  1.  0.  0.  0.  0.  0.  0.]

We know that the class to which $x$ belongs is of size at least $\epsilon n.$ One may therefore safely assume that the elements attaining the largest $\lceil \epsilon n \rceil$ scores are members of the same class. This reduces the problem to that of considering

In [12]:
print(s[x, n_min:]) 
[7. 7. 3. 3. 3. 2. 2. 2. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0.]

Suppose that we make the cut at the $k$-th biggest score. Then $n_i = k$ by assumption, and the $(k+1)$-st biggest score belongs to a non-member. The observed score gap $\hat \xi$ is therefore the difference between these two scores.

In [13]:
k = n_min + 2; gap = s[x, k]-s[x, k+1]
print(f"A cut at k = {k+1} gives the gap {gap}")
A cut at k = 10 gives the gap 0.0

As discussed in $\textbf{Lemma 3.1},$ the theoretical gap $\xi$ depends on knowing $n_j,$ which is unavailable in practice. Instead, we bound $\xi(n, n_i, n_j, p)$ from above via $\xi_+(n, n_i, \epsilon, p)$.

$$\begin{equation*} \xi_+ = (n_i-2)-(3n_i+\epsilon n-6)\cdot p + (2n-4)\cdot p^2 \tag{$\textbf{3.1}$} \end{equation*}$$

$\textit{Remark.}$ We can do this, as $\xi$ is comparable to $\xi_+$ in the limit.
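As a quick sanity check, one may evaluate $(\textbf{3.1})$ by hand for the cut considered below and compare it with the library's xi; its signature xi(n, n_i, epsilon, p) is assumed here from its usage in the next cell.

# Illustrative check: evaluate formula (3.1) directly for n_i = 10.
n_i = 10
xi_plus = (n_i - 2) - (3*n_i + epsilon*n - 6)*p + (2*n - 4)*p**2
print(xi_plus, xi(n, n_i, epsilon, p))  # both ≈ 6.8136, the value used below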

In [14]:
mean = xi(n, k+1, epsilon, p)  # 1-based indexing, hence k+1
t = mean - gap
print(f"Theoretical vs Observed Gap: t = {mean} - {gap} = {t}")
Theoretical vs Observed Gap: t = 6.8136 - 0.0 = 6.8136

This gives us the deviation from $\xi_+.$ Now, if $\hat \xi$ has deviated from $\xi_+$ by $t,$ and $\xi_+$ is greater than $\xi,$ then $\hat \xi$ has deviated from $\xi$ by at least $t.$ From there we may apply the Bernstein concentration bound,

$$ \begin{equation*} \mathbb P[\xi - \hat \xi \geq t] \leq \exp\left(-\dfrac{t^2}{2\sigma^2 + \tfrac{2}{3} t}\right). \end{equation*} $$

The other obstacle we must clear is the variance $\sigma^2,$ which also depends on $n_j$. In the same spirit, we establish an upper bound $\sigma^2_+(n, n_i, \epsilon, p)$ on $\sigma^2(n, n_i, n_j, p)$.

$$\begin{equation*} \sigma^2_+ = (n_i-2)\cdot(3p-9p^2 +10p^3 - 4p^4) + n\cdot (p-3p^2 + 6p^3 - 4p^4) + (n-n_i - \epsilon n)\cdot (2p^2-2p^3)\tag{$\textbf{3.2}$} \end{equation*}$$

$\textit{Remark.}$ $\sigma^2_+$ is also comparable to $\sigma^2$ in the limit.
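In the same way, one may evaluate $(\textbf{3.2})$ by hand and compare it with the library's var, whose signature var(n, n_i, epsilon, p) is likewise assumed from its usage in the next cell.

# Illustrative check: evaluate formula (3.2) directly for n_i = 10.
n_i = 10
sigma_2_plus = (n_i - 2)*(3*p - 9*p**2 + 10*p**3 - 4*p**4) \
             + n*(p - 3*p**2 + 6*p**3 - 4*p**4) \
             + (n - n_i - epsilon*n)*(2*p**2 - 2*p**3)
print(sigma_2_plus, var(n, n_i, epsilon, p))  # both ≈ 1.762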

In [15]:
sigma_2 = var(n, k+1, epsilon, p) # 1-based indexing, hence k+1

The result is

$$ \begin{equation*} \mathbb P[\xi - \hat \xi \geq t] \leq \exp\left(-\dfrac{t^2}{2\sigma^2 + \tfrac{2}{3} t}\right) \leq \exp\left(-\dfrac{t^2}{2\sigma^2_+ + \tfrac{2}{3} t}\right) \tag{by $\textbf{3.2}$} \end{equation*} $$

a $\textit{computable}$ exponential upper bound, in the sense that it no longer depends on $n_j.$
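Concretely, with the parameters at hand ($n = 25$, $p = 1/25$, $\epsilon = 0.3$, $n_i = 10$), the checks above give $\sigma^2_+ \approx 1.762$ and $t = 6.8136,$ so the bound evaluates to

$$ \exp\left(-\dfrac{t^2}{2\sigma^2_+ + \tfrac{2}{3}t}\right) = \exp\left(-\dfrac{6.8136^2}{2 \cdot 1.762 + \tfrac{2}{3}\cdot 6.8136}\right) \approx 0.0032, $$

in agreement with the output of the next cell.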

In [16]:
bound = bernstein(sigma_2, t)
print(f"nᵢ = {k+1} | P[ξ̂  deviated from ξ by {t}] ≤ {bound}")
nᵢ = 10 | P[ξ̂  deviated from ξ by 6.8136] ≤ 0.0031667272766464295

The exponential gives an extremely sharp sieve. To obtain the best cut, one finds the $k \in \big[\lceil \epsilon n \rceil, n-1\big]$ for which the observed deviation is $\textit{least unlikely}$, that is, for which the Bernstein bound is largest. For fixed $x,$ we have these possible cuts,

In [17]:
T = np.zeros(n - 1)  # one Bernstein bound per candidate cut, as in the full algorithm below
for k in range(n_min, n - 1):
    sigma_2 = var(n, k+1, epsilon, p); mean = xi(n, k+1, epsilon, p); gap = s[x, k] - s[x, k + 1]
    t = mean - gap # deviation
    if t < 0: 
        continue # skips upper deviations
    else:
        T[k] = bernstein(sigma_2, t)
        print(f"Cut at nᵢ={k+1} | P[deviation] ≤ {T[k]}")
Cut at nᵢ=8 | P[deviation] ≤ 0.019435595420831876
Cut at nᵢ=9 | P[deviation] ≤ 0.44418453353757326
Cut at nᵢ=10 | P[deviation] ≤ 0.0031667272766464295
Cut at nᵢ=11 | P[deviation] ≤ 0.001254667098148025
Cut at nᵢ=12 | P[deviation] ≤ 0.0016898809490244855
Cut at nᵢ=13 | P[deviation] ≤ 0.00019245421859838077
Cut at nᵢ=14 | P[deviation] ≤ 7.475235125052697e-05
Cut at nᵢ=15 | P[deviation] ≤ 0.00010320656237684635
Cut at nᵢ=16 | P[deviation] ≤ 1.114917783263806e-05
Cut at nᵢ=17 | P[deviation] ≤ 4.286896783374554e-06
Cut at nᵢ=18 | P[deviation] ≤ 1.6445639658491365e-06
Cut at nᵢ=19 | P[deviation] ≤ 2.318511263495415e-06
Cut at nᵢ=20 | P[deviation] ≤ 2.407017772362285e-07
Cut at nᵢ=21 | P[deviation] ≤ 9.188096646659168e-08
Cut at nᵢ=22 | P[deviation] ≤ 3.5030222447038494e-08
Cut at nᵢ=23 | P[deviation] ≤ 1.3341350468324918e-08
Cut at nᵢ=24 | P[deviation] ≤ 5.0763714392211695e-09

Then one chooses the cut for which the deviation is $\textit{least unlikely}$.

In [18]:
tau[x] = s[x, np.argmax(T)]
print(f"Scores of {x+1} : {s[x, :]}") # 0-based indexing
print(f"⇒ Cut at nᵢ = {np.argmax(T) + 1} with τ(x) = {tau[x]}") # nᵢ = k+1 converts back to 1-based counting
Scores of 1 : [10.  7.  7.  7.  7.  7.  7.  7.  7.  3.  3.  3.  2.  2.  2.  1.  1.  1.  1.  0.  0.  0.  0.  0.  0.]
⇒ Cut at nᵢ = 9 with τ(x) = 7.0

One repeats this procedure for all $x\in X$ to obtain the full algorithm.

Full Algorithm

Below one finds the full code.

In [19]:
def recovery(A, p, epsilon, return_tau=False, draw=True):
    
    n = A.shape[0] 
    n_min = int(np.ceil(epsilon * n)) - 1  # minimum equivalence class size, index-0 counting

    if draw:
        graph(A, title="Noisy Relation") 

    X_ = A.copy()

    # Score and sort
    S_X_ = score(X_)
    s = np.sort(S_X_, axis=1)[:, ::-1]

    # Compute tau thresholds
    tau = np.zeros(n)
    
    # for all x
    for x in range(n):
        T = np.zeros(n - 1)
        # assume nᵢ = k+1
        for k in range(n_min, n - 1): 
            # variance | lemma 3.2
            sigma_2 = var(n, k+1, epsilon, p)
            # expected gap | lemma 3.1
            mean = xi(n, k+1, epsilon, p)
            # measured gap
            gap = s[x, k] - s[x, k + 1]
            # deviation
            t = mean - gap
            
            # upper deviations are uninteresting
            if t < 0:
                continue
            else:
                # apply the sieve
                T[k] = bernstein(sigma_2, t)
        
        # tau stores the score thresholds
        tau[x] = s[x, np.argmax(T)]

    # Recover adjacency matrix
    Adjacency = (S_X_ >= tau[:, np.newaxis]).astype(int)

    if draw:
        graph(Adjacency, title="Recovery")

    return (Adjacency, tau) if return_tau else Adjacency

Let us create an equivalence relation of size $n,$ with $n_i \geq \epsilon n.$ We take $p$ of order $1/n$.

In [20]:
n=20; epsilon=0.3; p=1/n;
X = equivalence(n, epsilon)
graph(X, equivalence=True, title="Equivalence Relation")
[Figure: Equivalence Relation]
N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.

X_ is our perturbed adjacency matrix. We are ready for the recovery algorithm! Feel free to run the cell below multiple times.

In [21]:
X_ = noisy(X, p); A_X_ = recovery(X_, p, epsilon)
[Figure: Noisy Relation]
N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.

[Figure: Recovery]
N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.

This algorithm is asymptotically successful, meaning that the probability of exact recovery tends to one as $n$ grows. It therefore $\textit{almost always}$ works. $\boxed \xi$
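The abstract speaks of an asymptotically vanishing misclassification error. The following sketch shows one way to probe that claim empirically with the functions defined above; misclassification_rate and the trial loop are illustrative additions, not part of the thesis code.

def misclassification_rate(A_true, A_rec):
    """Fraction of off-diagonal pairs on which the recovered relation disagrees with the original."""
    m = A_true.shape[0]
    mask = ~np.eye(m, dtype=bool)
    return np.mean(A_true[mask] != A_rec[mask])

errors = []
for _ in range(20):                                        # a handful of independent trials
    X = equivalence(n, epsilon)                            # fresh relation with the parameters above
    A_rec = recovery(noisy(X, p), p, epsilon, draw=False)
    errors.append(misclassification_rate(A(X), A_rec))
print(f"Mean misclassification error over 20 trials: {np.mean(errors):.4f}")

Increasing $n$ (with $p = 1/n$) should drive this error towards zero, which is precisely the noise-stability claim.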

Preamble

import math
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import networkx.algorithms.community as nx_comm
from collections import defaultdict

np.set_printoptions(formatter={'float': '{:4}'.format, 'int': '{:4}'.format})
np.set_printoptions(linewidth=1000)

Functions

def equivalence(n, epsilon, seed=None, hard=True):
    """
    Returns a random equivalence relation on {1, ..., n}, represented as a list of blocks,
    each of size at least ceil(epsilon * n). With hard=True, elements are assigned to the
    blocks at random; otherwise the blocks are contiguous.
    """
    
    rng = np.random.default_rng(seed)
    
    # Calculate minimum block size
    m = max(1, math.ceil(epsilon * n))
    m = min(m, n)  # Ensure m doesn't exceed n
    
    # Verify if at least two blocks can be formed
    if n < 2 * m:
        raise ValueError(f"Impossible to form two blocks: n={n} < 2*m={2*m} (m={m}, epsilon={epsilon})")
    
    # Determine maximum partitions possible
    k_max = n // m
    # Randomly choose number of partitions (at least 2)
    k = rng.integers(2, k_max + 1)
    
    # Calculate remaining elements after assigning minimum size
    r = n - k * m
    
    # Generate random composition for block sizes
    if r == 0:
        comp = [0] * k
    else:
        # Stars and bars method for composition
        positions = rng.choice(np.arange(1, r + k), size=k - 1, replace=False)
        positions_sorted = np.sort(positions)
        comp = [positions_sorted[0] - 1]
        for i in range(1, k - 1):
            comp.append(positions_sorted[i] - positions_sorted[i - 1] - 1)
        comp.append((r + k - 1) - positions_sorted[-1])
    
    # Calculate final block sizes
    sizes = [m + c for c in comp]
    
    # Create partitions
    if hard:
        # Assign elements randomly to blocks
        labels = np.repeat(np.arange(k), sizes)
        rng.shuffle(labels)
        blocks = [[] for _ in range(k)]
        for element in range(1, n + 1):
            blocks[labels[element - 1]].append(element)
        return blocks
    else:
        # Create contiguous blocks
        blocks = []
        start = 1
        for size in sizes:
            end = start + size - 1
            blocks.append(list(range(start, end + 1)))
            start = end + 1
        return blocks

def A(X):
    """
    Computes the adjacency matrix of an equivalence relation (X, ~)
    """
    
    n = sum([len(X_i) for X_i in X])
    A_X = np.zeros((n, n), dtype=int)

    for block in X:
        for i in block:
            for j in block:
                A_X[i-1][j-1] = 1  # Convert to 0-based indexing

    return A_X

def noisy(X, p):
    """Flips upper triangle entries (i < j) with probability p."""  
    A_X = A(X)
    n = A_X.shape[0]  # infer the size from the relation itself

    for i in range(n):
        for j in range(i + 1, n):
            if np.random.rand() < p:
                A_X[i, j] = 1 - A_X[i, j]  # Flip the bit
                A_X[j, i] = A_X[i, j]
    return A_X

def score(X, equivalence=False):
    """
    Returns the Score Matrix, as per Lemma 3.3
    """
    # Build the adjacency matrix first
    if equivalence:
        A_X = A(X)
        n = A_X.shape[0]
    else:
        n = X.shape[0]
        A_X = X.copy()

    # (A_X @ A_X)[i, j] sums A_X[i, w] * A_X[w, j] over all w; subtracting 2*A_X removes the
    # contributions of w = i and w = j, and adding the identity corrects the diagonal.
    S_X = (A_X @ A_X) - 2*A_X + np.eye(n)
    
    return S_X

def graph(X, title="", equivalence=False):

    """
    Graphing tool for equivalence classes.
    P.S: The modularity algorithm is used to detect communities
    in the graph, another way to tackle the problem.
    """
    
    # Get adjacency matrix
    if equivalence:
        A_X = A(X)
            
    else:
        A_X = X.copy()

    # Graph using modularity algorithm
    
    np.fill_diagonal(A_X, 0)
    G = nx.from_numpy_array(A_X)
    
    clusters = list(nx_comm.greedy_modularity_communities(G))
        
    pos = {}
    canvas_width = 100  
    offset = canvas_width / len(clusters)

    for i, cluster in enumerate(clusters):
        cluster = list(cluster)
        n = len(cluster)
        radius = 7
        angle_step = 2 * np.pi / n

        for j, node in enumerate(cluster):
            angle = j * angle_step
            x = radius * np.cos(angle) + i * offset
            y = radius * np.sin(angle)
            pos[node] = np.array([x, y])

    # Plot results

    plt.figure(figsize=(15, 2.25))  # canvas size control
    node_labels = {i: i + 1 for i in G.nodes} # 1-based indexing
    nx.draw(G, pos=pos, labels=node_labels, with_labels=True, node_color='lightblue', node_size=200)
    plt.title(title, fontsize=18)
    plt.show()
    print("N.B: Reflexivity i.e. self-connectedness does not reflect in this graph.\n")

def matrix(M1, M2=None, width=2, nan_repr='·', title=None, spacing=2, upper_1=False, upper_2=False):
    """
    Pretty-prints one/two 2D numpy arrays side by side with 
    optional alignment and NaN formatting.
    """
    if title:
        print(title)
    if upper_1:
        M1 = M1.astype(float)
        n = M1.shape[0]
        M1[np.tril_indices(n, k=-1)] = np.nan
        
    if upper_2 and M2 is not None:
        M2 = M2.astype(float)
        n = M2.shape[0]
        M2[np.tril_indices(n, k=-1)] = np.nan

    # Find the number of rows (assuming both matrices have the same number of rows)
    rows = max(M1.shape[0], M2.shape[0] if M2 is not None else 0)
    
    for i in range(rows):
        row_str = ""
        
        # Print from M1
        for j in range(M1.shape[1]):
            val = M1[i, j] if i < M1.shape[0] else np.nan
            if np.isnan(val):
                row_str += f"{nan_repr:>{width}}"
            elif isinstance(val, float) and val.is_integer():
                row_str += f"{int(val):{width}d}"
            else:
                row_str += f"{val:{width}}"
        
        # Add space between matrices
        row_str += ' ' * spacing

        # If M2 is provided, print from M2
        if M2 is not None:
            for j in range(M2.shape[1]):
                val = M2[i, j] if i < M2.shape[0] else np.nan
                if np.isnan(val):
                    row_str += f"{nan_repr:>{width}}"
                elif isinstance(val, float) and val.is_integer():
                    row_str += f"{int(val):{width}d}"
                else:
                    row_str += f"{val:{width}}"
        
        print(row_str)

def xi(n, n_i, epsilon, p):
    """
    Computes an upper bound ξ₊ for the expected gap ξ, as per Lemma 3.1
    """
    return (n_i-2) - (3*n_i+epsilon*n-6)*p + (2*n-4)*(p**2)

def var(n, n_i, epsilon, p):
    """
    Computes an upper bound σ²₊ for variance σ², as per Lemma 3.2
    """

    return (n_i - 2)*(3*p -9*p**2+10*p**3 -4*p**4)+ n*(p-3*p**2+6*p**3 - 4*p**4) + (n-n_i-epsilon*n)*(2*p**2-2*p**3)

def bernstein(sigma_2, t):
    """
    Returns the Bernstein concentration bound.
    """
    return np.exp(-(t**2)/(2*sigma_2 + (2/3)*t))