Random Binary Variables

Ulf Hamster
python random numbers correlated random numbers

Load packages

import numpy as np
np.random.seed(42)
from luriegold import luriegold
import korr

Specify Correlation Matrix

Specify a correlation matrix by hand, e.g. from subjective estimates

rho = np.array([
    [1., .5, -.3],
    [.5, 1., .7],
    [-.3, .7, 1.]
])

rho.round(4)
array([[ 1. ,  0.5, -0.3],
       [ 0.5,  1. ,  0.7],
       [-0.3,  0.7,  1. ]])

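Adjust the matrix with the Lurie-Goldberg algorithm. As specified, the matrix is not positive semi-definite (its determinant is -0.04), so it is not a valid correlation matrix.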

rho, _, _ = luriegold(rho)

rho.round(4)
array([[ 1.    ,  0.4913, -0.2921],
       [ 0.4913,  1.    ,  0.6895],
       [-0.2921,  0.6895,  1.    ]])
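
To see why an adjustment is needed, one can inspect the eigenvalues of the hand-specified matrix. The following is a minimal sketch that uses only NumPy and is independent of the luriegold package; the names `rho_raw`, `is_psd` and `cholesky_ok` are illustrative and do not appear elsewhere in this post.

```python
import numpy as np

# The hand-specified matrix from the start of this post
rho_raw = np.array([
    [1.0, 0.5, -0.3],
    [0.5, 1.0, 0.7],
    [-0.3, 0.7, 1.0],
])

# A valid correlation matrix has no negative eigenvalues
eigvals = np.linalg.eigvalsh(rho_raw)
is_psd = bool(eigvals.min() >= 0)
print("eigenvalues:", eigvals)
print("positive semi-definite:", is_psd)

# The Cholesky factorization requires a positive definite matrix
try:
    np.linalg.cholesky(rho_raw)
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False
print("Cholesky succeeded:", cholesky_ok)
```

A negative smallest eigenvalue means that no set of random variables can have exactly these pairwise correlations, which is why the coefficients are nudged before sampling.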

Generate Random Continuous Variables

First, generate uncorrelated (independent) standard normal random numbers

n_examples = 10000
X = np.random.standard_normal(size=(n_examples, 3))

Second, transform the uncorrelated random numbers into correlated random numbers with the Cholesky method

X = np.dot(X, np.linalg.cholesky(rho).T)
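
The Cholesky step works because, if the columns of Z are uncorrelated with unit variance, then Z L^T has covariance L L^T, and the Cholesky factor L is chosen so that L L^T equals the target matrix. A small sketch of that identity, assuming a made-up positive definite matrix `rho_demo` (not the matrix used in this post):

```python
import numpy as np

# Hypothetical, clearly positive definite correlation matrix for illustration
rho_demo = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])

# np.linalg.cholesky returns the lower-triangular factor L
L = np.linalg.cholesky(rho_demo)

# Multiplying L by its transpose recovers the target matrix
reconstructed = L @ L.T
print(np.allclose(reconstructed, rho_demo))
```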

Check whether the sample correlations of X resemble the target correlation matrix rho

tmp, _ = korr.pearson(X)
tmp.round(4)
array([[ 1.    ,  0.4867, -0.3015],
       [ 0.4867,  1.    ,  0.6862],
       [-0.3015,  0.6862,  1.    ]])
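
If korr is not installed, a similar check can be done with NumPy alone. This sketch is self-contained and therefore uses its own made-up matrix `rho_demo` and its own random generator, so the numbers will differ from the output above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive definite target matrix for illustration
rho_demo = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])

# Draw independent normals and correlate them with the Cholesky factor
Z = rng.standard_normal(size=(10000, 3))
X_demo = Z @ np.linalg.cholesky(rho_demo).T

# rowvar=False treats each column as one variable
sample_corr = np.corrcoef(X_demo, rowvar=False)
max_abs_error = np.abs(sample_corr - rho_demo).max()
print(sample_corr.round(4))
print(max_abs_error)
```

With 10,000 examples the sample correlations should typically land within a few hundredths of the target.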

Random Binary Variables

Convert X to binary variables by thresholding at 0, so that each variable equals 1 with probability 0.5

X = (X >= 0.0).astype(np.uint8)

Compute the Matthews correlation for binary variables

tmp, _ = korr.mcc(X)
tmp.round(4)
array([[ 1.    ,  0.3239, -0.1982],
       [ 0.3239,  1.    ,  0.478 ],
       [-0.1982,  0.478 ,  1.    ]])
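
For two binary variables, the Matthews correlation is the same quantity as the Pearson correlation computed on the 0/1 values (also known as the phi coefficient). The sketch below illustrates this on synthetic data; it does not use korr, and the variables `a` and `b` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic binary variables; b copies a about 80% of the time
a = rng.integers(0, 2, size=1000)
b = np.where(rng.random(1000) < 0.8, a, 1 - a)

# Confusion-matrix counts, cast to float to avoid integer overflow
tp = float(np.sum((a == 1) & (b == 1)))
tn = float(np.sum((a == 0) & (b == 0)))
fp = float(np.sum((a == 0) & (b == 1)))
fn = float(np.sum((a == 1) & (b == 0)))

# Matthews correlation coefficient from the counts
mcc = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Pearson correlation of the raw 0/1 values
pearson = np.corrcoef(a, b)[0, 1]
print(mcc, pearson)
```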

The MCC values do not match the specified correlation matrix: the absolute values of the coefficients are systematically smaller.

Scaling the coefficients up by 50% (multiplying by 1.5) brings them roughly back in line with the target in this example.

tmp2 = tmp * 1.5  # scale every coefficient up by 50%
np.fill_diagonal(tmp2, 1.0)  # reset the diagonal to exactly 1
tmp2.round(4)
array([[ 1.    ,  0.4858, -0.2973],
       [ 0.4858,  1.    ,  0.717 ],
       [-0.2973,  0.717 ,  1.    ]])
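
The shrinkage has a known closed form when standard normal variables are thresholded at 0: a latent correlation rho yields a binary (phi/MCC) correlation of (2/pi) * arcsin(rho). The factor 1.5 is therefore only a rough approximation; for small correlations the exact ratio approaches pi/2, about 1.57, and it shrinks toward 1 as the correlation approaches 1. The sketch below compares this formula with the values printed in this post; the helper `latent_rho` is a hypothetical name, and the formula assumes a threshold at exactly 0.

```python
import numpy as np

# Adjusted target matrix and observed MCC values, copied from the output above
rho = np.array([
    [1.0, 0.4913, -0.2921],
    [0.4913, 1.0, 0.6895],
    [-0.2921, 0.6895, 1.0],
])
observed_mcc = np.array([
    [1.0, 0.3239, -0.1982],
    [0.3239, 1.0, 0.478],
    [-0.1982, 0.478, 1.0],
])

# Expected binary correlation after thresholding normals at 0
expected_mcc = 2.0 / np.pi * np.arcsin(rho)
max_gap = np.abs(expected_mcc - observed_mcc).max()
print(expected_mcc.round(4))
print(max_gap)


def latent_rho(phi):
    """Latent normal correlation needed to reach a binary correlation phi."""
    return np.sin(np.pi / 2.0 * np.asarray(phi))


# Inverting the formula recovers the latent correlations
print(latent_rho(expected_mcc).round(4))
```

In practice this suggests applying the inverse transform to the desired binary correlations first and then sampling the normals, instead of rescaling afterwards. The transformed matrix may again need an adjustment to be positive semi-definite.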