Quality control
Box-Cox
- limix.qc.boxcox(x)
Box-Cox transformation for normality conformance.
It applies the power transformation
\[\begin{split}f(x) = \begin{cases} \frac{x^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0; \\ \log(x), & \text{if } \lambda = 0. \end{cases}\end{split}\]
to the provided data, which must be strictly positive, with the aim of making its distribution closer to normal. The λ parameter is fitted by maximum likelihood estimation.
- Parameters
X (array_like) – Data to be transformed.
- Returns
boxcox – Box-Cox transformed data.
- Return type
ndarray
Examples
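As a rough illustration of the power transformation itself, the following NumPy sketch applies the formula for a fixed λ. The helper name `boxcox_transform` and the sample values are made up for this example; unlike `limix.qc.boxcox`, it does not fit λ by maximum likelihood.

```python
import numpy as np


def boxcox_transform(x, lam):
    """Apply the Box-Cox formula for a given, fixed lambda (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # Limiting case of (x**lam - 1) / lam as lam approaches zero.
        return np.log(x)
    return (x**lam - 1.0) / lam


x = np.array([1.0, 2.0, 4.0, 8.0])
y_log = boxcox_transform(x, 0.0)   # same as log(x)
y_sqrt = boxcox_transform(x, 0.5)  # square-root-like compression
```

In practice λ would be chosen from the data, which is the step the library function performs.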
Dependent columns
- limix.qc.remove_dependent_cols(X, tol=1.49e-08)
Remove dependent columns.
Return a matrix with dependent columns removed.
- Parameters
X (array_like) – Matrix that might have dependent columns.
tol (float, optional) – Numerical tolerance used to decide whether a column is linearly dependent on the others.
- Returns
rank – Full column rank matrix.
- Return type
ndarray
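One way to picture the operation is a Gram-Schmidt pass that keeps a column only if it adds a new direction to the columns already kept. This is a sketch under that assumption, not the library's implementation; the helper name `drop_dependent_columns` is hypothetical, and the tolerance here is applied to the norm of the residual.

```python
import numpy as np


def drop_dependent_columns(X, tol=1.49e-08):
    """Keep columns that are not linear combinations of earlier kept columns."""
    X = np.asarray(X, dtype=float)
    basis = []  # orthonormal vectors spanning the kept columns
    keep = []   # indices of the kept columns
    for j in range(X.shape[1]):
        residual = X[:, j].copy()
        for q in basis:
            # Remove the component that the kept columns already explain.
            residual -= (q @ residual) * q
        norm = np.linalg.norm(residual)
        if norm > tol:
            basis.append(residual / norm)
            keep.append(j)
    return X[:, keep]


# The second column is twice the first, so it is dropped.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 2.0, 1.0]])
R = drop_dependent_columns(X)
```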
- limix.qc.unique_variants(X)
Filter out duplicate variants, that is, columns with the same genetic profile, keeping one copy of each.
- Parameters
X (array_like) – Samples-by-variants matrix of genotype values.
- Returns
genotype – Genotype array with unique variants.
- Return type
ndarray
Example
>>> from numpy.random import RandomState
>>> from numpy import kron, ones
>>> from limix.qc import unique_variants
>>> random = RandomState(1)
>>>
>>> N = 4
>>> X = kron(random.randn(N, 3) < 0, ones((1, 2)))
>>>
>>> print(X)
[[0. 0. 1. 1. 1. 1.]
 [1. 1. 0. 0. 1. 1.]
 [0. 0. 1. 1. 0. 0.]
 [1. 1. 0. 0. 1. 1.]]
>>>
>>> print(unique_variants(X))
[[0. 1. 1.]
 [1. 1. 0.]
 [0. 0. 1.]
 [1. 1. 0.]]
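For comparison, a similar deduplication can be sketched with plain NumPy using `numpy.unique` along the variant axis. This is only an illustration of the idea; the order of the returned columns follows NumPy's sorting and may differ from what `unique_variants` returns.

```python
import numpy as np

# Samples-by-variants matrix in which every variant appears twice.
X = np.array([[0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]])

# axis=1 treats each column (variant) as one item to deduplicate.
unique_cols = np.unique(X, axis=1)
```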
Genotype
- limix.qc.indep_pairwise(X, window_size, step_size, threshold, verbose=True)
Determine pair-wise independent variants.
Independent variants are defined via squared Pearson correlations between pairs of variants inside a sliding window.
- Parameters
X (array_like) – Samples-by-variants matrix.
window_size (int) – Number of variants inside each window.
step_size (int) – Number of variants the sliding window skips.
threshold (float) – Squared Pearson correlation threshold for independence.
verbose (bool) – True for progress information; False otherwise.
- Returns
ok – Boolean array defining independent variants.
- Return type
ndarray
Example
>>> from numpy.random import RandomState
>>> from limix.qc import indep_pairwise
>>>
>>> random = RandomState(0)
>>> X = random.randn(10, 20)
>>>
>>> indep_pairwise(X, 4, 2, 0.5, verbose=False)
array([ True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True])
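The sliding-window idea can be sketched as follows. This is a simplified greedy version written for illustration: within each window it drops the later variant of any pair whose squared Pearson correlation exceeds the threshold. The helper name `prune_pairwise` is hypothetical, and the rule `indep_pairwise` uses to decide which variant of a correlated pair to drop may be different.

```python
import numpy as np


def prune_pairwise(X, window_size, step_size, threshold):
    """Greedy correlation-based pruning inside a sliding window (sketch)."""
    X = np.asarray(X, dtype=float)
    n_variants = X.shape[1]
    keep = np.ones(n_variants, dtype=bool)
    for start in range(0, n_variants, step_size):
        stop = min(start + window_size, n_variants)
        # Squared Pearson correlations between the variants in this window.
        r2 = np.corrcoef(X[:, start:stop], rowvar=False) ** 2
        for i in range(stop - start):
            if not keep[start + i]:
                continue
            for j in range(i + 1, stop - start):
                if keep[start + j] and r2[i, j] > threshold:
                    keep[start + j] = False
    return keep


# The second variant duplicates the first; the others are weakly correlated.
X = np.array([[0.0, 0.0, 1.0, 2.0],
              [1.0, 1.0, 0.0, 0.0],
              [2.0, 2.0, 1.0, 1.0],
              [0.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0, 2.0]])
ok = prune_pairwise(X, window_size=3, step_size=1, threshold=0.5)
```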
- limix.qc.compute_maf(X)
Compute minor allele frequencies.
It assumes that X encodes 0, 1, and 2 representing the number of alleles (or dosage), or NaN to represent missing values.
- Parameters
X (array_like) – Genotype matrix.
- Returns
Minor allele frequencies.
- Return type
array_like
Examples
>>> from numpy.random import RandomState
>>> from limix.qc import compute_maf
>>>
>>> random = RandomState(0)
>>> X = random.randint(0, 3, size=(100, 10))
>>>
>>> print(compute_maf(X))
[0.49  0.49  0.445 0.495 0.5   0.45  0.48  0.48  0.47  0.435]
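Under the 0/1/2 dosage encoding, the computation can be sketched as half the mean dosage per variant, folded so that the result never exceeds 0.5. The sketch assumes variants are stored in columns and that missing values are simply skipped; the helper name `minor_allele_frequency` is made up for this example.

```python
import numpy as np


def minor_allele_frequency(X):
    """Per-column minor allele frequency for 0/1/2 dosages (sketch)."""
    X = np.asarray(X, dtype=float)
    # Each sample carries two alleles, so the mean dosage is halved.
    allele_freq = np.nanmean(X, axis=0) / 2.0
    # Fold the frequency so the rarer allele is always reported.
    return np.minimum(allele_freq, 1.0 - allele_freq)


X = np.array([[0.0, 2.0],
              [1.0, 2.0],
              [0.0, np.nan],
              [1.0, 1.0]])
maf = minor_allele_frequency(X)
```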
Impute
- limix.qc.mean_impute(X, axis=-1, inplace=False)
Impute NaN values.
It defaults to column-wise imputation.
- Parameters
- Returns
Imputed array.
- Return type
ndarray
Examples
>>> from numpy.random import RandomState
>>> from numpy import nan, array_str
>>> from limix.qc import mean_impute
>>>
>>> random = RandomState(0)
>>> X = random.randn(5, 2)
>>> X[0, 0] = nan
>>>
>>> print(array_str(mean_impute(X), precision=4))
[[ 0.9233  0.4002]
 [ 0.9787  2.2409]
 [ 1.8676 -0.9773]
 [ 0.9501 -0.1514]
 [-0.1032  0.4106]]
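Column-wise mean imputation, as in the example above, amounts to replacing each NaN with the mean of the observed values in its column. A minimal NumPy sketch, with a hypothetical helper name and always working on a copy (so it does not cover the `inplace` option):

```python
import numpy as np


def impute_column_means(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


X = np.array([[np.nan, 1.0],
              [2.0, 3.0],
              [4.0, np.nan]])
X_imputed = impute_column_means(X)
```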
- limix.qc.count_missingness(X)
Count the number of missing values per column.
- Parameters
X (array_like) – Matrix.
- Returns
count – Number of missing values per column.
- Return type
ndarray
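Assuming missing values are encoded as NaN, as elsewhere in this module, the same count can be sketched in plain NumPy:

```python
import numpy as np

X = np.array([[np.nan, 1.0, 2.0],
              [0.0, np.nan, 1.0],
              [np.nan, 2.0, 0.0]])

# isnan marks the missing entries; summing over rows counts them per column.
missing_per_column = np.isnan(X).sum(axis=0)
print(missing_per_column)  # [2 1 0]
```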
Kinship
- limix.qc.normalise_covariance(K, out=None)
Variance rescaling of covariance matrix 𝙺.
Let n be the number of rows (or columns) of 𝙺 and let mᵢ be the average of the values in the i-th column. Gower rescaling is defined as
\[𝙺(n - 1)/(𝚝𝚛𝚊𝚌𝚎(𝙺) - ∑mᵢ).\]
Notes
The reasoning behind the scaling is as follows. Let 𝐠 be a vector of n independent samples and let 𝙲 be Gower’s centering matrix. The unbiased variance estimator is
\[v = ∑ (gᵢ-ḡ)²/(n-1) = 𝚝𝚛𝚊𝚌𝚎((𝐠-ḡ𝟏)ᵀ(𝐠-ḡ𝟏))/(n-1) = 𝚝𝚛𝚊𝚌𝚎(𝙲𝐠𝐠ᵀ𝙲)/(n-1).\]
Let 𝙺 be the covariance matrix of 𝐠. The expectation of the unbiased variance estimator is
\[𝐄[v] = 𝚝𝚛𝚊𝚌𝚎(𝙲𝐄[𝐠𝐠ᵀ]𝙲)/(n-1) = 𝚝𝚛𝚊𝚌𝚎(𝙲𝙺𝙲)/(n-1),\]
assuming that 𝐄[gᵢ]=0. Since 𝚝𝚛𝚊𝚌𝚎(𝙲𝙺𝙲) = 𝚝𝚛𝚊𝚌𝚎(𝙺) - ∑mᵢ, dividing 𝙺 by 𝐄[v] gives the Gower rescaling defined above and achieves an unbiased normalisation of the random variable gᵢ.
- Parameters
K (array_like) – Covariance matrix to be normalised.
out (array_like, optional) – Result destination. Defaults to None.
Examples
>>> from numpy import dot, mean, zeros
>>> from numpy.random import RandomState
>>> from limix.qc import normalise_covariance
>>>
>>> random = RandomState(0)
>>> X = random.randn(10, 10)
>>> K = dot(X, X.T)
>>> Z = random.multivariate_normal(zeros(10), K, 500)
>>> print("%.3f" % mean(Z.var(1, ddof=1)))
9.824
>>> Kn = normalise_covariance(K)
>>> Zn = random.multivariate_normal(zeros(10), Kn, 500)
>>> print("%.3f" % mean(Zn.var(1, ddof=1)))
1.008
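The Gower rescaling formula can also be checked directly, without sampling. The sketch below implements the formula as written in this section (the helper name `gower_rescale` is hypothetical and the `out` argument is not covered) and then evaluates 𝚝𝚛𝚊𝚌𝚎(𝙲𝙺𝙲)/(n-1) on the rescaled matrix, which should come out as one.

```python
import numpy as np


def gower_rescale(K):
    """Rescale K by (n - 1) / (trace(K) - sum of column means) (sketch)."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    col_means = K.mean(axis=0)
    return K * (n - 1) / (np.trace(K) - col_means.sum())


X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
K = X @ X.T
Kn = gower_rescale(K)

# Centering matrix C = I - (1/n) 1 1^T, used to evaluate E[v] for Kn.
n = K.shape[0]
C = np.eye(n) - np.ones((n, n)) / n
expected_variance = np.trace(C @ Kn @ C) / (n - 1)
```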
Normalisation
- limix.qc.mean_standardize(X, axis=-1, inplace=False)
Zero-mean and one-deviation normalisation.
Normalise in such a way that the mean and variance are equal to zero and one, respectively. This transformation is taken over the flattened array by default, otherwise over the specified axis. Missing values represented by NaN are ignored.
- Parameters
- Returns
X – Normalized array.
- Return type
ndarray
Example
>>> import limix
>>> from numpy import arange
>>>
>>> X = arange(15).reshape((5, 3)).astype(float)
>>> print(X)
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]
 [ 9. 10. 11.]
 [12. 13. 14.]]
>>> X = arange(6).reshape((2, 3)).astype(float)
>>> X = limix.qc.mean_standardize(X, axis=0)
>>> print(X)
[[-1.22474487  0.          1.22474487]
 [-1.22474487  0.          1.22474487]]
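A NaN-aware standardisation can be sketched with `numpy.nanmean` and `numpy.nanstd`. Note that this sketch uses NumPy's own axis semantics and the population standard deviation, which are assumptions on my part; the `axis` convention of `mean_standardize` may differ, and constant slices (zero deviation) are not handled here. The helper name `standardize` is hypothetical.

```python
import numpy as np


def standardize(X, axis=None):
    """Subtract the mean and divide by the standard deviation, skipping NaNs."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=axis, keepdims=True)
    std = np.nanstd(X, axis=axis, keepdims=True)
    return (X - mean) / std


X = np.array([[0.0, 1.0, 2.0],
              [3.0, np.nan, 5.0]])
# NumPy axis=1: each row is standardised using its own observed values.
Z = standardize(X, axis=1)
```

Missing entries stay missing in the output; they only stop contributing to the mean and deviation.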
- limix.qc.quantile_gaussianize(X, axis=1, inplace=False)
Normalize a sequence of values via rank and Normal c.d.f.
It defaults to column-wise normalization.
- Parameters
- Returns
Gaussian-normalized values.
- Return type
array_like
Examples
>>> from limix.qc import quantile_gaussianize
>>>
>>> qg = quantile_gaussianize([-1, 0, 2])
>>> print(qg)
[-0.67448975  0.          0.67448975]
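The rank-then-inverse-c.d.f. idea can be sketched for a single sequence as follows: rank the values, map rank r out of n to the probability r/(n + 1), and pass that through the standard Normal inverse c.d.f. The helper name `gaussianize_1d` is hypothetical, and the r/(n + 1) scaling and the simple handling of ties are assumptions made for this sketch rather than a statement of the library's exact choices.

```python
from statistics import NormalDist

import numpy as np


def gaussianize_1d(x):
    """Map values to Normal quantiles via their ranks (1-D sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Ordinal ranks 1..n; tied values get distinct, order-dependent ranks.
    ranks = np.argsort(np.argsort(x)) + 1
    probs = ranks / (n + 1)
    standard_normal = NormalDist()
    return np.array([standard_normal.inv_cdf(p) for p in probs])


z = gaussianize_1d([-1, 0, 2])
```

For the three-value input the probabilities are 0.25, 0.5 and 0.75, so the result is symmetric around zero.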