limix.qtl.scan

limix.qtl.scan(G, Y, lik='normal', K=None, M=None, idx=None, A=None, A0=None, A1=None, verbose=True)

Multi-trait association and interaction testing via linear mixed models.

Let n, c, and p be the number of samples, covariates, and traits, respectively. The outcome variable Y is a nΓ—p matrix distributed according to

vec(Y) ~ N((A βŠ— M) vec(𝚨), Kβ‚€) under Hβ‚€, where Kβ‚€ = Cβ‚€ βŠ— K + C₁ βŠ— I.

A and M are design matrices of dimensions pΓ—p and nΓ—c provided by the user, where M is the usual covariates matrix from single-trait models. 𝚨 is a cΓ—p matrix of fixed-effect sizes per trait. Cβ‚€ and C₁ are both symmetric matrices of dimensions pΓ—p, of which C₁ is guaranteed by our implementation to be of full rank. The parameters of the Hβ‚€ model are the matrices 𝚨, Cβ‚€, and C₁.

The additional models H₁ and Hβ‚‚ are defined as

vec(Y) ~ N((A βŠ— M) vec(𝚨) + (Aβ‚€ βŠ— Gα΅’) vec(πš©β‚€), sβ‹…Kβ‚€)

and

vec(Y) ~ N((A βŠ— M) vec(𝚨) + (Aβ‚€ βŠ— Gα΅’) vec(πš©β‚€) + (A₁ βŠ— Gα΅’) vec(πš©β‚), sβ‹…Kβ‚€)

It performs likelihood-ratio tests for the following cases, where the first hypothesis is the null one while the second hypothesis is the alternative one:

  • Hβ‚€ vs H₁: testing for vec(πš©β‚€) β‰  𝟎 while vec(πš©β‚) = 𝟎

  • Hβ‚€ vs Hβ‚‚: testing for [vec(πš©β‚€) vec(πš©β‚)] β‰  𝟎

  • H₁ vs Hβ‚‚: testing for vec(πš©β‚) β‰  𝟎

It supports generalized linear mixed models (GLMM) when a single trait is used. In this case, the following likelihoods are implemented:

  • Bernoulli

  • Probit

  • Binomial

  • Poisson

Formally, let p(πœ‡) be one of the supported probability distributions where πœ‡ is its mean. The Hβ‚€ model is defined as follows:

yα΅’ ∼ p(πœ‡α΅’=g(zα΅’)) for 𝐳 ∼ 𝓝(..., ...).

g(β‹…) is the corresponding canonical link function for the Bernoulli, Binomial, and Poisson likelihoods. The Probit likelihood, on the other hand, is a Bernoulli likelihood with probit link function.
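For illustration, the inverse-link functions g(β‹…) mentioned above can be sketched as follows (function names are ours, not limix API; the probit case uses the standard normal CDF):

```python
import numpy as np
from scipy.stats import norm

def poisson_mean(z):
    # Canonical log link: Ξ» = g(z) = e^z
    return np.exp(z)

def bernoulli_mean(z):
    # Canonical logit link: ΞΌ = g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def probit_mean(z):
    # Probit link: ΞΌ = Ξ¦(z), the standard normal CDF
    return norm.cdf(z)
```

Each maps the latent value zα΅’ to the mean of the corresponding distribution; e.g., a latent value of 0 gives a Bernoulli/probit mean of 0.5 and a Poisson rate of 1.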

Parameters
  • G (nΓ—m array_like) – Genetic candidates.

  • Y (nΓ—p array_like) – Rows are samples and columns are phenotypes.

  • lik (tuple, "normal", "bernoulli", "probit", "binomial", "poisson") – Sample likelihood describing the residual distribution. Either a tuple or a string specifying the likelihood is required. The Normal, Bernoulli, Probit, and Poisson likelihoods can be selected by providing a string. Binomial likelihood on the other hand requires a tuple because of the number of trials: ("binomial", array_like). Defaults to "normal".

  • K (nΓ—n array_like) – Sample covariance, often the so-called kinship matrix.

  • M (nΓ—c array_like) – Covariates matrix.

  • idx (list) – List of candidate indices that defines the set of candidates to be used in the tests.

  • A (pΓ—p array_like) – Symmetric trait-by-trait design matrix.

  • A0 (pΓ—pβ‚€ array_like, optional) – Matrix Aβ‚€, possibility a non-symmetric one. If None, it defines an empty matrix, pβ‚€=0. Defaults to None.

  • A1 (pΓ—p₁ array_like, optional) – Matrix A₁, possibility a non-symmetric one. If None, it defines an identity matrix, pβ‚€=p. Defaults to None.

  • verbose (bool, optional) – True to display progress and summary; False otherwise.

Returns

result – P-values, log of marginal likelihoods, effect sizes, and associated statistics.

Return type

limix.qtl._result.STScanResult, limix.qtl._result.MTScanResult

Examples

>>> from limix.qtl import scan
>>> from numpy import reshape, kron, eye
>>> from numpy import concatenate
>>> from numpy.random import RandomState
>>> import scipy.stats as st
>>> from limix.qc import normalise_covariance
>>>
>>> def vec(x):
...     return reshape(x, (-1,) + x.shape[2:], order="F")
>>>
>>> def unvec(x, shape):
...     return reshape(x, shape, order="F")
>>>
>>> random = RandomState(0)
>>> n = 30
>>> ntraits = 2
>>> ncovariates = 3
>>>
>>> A = random.randn(ntraits, ntraits)
>>> A = A @ A.T
>>> M = random.randn(n, ncovariates)
>>>
>>> C0 = random.randn(ntraits, ntraits)
>>> C0 = C0 @ C0.T
>>>
>>> C1 = random.randn(ntraits, ntraits)
>>> C1 = C1 @ C1.T
>>>
>>> G = random.randn(n, 4)
>>>
>>> A0 = random.randn(ntraits, 1)
>>> A1 = random.randn(ntraits, 2)
>>> A01 = concatenate((A0, A1), axis=1)
>>>
>>> K = random.randn(n, n + 1)
>>> K = normalise_covariance(K @ K.T)
>>>
>>> beta = vec(random.randn(ntraits, ncovariates))
>>> alpha = vec(random.randn(A01.shape[1], G.shape[1]))
>>>
>>> mvn = st.multivariate_normal
>>> m = kron(A, M) @ beta + kron(A01, G) @ alpha
>>> Y = unvec(mvn(m, kron(C0, K) + kron(C1, eye(n))).rvs(), (n, -1))
>>>
>>> idx = [[0, 1], 2, [3]]
>>> r = scan(G, Y, idx=idx, K=K, M=M, A=A, A0=A0, A1=A1, verbose=False)
>>> from numpy import dot, exp, sqrt, ones
>>> from numpy.random import RandomState
>>> from pandas import DataFrame
>>> import pandas as pd
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>> pd.options.display.float_format = "{:9.6f}".format
>>>
>>> n = 30
>>> p = 3
>>> samples_index = range(n)
>>>
>>> M = DataFrame(dict(offset=ones(n), age=random.randint(10, 60, n)))
>>> M.index = samples_index
>>>
>>> X = random.randn(n, 100)
>>> K = dot(X, X.T)
>>>
>>> candidates = random.randn(n, p)
>>> candidates = DataFrame(candidates, index=samples_index,
...                                    columns=['rs0', 'rs1', 'rs2'])
>>>
>>> y = random.poisson(exp(random.randn(n)))
>>>
>>> result = scan(candidates, y, 'poisson', K, M=M, verbose=False)
>>>
>>> result.stats  
           lml0       lml2  dof20    scale2      pv20
test
0    -48.720890 -48.536860      1  0.943532  0.544063
1    -48.720890 -47.908341      1  0.904814  0.202382
2    -48.720890 -48.534754      1  0.943400  0.541768
>>> print(result)  
Hypothesis 0
------------

𝐳 ~ 𝓝(π™ΌπœΆ, 0.000⋅𝙺 + 0.788⋅𝙸) for yα΅’ ~ Poisson(Ξ»α΅’=g(zα΅’)) and g(x)=eΛ£

M     = ['offset' 'age']
𝜢     = [ 0.39528889 -0.00556797]
se(𝜢) = [0.50173695 0.01505240]
lml   = -48.720890273519444

Hypothesis 2
------------

𝐳 ~ 𝓝(π™ΌπœΆ + G𝛃, s(0.000⋅𝙺 + 0.788⋅𝙸)) for yα΅’ ~ Poisson(Ξ»α΅’=g(zα΅’)) and g(x)=eΛ£

          lml       cov. effsizes   cand. effsizes
--------------------------------------------------
mean   -4.833e+01       2.393e-01       -1.966e-01
std     3.623e-01       2.713e-01        1.028e-01
min    -4.854e+01      -8.490e-03       -3.151e-01
25%    -4.854e+01      -7.684e-03       -2.295e-01
50%    -4.853e+01       2.243e-01       -1.439e-01
75%    -4.822e+01       4.725e-01       -1.374e-01
max    -4.791e+01       5.255e-01       -1.309e-01

Likelihood-ratio test p-values
------------------------------

       𝓗₀ vs 𝓗₂
----------------
mean   4.294e-01
std    1.966e-01
min    2.024e-01
25%    3.721e-01
50%    5.418e-01
75%    5.429e-01
max    5.441e-01
>>> from numpy import zeros
>>>
>>> nsamples = 50
>>>
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 100)
>>> K = dot(G, G.T)
>>> ntrials = random.randint(1, 100, nsamples)
>>> z = dot(G, random.randn(100)) / sqrt(100)
>>>
>>> successes = zeros(len(ntrials), int)
>>> for i, nt in enumerate(ntrials):
...     for _ in range(nt):
...         successes[i] += int(z[i] + 0.5 * random.randn() > 0)
>>>
>>> result = scan(X, successes, ("binomial", ntrials), K, verbose=False)
>>> print(result)  
Hypothesis 0
------------

𝐳 ~ 𝓝(π™ΌπœΆ, 0.152⋅𝙺 + 1.738⋅𝙸) for yα΅’ ~ Binom(ΞΌα΅’=g(zα΅’), nα΅’) and g(x)=1/(1+e⁻ˣ)

M     = ['offset']
𝜢     = [0.40956942]
se(𝜢) = [0.55141166]
lml   = -142.80784719977515

Hypothesis 2
------------

𝐳 ~ 𝓝(π™ΌπœΆ + G𝛃, s(0.152⋅𝙺 + 1.738⋅𝙸)) for yα΅’ ~ Binom(ΞΌα΅’=g(zα΅’), nα΅’) and g(x)=1/(1+e⁻ˣ)

          lml       cov. effsizes   cand. effsizes
--------------------------------------------------
mean   -1.425e+02       3.701e-01        2.271e-01
std     4.110e-01       2.296e-02        5.680e-01
min    -1.427e+02       3.539e-01       -1.745e-01
25%    -1.426e+02       3.620e-01        2.631e-02
50%    -1.425e+02       3.701e-01        2.271e-01
75%    -1.423e+02       3.782e-01        4.279e-01
max    -1.422e+02       3.864e-01        6.287e-01

Likelihood-ratio test p-values
------------------------------

       𝓗₀ vs 𝓗₂
----------------
mean   4.959e-01
std    3.362e-01
min    2.582e-01
25%    3.771e-01
50%    4.959e-01
75%    6.148e-01
max    7.336e-01

Notes

It will raise a ValueError exception if non-finite values are passed. Please refer to the limix.qc.mean_impute() function for missing-value imputation.
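As a rough illustration of what such imputation does (limix.qc.mean_impute() is the supported entry point; the helper below is only a sketch), column-wise mean imputation replaces each missing genotype with the mean of its column:

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with its column mean (illustrative sketch;
    use limix.qc.mean_impute for the supported implementation)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)       # per-column mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))      # positions of missing entries
    X[rows, cols] = col_means[cols]         # fill with the column mean
    return X

G = np.array([[1.0, np.nan],
              [3.0, 4.0]])
mean_impute(G)  # β†’ [[1., 4.], [3., 4.]]
```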