API reference

scanpro(data, clusters_col, conds_col, samples_col=None, covariates=None, conditions=None, transform='logit', robust=True, n_sims=100, n_reps='auto', run_partial_sim=True, pairwise=False, verbosity=1, seed=1)

Wrapper function for scanpro. The statistical test requires replicated data. If the data have no replicates, the function sim_scanpro generates artificial replicates by bootstrapping and runs propeller multiple times; the resulting estimates are then pooled to obtain a robust estimate of the p-values.
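To make "replicated" concrete, the sketch below classifies a design from a sample-to-condition mapping. It assumes that a condition counts as replicated when it has more than one sample; the helper `replication_status` is hypothetical and not part of scanpro's API. Passing samples_col=None corresponds to mapping each condition to itself, which is the unreplicated case.

```python
from collections import defaultdict


def replication_status(sample_to_condition):
    """Classify a design as 'replicated', 'partially replicated' or 'not replicated'.

    Hypothetical helper (not part of scanpro). Assumes a condition is
    replicated when more than one sample belongs to it.
    """
    samples_per_condition = defaultdict(set)
    for sample, condition in sample_to_condition.items():
        samples_per_condition[condition].add(sample)
    # Count the conditions that have at least two samples.
    n_replicated = sum(1 for s in samples_per_condition.values() if len(s) > 1)
    if n_replicated == len(samples_per_condition):
        return "replicated"
    if n_replicated == 0:
        return "not replicated"
    return "partially replicated"


print(replication_status({"s1": "ctrl", "s2": "ctrl", "s3": "treat", "s4": "treat"}))
# samples_col=None: each condition acts as its own single sample.
print(replication_status({"ctrl": "ctrl", "treat": "treat"}))
```

The partially replicated case is the one affected by run_partial_sim.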

Parameters:

data (anndata.AnnData or pandas.DataFrame) – Single-cell data with columns containing sample, condition and cluster/celltype information.
clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.
conds_col (str) – Name of the column in data or data.obs where condition information is stored.
samples_col (str) – Name of the column in data or data.obs where sample information is stored. If None, the dataset is assumed to be unreplicated and conds_col is used as samples_col; defaults to None.
covariates (list) – List of covariates to include in the model, defaults to None.
transform (str) – Transformation applied to the proportions (‘logit’ or ‘arcsin’), defaults to ‘logit’.
conditions (list) – List of conditions of interest to compare, defaults to None.
robust (bool) – Use robust empirical Bayes estimation to mitigate the effect of outliers, defaults to True.
n_sims (int) – Number of simulations to perform if the data do not have replicates, defaults to 100.
n_reps (int or str) – Number of pseudo-replicates to simulate per sample if the data do not have replicates. ‘auto’ chooses the number from each sample's cell count (3 for fewer than 5000 cells, 5 for fewer than 14000 cells and 8 for more than 14000 cells); defaults to ‘auto’.
run_partial_sim (bool) – If True, the bootstrapping method is also applied to partially replicated datasets (where only some samples have replicates), defaults to True.
pairwise (bool) – If True, all pairwise comparisons between conditions will be performed, defaults to False.
verbosity (int) – Verbosity level for logging progress. 0=silent, 1=info, 2=debug. Defaults to 1.
seed (int) – Seed for random number generator, defaults to 1.
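The n_reps=‘auto’ rule described above can be sketched as a small lookup. The function name `auto_n_reps` is hypothetical, and the documentation does not say what happens at exactly 14000 cells; this sketch assumes 8.

```python
def auto_n_reps(n_cells):
    """Hypothetical helper mirroring the documented n_reps='auto' thresholds.

    3 for fewer than 5000 cells, 5 for fewer than 14000 cells, otherwise 8
    (treating exactly 14000 cells as 8 is an assumption).
    """
    if n_cells < 5000:
        return 3
    if n_cells < 14000:
        return 5
    return 8


for n in (1200, 9000, 20000):
    print(n, auto_n_reps(n))
```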

Raises:

ValueError – Data must have at least two conditions!

Return ScanproResult:

A ScanproResult object containing estimated mean proportions for each cluster and p-values.
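With pairwise=True, every pair of conditions is compared, so k conditions give k(k-1)/2 comparisons. The sketch below enumerates them with itertools.combinations; the condition names are hypothetical, and the order in which scanpro reports the pairs is not implied.

```python
from itertools import combinations

conditions = ["ctrl", "drugA", "drugB"]  # hypothetical condition labels
# Each unordered pair of conditions appears exactly once.
pairs = list(combinations(conditions, 2))
print(pairs)  # [('ctrl', 'drugA'), ('ctrl', 'drugB'), ('drugA', 'drugB')]
print(len(pairs))  # 3, i.e. k * (k - 1) / 2 with k = 3
```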

run_scanpro(data, clusters_col, conds_col, samples_col=None, covariates=None, conditions=None, transform='logit', robust=True, n_sims=100, n_reps='auto', run_partial_sim=True, verbosity=1, seed=1)

Wrapper function for scanpro. The statistical test requires replicated data. If the data have no replicates, the function sim_scanpro generates artificial replicates by bootstrapping and runs propeller multiple times; the resulting estimates are then pooled to obtain a robust estimate of the p-values.

Parameters:

data (anndata.AnnData or pandas.DataFrame) – Single-cell data with columns containing sample, condition and cluster/celltype information.
clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.
conds_col (str) – Name of the column in data or data.obs where condition information is stored.
samples_col (str) – Name of the column in data or data.obs where sample information is stored. If None, the dataset is assumed to be unreplicated and conds_col is used as samples_col; defaults to None.
covariates (list) – List of covariates to include in the model, defaults to None.
transform (str) – Transformation applied to the proportions (‘logit’ or ‘arcsin’), defaults to ‘logit’.
conditions (list) – List of conditions of interest to compare, defaults to None.
robust (bool) – Use robust empirical Bayes estimation to mitigate the effect of outliers, defaults to True.
n_sims (int) – Number of simulations to perform if the data do not have replicates, defaults to 100.
n_reps (int or str) – Number of pseudo-replicates to simulate per sample if the data do not have replicates. ‘auto’ chooses the number from each sample's cell count (3 for fewer than 5000 cells, 5 for fewer than 14000 cells and 8 for more than 14000 cells); defaults to ‘auto’.
run_partial_sim (bool) – If True, the bootstrapping method is also applied to partially replicated datasets (where only some samples have replicates), defaults to True.
verbosity (int) – Verbosity level for logging progress. 0=silent, 1=info, 2=debug. Defaults to 1.
seed (int) – Seed for random number generator, defaults to 1.

Raises:

ValueError – Data must have at least two conditions!

Return ScanproResult:

A ScanproResult object containing estimated mean proportions for each cluster and p-values.

run_stats(data, clusters, samples, conds, transform='logit', covariates=None, conditions=None, robust=True, verbosity=1)

Test the significance of changes in cell proportions across conditions in single-cell data. The function uses empirical Bayes moderation of the test statistics to give a robust estimate of significance.
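The moderation step can be summarised with the standard limma formulas, on the assumption that scanpro applies the same empirical Bayes scheme as propeller, which is built on limma. For each cluster g, the residual variance s_g^2 of the transformed proportions (on d_g residual degrees of freedom) is shrunk towards a prior variance s_0^2 with d_0 prior degrees of freedom, both estimated across all clusters:

```latex
\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

Here \hat{\beta}_g is the estimated contrast for cluster g (for example the difference in mean transformed proportions between two conditions), v_g is its unscaled variance from the design matrix, and \tilde{t}_g is compared against a t distribution with d_0 + d_g degrees of freedom. Clusters measured in few samples thus borrow strength from the others; robust=True protects the estimates of s_0^2 and d_0 from outlying clusters.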

Parameters:

data (anndata.AnnData or pandas.DataFrame) – Single-cell data with columns containing sample, condition and cluster/celltype information.
clusters (str) – Column in data or data.obs where cluster/celltype information is stored.
samples (str) – Column in data or data.obs where sample information is stored.
conds (str) – Column in data or data.obs where condition information is stored.
transform (str) – Transformation applied to the proportions (‘logit’ or ‘arcsin’), defaults to ‘logit’.
covariates (list) – List of covariates to include in the model, defaults to None.
conditions (list) – List of conditions of interest to compare, defaults to None.
robust (bool) – Use robust empirical Bayes estimation to mitigate the effect of outliers, defaults to True.
verbosity (int) – Verbosity level for logging progress, defaults to 1.

Return ScanproResult:

A ScanproResult object containing estimated mean proportions for each cluster and p-values.
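The first step of such a test is turning cell counts into per-sample cluster proportions and transforming them. The sketch below does this with the standard library only; the function names are hypothetical, and adding a pseudocount of 0.5 to every count before the logit (so that empty clusters stay finite) is an assumption modelled on propeller, not confirmed scanpro behaviour.

```python
import math


def cluster_proportions(counts):
    """Per-sample cluster proportions from a {sample: {cluster: count}} table."""
    props = {}
    for sample, cluster_counts in counts.items():
        total = sum(cluster_counts.values())
        props[sample] = {c: n / total for c, n in cluster_counts.items()}
    return props


def logit_transform(cluster_counts, pseudocount=0.5):
    """Logit of proportions after adding an (assumed) pseudocount to every count."""
    shifted = {c: n + pseudocount for c, n in cluster_counts.items()}
    total = sum(shifted.values())
    return {c: math.log((n / total) / (1 - n / total)) for c, n in shifted.items()}


def arcsin_transform(cluster_props):
    """Arcsine square-root transform of proportions."""
    return {c: math.asin(math.sqrt(p)) for c, p in cluster_props.items()}


# Hypothetical counts for two samples; NK cells are absent from s1.
counts = {"s1": {"T": 30, "B": 70, "NK": 0}, "s2": {"T": 50, "B": 40, "NK": 10}}
props = cluster_proportions(counts)
print(props["s1"])  # {'T': 0.3, 'B': 0.7, 'NK': 0.0}
print(logit_transform(counts["s1"]))
print(arcsin_transform(props["s1"]))
```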

anova(props, prop_trans, design, coef, robust=True, verbosity=1)

Test the significance of changes in cell proportions across three or more conditions using empirical Bayes and a moderated ANOVA.

Parameters:

props (pandas.DataFrame) – Untransformed (observed) cell proportions.
prop_trans (pandas.DataFrame) – Transformed cell proportions (e.g. logit or arcsin).
design (pandas.DataFrame) – Design matrix where rows are samples and columns are the coefficients of the conditions of interest to be estimated.
coef (numpy.ndarray) – Array specifying the columns of interest in the design matrix.
robust (bool) – Use robust empirical Bayes estimation of the posterior variances, defaults to True.
verbosity (int) – Verbosity level for logging progress, defaults to 1.

Return pandas.DataFrame:

Dataframe containing estimated mean proportions for each condition, F-statistics, p-values and adjusted p-values.
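A design matrix of the kind described above can be sketched as a cell-means design: one row per sample and one 0/1 indicator column per condition, with no intercept. The helper `cell_means_design` is hypothetical; scanpro builds its own design internally, and the assumption that coef lists every condition column for an ANOVA-style test follows propeller's convention. A plain list stands in for the numpy.ndarray that anova expects.

```python
def cell_means_design(sample_conditions):
    """Build a cell-means design: one row per sample, one 0/1 column per condition.

    Hypothetical helper for illustration only.
    """
    levels = sorted(set(sample_conditions.values()))
    rows = {
        sample: [1 if cond == level else 0 for level in levels]
        for sample, cond in sample_conditions.items()
    }
    return levels, rows


samples = {"s1": "ctrl", "s2": "ctrl", "s3": "drugA", "s4": "drugA", "s5": "drugB", "s6": "drugB"}
levels, design = cell_means_design(samples)
print(levels)  # ['ctrl', 'drugA', 'drugB']
print(design["s3"])  # [0, 1, 0]

# Assumed convention: for a test across all conditions, coef selects every condition column.
coef = list(range(len(levels)))
print(coef)  # [0, 1, 2]
```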

t_test(props, prop_trans, design, contrasts, robust=True, verbosity=1)

Test the significance of changes in cell proportions between two conditions using empirical Bayes and a moderated t-test.

Parameters:

props (pandas.DataFrame) – Untransformed (observed) cell proportions.
prop_trans (pandas.DataFrame) – Transformed cell proportions (e.g. logit or arcsin).
design (pandas.DataFrame) – Design matrix where rows are samples and columns are the coefficients of the conditions of interest to be estimated.
contrasts (list) – Contrast vector specifying the two conditions in the design matrix to compare, e.g. [1, -1].
robust (bool) – Use robust empirical Bayes estimation of the posterior variances, defaults to True.
verbosity (int) – Verbosity level for logging progress, defaults to 1.

Return pandas.DataFrame:

Dataframe containing estimated mean proportions for each condition, t-statistics, p-values and adjusted p-values.
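The effect of a contrast such as [1, -1] can be seen on a single cluster. With one indicator column per condition and no intercept, the least-squares coefficients are simply the per-condition means of the transformed proportions, and the contrast takes their difference. The values below are hypothetical, and this only computes the estimate, not the moderated t-statistic.

```python
from statistics import mean

# Hypothetical transformed proportions of one cluster, per sample.
trans_props = {"s1": -0.85, "s2": -0.80, "s3": -0.20, "s4": -0.30}
conditions = {"s1": "ctrl", "s2": "ctrl", "s3": "treat", "s4": "treat"}
levels = ["ctrl", "treat"]

# Cell-means design: the least-squares coefficients are the per-condition means.
coefficients = [
    mean(v for s, v in trans_props.items() if conditions[s] == level) for level in levels
]

contrast = [1, -1]  # ctrl minus treat
estimate = sum(c * b for c, b in zip(contrast, coefficients))
print(coefficients)
print(estimate)
```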

sim_scanpro(data, clusters_col, conds_col, covariates=None, transform='arcsin', n_reps=8, n_sims=100, conditions=None, robust=True, verbosity=1)

Run scanpro multiple times on the same dataset using bootstrapped pseudo-replicates and pool the estimates.

Parameters:

data (anndata.AnnData or pandas.DataFrame) – Single-cell data with columns containing sample, condition and cluster/celltype information.
clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.
conds_col (str) – Name of the column in data or data.obs where condition information is stored.
covariates (list) – List of covariates to include in the model, defaults to None.
transform (str) – Transformation applied to the proportions (‘logit’ or ‘arcsin’), defaults to ‘arcsin’.
n_reps (int) – Number of pseudo-replicates to simulate per sample, defaults to 8.
n_sims (int) – Number of simulations to perform, defaults to 100.
conditions (list) – List of conditions of interest to compare, defaults to None.
robust (bool) – Use robust empirical Bayes estimation to mitigate the effect of outliers, defaults to True.
verbosity (int) – Verbosity level for logging progress, defaults to 1.

Return ScanproResult:

A ScanproResult object containing estimated mean proportions for each cluster and median p-values from all simulations.
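The two ingredients of this procedure, bootstrapped pseudo-replicates and median pooling, can be sketched with the standard library. The helper `bootstrap_pseudo_replicates` is hypothetical, and drawing len(cells) // n_reps cells with replacement per pseudo-replicate is an assumed scheme; scanpro's exact resampling may differ. The p-values are made up for illustration.

```python
import random
from statistics import median


def bootstrap_pseudo_replicates(cell_labels, n_reps, rng):
    """Split one sample's cells into n_reps pseudo-replicates by resampling.

    Hypothetical sketch: each pseudo-replicate draws len(cell_labels) // n_reps
    cells with replacement.
    """
    size = max(1, len(cell_labels) // n_reps)
    return [rng.choices(cell_labels, k=size) for _ in range(n_reps)]


rng = random.Random(1)  # fixed seed for reproducibility, analogous to scanpro's seed
cells = ["T"] * 300 + ["B"] * 600 + ["NK"] * 100  # cluster label of each cell
reps = bootstrap_pseudo_replicates(cells, n_reps=3, rng=rng)
print([len(r) for r in reps])  # [333, 333, 333]

# After n_sims runs, the p-values for one cluster are pooled by their median.
p_values_per_sim = [0.01, 0.04, 0.03, 0.20, 0.02]  # hypothetical values
print(median(p_values_per_sim))  # 0.03
```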