API reference

scanpro(data, clusters_col, conds_col, samples_col=None, covariates=None, conditions=None, transform='logit', robust=True, n_sims=100, n_reps='auto', run_partial_sim=True, pairwise=False, verbosity=1, seed=1)

Wrapper function for scanpro. Scanpro's statistical test requires replicated data. If the data has no replicates, the function sim_scanpro generates artificial replicates by bootstrapping and runs propeller multiple times. The results are then pooled to give robust estimates of the p-values.

Parameters:
  • data (anndata.AnnData or pandas.DataFrame) – Single cell data with columns containing sample, condition and cluster/celltype information.

  • clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.

  • conds_col (str) – Column in data or data.obs where condition information is stored.

  • samples_col (str) – Column in data or data.obs where sample information is stored. If None, the dataset is assumed to be unreplicated and conds_col is used as samples_col. Defaults to None.

  • covariates (list) – List of covariates to include in the model, defaults to None.

  • transform (str) – Method of transformation of proportions, defaults to ‘logit’.

  • conditions (list) – List of conditions of interest to compare, defaults to None.

  • robust (bool) – Robust ebayes estimation to mitigate the effect of outliers, defaults to True.

  • n_sims (int) – Number of simulations to perform if data does not have replicates, defaults to 100.

  • n_reps (int or str) – Number of replicates to simulate if data does not have replicates. ‘auto’ generates pseudo-replicates for each sample based on its cell count (3 for fewer than 5000 cells, 5 for fewer than 14000 cells and 8 for more than 14000 cells). Defaults to ‘auto’.

  • run_partial_sim (bool) – If True, the bootstrapping method is also performed on datasets that are partially replicated (where only some samples have replicates), defaults to True.

  • pairwise (bool) – If True, all pairwise comparisons between conditions will be performed, defaults to False.

  • verbosity (int) – Verbosity level for logging progress. 0=silent, 1=info, 2=debug. Defaults to 1.

  • seed (int) – Seed for random number generator, defaults to 1.

Raises:

ValueError – Data must have at least two conditions!

Return ScanproResult:

A scanpro object containing estimated mean proportions for each cluster and p-values.
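The ‘auto’ rule described for n_reps can be written out as a small standalone function. This is only a sketch of the thresholds as stated in the parameter description, not the library's own code. The function name auto_n_reps and the sample names are hypothetical, and the description does not say how exactly 14000 cells is handled, so placing it in the largest tier is an assumption.

```python
def auto_n_reps(n_cells: int) -> int:
    """Pick a pseudo-replicate count from a sample's cell count.

    Thresholds follow the n_reps description. Treating exactly 14000
    cells as the largest tier is an assumption of this sketch.
    """
    if n_cells < 5000:
        return 3
    if n_cells < 14000:
        return 5
    return 8


# Hypothetical cell counts per sample
cell_counts = {"sample_a": 1200, "sample_b": 9000, "sample_c": 20000}
reps = {name: auto_n_reps(n) for name, n in cell_counts.items()}
print(reps)
```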

run_scanpro(data, clusters_col, conds_col, samples_col=None, covariates=None, conditions=None, transform='logit', robust=True, n_sims=100, n_reps='auto', run_partial_sim=True, verbosity=1, seed=1)

Wrapper function for scanpro. Scanpro's statistical test requires replicated data. If the data has no replicates, the function sim_scanpro generates artificial replicates by bootstrapping and runs propeller multiple times. The results are then pooled to give robust estimates of the p-values.

Parameters:
  • data (anndata.AnnData or pandas.DataFrame) – Single cell data with columns containing sample, condition and cluster/celltype information.

  • clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.

  • conds_col (str) – Column in data or data.obs where condition information is stored.

  • samples_col (str) – Column in data or data.obs where sample information is stored. If None, the dataset is assumed to be unreplicated and conds_col is used as samples_col. Defaults to None.

  • covariates (list) – List of covariates to include in the model, defaults to None.

  • transform (str) – Method of transformation of proportions, defaults to ‘logit’.

  • conditions (list) – List of conditions of interest to compare, defaults to None.

  • robust (bool) – Robust ebayes estimation to mitigate the effect of outliers, defaults to True.

  • n_sims (int) – Number of simulations to perform if data does not have replicates, defaults to 100.

  • n_reps (int or str) – Number of replicates to simulate if data does not have replicates. ‘auto’ generates pseudo-replicates for each sample based on its cell count (3 for fewer than 5000 cells, 5 for fewer than 14000 cells and 8 for more than 14000 cells). Defaults to ‘auto’.

  • run_partial_sim (bool) – If True, the bootstrapping method is also performed on datasets that are partially replicated (where only some samples have replicates), defaults to True.

  • verbosity (int) – Verbosity level for logging progress. 0=silent, 1=info, 2=debug. Defaults to 1.

  • seed (int) – Seed for random number generator, defaults to 1.

Raises:

ValueError – Data must have at least two conditions!

Return ScanproResult:

A scanpro object containing estimated mean proportions for each cluster and p-values.
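The input layout described for data (one row per cell, with sample, condition and cluster labels) reduces to per-sample cluster proportions, which are the quantities being compared. The following sketch uses plain Python tuples rather than an AnnData object or DataFrame; all sample, condition and cluster names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell records: (sample, condition, cluster)
cells = [
    ("s1", "ctrl", "T"), ("s1", "ctrl", "T"),
    ("s1", "ctrl", "B"), ("s1", "ctrl", "B"),
    ("s2", "treat", "T"), ("s2", "treat", "B"),
    ("s2", "treat", "B"), ("s2", "treat", "B"),
]

# Count cells per cluster within each sample
counts = defaultdict(Counter)
for sample, _condition, cluster in cells:
    counts[sample][cluster] += 1

# Convert counts to proportions within each sample
proportions = {
    sample: {cluster: n / sum(c.values()) for cluster, n in c.items()}
    for sample, c in counts.items()
}
print(proportions)
```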

run_stats(data, clusters, samples, conds, transform='logit', covariates=None, conditions=None, robust=True, verbosity=1)

Test the significance of changes in cell proportions across conditions in single-cell data. The function uses empirical bayes to moderate statistical tests to give robust estimation of significance.

Parameters:
  • data (anndata.AnnData or pandas.DataFrame) – Single-cell data as an AnnData object or a dataframe.

  • clusters (str) – Column in data or data.obs where cluster or celltype information is stored.

  • samples (str) – Column in data or data.obs where sample information is stored.

  • conds (str) – Column in data or data.obs where condition information is stored.

  • transform (str) – Method of normalization of proportions (logit or arcsin), defaults to ‘logit’.

  • conditions (list) – List of conditions of interest to compare, defaults to None.

  • robust (bool) – Robust ebayes estimation to mitigate the effect of outliers, defaults to True.

Return ScanproResult:

A scanpro object containing estimated mean proportions for each cluster and p-values.
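The two transform options can be illustrated on a few proportions. This sketch assumes that ‘arcsin’ means the arcsine square-root transform and that ‘logit’ is the plain log-odds. The library may treat edge cases differently (for example, proportions of exactly 0 or 1, where the plain logit is undefined), so read this as an illustration of the idea only.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion; undefined for p == 0 or p == 1."""
    return math.log(p / (1.0 - p))


def arcsin_sqrt(p: float) -> float:
    """Arcsine square-root transform (assumed meaning of 'arcsin')."""
    return math.asin(math.sqrt(p))


# Hypothetical cluster proportions
props = [0.05, 0.25, 0.5, 0.75]
for p in props:
    print(f"p={p:.2f}  logit={logit(p):+.3f}  arcsin={arcsin_sqrt(p):.3f}")
```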

anova(props, prop_trans, design, coef, robust=True, verbosity=1)

Test the significance of changes in cell proportion across 3 or more conditions using empirical bayes and moderated ANOVA.

Parameters:
  • props (pandas.DataFrame) – True cell proportions.

  • prop_trans (pandas.DataFrame) – Normalized cell proportions.

  • design (pandas.DataFrame) – Design matrix where rows are samples and columns are coefficients of conditions of interest to be estimated.

  • coef (numpy.ndarray) – Array specifying columns of interest in the design matrix.

  • robust (bool) – Robust empirical bayes estimation of posterior variances.

Return pandas.DataFrame:

Dataframe containing estimated mean proportions for each condition, F-statistics, p-values and adjusted p-values.
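The shape of the design and coef arguments can be sketched with plain lists. The function itself expects a pandas.DataFrame and a numpy.ndarray; nested lists are used here only to keep the example dependency-free. Sample and condition names are hypothetical, and the one-column-per-condition coding is an assumption about how such a design is typically built.

```python
# Hypothetical mapping of samples to three conditions
sample_condition = {
    "s1": "a", "s2": "a",
    "s3": "b", "s4": "b",
    "s5": "c", "s6": "c",
}

samples = list(sample_condition)
conditions = sorted(set(sample_condition.values()))

# One row per sample, one indicator column per condition
design = [
    [1 if sample_condition[s] == c else 0 for c in conditions]
    for s in samples
]

# Indices of the design columns to test (here: all three conditions)
coef = list(range(len(conditions)))

for s, row in zip(samples, design):
    print(s, row)
print("coef:", coef)
```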

t_test(props, prop_trans, design, contrasts, robust=True, verbosity=1)

Test the significance of changes in cell proportion across 2 conditions using empirical bayes and moderated t-test.

Parameters:
  • props (pandas.DataFrame) – True cell proportions.

  • prop_trans (pandas.DataFrame) – Normalized cell proportions.

  • design (pandas.DataFrame) – Design matrix where rows are samples and columns are coefficients of conditions of interest to be estimated.

  • contrasts (list) – A list specifying the 2 conditions in the design matrix to be tested, e.g. [1, -1].

  • robust (bool) – Robust empirical bayes estimation of posterior variances.

Return pandas.DataFrame:

Dataframe containing estimated mean proportions for each condition, t-statistics, p-values and adjusted p-values.
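A contrast such as [1, -1] weights the design-matrix columns so that the tested quantity is the difference between two conditions. The sketch below applies a contrast to made-up coefficient estimates for a single cluster. The column names and values are hypothetical, and the moderated test statistic itself is not computed here.

```python
# Hypothetical design-matrix columns and fitted coefficients
# (mean transformed proportion per condition) for a single cluster
columns = ["ctrl", "treat"]
coefficients = {"ctrl": 0.75, "treat": 0.25}

# Contrast weights, one per design column: ctrl minus treat
contrasts = [1, -1]

# The tested effect is the weighted sum of the coefficients
effect = sum(w * coefficients[c] for w, c in zip(contrasts, columns))
print(effect)
```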

sim_scanpro(data, clusters_col, conds_col, covariates=None, transform='arcsin', n_reps=8, n_sims=100, conditions=None, robust=True, verbosity=1)

Run scanpro multiple times on the same dataset and pool the estimates together.

Parameters:
  • data (anndata.AnnData or pandas.DataFrame) – Single cell data with columns containing sample, condition and cluster/celltype information.

  • clusters_col (str) – Name of the column in data or data.obs where cluster/celltype information is stored.

  • conds_col (str) – Column in data or data.obs where condition information is stored.

  • transform (str) – Method of transformation of proportions, defaults to ‘arcsin’.

  • n_reps (int) – Number of replicates to simulate if data does not have replicates, defaults to 8.

  • n_sims (int) – Number of simulations to perform if data does not have replicates, defaults to 100.

  • conditions (list) – List of conditions of interest to compare, defaults to None.

  • robust (bool) – Robust ebayes estimation to mitigate the effect of outliers, defaults to True.

  • verbosity (int) – Verbosity level, defaults to 1.

Return ScanproResult:

A ScanproResult object containing estimated mean proportions for each cluster and median p-values from all simulations.
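The simulate-and-pool idea can be sketched in two parts: resampling one unreplicated sample into pseudo-replicates, then taking the median of p-values across simulation runs. This is a rough illustration under stated assumptions, not the library's procedure. Sampling with replacement at the original sample size is assumed, simulate_replicates is a hypothetical helper, and the p-values are invented rather than computed.

```python
import random
import statistics
from collections import Counter


def simulate_replicates(clusters, n_reps, rng):
    """Resample cell labels with replacement into n_reps pseudo-replicates.

    Each pseudo-replicate keeps the original number of cells, which is
    an assumption of this sketch.
    """
    return [
        Counter(rng.choices(clusters, k=len(clusters)))
        for _ in range(n_reps)
    ]


rng = random.Random(1)

# Hypothetical unreplicated sample: 60 T cells and 40 B cells
cell_clusters = ["T"] * 60 + ["B"] * 40
replicates = simulate_replicates(cell_clusters, n_reps=8, rng=rng)
for rep in replicates:
    print(dict(rep))

# Pooling: made-up p-values for one cluster from five simulation runs
p_values = [0.04, 0.01, 0.20, 0.03, 0.06]
pooled = statistics.median(p_values)
print("pooled p-value:", pooled)
```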