mofax documentation

MOFA model

class mofax.mofa_model(filepath, mode='r')

Wrapper class around an HDF5-based model on disk.

This class is a thin wrapper for the HDF5 file where the trained MOFA+ model is stored. It also provides utility functions to get factors, weights, features, and samples (cells) info in the form of Pandas dataframes, and data as a NumPy array.

calculate_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame

Calculate the variance explained estimates for each factor in each view and/or group. Predefined groups are also supported (see groups_label).

Parameters:
  • factors (optional) – List of factors to consider (default is None, all factors)

  • groups (optional) – List of groups to consider (default is None, all groups)

  • views (optional) – List of views to consider (default is None, all views)

  • groups_label (optional) – Group label to split samples by (default is None)

  • per_factor (optional) – Whether to calculate R2 per factor or for all factors together (default)
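
To make the per_factor distinction concrete, here is a minimal NumPy sketch of one common way to define variance explained for a factor model. It assumes centered data Y approximated by Z @ W.T; the function name variance_explained and the toy matrices are illustrative and are not taken from mofax, whose internal computation may differ.

```python
import numpy as np


def variance_explained(Y, Z, W, per_factor=True):
    """R2 of the reconstruction Z @ W.T against (assumed centered) data Y.

    Y: (samples, features), Z: (samples, factors), W: (features, factors).
    """
    total = np.sum(Y ** 2)
    if per_factor:
        # One R2 value per factor, using that factor's rank-1 reconstruction.
        return np.array([
            1.0 - np.sum((Y - np.outer(Z[:, k], W[:, k])) ** 2) / total
            for k in range(Z.shape[1])
        ])
    # A single R2 value for all factors taken together.
    return 1.0 - np.sum((Y - Z @ W.T) ** 2) / total


rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
W = rng.normal(size=(20, 3))
Y = Z @ W.T  # noise-free toy data, so all factors together explain everything

r2_total = variance_explained(Y, Z, W, per_factor=False)
r2_each = variance_explained(Y, Z, W, per_factor=True)
```

On this noise-free toy data the joint R2 equals 1, while each per-factor value reflects only that factor's share.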

close()

Close the connection to the HDF5 file
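
Because the model holds an open HDF5 connection, it can be convenient to guarantee that close() is called. The sketch below uses contextlib.closing from the standard library; the helper name read_factor_table and the file name in the usage comment are hypothetical, and the helper only assumes an object exposing get_factors(df=True) and close() as documented on this page.

```python
from contextlib import closing


def read_factor_table(model):
    """Return the factor table, then close the model's HDF5 connection.

    `model` is assumed to expose get_factors(df=True) and close().
    """
    with closing(model) as m:
        # closing() calls m.close() on exit, even if an exception is raised.
        return m.get_factors(df=True)


# Hypothetical usage (requires mofax and a trained model file):
#   import mofax
#   table = read_factor_table(mofax.mofa_model("trained_model.hdf5"))
```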

fetch_values(variables: str | List[str], unique: bool = True)

Fetch metadata columns, factors, feature values, or covariates. Shorthand for get_data, get_factors, metadata, and covariates calls.

Parameters:

variables (str) – Features, metadata columns, or factors (FactorN) to fetch. For MEFISTO models with covariates, covariates are accepted such as ‘covariate1’ or ‘covariate1_transformed’.

get_cells(groups=None)

Get the cell metadata table (cell ID and its respective group)

Parameters:

groups (optional) – List of groups to consider

get_data(views: str | int | None = None, features: str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, df: bool = False)

Fetch the training data

Parameters:
  • views (optional) – View to consider

  • features (optional) – Features to consider (from one view)

  • groups (optional) – Groups to consider

  • df (optional) – Whether to return the Y matrix as a DataFrame

get_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)

Get the matrix with factors as a NumPy array or as a DataFrame (df=True).

Parameters:
  • groups (optional) – List of groups to consider

  • factors (optional) – Indices of factors to consider

  • df (optional) – Whether to return the factor matrix Z as a (wide) pd.DataFrame

  • concatenate_groups (optional) – Whether to concatenate Z matrices (True by default)

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated or per factor and per group otherwise)

  • absolute_values (optional) – Whether to return absolute values of the factors
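
The interaction between scale and concatenate_groups can be illustrated with plain NumPy. This is a sketch of the described behavior, not mofax's code: the function name scale_factors and the toy groups are made up, and the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np


def scale_factors(z_by_group, concatenate_groups=True):
    """Standardize each factor (column) to zero mean and unit variance."""
    if concatenate_groups:
        # Pool all groups first, then scale each factor once.
        Z = np.concatenate(z_by_group, axis=0)
        return (Z - Z.mean(axis=0)) / Z.std(axis=0)
    # Otherwise scale each factor separately within every group.
    return [(Z - Z.mean(axis=0)) / Z.std(axis=0) for Z in z_by_group]


rng = np.random.default_rng(1)
groups = [rng.normal(loc=0.0, size=(30, 2)), rng.normal(loc=5.0, size=(40, 2))]

pooled = scale_factors(groups, concatenate_groups=True)
separate = scale_factors(groups, concatenate_groups=False)
```

Scaling after pooling keeps the offset between groups visible in the result, whereas per-group scaling removes it.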

get_features(views=None)

Get the features metadata table (feature name and its respective view)

Parameters:

views (optional) – List of views to consider

get_groups()

Get the group names

get_interpolated_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, df_long: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)

Get the matrix with interpolated factors.

If df_long is False, a dictionary with keys (“mean”, “variance”) is returned with NumPy arrays (df=False) or DataFrames (df=True) as values.

If df_long is True, a DataFrame with columns (“new_value”, “factor”, “mean”, “variance”) is returned.

Parameters:
  • groups (optional) – List of groups to consider

  • factors (optional) – Indices of factors to consider

  • df (optional) – Whether to return mean and variance matrices as (wide) DataFrames (can be superseded by df_long=True)

  • df_long (optional) – Whether to return a single long DataFrame (supersedes df=False and concatenate_groups=False)

  • concatenate_groups (optional) – Whether to concatenate Z matrices (True by default, can be superseded by df_long=True)

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated or per factor and per group otherwise)

  • absolute_values (optional) – Whether to return absolute values of the factors

get_r2(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame

Get variance explained (R2) per factor, view, and group.

Parameters:
  • factors (optional) – List of factors to consider (all by default)

  • groups (optional) – List of groups to consider (all by default)

  • views (optional) – List of views to consider (all by default)

  • groups_label (optional) – Sample (cell) metadata column to be used as group assignment

  • per_factor (optional) – Whether to compute R2 per factor

get_samples(groups=None)

Get the sample metadata table (sample ID and its respective group)

Parameters:

groups (optional) – List of groups to consider

get_shape(groups=None, views=None)

Get the shape of all the data: samples (cells) and features pooled across groups and views.

Parameters:
  • groups (optional) – List of groups to consider

  • views (optional) – List of views to consider

get_top_features(factors: int | List[int] | None = None, views: int | List[int] | str | List[str] | None = None, n_features: int | None = None, clip_threshold: float | None = None, scale: bool = False, absolute_values: bool = False, only_positive: bool = False, only_negative: bool = False, per_view: bool = True, df: bool = False)

Fetch a list of top feature names

Parameters:
  • factors (optional) – Factors to use (all factors in the model by default)

  • views (optional) – The view(s) to get the factor weights for (first view by default)

  • n_features (optional) – Number of features for each factor by their absolute value (10 by default)

  • clip_threshold (optional) – Absolute weight threshold to clip all values to (no threshold by default)

  • absolute_values (optional) – Whether to fetch absolute weight values

  • only_positive (optional) – Whether to fetch only positive weights

  • only_negative (optional) – Whether to fetch only negative weights

  • per_view (optional) – Get n_features per view rather than globally (True by default)

  • df (optional) – Whether to return a DataFrame
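
Ranking features by absolute weight, optionally after clipping, can be sketched as follows. The function name top_features, the toy weight matrix, and the gene names are illustrative assumptions; this is not the mofax implementation and it ignores the per_view, only_positive, and only_negative options.

```python
import numpy as np


def top_features(W, feature_names, factor, n_features=10, clip_threshold=None):
    """Names of the n_features with the largest absolute weight on one factor."""
    w = W[:, factor]
    if clip_threshold is not None:
        # Clip weights into [-clip_threshold, clip_threshold] before ranking.
        w = np.clip(w, -clip_threshold, clip_threshold)
    # Stable sort on negated absolute values gives a descending ranking.
    order = np.argsort(-np.abs(w), kind="stable")[:n_features]
    return [feature_names[i] for i in order]


# Toy weights: 4 features (rows) by 2 factors (columns).
W = np.array([[0.1, -2.0], [-3.0, 0.5], [1.5, 0.2], [0.0, 1.0]])
names = ["geneA", "geneB", "geneC", "geneD"]

top_factor0 = top_features(W, names, factor=0, n_features=2)
top_factor1 = top_features(W, names, factor=1, n_features=2)
```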

get_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None) → DataFrame

Get variance explained estimates (R2) for each factor across view(s) and/or group(s).

Parameters:
  • factors (optional) – List of factors to consider (all by default)

  • groups (optional) – List of groups to consider (all by default)

  • views (optional) – List of views to consider (all by default)

get_views()

Get the view names

get_views_contributions(scaled: bool = True)

Get view contribution scores for each sample.

Parameters:

scaled (bool, optional) – Whether to scale contribution scores per sample so that they sum up to 1 (True by default)

Returns:

DataFrame with view contribution scores, samples in rows and views in columns

Return type:

pd.DataFrame
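
The scaled option can be pictured as a row-wise normalization. The sketch below applies it to a made-up score matrix with samples in rows and views in columns; the function name scale_contributions is hypothetical, non-negative scores are assumed, and how mofax derives the raw scores is not covered here.

```python
import numpy as np


def scale_contributions(scores):
    """Rescale non-negative scores so that each sample (row) sums to 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum(axis=1, keepdims=True)


# Two samples (rows) scored against two views (columns).
raw = np.array([[2.0, 2.0], [1.0, 3.0]])
scaled = scale_contributions(raw)
```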

get_weights(views: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, df: bool = False, scale: bool = False, concatenate_views: bool = True, absolute_values: bool = False)

Fetch the weight matrices

Parameters:
  • views (optional) – List of views to consider

  • factors (optional) – Indices of factors to use

  • df (optional) – Whether to return the W matrix as a (wide) pd.DataFrame

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated or per factor and per view otherwise)

  • concatenate_views (optional) – Whether to concatenate W matrices (True by default)

  • absolute_values (optional) – Whether to return absolute values of the weights

project_data(data, view: str | int | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, feature_intersection: bool = False)

Project new data onto the factor space of the model.

For the projection, the pseudo-inverse of the weights matrix is calculated and multiplied with the provided data matrix.

Parameters:
  • data – NumPy array or pandas DataFrame with the data matching the number of features

  • view (optional) – A view of the model to consider (first view by default)

  • factors (optional) – Indices of factors to use for the projection (all factors by default)
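
The pseudo-inverse projection described above can be sketched in NumPy. It assumes data of shape (samples, features), weights W of shape (features, factors), and the model relation Y ≈ Z @ W.T; the function name project and the toy matrices are illustrative, and the orientation of the matrices inside mofax may differ.

```python
import numpy as np


def project(Y_new, W):
    """Project data (samples x features) onto factors via the pseudo-inverse of W.

    W has shape (features, factors); the model assumes Y ≈ Z @ W.T,
    so Z is recovered as Y @ pinv(W).T when W has full column rank.
    """
    return Y_new @ np.linalg.pinv(W).T


rng = np.random.default_rng(2)
Z_true = rng.normal(size=(5, 3))   # factor values for 5 new samples
W = rng.normal(size=(20, 3))       # weights for 20 features and 3 factors
Y_new = Z_true @ W.T               # noise-free data generated by the model

Z_projected = project(Y_new, W)
```

With noise-free data and a full-column-rank W, the projection recovers the generating factor values; with real data it gives the least-squares estimate.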

run_umap(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, n_neighbors: int = 10, min_dist: float = 0.5, spread: float = 1.0, random_state: int = 42, **kwargs) → None

Run UMAP on the factor space

Parameters:
  • n_neighbors (optional) – UMAP parameter: number of neighbors.

  • min_dist – UMAP parameter: the effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

  • spread – UMAP parameter: the effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

  • random_state – random seed

Plotting functions