mofax documentation

MOFA model

class mofax.mofa_model(filepath, mode='r')

Wrapper class around an HDF5-based model stored on disk.

This class is a thin wrapper for the HDF5 file where the trained MOFA+ model is stored. It also provides utility functions to get information about factors, weights, features, and samples (cells) as pandas DataFrames, and data as a NumPy array.
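A minimal usage sketch, relying only on get_factors and get_weights as documented on this page. The helper name summarize_model and the file name in the comment are hypothetical, and the orientation of the matrices (samples in rows of the factor matrix, features in rows of the weight matrix) is an assumption rather than something stated here.

```python
import numpy as np


def summarize_model(model):
    """Return basic dimensions of a model-like object.

    `model` is expected to provide get_factors() and get_weights().
    Assumption: factors are samples x factors, weights are features x factors.
    """
    z = np.asarray(model.get_factors())
    w = np.asarray(model.get_weights())
    return {
        "n_samples": z.shape[0],
        "n_factors": z.shape[1],
        "n_features": w.shape[0],
    }


# Hypothetical usage with a trained model on disk:
#   import mofax
#   model = mofax.mofa_model("trained_model.hdf5")
#   print(summarize_model(model))
```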

calculate_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame

Calculate variance explained estimates for each factor in each view and/or group. Predefined groups of samples can also be used (see groups_label).

  • factors (optional) – List of factors to consider (default is None, all factors)

  • groups (optional) – List of groups to consider (default is None, all groups)

  • views (optional) – List of views to consider (default is None, all views)

  • groups_label (optional) – Group label to split samples by (default is None)

  • per_factor (optional) – Whether to calculate R2 per factor or for all factors together (the default)
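The idea behind a variance explained estimate can be sketched with NumPy. This sketch assumes the common coefficient of determination, R2 = 1 − SS_res / SS_tot, for a reconstruction Y ≈ Z Wᵀ with data centered per feature; the exact formula used by mofax is not stated on this page and may differ. The function name variance_explained is hypothetical.

```python
import numpy as np


def variance_explained(y, z, w, per_factor=True):
    """R2 of the reconstruction y ~ z @ w.T (assumes y is centered per feature).

    y: samples x features, z: samples x factors, w: features x factors.
    """
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    ss_tot = np.sum(y ** 2)
    if per_factor:
        # One rank-1 reconstruction per factor k: outer(z[:, k], w[:, k]).
        return np.array([
            1.0 - np.sum((y - np.outer(z[:, k], w[:, k])) ** 2) / ss_tot
            for k in range(z.shape[1])
        ])
    # All factors together.
    return 1.0 - np.sum((y - z @ w.T) ** 2) / ss_tot


rng = np.random.default_rng(0)
z = rng.normal(size=(50, 2))
w = rng.normal(size=(8, 2))
y = z @ w.T  # noise-free toy data, so all factors together explain everything

r2_total = variance_explained(y, z, w, per_factor=False)
r2_each = variance_explained(y, z, w, per_factor=True)
```

In this toy setting the total R2 is 1 up to floating-point error, while each single factor explains strictly less.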


Close the connection to the HDF5 file

fetch_values(variables: str | List[str], unique: bool = True)

Fetch values of metadata columns, factors, features, or covariates. This is a shorthand for calls to get_data, get_factors, metadata, and covariates.


variables (str or List[str]) – Features, metadata columns, or factors (FactorN) to fetch. For MEFISTO models with covariates, covariate names such as ‘covariate1’ or ‘covariate1_transformed’ are also accepted.


Get the cell metadata table (cell ID and its respective group)


groups (optional) – List of groups to consider

get_data(views: str | int | None = None, features: str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, df: bool = False)

Fetch the training data.

  • views (optional) – View to consider

  • features (optional) – Features to consider (from one view)

  • groups (optional) – Groups to consider

  • df (optional) – Whether to return the Y matrix as a DataFrame

get_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)

Get the matrix with factors as a NumPy array or as a DataFrame (df=True).

  • groups (optional) – List of groups to consider

  • factors (optional) – Indices of factors to consider

  • df (optional) – Whether to return the factor matrix Z as a (wide) pd.DataFrame

  • concatenate_groups (optional) – Whether to concatenate Z matrices across groups (True by default)

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)

  • absolute_values (optional) – Whether to return absolute values of the factors
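The two scaling modes described for scale can be illustrated with a small NumPy sketch. It assumes the population standard deviation (ddof=0) and a samples x factors layout; whether mofax uses the same convention is not stated here. The function name scale_factors is hypothetical.

```python
import numpy as np


def scale_factors(z, groups=None):
    """Scale each column (factor) of z to zero mean and unit variance.

    groups=None: all rows are scaled together (the concatenated case).
    Otherwise `groups` holds one label per row and scaling is done
    separately within each group.
    """
    z = np.asarray(z, dtype=float)
    if groups is None:
        return (z - z.mean(axis=0)) / z.std(axis=0)
    groups = np.asarray(groups)
    out = np.empty_like(z)
    for g in np.unique(groups):
        rows = groups == g
        out[rows] = (z[rows] - z[rows].mean(axis=0)) / z[rows].std(axis=0)
    return out


rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
labels = np.array(["a", "a", "a", "b", "b", "b"])

scaled_all = scale_factors(z)             # per factor
scaled_grp = scale_factors(z, labels)     # per factor and per group
```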


Get the features metadata table (feature name and its respective view)


views (optional) – List of views to consider


Get the groups names

get_interpolated_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, df_long: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)

Get the matrix with interpolated factors.

If df_long is False, a dictionary with keys (“mean”, “variance”) is returned with NumPy arrays (df=False) or DataFrames (df=True) as values.

If df_long is True, a DataFrame with columns (“new_value”, “factor”, “mean”, “variance”) is returned.

  • groups (optional) – List of groups to consider

  • factors (optional) – Indices of factors to consider

  • df (optional) – Whether to return the mean and variance matrices as (wide) DataFrames (can be superseded by df_long=True)

  • df_long (optional) – Whether to return a single long DataFrame (supersedes df=False and concatenate_groups=False)

  • concatenate_groups (optional) – Whether to concatenate Z matrices across groups (True by default, can be superseded by df_long=True)

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)

  • absolute_values (optional) – Whether to return absolute values of the factors

get_r2(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame

Get variance explained (R2) per factor, view, and group.

  • factors (optional) – List of factors to consider (all by default)

  • groups (optional) – List of groups to consider (all by default)

  • views (optional) – List of views to consider (all by default)

  • groups_label (optional) – Sample (cell) metadata column to be used as group assignment

  • per_factor (optional) – Whether to compute R2 per factor


Get the sample metadata table (sample ID and its respective group)


groups (optional) – List of groups to consider

get_shape(groups=None, views=None)

Get the shape of all the data: samples (cells) and features pooled across groups and views.

  • groups (optional) – List of groups to consider

  • views (optional) – List of views to consider

get_top_features(factors: int | List[int] | None = None, views: int | List[int] | str | List[str] | None = None, n_features: int | None = None, clip_threshold: float | None = None, scale: bool = False, absolute_values: bool = False, only_positive: bool = False, only_negative: bool = False, per_view: bool = True, df: bool = False)

Fetch a list of top feature names.

  • factors (optional) – Factors to use (all factors in the model by default)

  • views (optional) – The view(s) to get the factor weights for (first view by default)

  • n_features (optional) – Number of features to select for each factor, ranked by absolute weight (10 by default)

  • clip_threshold (optional) – Absolute weight threshold to clip all values to (no threshold by default)

  • absolute_values (optional) – Whether to fetch absolute weight values

  • only_positive (optional) – Whether to fetch only positive weights

  • only_negative (optional) – Whether to fetch only negative weights

  • per_view (optional) – Get n_features per view rather than globally (True by default)

  • df (optional) – Whether to return a DataFrame
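Ranking features by absolute weight, with optional clipping, can be sketched as follows. This is only an illustration of the selection logic described above for a single factor, not the library's implementation; the function name top_features and the toy feature names are hypothetical.

```python
import numpy as np


def top_features(weights, feature_names, n_features=10, clip_threshold=None):
    """Return the names of the features with the largest absolute weights.

    weights: 1-D array with the weights of one factor.
    clip_threshold: if given, weights are clipped to [-t, t] before ranking.
    """
    w = np.asarray(weights, dtype=float)
    if clip_threshold is not None:
        w = np.clip(w, -clip_threshold, clip_threshold)
    # Stable sort on the negated absolute values gives a descending order.
    order = np.argsort(-np.abs(w), kind="stable")[:n_features]
    return [feature_names[i] for i in order]


names = ["g1", "g2", "g3", "g4"]
weights = [0.1, -2.0, 0.5, 1.5]
top2 = top_features(weights, names, n_features=2)  # ['g2', 'g4']
```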

get_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None) → DataFrame

Get variance explained estimates (R2) for each factor across view(s) and/or group(s).

  • factors (optional) – List of factors to consider (all by default)

  • groups (optional) – List of groups to consider (all by default)

  • views (optional) – List of views to consider (all by default)


Get the views names

get_views_contributions(scaled: bool = True)

Calculate view contribution scores for each sample.

For the projection involved, the pseudo-inverse of the weights matrix is calculated and multiplied by the data matrix.


scaled (bool, optional) – Whether to scale contributions scores per sample so that they sum up to 1 (True by default)


Returns: DataFrame with view contribution scores, with samples in rows and views in columns
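The effect of scaled=True can be pictured with a small NumPy sketch that normalizes each row of a samples x views score matrix so that it sums to 1. It assumes non-negative scores with non-zero row sums; how the raw scores are computed is outside the scope of this sketch, and the function name scale_contributions is hypothetical.

```python
import numpy as np


def scale_contributions(scores):
    """Normalize each row (sample) of a score matrix to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum(axis=1, keepdims=True)


raw = [[1.0, 3.0],
       [2.0, 2.0]]
scaled_scores = scale_contributions(raw)  # rows: [0.25, 0.75] and [0.5, 0.5]
```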


get_weights(views: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, df: bool = False, scale: bool = False, concatenate_views: bool = True, absolute_values: bool = False)

Fetch the weight matrices.

  • views (optional) – List of views to consider

  • factors (optional) – Indices of factors to use

  • df (optional) – Whether to return the W matrix as a (wide) pd.DataFrame

  • scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per view otherwise)

  • concatenate_views (optional) – Whether to concatenate W matrices across views (True by default)

  • absolute_values (optional) – Whether to return absolute values of the weights

project_data(data, view: str | int | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, feature_intersection: bool = False)

Project new data onto the factor space of the model.

For the projection, the pseudo-inverse of the weights matrix is calculated and multiplied by the provided data matrix.

  • data – NumPy array or pandas DataFrame with the data matching the number of features

  • view (optional) – A view of the model to consider (first view by default)

  • factors (optional) – Indices of factors to use for the projection (all factors by default)
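The pseudo-inverse projection described above can be sketched with NumPy. The sketch assumes data of shape samples x features and weights of shape features x factors, so that Y ≈ Z Wᵀ; the function name project is hypothetical and the layout used internally by mofax may differ.

```python
import numpy as np


def project(data, weights):
    """Project data onto the factor space via the pseudo-inverse of the weights.

    data: samples x features, weights: features x factors.
    Since Y ~ Z @ W.T, an estimate of Z is Y @ pinv(W).T.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return data @ np.linalg.pinv(weights).T


rng = np.random.default_rng(0)
z_true = rng.normal(size=(5, 2))
w = rng.normal(size=(6, 2))
y_new = z_true @ w.T          # noise-free toy data

z_hat = project(y_new, w)     # recovers z_true up to floating-point error
```

With noise-free data and a full-column-rank weight matrix the factors are recovered exactly; with real data the result is a least-squares estimate.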

run_umap(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, n_neighbors: int = 10, min_dist: float = 0.5, spread: float = 1.0, random_state: int = 42, **kwargs) → None

Run UMAP on the factor space

  • n_neighbors (optional) – UMAP parameter: number of neighbors.

  • min_dist – UMAP parameter: the effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

  • spread – UMAP parameter: the effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

  • random_state – random seed

Plotting functions