mofax documentation#

MOFA model#

class mofax.mofa_model(filepath, mode='r')#

Class around HDF5-based model on disk.

This class is a thin wrapper for the HDF5 file where the trained MOFA+ model is stored. It also provides utility functions to get factors, weights, features, and samples (cells) info in the form of Pandas dataframes, and data as a NumPy array.

Calculate the variance explained estimates for each factor in each view and/or group. Predefined groups can also be used by passing groups_label.

Parameters:

factors (optional) – List of factors to consider (default is None, all factors)
groups (optional) – List of groups to consider (default is None, all groups)
views (optional) – List of views to consider (default is None, all views)
groups_label (optional) – Group label to split samples by (default is None)
per_factor (optional) – Whether to calculate R2 per factor or for all factors together (the default)
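As a rough illustration of what a variance explained (R2) estimate measures, the sketch below compares the data with its reconstruction from factors and weights. It is not taken from the mofax source: the helper `r2`, the simulated matrices, and the assumption that the data are centred per feature are all illustrative.

```python
import numpy as np

def r2(Y, Z, W):
    """Share of the total sum of squares in Y captured by the reconstruction Z @ W.T."""
    residual = Y - Z @ W.T
    return 1.0 - np.sum(residual ** 2) / np.sum(Y ** 2)

rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 50, 6, 2

# Simulated (hypothetical) factors, weights, and data for a single view and group
Z = rng.normal(size=(n_samples, n_factors))
Z -= Z.mean(axis=0)                                  # centred factors
W = rng.normal(size=(n_features, n_factors))         # features x factors
Y = Z @ W.T + 0.1 * rng.normal(size=(n_samples, n_features))
Y -= Y.mean(axis=0)                                  # assumption: data centred per feature

# R2 for all factors together
r2_all_factors = r2(Y, Z, W)

# R2 per factor: reconstruct the data from one factor at a time
r2_per_factor = [r2(Y, Z[:, [k]], W[:, [k]]) for k in range(n_factors)]

print(r2_all_factors, r2_per_factor)
```

Restricting the rows of Y and Z to one group, or using the weights of one view, would give the corresponding per-group or per-view estimate under the same assumptions.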

close()#

Close the connection to the HDF5 file

fetch_values(variables: str | List[str], unique: bool = True)#

Fetch metadata columns, factors, or feature values, as well as covariates. Shorthand for get_data, get_factors, metadata, and covariates calls.

Parameters:

variables (str or list of str) – Features, metadata columns, or factors (FactorN) to fetch. For MEFISTO models with covariates, covariates such as ‘covariate1’ or ‘covariate1_transformed’ are also accepted.

get_cells(groups=None)#

Get the cell metadata table (cell ID and its respective group)

Parameters:

groups (optional) – List of groups to consider

Fetch the training data

Parameters:

view (optional) – view to consider
features (optional) – Features to consider (from one view)
groups (optional) – groups to consider
df (optional) – Whether to return the Y matrix as a DataFrame

Get the matrix with factors as a NumPy array or as a DataFrame (df=True).

Parameters:

groups (optional) – List of groups to consider
factors (optional) – Indices of factors to consider
df (optional) – Whether to return the factor matrix Z as a (wide) pd.DataFrame
concatenate_groups (optional) – Whether to concatenate the Z matrices (True by default)
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)
absolute_values (optional) – Whether to return absolute values of the factor values
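The difference between the two scaling modes described for scale can be sketched with plain NumPy. This is an illustration under assumptions rather than the library's code: the group names, the helper `scale_columns`, and the use of the population standard deviation are hypothetical choices.

```python
import numpy as np

def scale_columns(Z):
    """Scale each column (factor) to zero mean and unit variance."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)

rng = np.random.default_rng(1)

# Hypothetical factor matrices (samples x factors) for two groups
Z_by_group = {
    "group_a": rng.normal(loc=1.0, scale=1.0, size=(4, 2)),
    "group_b": rng.normal(loc=-1.0, scale=3.0, size=(6, 2)),
}

# Concatenated groups: scale per factor across all samples
Z_all = np.concatenate(list(Z_by_group.values()), axis=0)
Z_scaled_concat = scale_columns(Z_all)

# Separate groups: scale per factor and per group
Z_scaled_per_group = {g: scale_columns(Z) for g, Z in Z_by_group.items()}
```

With per-group scaling, differences in location and spread between groups are removed, so the concatenated variant is the one to use when such differences should stay visible.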

get_features(views=None)#

Get the features metadata table (feature name and its respective view)

Parameters:

views (optional) – List of views to consider

get_groups()#

Get the group names

Get the matrix with interpolated factors.

If df_long is False, a dictionary with keys (“mean”, “variance”) is returned with NumPy arrays (df=False) or DataFrames (df=True) as values.

If df_long is True, a DataFrame with columns (“new_value”, “factor”, “mean”, “variance”) is returned.

Parameters:

groups (optional) – List of groups to consider
factors (optional) – Indices of factors to consider
df (optional) – Whether to return the mean and variance matrices as (wide) DataFrames (can be superseded by df_long=True)
df_long (optional) – Whether to return a single long DataFrame (supersedes df=False and concatenate_groups=False)
concatenate_groups (optional) – Whether to concatenate the Z matrices (True by default, can be superseded by df_long=True)
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)
absolute_values (optional) – Whether to return absolute values of the factor values

Get variance explained (R2) per factor, view, and group.

Parameters:

factors (optional) – List of factors to consider (all by default)
groups (optional) – List of groups to consider (all by default)
views (optional) – List of views to consider (all by default)
groups_label (optional) – Sample (cell) metadata column to be used as group assignment
per_factor (optional) – Whether to compute R2 per factor

get_samples(groups=None)#

Get the sample metadata table (sample ID and its respective group)

Parameters:

groups (optional) – List of groups to consider

get_shape(groups=None, views=None)#

Get the shape of all the data, samples (cells) and features pulled across groups and views.

Parameters:

groups (optional) – List of groups to consider
views (optional) – List of views to consider

Fetch a list of top feature names

Parameters:

factors (optional) – Factors to use (all factors in the model by default)
view (optional) – The view to get the factor weights for (first view by default)
n_features (optional) – Number of features for each factor by their absolute value (10 by default)
clip_threshold (optional) – Absolute weight threshold to clip all values to (no threshold by default)
absolute_values (optional) – Whether to fetch absolute weight values
only_positive (optional) – Whether to fetch only positive weights
only_negative (optional) – Whether to fetch only negative weights
per_view (optional) – Get n_features per view rather than globally (True by default)
df (optional) – Whether to return a DataFrame
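Selecting features "by their absolute value", as n_features describes, amounts to ranking the weights of each factor by magnitude. The sketch below shows that ranking on a small made-up weight matrix. The feature names, the weights, and the helper `top_features` are hypothetical, and the snippet does not reproduce the clipping or sign filters listed above.

```python
import numpy as np

# Hypothetical weights for one view: features in rows, factors in columns
feature_names = np.array(["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"])
W = np.array([
    [0.10, -2.0],
    [-1.50, 0.3],
    [0.70, 0.0],
    [-0.20, 1.1],
    [0.05, -0.4],
])

def top_features(weights, names, n_features=2):
    """Return the names of the n_features weights with the largest absolute value."""
    order = np.argsort(-np.abs(weights))[:n_features]
    return names[order].tolist()

# One list of top features per factor, labelled Factor1, Factor2, ...
top_per_factor = {
    f"Factor{k + 1}": top_features(W[:, k], feature_names)
    for k in range(W.shape[1])
}
print(top_per_factor)
```

Ranking by magnitude picks up strongly negative weights as well as strongly positive ones, which is why gene_b and gene_a lead their respective factors here.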

Get variance explained estimates (R2) for each factor across view(s) and/or group(s).

Parameters:

factors (optional) – List of factors to consider (all by default)
groups (optional) – List of groups to consider (all by default)
views (optional) – List of views to consider (all by default)

get_views()#

Get the view names

get_views_contributions(scaled: bool = True)#

Get view contribution scores for each sample.

Parameters:

scaled (bool, optional) – Whether to scale contribution scores per sample so that they sum up to 1 (True by default)

Returns:

DataFrame with view contribution scores, with samples in rows and views in columns

Return type:

pd.DataFrame
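The effect of scaled=True can be pictured as a row-wise normalisation of a samples by views table. The snippet below is only a sketch of that step: the raw scores are invented, and how the library derives them is not covered here.

```python
import numpy as np

# Hypothetical unscaled contribution scores: samples in rows, views in columns
scores = np.array([
    [2.0, 6.0],
    [1.0, 1.0],
    [3.0, 0.0],
])

# Divide each row by its sum so that the scores of every sample add up to 1
scaled_scores = scores / scores.sum(axis=1, keepdims=True)
print(scaled_scores)
```

After scaling, the values are comparable across samples as proportions, at the cost of losing the overall magnitude of each sample's scores.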

Fetch the weight matrices

Parameters:

views (optional) – List of views to consider
factors (optional) – Indices of factors to use
df (optional) – Whether to return the W matrix as a (wide) pd.DataFrame
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per view otherwise)
concatenate_weights (optional) – Whether to concatenate the W matrices (True by default)
absolute_values (optional) – Whether to return absolute values for weights

Project new data onto the factor space of the model.

For the projection, a pseudo-inverse of the weights matrix is calculated and its product with the provided data matrix is calculated.

Parameters:

data – NumPy array or Pandas DataFrame with the data, with columns matching the number of features in the view
view (optional) – A view of the model to consider (first view by default)
factors (optional) – Indices of factors to use for the projection (all factors by default)
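The pseudo-inverse projection described above can be sketched in a few lines of NumPy. The matrix shapes (weights as features by factors, data as samples by features) and the simulated values are assumptions made for illustration, not details confirmed by this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 5, 8, 3

# Hypothetical weights of one view: features x factors
W = rng.normal(size=(n_features, n_factors))

# Simulate noise-free new data from known factor values: samples x features
Z_true = rng.normal(size=(n_samples, n_factors))
Y_new = Z_true @ W.T

# Pseudo-inverse of the weights (factors x features), then its product with the data
W_pinv = np.linalg.pinv(W)
Z_projected = Y_new @ W_pinv.T          # samples x factors

print(np.allclose(Z_projected, Z_true))
```

Because the simulated data contain no noise and W has full column rank, the projection recovers the factor values used to generate them. With real data the result is a least-squares estimate instead.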

Run UMAP on the factor space

Parameters:

n_neighbors (optional) – UMAP parameter: number of neighbors.
min_dist – UMAP parameter: the effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.
spread – UMAP parameter: the effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.
random_state – random seed
