mofax documentation#
MOFA model#
- class mofax.mofa_model(filepath, mode='r')#
Wrapper class around an HDF5-based model stored on disk.
This class is a thin wrapper for the HDF5 file where the trained MOFA+ model is stored. It also provides utility functions to get factors, weights, features, and samples (cells) info in the form of Pandas dataframes, and data as a NumPy array.
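A minimal usage sketch, using only the constructor and methods listed on this page. The helper name `summarize_model` and the file name in the usage note are hypothetical, and the exact return types of `get_views()` and `get_groups()` are not specified here.

```python
def summarize_model(filepath):
    """Open a trained model read-only and collect basic information."""
    # Imported lazily so the helper can be defined without mofax installed.
    import mofax

    model = mofax.mofa_model(filepath, mode="r")
    try:
        return {
            "views": model.get_views(),             # names of the views
            "groups": model.get_groups(),           # names of the groups
            "factors": model.get_factors(df=True),  # factor matrix as a DataFrame
        }
    finally:
        # Always release the HDF5 file handle, even if a call above fails.
        model.close()
```

It could be called as `summarize_model("trained_model.hdf5")`, where the path points to a model file produced by MOFA+.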
- calculate_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame#
Calculate variance explained estimates for each factor in each view and/or group. Samples can also be split by a predefined grouping (see groups_label).
- Parameters:
factors (optional) – List of factors to consider (default is None, all factors)
groups (optional) – List of groups to consider (default is None, all groups)
views (optional) – List of views to consider (default is None, all views)
groups_label (optional) – Group label to split samples by (default is None)
per_factor (optional) – Whether to calculate R2 per factor rather than for all factors together (the default)
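To illustrate the difference between R2 per factor and R2 for all factors, here is a NumPy sketch. It assumes centered data and the common definition R2 = 1 - SS_res / SS_tot; it is not taken from the mofax source, and the function name and synthetic matrices are made up for the example.

```python
import numpy as np

def r2_per_factor(Y, Z, W):
    """R2 of each factor taken alone.

    Y: samples x features (assumed centered)
    Z: samples x factors
    W: features x factors
    """
    total = np.sum(Y ** 2)
    r2 = []
    for k in range(Z.shape[1]):
        # Rank-one reconstruction from factor k only.
        recon = np.outer(Z[:, k], W[:, k])
        r2.append(1.0 - np.sum((Y - recon) ** 2) / total)
    return np.array(r2)

# Synthetic, noise-free data: Y is exactly Z W^T.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
W = rng.normal(size=(8, 2))
Y = Z @ W.T

# R2 for all factors together.
r2_all = 1.0 - np.sum((Y - Z @ W.T) ** 2) / np.sum(Y ** 2)
r2_each = r2_per_factor(Y, Z, W)
```

With noise-free data the joint R2 is 1, while each single factor explains only part of the variance.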
- close()#
Close the connection to the HDF5 file
- fetch_values(variables: str | List[str], unique: bool = True)#
Fetch metadata columns, factors, feature values, or covariates. Shorthand for get_data, get_factors, metadata, and covariates calls.
- Parameters:
variables (str or list of str) – Features, metadata columns, or factors (FactorN) to fetch. For MEFISTO models with covariates, covariate names such as ‘covariate1’ or ‘covariate1_transformed’ are also accepted.
- get_cells(groups=None)#
Get the cell metadata table (cell ID and its respective group)
- Parameters:
groups (optional) – List of groups to consider
- get_data(views: str | int | None = None, features: str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, df: bool = False)#
Fetch the training data
- Parameters:
views (optional) – View to consider
features (optional) – Features to consider (from one view)
groups (optional) – Groups to consider
df (optional) – Whether to return the Y matrix as a DataFrame
- get_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)#
Get the matrix with factors as a NumPy array or as a DataFrame (df=True).
- Parameters:
groups (optional) – List of groups to consider
factors (optional) – Indices of factors to consider
df (optional) – Whether to return the factor matrix Z as a (wide) pd.DataFrame
concatenate_groups (optional) – Whether to concatenate Z matrices (True by default)
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)
absolute_values (optional) – Whether to return absolute values for factors
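The two scaling modes described for `scale` can be sketched with NumPy as follows. This is an illustration only: the helper `scale_columns`, the synthetic group matrices, and the use of the population standard deviation are assumptions, not the library's implementation.

```python
import numpy as np

def scale_columns(Z):
    """Scale each column (factor) to zero mean and unit variance."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)

rng = np.random.default_rng(0)
Z_a = rng.normal(loc=2.0, size=(30, 3))   # factor values for a first group
Z_b = rng.normal(loc=-1.0, size=(20, 3))  # factor values for a second group

# Concatenated groups: one mean and variance per factor over all samples.
scaled_concat = scale_columns(np.vstack([Z_a, Z_b]))

# Separate groups: one mean and variance per factor within each group.
scaled_per_group = [scale_columns(Z_a), scale_columns(Z_b)]
```

In the concatenated case, differences in group means are preserved after scaling; in the per-group case they are removed.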
- get_features(views=None)#
Get the features metadata table (feature name and its respective view)
- Parameters:
views (optional) – List of views to consider
- get_groups()#
Get the groups names
- get_interpolated_factors(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, df_long: bool = False, concatenate_groups: bool = True, scale: bool = False, absolute_values: bool = False)#
Get the matrix with interpolated factors.
If df_long is False, a dictionary with keys (“mean”, “variance”) is returned with NumPy arrays (df=False) or DataFrames (df=True) as values.
If df_long is True, a DataFrame with columns (“new_value”, “factor”, “mean”, “variance”) is returned.
- Parameters:
groups (optional) – List of groups to consider
factors (optional) – Indices of factors to consider
df (optional) – Whether to return the mean and variance matrices as (wide) DataFrames (can be superseded by df_long=True)
df_long (optional) – Whether to return a single long DataFrame (supersedes df=False and concatenate_groups=False)
concatenate_groups (optional) – Whether to concatenate Z matrices (True by default, can be superseded by df_long=True)
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per group otherwise)
absolute_values (optional) – Whether to return absolute values for factors
- get_r2(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None, groups_label: str | None = None, per_factor: bool | None = None) → DataFrame#
Get variance explained (R2) per factor, view, and group.
- Parameters:
factors (optional) – List of factors to consider (all by default)
groups (optional) – List of groups to consider (all by default)
views (optional) – List of views to consider (all by default)
groups_label (optional) – Sample (cell) metadata column to be used as group assignment
per_factor (optional) – Whether to compute R2 per factor when it is calculated
- get_samples(groups=None)#
Get the sample metadata table (sample ID and its respective group)
- Parameters:
groups (optional) – List of groups to consider
- get_shape(groups=None, views=None)#
Get the shape of the data: the number of samples (cells) and features pooled across groups and views.
- Parameters:
groups (optional) – List of groups to consider
views (optional) – List of views to consider
- get_top_features(factors: int | List[int] | None = None, views: int | List[int] | str | List[str] | None = None, n_features: int | None = None, clip_threshold: float | None = None, scale: bool = False, absolute_values: bool = False, only_positive: bool = False, only_negative: bool = False, per_view: bool = True, df: bool = False)#
Fetch a list of top feature names
- Parameters:
factors (optional) – Factors to use (all factors in the model by default)
views (optional) – The view(s) to get the factor weights for (first view by default)
n_features (optional) – Number of features for each factor by their absolute value (10 by default)
clip_threshold (optional) – Absolute weight threshold to clip all values to (no threshold by default)
absolute_values (optional) – Whether to fetch absolute weight values
only_positive (optional) – Whether to fetch only positive weights
only_negative (optional) – Whether to fetch only negative weights
per_view (optional) – Get n_features per view rather than globally (True by default)
df (optional) – Whether to return a DataFrame
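The idea of ranking features by the absolute value of their weights can be sketched as below. The function `top_features`, the weights, and the feature names are invented for the illustration and do not reproduce the library's code; only the `n_features` and `only_positive` behaviours described above are mimicked.

```python
import numpy as np

def top_features(weights, names, n_features=10, only_positive=False):
    """Return feature names ordered by decreasing absolute weight."""
    weights = np.asarray(weights, dtype=float)
    idx = np.arange(weights.size)
    if only_positive:
        # Keep only features with a positive weight.
        idx = idx[weights > 0]
    # Stable sort so that ties keep their original order.
    order = idx[np.argsort(-np.abs(weights[idx]), kind="stable")]
    return [names[i] for i in order[:n_features]]

w = np.array([0.1, -2.0, 0.5, 1.5, -0.3])  # weights of one factor
names = ["g1", "g2", "g3", "g4", "g5"]

top_any = top_features(w, names, n_features=3)
top_pos = top_features(w, names, n_features=2, only_positive=True)
```

Here `top_any` is `["g2", "g4", "g3"]` and `top_pos` is `["g4", "g3"]`.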
- get_variance_explained(factors: int | List[int] | str | List[str] | None = None, groups: int | List[int] | str | List[str] | None = None, views: int | List[int] | str | List[str] | None = None) → DataFrame#
Get variance explained estimates (R2) for each factor across view(s) and/or group(s).
- Parameters:
factors (optional) – List of factors to consider (all by default)
groups (optional) – List of groups to consider (all by default)
views (optional) – List of views to consider (all by default)
- get_views()#
Get the views names
- get_views_contributions(scaled: bool = True)#
Get the contribution of each view to each sample, as view contribution scores.
- Parameters:
scaled (bool, optional) – Whether to scale contributions scores per sample so that they sum up to 1 (True by default)
- Returns:
Dataframe with view contribution scores, samples in rows and views in columns
- Return type:
pd.DataFrame
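The effect of `scaled=True` can be illustrated with a small NumPy sketch. The raw scores below are made up, and how mofax derives the unscaled scores is not described on this page; only the normalisation step (each sample's scores summing to 1) is shown.

```python
import numpy as np

# Hypothetical raw contribution scores: samples in rows, views in columns.
scores = np.array([
    [2.0, 6.0],
    [1.0, 1.0],
    [3.0, 0.0],
])

# Divide each row by its sum so that the contributions of a sample sum to 1.
scaled = scores / scores.sum(axis=1, keepdims=True)
```

After scaling, the first sample has contributions 0.25 and 0.75 for the two views.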
- get_weights(views: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, df: bool = False, scale: bool = False, concatenate_views: bool = True, absolute_values: bool = False)#
Fetch the weight matrices
- Parameters:
views (optional) – List of views to consider
factors (optional) – Indices of factors to use
df (optional) – Whether to return the W matrix as a (wide) pd.DataFrame
scale (optional) – Whether to return values scaled to zero mean and unit variance (per factor when concatenated, or per factor and per view otherwise)
concatenate_views (optional) – Whether to concatenate W matrices (True by default)
absolute_values (optional) – Whether to return absolute values for weights
- project_data(data, view: str | int | None = None, factors: int | List[int] | str | List[str] | None = None, df: bool = False, feature_intersection: bool = False)#
Project new data onto the factor space of the model.
For the projection, a pseudo-inverse of the weights matrix is calculated and its product with the provided data matrix is calculated.
- Parameters:
data – NumPy array or Pandas DataFrame whose number of features matches that of the model view
view (optional) – A view of the model to consider (first view by default)
factors (optional) – Indices of factors to use for the projection (all factors by default)
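The pseudo-inverse projection described above can be sketched with NumPy. This assumes a weights matrix W of shape features x factors and the model Y ≈ Z Wᵀ; the function name `project` and the synthetic matrices are illustrative, and the library's own handling of views, factor selection, and feature intersection is not reproduced.

```python
import numpy as np

def project(Y_new, W):
    """Project data (samples x features) onto the factor space.

    Solves Y_new ≈ Z W^T for Z using the pseudo-inverse of W.
    """
    return Y_new @ np.linalg.pinv(W).T

rng = np.random.default_rng(1)
W = rng.normal(size=(12, 3))       # features x factors
Z_true = rng.normal(size=(5, 3))   # samples x factors
Y_new = Z_true @ W.T               # noise-free data generated from the factors

Z_hat = project(Y_new, W)          # recovered factor values, shape (5, 3)
```

With noise-free data and a full-rank W, the projection recovers the factor values that generated the data.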
- run_umap(groups: int | List[int] | str | List[str] | None = None, factors: int | List[int] | None = None, n_neighbors: int = 10, min_dist: float = 0.5, spread: float = 1.0, random_state: int = 42, **kwargs) → None#
Run UMAP on the factor space
- Parameters:
n_neighbors (optional) – UMAP parameter: number of neighbors.
min_dist – UMAP parameter: the effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.
spread – UMAP parameter: the effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.
random_state – Random seed