Normalization¶

Counts¶

class moonstone.normalization.counts.random_selection.RandomSelection(df, threshold=None, random_seed=2935)[source]¶

Randomly select a given number of counts (threshold) among all items (genes, taxonomic annotations…) for each sample. The selection is weighted by the initial counts: the higher an item's count, the more likely it is to be picked.

__init__(df, threshold=None, random_seed=2935)[source]¶

Parameters

threshold (Optional[int]) – total number of counts to pick per sample
random_seed (int) – random seed to use for the random picking of counts
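
The count-weighted selection described for this class can be sketched with NumPy and pandas alone. This is a minimal illustration, not moonstone's implementation: the helper `subsample_counts`, the example dataframe, and the choice of sampling without replacement are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def subsample_counts(counts: pd.Series, threshold: int,
                     rng: np.random.Generator) -> pd.Series:
    """Pick `threshold` reads from one sample, weighted by the initial counts.

    Hypothetical helper: sampling without replacement is an assumption.
    """
    values = counts.to_numpy(dtype=int)
    # One entry per read, labelled with the position of the item it belongs to.
    reads = np.repeat(np.arange(len(values)), values)
    picked = rng.choice(reads, size=threshold, replace=False)
    # Count how many of the picked reads fall on each item.
    new_counts = np.bincount(picked, minlength=len(values))
    return pd.Series(new_counts, index=counts.index)


# Items as rows, samples as columns (example data, not from the library).
df = pd.DataFrame(
    {"sample_1": [50, 30, 20], "sample_2": [5, 80, 15]},
    index=["gene_a", "gene_b", "gene_c"],
)
rng = np.random.default_rng(2935)
subsampled = df.apply(subsample_counts, threshold=40, rng=rng)
```

After subsampling, every sample holds exactly `threshold` counts, and no item can end up with more counts than it started with.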

generate_report_data()¶

Override this method in child classes to perform data reporting

Return type: dict

normalize()[source]¶

Method to perform normalization

Return type: DataFrame

property normalized_df¶

property report_data¶

Return type: dict

visualize(write_file=False)¶: Generate visualization for the module

class moonstone.normalization.counts.random_selection.TaxonomyRandomSelection(df, concat_char=';', *args, **kwargs)[source]¶

Allow random selection for taxonomy multi-indexed dataframes.

__init__(df, concat_char=';', *args, **kwargs)[source]¶

Parameters

threshold – total number of counts to pick per sample
random_seed – random seed to use for the random picking of counts
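
One way to picture the role of `concat_char` is to flatten the taxonomy multi-index into single strings before the random selection and to split them back afterwards. The sketch below only illustrates that round trip with pandas; whether the class proceeds this way internally is an assumption, and the taxonomy labels are made-up example data.

```python
import pandas as pd

# Example taxonomy multi-indexed dataframe (hypothetical data).
index = pd.MultiIndex.from_tuples(
    [("Bacteria", "Firmicutes"), ("Bacteria", "Bacteroidetes")],
    names=["kingdom", "phylum"],
)
df = pd.DataFrame({"sample_1": [12, 8]}, index=index)

concat_char = ";"

# Flatten: join the levels of each row label into one string.
flat = df.copy()
flat.index = [concat_char.join(levels) for levels in df.index]

# ... a random selection would operate on `flat` here ...

# Restore: split the strings back into the original levels.
restored = flat.copy()
restored.index = pd.MultiIndex.from_tuples(
    [tuple(label.split(concat_char)) for label in flat.index],
    names=df.index.names,
)
```

The round trip is only lossless if `concat_char` does not appear inside any taxonomy label.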

generate_report_data()¶

Override this method in child classes to perform data reporting

Return type: dict

normalize()[source]¶

Method to perform normalization

Return type: DataFrame

property normalized_df¶

property report_data¶

Return type: dict

visualize(write_file=False)¶: Generate visualization for the module

class moonstone.normalization.counts.geometric_mean.GeometricMeanNormalization(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)[source]¶

Normalization based on the one performed by DESeq2.

More information: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html

__init__(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)[source]¶

Parameters: normalization_level – level of the multi-index at which the normalization is performed
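
The DESeq2-style "median of ratios" computation that this class is based on can be sketched as follows. It is a simplified illustration under stated assumptions, not the class's code: the function name `median_of_ratios` is hypothetical, rows containing any zero are simply excluded from the scaling-factor computation, and the `zero_threshold`, `normalization_level` and `replace_0_to_1` options are not modelled.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def median_of_ratios(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Return (normalized counts, scaling factor per sample)."""
    # Keep only rows without any zero, since log(0) is undefined.
    non_zero = df[(df > 0).all(axis=1)]
    log_counts = np.log(non_zero)
    # The mean of the logs of a row is the log of its geometric mean.
    log_geo_mean = log_counts.mean(axis=1)
    # Log ratio of each count to the geometric mean of its row.
    log_ratios = log_counts.sub(log_geo_mean, axis=0)
    # Scaling factor of a sample: exp of the median of its log ratios.
    scaling_factors = np.exp(log_ratios.median(axis=0))
    # Divide every count (zero rows included) by its sample's factor.
    return df.div(scaling_factors, axis=1), scaling_factors


# Example data: sample_2 was sequenced twice as deeply as sample_1.
df = pd.DataFrame(
    {"sample_1": [10, 20, 40, 0], "sample_2": [20, 40, 80, 5]},
    index=["gene_a", "gene_b", "gene_c", "gene_d"],
)
normalized, scaling_factors = median_of_ratios(df)
```

Because the second sample has exactly twice the counts of the first on the rows used, its scaling factor is twice as large and those rows become identical after normalization.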

calculating_and_substracting_mean_row(df)[source]¶: Subtract the mean row from the original values

generate_report_data()¶

Override this method in child classes to perform data reporting

Return type: dict

log_df(df)[source]¶

non_zero_df(df)[source]¶: This method removes rows with 0 reads

normalize()[source]¶: Method to perform normalization

property normalized_df¶

remove_zero_and_apply_log(df)[source]¶

property removed_zero_df¶: The dataframe containing the rows that were removed for having too many zeros. This attribute is computed by the non_zero_df method.

property report_data¶

Return type: dict

property scaling_factors¶

visualize(write_file=False)¶: Generate visualization for the module

class moonstone.normalization.counts.total_counts.TotalCountsNormalization(dataframe)[source]¶

Normalization based on total counts.

__init__(dataframe)¶: Initialize self. See help(type(self)) for accurate signature.
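
As an illustration of the idea, total-count normalization can be sketched in a few lines of pandas. The exact definition of the scaling factors used by this class is not stated here, so the variant below (each sample's total divided by the mean total across samples) is an assumption chosen for the example.

```python
import pandas as pd

# Items as rows, samples as columns (example data).
df = pd.DataFrame(
    {"sample_1": [10, 30, 60], "sample_2": [40, 60, 100]},
    index=["gene_a", "gene_b", "gene_c"],
)

# Total number of counts in each sample.
totals = df.sum(axis=0)
# Assumed definition: a sample's total relative to the mean total.
scaling_factors = totals / totals.mean()
# Divide each sample (column) by its scaling factor.
normalized = df.div(scaling_factors, axis=1)
```

With this definition, all samples end up with the same total, equal to the mean of the original totals.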

generate_report_data()¶

Override this method in child classes to perform data reporting

Return type: dict

normalize()[source]¶: Method to perform normalization

property normalized_df¶

property report_data¶

Return type: dict

property scaling_factors¶

visualize(write_file=False)¶: Generate visualization for the module

Processed¶

class moonstone.normalization.processed.scaling_normalization.StandardScaler(raw_x)[source]¶

Machine learning algorithms such as SVM assume that all features are centered around zero and have similar variance. The scikit-learn function preprocessing.scale performs this normalization on a single array. More info at: https://scikit-learn.org/stable/modules/preprocessing.html

__init__(raw_x)[source]¶: Initialize self. See help(type(self)) for accurate signature.

scale()[source]¶: Takes a NumPy array of the independent variables, or features, as ‘x’ for ML training.

property scaled_x¶
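
The centering and scaling described for this class can be written out with NumPy alone. The sketch below reproduces what scikit-learn's `preprocessing.scale` does with its default settings (zero mean and unit variance per column, using the population standard deviation); it does not call moonstone, and the array `raw_x` is example data.

```python
import numpy as np

# Rows are observations, columns are features (example data).
raw_x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# Per-feature mean and population standard deviation (ddof=0).
mean = raw_x.mean(axis=0)
std = raw_x.std(axis=0)

# Center each feature on zero and scale it to unit variance.
# Note: a constant feature (std of 0) would need special handling.
scaled_x = (raw_x - mean) / std
```

Each column of `scaled_x` then has a mean of zero and a standard deviation of one, regardless of its original units.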