Normalization

Counts

class moonstone.normalization.counts.random_selection.RandomSelection(df, threshold=None, random_seed=2935)[source]

Randomly select a given number of counts (threshold) among all items (genes, taxonomic annotations…) for each sample. The selection is weighted by the initial counts, so items with higher counts are more likely to be picked.

__init__(df, threshold=None, random_seed=2935)[source]
Parameters
  • threshold (Optional[int]) – total number of counts to pick by sample

  • random_seed (int) – random seed to use for random picking of counts
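
The idea behind count-weighted selection can be sketched in plain Python for a single sample. This is an illustration only, not the moonstone implementation: the function name `random_selection`, the dict-based input, and the choice to raise an error for samples below the threshold are all assumptions made for the example.

```python
import random
from collections import Counter


def random_selection(counts, threshold, random_seed=2935):
    """Subsample `threshold` counts from one sample without replacement.

    `counts` maps item name -> integer count (hypothetical input format).
    Items with larger counts are proportionally more likely to be drawn.
    """
    total = sum(counts.values())
    if total < threshold:
        # Assumption: a sample with fewer counts than the threshold
        # cannot be subsampled, so we refuse it here.
        raise ValueError("sample has fewer counts than the threshold")
    rng = random.Random(random_seed)
    # Expand the counts into one entry per individual count, then draw.
    pool = [item for item, n in counts.items() for _ in range(n)]
    picked = Counter(rng.sample(pool, threshold))
    return {item: picked.get(item, 0) for item in counts}


sample = {"gene_a": 50, "gene_b": 30, "gene_c": 20}
selected = random_selection(sample, threshold=40)
```

Using a fixed seed makes the draw reproducible: calling the function twice with the same input and seed returns the same selection.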

generate_report_data()

Override this method to perform data reporting in child classes.

Return type

dict

normalize()[source]

Method to perform normalization

Return type

DataFrame

property normalized_df
property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.normalization.counts.random_selection.TaxonomyRandomSelection(df, concat_char=';', *args, **kwargs)[source]

Allow random selection for taxonomy multi-indexed dataframes.

__init__(df, concat_char=';', *args, **kwargs)[source]
Parameters
  • threshold – total number of counts to pick by sample

  • random_seed – random seed to use for random picking of counts
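
The `concat_char` parameter suggests that the levels of the taxonomy multi-index are joined into a single string before selection and split again afterwards. The sketch below shows that round trip in plain Python; it is an assumption about the approach, and `flatten_taxonomy` and `restore_taxonomy` are hypothetical helper names, not part of moonstone.

```python
def flatten_taxonomy(rows, concat_char=";"):
    """Join each tuple of taxonomy levels into a single string key."""
    return {concat_char.join(levels): count for levels, count in rows.items()}


def restore_taxonomy(flat, concat_char=";"):
    """Split flat string keys back into tuples of taxonomy levels."""
    return {tuple(key.split(concat_char)): count for key, count in flat.items()}


rows = {
    ("Bacteria", "Firmicutes", "Lactobacillus"): 12,
    ("Bacteria", "Bacteroidetes", "Bacteroides"): 7,
}
flat = flatten_taxonomy(rows)
# A per-sample random selection would operate on `flat` at this point.
restored = restore_taxonomy(flat)
```

The round trip is only lossless when `concat_char` does not appear inside any taxonomy name.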

generate_report_data()

Override this method to perform data reporting in child classes.

Return type

dict

normalize()[source]

Method to perform normalization

Return type

DataFrame

property normalized_df
property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.normalization.counts.geometric_mean.GeometricMeanNormalization(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)[source]

Normalization based on the one performed by DESeq2.

More info: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html
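
The DESeq2-style "median of ratios" method can be sketched in plain Python as follows. This is a simplified illustration rather than the moonstone code: the function name `median_of_ratios` and the dict-of-lists input are assumptions, and items containing any zero are excluded from the factor computation here, whereas the class exposes a `zero_threshold` parameter whose exact semantics are not shown in this section.

```python
import math
from statistics import median


def median_of_ratios(counts):
    """`counts` maps item -> list of counts, one value per sample."""
    n_samples = len(next(iter(counts.values())))
    # Keep only items with no zero count, since log(0) is undefined.
    logs = {
        item: [math.log(v) for v in values]
        for item, values in counts.items()
        if all(v > 0 for v in values)
    }
    # Subtract each item's mean log (the log of its geometric mean).
    centered = {
        item: [x - sum(row) / len(row) for x in row]
        for item, row in logs.items()
    }
    # Per-sample scaling factor: exp of the median log-ratio across items.
    factors = [
        math.exp(median(row[j] for row in centered.values()))
        for j in range(n_samples)
    ]
    # Divide every count, including the excluded items, by its sample factor.
    normalized = {
        item: [v / f for v, f in zip(values, factors)]
        for item, values in counts.items()
    }
    return factors, normalized


counts = {
    "gene_a": [10, 20],
    "gene_b": [100, 200],
    "gene_c": [0, 5],
}
factors, normalized = median_of_ratios(counts)
```

In this toy input the second sample is exactly twice as deep as the first for the retained items, so after normalization `gene_a` and `gene_b` have equal values in both samples.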

__init__(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)[source]
Parameters

normalization_level – level of the multi-index at which the normalization is performed

calculating_and_substracting_mean_row(df)[source]

Subtract the mean row from the original values.

generate_report_data()

Override this method to perform data reporting in child classes.

Return type

dict

log_df(df)[source]
non_zero_df(df)[source]

This method removes rows with 0 reads

normalize()[source]

Method to perform normalization

property normalized_df
remove_zero_and_apply_log(df)[source]
property removed_zero_df

Gives the dataframe containing the rows that were removed for having too many zeros. This attribute is computed by the non_zero_df method.

property report_data
Return type

dict

property scaling_factors
visualize(write_file=False)

Generate visualization for the module

class moonstone.normalization.counts.total_counts.TotalCountsNormalization(dataframe)[source]

Normalization based on total counts.
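
One common form of total-count normalization divides each sample by its total count relative to the mean total across samples. The plain-Python sketch below follows that reading; the exact definition of the scaling factors used by moonstone is not stated in this section, so the formula, the function name `total_counts_normalize`, and the dict-of-lists input are assumptions.

```python
def total_counts_normalize(counts):
    """`counts` maps item -> list of counts, one value per sample."""
    n_samples = len(next(iter(counts.values())))
    totals = [
        sum(values[j] for values in counts.values())
        for j in range(n_samples)
    ]
    mean_total = sum(totals) / n_samples
    # Assumption: a sample's scaling factor is its total relative to the
    # mean total. A sample with a total of 0 would divide by zero below.
    factors = [total / mean_total for total in totals]
    normalized = {
        item: [v / f for v, f in zip(values, factors)]
        for item, values in counts.items()
    }
    return factors, normalized


counts = {"gene_a": [10, 30], "gene_b": [40, 120]}
factors, normalized = total_counts_normalize(counts)
```

After normalization every sample has the same total, equal to the mean of the original totals.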

__init__(dataframe)

Initialize self. See help(type(self)) for accurate signature.

generate_report_data()

Override this method to perform data reporting in child classes.

Return type

dict

normalize()[source]

Method to perform normalization

property normalized_df
property report_data
Return type

dict

property scaling_factors
visualize(write_file=False)

Generate visualization for the module

Processed

class moonstone.normalization.processed.scaling_normalization.StandardScaler(raw_x)[source]

ML algorithms such as SVM assume that all features are centered around zero and have similar variance. The scikit-learn function preprocessing.scale performs this normalization on a single array. More info at: https://scikit-learn.org/stable/modules/preprocessing.html

__init__(raw_x)[source]

Initialize self. See help(type(self)) for accurate signature.

scale()[source]

Takes a NumPy array of the independent variables, or features, as ‘x’ for ML training.
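
What this kind of scaling computes can be sketched without NumPy or scikit-learn: each feature column is shifted to zero mean and divided by its population standard deviation. The helper `scale_columns` and the list-of-rows input are hypothetical and used only for illustration; the class itself delegates to scikit-learn.

```python
from statistics import fmean, pstdev


def scale_columns(raw_x):
    """Scale each column of `raw_x` (a list of rows) to mean 0, variance 1."""
    columns = list(zip(*raw_x))
    means = [fmean(col) for col in columns]
    # Population standard deviation (ddof=0). A constant column has a
    # deviation of 0, so fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in raw_x
    ]


raw_x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled_x = scale_columns(raw_x)
```

Both columns of `raw_x` differ only by a factor of ten, so they map to the same scaled values.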

property scaled_x