Normalization
Counts
-
class
moonstone.normalization.counts.random_selection.RandomSelection(df, threshold=None, random_seed=2935)
Randomly select a given number of counts (threshold) among all items (genes, taxonomic annotations, …) for each sample. The probability of picking an item depends on its initial count.
-
__init__(df, threshold=None, random_seed=2935)
- Parameters
threshold (Optional[int]) – total number of counts to pick per sample
random_seed (int) – random seed used for the random picking of counts
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property
normalized_df
-
property
report_data
- Return type
dict
-
visualize(write_file=False)
Generate the visualization for the module.
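The idea of count-weighted random selection can be sketched without moonstone, using only NumPy and pandas. This is a minimal illustration, not the library's implementation: `random_select_counts` is a hypothetical helper, items are assumed to be in rows and samples in columns, and leaving samples with fewer counts than the threshold unchanged is an assumption made here for simplicity.

```python
import numpy as np
import pandas as pd


def random_select_counts(df, threshold, random_seed=2935):
    """Pick `threshold` counts per sample (column), without replacement.

    Hypothetical helper: each individual count is equally likely to be
    picked, so items with more counts are picked more often.
    """
    rng = np.random.default_rng(random_seed)
    selected = {}
    for sample in df.columns:
        counts = df[sample].to_numpy(dtype=np.int64)
        if counts.sum() <= threshold:
            # Assumption: samples below the threshold are kept unchanged.
            selected[sample] = counts
            continue
        # Draw `threshold` counts from the pool described by `counts`.
        selected[sample] = rng.multivariate_hypergeometric(counts, threshold)
    return pd.DataFrame(selected, index=df.index)


counts_df = pd.DataFrame(
    {"sample_1": [10, 20, 70], "sample_2": [5, 0, 45]},
    index=["item_a", "item_b", "item_c"],
)
subsampled = random_select_counts(counts_df, threshold=30)
print(subsampled)
```

Fixing the seed makes the selection reproducible, which is presumably why the class exposes `random_seed` with a default value.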
-
-
class
moonstone.normalization.counts.random_selection.TaxonomyRandomSelection(df, concat_char=';', *args, **kwargs)
Allow random selection for taxonomy multi-indexed dataframes.
-
__init__(df, concat_char=';', *args, **kwargs)
- Parameters
threshold – total number of counts to pick per sample
random_seed – random seed used for the random picking of counts
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property
normalized_df
-
property
report_data
- Return type
dict
-
visualize(write_file=False)
Generate the visualization for the module.
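One plausible role for `concat_char` is to join the levels of a taxonomy multi-index into a single label before selection and to split them again afterwards. The sketch below illustrates that round trip with pandas only; both helpers are hypothetical, and whether `TaxonomyRandomSelection` works this way internally is an assumption.

```python
import pandas as pd


def flatten_taxonomy_index(df, concat_char=";"):
    """Join the multi-index levels into one string label per row (hypothetical helper)."""
    flat = df.copy()
    flat.index = [concat_char.join(map(str, levels)) for levels in df.index]
    return flat


def restore_taxonomy_index(flat, names, concat_char=";"):
    """Split the joined labels back into a MultiIndex (hypothetical helper)."""
    restored = flat.copy()
    restored.index = pd.MultiIndex.from_tuples(
        [tuple(label.split(concat_char)) for label in flat.index],
        names=names,
    )
    return restored


taxonomy_index = pd.MultiIndex.from_tuples(
    [("Bacteria", "Firmicutes"), ("Bacteria", "Proteobacteria")],
    names=["kingdom", "phylum"],
)
taxonomy_df = pd.DataFrame({"sample_1": [12, 30]}, index=taxonomy_index)

flat_df = flatten_taxonomy_index(taxonomy_df)
round_trip = restore_taxonomy_index(flat_df, names=taxonomy_index.names)
print(flat_df)
```

The round trip only works if the separator never appears inside a taxonomy name, which is one reason to make it configurable.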
-
-
class
moonstone.normalization.counts.geometric_mean.GeometricMeanNormalization(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)
Normalization based on the one performed by DESeq2.
Info: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html
-
__init__(df, log_number=2.718281828459045, zero_threshold=80, normalization_level=None, replace_0_to_1=False)
- Parameters
normalization_level – level of a multi-index at which the normalization is performed
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property
normalized_df
-
property
removed_zero_df
Dataframe of the rows that were removed for having too many zeros. This attribute is computed during the non_zero_df function.
-
property
report_data
- Return type
dict
-
property
scaling_factors
-
visualize(write_file=False)
Generate the visualization for the module.
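The DESeq2-style "median of ratios" procedure described at the linked page can be sketched in a few lines of pandas. This is an illustration under simplifying assumptions, not moonstone's code: `median_of_ratios` is a hypothetical helper, items are in rows and samples in columns, every row containing a zero is excluded from the scaling-factor computation, and the `zero_threshold`, `log_number`, `normalization_level` and `replace_0_to_1` options are not reproduced.

```python
import numpy as np
import pandas as pd


def median_of_ratios(df):
    """Return (normalized counts, per-sample scaling factors). Hypothetical helper."""
    # Keep only items with no zero count so that every logarithm is finite.
    non_zero = df[(df > 0).all(axis=1)]
    log_counts = np.log(non_zero)
    # The mean of the logs is the log of the geometric mean of each item.
    log_geometric_mean = log_counts.mean(axis=1)
    # Log ratio of each count to the geometric mean of its item.
    log_ratios = log_counts.sub(log_geometric_mean, axis=0)
    # The scaling factor of a sample is the median ratio over its items.
    scaling_factors = np.exp(log_ratios.median(axis=0))
    normalized = df.div(scaling_factors, axis=1)
    return normalized, scaling_factors


raw_counts = pd.DataFrame(
    {"sample_1": [10, 20, 30, 0], "sample_2": [20, 40, 60, 5]},
    index=["gene_1", "gene_2", "gene_3", "gene_4"],
)
normalized_counts, factors = median_of_ratios(raw_counts)
print(factors)
```

In this toy input `sample_2` was sequenced twice as deeply as `sample_1`, so its scaling factor comes out twice as large and the normalized counts of the first three genes match across samples.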
-
-
class
moonstone.normalization.counts.total_counts.TotalCountsNormalization(dataframe)
Normalization based on total counts.
-
__init__(dataframe)
Initialize self. See help(type(self)) for accurate signature.
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property
normalized_df
-
property
report_data
- Return type
dict
-
property
scaling_factors
-
visualize(write_file=False)
Generate the visualization for the module.
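A common form of total-count normalization divides each sample by a factor proportional to its total, so that all samples end up with the same total. The sketch below shows one such variant; `total_counts_normalize` is a hypothetical helper, and scaling relative to the mean total across samples is an assumption, since the exact formula used by `TotalCountsNormalization` is not stated here.

```python
import pandas as pd


def total_counts_normalize(df):
    """Return (normalized counts, per-sample scaling factors). Hypothetical helper."""
    # Total number of counts in each sample (column).
    totals = df.sum(axis=0)
    # Assumption: factors are expressed relative to the mean total.
    scaling_factors = totals / totals.mean()
    normalized = df.div(scaling_factors, axis=1)
    return normalized, scaling_factors


raw = pd.DataFrame(
    {"sample_1": [10, 30], "sample_2": [20, 60]},
    index=["item_a", "item_b"],
)
normalized, scaling_factors = total_counts_normalize(raw)
print(normalized.sum(axis=0))
```

After this step every sample sums to the mean of the original totals, so counts stay on a scale comparable to the raw data.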
-
Processed
-
class
moonstone.normalization.processed.scaling_normalization.StandardScaler(raw_x)
ML algorithms such as SVM assume that all features are centered around zero and have similar variance. The scikit-learn function preprocessing.scale performs this normalization on a single array. More information: https://scikit-learn.org/stable/modules/preprocessing.html
-
scale()
Takes a NumPy array of the independent variables, or features, as ‘x’ for ML training.
-
property
scaled_x
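The standardization itself is simple enough to show with NumPy alone: subtract each feature's mean and divide by its standard deviation. The function below is a hypothetical stand-in written for illustration, not the moonstone class; it uses the population standard deviation and leaves constant features at zero instead of dividing by zero.

```python
import numpy as np


def standard_scale(x):
    """Center each column on zero and scale it to unit variance (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    # Avoid division by zero for constant features.
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


features = np.array(
    [
        [1.0, 10.0, 5.0],
        [2.0, 20.0, 5.0],
        [3.0, 30.0, 5.0],
    ]
)
scaled = standard_scale(features)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

The first two columns differ by a factor of ten before scaling and become identical afterwards, which is the property that distance-based models such as SVM rely on.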
-