Filtering
Simple
-
class moonstone.filtering.basics_filtering.NaNPercentageFiltering(dataframe, percentage=80, axis=0)
Remove rows (default) or columns whose percentage of NaN values exceeds a given percentage.
-
__init__(dataframe, percentage=80, axis=0)
- Parameters
percentage (Union[int, float]) – maximum percentage of NaN values allowed (between 0 and 100)
axis (int) – axis to apply filtering on: index (0) or columns (1)
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
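The rule described for NaNPercentageFiltering can be illustrated with plain pandas. This is a minimal sketch of the idea, not moonstone's implementation: the helper name `drop_by_nan_percentage` is hypothetical, and treating "above" as strictly greater than `percentage` is an assumption.

```python
import numpy as np
import pandas as pd


def drop_by_nan_percentage(df, percentage=80, axis=0):
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) whose
    share of NaN values is strictly above `percentage`."""
    # To judge each row, the NaN share is averaged across its columns (and vice versa).
    nan_share = df.isna().mean(axis=1 if axis == 0 else 0) * 100
    keep = nan_share <= percentage
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


df = pd.DataFrame(
    {"s1": [1.0, np.nan, 3.0], "s2": [np.nan, np.nan, 6.0], "s3": [7.0, np.nan, 9.0]},
    index=["a", "b", "c"],
)
# Row "b" is entirely NaN, so it is the only row above the 50% limit.
filtered = drop_by_nan_percentage(df, percentage=50)
```

With `axis=1` the same helper judges columns instead, which in this toy table would drop `s2` (two NaN values out of three).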
-
-
class moonstone.filtering.basics_filtering.NamesFiltering(dataframe, names, axis=0, keep=True)
Filtering based on row (default) or column names.
-
__init__(dataframe, names, axis=0, keep=True)
- Parameters
names (List[str]) – list of row or column names
axis (int) – axis to apply filtering on: index (0) or columns (1)
keep (bool) – keep the named rows or columns (discard them if set to False)
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
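The effect of `names`, `axis` and `keep` can be sketched in plain pandas as follows. The helper `filter_by_names` is hypothetical and only mirrors the documented parameters; how moonstone itself handles names that are absent from the dataframe is not covered here.

```python
import pandas as pd


def filter_by_names(df, names, axis=0, keep=True):
    """Hypothetical helper: keep (or discard) the rows or columns listed in `names`."""
    labels = df.index if axis == 0 else df.columns
    mask = labels.isin(names)
    if not keep:
        # keep=False inverts the selection: the named labels are discarded.
        mask = ~mask
    return df.loc[mask] if axis == 0 else df.loc[:, mask]


df = pd.DataFrame(
    {"s1": [1, 2, 3], "s2": [4, 5, 6]},
    index=["gene_a", "gene_b", "gene_c"],
)
kept_rows = filter_by_names(df, ["gene_a", "gene_c"])
without_s1 = filter_by_names(df, ["s1"], axis=1, keep=False)
```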
-
-
class moonstone.filtering.basics_filtering.NoCountsFiltering(dataframe, axis=0)
Remove rows (default) or columns with no counts.
-
__init__(dataframe, axis=0)
- Parameters
axis (int) – axis to apply filtering on: index (0) or columns (1)
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
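As a rough illustration, "no counts" can be read as a row or column whose values sum to zero. That reading is an assumption (it presumes non-negative count data), and the helper `drop_no_counts` below is hypothetical rather than part of moonstone.

```python
import pandas as pd


def drop_no_counts(df, axis=0):
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) that sum to zero."""
    totals = df.sum(axis=1 if axis == 0 else 0)
    keep = totals != 0
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


df = pd.DataFrame(
    {"s1": [5, 0, 2], "s2": [0, 0, 0], "s3": [1, 0, 4]},
    index=["a", "b", "c"],
)
rows_with_counts = drop_no_counts(df)             # drops row "b"
columns_with_counts = drop_no_counts(df, axis=1)  # drops column "s2"
```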
-
-
class moonstone.filtering.basics_filtering.NumberOfDifferentValuesFiltering(dataframe, min=None, max=None, na=False, axis=0)
Filtering of rows (default) or columns based on the number of different (unique) values they hold.
-
__init__(dataframe, min=None, max=None, na=False, axis=0)
- Parameters
min (Optional[int]) – minimum number of different values accepted
max (Optional[int]) – maximum number of different values accepted
na (bool) – whether NaN is counted as a different value
axis (int) – axis to apply filtering on: index (0) or columns (1)
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
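The interplay of `min`, `max` and `na` can be sketched with `DataFrame.nunique`. This is an approximation under stated assumptions: both bounds are treated as inclusive, and the helper `filter_by_n_unique` with its `min_values`/`max_values` arguments is hypothetical (renamed to avoid shadowing Python built-ins).

```python
import numpy as np
import pandas as pd


def filter_by_n_unique(df, min_values=None, max_values=None, na=False, axis=0):
    """Hypothetical helper: keep rows (axis=0) or columns (axis=1) whose number
    of distinct values lies within [min_values, max_values]."""
    # dropna=False makes NaN count as one additional distinct value.
    counts = df.nunique(axis=1 if axis == 0 else 0, dropna=not na)
    keep = pd.Series(True, index=counts.index)
    if min_values is not None:
        keep &= counts >= min_values
    if max_values is not None:
        keep &= counts <= max_values
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


df = pd.DataFrame(
    {
        "s1": [1.0, 1.0, 1.0, 1.0],
        "s2": [1.0, 2.0, 2.0, np.nan],
        "s3": [1.0, 3.0, 2.0, np.nan],
    },
    index=["a", "b", "c", "d"],
)
at_least_two = filter_by_n_unique(df, min_values=2)
at_least_two_with_nan = filter_by_n_unique(df, min_values=2, na=True)
```

Row `d` holds a single non-NaN value, so it only passes the `min_values=2` check once NaN is counted as a value of its own.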
-
By mean
-
class moonstone.filtering.mean_filtering.MeanFiltering(dataframe, threshold=None, percentage_to_keep=90)
Remove items with a mean read count below a given threshold.
You can give either a mean read count threshold or the percentage of data that you wish to keep (the threshold will then be computed for you).
-
__init__(dataframe, threshold=None, percentage_to_keep=90)
- Parameters
threshold (Optional[float]) – mean read count threshold; when not specified, the threshold is computed from percentage_to_keep
percentage_to_keep (Union[int, float]) – percentage of reads you wish to keep, between 0 and 100; overridden if threshold is set
-
compute_threshold_best_n_percent()
Compute a threshold based on the percentage of reads to keep. This method is called by filter() when no threshold is given.
- Return type
float
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Generate a report summarizing the filtering applied to the data (parameters, results).
- Return type
dict
-
property report_data
- Return type
dict
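The two modes described above (explicit `threshold`, or a threshold derived from `percentage_to_keep`) can be sketched in plain pandas. How moonstone actually derives the threshold is not stated here, so `threshold_for_read_share` below is only one plausible reading and an assumption: it returns the mean of the last item needed for the most abundant items to hold at least the requested share of reads. Both helper names are hypothetical, and items are assumed to be rows.

```python
import pandas as pd


def threshold_for_read_share(df, percentage_to_keep=90):
    """Assumed interpretation: smallest row mean among the most abundant rows
    that together hold at least `percentage_to_keep` percent of the reads."""
    means = df.mean(axis=1).sort_values(ascending=False)
    cumulative_share = means.cumsum() / means.sum() * 100
    reached = cumulative_share >= percentage_to_keep
    return float(means[reached].iloc[0])


def filter_by_mean(df, threshold=None, percentage_to_keep=90):
    """Hypothetical helper: drop rows whose mean read count is below the threshold."""
    if threshold is None:
        # An explicit threshold takes precedence over percentage_to_keep.
        threshold = threshold_for_read_share(df, percentage_to_keep)
    return df.loc[df.mean(axis=1) >= threshold]


df = pd.DataFrame({"s1": [70, 10, 4], "s2": [90, 20, 6]}, index=["a", "b", "c"])
computed = threshold_for_read_share(df, percentage_to_keep=90)
kept_by_share = filter_by_mean(df)
kept_by_threshold = filter_by_mean(df, threshold=50)
```

In this toy table the row means are 80, 15 and 5, so rows `a` and `b` together hold 95% of the reads and the derived threshold is the mean of `b`.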
-
Taxonomy
-
class moonstone.filtering.taxonomy_filtering.TaxonomyMeanFiltering(dataframe, mean_value, level='species')
Filtering a Taxonomy multiindexed dataframe on sample mean at a chosen level.
You choose a mean value across all samples, and every taxon at the selected level whose mean falls below it is discarded.
-
__init__(dataframe, mean_value, level='species')
- Parameters
mean_value (float) – minimum mean across all samples for a taxon to be kept
level (str) – level of the MultiIndex to filter on
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
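A rough pandas sketch of filtering on the mean at a chosen MultiIndex level is shown below. It rests on assumptions not confirmed by the text above: counts are summed per taxon at the chosen level before the mean across samples is taken, and a taxon whose mean equals `mean_value` is kept. The helper `filter_taxa_by_mean` and the toy data are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("Bacteroides", "Bacteroides fragilis"),
        ("Bacteroides", "Bacteroides ovatus"),
        ("Escherichia", "Escherichia coli"),
    ],
    names=["genus", "species"],
)
df = pd.DataFrame({"s1": [10, 2, 1], "s2": [20, 4, 3]}, index=index)


def filter_taxa_by_mean(df, mean_value, level="species"):
    """Hypothetical helper: keep rows whose taxon at `level` has a mean
    count across samples of at least `mean_value`."""
    # Sum counts per taxon at the chosen level, then average over samples.
    level_means = df.groupby(level=level).sum().mean(axis=1)
    kept = level_means[level_means >= mean_value].index
    return df[df.index.get_level_values(level).isin(kept)]


by_species = filter_taxa_by_mean(df, mean_value=5)
by_genus = filter_taxa_by_mean(df, mean_value=5, level="genus")
```

Filtering at the genus level keeps both Bacteroides species because their summed counts average 18 across the two samples, whereas filtering at the species level keeps only Bacteroides fragilis.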
-
-
class moonstone.filtering.taxonomy_filtering.TaxonomyNamesFiltering(dataframe, names, level='species', keep=True)
Filtering a Taxonomy multiindexed dataframe on index names at a chosen level.
-
__init__(dataframe, names, level='species', keep=True)
- Parameters
names (List[str]) – list of index names
level (str) – level of the MultiIndex to filter on
keep (bool) – keep the entries matching the names (discard them if set to False)
-
property filtered_df
Retrieves the filtered pandas dataframe.
- Return type
DataFrame
-
generate_report_data()
Overload this method to perform data reporting in child classes.
- Return type
dict
-
property report_data
- Return type
dict
-
visualize(write_file=False)
Generate visualization for the module.
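Selecting by name at one level of a MultiIndex can be sketched with `get_level_values`. The helper `filter_taxa_by_names` and the toy data are hypothetical; the sketch only mirrors the documented `names`, `level` and `keep` parameters and is not moonstone's own code.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("Bacteroides", "Bacteroides fragilis"),
        ("Bacteroides", "Bacteroides ovatus"),
        ("Escherichia", "Escherichia coli"),
    ],
    names=["genus", "species"],
)
df = pd.DataFrame({"s1": [10, 2, 1], "s2": [20, 4, 3]}, index=index)


def filter_taxa_by_names(df, names, level="species", keep=True):
    """Hypothetical helper: keep (or discard) rows whose label at `level` is in `names`."""
    mask = df.index.get_level_values(level).isin(names)
    return df[mask] if keep else df[~mask]


only_bacteroides = filter_taxa_by_names(df, ["Bacteroides"], level="genus")
without_e_coli = filter_taxa_by_names(df, ["Escherichia coli"], keep=False)
```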
-