Filtering

Simple

class moonstone.filtering.basics_filtering.NaNPercentageFiltering(dataframe, percentage=80, axis=0)

Remove rows (default) or columns whose percentage of NaN values exceeds a given threshold.

__init__(dataframe, percentage=80, axis=0)
Parameters
  • percentage (Union[int, float]) – maximum percentage of NaN values allowed (between 0 and 100)

  • axis (int) – axis to apply filtering on (index (0) or columns (1))
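The rule described above can be approximated in plain pandas. This is a minimal sketch, not moonstone's implementation: the helper name `drop_by_nan_percentage` and the sample data are hypothetical, and it assumes that a row or column is kept when its NaN percentage is less than or equal to `percentage`.

```python
import numpy as np
import pandas as pd


def drop_by_nan_percentage(df, percentage=80, axis=0):
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) with too many NaN."""
    # The NaN share of a row is a reduction across columns (pandas axis=1),
    # and the NaN share of a column is a reduction across rows, hence `1 - axis`.
    nan_pct = df.isna().mean(axis=1 - axis) * 100
    keep = nan_pct <= percentage  # assumption: the bound is inclusive
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


counts = pd.DataFrame(
    {
        "s1": [1, np.nan, np.nan],
        "s2": [2, 5, np.nan],
        "s3": [3, np.nan, np.nan],
        "s4": [4, 6, np.nan],
    },
    index=["a", "b", "c"],
)

# Row "c" is 100% NaN and is dropped; rows "a" (0%) and "b" (50%) stay.
rows_kept = drop_by_nan_percentage(counts, percentage=80)

# Columns "s1" and "s3" are about 67% NaN and are dropped at a 50% limit.
cols_kept = drop_by_nan_percentage(counts, percentage=50, axis=1)
```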

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict
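The override pattern mentioned above might look like the sketch below. Both classes are hypothetical stand-ins written for illustration (`BaseFilter` is not the library's actual base class, and the report keys are invented); the only point is that a child class redefines `generate_report_data()` to return a dict describing its own filtering.

```python
import pandas as pd


class BaseFilter:
    """Hypothetical stand-in for a base filtering class."""

    def __init__(self, dataframe):
        self.df = dataframe

    def generate_report_data(self) -> dict:
        # Default behaviour: nothing to report.
        return {}


class ZeroRowFilter(BaseFilter):
    """Hypothetical child class that drops all-zero rows and reports on it."""

    def filter(self) -> pd.DataFrame:
        return self.df.loc[self.df.sum(axis=1) != 0]

    def generate_report_data(self) -> dict:
        # Overridden to summarize what the filtering did.
        filtered = self.filter()
        return {"rows_before": len(self.df), "rows_after": len(filtered)}


counts = pd.DataFrame({"s1": [3, 0, 1], "s2": [0, 0, 2]}, index=["a", "b", "c"])
report = ZeroRowFilter(counts).generate_report_data()
```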

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.filtering.basics_filtering.NamesFiltering(dataframe, names, axis=0, keep=True)

Filtering based on row (default) or column names.

__init__(dataframe, names, axis=0, keep=True)
Parameters
  • names (List[str]) – list of row or column names

  • axis (int) – axis to apply filtering on (index (0) or columns (1))

  • keep (bool) – keep the named rows or columns (discard them if set to False)
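For illustration, the same keep-or-discard behaviour can be written with a boolean mask in plain pandas. This is a sketch under the assumptions stated in the comments, not the library's code; `filter_by_names` and the sample data are hypothetical.

```python
import pandas as pd


def filter_by_names(df, names, axis=0, keep=True):
    """Hypothetical helper: keep (or discard) rows/columns whose label is in `names`."""
    labels = df.index if axis == 0 else df.columns
    mask = labels.isin(names)
    if not keep:
        # keep=False inverts the selection: the named labels are discarded.
        mask = ~mask
    return df.loc[mask] if axis == 0 else df.loc[:, mask]


counts = pd.DataFrame({"s1": [3, 0, 1], "s2": [0, 4, 2]}, index=["a", "b", "c"])

kept = filter_by_names(counts, ["a", "c"])                   # rows a and c
discarded = filter_by_names(counts, ["a", "c"], keep=False)  # row b only
one_column = filter_by_names(counts, ["s1"], axis=1)         # column s1 only
```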

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.filtering.basics_filtering.NoCountsFiltering(dataframe, axis=0)

Remove rows (default) or columns with no counts.

__init__(dataframe, axis=0)
Parameters

axis (int) – axis to apply filtering on (index (0) or columns (1))
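A rough pandas equivalent is sketched below. It reads "no counts" as "the counts sum to zero", which is an assumption about the library's behaviour; the helper name `drop_without_counts` and the data are hypothetical.

```python
import pandas as pd


def drop_without_counts(df, axis=0):
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) that sum to zero."""
    # Row totals are a reduction across columns (pandas axis=1), hence `1 - axis`.
    totals = df.sum(axis=1 - axis)
    keep = totals != 0
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


counts = pd.DataFrame(
    {"s1": [3, 0, 1], "s2": [0, 0, 2], "s3": [0, 0, 0]},
    index=["a", "b", "c"],
)

rows_with_counts = drop_without_counts(counts)          # row b is removed
cols_with_counts = drop_without_counts(counts, axis=1)  # column s3 is removed
```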

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.filtering.basics_filtering.NumberOfDifferentValuesFiltering(dataframe, min=None, max=None, na=False, axis=0)

Filtering of rows (default) or columns based on the number of different (unique) values they hold.

__init__(dataframe, min=None, max=None, na=False, axis=0)
Parameters
  • min (Optional[int]) – minimum number of different values accepted

  • max (Optional[int]) – maximum number of different values accepted

  • na (bool) – whether NaN is counted as a distinct value

  • axis (int) – axis to apply filtering on (index (0) or columns (1))
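The parameters above map naturally onto `DataFrame.nunique`. The sketch below is an illustration rather than the library's implementation: `filter_by_unique_count` is a hypothetical helper, its `min_values` and `max_values` arguments stand in for `min` and `max` (renamed to avoid shadowing Python built-ins), and both bounds are assumed to be inclusive.

```python
import numpy as np
import pandas as pd


def filter_by_unique_count(df, min_values=None, max_values=None, na=False, axis=0):
    """Hypothetical helper: keep rows/columns whose number of distinct values is in range."""
    # dropna=False makes pandas count NaN as one extra distinct value.
    n_unique = df.nunique(axis=1 - axis, dropna=not na)
    keep = pd.Series(True, index=n_unique.index)
    if min_values is not None:
        keep &= n_unique >= min_values  # assumption: inclusive lower bound
    if max_values is not None:
        keep &= n_unique <= max_values  # assumption: inclusive upper bound
    return df.loc[keep] if axis == 0 else df.loc[:, keep]


table = pd.DataFrame(
    [[1, 1, 1], [1, 2, 2], [1, 2, 3], [1, np.nan, np.nan]],
    index=["a", "b", "c", "d"],
    columns=["s1", "s2", "s3"],
)

at_least_two = filter_by_unique_count(table, min_values=2)                  # rows b, c
at_least_two_with_na = filter_by_unique_count(table, min_values=2, na=True)  # rows b, c, d
at_most_one = filter_by_unique_count(table, max_values=1)                   # rows a, d
```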

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

By mean

class moonstone.filtering.mean_filtering.MeanFiltering(dataframe, threshold=None, percentage_to_keep=90)

Remove items with a mean read count below a given threshold.

You can either give a mean read count threshold or the percentage of data that you wish to keep (the threshold will then be computed for you).

__init__(dataframe, threshold=None, percentage_to_keep=90)
Parameters
  • threshold (Optional[float]) – mean read count threshold; if not specified, the threshold is computed from percentage_to_keep

  • percentage_to_keep (Union[int, float]) – percentage of reads you wish to keep, between 0 and 100; ignored if threshold is set
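With an explicit threshold, the filtering reduces to comparing each item's mean read count against it. The sketch below assumes items are rows and samples are columns, and that items whose mean equals the threshold are kept; `filter_by_mean` and the data are hypothetical, and this is not the library's code.

```python
import pandas as pd


def filter_by_mean(df, threshold):
    """Hypothetical helper: keep rows whose mean read count reaches `threshold`."""
    row_means = df.mean(axis=1)  # mean across samples for each item
    return df.loc[row_means >= threshold]


counts = pd.DataFrame(
    {"s1": [50, 20, 6, 1], "s2": [70, 40, 10, 3]},
    index=["a", "b", "c", "d"],
)

# Row means are 60, 30, 8 and 2, so a threshold of 10 keeps rows a and b.
filtered = filter_by_mean(counts, threshold=10)
```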

compute_threshold_best_n_percent()

Compute a threshold based on the percentage of reads to keep. This method is called by filter() when no threshold is given.

Return type

float
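The exact computation is not described here. One plausible reading, sketched below purely as an assumption, is to rank items by mean read count and take as threshold the mean of the last item needed for the highest-ranked items to account for at least `percentage_to_keep` percent of the total. The helper name `threshold_for_percentage` and the data are hypothetical.

```python
import pandas as pd


def threshold_for_percentage(df, percentage_to_keep=90):
    """Hypothetical helper: mean count at which the top items cover the target share."""
    # Rank items from highest to lowest mean read count.
    means = df.mean(axis=1).sort_values(ascending=False)
    # Share of the total (in percent) covered by the top 1, 2, 3, ... items.
    cumulative_pct = means.cumsum() / means.sum() * 100
    reached = cumulative_pct >= percentage_to_keep
    # The first item that reaches the target share defines the threshold.
    return float(means[reached].iloc[0])


counts = pd.DataFrame(
    {"s1": [50, 20, 6, 1], "s2": [70, 40, 10, 3]},
    index=["a", "b", "c", "d"],
)

# Row means are 60, 30, 8 and 2; the top two rows cover 90% of the total,
# so asking for 85% yields the second row's mean as threshold.
threshold = threshold_for_percentage(counts, percentage_to_keep=85)
```

Under this reading, keeping every row whose mean is at least `threshold` retains at least the requested share of the total mean read count.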

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Generate a report summarizing the filtering applied to the data (parameters and results).

Return type

dict

property report_data
Return type

dict

visualize(html_output_file='')

Visualize the effect of the filtering on the data.

Parameters

html_output_file (str) – name of the HTML output file

Taxonomy

class moonstone.filtering.taxonomy_filtering.TaxonomyMeanFiltering(dataframe, mean_value, level='species')

Filtering a Taxonomy multiindexed dataframe on sample mean at a chosen level.

You choose a mean value computed across all samples, and every taxon at the chosen level whose mean falls below it is discarded.

__init__(dataframe, mean_value, level='species')
Parameters
  • mean_value (float) – minimum mean across all samples required for a taxon to be kept

  • level (str) – level of the MultiIndex to filter on
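To make the idea concrete, here is a minimal pandas sketch on a two-level taxonomy index. It assumes that counts of rows sharing the same name at `level` are summed before the mean across samples is compared with `mean_value`; that aggregation step, the helper name `filter_taxonomy_by_mean`, and the data are all assumptions made for illustration.

```python
import pandas as pd


def filter_taxonomy_by_mean(df, mean_value, level="species"):
    """Hypothetical helper: keep rows whose taxon at `level` has a mean >= mean_value."""
    # Mean across samples per row, then summed over rows sharing a name at `level`.
    level_means = df.mean(axis=1).groupby(level=level).sum()
    kept_names = level_means[level_means >= mean_value].index
    mask = df.index.get_level_values(level).isin(kept_names)
    return df.loc[mask]


index = pd.MultiIndex.from_tuples(
    [
        ("Bacteroides", "Bacteroides fragilis"),
        ("Bacteroides", "Bacteroides ovatus"),
        ("Escherichia", "Escherichia coli"),
    ],
    names=["genus", "species"],
)
counts = pd.DataFrame({"s1": [10, 2, 1], "s2": [20, 4, 1]}, index=index)

# Species means are 15, 3 and 1: a value of 2 drops Escherichia coli.
by_species = filter_taxonomy_by_mean(counts, mean_value=2, level="species")

# Genus means are 18 (Bacteroides) and 1 (Escherichia): a value of 5 keeps
# both Bacteroides rows.
by_genus = filter_taxonomy_by_mean(counts, mean_value=5, level="genus")
```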

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module

class moonstone.filtering.taxonomy_filtering.TaxonomyNamesFiltering(dataframe, names, level='species', keep=True)

Filtering a Taxonomy multiindexed dataframe on index names at a chosen level.

__init__(dataframe, names, level='species', keep=True)
Parameters
  • names (List[str]) – list of index names

  • level (str) – level of the MultiIndex to filter on

  • keep (bool) – keep the named items (discard them if set to False)
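Selecting by name at one level of a MultiIndex can be sketched in plain pandas as follows. The helper name `filter_taxonomy_by_names` and the data are hypothetical, and the code illustrates the idea rather than reproducing the library's implementation.

```python
import pandas as pd


def filter_taxonomy_by_names(df, names, level="species", keep=True):
    """Hypothetical helper: keep (or discard) rows whose name at `level` is in `names`."""
    mask = df.index.get_level_values(level).isin(names)
    if not keep:
        # keep=False discards the named taxa instead of keeping them.
        mask = ~mask
    return df.loc[mask]


index = pd.MultiIndex.from_tuples(
    [
        ("Bacteroides", "Bacteroides fragilis"),
        ("Bacteroides", "Bacteroides ovatus"),
        ("Escherichia", "Escherichia coli"),
    ],
    names=["genus", "species"],
)
counts = pd.DataFrame({"s1": [10, 2, 1], "s2": [20, 4, 1]}, index=index)

only_bacteroides = filter_taxonomy_by_names(counts, ["Bacteroides"], level="genus")
without_e_coli = filter_taxonomy_by_names(counts, ["Escherichia coli"], keep=False)
```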

filter()

Filter the items of the pandas DataFrame.

Return type

DataFrame

property filtered_df

Retrieve the filtered pandas DataFrame.

Return type

DataFrame

generate_report_data()

Overload this method to perform data reporting in child classes

Return type

dict

property report_data
Return type

dict

visualize(write_file=False)

Generate visualization for the module