Parsers

Counts

Simple Counts

class moonstone.parsers.counts.genes.GeneCountsParser(*args, **kwargs)

Common way of representing gene counts per sample in a matrix.

Format is the following:

genes     sample_1   sample_2
gene_1    3          19
gene_2    9          10

__init__(*args, **kwargs)
Parameters
  • file_path – path of the input file to be parsed

  • sep – delimiter to use (same behaviour as read_csv from pandas)

  • no_header – set to True if table has no header

  • parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)
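
The roles of sep and no_header can be illustrated with a minimal sketch that uses only the standard library. It is not the parser's implementation (which relies on read_csv from pandas); the read_counts helper and the positional sample names used when there is no header are assumptions made for this example.

```python
import csv
import io


def read_counts(text, sep="\t", no_header=False):
    """Read a feature-by-sample count matrix into {sample: {feature: count}}.

    Hypothetical helper, for illustration only.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if no_header:
        # Without a header row, name the samples by position (assumption).
        samples = [f"sample_{i}" for i in range(1, len(rows[0]))]
    else:
        # The first cell of the header names the feature column.
        samples = rows[0][1:]
        rows = rows[1:]
    counts = {sample: {} for sample in samples}
    for feature, *values in rows:
        for sample, value in zip(samples, values):
            counts[sample][feature] = float(value)
    return counts


table = "genes\tsample_1\tsample_2\ngene_1\t3\t19\ngene_2\t9\t10\n"
counts = read_counts(table)
print(counts["sample_2"]["gene_1"])  # 19.0
```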

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

property plotter

Access to instance dedicated to visualization for this type of data.

class moonstone.parsers.counts.picrust2.Picrust2PathwaysParser(*args, **kwargs)

Predicted sample pathway abundances output file from Picrust2.

Format is the following:

pathways    sample_1   sample_2
pathway_1   14.3       123.4
pathway_2   94.1       1232.1

__init__(*args, **kwargs)
Parameters
  • file_path – path of the input file to be parsed

  • sep – delimiter to use (same behaviour as read_csv from pandas)

  • no_header – set to True if table has no header

  • parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

property plotter

Access to instance dedicated to visualization for this type of data.

Taxonomy Counts

class moonstone.parsers.counts.taxonomy.kraken2.SunbeamKraken2Parser(*args, **kwargs)

Parse the merged Kraken2 table produced by the Sunbeam pipeline.

PLOT_CLASS

alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, **kwargs)
Parameters
  • file_path – path of the input file to be parsed

  • sep – delimiter to use (same behaviour as read_csv from pandas)

  • no_header – set to True if table has no header

  • parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

new_otu_id_name = 'NCBI_taxonomy_ID'

property plotter

Access to instance dedicated to visualization for this type of data.

property rank_level

Retrieve rank_level.

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)
Parameters

terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)

Return type

DataFrame
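
The idea behind the sep, taxo_prefix and terms_to_remove arguments can be sketched on a single lineage string. This is a standard-library illustration, not the method's actual code: the split_lineage helper, the RANKS list and the rule used to fill missing levels (parent name tagged with its rank) are all assumptions.

```python
# Hypothetical rank names, used only for this sketch.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_lineage(lineage, sep=";", taxo_prefix="__", terms_to_remove=None):
    """Split a lineage such as 'k__Bacteria;p__Firmicutes' into one name per rank."""
    terms_to_remove = set(terms_to_remove or [])
    names = []
    for part in lineage.split(sep):
        # 'g__Lactobacillus' -> 'Lactobacillus'; parts without a prefix are kept.
        name = part.strip().split(taxo_prefix, 1)[-1]
        if name in terms_to_remove:
            name = ""  # treat removed terms as missing levels
        names.append(name)

    result = {}
    last_known = None
    for i, rank in enumerate(RANKS):
        name = names[i] if i < len(names) else ""
        if name:
            last_known = (name, rank)
            result[rank] = name
        elif last_known:
            # Assumed fill rule: reuse the closest known parent, tagged with its rank.
            result[rank] = f"{last_known[0]} ({last_known[1]})"
        else:
            result[rank] = None
    return result


row = split_lineage(
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
    "f__Lactobacillaceae;g__Lactobacillus;s__uncultured",
    terms_to_remove=["uncultured"],
)
print(row["species"])  # Lactobacillus (genus)
```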

taxa_column = 'Consensus Lineage'

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']

class moonstone.parsers.counts.taxonomy.metaphlan.BaseMetaphlanParser(*args, analysis_type='rel_ab', **kwargs)

PLOT_CLASS

alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)
Parameters

analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)
Return type

DataFrame

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

property plotter

Access to instance dedicated to visualization for this type of data.

property rank_level

Retrieve rank_level.

remove_duplicates(df)
Return type

DataFrame

rows_differences(dataframe1, dataframe2)
Return type

DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)
Parameters

terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)

Return type

DataFrame

taxa_column = 'OTU ID'

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']

class moonstone.parsers.counts.taxonomy.metaphlan.Metaphlan2Parser(*args, analysis_type='rel_ab', **kwargs)

Parse output from Metaphlan2 merged table.

PLOT_CLASS

alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)
Parameters

analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)
Return type

DataFrame

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

header: Union[str, None]

property plotter

Access to instance dedicated to visualization for this type of data.

property rank_level

Retrieve rank_level.

remove_duplicates(df)
Return type

DataFrame

rows_differences(dataframe1, dataframe2)
Return type

DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)
Parameters

terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)

Return type

DataFrame

taxa_column = 'ID'

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']

class moonstone.parsers.counts.taxonomy.metaphlan.Metaphlan3Parser(*args, analysis_type='rel_ab', **kwargs)

Parse output from Metaphlan3 merged table.

NCBI_tax_column = 'NCBI_tax_id'

PLOT_CLASS

alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)
Parameters

analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)
Return type

DataFrame

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

header: Union[str, None]

property plotter

Access to instance dedicated to visualization for this type of data.

property rank_level

Retrieve rank_level.

remove_duplicates(df)
Return type

DataFrame

rows_differences(dataframe1, dataframe2)
Return type

DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)
Parameters

terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)

Return type

DataFrame

taxa_column = 'clade_name'

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']

class moonstone.parsers.counts.taxonomy.qiime.Qiime2Parser(*args, **kwargs)

Parse CSV output produced by Qiime2.

PLOT_CLASS

alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, **kwargs)
Parameters
  • file_path – path of the input file to be parsed

  • sep – delimiter to use (same behaviour as read_csv from pandas)

  • no_header – set to True if table has no header

  • parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

property plotter

Access to instance dedicated to visualization for this type of data.

property rank_level

Retrieve rank_level.

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)
Parameters

terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)

Return type

DataFrame

taxa_column = '#OTU ID'

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']

terms_to_remove = ['Ambiguous_taxa', 'Unknown Family', 'uncultured']

Metadata

Classes to handle metadata import.

class moonstone.parsers.metadata.MetadataParser(*args, index_col='sample', cleaning_operations=None, **kwargs)

Parse a metadata file and allow transformations (cleaning…) to be applied to it.

DEFAULT_COLORSCALE = [[0, 'rgb(166,206,227)'], [0.25, 'rgb(31,120,180)'], [0.45, 'rgb(178,223,138)'], [0.65, 'rgb(51,160,44)'], [0.85, 'rgb(251,154,153)'], [1, 'rgb(227,26,28)']]

__init__(*args, index_col='sample', cleaning_operations=None, **kwargs)

Parse a metadata file and allow transformations (cleaning…) to be applied to it.

Cleaning operations rely on the DataFrameCleaner object, which applies transformation operations to individual columns.

Format is the following:

{'col_name': [('operation1', 'operation1_options'), ('operation2', 'operation2_options')]}
Parameters
  • index_col (str) – name of the column used as dataframe index

  • cleaning_operations (Optional[dict]) – cleaning operations to apply to the input table
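
The shape of the cleaning_operations dictionary can be illustrated with a small standard-library sketch. It does not use DataFrameCleaner: the operation names ('strip', 'replace', 'to_int'), the dict-based options and the clean_rows helper are hypothetical and only show how a column name maps to an ordered list of (operation, options) pairs.

```python
# Hypothetical operations, invented for this sketch; the names and option
# formats accepted by the real DataFrameCleaner may differ.
OPERATIONS = {
    "strip": lambda value, options: value.strip(),
    "replace": lambda value, options: value.replace(options["old"], options["new"]),
    "to_int": lambda value, options: int(value),
}


def clean_rows(rows, cleaning_operations):
    """Apply each column's operations, in order, to every row."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for column, operations in cleaning_operations.items():
            for name, options in operations:
                new_row[column] = OPERATIONS[name](new_row[column], options)
        cleaned.append(new_row)
    return cleaned


cleaning_operations = {
    "sex": [("strip", {}), ("replace", {"old": "F", "new": "female"})],
    "age": [("to_int", {})],
}
rows = [{"sample": "s1", "sex": " F ", "age": "29"}]
cleaned = clean_rows(rows, cleaning_operations)
print(cleaned)
```

Operations listed for a column run in order, so here the value is stripped before the replacement is applied.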

property dataframe

Retrieve the pandas dataframe constructed from the input file.

Return type

DataFrame

get_stats()

Retrieve statistics about each column.

Return type

List[Dict]

Returns

list of dict containing statistics about each column
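
A list of per-column dictionaries of this kind could look like the following sketch. The keys ('col_name', 'n_values', 'n_unique', 'mean') and the column_stats helper are assumptions chosen for illustration; the keys actually returned by get_stats are not listed in this reference.

```python
import statistics


def column_stats(rows):
    """Build one statistics dict per column from a list of row dicts."""
    stats = []
    columns = list(rows[0].keys()) if rows else []
    for column in columns:
        values = [row[column] for row in rows]
        entry = {
            "col_name": column,
            "n_values": len(values),
            "n_unique": len(set(values)),
        }
        # Only numeric columns get a mean.
        if all(isinstance(v, (int, float)) for v in values):
            entry["mean"] = statistics.mean(values)
        stats.append(entry)
    return stats


metadata = [
    {"age": 20, "sex": "F"},
    {"age": 30, "sex": "M"},
    {"age": 40, "sex": "F"},
]
stats = column_stats(metadata)
print(stats)
```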

property plotter

Access to instance dedicated to visualization for this type of data.

visualize_categories(categories, color_by, colorscale=None, title='Metadata categories distribution', output_file='')

Visualize category metadata with parallel categories diagram.

Parameters
  • categories (list) – list of columns to display

  • color_by (str) – category used to color the diagram
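
A parallel categories diagram draws one ribbon per combination of category values, with a width proportional to the number of samples sharing that combination. The sketch below computes those counts with the standard library; it does no plotting, and the category_paths helper and sample data are made up for the example.

```python
from collections import Counter


def category_paths(rows, categories):
    """Count how many rows share each combination of the given categories."""
    return Counter(tuple(row[c] for c in categories) for row in rows)


metadata = [
    {"sex": "F", "smoker": "yes"},
    {"sex": "F", "smoker": "no"},
    {"sex": "M", "smoker": "no"},
    {"sex": "F", "smoker": "no"},
]
paths = category_paths(metadata, ["sex", "smoker"])
print(paths[("F", "no")])  # 2
```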

class moonstone.parsers.metadata.YAMLBasedMetadataParser(metadata_file_path, config_file_path, **kwargs)

Metadata Parser with operations configured in a YAML file.

__init__(metadata_file_path, config_file_path, **kwargs)

Metadata Parser with operations configured in a YAML file.