Parsers¶

Counts¶

Simple Counts¶

class moonstone.parsers.counts.genes.GeneCountsParser(*args, **kwargs)[source]¶

Common way of representing gene counts per sample in a matrix.

Format is the following:

genes	sample_1	sample_2
gene_1	3	19
gene_2	9	10

__init__(*args, **kwargs)[source]¶

Parameters

file_path – path of the input file to be parsed
sep – delimiter to use (same behaviour as read_csv from pandas)
no_header – set to True if table has no header
parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)
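
As a rough illustration of how these parameters relate to pandas, the sketch below reads the gene-count format shown above directly with pandas.read_csv. It does not call moonstone itself. Treating no_header=True as equivalent to header=None, and forwarding parsing_options as keyword arguments, are assumptions made for illustration rather than a description of the parser's internals.

```python
import io

import pandas as pd

# Tab-separated gene counts in the layout documented above.
raw = "genes\tsample_1\tsample_2\ngene_1\t3\t19\ngene_2\t9\t10\n"

# Hypothetical stand-ins for the parser arguments (names mirror the docs).
sep = "\t"
parsing_options = {"index_col": 0}  # extra keyword arguments for read_csv

df = pd.read_csv(io.StringIO(raw), sep=sep, **parsing_options)
print(df)

# A table without a header row is read with header=None in pandas;
# assuming this is what no_header=True corresponds to.
df_no_header = pd.read_csv(io.StringIO("gene_1\t3\t19\n"), sep=sep, header=None)
print(df_no_header)
```

Here the first column becomes the index only because index_col=0 is passed explicitly; whether the parser does this by default is not stated in this reference.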

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

property plotter¶: Access the instance dedicated to visualizing this type of data.

class moonstone.parsers.counts.picrust2.Picrust2PathwaysParser(*args, **kwargs)[source]¶

Predicted sample pathway abundances output file from Picrust2.

Format is the following:

pathways	sample_1	sample_2
pathway_1	14.3	123.4
pathway_2	94.1	1232.1

__init__(*args, **kwargs)[source]¶

Parameters

file_path – path of the input file to be parsed
sep – delimiter to use (same behaviour as read_csv from pandas)
no_header – set to True if table has no header
parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

property plotter¶: Access the instance dedicated to visualizing this type of data.

Taxonomy Counts¶

class moonstone.parsers.counts.taxonomy.kraken2.SunbeamKraken2Parser(*args, **kwargs)[source]¶

Parse the merged Kraken2 table produced by the Sunbeam pipeline.

PLOT_CLASS¶: alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, **kwargs)[source]¶

Parameters

file_path – path of the input file to be parsed
sep – delimiter to use (same behaviour as read_csv from pandas)
no_header – set to True if table has no header
parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

new_otu_id_name = 'NCBI_taxonomy_ID'¶

property plotter¶: Access the instance dedicated to visualizing this type of data.

property rank_level¶: retrieves rank_level

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)¶

Parameters: terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)
Return type: DataFrame
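
The signature suggests that a lineage string is split on sep, that rank prefixes ending in taxo_prefix are stripped, and that unwanted terms are dropped. The sketch below shows that general idea on a single string in plain Python. It is not moonstone's implementation: the function name split_lineage, the shortened rank list, and the choice to represent missing ranks as None are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Shortened, illustrative rank list (not the class attribute itself).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_lineage(
    lineage: str,
    sep: str = ";",
    taxo_prefix: str = "__",
    terms_to_remove: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    """Split one lineage string into a rank -> name mapping (hypothetical helper)."""
    terms = set(terms_to_remove or [])
    names: List[Optional[str]] = []
    for part in lineage.split(sep):
        part = part.strip()
        # Drop the rank prefix, e.g. "g__Bacteroides" -> "Bacteroides".
        _, found, name = part.partition(taxo_prefix)
        name = name if found else part
        # Empty names and unwanted terms are treated as missing.
        names.append(None if not name or name in terms else name)
    # Pad with None so that every rank is present in the result.
    names += [None] * (len(RANKS) - len(names))
    return dict(zip(RANKS, names))


row = split_lineage(
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__uncultured",
    terms_to_remove=["uncultured"],
)
print(row)
```

Ranks are assigned by position, so the sketch assumes the lineage lists its levels in order without gaps.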

taxa_column = 'Consensus Lineage'¶

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']¶

class moonstone.parsers.counts.taxonomy.metaphlan.BaseMetaphlanParser(*args, analysis_type='rel_ab', **kwargs)[source]¶

PLOT_CLASS¶: alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)[source]¶

Parameters: analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)[source]¶

Return type: DataFrame

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

property plotter¶: Access the instance dedicated to visualizing this type of data.

property rank_level¶: retrieves rank_level

remove_duplicates(df)[source]¶

Return type: DataFrame

rows_differences(dataframe1, dataframe2)[source]¶

Return type: DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)¶

Parameters: terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)
Return type: DataFrame

taxa_column = 'OTU ID'¶

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']¶

class moonstone.parsers.counts.taxonomy.metaphlan.Metaphlan2Parser(*args, analysis_type='rel_ab', **kwargs)[source]¶

Parse output from Metaphlan2 merged table.

PLOT_CLASS¶: alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)¶

Parameters: analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)¶

Return type: DataFrame

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

header: Union[str, None]¶

property plotter¶: Access the instance dedicated to visualizing this type of data.

property rank_level¶: retrieves rank_level

remove_duplicates(df)¶

Return type: DataFrame

rows_differences(dataframe1, dataframe2)¶

Return type: DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)¶

Parameters: terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)
Return type: DataFrame

taxa_column = 'ID'¶

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']¶

class moonstone.parsers.counts.taxonomy.metaphlan.Metaphlan3Parser(*args, analysis_type='rel_ab', **kwargs)[source]¶

Parse output from Metaphlan3 merged table.

NCBI_tax_column = 'NCBI_tax_id'¶

PLOT_CLASS¶: alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, analysis_type='rel_ab', **kwargs)[source]¶

Parameters: analysis_type (str) – output type of Metaphlan3 (see -t option of metaphlan3)

compare_difference_between_two_levels(whole_df, df_at_lower_level, rank)¶

Return type: DataFrame

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

header: Union[str, None]¶

property plotter¶: Access the instance dedicated to visualizing this type of data.

property rank_level¶: retrieves rank_level

remove_duplicates(df)¶

Return type: DataFrame

rows_differences(dataframe1, dataframe2)¶

Return type: DataFrame

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)¶

Parameters: terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)
Return type: DataFrame

taxa_column = 'clade_name'¶

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']¶

class moonstone.parsers.counts.taxonomy.qiime.Qiime2Parser(*args, **kwargs)[source]¶

Parse CSV output data produced by Qiime2.

PLOT_CLASS¶: alias of moonstone.plot.counts.PlotTaxonomyCounts

__init__(*args, **kwargs)[source]¶

Parameters

file_path – path of the input file to be parsed
sep – delimiter to use (same behaviour as read_csv from pandas)
no_header – set to True if table has no header
parsing_options – extra parsing options passed to the read_csv method (see pandas documentation)

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

property plotter¶: Access the instance dedicated to visualizing this type of data.

property rank_level¶: retrieves rank_level

split_taxa_fill_none(df, sep=';', taxo_prefix='__', merge_genus_species=False, terms_to_remove=None)¶

Parameters: terms_to_remove (Optional[List]) – if specified, list of terms to remove from taxa names (e.g. uncultured)
Return type: DataFrame

taxa_column = '#OTU ID'¶

taxonomical_names = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'sTrain']¶

terms_to_remove = ['Ambiguous_taxa', 'Unknown Family', 'uncultured']¶

Metadata¶

Classes to handle metadata import.

class moonstone.parsers.metadata.MetadataParser(*args, index_col='sample', cleaning_operations=None, **kwargs)[source]¶

Parse a metadata file and allow transformations (such as cleaning) to be applied to it.

DEFAULT_COLORSCALE = [[0, 'rgb(166,206,227)'], [0.25, 'rgb(31,120,180)'], [0.45, 'rgb(178,223,138)'], [0.65, 'rgb(51,160,44)'], [0.85, 'rgb(251,154,153)'], [1, 'rgb(227,26,28)']]¶

__init__(*args, index_col='sample', cleaning_operations=None, **kwargs)[source]¶

Parse a metadata file and allow transformations (such as cleaning) to be applied to it.

Cleaning operations rely on the DataFrameCleaner object, which performs transformation operations on individual columns.

Format is the following:

{'col_name': [('operation1', 'operation1_options'), ('operation2', 'operation2_options')]}

Parameters

index_col (str) – name of the column used as dataframe index
cleaning_operations (Optional[dict]) – cleaning operations to apply to the input table
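
To make the shape of cleaning_operations concrete, the sketch below applies a dictionary in that format to a few rows in plain Python. The operation names ("strip", "replace") and the small dispatch table are hypothetical stand-ins chosen for illustration; they are not taken from DataFrameCleaner, whose available operations are not listed in this reference.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical operations for illustration only; these names are NOT taken
# from moonstone's DataFrameCleaner.
OPERATIONS: Dict[str, Callable[[str, Any], str]] = {
    "strip": lambda value, options: value.strip(options),
    "replace": lambda value, options: value.replace(*options),
}

# Same shape as the documented format: column -> list of (operation, options).
cleaning_operations: Dict[str, List[Tuple[str, Any]]] = {
    "sex": [("strip", None), ("replace", ("female", "F"))],
}

rows = [{"sample": "s1", "sex": " female "}, {"sample": "s2", "sex": "male"}]

# Apply each column's operations in the order they are listed.
for column, operations in cleaning_operations.items():
    for name, options in operations:
        for row in rows:
            row[column] = OPERATIONS[name](row[column], options)

print(rows)
```

The point of the format is that operations are declared per column and run in list order, so later operations see the output of earlier ones.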

property dataframe¶

Retrieve the pandas dataframe constructed from the input file.

Return type: DataFrame

get_stats()[source]¶

Retrieve statistics about each column.

Return type: List[Dict]
Returns: list of dict containing statistics about each column

property plotter¶: Access the instance dedicated to visualizing this type of data.

visualize_categories(categories, color_by, colorscale=None, title='Metadata categories distribution', output_file='')[source]¶

Visualize categorical metadata with a parallel categories diagram.

Parameters

categories (list) – list of columns to display
color_by (str) – column whose values are used to color the diagram

class moonstone.parsers.metadata.YAMLBasedMetadataParser(metadata_file_path, config_file_path, **kwargs)[source]¶

Metadata Parser with operations configured in a YAML file.

__init__(metadata_file_path, config_file_path, **kwargs)[source]¶: Metadata Parser with operations configured in a YAML file.