mclumi.trim package

Submodules

mclumi.trim.Filter module

class mclumi.trim.Filter.filter

Bases: object

method()
paired(x, rule)
Parameters
  • x

  • rule

singleStart(x, start, end)
Parameters
  • x

  • start

  • end

mclumi.trim.Fixed module

class mclumi.trim.Fixed.fixed(mode='external', params=None, verbose=True)

Bases: object

call()

mclumi.trim.Reader module

class mclumi.trim.Reader.reader

Bases: object

pcrnum(x)
Parameters

x

todf(seqs, names)
Parameters
  • seqs

  • names

mclumi.trim.SeqRuleOut module

class mclumi.trim.SeqRuleOut.seqRuleOut(read_summary, verbose=True)

Bases: object

sequential(compo_struct, seq_pos_in_struct)

Notes

Starting positions of all genomic sequences.

Example

rule_out_struct_dict maps each key to the structures that precede it:

{
    'seq_1': [struct_1, struct_2, …, struct_n],
    'seq_2': [struct_1, struct_2, …, struct_n],
    …,
    'seq_m': [struct_1, struct_2, …, struct_n],
}

e.g., {'seq_1': ['primer_1', 'umi_1'], 'seq_2': []} if the structure is 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'. Each key's list excludes the structures already listed for its preceding keys.

rule_out_rel_len_dict returns accumulated lengths of all structures in the list w.r.t each key of rule_out_struct_dict.

rule_out_accumu_len_dict returns the starting positions of all genomic sequences.

Parameters
  • compo_struct – 1d list of strings of seq_struct split by +.

  • seq_pos_in_struct – 1d list of indices of compo_struct.

Returns

1d dict

Return type

{seq_1: int, seq_2: int, …, seq_n: int}
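
The bookkeeping described above can be illustrated with a small standalone sketch. It is not mclumi's implementation: the function name `sequential_starts` and the `lengths` mapping are hypothetical, and it assumes that positions are zero-based and that a target's start is the summed length of every component before it in the read structure.

```python
def sequential_starts(compo_struct, pos_in_struct, lengths):
    """Hypothetical helper (not part of mclumi).

    compo_struct: list of component names, e.g. read_struct.split('+')
    pos_in_struct: indices of the target components in compo_struct
    lengths: assumed mapping from component name to its fixed length
    """
    struct_dict = {}      # components between the previous target and this one
    rel_len_dict = {}     # summed length of those components
    accumu_len_dict = {}  # zero-based start position of each target
    prev = -1             # index of the previous target
    offset = 0            # total length consumed so far
    for pos in pos_in_struct:
        key = compo_struct[pos]
        between = compo_struct[prev + 1:pos]
        struct_dict[key] = between
        rel_len_dict[key] = sum(lengths[s] for s in between)
        offset += rel_len_dict[key]
        accumu_len_dict[key] = offset
        offset += lengths[key]  # step over the target itself
        prev = pos
    return struct_dict, rel_len_dict, accumu_len_dict


compo_struct = 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'.split('+')
lengths = {'primer_1': 20, 'umi_1': 12, 'seq_1': 6,
           'seq_2': 8, 'umi_2': 10, 'primer_2': 20}
seq_pos = [i for i, s in enumerate(compo_struct) if s.startswith('seq_')]

struct_dict, rel_len_dict, starts = sequential_starts(compo_struct, seq_pos, lengths)
print(struct_dict)  # {'seq_1': ['primer_1', 'umi_1'], 'seq_2': []}
print(starts)       # {'seq_1': 32, 'seq_2': 38}
```

Passing the indices of the UMI components instead of the sequence components gives the corresponding UMI start positions under the same assumptions.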

mclumi.trim.Template module

class mclumi.trim.Template.template(params, verbose=True)

Bases: object

todf()

Notes

Trimmed UMIs and genomic sequences.

Examples

>>> params = {
...  'read_struct': 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2',
...  'umi_1': {'len': 12},
...  'umi_2': {'len': 10},
...  'primer_1': {'len': 20},
...  'primer_2': {'len': 20},
...  'seq_1': {'len': 6},
...  'seq_2': {'len': 8},
...  'fastq': {
...      'path': to('example/data/'),
...      'name': 'pcr_1',
...      'trimmed_path': to('example/data/'),
...      'trimmed_name': 'pcr_1_trim',
...   },
... }
>>> p = template(params)
>>> p.todf()
Returns

pandas DataFrame with the column names

Return type

seq_raw, name, umi_1, umi_2, …, umi_n, seq_2, seq_3, …, seq_n

togz(df)

Note

Write to a FASTQ file in gzip format.

Example

>>> params = {
...  'read_struct': 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2',
...  'umi_1': {'len': 12},
...  'umi_2': {'len': 10},
...  'primer_1': {'len': 20},
...  'primer_2': {'len': 20},
...  'seq_1': {'len': 6},
...  'seq_2': {'len': 8},
...  'fastq': {
...      'path': to('example/data/'),
...      'name': 'pcr_1',
...      'trimmed_path': to('example/data/'),
...      'trimmed_name': 'pcr_1_trim',
...   },
... }
>>> p = template(params)
>>> df = p.todf()
>>> p.togz(df)

Parameters

df – pandas DataFrame returned from todf().

Returns

pandas DataFrame with columns in the order

Return type

seq_1, name, umi_1, umi_2, …, umi_n, seq_2, seq_3, …, seq_n
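
For readers unfamiliar with gzip-compressed FASTQ output, the following is a minimal standalone sketch using only the standard library. It does not reproduce togz(): the helper `write_fastq_gz`, the (name, sequence) record layout, and the placeholder quality string are all assumptions made for illustration.

```python
import gzip
import os
import tempfile


def write_fastq_gz(records, path):
    """Write (name, sequence) pairs as gzip-compressed FASTQ (hypothetical helper)."""
    with gzip.open(path, 'wt') as fh:
        for name, seq in records:
            # Placeholder qualities: the documentation above does not say
            # how quality strings are handled, so a constant is assumed here.
            qual = 'I' * len(seq)
            fh.write(f'@{name}\n{seq}\n+\n{qual}\n')


records = [('read_1', 'ACGTAC'), ('read_2', 'TTGACA')]
out_path = os.path.join(tempfile.mkdtemp(), 'pcr_1_trim.fastq.gz')
write_fastq_gz(records, out_path)

# Read the file back to confirm the four-line FASTQ layout.
with gzip.open(out_path, 'rt') as fh:
    lines = fh.read().splitlines()
print(lines[:4])
```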

mclumi.trim.UMIRuleOut module

class mclumi.trim.UMIRuleOut.umiRuleOut(read_summary, verbose=True)

Bases: object

sequential(compo_struct, umi_pos_in_struct)

Note

Starting positions of all UMIs.

Example

rule_out_struct_dict maps each key to the structures that precede it:

{
    'umi_1': [struct_1, struct_2, …, struct_n],
    'umi_2': [struct_1, struct_2, …, struct_n],
    …,
    'umi_m': [struct_1, struct_2, …, struct_n],
}

e.g., {'umi_1': ['primer_1'], 'umi_2': ['seq_1']}

rule_out_rel_len_dict returns accumulated lengths of all structures in the list w.r.t each key of rule_out_struct_dict.

rule_out_accumu_len_dict returns the starting positions of all UMIs.

Parameters
  • compo_struct – 1d list of strings of seq_struct split by +.

  • umi_pos_in_struct – 1d list of indices of compo_struct.

Returns

1d dict

Return type

{umi_1: int, umi_2: int, …, umi_n: int}
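
Once start positions are known, extracting a UMI reduces to string slicing. The sketch below assumes zero-based start positions and fixed UMI lengths; `extract_segments` and the toy read are hypothetical and do not reflect mclumi's internals.

```python
def extract_segments(read, starts, lengths):
    """Slice each named segment out of a read (hypothetical helper)."""
    return {
        name: read[start:start + lengths[name]]
        for name, start in starts.items()
    }


# Toy read laid out as primer_1 (4 nt) + umi_1 (3 nt) + seq_1 (5 nt) + umi_2 (3 nt).
read = 'AAAA' + 'CGT' + 'TTTTT' + 'GCA'
umi_starts = {'umi_1': 4, 'umi_2': 12}   # same shape as the return type above
umi_lengths = {'umi_1': 3, 'umi_2': 3}

umis = extract_segments(read, umi_starts, umi_lengths)
print(umis)  # {'umi_1': 'CGT', 'umi_2': 'GCA'}
```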

Module contents