mclumi.trim package
Submodules
mclumi.trim.Filter module
mclumi.trim.Fixed module
mclumi.trim.Reader module
mclumi.trim.SeqRuleOut module
- class mclumi.trim.SeqRuleOut.seqRuleOut(read_summary, verbose=True)
Bases: object
- sequential(compo_struct, seq_pos_in_struct)
Notes
Starting positions of all genomic sequences.
Example
- rule_out_struct_dict maps each genomic sequence to the structures that precede it:
  {'seq_1': [struct_1, struct_2, …, struct_n], 'seq_2': [struct_1, struct_2, …, struct_n], …, 'seq_m': [struct_1, struct_2, …, struct_n]}
  e.g., {'seq_1': ['primer_1', 'umi_1'], 'seq_2': []} if the read structure is 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'. Each key lists only the structures between it and the preceding key; structures already counted for an earlier key (and the earlier keys themselves) are not repeated.
- rule_out_rel_len_dict gives, for each key of rule_out_struct_dict, the total length of the structures in its list.
- rule_out_accumu_len_dict gives the starting positions of all genomic sequences.
- Parameters
compo_struct – 1d list of structure names, obtained by splitting seq_struct on +.
seq_pos_in_struct – 1d list of indices in compo_struct at which the genomic sequences occur.
- Returns
1d dict
- Return type
{seq_1: int, seq_2: int, …, seq_n: int}
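The three dictionaries described above can be sketched as follows. This is a minimal illustration under stated assumptions, not mclumi's implementation: the function name rule_out and the lengths mapping are hypothetical, with the lengths taken from the 'len' values in the Template example params.

```python
# Hypothetical sketch, not mclumi's code: rebuild the three rule-out dicts
# for the keys (seq_* or umi_*) found at the given positions.
def rule_out(compo_struct, pos_in_struct, lengths):
    """Return (struct_dict, rel_len_dict, accumu_len_dict).

    compo_struct  -- structure names in read order
    pos_in_struct -- indices into compo_struct of the keys
    lengths       -- assumed mapping from structure name to its length
    """
    struct_dict, rel_len_dict, accumu_len_dict = {}, {}, {}
    prev = -1   # index of the previous key; -1 means the start of the read
    offset = 0  # position just past the previous key
    for pos in pos_in_struct:
        key = compo_struct[pos]
        # only the structures between the previous key and this one
        between = compo_struct[prev + 1:pos]
        struct_dict[key] = between
        rel_len_dict[key] = sum(lengths[c] for c in between)
        accumu_len_dict[key] = offset + rel_len_dict[key]
        offset = accumu_len_dict[key] + lengths[key]
        prev = pos
    return struct_dict, rel_len_dict, accumu_len_dict


lengths = {'primer_1': 20, 'umi_1': 12, 'seq_1': 6, 'seq_2': 8,
           'umi_2': 10, 'primer_2': 20}
compo_struct = 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'.split('+')
seq_pos = [i for i, c in enumerate(compo_struct) if c.startswith('seq')]
struct_dict, rel_len, starts = rule_out(compo_struct, seq_pos, lengths)
# struct_dict == {'seq_1': ['primer_1', 'umi_1'], 'seq_2': []}
# rel_len == {'seq_1': 32, 'seq_2': 0}
# starts == {'seq_1': 32, 'seq_2': 38}
```

seq_2 gets an empty list and a relative length of 0, yet it still starts at 38 because the offset carries past seq_1 itself (32 + 6).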
mclumi.trim.Template module
- class mclumi.trim.Template.template(params, verbose=True)
Bases: object
- todf()
Notes
Trimmed UMIs and genomic sequences.
Examples
>>> params = {
...     'read_struct': 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2',
...     'umi_1': {'len': 12},
...     'umi_2': {'len': 10},
...     'primer_1': {'len': 20},
...     'primer_2': {'len': 20},
...     'seq_1': {'len': 6},
...     'seq_2': {'len': 8},
...     'fastq': {
...         'path': to('example/data/'),
...         'name': 'pcr_1',
...         'trimmed_path': to('example/data/'),
...         'trimmed_name': 'pcr_1_trim',
...     },
... }
>>> p = template(params)
>>> p.todf()
- Returns
pandas DataFrame with the following columns
- Return type
seq_raw, name, umi_1, umi_2, …, umi_n, seq_2, seq_3, …, seq_n
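A minimal sketch of the slicing that todf() performs. It assumes the start positions have already been worked out for the example read structure and that each column is a plain string slice of the raw read. trim_read, STARTS and LENGTHS are hypothetical names, and the exact column set returned by todf() is the one listed above.

```python
# Hypothetical sketch of the trimming step behind todf(); not mclumi's code.
# Start positions are those implied by 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'
# with the example lengths (primer_1 = 20, umi_1 = 12, seq_1 = 6, seq_2 = 8).
LENGTHS = {'umi_1': 12, 'seq_1': 6, 'seq_2': 8, 'umi_2': 10}
STARTS = {'umi_1': 20, 'seq_1': 32, 'seq_2': 38, 'umi_2': 46}


def trim_read(name, seq_raw):
    """Split one raw read into its UMI and genomic-sequence parts."""
    row = {'seq_raw': seq_raw, 'name': name}
    for comp, start in STARTS.items():
        row[comp] = seq_raw[start:start + LENGTHS[comp]]
    return row


# A synthetic 76-nt read: 20 + 12 + 6 + 8 + 10 + 20 = 76.
read = 'N' * 20 + 'A' * 12 + 'C' * 6 + 'G' * 8 + 'T' * 10 + 'N' * 20
rows = [trim_read('read_0', read)]
# pandas.DataFrame(rows) would give one column per key of each row dict.
```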
- togz(df)
Note
Writes the trimmed reads to a gzip-compressed FASTQ file.
Example
>>> params = {
...     'read_struct': 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2',
...     'umi_1': {'len': 12},
...     'umi_2': {'len': 10},
...     'primer_1': {'len': 20},
...     'primer_2': {'len': 20},
...     'seq_1': {'len': 6},
...     'seq_2': {'len': 8},
...     'fastq': {
...         'path': to('example/data/'),
...         'name': 'pcr_1',
...         'trimmed_path': to('example/data/'),
...         'trimmed_name': 'pcr_1_trim',
...     },
... }
>>> p = template(params)
>>> df = p.todf()
>>> p.togz(df)
- Parameters
df – pandas DataFrame returned by todf().
- Returns
pandas DataFrame with columns in the following order
- Return type
seq_1, name, umi_1, umi_2, …, umi_n, seq_2, seq_3, …, seq_n
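A sketch of writing trimmed reads to a gzip-compressed FASTQ file with the standard-library gzip module. write_fastq_gz is a hypothetical helper and the rows stand in for the DataFrame returned by todf(). Because quality scores are not among the documented columns, a placeholder quality string is written; the real togz() may build read names and qualities differently.

```python
import gzip
import os
import tempfile


# Hypothetical sketch of what togz() does; not mclumi's implementation.
# Each row is assumed to carry 'name' and 'seq_1'; a placeholder quality
# ('B' per base) is written because no quality column is documented.
def write_fastq_gz(rows, path, seq_col='seq_1'):
    with gzip.open(path, 'wt') as fh:  # text mode, gzip-compressed
        for row in rows:
            seq = row[seq_col]
            # one FASTQ record: header, sequence, separator, quality
            fh.write(f"@{row['name']}\n{seq}\n+\n{'B' * len(seq)}\n")


rows = [{'name': 'read_0', 'seq_1': 'CCCCCC'},
        {'name': 'read_1', 'seq_1': 'GATTAC'}]
path = os.path.join(tempfile.mkdtemp(), 'pcr_1_trim.fastq.gz')
write_fastq_gz(rows, path)
```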
mclumi.trim.UMIRuleOut module
- class mclumi.trim.UMIRuleOut.umiRuleOut(read_summary, verbose=True)
Bases: object
- sequential(compo_struct, umi_pos_in_struct)
Note
Starting positions of all UMIs.
Example
- rule_out_struct_dict maps each UMI to the structures that precede it:
  {'umi_1': [struct_1, struct_2, …, struct_n], 'umi_2': [struct_1, struct_2, …, struct_n], …, 'umi_m': [struct_1, struct_2, …, struct_n]}
  e.g., {'umi_1': ['primer_1'], 'umi_2': ['seq_1']}
- rule_out_rel_len_dict gives, for each key of rule_out_struct_dict, the total length of the structures in its list.
- rule_out_accumu_len_dict gives the starting positions of all UMIs.
- Parameters
compo_struct – 1d list of structure names, obtained by splitting seq_struct on +.
umi_pos_in_struct – 1d list of indices in compo_struct at which the UMIs occur.
- Returns
1d dict
- Return type
{umi_1: int, umi_2: int, …, umi_n: int}
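Since the starting position of a UMI is the total length of every structure before it in the read, the values in rule_out_accumu_len_dict can be cross-checked with a running (prefix) sum over the structure lengths. This is a minimal sketch, not mclumi's code; the lengths mapping is hypothetical input taken from the Template example params.

```python
from itertools import accumulate

# Hypothetical cross-check, not mclumi's code: the start of every structure
# is the sum of the lengths of all structures before it (a prefix sum).
lengths = {'primer_1': 20, 'umi_1': 12, 'seq_1': 6, 'seq_2': 8,
           'umi_2': 10, 'primer_2': 20}
compo_struct = 'primer_1+umi_1+seq_1+seq_2+umi_2+primer_2'.split('+')

# initial=0 makes starts[i] the offset of compo_struct[i]; the final entry
# is the full read length.
starts = list(accumulate((lengths[c] for c in compo_struct), initial=0))
umi_starts = {c: starts[i] for i, c in enumerate(compo_struct)
              if c.startswith('umi')}
# umi_starts == {'umi_1': 20, 'umi_2': 46}
```

The prefix sum computes every offset in one pass, whereas the per-key lists in rule_out_struct_dict also record which structures sit in front of each UMI, which is useful when only some structures need to be ruled out.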