mclumi.deduplicate.monomer package

Submodules

mclumi.deduplicate.monomer.Adjacency module

class mclumi.deduplicate.monomer.Adjacency.adjacency

Bases: object

decompose(cc_sub_dict)
Parameters

cc_sub_dict

umi_tools(connected_components, df_umi_uniq_val_cnt, graph_adj)

Wrapper around the umi_tools adjacency method.

Parameters
  • connected_components

  • df_umi_uniq_val_cnt

  • graph_adj

umi_tools_(df_umi_uniq_val_cnt, cc, graph_adj)

umi_tools adjacency

Parameters
  • df_umi_uniq_val_cnt – unique umi counts

  • cc – connected_components

  • graph_adj – the adjacency list of a graph
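
The adjacency idea can be illustrated with a small, simplified sketch: within one connected component, the most abundant remaining UMI is taken as a lead node and grouped with its direct neighbours that are still unassigned. This is a greedy reading of the method for illustration only, not the mclumi implementation; the function name `adjacency_groups` and the use of plain dicts for counts and the adjacency list are assumptions.

```python
def adjacency_groups(component, counts, graph_adj):
    """Greedy sketch: group each lead UMI with its unassigned neighbours.

    component -- iterable of node names in one connected component
    counts    -- dict mapping node -> unique UMI count (hypothetical layout)
    graph_adj -- dict mapping node -> iterable of neighbouring nodes
    """
    remaining = set(component)
    groups = []
    # Visit nodes from most to least abundant; ties are broken by name
    # so that the result is deterministic.
    for node in sorted(component, key=lambda n: (-counts[n], n)):
        if node not in remaining:
            continue  # already absorbed by a more abundant lead node
        neighbours = sorted(
            n for n in graph_adj.get(node, ()) if n in remaining and n != node
        )
        members = [node] + neighbours
        remaining.difference_update(members)
        groups.append(members)
    return groups


# A chain A - B - C - D with two abundant UMIs (A and C).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
counts = {"A": 10, "B": 3, "C": 8, "D": 1}
groups = adjacency_groups(["A", "B", "C", "D"], counts, graph)
```

In this toy input, `A` absorbs `B` first, so `C` can only claim `D`.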

mclumi.deduplicate.monomer.Build module

class mclumi.deduplicate.monomer.Build.build(df, ed_thres, verbose=False)

Bases: object

ed_list_()
pcrnum(x)
Parameters

x
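
As a rough illustration of what an edit-distance list built under a threshold such as `ed_thres` might look like, the sketch below compares every pair of UMIs and keeps the pairs whose distance does not exceed the threshold. It assumes equal-length UMIs and Hamming distance; the helper names `hamming` and `edges_within` are hypothetical and are not taken from the `build` class.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def edges_within(umis, ed_thres):
    """Return (umi_1, umi_2, distance) for every pair within the threshold."""
    edges = []
    for u, v in combinations(umis, 2):
        d = hamming(u, v)
        if d <= ed_thres:
            edges.append((u, v, d))
    return edges


edges = edges_within(["AAAA", "AAAT", "TTTT"], ed_thres=1)
```

Only the pair that differs at a single position survives the threshold of 1.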

mclumi.deduplicate.monomer.Cluster module

class mclumi.deduplicate.monomer.Cluster.cluster

Bases: object

cc(graph_adj)
Parameters

graph_adj
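
Finding connected components from an adjacency list can be sketched with a breadth-first search. The code below is a generic illustration under the assumption that `graph_adj` is a dict mapping each node to its neighbours; the function name `connected_components` is chosen for this example and does not describe how `cc` is implemented.

```python
from collections import deque


def connected_components(graph_adj):
    """Return a list of components, each a list of nodes in BFS order."""
    seen = set()
    components = []
    for start in graph_adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            # .get() tolerates neighbours that have no entry of their own
            for neighbour in graph_adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
components = connected_components(graph)
```

Each UMI ends up in exactly one component, and an isolated UMI forms a component of its own.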

ccnx(edge_list)
Parameters

edge_list

mclumi.deduplicate.monomer.DedupBasic module

class mclumi.deduplicate.monomer.DedupBasic.dedupBasic(bam_fpn, ed_thres, method, mode='external', mcl_fold_thres=None, inflat_val=2.0, exp_val=2, iter_num=100, is_sv=True, sv_fpn='./dedup.bam', verbose=False)

Bases: object

bamids(df_row, by_col)
correct(umi)
decompose(list_nd)
Parameters

list_nd

diffDedupReadCountPos(df_row, by_col)
Parameters
  • df_row – object - a pandas-like df row

  • by_col – str - a column name in question

Returns

the total counts of deduplicated reads per position

Return type

int

diffDedupUniqCountPos(df_row, by_col)
Parameters
  • df_row – object - a pandas-like df row

  • by_col – str - a column name in question

Returns

the sum of deduplicated unique UMI counts per position

Return type

int

edave(df_row, by_col)
eds_(df_row, by_col)
evaluate()
length(df_val)
Parameters

df_val – list - a python list

Returns

the length of the list

Return type

int

markSingleUMI(df_val)
umimax(df_row, by_col)

mclumi.deduplicate.monomer.DedupGene module

class mclumi.deduplicate.monomer.DedupGene.dedupGene(bam_fpn, ed_thres, method, gene_assigned_tag, gene_is_assigned_tag, mode='internal', mcl_fold_thres=None, inflat_val=2.0, exp_val=2, iter_num=100, is_sv=True, sv_fpn='./dedup.bam', verbose=False)

Bases: object

bamids(df_row, by_col)
correct(umi)
decompose(list_nd)
Parameters

list_nd

diffDedupReadCountPos(df_row, by_col)
diffDedupUniqCountPos(df_row, by_col)
edave(df_row, by_col)
eds_(df_row, by_col)
evaluate()
length(df_val)
markSingleUMI(df_val)
umimax(df_row, by_col)

mclumi.deduplicate.monomer.DedupPos module

class mclumi.deduplicate.monomer.DedupPos.dedupPos(bam_fpn, ed_thres, method, mode='external', pos_tag='PO', mcl_fold_thres=None, inflat_val=2.0, exp_val=2, iter_num=100, is_sv=True, sv_fpn='./dedup.bam', verbose=False)

Bases: object

bamids(df_row, by_col)
correct(umi)
decompose(list_nd)
Parameters

list_nd

diffDedupReadCountPos(df_row, by_col)
diffDedupUniqCountPos(df_row, by_col)
edave(df_row, by_col)
eds_(df_row, by_col)
evaluate()
length(df_val)
markSingleUMI(df_val)
umimax(df_row, by_col)

mclumi.deduplicate.monomer.DedupSC module

class mclumi.deduplicate.monomer.DedupSC.dedupSC(bam_fpn, ed_thres, method, gene_assigned_tag, gene_is_assigned_tag, mode='internal', mcl_fold_thres=None, inflat_val=2.0, exp_val=2, iter_num=100, is_sv=True, sv_fpn='./dedup.bam', verbose=False)

Bases: object

bamids(df_row, by_col)
decompose(list_nd)
Parameters

list_nd

diffDedupReadCountPos(df_row, by_col)
diffDedupUniqCountPos(df_row, by_col)
edave(df_row, by_col)
eds_(df_row, by_col)
evaluate()
length(df_val)
markSingleUMI(df_val)
umimax(df_row, by_col)

mclumi.deduplicate.monomer.Directional module

class mclumi.deduplicate.monomer.Directional.directional

Bases: object

decompose(cc_sub_dict)
Parameters

cc_sub_dict

dfs(node, node_val_sorted, node_set_remaining, graph_adj)
Parameters
  • node

  • node_val_sorted

  • node_set_remaining

  • graph_adj

dictTo2d(x)
Parameters

x

formatApvsDisapv(cc_dict)

Input format for the Directional method in umi-tools.

Parameters

cc_dict

formatCCS(cc_dict)
Parameters

cc_dict

umi_tools(connected_components, df_umi_uniq_val_cnt, graph_adj)
Parameters
  • connected_components

  • df_umi_uniq_val_cnt

  • graph_adj

umi_tools_(df_umi_uniq_val_cnt, cc, graph_adj)
Parameters
  • df_umi_uniq_val_cnt

  • cc

  • graph_adj
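
The directional method in umi-tools links UMI a to UMI b only when the two are within the edit-distance threshold and count(a) >= 2 * count(b) - 1. The sketch below applies that count rule while walking one connected component depth-first from the most abundant node. It is an illustration under assumptions (plain dicts for counts and adjacency, the hypothetical name `directional_groups`), not the code behind `dfs` or `umi_tools_`.

```python
def directional_groups(component, counts, graph_adj):
    """Group nodes reachable through edges that satisfy the count rule."""
    remaining = set(component)
    groups = []
    # Roots are tried from most to least abundant; ties broken by name.
    for root in sorted(component, key=lambda n: (-counts[n], n)):
        if root not in remaining:
            continue
        remaining.discard(root)
        group = [root]
        stack = [root]
        while stack:
            parent = stack.pop()
            for child in sorted(graph_adj.get(parent, ())):
                # Directed edge parent -> child only if the parent is
                # abundant enough relative to the child.
                if child in remaining and counts[parent] >= 2 * counts[child] - 1:
                    remaining.discard(child)
                    group.append(child)
                    stack.append(child)
        groups.append(group)
    return groups


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
counts = {"A": 100, "B": 40, "C": 30}
groups = directional_groups(["A", "B", "C"], counts, graph)
```

Here `B` joins `A` because 100 >= 2 * 40 - 1, whereas `C` stays separate because 40 < 2 * 30 - 1.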

mclumi.deduplicate.monomer.MarkovClustering module

class mclumi.deduplicate.monomer.MarkovClustering.markovClustering(inflat_val, exp_val, iter_num)

Bases: object

cluster(cc_adj_mat)
Parameters

cc_adj_mat
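
The constructor arguments `inflat_val`, `exp_val` and `iter_num` correspond to the usual Markov clustering loop of expansion (matrix power) followed by inflation (element-wise power plus column normalisation). The following pure-Python sketch shows that loop on a small adjacency matrix. It is a generic illustration, not the body of `cluster`; the function name `mcl_sketch`, the added self-loops, the tolerance `tol` and the cluster-extraction rule are assumptions of this example.

```python
def _normalise_columns(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mcl_sketch(cc_adj_mat, inflat_val=2.0, exp_val=2, iter_num=100, tol=1e-9):
    n = len(cc_adj_mat)
    # Add self-loops so that no column is empty, then normalise.
    m = [
        [float(cc_adj_mat[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    m = _normalise_columns(m)
    for _ in range(iter_num):
        # Expansion: raise the matrix to the power exp_val.
        expanded = m
        for _ in range(exp_val - 1):
            expanded = _matmul(expanded, m)
        # Inflation: element-wise power, then renormalise the columns.
        inflated = _normalise_columns(
            [[v ** inflat_val for v in row] for row in expanded]
        )
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break  # the matrix has stopped changing
    # Every non-empty row lists the members of one cluster.
    clusters = {tuple(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    clusters.discard(())
    return sorted(list(c) for c in clusters)


# Two disconnected pairs of nodes: (0, 1) and (2, 3).
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
clusters = mcl_sketch(adj)
```

A larger `inflat_val` tends to yield more, smaller clusters, which is why it is exposed as a tunable parameter.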

decompose(list_nd)
Parameters

list_nd

Returns

dict

dfclusters(connected_components, graph_adj)
Parameters
  • connected_components – connected components in dict format:

    {
        'cc0': [...],  # nodes
        'cc1': [...],
        'cc2': [...],
        ...
        'ccn': [...],
    }

  • graph_adj – the adjacency list of a graph

Returns

a pandas dataframe in which each connected component is decomposed into smaller connected subcomponents

graph_cc_adj(cc, graph_adj)
Parameters
  • cc – a connected component

  • graph_adj – the adjacency list of a graph

keyToNode(list_2d, keymap)
Parameters
  • list_2d

  • keymap

keymap(graph_adj, reverse=False)
Parameters
  • graph_adj

  • reverse

matrix(graph_adj, key_map)
Parameters
  • graph_adj

  • key_map
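
Turning an adjacency list into an adjacency matrix needs a mapping from node names to row and column indices. The sketch below shows one way to build such a mapping and the matrix from it. The names `index_map` and `adjacency_matrix` are hypothetical stand-ins, and the meaning given to `reverse` (index to node rather than node to index) is an assumption that the documentation above does not confirm.

```python
def index_map(graph_adj, reverse=False):
    """Map node -> index, or index -> node when reverse is True."""
    keys = list(graph_adj)
    if reverse:
        return {i: k for i, k in enumerate(keys)}
    return {k: i for i, k in enumerate(keys)}


def adjacency_matrix(graph_adj, key_map):
    """Build a 0/1 matrix with a 1 wherever two nodes are adjacent."""
    n = len(key_map)
    mat = [[0] * n for _ in range(n)]
    for node, neighbours in graph_adj.items():
        for neighbour in neighbours:
            mat[key_map[node]][key_map[neighbour]] = 1
    return mat


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
fwd = index_map(graph)
rev = index_map(graph, reverse=True)
mat = adjacency_matrix(graph, fwd)
```

The reverse mapping is what lets cluster indices be translated back into node names after clustering.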

maxval_ed(df_mcl_ccs, df_umi_uniq_val_cnt, umi_uniq_mapped_rev, thres_fold)
Parameters
  • df_mcl_ccs

  • df_umi_uniq_val_cnt

  • umi_uniq_mapped_rev

  • thres_fold

maxval_ed_(mcl_clusters_per_cc, df_umi_uniq_val_cnt, umi_uniq_mapped_rev, thres_fold)

    # for k1, v1 in mcl_sub_clust_max_val_weights.items():
    #     for k2, v2 in mcl_sub_clust_max_val_weights.items():
    #         if k1 != k2:
    #             edh = hamming().general(
    #                 umi_uniq_mapped_rev[k1],
    #                 umi_uniq_mapped_rev[k2],
    #             )
    #             if edh <= thres_fold:
    #                 mcl_sub_clust_max_val_graph[k1].add(k2)
    #                 mcl_sub_clust_max_val_graph[k2].add(k1)
    #                 approval.append([k1, k2])
    #             else:
    #                 disapproval.append([k1, k2])

Parameters
  • mcl_clusters_per_cc

  • df_umi_uniq_val_cnt

  • umi_uniq_mapped_rev

  • thres_fold
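
The commented-out loop in the `maxval_ed_` description compares the representative nodes of the sub-clusters pairwise by Hamming distance and records each pair as approved or disapproved. A runnable sketch of that comparison is given below. It visits each unordered pair once, whereas the commented loop visits both orderings; the function name `link_representatives`, the local `hamming` helper and the argument layout are assumptions made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def link_representatives(rep_nodes, umi_of, thres):
    """Link representative nodes whose UMIs are within the threshold.

    rep_nodes -- representative node ids, one per sub-cluster
    umi_of    -- dict mapping node id -> UMI sequence
    thres     -- maximum Hamming distance for two nodes to be linked
    """
    graph = {k: set() for k in rep_nodes}
    approval, disapproval = [], []
    for k1, k2 in combinations(rep_nodes, 2):
        if hamming(umi_of[k1], umi_of[k2]) <= thres:
            graph[k1].add(k2)
            graph[k2].add(k1)
            approval.append([k1, k2])
        else:
            disapproval.append([k1, k2])
    return graph, approval, disapproval


umi_of = {0: "AAAA", 1: "AAAT", 2: "TTTT"}
graph, approval, disapproval = link_representatives([0, 1, 2], umi_of, thres=1)
```

The resulting graph can then be searched for connected components to decide which sub-clusters are merged.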

maxval_val(df_mcl_ccs, df_umi_uniq_val_cnt, thres_fold)
Parameters
  • df_mcl_ccs

  • df_umi_uniq_val_cnt

  • thres_fold

maxval_val_(mcl_clusters_per_cc, df_umi_uniq_val_cnt, thres_fold)
Parameters
  • mcl_clusters_per_cc

  • df_umi_uniq_val_cnt

  • thres_fold

sort_vals(df_umi_uniq_val_cnt, cc)
Parameters
  • df_umi_uniq_val_cnt

  • cc

Module contents