cyclum package¶

Subpackages¶

cyclum.models package

Submodules¶

cyclum.evaluation module¶

cyclum.evaluation.parzen_estimate(x, lim, half_granularity=100, window=<function <lambda>>, scale=0.5)[source]¶

Calculate a Parzen window estimate (a non-parametric density estimation method).

Parameters

x – instances
lim – limit of domain
half_granularity –
window –
scale –

Returns
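As a rough illustration of the technique, the following is a minimal pure-Python sketch of a Parzen window estimate evaluated on a grid. It is not the cyclum implementation: the function name `parzen_sketch`, the Gaussian window, and the reading of `lim` as a symmetric domain [-lim, lim] sampled at 2 * half_granularity + 1 points are all assumptions made for the example.

```python
import math


def gaussian_window(u):
    """Standard normal density, used here as the window (kernel) function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_sketch(x, lim, half_granularity=100, window=gaussian_window, scale=0.5):
    """Evaluate a Parzen window density estimate on an evenly spaced grid.

    Assumption: the grid spans [-lim, lim] with 2 * half_granularity + 1 points.
    """
    n_points = 2 * half_granularity + 1
    step = lim / half_granularity
    grid = [-lim + k * step for k in range(n_points)]
    # Average of one scaled window per instance, evaluated at every grid point.
    density = [
        sum(window((g - xi) / scale) for xi in x) / (len(x) * scale)
        for g in grid
    ]
    return grid, density


samples = [-0.4, -0.1, 0.0, 0.2, 0.3]
grid, density = parzen_sketch(samples, lim=3.0)
# Riemann sum of the estimate; close to 1 when the domain covers the data.
area = sum(density) * (grid[1] - grid[0])
```

In this sketch, a smaller `scale` gives a spikier estimate and a larger one a smoother estimate.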

cyclum.evaluation.periodic_parzen_estimate(x, period=3.14, half_granularity=100, window=<function <lambda>>, scale=0.5)[source]¶

Calculate a Parzen window estimate specifically for a periodic domain.

Parameters

x –
period –
half_granularity –
window –
scale –

Returns
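On a periodic domain, instances near one end of the period should also contribute to the density near the other end. The sketch below shows one way to do this, by measuring distance around the circle before applying the window. The name `periodic_parzen_sketch`, the Gaussian-shaped window, the grid layout, and the final normalization step are assumptions for illustration, not the behavior of cyclum itself.

```python
import math


def periodic_parzen_sketch(x, period=3.14, half_granularity=100, scale=0.5):
    """Parzen-style density on [0, period) using wrap-around distances."""
    n_points = 2 * half_granularity
    step = period / n_points
    grid = [k * step for k in range(n_points)]

    def wrapped_distance(d):
        # Shortest distance between two points on a circle of length `period`.
        d = d % period
        return min(d, period - d)

    def window(u):
        # Unnormalized Gaussian shape; the estimate is normalized below.
        return math.exp(-0.5 * u * u)

    raw = [
        sum(window(wrapped_distance(g - xi) / scale) for xi in x)
        for g in grid
    ]
    total = sum(raw) * step
    density = [r / total for r in raw]
    return grid, density


# Two instances on opposite sides of the wrap-around point.
grid, density = periodic_parzen_sketch([0.05, 3.09], period=3.14, scale=0.2)
area = sum(density) * (grid[1] - grid[0])
```

Here both instances lie close to the wrap-around point, so the estimate peaks near the start of the grid rather than in the middle of the period.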

cyclum.evaluation.precision_estimate(distr_vector_list, label_vector, possible_label_list)[source]¶

Estimate precision

Parameters

distr_vector_list –
label_vector –
possible_label_list –

Returns

cyclum.hdfrw module¶

Read and write HDF files.

cyclum.hdfrw.hdf2mat(filepath, dtype=<class 'float'>)[source]¶

Read an HDF file generated by the hdfrw.R mat2hdf function into a data frame. Note that because Python and R handle data differently, colnames are used for the index and rownames for the columns, and the matrix is also tacitly transposed.

Parameters

filepath (str) – path of hdf file
dtype (Type) – type of data; default is float

Return type

DataFrame

Returns

a pandas data frame
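The orientation note above can be illustrated without any HDF file. The following pure-Python sketch uses a made-up 2 by 3 matrix with hypothetical labels to show how the R-side names map onto the Python-side index and columns after the transposition; it does not read or write HDF and does not call cyclum.

```python
# Hypothetical matrix as R would see it: 2 rows by 3 columns.
r_rownames = ["r1", "r2"]
r_colnames = ["c1", "c2", "c3"]
r_matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# On the Python side the matrix is transposed: R colnames label the rows
# (the index) and R rownames label the columns.
py_index = r_colnames
py_columns = r_rownames
py_values = [list(col) for col in zip(*r_matrix)]

for label, row in zip(py_index, py_values):
    print(label, dict(zip(py_columns, row)))
```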

cyclum.hdfrw.mat2hdf(data, filepath)[source]¶

Write a data frame to an HDF file that can be read by the hdfrw.R hdf2mat function.

Parameters

data (Union[DataFrame, <built-in function array>, List[str]]) – data frame or numpy array to be written
filepath (str) – path of hdf file to be written

Return type

None

Returns

None

cyclum.illustration module¶

class cyclum.illustration.FigureWriter(pdf_name)[source]¶

Bases: object

Keep figures and write them into a PDF file.

add_figure(figure, title=None)[source]¶

Add a figure without writing it to the file.

Parameters

figure –
title –

Returns

add_figure_and_write(figure, title=None)[source]¶

write()[source]¶

cyclum.illustration.plot_cell_sparsity(linear_data, use_ratio=True)[source]¶

Return a figure of #{cell, none_zero_genes(cell) > x}

Parameters

linear_data – data
use_ratio – plot as ratio or

Returns

cyclum.illustration.plot_gene_sparsity(linear_data, use_ratio=True)[source]¶

Return a figure of #{cell, none_zero_genes(cell) > x}

Parameters

linear_data – data
use_ratio – plot as ratio or

Returns

cyclum.illustration.plot_multi_distr(xs, ys, colors, labels)[source]¶

cyclum.illustration.plot_pair_color(a, b, color)[source]¶

Either plot an embedding, two dimensions at a time, or compare two embeddings.

Parameters

a –
b –
color –

Returns

cyclum.illustration.plot_round_color(flat, color)[source]¶

cyclum.illustration.plot_round_distr_color(flat, label, color_dict)[source]¶

cyclum.illustration.plot_round_distr_color2(flat, label1, label2, color_dict1, color_dict2)[source]¶

cyclum.tuning module¶

Auto tuning.

class cyclum.tuning.CyclumAutoTune(data, max_linear_dims=3, epochs=500, verbose=100, rate=0.0005, early_stop=False, encoder_depth=2, encoder_width=50, dropout_rate=0.1, nonlinear_reg=0.0001, linear_reg=0.0001)[source]¶

Bases: cyclum.models.ae.AutoEncoder

Circular autoencoder with automatically decided number of linear components

We first perform PCA on the data, and record the MSE of having first 1, 2, …, max_linear_dims + 1 components. We then try to train a circular autoencoder with 0, 1, …, max_linear_dims linear components. We compare circular autoencoder with i linear components with PCA with (i + 1) components, for i = 0, 1, … We record the first i where the difference of loss compared with PCA is greater than both (i - 1) and (i + 1), or just (i + 1) if i == 0.

At the end, this class will be an UNTRAINED model with the optimal number of linear components. You can then train it with all your data, more epochs, and a better learning rate.
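The selection rule described above can be sketched in a few lines. This is one reading of the rule, not the class's actual code: the function name `pick_linear_dims` is hypothetical, and the "difference of loss compared with PCA" is assumed to be the PCA loss with (i + 1) components minus the autoencoder loss with i linear components. The loss values are made up for the example.

```python
def pick_linear_dims(pca_loss, ae_loss):
    """Return the first i whose gap to PCA exceeds that of its neighbors.

    pca_loss[i]: PCA MSE with (i + 1) components.
    ae_loss[i]:  circular autoencoder loss with i linear components.
    Assumption: gap = pca_loss[i] - ae_loss[i].
    """
    gaps = [p - a for p, a in zip(pca_loss, ae_loss)]
    for i in range(len(gaps) - 1):
        beats_next = gaps[i] > gaps[i + 1]
        # For i == 0 there is no previous entry, so only (i + 1) is checked.
        beats_prev = i == 0 or gaps[i] > gaps[i - 1]
        if beats_prev and beats_next:
            return i
    return None  # no such i among the tested numbers of components


# Made-up losses for 0..3 linear components (PCA with 1..4 components).
pca_loss = [0.90, 0.70, 0.55, 0.45]
ae_loss = [0.80, 0.50, 0.45, 0.40]
best = pick_linear_dims(pca_loss, ae_loss)
```

With these made-up numbers the gap peaks at one linear component, so the sketch selects i = 1.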

Parameters

data – The data used to decide number of linear components. For a large dataset, you may use a representative portion of it.
max_linear_dims – maximum number of linear dimensions.
epochs – number of epochs for each test
verbose – number of epochs between reports of the loss, time consumption, etc.
rate – training rate
early_stop – whether to stop checking more linear components once the result is decided. This ONLY affects the elbow plot and has NO influence on the result.
encoder_depth – depth of encoder, i.e., number of hidden layers
encoder_width –
width of encoder, one of the following:
- An integer, giving the number of nodes per layer. All hidden layers will have the same number of nodes.
- A list of integers, whose length equals encoder_depth, giving the number of nodes in each layer.
dropout_rate – rate for dropout.
nonlinear_reg – strength of regularization on the nonlinear encoder.
linear_reg – strength of regularization on the linear encoder.

Examples:

>>> from cyclum.hdfrw import hdf2mat, mat2hdf
>>> from cyclum.tuning import CyclumAutoTune
>>> df = hdf2mat('path_to_hdf.h5')
>>> m = CyclumAutoTune(df.values, max_linear_dims=5)
>>> m.train(df.values)
>>> pseudotime = m.predict_pseudotime(df.values)
>>> mat2hdf(pseudotime, 'path_to_pseudotime.h5')

show_bar(root=False)[source]¶

Show a bar plot of the percentage of additional loss that is handled by the circular component.

Returns: figure object

show_elbow()[source]¶

Show an elbow plot of both PCA and the autoencoder. You can observe the point at which the autoencoder starts to have a higher loss than PCA. The model just before that point is considered the best model.

Returns: figure object

cyclum.writer module¶

Writer gives a very fast way of saving and loading float value matrices. It saves matrices in binary, in a very rigid format. This avoids the overhead of CSV reading functions. An R counterpart is also available.

cyclum.writer.int32_to_bytes(x)[source]¶

Convert a 32-bit int number to little-endian 4-byte binary format. This helps when writing an integer to a binary file.

Parameters: x (int32) – number to be converted
Returns: 4 byte binary
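For reference, the same kind of conversion can be done with the standard library's `struct` module. This is a minimal sketch, not the module's code: the name `int32_to_bytes_sketch` is hypothetical, and treating the value as a signed integer is an assumption.

```python
import struct


def int32_to_bytes_sketch(x):
    """Pack an integer as 4 little-endian bytes (assumed signed 32-bit)."""
    return struct.pack("<i", x)


encoded = int32_to_bytes_sketch(258)        # 258 == 0x00000102
decoded = struct.unpack("<i", encoded)[0]   # round trip back to an int
```

In little-endian order the least significant byte comes first, so 258 is stored as the bytes 02 01 00 00.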

cyclum.writer.read_df_from_binary(file_name_mask)[source]¶

Read a data frame from a binary file defined by this module

Parameters: file_name_mask –
Returns: the data frame

cyclum.writer.read_matrix_from_binary(file_name)[source]¶

Read a matrix from a binary file defined by this module.

Parameters: file_name (str) – the file to read
Returns: the matrix

cyclum.writer.write_df_to_binary(file_name_mask, df)[source]¶

Write a data frame to a file. Compared with a matrix, it has column and row names. Besides the row names and column names, the data frame must contain only float values.

Two files will be saved. For example, a call write_df_to_binary("example", df) will output an "example-value.bin" and an "example-name.txt". They store the matrix and the column and row names separately.

Parameters

file_name_mask (str) – the stem of the file name
df – the data frame to write

Returns

None
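The naming scheme described above, where a stem expands to two file names, can be sketched as follows. The helper `binary_df_paths` is hypothetical and is not part of cyclum; it only reproduces the two suffixes mentioned in the description.

```python
def binary_df_paths(file_name_mask):
    """Return the (values, names) file names derived from a stem."""
    value_path = f"{file_name_mask}-value.bin"  # the float matrix
    name_path = f"{file_name_mask}-name.txt"    # the row and column names
    return value_path, name_path


value_path, name_path = binary_df_paths("example")
```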

cyclum.writer.write_matrix_to_binary(file_name, val)[source]¶

Write an (unnamed) matrix to a file. The matrix should contain only float values, or at least values convertible to float.

Parameters

file_name (str) – name of file
val – The matrix to write

Returns

None

Module contents¶

Top-level package for cyclum.