cyclum package¶
Submodules¶
cyclum.evaluation module¶
-
cyclum.evaluation.
parzen_estimate
(x, lim, half_granularity=100, window=<function <lambda>>, scale=0.5)[source]¶ Calculate a Parzen window estimate (a non-parametric density estimation method).
- Parameters
x – instances
lim – limit of domain
half_granularity –
window –
scale –
- Returns
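To illustrate the technique named above, here is a minimal pure-Python sketch of a Parzen window density estimate on a one-dimensional grid. It is not cyclum's implementation: the name `parzen_sketch`, the Gaussian window, and the reading of `lim` as a symmetric domain `[-lim, lim]` sampled at `2 * half_granularity + 1` points are all assumptions made for this example.

```python
import math


def gaussian_window(u):
    """Standard normal density, used here as the window (kernel) function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_sketch(x, lim, half_granularity=100, window=gaussian_window, scale=0.5):
    """Estimate a density from samples x on a grid over [-lim, lim].

    Hypothetical helper: the argument meanings are assumed, not taken
    from cyclum's source.
    """
    n_points = 2 * half_granularity + 1
    grid = [-lim + lim * k / half_granularity for k in range(n_points)]
    density = []
    for g in grid:
        # Average the scaled window centred on each sample.
        total = sum(window((g - xi) / scale) for xi in x)
        density.append(total / (len(x) * scale))
    return grid, density


grid, density = parzen_sketch([-0.3, 0.1, 0.2, 1.5], lim=5.0)
step = grid[1] - grid[0]
# A Riemann sum of the estimate should be close to 1 for a valid density.
print(round(sum(density) * step, 3))
```

A smaller `scale` gives a spikier estimate that follows individual samples; a larger one smooths them together.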
cyclum.hdfrw module¶
Read write HDF.
-
cyclum.hdfrw.
hdf2mat
(filepath, dtype=<class 'float'>)[source]¶ Read an hdf file generated by the hdfrw.R mat2hdf function into a data frame. Note that because Python and R handle data differently, colnames are used for the index and rownames for the columns, and the matrix is also tacitly transposed.
- Parameters
filepath (str) – path of hdf file
dtype (Type) – type of data; default is float
- Return type
DataFrame
- Returns
a pandas data frame
-
cyclum.hdfrw.
mat2hdf
(data, filepath)[source]¶ Write dataframe to an hdf file which can be read by hdfrw.R hdf2mat function.
- Parameters
data (Union[DataFrame, numpy.array, List[str]]) – data frame or numpy array to be written
filepath (str) – path of hdf file to be written
- Return type
None
- Returns
None
cyclum.illustration module¶
-
class
cyclum.illustration.
FigureWriter
(pdf_name)[source]¶ Bases:
object
Keep and write figures into a PDF file.
-
cyclum.illustration.
plot_cell_sparsity
(linear_data, use_ratio=True)[source]¶ Return a figure of #{cell, non_zero_genes(cell) > x}
- Parameters
linear_data – data
use_ratio – if True, plot as a ratio
-
cyclum.illustration.
plot_gene_sparsity
(linear_data, use_ratio=True)[source]¶ Return a figure of #{cell, non_zero_genes(cell) > x}
- Parameters
linear_data – data
use_ratio – if True, plot as a ratio
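The quantity #{cell, non_zero_genes(cell) > x} can be computed without any plotting library. The sketch below is illustrative only: the helper name `cells_above_threshold` and the cells-by-genes orientation of the matrix are assumptions, and the function returns the curve values rather than a figure.

```python
def cells_above_threshold(matrix, use_ratio=True):
    """For each threshold x, count cells with more than x nonzero genes.

    matrix is a list of rows, one row per cell and one column per gene
    (an assumed orientation). The result is indexed by x = 0 .. n_genes.
    """
    n_cells = len(matrix)
    n_genes = len(matrix[0])
    nonzero_per_cell = [sum(1 for v in row if v != 0) for row in matrix]
    counts = [sum(1 for c in nonzero_per_cell if c > x)
              for x in range(n_genes + 1)]
    if use_ratio:
        # Express each count as a fraction of all cells.
        return [c / n_cells for c in counts]
    return counts


toy = [
    [0, 2, 0, 1],  # 2 nonzero genes
    [3, 1, 4, 1],  # 4 nonzero genes
    [0, 0, 0, 5],  # 1 nonzero gene
]
print(cells_above_threshold(toy, use_ratio=False))  # [3, 2, 1, 1, 0]
```

Transposing the matrix before the call would give the analogous per-gene curve.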
cyclum.tuning module¶
Auto tuning.
-
class
cyclum.tuning.
CyclumAutoTune
(data, max_linear_dims=3, epochs=500, verbose=100, rate=0.0005, early_stop=False, encoder_depth=2, encoder_width=50, dropout_rate=0.1, nonlinear_reg=0.0001, linear_reg=0.0001)[source]¶ Bases:
cyclum.models.ae.AutoEncoder
Circular autoencoder with automatically decided number of linear components
We first perform PCA on the data and record the MSE obtained with the first 1, 2, …, max_linear_dims + 1 components. We then train a circular autoencoder with 0, 1, …, max_linear_dims linear components. We compare the circular autoencoder with i linear components against PCA with (i + 1) components, for i = 0, 1, … We select the first i at which the difference in loss relative to PCA is greater than the differences at both (i - 1) and (i + 1), or just at (i + 1) if i == 0.
At the end, this class will be an UNTRAINED model with the optimal number of linear components. You can then train it with all your data, more epochs, and a better learning rate.
- Parameters
data – The data used to decide number of linear components. For a large dataset, you may use a representative portion of it.
max_linear_dims – maximum number of linear dimensions.
epochs – number of epochs for each test
verbose – number of epochs between reports of the loss, time consumption, etc.
rate – training rate
early_stop – whether to stop testing more linear components once the result is decided. This ONLY affects the elbow plot and has NO influence on the result.
encoder_depth – depth of encoder, i.e., number of hidden layers
encoder_width –
width of encoder, one of the following:
An integer, giving the number of nodes per layer. All hidden layers will have the same number of nodes.
A list of integers, of length encoder_depth, giving the number of nodes in each layer.
dropout_rate – rate for dropout.
nonlinear_reg – strength of regularization on the nonlinear encoder.
linear_reg – strength of regularization on the linear encoder.
- Examples:
>>> from cyclum.hdfrw import hdf2mat, mat2hdf
>>> from cyclum.tuning import CyclumAutoTune
>>> df = hdf2mat('path_to_hdf.h5')
>>> m = CyclumAutoTune(df.values, max_linear_dims=5)
>>> m.train(df.values)
>>> pseudotime = m.predict_pseudotime(df.values)
>>> mat2hdf(pseudotime, 'path_to_pseudotime.h5')
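The selection rule described for this class can be sketched in a few lines of plain Python. This is only one reading of that description, not the library's code: the helper name `pick_linear_dims`, the sign convention (PCA loss minus autoencoder loss), and the `None` fallback when no candidate qualifies are all assumptions.

```python
def pick_linear_dims(pca_mse, ae_mse):
    """Return the first i whose gain over PCA is a local maximum.

    pca_mse[k] is the PCA loss with k + 1 components;
    ae_mse[i] is the autoencoder loss with i linear components.
    Both lists must have the same length. Returns None if no i qualifies
    (an assumed fallback).
    """
    # Gain of the autoencoder with i linear components over PCA with i + 1.
    gain = [p - a for p, a in zip(pca_mse, ae_mse)]
    for i in range(len(gain) - 1):
        beats_next = gain[i] > gain[i + 1]
        beats_prev = i == 0 or gain[i] > gain[i - 1]
        if beats_prev and beats_next:
            return i
    return None


# Made-up losses, for illustration only.
pca_mse = [1.00, 0.80, 0.65, 0.55]
ae_mse = [0.90, 0.50, 0.45, 0.40]
print(pick_linear_dims(pca_mse, ae_mse))  # 1
```

With these made-up numbers the gains are roughly 0.10, 0.30, 0.20 and 0.15, so the peak at i = 1 is selected.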
cyclum.writer module¶
Writer gives a very fast way of saving and loading float-valued matrices. It saves matrices in binary, in a very rigid format, which avoids the overhead of CSV reading functions. An R counterpart is also available.
-
cyclum.writer.
int32_to_bytes
(x)[source]¶ Convert a 32-bit integer to little-endian 4-byte binary format. This helps when writing an integer to a binary file.
- Parameters
x (int32) – number to be converted
- Returns
4 byte binary
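The same conversion can be reproduced with Python's standard `struct` module. The sketch below is not the module's implementation; the `_sketch` function names are hypothetical, and treating the value as signed is an assumption.

```python
import struct


def int32_to_bytes_sketch(x):
    """Pack a signed 32-bit integer as 4 little-endian bytes."""
    return struct.pack("<i", x)


def bytes_to_int32_sketch(b):
    """Inverse of int32_to_bytes_sketch."""
    return struct.unpack("<i", b)[0]


print(int32_to_bytes_sketch(1))    # b'\x01\x00\x00\x00'
print(int32_to_bytes_sketch(258))  # b'\x02\x01\x00\x00'
```

The least significant byte comes first, which is what "little endian" means here.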
-
cyclum.writer.
read_df_from_binary
(file_name_mask)[source]¶ Read a data frame from a binary file defined by this module
- Parameters
file_name_mask – the stem of the file name, as used by write_df_to_binary
- Returns
the data frame
-
cyclum.writer.
read_matrix_from_binary
(file_name)[source]¶ Read a matrix from a binary file defined by this module.
- Parameters
file_name (str) – the file to read
- Returns
the matrix
-
cyclum.writer.
write_df_to_binary
(file_name_mask, df)[source]¶ Write a data frame to a file. Compared with a matrix, a data frame has column and row names. Besides the row names and column names, the data frame must contain only float values.
Two files will be saved. For example, a call to write_df_to_binary("example", df) will output "example-value.bin" and "example-name.txt". They store the matrix and the column and row names separately.
- Parameters
file_name_mask (str) – the stem of the file name
df – the data frame to write
- Returns
None
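The two-file scheme described above can be sketched with the standard library alone. Only the "-value.bin" and "-name.txt" suffixes come from the description. The byte layout (two little-endian int32 values for the shape, then row-major float64 values), the tab-separated name file, and the `_sketch` function names are assumptions for illustration, and will not necessarily be compatible with files written by cyclum.

```python
import os
import struct
import tempfile


def write_df_sketch(file_name_mask, row_names, col_names, values):
    """Write values to <mask>-value.bin and names to <mask>-name.txt.

    Assumed layout (not cyclum's documented format): a header of two
    little-endian int32 values (rows, columns), then row-major float64s.
    """
    n_rows, n_cols = len(row_names), len(col_names)
    with open(file_name_mask + "-value.bin", "wb") as f:
        f.write(struct.pack("<ii", n_rows, n_cols))
        for row in values:
            f.write(struct.pack("<%dd" % n_cols, *row))
    with open(file_name_mask + "-name.txt", "w") as f:
        # First line: row names; second line: column names (assumed).
        f.write("\t".join(row_names) + "\n")
        f.write("\t".join(col_names) + "\n")


def read_df_sketch(file_name_mask):
    """Read back what write_df_sketch wrote."""
    with open(file_name_mask + "-value.bin", "rb") as f:
        n_rows, n_cols = struct.unpack("<ii", f.read(8))
        values = [list(struct.unpack("<%dd" % n_cols, f.read(8 * n_cols)))
                  for _ in range(n_rows)]
    with open(file_name_mask + "-name.txt") as f:
        row_names = f.readline().rstrip("\n").split("\t")
        col_names = f.readline().rstrip("\n").split("\t")
    return row_names, col_names, values


with tempfile.TemporaryDirectory() as tmp:
    mask = os.path.join(tmp, "example")
    write_df_sketch(mask, ["cell1", "cell2"], ["geneA", "geneB", "geneC"],
                    [[1.0, 0.0, 2.5], [0.5, 3.0, 0.0]])
    files = sorted(os.listdir(tmp))
    round_trip = read_df_sketch(mask)
print(files)  # ['example-name.txt', 'example-value.bin']
```

Storing the shape in a fixed-size header is what lets a reader skip the parsing work that a CSV reader has to do.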
Module contents¶
Top-level package for cyclum.