mESC analysis using the object-oriented core
We redesigned the core of Cyclum as a friendlier, object-oriented core. It is still under active development, but the major features already work.
We still use the mESC dataset. For simplicity, we have already converted the dataset to TPM. The original count data are available from ArrayExpress under accession E-MTAB-2805. Tools to transform data are also provided and explained in the following sections.
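For readers starting from the raw counts, the conversion to TPM can be sketched as follows. This is a minimal illustration, not Cyclum's own conversion tool: the function name `counts_to_tpm` and the gene lengths are hypothetical, and real gene lengths would have to come from a gene annotation.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert one cell's read counts to transcripts per million (TPM).

    counts     -- read counts, one per gene
    lengths_bp -- gene lengths in base pairs (hypothetical input, same order)
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per base: corrects for longer genes collecting more reads.
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        # A cell with no reads at all: return zeros instead of dividing by 0.
        return [0.0] * len(counts)
    # Rescale so that the values of one cell sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy cell with three genes (made-up numbers).
tpm = counts_to_tpm([10, 20, 30], [1000, 2000, 1000])
print(tpm)
```

Because every cell is rescaled to the same total, TPM values are comparable across cells with different sequencing depths.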
Import necessary packages
[1]:
%load_ext autoreload
%autoreload 1
[2]:
import sys
import pandas as pd
import numpy as np
import pickle as pkl
import sklearn as skl
import sklearn.preprocessing
import matplotlib as mpl
import matplotlib.pyplot as plt
TensorFlow and its dependencies may print warnings during import, such as the h5py FutureWarning below. They can be safely ignored.
[3]:
import cyclum
from cyclum import writer
/home/shaoheng/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
[4]:
input_file_mask = 'data/mESC/mesc-tpm'
output_file_mask = './results/mESC_original/mesc-tpm'
Read data
This dataset comes with cell-cycle stage labels, so we load both the expression matrix and the labels. However, the labels are not used until evaluation.
[5]:
def preprocess(input_file_mask):
    """
    Read in data and perform log transform (log2(x+1)), centering (mean = 0) and scaling (sd = 1).
    """
    # Transpose so that cells are rows and genes are columns.
    tpm = writer.read_df_from_binary(input_file_mask).T
    # Log-transform, then standardize each gene (column).
    sttpm = pd.DataFrame(data=skl.preprocessing.scale(np.log2(tpm.values + 1)), index=tpm.index, columns=tpm.columns)
    label = pd.read_csv(input_file_mask + '-label.txt', sep="\t", index_col=0).T
    return sttpm, label

sttpm, label = preprocess(input_file_mask)
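To make the transform concrete, here is a small pure-Python sketch of the same two steps, log2(x+1) followed by per-gene standardization. It is an illustration only and the helper name `log_standardize` is hypothetical. It assumes, to our understanding of `sklearn.preprocessing.scale`, that the population standard deviation is used and that constant genes are left at 0 rather than divided by zero, which would explain the columns of exact 0.0 in the table below.

```python
import math
import statistics


def log_standardize(matrix):
    """Apply log2(x + 1), then give every column mean 0 and sd 1.

    matrix -- list of rows (cells); each row is a list of gene values.
    """
    logged = [[math.log2(v + 1) for v in row] for row in matrix]
    columns = list(zip(*logged))
    means = [statistics.mean(col) for col in columns]
    # Population sd; a constant gene has sd 0, so divide by 1 instead.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in logged]


# Three cells (rows) by three genes (columns); the second gene is constant.
toy = [[0.0, 3.0, 5.0],
       [1.0, 3.0, 5.0],
       [3.0, 3.0, 0.0]]
scaled = log_standardize(toy)
for row in scaled:
    print(row)
```

Standardizing per gene keeps highly expressed genes from dominating the reconstruction loss simply because of their scale.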
There is no convention on whether cells should be columns or rows. Here we require cells to be rows.
[6]:
sttpm.head()
[6]:
| | Gnai3 | Pbsn | Cdc45 | H19 | Scml2 | Apoh | Narf | Cav2 | Klf6 | Scmh1 | ... | RP23-345J21.2 | AC121960.1 | AC136147.1 | AC122013.1 | AC132389.1 | Gm11392 | AC160109.2 | AC154675.1 | AC156980.1 | RP23-429I18.1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G1_cell1_count | -0.411123 | -0.059028 | -0.099416 | 5.385822 | -0.691219 | 0.0 | -0.690715 | -0.059028 | -1.051909 | -0.350978 | ... | -0.146722 | 0.0 | -0.079577 | -0.374972 | -0.824399 | -0.059028 | -0.079861 | 0.0 | -0.144843 | 0.090295 |
G1_cell2_count | -0.180800 | -0.059028 | 0.777223 | -0.165725 | -0.820206 | 0.0 | 0.362341 | -0.059028 | 1.458881 | 0.207421 | ... | -0.146722 | 0.0 | -0.079577 | -0.374972 | -0.824399 | -0.059028 | -0.079861 | 0.0 | -0.144843 | -1.271033 |
G1_cell3_count | -1.409101 | -0.059028 | -1.218187 | -0.165725 | -0.820206 | 0.0 | -0.690715 | -0.059028 | -1.271394 | -0.657735 | ... | 2.593349 | 0.0 | -0.079577 | -0.374972 | -0.592938 | -0.059028 | -0.079861 | 0.0 | -0.144843 | -1.271033 |
G1_cell4_count | -1.867558 | -0.059028 | 0.923695 | -0.165725 | -0.820206 | 0.0 | 0.903266 | -0.059028 | 1.430708 | -0.657735 | ... | -0.146722 | 0.0 | -0.079577 | -0.374972 | 2.938898 | -0.059028 | -0.079861 | 0.0 | -0.144843 | -1.271033 |
G1_cell5_count | -1.646290 | -0.059028 | 0.001887 | -0.165725 | -0.820206 | 0.0 | -0.690715 | -0.059028 | -0.811233 | -0.657735 | ... | -0.146722 | 0.0 | -0.079577 | -0.374972 | -0.824399 | -0.059028 | -0.079861 | 0.0 | -0.144843 | -0.111558 |
5 rows × 38293 columns
[7]:
label.head()
[7]:
| | stage |
---|---|
G1_cell1_count | g0/g1 |
G1_cell2_count | g0/g1 |
G1_cell3_count | g0/g1 |
G1_cell4_count | g0/g1 |
G1_cell5_count | g0/g1 |
Set up and fit the model
Fitting the model may take some time. On a GTX 960M GPU it took about 7 minutes (434.95 s in the run logged below).
[8]:
model = cyclum.core.PreloadCyclum2(sttpm.values)
<tf.Variable 'encoder0/layer0/W:0' shape=(38293, 30) dtype=float32_ref>
<tf.Variable 'encoder0/layer1/W:0' shape=(30, 20) dtype=float32_ref>
<tf.Variable 'encoder0/output/W:0' shape=(20, 1) dtype=float32_ref>
<tf.Variable 'encoder1/output/W:0' shape=(38293, 1) dtype=float32_ref>
<tf.Variable 'decoder1/output/W:0' shape=(1, 38293) dtype=float32_ref>
[9]:
pseudotime, rotation = model.train()
Tensor("concat:0", shape=(288, 2), dtype=float32)
pretrain burnin
step 2000: loss [0.14480928, 1947598.0, 124534.39] time 11.15
pretrain train
step 2000: loss [0.14479513, 1947596.6, 124534.4] time 9.49
step 4000: loss [0.14479521, 1947596.6, 124534.4] time 8.00
midtrain burnin
step 2000: loss [0.14479521, 13710.437, 65734.66] time 31.36
midtrain train
step 2000: loss [0.14479521, 13699.954, 65734.64] time 31.18
step 4000: loss [0.14479521, 13699.952, 65734.64] time 28.92
finaltrain train
step 2000: loss [0.16983634, 12900.013, 35861.742] time 41.48
step 4000: loss [0.1932417, 12898.72, 19201.01] time 38.23
step 6000: loss [0.23846658, 12895.566, 9758.232] time 38.25
finaltrain refine
step 2000: loss [0.24100712, 12895.478, 7821.967] time 41.76
step 4000: loss [0.24099027, 12895.481, 6303.455] time 38.24
step 6000: loss [0.2427861, 12895.437, 5069.4487] time 38.28
step 8000: loss [0.24300408, 12895.437, 4060.3792] time 38.33
step 10000: loss [0.24350564, 12895.429, 3236.5916] time 38.46
Full time 434.95
[10]:
pseudotime
[10]:
array([[-1.88080478e+00, 1.49268208e+01],
[-1.83068943e+00, 1.00451288e+01],
[-2.14693308e+00, -5.19374895e+00],
[-1.60541892e+00, 3.00833664e+01],
[-1.53314245e+00, 3.14072437e+01],
[-3.00005198e+00, -9.08647537e+00],
[-2.02047205e+00, 1.37519932e+00],
[-2.00901365e+00, -9.53592873e+00],
[-2.04460120e+00, -2.01704903e+01],
[ 3.26236892e+00, -1.19897280e+01],
[-2.33301997e+00, -1.12434793e+00],
[-1.74189258e+00, 1.70298977e+01],
[-2.72788811e+00, -2.45228367e+01],
[-2.22418714e+00, 4.85666752e+00],
[-2.47167468e+00, -1.69890137e+01],
[-1.50875592e+00, -1.76704292e+01],
[-2.72318149e+00, -1.71104374e+01],
[-1.43556404e+00, 1.57713480e+01],
[-1.99151707e+00, -6.32707071e+00],
[-1.83119965e+00, -5.77868700e-01],
[ 2.75630283e+00, -3.04834576e+01],
[-2.53520846e+00, -2.34312592e+01],
[-1.20479560e+00, -1.23832312e+01],
[ 2.54360294e+00, -2.38925457e+01],
[ 3.54959989e+00, -1.67263813e+01],
[-2.55432010e+00, -1.09909945e+01],
[-2.28106904e+00, -1.11520720e+01],
[-2.08780193e+00, 1.29073153e+01],
[-1.68445683e+00, 2.83493404e+01],
[ 3.09858608e+00, -2.53242912e+01],
[-1.62355447e+00, 3.43955650e+01],
[-2.45936823e+00, -1.67793026e+01],
[-1.68709314e+00, 1.79854164e+01],
[ 2.63582134e+00, -3.60437508e+01],
[-1.77448249e+00, 2.40024681e+01],
[-2.63495111e+00, -3.13200054e+01],
[ 3.23755217e+00, -3.97203660e+00],
[-2.22011375e+00, 1.53145921e+00],
[-1.34875369e+00, 1.63471756e+01],
[-1.97678757e+00, 1.91151981e+01],
[-1.84385562e+00, 4.76742363e+00],
[ 2.90569687e+00, -2.40276814e+01],
[ 3.70491648e+00, -1.86204300e+01],
[-1.98422003e+00, 6.79649687e+00],
[-1.84352350e+00, 7.47265625e+00],
[-1.49645114e+00, 1.16939936e+01],
[-1.92099762e+00, 6.22199631e+00],
[-1.96539354e+00, 1.82147102e+01],
[-2.06275940e+00, -4.28217602e+00],
[-1.52672315e+00, 2.90875053e+01],
[-1.51863694e+00, -5.79061270e+00],
[-1.91972208e+00, 1.31791267e+01],
[-1.61372554e+00, 1.92629776e+01],
[-1.76490915e+00, -1.38271546e+00],
[-1.90742517e+00, 9.87796021e+00],
[ 1.08639944e+00, 4.73347712e+00],
[-2.11983824e+00, 7.25509596e+00],
[-1.81208313e+00, 1.99739571e+01],
[-2.09524798e+00, 1.25406828e+01],
[-1.28012872e+00, -7.78396034e+00],
[-1.78499365e+00, 1.80988903e+01],
[-2.67990685e+00, -2.83074017e+01],
[ 2.15729189e+00, -1.86907997e+01],
[-2.11775875e+00, -1.89147830e+00],
[-1.66387141e+00, 1.45372581e+01],
[-1.46519804e+00, -1.04516888e+01],
[-2.15873170e+00, -2.68910003e+00],
[ 3.29766583e+00, -2.38496780e+01],
[ 3.79412389e+00, -3.60418701e+00],
[-2.85797566e-01, -2.28351002e+01],
[-1.43342328e+00, 2.42385387e+01],
[ 2.82953978e+00, -3.05958214e+01],
[-1.58059835e+00, -1.40234623e+01],
[ 2.25121188e+00, -3.48759079e+01],
[ 3.56804991e+00, -2.65376968e+01],
[-2.28769398e+00, 6.59906626e+00],
[-7.18054295e-01, -2.15657806e+01],
[-1.73021936e+00, 1.09718809e+01],
[-1.66856742e+00, -1.17320943e+00],
[-4.54792380e-02, -2.46505032e+01],
[-2.05575514e+00, -4.86828804e+00],
[-2.15269518e+00, -2.46980000e+00],
[-1.97208071e+00, -1.32806664e+01],
[-1.75763011e+00, 6.67824745e+00],
[-1.24671364e+00, -6.95690393e-01],
[-1.83129740e+00, -1.64976823e+00],
[-2.51052785e+00, -2.49853668e+01],
[ 1.81519914e+00, 1.56697510e+02],
[ 2.00044703e+00, -3.10828743e+01],
[ 7.66532063e-01, -2.16776257e+01],
[ 2.61384773e+00, -3.57581329e+01],
[-1.99795341e+00, 1.18810332e+00],
[-2.19437122e+00, -1.31428518e+01],
[-4.88943666e-01, -6.84517622e+00],
[-4.51790124e-01, -1.13126841e+01],
[ 3.14263725e+00, -1.94762135e+01],
[-7.81330824e-01, 3.18693905e+01],
[ 3.72471958e-01, -5.74041080e+00],
[ 7.95454741e-01, -1.34811754e+01],
[-6.81833982e-01, 4.24557877e+01],
[ 5.14007688e-01, -2.30169964e+01],
[-9.74253297e-01, 3.25443687e+01],
[ 1.46137774e-02, -1.71452103e+01],
[ 7.86510587e-01, -2.80469398e+01],
[ 5.26935458e-01, -6.75344372e+00],
[ 6.08552456e-01, -1.96278992e+01],
[ 5.41990757e-01, -1.03533869e+01],
[ 1.10072279e+00, -2.99213352e+01],
[ 1.13791227e+00, -2.02997417e+01],
[ 5.46540976e-01, -7.72393227e+00],
[ 1.04997969e+00, -2.42958565e+01],
[-2.32939988e-01, 1.24658203e+01],
[ 7.76328564e-01, -1.91249046e+01],
[-5.10520101e-01, -2.13935113e+00],
[ 8.97438169e-01, -2.48660774e+01],
[ 5.29081345e-01, -2.70399380e+01],
[ 5.36990225e-01, -1.28023043e+01],
[-1.06004715e+00, 3.26832123e+01],
[ 3.44303340e-01, -2.26595745e+01],
[ 1.78393054e+00, -3.12572689e+01],
[ 5.51237822e-01, -5.87905979e+00],
[-1.13741446e+00, 3.94306755e+01],
[-1.09089911e+00, 6.26520061e+00],
[ 7.74762630e-01, -1.85526810e+01],
[-3.58483166e-01, 1.22616215e+01],
[-1.09127927e+00, 3.48706131e+01],
[-8.35039854e-01, 2.95066624e+01],
[ 8.67719412e-01, -2.83200855e+01],
[-2.09794849e-01, -1.41811161e+01],
[ 1.06433225e+00, -2.79621143e+01],
[-8.21636319e-01, -1.07993898e+01],
[ 3.28861088e-01, -2.28642511e+00],
[-3.29822242e-01, -9.23649502e+00],
[ 8.25832486e-01, -1.32783060e+01],
[-3.86663526e-01, -9.63309333e-02],
[ 4.17242885e-01, -1.12792873e+01],
[ 3.79645318e-01, -9.07922459e+00],
[ 6.12021983e-02, -3.68483210e+00],
[ 5.09337783e-02, -6.83753443e+00],
[-8.35998058e-02, -1.18566189e+01],
[ 4.49770302e-01, -1.10087357e+01],
[-3.23384374e-01, 7.85648251e+00],
[-1.24399638e+00, 4.50859213e+00],
[ 5.81491661e+00, -1.21675730e+01],
[ 4.80097145e-01, -8.60607624e+00],
[-6.39191031e-01, -3.38796926e+00],
[ 4.40801412e-01, -1.25775585e+01],
[ 2.90843457e-01, -8.88970470e+00],
[ 3.27654630e-01, -1.47641964e+01],
[-1.03414446e-01, -5.12129307e-01],
[ 5.93684196e-01, -5.24291563e+00],
[ 6.22451425e-01, -1.09915743e+01],
[ 8.99125636e-02, -6.61691332e+00],
[-1.16708326e+00, -8.83444977e+00],
[-5.94261646e-01, 6.18247557e+00],
[ 4.51473743e-01, -2.36777706e+01],
[-8.77194166e-01, -1.08329945e+01],
[ 1.46868736e-01, -9.44843102e+00],
[ 2.30497897e-01, -9.96028042e+00],
[-8.74585986e-01, 1.91990414e+01],
[ 3.68390441e-01, -1.06521368e+01],
[ 8.37905169e-01, -1.98295097e+01],
[-8.31497669e-01, 3.36677513e+01],
[-1.37950182e-01, -1.94550381e+01],
[ 2.25027502e-02, -7.99194670e+00],
[-1.03092706e+00, 2.96240578e+01],
[ 1.59772009e-01, -1.91067314e+01],
[ 2.28214407e+00, -3.49879112e+01],
[ 7.89140940e-01, -2.52394218e+01],
[-1.22091055e-01, -7.71836758e+00],
[ 1.34536058e-01, -1.14051886e+01],
[-1.98526770e-01, -3.64126062e+00],
[-1.56965494e-01, 1.18828058e+01],
[ 3.35113436e-01, -5.33939648e+00],
[ 5.41561007e-01, -1.44523783e+01],
[-9.96852815e-02, -4.07140589e+00],
[ 8.91364098e-01, -2.65831661e+01],
[-8.40695739e-01, 3.96968155e+01],
[-2.92037815e-01, -2.12753606e+00],
[ 9.04862404e-01, -2.71385250e+01],
[-3.66110802e-02, -1.48365536e+01],
[-1.10810733e+00, -1.24938011e+01],
[ 9.75080013e-01, -1.68421402e+01],
[ 6.08522534e-01, -1.21821375e+01],
[ 8.17227840e-01, -1.73037090e+01],
[-1.02403730e-01, -9.79152143e-01],
[ 6.49303794e-01, -2.57489128e+01],
[ 5.69854617e-01, -1.45891619e+01],
[ 1.25029325e+00, -2.04991188e+01],
[ 2.51113504e-01, -7.81974494e-01],
[-3.41136962e-01, 1.40259695e+01],
[ 2.03663993e+00, -1.62686539e+01],
[ 1.81519914e+00, 2.18433350e+02],
[ 1.51362944e+00, -2.24570751e+01],
[-3.66690803e+00, -7.74798584e+00],
[ 1.74248338e+00, -1.70989418e+01],
[ 1.81519914e+00, 2.16595383e+02],
[-4.03745174e+00, -2.10847492e+01],
[ 2.47830915e+00, -1.15022650e+01],
[ 2.15843725e+00, -2.75647697e+01],
[ 8.79188538e-01, -1.63194580e+01],
[ 1.96657765e+00, -1.57544184e+01],
[ 2.06179500e+00, -1.82751694e+01],
[ 1.50144041e+00, -2.18298931e+01],
[ 2.33662796e+00, -2.75429592e+01],
[-2.91265535e+00, 1.53857911e+00],
[ 1.51612794e+00, -2.34474773e+01],
[ 2.21980619e+00, -2.25975018e+01],
[ 2.70665288e+00, -1.96391277e+01],
[ 1.81519914e+00, 2.18456894e+02],
[ 2.40450358e+00, -1.90739231e+01],
[ 1.55576754e+00, -2.61141815e+01],
[ 1.86737990e+00, -2.69432220e+01],
[ 1.17162383e+00, -1.46962442e+01],
[ 1.88511109e+00, -1.33146896e+01],
[ 1.75478864e+00, -2.70466042e+01],
[ 2.25218034e+00, 2.39221668e+00],
[ 2.25038910e+00, -9.72757912e+00],
[ 1.45838094e+00, -2.30557632e+01],
[ 2.00435185e+00, -2.56383591e+01],
[ 1.65824318e+00, -2.28895950e+01],
[ 3.67251492e+00, 6.56364536e+00],
[ 1.81519914e+00, 2.18050400e+02],
[ 2.31304479e+00, -1.77586021e+01],
[ 1.09642792e+00, -1.31371851e+01],
[ 1.90589756e-01, 1.88669109e+01],
[ 1.90949345e+00, -2.10922432e+01],
[-9.44056273e-01, 6.45048761e+00],
[ 1.74132633e+00, -1.77247181e+01],
[ 1.36677599e+00, -1.51440248e+01],
[ 2.21458769e+00, -2.13104000e+01],
[ 2.45775676e+00, -1.63243580e+01],
[ 6.68768525e-01, -7.64337444e+00],
[ 2.48189592e+00, -1.42729197e+01],
[ 1.55243373e+00, -1.63729649e+01],
[ 1.81519914e+00, 2.16617355e+02],
[ 1.62376952e+00, -1.60758076e+01],
[-2.44056201e+00, 5.02266693e+00],
[ 2.16105890e+00, -1.06902733e+01],
[ 1.52357936e+00, -1.80762577e+01],
[ 1.91408134e+00, -3.48595200e+01],
[ 2.06909847e+00, -2.46016407e+01],
[-4.40830708e+00, 6.29463568e-02],
[ 2.21440935e+00, -3.29487190e+01],
[ 2.51865697e+00, -1.67729168e+01],
[ 1.26352024e+00, -2.26289101e+01],
[-4.09907579e+00, -1.85375214e+01],
[ 2.43417931e+00, -7.72354794e+00],
[-3.46632814e+00, -1.75531788e+01],
[ 1.81519914e+00, 2.17464203e+02],
[ 2.66472244e+00, -1.63560905e+01],
[ 2.86545849e+00, -4.57689428e+00],
[-3.81132579e+00, -2.04803791e+01],
[ 2.60997224e+00, -4.60065794e+00],
[ 1.94236231e+00, -1.53495979e+01],
[ 1.83971524e+00, -1.95907440e+01],
[ 2.57765341e+00, -9.27616596e+00],
[-3.98111820e+00, -1.61716251e+01],
[ 1.49002218e+00, -8.24895573e+00],
[ 2.35166407e+00, -2.03668022e+01],
[ 1.78349495e+00, -2.82284050e+01],
[ 2.01421452e+00, -2.55387173e+01],
[ 1.81519914e+00, 2.20200607e+02],
[ 2.30397558e+00, -2.30943661e+01],
[-4.80054188e+00, -2.90102825e+01],
[-6.54902816e-01, 3.06810493e+01],
[ 1.89077568e+00, -2.02105179e+01],
[ 2.00662673e-01, -8.00766182e+00],
[ 2.10995245e+00, -9.81829739e+00],
[ 4.55492973e+00, -3.09925861e+01],
[ 1.87334144e+00, -2.11787987e+01],
[ 2.04158878e+00, -2.78806171e+01],
[-8.54304075e-01, 5.94218178e+01],
[ 2.38518620e+00, -1.86650848e+01],
[ 1.33689237e+00, -2.57526932e+01],
[ 1.81519914e+00, 2.23321243e+02],
[ 3.06019235e+00, -1.35738144e+01],
[ 2.26424980e+00, -2.91853142e+01],
[-1.46147847e-01, 5.07323933e+00],
[-5.54601789e-01, 6.47962036e+01],
[ 2.23264813e+00, -2.77605476e+01],
[-8.62332225e-01, 6.26800995e+01],
[-4.08478117e+00, -7.54828739e+00],
[ 1.41936350e+00, -2.61203613e+01],
[ 1.31178796e+00, -1.76503181e+01],
[-9.01086569e-01, 7.38781281e+01],
[-9.25832272e-01, 6.76473389e+01],
[-7.27904201e-01, 6.72318192e+01]], dtype=float32)
[11]:
rotation
[11]:
array([[-7.8976573e-03, -9.6943472e-03, -1.3284871e-02, ...,
-5.2904693e-07, -4.6778034e-02, -3.0144814e-02],
[-2.2776838e-02, 3.2661978e-02, -4.6866067e-02, ...,
-4.1408495e-07, 7.3825166e-02, 5.5136330e-02],
[ 3.0675331e-02, -2.2968404e-02, 6.0151089e-02, ...,
4.5403578e-07, -2.7050946e-02, -2.4992879e-02]], dtype=float32)
Illustrations
We plot the results on a circle to show the circular nature of the pseudotime; the circle has no real start or end. Red, green and blue represent the G0/G1, S and G2/M phases, respectively. The inner lines represent single cells. The cells spread across the ... The areas outside ...
[12]:
import cyclum.illustration
[13]:
color_map = {'stage': {"g0/g1": "red", "s": "green", "g2/m": "blue"},
             'subcluster': {"intact": "cyan", "perturbed": "violet"}}
cyclum.illustration.plot_round_distr_color(pseudotime[:, 0], label['stage'], color_map['stage'])
pass
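Because the circle has no start or end, ordinary arithmetic on pseudotime can mislead. The sketch below assumes that the first pseudotime column, the one passed to the plotting function above, is an angle in radians; this is our reading of the code and is not stated explicitly in this notebook. Under that assumption, values can be wrapped onto [0, 2π) and averaged with a circular mean. The helper names `wrap_angle` and `circular_mean` are hypothetical and not part of Cyclum.

```python
import math

TWO_PI = 2 * math.pi


def wrap_angle(theta):
    """Map an angle in radians onto [0, 2*pi)."""
    wrapped = theta % TWO_PI
    # Guard against float rounding returning exactly 2*pi.
    return 0.0 if wrapped >= TWO_PI else wrapped


def circular_mean(angles):
    """Mean direction of a set of angles (radians), in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return wrap_angle(math.atan2(s, c))


# First-column values of three cells, copied from the pseudotime output.
sample = [-1.88080478, 3.26236892, 5.81491661]
wrapped = [wrap_angle(t) for t in sample]
print(wrapped)

# Two angles on either side of the 0 / 2*pi seam: the arithmetic mean is
# pi (the opposite side of the circle), the circular mean is close to 0.
seam = circular_mean([0.1, TWO_PI - 0.1])
print(seam)
```

The same circular mean could be computed per stage label to summarize where each phase sits on the circle.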