This repository contains the original MATLAB code for DeepInsight, as described in the paper "DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture".
This package provides a Python version of the image transformation procedure of DeepInsight. It is not guaranteed to give the same results as the published MATLAB code and should be considered experimental.
python3 -m pip -q install git+https://github.com/alok-ai-lab/DeepInsight.git#egg=DeepInsight
The following is a walkthrough of standard usage of the ImageTransformer class.
from pyDeepInsight import ImageTransformer, LogScaler
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
Load example TCGA data
expr_file = r"./examples/data/tcga.rnaseq_fpkm_uq.example.txt.gz"
expr = pd.read_csv(expr_file, sep="\t")
y = expr['project'].values
X = expr.iloc[:, 1:].values
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=23, stratify=y)
X_train.shape
(480, 5000)
Normalize the data to values between 0 and 1. The normalization procedure used below is described as norm-2 in the DeepInsight paper's supplementary information.
ln = LogScaler()
X_train_norm = ln.fit_transform(X_train)
X_test_norm = ln.transform(X_test)
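For reference, the following is a minimal sketch of a log-plus-min-max scaler of this kind. It is an illustration only, not the actual LogScaler implementation; the pseudocount and the use of global rather than per-feature extrema are assumptions.
import numpy as np

# Illustrative sketch only -- not the actual LogScaler implementation.
class SimpleLogMinMaxScaler:
    def fit(self, X):
        X_log = np.log2(X + 1)  # log-transform with an assumed pseudocount of 1
        self._min, self._max = X_log.min(), X_log.max()
        return self

    def transform(self, X):
        X_log = np.log2(X + 1)
        # scale to [0, 1] using the training extrema, clipping unseen values
        return np.clip((X_log - self._min) / (self._max - self._min), 0, 1)

    def fit_transform(self, X):
        return self.fit(X).transform(X)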
Initialize the image transformer. There are three built-in feature extraction options, 'tsne', 'pca', and 'kpca', to align with the original MATLAB implementation.
it = ImageTransformer(feature_extractor='tsne',
pixels=50, random_state=1701,
n_jobs=-1)
Alternatively, any class instance with a .fit_transform() method that returns a 2-dimensional array of extracted features can be provided to the ImageTransformer class. This allows for customization of the feature extraction procedure.
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, perplexity=30, metric='cosine',
random_state=1701, n_jobs=-1)
it = ImageTransformer(feature_extractor=tsne, pixels=50)
Train the image transformer on the training data. Setting plot=True produces a plot showing the reduced features (blue points), the convex hull (red), and the minimum bounding rectangle (green) prior to rotation.
plt.figure(figsize=(5, 5))
_ = it.fit(X_train_norm, plot=True)
The feature density matrix can be extracted from the trained transformer in order to view the overall feature overlap, i.e. how many features map to each pixel.
fdm = it.feature_density_matrix()
fdm[fdm == 0] = np.nan
plt.figure(figsize=(10, 7))
ax = sns.heatmap(fdm, cmap="viridis", linewidths=0.01,
linecolor="lightgrey", square=True)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax.yaxis.set_major_locator(ticker.MultipleLocator(5))
for _, spine in ax.spines.items():
spine.set_visible(True)
_ = plt.title("Genes per pixel")
It is possible to update the pixel size without retraining.
px_sizes = [25, (25, 50), 50, 100]
fig, ax = plt.subplots(1, len(px_sizes), figsize=(25, 7))
for ix, px in enumerate(px_sizes):
it.pixels = px
fdm = it.feature_density_matrix()
fdm[fdm == 0] = np.nan
cax = sns.heatmap(fdm, cmap="viridis", linewidth=0.01,
linecolor="lightgrey", square=True,
ax=ax[ix], cbar=False)
cax.set_title('Dim {} x {}'.format(*it.pixels))
for _, spine in cax.spines.items():
spine.set_visible(True)
cax.xaxis.set_major_locator(ticker.MultipleLocator(5))
cax.yaxis.set_major_locator(ticker.MultipleLocator(5))
plt.tight_layout()
it.pixels = 50
The trained transformer can then be used to transform sample data to image matrices.
X_train_img = it.transform(X_train_norm)
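A quick sanity check on the result. One image is produced per sample; whether each image is a grayscale matrix or an RGB array may depend on the package version.
X_train_img.shape  # e.g. (480, 50, 50) or (480, 50, 50, 3)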
Fit and transform can be done in a single step.
X_train_img = it.fit_transform(X_train_norm)
Plotting the image matrices of the first four samples of the training set:
fig, ax = plt.subplots(1, 4, figsize=(25, 7))
for i in range(4):
ax[i].imshow(X_train_img[i])
ax[i].title.set_text("Train[{}] - class '{}'".format(i, y_train[i]))
plt.tight_layout()
Transforming the testing data is done in the same way as transforming the training data.
X_test_img = it.transform(X_test_norm)
fig, ax = plt.subplots(1, 4, figsize=(25, 7))
for i in range(4):
ax[i].imshow(X_test_img[i])
ax[i].title.set_text("Test[{}] - class '{}'".format(i, y_test[i]))
plt.tight_layout()
The image matrices can then be used as input to a CNN model.
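As a minimal sketch of this final step, assuming PyTorch is available and that X_train_img is a float array of shape (n_samples, pixels, pixels, 3) with values in [0, 1], the images can be converted to tensors and passed to a small CNN. The architecture below is illustrative, not the one used in the paper.
import torch
import torch.nn as nn

# Illustrative only: a small CNN, not the architecture from the paper.
# Assumes X_train_img has shape (n_samples, 50, 50, 3).
X_tensor = torch.from_numpy(X_train_img).float().permute(0, 3, 1, 2)  # NHWC -> NCHW
classes, y_idx = np.unique(y_train, return_inverse=True)
y_tensor = torch.from_numpy(y_idx)

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 12 * 12, len(classes)),  # 50 px -> 25 -> 12 after two 2x2 poolings
)
logits = model(X_tensor)                        # (n_samples, n_classes)
loss = nn.CrossEntropyLoss()(logits, y_tensor)  # training loop omitted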