iTCR - TCR Analysis Tools

A toolkit for T-Cell Receptor (TCR) sequence analysis based on information theory principles.

Introduction

Information theory provides a principled way to capture how knowledge of one event increases understanding of another. In this study, we developed iTCR, a tool grounded in information theory to systematically assess and interpret the complexity and informativeness of TCR αβ-chain pairing patterns.

We formalized how paired $\alpha$ and $\beta$ chains constrain the accessible repertoire at the level of coarse-grained TCR features. iTCR provides two core analytical approaches:

  • MCR: Quantifies the fraction of the theoretical diversity space that is biologically accessible. A value of $MCR \approx 1$ implies perfect independence, where the features pair randomly. Conversely, values approaching $0$ reveal strong pairing constraints between $X$ and $Y$, indicating that the accessible repertoire manifold is significantly compressed relative to the theoretical potential of combinatorial pairing.
  • PLS: Serves as a global metric of combinatorial plasticity within the fixed germline space. A higher PLS indicates that a significant fraction of the V(J) pairing architecture has been actively reconfigured in the repertoire.
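As a rough illustration of the MCR idea (not the exact estimator iTCR implements; the published definition and its bias-corrected estimation live in the package), one can compare the effective diversity (exponential of Shannon entropy) of the observed joint distribution against the product of the marginal diversities:

```python
import numpy as np

def effective_diversity(counts):
    """Perplexity (exponential of Shannon entropy) of a count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return np.exp(-(p * np.log(p)).sum())

def mcr_sketch(joint_counts):
    """Illustrative MCR-style ratio (an assumption, not iTCR's estimator):
    effective joint diversity divided by the product of marginal effective
    diversities (the 'theoretical' combinatorial space).
    Values near 1 -> features pair independently; near 0 -> strong constraints."""
    joint = np.asarray(joint_counts, dtype=float)
    d_joint = effective_diversity(joint.ravel())
    d_x = effective_diversity(joint.sum(axis=1))
    d_y = effective_diversity(joint.sum(axis=0))
    return d_joint / (d_x * d_y)

# Independent pairing: joint = outer product of marginals -> ratio = 1
px, py = np.array([0.5, 0.5]), np.array([0.5, 0.5])
print(mcr_sketch(np.outer(px, py)))  # -> 1.0

# Perfectly coupled (diagonal) pairing -> ratio well below 1
print(mcr_sketch(np.eye(4)))  # -> 0.25
```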

Installation

From PyPI (Recommended)

pip3 install iTCR

From GitHub

git clone https://github.com/deepomicslab/iTCR.git
cd iTCR
pip install -e .

Requirements

Python >= 3.7
numpy >= 1.22.4
pandas >= 1.5.0
matplotlib >= 3.6.3
seaborn >= 0.11.2
scipy >= 1.10.1
joblib >= 1.3.2
tidytcells (pip3 install tidytcells)
ndd (pip3 install -U ndd)
statsmodels (pip3 install statsmodels)

Usage

Input data

Format

The input data should be a dictionary saved in a pickle file with the following structure:

Data Structure

{
    "sample_name_1": pandas.DataFrame,
    "sample_name_2": pandas.DataFrame,
    # ... more samples
}
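A minimal sketch of how such an input file could be assembled; the sample name, gene calls, and frequencies here are hypothetical toy values:

```python
import pickle
import pandas as pd

# Toy repertoire with the required columns (values are made up for illustration)
df = pd.DataFrame({
    "TRAV": ["TRAV1-2", "TRAV8-4"],
    "TRBV": ["TRBV19", "TRBV6-5"],
    "TRAJ": ["TRAJ33", "TRAJ40"],
    "TRBJ": ["TRBJ2-1", "TRBJ1-1"],
    "cdr3A": ["CAVRDSSYKLIF", "CAVSDSGYALNF"],
    "cdr3B": ["CASSLAPGATNEKLFF", "CASSYSTGELFF"],
    "clonotype.freq": [0.6, 0.4],  # per-clone weight used for down-sampling
})

data = {"sample_name_1": df}  # one DataFrame per sample
with open("tcr_data.pickle", "wb") as fh:
    pickle.dump(data, fh)
```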

Required DataFrame Columns

Each DataFrame must contain the following columns:

| Column | Description | Example |
|--------|-------------|---------|
| TRAV | T-cell receptor alpha variable gene | TRAV1-2 |
| TRBV | T-cell receptor beta variable gene | TRBV19 |
| TRAJ | T-cell receptor alpha joining gene | TRAJ33 |
| TRBJ | T-cell receptor beta joining gene | TRBJ2-1 |
| cdr3A | CDR3 alpha amino acid sequence | CAVRDSSYKLIF |
| cdr3B | CDR3 beta amino acid sequence | CASSLAPGATNEKLFF |
| (customized name) | Frequency/probability of the TCR, used for down-sampling | clonotype.freq |
Involved feature pairs

Users can customize which features to analyze by editing a configuration file (see iTCR/config.py). This gives flexible control over which feature combinations are included in the analysis.

📁 Locating the Configuration File

Since config.py is bundled inside the installed package, you need to find its path before editing it.

Option 1 — Using Python (recommended):

import iTCR
import os
config_path = os.path.join(os.path.dirname(iTCR.__file__), "config.py")
print(config_path)

Option 2 — Using the command line:

python -c "import iTCR, os; print(os.path.join(os.path.dirname(iTCR.__file__), 'config.py'))"

Once you have the path, open the file with any text editor (e.g., nano, vim, VS Code, or Notepad) and modify the settings as needed.

⚠️ Note: Editing config.py directly modifies the package-level defaults. If you reinstall or upgrade iTCR, your changes may be overwritten. Consider keeping a backup of your custom configuration.

🛠️ Configuration Structure

The configuration file (config.py) defines three feature lists that control which analyses are performed:

{
    "SINGLE_FEATURES": ["feature1", "feature2", ...],
    "CONDITIONAL_FEATURES": [["feature1", "feature2"], ...],
    "CROSS_FEATURES": [["feature1", "feature2"], ...]
}

🔧 Default Configuration

If config.py is left unmodified, iTCR uses the following default settings:

{
    "SINGLE_FEATURES": [
        "cdr3A", "cdr3B", "TRAV", "TRBV", "TRAJ", "TRBJ"
    ],
    "CONDITIONAL_FEATURES": [
        ["cdr3A", "cdr3B"], ["cdr3B", "cdr3A"],
        ["TRAV", "TRBV"], ["TRBV", "TRAV"],
        ["TRAJ", "TRBJ"], ["TRBJ", "TRAJ"]
    ],
    "CROSS_FEATURES": [
        ["TRAV", "TRBV"], ["TRAV", "cdr3B"],
        ["TRAJ", "TRBJ"], ["TRAJ", "cdr3B"],
        ["cdr3A", "TRBV"], ["cdr3A", "cdr3B"],
        ["cdr3A", "TRBJ"]
    ]
}

Feature Types Explained

  • SINGLE_FEATURES: Individual features for entropy calculation

    • Calculates H(X) for each feature X
    • Used when --analysis_type includes entropy
  • CONDITIONAL_FEATURES: Feature pairs for conditional entropy calculation

    • Calculates H(X|Y) for each pair [X, Y]
    • Format: ["condition_feature", "target_feature"] means H(target|condition)
    • Used when --analysis_type includes entropy
  • CROSS_FEATURES: Feature pairs for MCR calculation

    • Calculates MCR(X,Y) for each pair [X, Y]
    • Order doesn't matter, since MCR(X,Y) = MCR(Y,X)
    • Used when --analysis_type includes mcr
Command Line Interface Overview
# General usage
iTCR [command] [options]
# Or 
itcr [command] [options]

Available Commands

mcr                   - Entropy and MCR analysis
PLS                   - V(J)-gene Pairing Landscape Shift analysis
mcr-display           - Display MCR results
entropy-display       - Display entropy results
Start analysis!

1. Manifold Coverage Ratio (MCR) Analysis

Analysis usage

Basic command

This module calculates entropy and MCR between different TCR features (V genes, J genes, CDR3 sequences).

iTCR mcr --inputfile tcr_data.pickle --outputdir results/ [options]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--inputfile` | str | Required | Path to input pickle file containing TCR data |
| `--outputdir` | str | Required | Output directory for results |
| `--analysis_type` | str | both | Type of analysis: `entropy`, `mcr`, or `both` |
| `--sample_times` | int | 300 | Number of down-sampling iterations |
| `--sample_weights` | str | clonotype.freq | Column name for sampling weights |
| `--outer_jobs` | int | 8 | Number of parallel outer permutation tasks; reduce this on machines with fewer than 64 cores |
| `--inner_jobs` | int | None | Number of cores per permutation task |

Examples

# Calculate MCR only
iTCR mcr \
    --inputfile tcr_data.pickle \
    --outputdir example_outputs/ \
    --analysis_type mcr \
    --sample_times 300 \
    --sample_weights clonotype.freq

# Calculate Entropy only
iTCR mcr \
    --inputfile tcr_data.pickle \
    --outputdir example_outputs/ \
    --analysis_type entropy \
    --sample_times 300 \
    --sample_weights clonotype.freq
Output files
  • entropy.pickle: Entropy values
  • mcr.pickle: MCR values

2. V(J)-gene Pairing Landscape Shift (PLS) Analysis

The PLS module is a two-step pipeline that quantifies repertoire remodeling between biological conditions (e.g., pre- vs. post-treatment, different timepoints) by analyzing V(J)-gene pairing patterns.

Pipeline Overview

Step 1: Calculate Normalized Pointwise Information (NPMI)

  • Computes NPMI matrices for V-gene and J-gene pairs
  • Uses bootstrap sampling to generate robust estimates
  • Quantifies local coupling strength for each gene pair

Step 2: Analyze Timepoint Changes

  • Performs statistical testing between conditions
  • Applies dual-criterion filtering (FDR and effect size)
  • Calculates PLS as the proportion of significantly shifted gene pairs
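The NPMI of Step 1 can be illustrated with a plug-in sketch over a joint count matrix. This is only the core formula NPMI(x,y) = log(p(x,y)/(p(x)p(y))) / (-log p(x,y)); iTCR additionally bootstraps these estimates:

```python
import numpy as np

def npmi_matrix(joint_counts, base=np.e):
    """Normalized pointwise mutual information for every (row, col) gene pair,
    bounded in [-1, 1]: +1 for perfectly coupled pairs, 0 for independence.
    Cells with zero counts are undefined (NaN) under this plug-in estimate."""
    p_xy = np.asarray(joint_counts, dtype=float)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_xy / (p_x * p_y)) / np.log(base)
        npmi = pmi / (-np.log(p_xy) / np.log(base))
    return npmi

# Strongly coupled V-gene pairs sit on the diagonal of this toy count matrix:
# diagonal NPMI is positive, off-diagonal NPMI is negative.
print(npmi_matrix(np.array([[10.0, 1.0], [1.0, 10.0]])))
```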

Sample Naming Convention (IMPORTANT)

⚠️ Before running PLS analysis, you MUST configure your sample naming convention in your input data.
PLS analysis requires a specific sample ID format to identify paired samples (e.g., pre- vs. post-treatment).

Required sample ID format:

    patient_id pretreatment     # Pre-treatment sample
    patient_id posttreatment    # Post-treatment sample

Examples: `UPN1 pretreatment`, `UPN1 posttreatment`, `UPN4 pretreatment`, `UPN4 posttreatment`

Customizing Sample Metadata

Step 1: Locate the configuration file
The sample parser configuration is located at: iTCR/analysis/sample_parser.py
Step 2: Modify the create_sample_mapping() function

Edit this function to match your patient metadata:

def create_sample_mapping():
    """
    Create sample mapping dictionary
    MODIFY THIS FUNCTION according to your sample naming convention
    
    Returns:
    --------
    dict: Mapping of patient IDs to their metadata
    """
    return {
        "patient_id_1": {
            "pre": "Pre",
            "posttreatment": "timepoint_info",
            "metadata_field_1": "value1",
            "metadata_field_2": "value2",
            # Add more metadata fields as needed
        },
        "patient_id_2": {
            "pre": "Pre",
            "posttreatment": "timepoint_info",
            "metadata_field_1": "value1",
            "metadata_field_2": "value2",
        },
        # Add more patients...
    }

Example configuration

def create_sample_mapping():
    return {
        "UPN1": {
            "pre": "Pre",
            "posttreatment": "3M_CR",
            "cmv_status": "Positive",
            "3M_response": "CR",
            "6M_response": "CR"
        },
        "UPN4": {
            "pre": "Pre",
            "posttreatment": "3M_PR",
            "cmv_status": "Positive",
            "3M_response": "PR",
            "6M_response": "Relapsed"
        },
        "UPN6": {
            "pre": "Pre",
            "posttreatment": None,  # No post-treatment sample
            "cmv_status": "Negative",
            "3M_response": "NR",
            "6M_response": "NE, off"
        },
        # Add more patients...
    }

Data Structure Requirements
Your input pickle file should contain a dictionary where:

  • Keys: Sample IDs following the naming convention (e.g., "UPN1 pretreatment")
  • Values: DataFrames with the required TCR columns (TRAV, TRBV, TRAJ, TRBJ, cdr3A, cdr3B, frequency column)

Example:
{
    "UPN1 pretreatment": DataFrame(...),
    "UPN1 posttreatment": DataFrame(...),
    "UPN4 pretreatment": DataFrame(...),
    "UPN4 posttreatment": DataFrame(...),
    # ...
}
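A quick sanity check of this naming convention might look like the following; the helper `check_pairing` is hypothetical, not part of iTCR:

```python
def check_pairing(sample_ids):
    """Group sample IDs of the form 'patient_id timepoint' and report
    patients missing either the pre- or post-treatment sample."""
    patients = {}
    for sample_id in sample_ids:
        patient, timepoint = sample_id.rsplit(" ", 1)
        patients.setdefault(patient, set()).add(timepoint)
    required = {"pretreatment", "posttreatment"}
    return {p: sorted(required - tps)
            for p, tps in patients.items() if required - tps}

ids = ["UPN1 pretreatment", "UPN1 posttreatment", "UPN6 pretreatment"]
print(check_pairing(ids))  # -> {'UPN6': ['posttreatment']}
```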

Basic Command

iTCR PLS --inputfile data.pickle --outputdir results/ [options]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| **Input/Output** | | | |
| `--inputfile` | str | Required | Path to input pickle file |
| `--outputdir` | str | Required | Output directory for results |
| **Step 1: NPMI Calculation** | | | |
| `--sample_times` | int | 300 | Number of bootstrap samples |
| `--sample_weights` | str | clonotype.freq | Column name for sampling weights |
| `--outer_jobs` | int | 4 | Number of parallel outer tasks |
| `--inner_jobs` | int | None | Number of cores per task (auto) |
| `--base` | float | e | Logarithm base for NPMI calculation |
| **Step 2: Statistical Analysis** | | | |
| `--n_permutations` | int | 10000 | Number of permutations for testing |
| `--n_jobs` | int | -1 | Number of parallel jobs (-1 = all cores) |
| **Pipeline Control** | | | |
| `--skip_step1` | flag | False | Skip Step 1 and use existing NPMI results |
| `--only_step1` | flag | False | Only run Step 1 (NPMI calculation) |

Examples

Full Pipeline

# Run complete PLS analysis
iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --sample_times 300 \
    --n_permutations 10000

Step-by-Step Execution

# Step 1 only: Calculate NPMI
iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --only_step1 \
    --sample_times 300

# Step 2 only: Analyze changes (requires existing NPMI results)
# --outputdir must be the directory that stores 'npmi.pickle'
iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --skip_step1 \
    --n_permutations 10000
Output files

Step 1 Output

npmi.pickle: NPMI matrices for all V(J)-gene pairs across bootstrap iterations

Step 2 Output

  • patient_PLS_detailed.pickle
  • patient_PLS_summary.csv

3. Results Visualization

We provide visualization for the MCR and entropy results generated by the analysis module.

Display Commands for MCR results

Features

  • Statistical Testing: Performs pairwise Mann-Whitney U tests between samples
  • Multiple Testing Correction: Supports FDR and Bonferroni correction methods
  • Combined Visualizations: Creates multi-panel boxplots and heatmaps
  • Flexible Analysis: Customizable feature pairs and test parameters
  • Batch Processing: Support for automated analysis without display

Usage

Basic Usage

# Analyze with default settings
iTCR mcr-display --mcr_path example_outputs/mcr.pickle --save_dir figures

Advanced Options

# Use FDR correction with custom significance threshold
iTCR mcr-display --mcr_path example_outputs/mcr.pickle --adjust_method FDR --save_dir figures

# Custom feature pairs
iTCR mcr-display --mcr_path example_outputs/mcr.pickle --features "TRAV,TRBV;cdr3A,cdr3B" --save_dir figures

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--mcr_path` | str | Required | Path to pickle file containing MCR data |
| `--save_dir` | str | figures/MCR_analysis | Directory to save output figures |
| `--features` | str | None | Custom feature pairs to display ("feat1,feat2;feat3,feat4"); separate pairs with ';' |
| `--adjust_method` | str | Bonferroni | Multiple testing correction (FDR/Bonferroni) |
| `--no_adjust` | flag | False | Skip multiple testing correction |
| `--significance_threshold` | float | 0.05 | P-value threshold for significance |
| `--no_display` | flag | False | Batch mode without plot display |
| `--output_results` | str | None | Save statistical results to a CSV file |
| `--verbose` | flag | False | Enable detailed output |

Default Feature Pairs

The analysis includes these TCR feature combinations by default:

  • TRAV, TRBV - Alpha and beta V genes
  • cdr3A, cdr3B - Alpha and beta CDR3 sequences
  • TRAV, cdr3B - Alpha V gene with beta CDR3
  • cdr3A, TRBV - Alpha CDR3 with beta V gene
  • TRAJ, TRBJ - Alpha and beta J genes
  • cdr3A, TRBJ - Alpha CDR3 with beta J gene
  • TRAJ, cdr3B - Alpha J gene with beta CDR3

Statistical Analysis

Multiple Testing Correction

  • Bonferroni: Conservative correction for multiple comparisons
  • FDR: False Discovery Rate (Benjamini-Hochberg) correction
  • None: Raw p-values without correction
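Since statsmodels is already a dependency, both correction modes correspond to `statsmodels.stats.multitest.multipletests`; the p-values below are made up for illustration:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

raw_p = np.array([0.001, 0.01, 0.03, 0.04, 0.2])  # hypothetical raw p-values

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
reject_bonf, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg FDR: less conservative, controls the false discovery rate
reject_fdr, p_fdr, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

print(p_bonf)  # adjusted values: 0.005, 0.05, 0.15, 0.2, 1.0
print(reject_fdr)
```

FDR rejects at least as many hypotheses as Bonferroni at the same threshold, which is why it is the less conservative choice for large feature-pair comparisons.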

Output Files

Visualizations

  • combined_boxplots.pdf - Multi-panel boxplots showing MCR value distributions
  • combined_heatmaps.png - P-value heatmaps with significance annotations

Statistical Results (Optional)

  • CSV file with columns: Feature1, Feature2, Sample1, Sample2, P_Value_Raw, P_Value_Adjusted, Test_Direction_Used, N_Sample1, N_Sample2

Interpretation

Boxplots

  • Show MCR value distributions across samples for each feature pair
  • Colored boxes represent different samples
  • Means are indicated by markers
  • Lower MCR values suggest stronger feature associations

Heatmaps

  • Gray cells indicate no significant difference ($p \ge 0.05$).
  • Colored cells indicate significant differences ($p < 0.05$):
    • Red: the sample on the left (row) has a HIGHER value than the sample on the bottom (column).
    • Blue: the sample on the left (row) has a LOWER value than the sample on the bottom (column).

Example Output

Display Commands for entropy results

The `entropy_display.py` module provides comprehensive visualization and statistical analysis tools for entropy data generated by TCR analysis.

Features

  • Statistical Testing: Performs pairwise Mann-Whitney U tests between samples
  • Multiple Testing Correction: Supports FDR and Bonferroni correction methods
  • Combined Visualizations: Creates multi-panel boxplots and heatmaps
  • Flexible Analysis: Customizable entropy features and test parameters
  • Batch Processing: Support for automated analysis without display

Usage

Basic Usage

# Analyze with default settings
iTCR entropy-display  --entropy_path example_outputs/entropy.pickle --save_dir figures

Advanced Options

# Use FDR correction with custom significance threshold
iTCR entropy-display --entropy_path example_outputs/entropy.pickle --adjust_method FDR --save_dir figures

# Custom entropy features
iTCR entropy-display --entropy_path example_outputs/entropy.pickle --features "cdr3A;cdr3B;TRAV|TRBV" --save_dir figures

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--entropy_path` | str | Required | Path to pickle file containing entropy data |
| `--save_dir` | str | figures/Entropy_analysis | Directory to save output figures |
| `--features` | str | None | Custom entropy features to display ("feat1;feat2;feat3\|feat4"); separate features with ';' |
| `--adjust_method` | str | Bonferroni | Multiple testing correction (FDR/Bonferroni) |
| `--no_adjust` | flag | False | Skip multiple testing correction |
| `--significance_threshold` | float | 0.05 | P-value threshold for significance |
| `--no_display` | flag | False | Batch mode without plot display |
| `--output_results` | str | None | Save statistical results to a CSV file |
| `--verbose` | flag | False | Enable detailed output |

Default Entropy Features

The analysis includes these TCR entropy features by default:

  • cdr3A - Alpha CDR3 entropy
  • cdr3B - Beta CDR3 entropy
  • TRAV - Alpha V gene entropy
  • TRBV - Beta V gene entropy
  • cdr3A|cdr3B - Conditional entropy of alpha CDR3 given beta CDR3
  • cdr3B|cdr3A - Conditional entropy of beta CDR3 given alpha CDR3
  • TRAV|TRBV - Conditional entropy of alpha V gene given beta V gene
  • TRBV|TRAV - Conditional entropy of beta V gene given alpha V gene

Statistical Analysis

Multiple Testing Correction

  • Bonferroni: Conservative correction for multiple comparisons
  • FDR: False Discovery Rate (Benjamini-Hochberg) correction
  • None: Raw p-values without correction

Output Files

Visualizations

  • combined_entropy_boxplots.pdf - Multi-panel boxplots showing entropy value distributions
  • combined_entropy_heatmaps.png - P-value heatmaps with significance annotations

Statistical Results (Optional)

  • CSV file with columns: Feature, Sample1, Sample2, P_Value_Raw, P_Value_Adjusted, Test_Direction_Used, N_Sample1, N_Sample2, Mean_Sample1, Mean_Sample2, Std_Sample1, Std_Sample2

Interpretation

Boxplots

  • Show entropy value distributions across samples for each feature
  • Colored boxes represent different samples
  • Means are indicated by markers
  • Higher entropy values suggest greater diversity/uncertainty

Heatmaps

  • Gray cells indicate no significant difference ($p \ge 0.05$).
  • Colored cells indicate significant differences ($p < 0.05$):
    • Red: the sample on the left (row) has a HIGHER value than the sample on the bottom (column).
    • Blue: the sample on the left (row) has a LOWER value than the sample on the bottom (column).

Example Output
