Example of usage
================
MSCI is a Python package designed to evaluate the information content of peptide fragmentation spectra. Our objective was to compute an information-content index for all peptides within a given proteome. This would allow us to devise data acquisition and analysis strategies that generate and prioritize the most informative fragment ions for peptide quantification.
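
Download the MSCI package and install dependencies
---------------------------------------------------

Either install the MSCI release from PyPI (second cell) or, to work from a clone of the GitHub repository, uncomment the first cell. The paths in the first cell assume Google Colab, where the working directory is ``/content``.
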
.. code:: python

    # Optional: install from a clone of the GitHub repository instead of PyPI
    # !git clone https://github.com/proteomicsunitcrg/MSCI.git
    # !pip install matchms
    # If prompted to restart the session, press Cancel (matchms is probably already installed).
    # %cd MSCI
    # import sys
    # sys.path.append('/content/MSCI')

.. code:: python

    !pip install MSCI==0.2.0

.. parsed-literal::

    Collecting MSCI==0.2.0
      Downloading MSCI-0.2.0-py2.py3-none-any.whl.metadata (903 bytes)
    Requirement already satisfied: Click>=7.0 in /usr/local/lib/python3.10/dist-packages (from MSCI==0.2.0) (8.1.7)
    Successfully installed MSCI-0.2.0 gitdb-4.0.11 gitpython-3.1.43 pydeck-0.9.1 smmap-5.0.1 streamlit-1.37.1 tenacity-8.5.0 watchdog-4.0.2

.. parsed-literal::

    Requirement already satisfied: biopython in /usr/local/lib/python3.10/dist-packages (1.84)
    Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from biopython) (1.26.4)
    Requirement already satisfied: matchms in /usr/local/lib/python3.10/dist-packages (0.27.0)
    Requirement already satisfied: deprecated>=1.2.14 in /usr/local/lib/python3.10/dist-packages (from matchms) (1.2.14)

Import
------
.. code:: python

    from MSCI.Preprocessing.Koina import PeptideProcessor
    from MSCI.Grouping_MS1.Grouping_mw_irt import process_peptide_combinations
    from MSCI.Preprocessing.read_msp_file import read_msp_file
    from MSCI.Similarity.spectral_angle_similarity import process_spectra_pairs
    from MSCI.data.digest import parse_fasta_and_digest, tryptic_digest, peptides_to_csv
    from matchms.importing import load_from_msp
    import random
    import numpy as np
    import pandas as pd

Generate predicted dataset
---------------------------
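
Parse the FASTA file
~~~~~~~~~~~~~~~~~~~~

Digest the proteins in the human FASTA file ``sp_human_2023_04.fasta``, provided in the MSCI repository, in silico with trypsin and write the resulting peptides to a text file.
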
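.. code:: python

    # Download the FASTA file and digest its proteins in silico with trypsin
    result = parse_fasta_and_digest("https://raw.githubusercontent.com/proteomicsunitcrg/MSCI/refs/heads/main/tutorial/sp_human_2023_04.fasta", digest_type="trypsin")
    # Write the peptides to the file used as input for spectrum prediction
    peptides_to_csv(result, "random_tryptic_peptides.txt")
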
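
``parse_fasta_and_digest`` performs parsing and digestion in a single call. To illustrate what an in silico tryptic digest does, the sketch below applies the common textbook rule (cleave after K or R, but not before P) with no missed cleavages. The helpers ``read_fasta`` and ``digest_trypsin`` and their length limits are hypothetical and not part of MSCI; MSCI's own cleavage rules and filters may differ.

```python
import re

# Hypothetical helpers for illustration only; not the MSCI API.
# A cleavage site is the position after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    proteins = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            proteins[header] = ""
        elif header is not None and line:
            proteins[header] += line
    return proteins


def digest_trypsin(sequence, min_length=7, max_length=30):
    """Fully cleaved tryptic peptides (no missed cleavages) within a length range."""
    peptides = TRYPSIN_SITE.split(sequence)
    # Splitting at the C-terminal K/R leaves an empty string, removed by the length filter
    return [p for p in peptides if min_length <= len(p) <= max_length]


fasta_text = """>toy_protein hypothetical example
MAKPLR
GGKAR
"""
for header, sequence in read_fasta(fasta_text).items():
    print(header, digest_trypsin(sequence, min_length=1))
```

The K-P bond in the toy sequence is not cleaved, so ``MAKPLR`` stays a single peptide.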
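
Alternatively, generate random tryptic peptides
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a quick test, the cell below creates 100 random peptides, most of them ending in K or R, including pairs of peptides that are permutations of each other (same amino acid composition, hence the same precursor mass). It writes them to ``random_tryptic_peptides.txt``, overwriting the peptide list produced by the digestion step above.
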
.. code:: python

    import random

    # The 20 standard amino acids
    amino_acids = 'ACDEFGHIKLMNPQRSTVWY'


    def generate_tryptic_peptide(min_length=8, max_length=20):
        """Return a random peptide of min_length+1 to max_length residues ending in K or R."""
        length = random.randint(min_length, max_length - 1)
        peptide = ''.join(random.choices(amino_acids, k=length))
        return peptide + random.choice('KR')


    # 89 unrelated random tryptic peptides
    tryptic_peptides = [generate_tryptic_peptide() for _ in range(89)]

    # 5 pairs of peptides that are permutations of each other
    # (same composition, hence same precursor mass)
    permuted_pairs = []
    while len(permuted_pairs) < 5:
        peptide = generate_tryptic_peptide()
        # Swap two residues, keeping the C-terminal K/R in place
        pos1, pos2 = random.sample(range(len(peptide) - 1), 2)
        residues = list(peptide)
        residues[pos1], residues[pos2] = residues[pos2], residues[pos1]
        permuted_peptide = ''.join(residues)
        if permuted_peptide == peptide:
            continue  # the swapped residues were identical; try again
        tryptic_peptides.extend([peptide, permuted_peptide])
        permuted_pairs.append((peptide, permuted_peptide))

    # One extra random peptide of 5-20 residues that need not end in K/R
    # (appended, so that no permutation pair is overwritten)
    last_peptide_length = random.randint(5, 20)
    tryptic_peptides.append(''.join(random.choices(amino_acids, k=last_peptide_length)))

    # Shuffle the list to mix the pairs with the other peptides
    random.shuffle(tryptic_peptides)

    # Save the peptides to a file, one per line
    with open('random_tryptic_peptides.txt', 'w') as f:
        for peptide in tryptic_peptides:
            f.write(f"{peptide}\n")

    print(f"Generated {len(tryptic_peptides)} random tryptic peptides with permutation pairs "
          "and saved to 'random_tryptic_peptides.txt'.")

.. parsed-literal::

    Generated 100 random tryptic peptides with permutation pairs and saved to 'random_tryptic_peptides.txt'.

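
Peptides that are permutations of each other share the same amino acid composition and therefore the same precursor mass, so only their fragment ions can tell them apart. One simple way to recover such pairs from a peptide list is to group sequences by their sorted residues. This is an illustrative sketch; ``find_permutation_pairs`` is a hypothetical helper, not part of MSCI.

```python
from collections import defaultdict
from itertools import combinations


def find_permutation_pairs(peptides):
    """Return all pairs of distinct peptides with identical amino acid composition."""
    by_composition = defaultdict(list)
    for peptide in peptides:
        # Sorting the residues gives a key shared by every permutation of a sequence
        by_composition["".join(sorted(peptide))].append(peptide)
    pairs = []
    for group in by_composition.values():
        unique = list(dict.fromkeys(group))  # drop exact duplicates, keep order
        pairs.extend(combinations(unique, 2))
    return pairs


# With the generated list, call find_permutation_pairs(tryptic_peptides)
print(find_permutation_pairs(["PEPTIDEK", "EPPTIDEK", "AAAK"]))
```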
Predict with Koina
~~~~~~~~~~~~~~~~~~
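
Predict spectra for a peptide list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``PeptideProcessor`` reads the peptides in ``input_file``, predicts their fragment intensities and iRT values with the selected Koina models, and writes the predicted spectra to an MSP file. To use your own list of peptides, set ``input_file`` to a file in the same format as ``random_tryptic_peptides.txt``.
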
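.. code:: python

    processor = PeptideProcessor(
        input_file="random_tryptic_peptides.txt",
        collision_energy=30,                          # collision energy for the intensity model
        charge=2,                                     # precursor charge used for every peptide
        model_intensity="Prosit_2020_intensity_HCD",  # Koina fragment-intensity model
        model_irt="Prosit_2019_irt",                  # Koina retention-time (iRT) model
    )
    # Predict the spectra and write them to an MSP file
    processor.process('random_tryptic_peptides.msp')
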
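
The grouping step below compares peptides by precursor m/z. For an unmodified peptide, the monoisotopic m/z of the [M+zH]z+ ion is the sum of the residue masses plus one water, plus z protons, all divided by z. The sketch below assumes unmodified residues and standard monoisotopic masses; ``precursor_mz`` is a hypothetical helper, not an MSCI function.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def precursor_mz(peptide, charge):
    """Monoisotopic m/z of the [M+zH]z+ ion of an unmodified peptide."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


print(round(precursor_mz("RTNYPMFEYHK", 2), 4))
```

For ``RTNYPMFEYHK`` at charge 2 this gives about 743.3508, in line with the m/z value (743.350811) reported for ``RTNYPMFEYHK/2`` in the grouping table later in this tutorial.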
Load dataset
------------
.. code:: python

    # Path to the predicted spectra; replace it to analyse your own spectra
    File = 'random_tryptic_peptides.msp'
    spectra = list(load_from_msp(File))

.. parsed-literal::

    2024-08-22 13:30:02,993:WARNING:matchms:add_precursor_mz:No precursor_mz found in metadata.

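
An MSP file is plain text. In the common NIST-style layout, each spectrum is a block of ``Key: value`` header lines (such as ``Name``) followed by one ``m/z intensity`` line per peak, and blocks are separated by blank lines. ``load_from_msp`` and ``read_msp_file`` take care of this; the minimal reader below only illustrates that assumed layout. ``read_msp_records`` is a hypothetical name, and this is not the parser used by MSCI or matchms.

```python
def read_msp_records(text):
    """Split MSP text into records of {'metadata': {...}, 'peaks': [(mz, intensity), ...]}.

    Hypothetical minimal reader: header lines look like 'Key: value', peak lines
    start with a digit, and records are separated by blank lines.
    """
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current record
            if current is not None:
                records.append(current)
                current = None
            continue
        if current is None:
            current = {"metadata": {}, "peaks": []}
        if line[0].isdigit():
            # Peak line: m/z and intensity, optionally followed by an annotation
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
        else:
            key, _, value = line.partition(":")
            current["metadata"][key.strip().lower()] = value.strip()
    if current is not None:
        records.append(current)
    return records


example = """Name: PEPTIDEK/2
Num peaks: 2
147.1128 0.35
276.1554 1.0
"""
print(read_msp_records(example))
```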
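
Group within MS1 tolerance
--------------------------

``process_peptide_combinations`` lists the pairs of peptides whose precursor m/z and iRT both fall within the given tolerances. Such pairs cannot be separated at the MS1 level and must be distinguished by their fragment spectra.
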
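.. code:: python

    mz_tolerance = 1    # absolute m/z tolerance (use_ppm=False below)
    irt_tolerance = 5   # iRT tolerance

    # Read the name, m/z and iRT of every spectrum in the MSP file
    mz_irt_df = read_msp_file(File)
    # List all peptide pairs that fall within both tolerances
    Groups_df = process_peptide_combinations(mz_irt_df, mz_tolerance, irt_tolerance, use_ppm=False)
    Groups_df
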
.. parsed-literal::

    Results DataFrame Columns: Index(['index1', 'index2', 'peptide 1', 'peptide 2', 'm/z 1', 'm/z 2',
           'iRT 1', 'iRT 2'],
          dtype='object')

.. csv-table::
    :header: "", "index1", "index2", "peptide 1", "peptide 2", "m/z 1", "m/z 2", "iRT 1", "iRT 2"

    0, 2, 15, FTCQIAHVCPHFNNPK/2, IDIDKYGKAISACHPPK/2, 928.440166, 928.490379, 50.206707, 49.247311
    1, 8, 19, RTNYPMFEYHK/2, TLPRMTKYYGVR/2, 743.350811, 742.905754, 35.316872, 34.458534
    2, 46, 73, HQEEAMMFHPLMNKNNTFR/2, QSAICREAEQTKFNMVSKFR/2, 1188.045732, 1187.093736, 61.910671, 62.716576

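
Conceptually, this step keeps every pair of spectra whose precursor m/z and iRT values both lie within the chosen tolerances. A naive pairwise sketch of that idea follows, using values from the table above. ``group_within_tolerance`` is a hypothetical helper; ``process_peptide_combinations`` may implement the comparison differently (for example, in which m/z it uses as the ppm reference).

```python
from itertools import combinations


def group_within_tolerance(entries, mz_tolerance, irt_tolerance, use_ppm=False):
    """Return index pairs (i, j), i < j, whose m/z and iRT both fall within tolerance.

    entries: list of (name, mz, irt) tuples.
    mz_tolerance is in m/z units, or in ppm of the first m/z when use_ppm is True.
    """
    pairs = []
    for (i, (_, mz_i, irt_i)), (j, (_, mz_j, irt_j)) in combinations(enumerate(entries), 2):
        mz_diff = abs(mz_i - mz_j)
        if use_ppm:
            mz_diff = mz_diff / mz_i * 1e6
        if mz_diff <= mz_tolerance and abs(irt_i - irt_j) <= irt_tolerance:
            pairs.append((i, j))
    return pairs


entries = [
    ("FTCQIAHVCPHFNNPK/2", 928.440166, 50.206707),
    ("RTNYPMFEYHK/2", 743.350811, 35.316872),
    ("IDIDKYGKAISACHPPK/2", 928.490379, 49.247311),
]
print(group_within_tolerance(entries, mz_tolerance=1, irt_tolerance=5))
print(group_within_tolerance(entries, mz_tolerance=10, irt_tolerance=5, use_ppm=True))
```

With a window of 1 m/z unit and 5 iRT units, the first and third entries are grouped, as in the table. At 10 ppm they are not: their m/z difference of about 0.05 corresponds to roughly 54 ppm.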
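
Calculate similarity within fragment tolerance
----------------------------------------------

For each grouped pair, ``process_spectra_pairs`` compares the two fragment spectra and adds a ``similarity_score`` column to the table; higher scores indicate more similar fragment spectra.
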
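.. code:: python

    # Remove stray whitespace from the column names
    Groups_df.columns = Groups_df.columns.str.strip()
    # Spectrum indices of each grouped pair, as an (n_pairs, 2) integer array
    index_array = Groups_df[['index1', 'index2']].values.astype(int)
    # Compare the fragment spectra of each pair; the fragment tolerance here is given as ppm=10
    result = process_spectra_pairs(index_array, spectra, mz_irt_df, tolerance=0, ppm=10)
    result.to_csv("output.csv", index=False)
    result
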
.. parsed-literal::

    0.002814877157520823
    0.0
    0.0025644450471453695

.. csv-table::
    :header: "", "index1", "index2", "peptide 1", "peptide 2", "m/z 1", "m/z 2", "iRT 1", "iRT 2", "similarity_score"

    0, 2, 15, FTCQIAHVCPHFNNPK/2, IDIDKYGKAISACHPPK/2, 928.440166, 928.490379, 50.206707, 49.247311, 0.002815
    1, 8, 19, RTNYPMFEYHK/2, TLPRMTKYYGVR/2, 743.350811, 742.905754, 35.316872, 34.458534, 0.000000
    2, 46, 73, HQEEAMMFHPLMNKNNTFR/2, QSAICREAEQTKFNMVSKFR/2, 1188.045732, 1187.093736, 61.910671, 62.716576, 0.002564

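
The module name ``spectral_angle_similarity`` suggests a score based on the spectral angle. One common form is the normalized spectral contrast angle, :math:`SA = 1 - \frac{2}{\pi}\arccos(\hat{a}\cdot\hat{b})`, where :math:`\hat{a}` and :math:`\hat{b}` are the L2-normalized intensity vectors after peak matching; it is 1 for identical spectra and 0 when no peaks match. The sketch below computes it with greedy peak matching within a ppm tolerance. This is an illustration under assumptions: ``spectral_angle`` is a hypothetical helper, and ``process_spectra_pairs`` may match, weight or normalize peaks differently.

```python
import math


def spectral_angle(spec_a, spec_b, ppm=10.0):
    """Normalized spectral contrast angle between two centroided spectra.

    spec_a, spec_b: lists of (mz, intensity) tuples.
    Each peak of spec_a is matched to the closest unused peak of spec_b within
    `ppm`; unmatched peaks are paired with an intensity of 0.
    """
    vec_a, vec_b = [], []
    used_b = set()
    for mz_a, int_a in spec_a:
        best_j, best_diff = None, mz_a * ppm * 1e-6  # start at the tolerance limit
        for j, (mz_b, _) in enumerate(spec_b):
            diff = abs(mz_a - mz_b)
            if j not in used_b and diff <= best_diff:
                best_j, best_diff = j, diff
        vec_a.append(int_a)
        if best_j is None:
            vec_b.append(0.0)
        else:
            used_b.add(best_j)
            vec_b.append(spec_b[best_j][1])
    # Peaks of spec_b that were never matched
    for j, (_, int_b) in enumerate(spec_b):
        if j not in used_b:
            vec_a.append(0.0)
            vec_b.append(int_b)

    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(vec_a, vec_b)) / (norm_a * norm_b)
    cosine = min(1.0, max(-1.0, cosine))  # guard against rounding outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


spec_1 = [(175.1190, 0.4), (276.1554, 1.0), (389.2395, 0.6)]
spec_2 = [(175.1191, 0.5), (304.1615, 1.0), (389.2394, 0.2)]
print(spectral_angle(spec_1, spec_1))
print(spectral_angle(spec_1, spec_2))
```

If MSCI's score follows this definition, values close to 0, as in the table above, mean that the fragment spectra of each grouped pair share little signal.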

Plot results
------------
Plot spectra of interest using matchms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
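.. code:: python

    import matplotlib.pyplot as plt

    # Print the metadata (name, MW, iRT) of the two spectra to compare
    print(mz_irt_df.iloc[19])
    print(mz_irt_df.iloc[36])

    # Mirror plot of spectrum 19 against spectrum 36
    spectra[19].plot_against(spectra[36])
    plt.savefig('spectra_comparison.png')
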
.. parsed-literal::

    Name    MRIGTPEPWSTQSDKR/2
    MW              944.970342
    iRT              41.258202
    Name: 19, dtype: object
    Name    QAIMSISYHSCYNMFR/2
    MW              975.936599
    iRT              93.540787
    Name: 36, dtype: object