stcrpy.tcr_formats package
Submodules
stcrpy.tcr_formats.tcr_formats module
- stcrpy.tcr_formats.tcr_formats.get_sequences(entity: Bio.PDB.Entity, amino_acids_only: bool = True, residues_to_include: list = None) dict[source]
Extract sequences from structure objects as a dictionary.
- Parameters:
entity (Bio.PDB.Entity) – Structure object
amino_acids_only (bool, optional) – Whether to remove non-amino acid ‘X’ from sequences. Defaults to True.
residues_to_include (list, optional) – List of residue IDs to include in sequence. Defaults to None.
- Raises:
e – AttributeError if entity has no attribute .get_chains(). In that case the entity is assumed to be chain level and a single sequence is returned.
- Returns:
Dictionary of amino acid sequences, keyed by chain ID in the structure entity.
- Return type:
dict
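The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: `sketch_get_sequences`, `chain_sequence` and the `Fake*` stand-in classes are hypothetical names that only mimic the Bio.PDB methods `get_chains()`, `get_residues()` and `get_resname()`. The sketch also assumes that unknown residues map to 'X' and that a chain-level entity yields a one-entry dictionary, neither of which is confirmed by the documentation.

```python
from dataclasses import dataclass

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass
class FakeResidue:
    """Stand-in for a Bio.PDB residue (hypothetical, for illustration)."""
    resname: str
    id: tuple

    def get_resname(self):
        return self.resname


@dataclass
class FakeChain:
    """Stand-in for a Bio.PDB chain; note it has no get_chains()."""
    id: str
    residues: list

    def get_residues(self):
        return iter(self.residues)


@dataclass
class FakeModel:
    """Stand-in for a higher-level entity that exposes get_chains()."""
    chains: list

    def get_chains(self):
        return iter(self.chains)


def chain_sequence(chain, amino_acids_only=True, residues_to_include=None):
    letters = []
    for residue in chain.get_residues():
        if residues_to_include is not None and residue.id not in residues_to_include:
            continue
        # Assumption: anything outside the 20 standard residues becomes 'X'.
        letter = THREE_TO_ONE.get(residue.get_resname(), "X")
        if amino_acids_only and letter == "X":
            continue
        letters.append(letter)
    return "".join(letters)


def sketch_get_sequences(entity, amino_acids_only=True, residues_to_include=None):
    try:
        chains = list(entity.get_chains())
    except AttributeError:
        # No get_chains(): treat the entity itself as a single chain.
        chains = [entity]
    return {
        chain.id: chain_sequence(chain, amino_acids_only, residues_to_include)
        for chain in chains
    }
```

With real structures, the stand-in classes would be replaced by Bio.PDB objects; only the method names used here are taken from Bio.PDB.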
- stcrpy.tcr_formats.tcr_formats.to_AF3_json(tcr: TCR, tcr_only: bool = True, save: bool = True, save_dir: str = '', name: str = None, V_domain_only: bool = False) dict[source]
Converts TCR object to dict in AlphaFold 3 JSON input format, i.e. amino acid sequences. E.g.:
{
    "name": Job name,
    "modelSeeds": [],
    "sequences": [
        {"proteinChain": {"sequence": AAAAAAAAAAAAAA, "count": 1}},
        {"proteinChain": {"sequence": AAAAAAAAAAAAAA, "count": 1}},
        {"proteinChain": {"sequence": AAAAAAAAAAAAAA, "count": 1}},
    ],
}
- Parameters:
tcr (TCR) – TCR structure object
tcr_only (bool, optional) – Whether to include TCR sequence only, excluding antigen and MHC. Defaults to True.
save (bool, optional) – Whether to save dict as JSON file. Defaults to True.
save_dir (str, optional) – Directory to save JSON files to. Defaults to “”.
name (str, optional) – TCR ID to use as name for AF3 job. Defaults to None.
V_domain_only (bool, optional) – Include full TCR sequence or only the variable domain (1-128 IMGT numbering). Defaults to False.
- Returns:
Nested dictionary of AF3 sequence inputs.
- Return type:
dict
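As a rough illustration of the dictionary layout described above, the following sketch builds the same nested structure from plain sequence strings and serialises it with the standard library. `build_af3_input` and the job name are hypothetical and not part of stcrpy; the placeholder sequences are the ones from the example above.

```python
import json


def build_af3_input(name, sequences):
    """Assemble a dict with the layout shown in the to_AF3_json docstring."""
    return {
        "name": name,
        "modelSeeds": [],
        "sequences": [
            {"proteinChain": {"sequence": sequence, "count": 1}}
            for sequence in sequences
        ],
    }


# Placeholder chains; real input would be the TCR (and optionally pMHC) sequences.
job = build_af3_input("example_tcr", ["AAAAAAAAAAAAAA", "AAAAAAAAAAAAAA"])
print(json.dumps(job, indent=2))
```

One "proteinChain" entry is emitted per sequence, which mirrors the three-entry example in the docstring.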
stcrpy.tcr_formats.tcr_haddock module
- class stcrpy.tcr_formats.tcr_haddock.HADDOCKFormatter(save_dir: str = None)[source]
Bases: object
- pMHC_to_haddock(mhc: MHC, antigen: list[Antigen])[source]
Bound reformatting of MHC and antigen structure objects to HADDOCK compatible PDB file.
- Parameters:
mhc (MHC) – MHC structure object
antigen (list[Antigen]) – List containing the antigen structure object
- tcr_to_haddock(tcr: TCR)[source]
Bound reformatting of TCR structure object to HADDOCK compatible PDB file.
- Parameters:
tcr (TCR) – TCR structure object
- write_TCR_pdb_file(tcr: TCR, save_dir: str)[source]
Writes TCR structure to a PDB file in a format HADDOCK can deal with. Generates a PDB file, a mapping from the old to the new numbering,
and a list of active residues to restrain the HADDOCK simulation.
- Parameters:
tcr (TCR) – The TCR structure.
save_dir (str) – The directory to save the files (default is current directory).
- write_antigen_pdb_file(mhc: MHC, antigen: list[Antigen], save_dir: str)[source]
Writes the antigen PDB file for docking with HADDOCK. Generates a PDB file, a file containing the renumbering mapping, and a list of active residues to restrict the simulation.
- Parameters:
mhc (MHC) – MHC structure object.
antigen (list[Antigen]) – List containing antigen chain. Should be length 1.
save_dir (str, optional) – The directory to save the PDB file. Defaults to “.”.
- Returns:
The filename of the saved antigen PDB file.
- Return type:
str
- class stcrpy.tcr_formats.tcr_haddock.HADDOCKResultsParser(haddock_results_dir: str, tcr_renumbering_file: str = None, pmhc_renumbering_file: str = None)[source]
Bases: object
- get_haddock_scores() pandas.DataFrame[source]
Retrieve HADDOCK energy scores and RMSD evaluations from simulation output.
Columns:
“haddock_score”,
“interface_rmsd”,
“ligand_rmsd”,
“frac_common_contacts”,
“E_vdw”,
“E_elec”,
“E_air”,
“E_desolv”,
“ligand_rmsd_2”,
“cluster_id”,
- Raises:
FileNotFoundError – HADDOCK file containing scores not found.
- Returns:
pandas.DataFrame: DataFrame with HADDOCK simulation metrics.
- renumber_all_haddock_predictions()[source]
Renumber all HADDOCK predictions contained in the results folder. Requires the standard HADDOCK output directory format.
- renumber_haddock_prediction(docked_prediction_file: str, haddock_renumbering_file: str, antigen_renumbering_file: str = None) Model[source]
Renumber the HADDOCK prediction based on the renumbering files.
- Parameters:
docked_prediction_file (str) – Path to the docked prediction file.
haddock_renumbering_file (str) – Path to the HADDOCK renumbering file.
antigen_renumbering_file (str, optional) – Path to the antigen renumbering file. Needed for TCR only PDBs with no antigen. Defaults to None.
- Returns:
The renumbered HADDOCK prediction.
- Return type:
Bio.PDB.Model.Model
- Raises:
ValueError – If the renumbering index is not found in the renumbering file.
- stcrpy.tcr_formats.tcr_haddock.imgt_insertion_char_to_int(char: str) int[source]
Converts an IMGT insertion character to an integer.
- Parameters:
char (str) – The IMGT insertion character.
- Returns:
The corresponding integer value.
- Return type:
int
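The documentation does not state which integer each insertion character maps to. A minimal sketch of one plausible mapping, assuming 'A' maps to 1, 'B' to 2 and so on, with a blank insertion code mapping to 0, is shown here. `insertion_char_to_int_sketch` is a hypothetical name, and the library's actual mapping may differ.

```python
def insertion_char_to_int_sketch(char: str) -> int:
    """Map an insertion code to an integer (assumed: ''->0, 'A'->1, 'B'->2)."""
    char = char.strip().upper()
    if not char:
        # Blank insertion code: residue has no insertion.
        return 0
    if len(char) != 1 or not ("A" <= char <= "Z"):
        raise ValueError(f"Not a single-letter insertion code: {char!r}")
    return ord(char) - ord("A") + 1
```

An integer form of the insertion code makes it straightforward to compare or sort inserted residues numerically.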
- stcrpy.tcr_formats.tcr_haddock.parse_renumbered_line(line: str) tuple[source]
Parses a renumbered line from a file and extracts the chain ID, original numbering, and HADDOCK numbering.
- Parameters:
line (str) – The renumbered line to parse.
- Returns:
A tuple containing the chain ID, original numbering, and HADDOCK numbering.
- Return type:
tuple
Example
line = "(O,( ,3, ),( ,203, )"
result = parse_renumbered_line(line)
# Output: ('O', ('', '3', ''), ('', '203', ''))
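For readers who want to see how a line of this shape can be split into its three parts, here is a regular-expression sketch. It assumes that renumbering lines look exactly like the example line above and that whitespace-only fields become empty strings; `parse_line_sketch` is a hypothetical name and this is not the library's parser.

```python
import re

# Hypothetical pattern for "(<chain>,(<a>,<b>,<c>),(<a>,<b>,<c>)".
_LINE_RE = re.compile(
    r"\(\s*(\w+)\s*,"
    r"\(([^,()]*),([^,()]*),([^,()]*)\)\s*,"
    r"\s*\(([^,()]*),([^,()]*),([^,()]*)\)"
)


def parse_line_sketch(line: str) -> tuple:
    match = _LINE_RE.search(line)
    if match is None:
        raise ValueError(f"Unrecognised renumbering line: {line!r}")
    # Strip whitespace from every captured field.
    chain_id, *fields = (group.strip() for group in match.groups())
    # First three fields: original numbering; last three: HADDOCK numbering.
    return chain_id, tuple(fields[:3]), tuple(fields[3:])


result = parse_line_sketch("(O,( ,3, ),( ,203, )")
```

Note that the numbers are returned as strings in this sketch; converting them to integers is left to the caller.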
- stcrpy.tcr_formats.tcr_haddock.sort_residues_by_imgt_numbering(residues: list[Bio.PDB.Residue]) list[Bio.PDB.Residue][source]
Sort residues in order by IMGT numbering.
- Parameters:
residues (list[Bio.PDB.Residue]) – List of IMGT numbered residues.
- Returns:
Sorted list of IMGT numbered residues.
- Return type:
list[Bio.PDB.Residue]
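Sorting by IMGT numbering is not a plain sort on (number, insertion code), because in the IMGT scheme CDR3 insertions at position 111 run forwards (111, 111A, 111B) while those at position 112 run backwards (112B, 112A, 112). The sketch below shows a sort key that respects this. It operates on Bio.PDB-style residue ID tuples `(hetero flag, number, insertion code)` rather than residue objects, `imgt_sort_key` is a hypothetical name, and whether the library handles position 112 in exactly this way is an assumption.

```python
def imgt_sort_key(residue_id):
    """Sort key for a (hetero flag, number, insertion code) residue ID."""
    _, number, insertion_code = residue_id
    code = insertion_code.strip().upper()
    # Rank 0 for no insertion, then A=1, B=2, ...
    rank = ord(code) - ord("A") + 1 if code else 0
    if number == 112:
        # Insertions at 112 precede the plain position, in reverse letter order.
        return (number, -rank)
    return (number, rank)


residue_ids = [
    (" ", 112, " "), (" ", 111, "B"), (" ", 112, "A"), (" ", 110, " "),
    (" ", 111, " "), (" ", 112, "B"), (" ", 111, "A"),
]
ordered = sorted(residue_ids, key=imgt_sort_key)
```

With Bio.PDB residue objects, the same key could be applied as `sorted(residues, key=lambda r: imgt_sort_key(r.get_id()))`.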