stcrpy.tcr_processing package
Subpackages
Submodules
stcrpy.tcr_processing.AGchain module
Created on 10 May 2017 @author: leem
Based on the AGchain class from ABDB.
stcrpy.tcr_processing.Chemical_components module
Created on 12 May 2017
@author: leem Based on the ABDB.AbPDB.Chemical_components module by dunbar.
Analyse the chemical component dictionary http://www.wwpdb.org/ccd.html
These are the types of chemical components in the pdb: ( grep "_chem_comp.type" components.cif | cut -c 50-120 | sort | uniq )
We will bin them into:
- peptide:
“D-beta-peptide, C-gamma linking” “D-gamma-peptide, C-delta linking” “D-peptide linking” “D-PEPTIDE LINKING” “D-peptide NH3 amino terminus” “D-PEPTIDE NH3 AMINO TERMINUS” “L-beta-peptide, C-gamma linking” “L-gamma-peptide, C-delta linking” “L-peptide COOH carboxy terminus” “L-PEPTIDE COOH CARBOXY TERMINUS” “L-peptide linking” “L-PEPTIDE LINKING” “L-peptide NH3 amino terminus” peptide-like PEPTIDE-LIKE “peptide linking” “PEPTIDE LINKING”
- nucleic-acid:
“DNA linking” “DNA LINKING” “DNA OH 3 prime terminus” “DNA OH 3 PRIME TERMINUS” “L-DNA LINKING” “L-RNA LINKING” “RNA linking” “RNA LINKING” “RNA OH 3 prime terminus”
- saccharide:
D-saccharide D-SACCHARIDE “D-saccharide 1,4 and 1,4 linking” “D-SACCHARIDE 1,4 AND 1,4 LINKING” L-saccharide L-SACCHARIDE “L-SACCHARIDE 1,4 AND 1,4 LINKING” saccharide SACCHARIDE
- non-polymer:
non-polymer NON-POLYMER
This has been done in resname_to_type
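The binning described above can be sketched as a small classifier over the `_chem_comp.type` strings. This is a minimal illustration and not the module's `resname_to_type` implementation; the function name `bin_component_type` and the fallback label `"other"` are assumptions made for the example.

```python
def bin_component_type(chem_comp_type):
    """Bin a _chem_comp.type string into one of four coarse classes.

    Hypothetical helper for illustration only. Matching is case-insensitive
    because the dictionary contains both "L-peptide linking" and
    "L-PEPTIDE LINKING".
    """
    t = chem_comp_type.strip().strip('"').lower()
    if "peptide" in t:
        return "peptide"
    if "dna" in t or "rna" in t:
        return "nucleic-acid"
    if "saccharide" in t:
        return "saccharide"
    if t == "non-polymer":
        return "non-polymer"
    return "other"  # assumed fallback for types not listed above


for example in ("L-PEPTIDE LINKING", "RNA OH 3 prime terminus",
                "D-saccharide 1,4 and 1,4 linking", "NON-POLYMER"):
    print(example, "->", bin_component_type(example))
```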
Common buffers/molecules in the PDB which are unlikely to be hapten antigens
- Method 1:
This list was taken from the supplementary material (Table 2) of: Visualizing ligand molecules in twilight electron density. C. X. Weichenberger, E. Pozharski and B. Rupp. Acta Cryst. (2013). F69, 195-200.
Acknowledgement - Anthony’s JC 03/04/13
- Method 2:
The list of chemical component code to pdb code was taken from: http://ligand-expo.rcsb.org/ld-download.html Saved as: ./Antibody/AbPDB/dat/Resources/cc-to-pdb.tdd.txt
We look at the number of structures with each code. Distribution of number of pdb codes with the chemical component found at: ./Antibody/AbPDB/dat/Resources/Frequency_of_cc_in_pdb.pdf
We use a cut-off of 15 (and manually examine those codes which are over the cut-off but under 50 and occur mainly in known antibodies, just in case one is a pet hapten antigen).
This is a harsh cut off! This is fine for analysing antibodies (as you are unlikely to have more than 15 bound to the same ag). However, change the cutoff for other purposes (suggest at least 200 for “common”)
There is still a problem if there is a rarely used code or a newly introduced code for buffer.
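The frequency cut-off of Method 2 can be sketched as follows. This is an illustration under stated assumptions, not the original script: the helper name `common_components` is hypothetical, the input is an in-memory mapping rather than the downloaded file, and treating the cut-off as exclusive (count greater than 15) is a guess.

```python
def common_components(cc_to_pdb, cutoff=15):
    """Return chemical component codes found in more than `cutoff` PDB entries.

    `cc_to_pdb` maps a three-letter code to the PDB codes that contain it.
    Treating the cut-off as exclusive (count > cutoff) is an assumption.
    """
    return {code for code, pdb_ids in cc_to_pdb.items()
            if len(set(pdb_ids)) > cutoff}


# Toy data, invented for illustration (not real PDB counts).
toy = {
    "GOL": ["p%03d" % i for i in range(40)],  # 40 entries, above the cut-off
    "XYZ": ["p001", "p002", "p003"],          # 3 entries, below the cut-off
}
print(common_components(toy))              # default cut-off of 15
print(common_components(toy, cutoff=200))  # stricter definition of "common"
```

Raising `cutoff` as suggested for non-antibody work simply shrinks the returned set.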
Fix 120613: uses the chemical component dictionary API, which queries the online database if a code cannot be found locally. This obviously requires web access.
Acknowledgement - JP for suggesting the method.
The following have been removed from the list as they are either common sugars or peptides:
BGC saccharide
NAG saccharide
XYP saccharide
XYS saccharide
MAL saccharide
MAN saccharide
GLA saccharide
GLC saccharide
A2G saccharide
LMT saccharide
PE1 peptide
F6P saccharide
DPN peptide
GAL saccharide
BOG saccharide
NGA saccharide
FUC saccharide
BMA saccharide
SUC saccharide
FUL saccharide
NDG saccharide
I have updated method 1 list with method 2 list
It is left as a dictionary of dictionaries if we decide to add annotations
Functions provided are:
is_aa, is_common_buffer, is_carbohydrate, is_nucleic_acid, is_polymer, get_type, get_chemical_name
Each takes either a three-letter code or a residue object as argument.
- stcrpy.tcr_processing.Chemical_components.get_from_expo(residue)[source]
The PDB has a habit of updating …therefore, if we don’t have the three letter code try to get it from ligand expo database online.
- stcrpy.tcr_processing.Chemical_components.is_common_buffer(residue)[source]
Is the residue a common buffer? If it occurs in the common_buffers list, it is considered a common buffer.
- Parameters:
residue – An AbPDB residue object or a residue identifier, e.g. PO4
- Returns:
Flag if the residue is a common buffer.
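A minimal sketch of how such a lookup can accept either a three-letter code or a residue object. It is not the module's implementation: the dictionary contents are placeholders (only PO4 is named in the text above), `FakeResidue` is a stand-in class, and the use of `get_resname()` assumes a Bio.PDB-style residue.

```python
# Placeholder entries for illustration. Apart from PO4, which the text
# names, the codes here are assumptions and not the real list.
COMMON_BUFFERS = {"PO4": {}, "SO4": {}, "GOL": {}}


class FakeResidue:
    """Stand-in for a residue object exposing get_resname(), as Bio.PDB does."""

    def __init__(self, resname):
        self._resname = resname

    def get_resname(self):
        return self._resname


def is_common_buffer(residue):
    """Accept either a three-letter code or a residue-like object."""
    if isinstance(residue, str):
        code = residue
    else:
        code = residue.get_resname()
    return code.strip().upper() in COMMON_BUFFERS


print(is_common_buffer("PO4"))
print(is_common_buffer(FakeResidue("NAG")))
```

Keeping the list as a dictionary of dictionaries leaves room for per-code annotations later, as noted above.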
stcrpy.tcr_processing.Entity module
Created on 9 May 2017 @author: leem
A modified Entity class based on SAbDab’s ABDB.AbPDB and Bio.PDB’s entity
- class stcrpy.tcr_processing.Entity.Entity(id)[source]
Bases: Entity
A modified entity object that allows for direct writing of coordinates.
- copy()[source]
Copy has been played with a bit. For my purposes the version in 1.61 did not work as explicit copying of the child list meant that the child objects became referenced to both self and shallow. This may be due to overriding the residue and chain classes so may not be a bug in biopython.
When copying the child_list in the loop, I use the list to iterate over instead of the dictionary. This preserves the ordering of the children.
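The copying behaviour described above can be illustrated with a tiny stand-in class. This is a sketch of the idea (fresh child containers, re-parenting, list-order iteration), not the code in `Entity.copy`; the class `Node` and its attributes are hypothetical and only mimic the `child_list`/`child_dict` layout.

```python
import copy


class Node:
    """Tiny stand-in for an entity with ordered children (not Bio.PDB itself)."""

    def __init__(self, id):
        self.id = id
        self.parent = None
        self.child_list = []  # preserves insertion order
        self.child_dict = {}  # lookup by id

    def add(self, child):
        child.parent = self
        self.child_list.append(child)
        self.child_dict[child.id] = child

    def copy(self):
        shallow = copy.copy(self)
        # Give the copy its own containers so children are not shared.
        shallow.child_list = []
        shallow.child_dict = {}
        shallow.parent = None
        # Iterate over the list, not the dict, to keep the child order.
        for child in self.child_list:
            shallow.add(child.copy())
        return shallow


chain = Node("B")
for res_id in (3, 1, 2):
    chain.add(Node(res_id))
clone = chain.copy()
print([c.id for c in clone.child_list])
```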
- save(output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, renumber=True, selection=False, remarks=True)[source]
Save the coordinates of the entity.
Example:
entity.save("path/to/file/filename.pdb")
residue.save("residue1.pdb")
- Parameters:
output – Where to write coordinates to. Should be an open file, a string or sys.stdout. By default the output is written to stdout.
renumber – Flag whether to renumber the atoms to IMGT scheme. Default is to renumber the atoms so that the first is 1 etc. Use renumber = False to retain the original atom numbering from the pdb file.
selection – Provide a selector object to select which children of the entity should be output. Selection should be a selector object from TcrPDB.Select. Some basic selector classes are provided in the module. More complex classes can be created by inheriting from these. If selection = False (default), all atoms in the entity are output.
remarks – Flag to print out remarks generated by TcrPDB. Defaults to True.
- transform(rot, tran)[source]
Apply rotation and translation to the atomic coordinates.
Example
>>> rotation=rotmat(pi, Vector(1,0,0))
>>> translation=array((0,0,1), 'f')
>>> entity.transform(rotation, translation)
- Parameters:
rot – A right multiplying rotation matrix (3x3 Numeric array)
tran – the translation vector (size 3 Numeric array)
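To make "right multiplying" concrete: each coordinate is treated as a row vector and updated as coord . rot + tran. The sketch below uses plain Python lists instead of arrays so that it runs without extra dependencies; the function name `transform` and the sample matrix are illustrative and are not taken from the package.

```python
def transform(coords, rot, tran):
    """Apply coord' = coord . rot + tran to each (x, y, z) row vector."""
    return [
        tuple(sum(c[k] * rot[k][j] for k in range(3)) + tran[j]
              for j in range(3))
        for c in coords
    ]


# Right-multiplying matrix for a +90 degree rotation about the z axis.
# It is the transpose of the usual column-vector (left-multiplying) matrix.
rot_z90 = [[0, 1, 0],
           [-1, 0, 0],
           [0, 0, 1]]
tran = (0, 0, 1)

print(transform([(1, 0, 0), (0, 1, 0)], rot_z90, tran))
```

Passing a left-multiplying matrix by mistake would rotate in the opposite direction, so it is worth checking the convention on a single known point first.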
stcrpy.tcr_processing.Fragment module
Created on 9 May 2017 @author: leem Modified version of the ABDB.AbPDB.Fragment class
- class stcrpy.tcr_processing.Fragment.Fragment(id)[source]
Bases: Entity
A modified Entity class that can be thought of as a way of grouping children, e.g.
TCR (TCR object) -> TCRchain (TCRchain object) -> Fragment CDRB3 (Fragment object) -> Residue B110 (Residue object)
Does not modify the parent/child attributes of its children. For instance, one might define a fragment and add residues to it in order to visualise them.
stcrpy.tcr_processing.Holder module
Created on 9 May 2017 @author: leem
A generic holder class that can be used to contain individual chains, etc.
stcrpy.tcr_processing.MHC module
Created on 30 Apr 2016
@author: leem, based on work by dunbar
The MHC class. This is similar to the Fab class.
- class stcrpy.tcr_processing.MHC.CD1(c1, c2)[source]
Bases: MHC
CD1 class. Holds paired CD1/B2M domains.
- class stcrpy.tcr_processing.MHC.MH1(c1, c2)[source]
Bases: MHC
Type 1 MHC class. Holds paired MHC domains.
- class stcrpy.tcr_processing.MHC.MH2(c1, c2)[source]
Bases: MHC
Type 2 MHC class. Holds paired MHC domains.
- class stcrpy.tcr_processing.MHC.MHC(c1, c2)[source]
Bases: Entity
MHC class. Holds paired MHC domains.
- class stcrpy.tcr_processing.MHC.MR1(c1, c2)[source]
Bases: MHC
MR1 class. Holds paired MR1/B2M domains.
- class stcrpy.tcr_processing.MHC.scCD1(c1)[source]
Bases: MHC
Type 1 MHC class. Holds single chain MHC domains of type CD1 for Class I MHC if the identified chain is the double alpha helix, i.e. CD1 without B2M.
stcrpy.tcr_processing.MHCchain module
Created on 30 Apr 2016 @author: leem
Based on the ABchain class from @dunbar
stcrpy.tcr_processing.Model module
Created on 9 May 2017 @author: leem
Based on the ABDB.AbPDB.Model class.
- class stcrpy.tcr_processing.Model.Model(identifier, serial_num=None)[source]
Bases: Model, Entity
Override to use our Entity.
- Change: __getitem__ has been changed so that single chains, as well as holder objects, can be retrieved from a model.
e.g. s[0]["B"] and s[0]["BA"] get the B chain and the BA tcr respectively.
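The two-level lookup can be sketched as below. This is an illustration of the idea only; the classes `Holder` and `Model` here are simplified stand-ins with hypothetical attributes, and the real `__getitem__` may resolve keys differently.

```python
class Holder:
    """Stand-in for a holder (e.g. a paired TCR) containing chains by ID."""

    def __init__(self, id, chains):
        self.id = id
        self.chains = dict(chains)


class Model:
    """Hypothetical model whose children are holders such as "BA"."""

    def __init__(self, holders):
        self.child_dict = {h.id: h for h in holders}

    def __getitem__(self, key):
        # A holder ID such as "BA" is looked up directly.
        if key in self.child_dict:
            return self.child_dict[key]
        # Otherwise search the holders for a chain with that ID.
        for holder in self.child_dict.values():
            if key in holder.chains:
                return holder.chains[key]
        raise KeyError(key)


tcr = Holder("BA", {"B": "beta chain", "A": "alpha chain"})
model = Model([tcr])
print(model["BA"].id, model["B"])
```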
stcrpy.tcr_processing.Select module
Select.py Created on 9 May 2017 @author: leem
These are selection classes for the save method of the TcrPDB entity. They are based on the ABDB.AbPDB.Select and Bio.PDB.PDBIO Selection classes.
- class stcrpy.tcr_processing.Select.backbone[source]
Bases: select_all
Select only backbone (no side chains) atoms in the structure. Backbone defined as "C", "CA", "N", "CB" and "O" atom identifiers in amino acid (pdb notation).
- class stcrpy.tcr_processing.Select.cdr3[source]
Bases: variable_only
Select only CDR3.
- class stcrpy.tcr_processing.Select.fv_only_backbone[source]
Bases: variable_only, backbone
Select the backbone atoms of the variable region. Example of combining selection classes.
- class stcrpy.tcr_processing.Select.select_all[source]
Bases: object
Default selection (everything) during writing - can be used as base class to implement selective output. This selects which entities will be written out.
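How selection classes combine through inheritance can be sketched with simplified stand-ins. The class names mirror those documented above, but the bodies are assumptions: atoms are plain name strings, residues are plain integers, and treating positions 1 to 128 as the variable domain is an illustrative choice, not the package's definition.

```python
class select_all:
    """Accept everything; each method returns 1 to write the entity, 0 to skip."""

    def accept_model(self, model):
        return 1

    def accept_chain(self, chain):
        return 1

    def accept_residue(self, residue):
        return 1

    def accept_atom(self, atom):
        return 1


class backbone(select_all):
    def accept_atom(self, atom):
        # Atoms are plain name strings in this sketch.
        return 1 if atom in ("C", "CA", "N", "CB", "O") else 0


class variable_only(select_all):
    def accept_residue(self, residue):
        # Assumption: residues are integers and 1-128 is the variable domain.
        return 1 if 1 <= residue <= 128 else 0


class fv_only_backbone(variable_only, backbone):
    """Combines both filters through multiple inheritance."""


sel = fv_only_backbone()
print(sel.accept_residue(105), sel.accept_residue(200))
print(sel.accept_atom("CA"), sel.accept_atom("CG"))
```

Because each parent overrides a different `accept_*` method, the combined class needs no body of its own.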
stcrpy.tcr_processing.TCR module
Created on 3rd April 2024 @author: Nele Quast, based on work by Dunbar and Leem
The TCR class.
- class stcrpy.tcr_processing.TCR.TCR(id)[source]
Bases: Entity
TCR class. Inherits from PDB.Entity. This is a base class for TCR structures, enabling antigen and MHC association. abTCR and gdTCR are the instantiated subclasses of this class.
- calculate_docking_geometry(mode='rudolph', as_df=False)[source]
Calculate docking geometry of TCR to MHC. This is a wrapper function for the TCRGeom class.
- Parameters:
mode (str, optional) – Mode for calculating the geometry. Options “rudolph”, “cys”, “com”. Defaults to “rudolph”.
as_df (bool, optional) – Whether to return a DataFrame instead of a dictionary. Defaults to False.
- Returns:
TCR to MHC geometry.
- Return type:
[dict, DataFrame]
- get_CDRs()[source]
Obtain complementarity determining regions (CDRs) from a TCR structure object as generator.
- Yields:
Fragment – TCR CDR regions
- get_MHC_allele_assignments()[source]
Retrieve MHC allele assignments for all TCR associated MHCs. This is a dictionary with the MHC chain ID as key and the allele assignments as value.
- Returns:
dict with MHC chain ID as key and allele assignments as value
- Return type:
dict
- get_TCR_type()[source]
Get TCR type according to variable region assignments.
- Returns:
TCR type (abTCR, gdTCR, dbTCR)
- Return type:
str
- get_frameworks()[source]
Obtain framework regions from a TCR structure object as generator.
- Yields:
Fragment – TCR framework regions
- get_germline_assignments()[source]
Retrieve germline assignments for all TCR chains. This is a dictionary with the chain ID as key and the germline assignments as value.
- Returns:
dict with TCR chain ID as key and germline assignments as value
- Return type:
dict
- get_germlines_and_alleles()[source]
Get all germline and allele assignments for TCR and MHC chains as a dictionary with the chain ID as key and the germline assignments as value.
- Returns:
Dictionary of TCR germline and MHC allele assignments with amino acid sequences.
- Return type:
dict
- get_interaction_heatmap(plotting_kwargs={}, **interaction_kwargs)[source]
Get interaction heatmap of TCR to MHC and peptide. Generates heatmap image. Plotting kwargs are passed to heatmap function.
- Parameters:
plotting_kwargs (dict, optional) – Keyword arguments passed to the heatmap function:
save_as: path to save heatmap image to
interaction_type: type of interaction (e.g. saltbridge, h_bond) to plot. All interactions are plotted by default.
antigen_name: name of antigen for plot title
mutation_index: index of antigen residues to highlight in plot
Defaults to { save_as: None, interaction_type: None, antigen_name: None, mutation_index: None }.
interaction_kwargs – kwargs for TCRInteractionProfiler class. See TCRInteractionProfiler for details.
- get_pitch_angle(mode='cys')[source]
Returns TCR:pMHC complex pitch angle of TCR to MHC. See paper for details.
- Parameters:
mode (str, optional) – Mode for calculating the pitch angle. Options “rudolph”, “cys”, “com”. Defaults to “cys”.
- Returns:
Pitch angle of TCR to MHC in degrees
- Return type:
float
- get_scanning_angle(mode='rudolph')[source]
Returns TCR:pMHC complex scanning (aka crossing, incident) angle of TCR to MHC. See paper for details.
- Parameters:
mode (str, optional) – Mode for calculating the scanning angle. Options “rudolph”, “cys”, “com”. Defaults to “rudolph”.
- Returns:
Scanning angle of TCR to MHC in degrees
- Return type:
float
- is_bound()[source]
Returns True if the TCR is associated with an antigen, False otherwise.
- Returns:
Whether TCR is associated with an antigen.
- Return type:
bool
- profile_peptide_interactions(renumber: bool = True, save_to: str = None, **kwargs) pd.DataFrame[source]
Profile the interactions of the peptide to the TCR and the MHC.
- Parameters:
renumber (bool, optional) – Whether to renumber the interacting residues. Defaults to True.
save_to (str, optional) – Path to save interaction data to as csv. Defaults to None.
- Returns:
Dataframe of peptide interactions
- Return type:
pd.DataFrame
- save(save_as=None, tcr_only: bool = False, format: str = 'pdb')[source]
Save TCR object as PDB or MMCIF file.
- Parameters:
save_as (str, optional) – File path to save TCR to. Defaults to None.
tcr_only (bool, optional) – Whether to save TCR only or to include MHC and antigen. Defaults to False.
format (str, optional) – Whether to save as PDB or MMCIF. Defaults to “pdb”.
- score_docking_geometry(**kwargs)[source]
Score docking geometry of TCR to MHC. This is a wrapper function for the TCRGeomFiltering class. The score is calculated as the negative log of the TCR:pMHC complex geometry feature probabilities, based on the distributions fit by maximum likelihood estimation of TCR to Class I MHC structures from STCRDab. Please see the paper methods for details.
- Returns:
TCR:pMHC complex score as negative log of TCR:pMHC complex geometry feature probabilities
- Return type:
float
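The arithmetic behind a negative log score can be sketched as follows. This is only an illustration of the formula named above: the helper `negative_log_score` is hypothetical, the probabilities are made up, and summing the logs assumes the per-feature probabilities are combined as an independent product, which the text does not state.

```python
import math


def negative_log_score(feature_probabilities):
    """Return -sum(log p) over per-feature probabilities.

    Summing logs assumes the features are combined as a product of
    independent probabilities, which is an assumption of this sketch.
    """
    return -sum(math.log(p) for p in feature_probabilities)


# Made-up probabilities for illustration only.
typical = negative_log_score([0.5, 0.25])
unusual = negative_log_score([0.5, 0.01])
print(typical, unusual)
```

Under this convention a less probable geometry receives a larger score.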
- class stcrpy.tcr_processing.TCR.abTCR(c1, c2)[source]
Bases: TCR
abTCR class. Inherits from TCR. This is a subclass of TCR for TCRs with alpha and beta chains.
- get_domain_assignment()[source]
Retrieve the domain assignment of the TCR as a dict with variable domain type as key and chain ID as value.
- Returns:
domain assignment from domain to chain ID, e.g. {"VA": "D", "VB": "E"}
- Return type:
dict
stcrpy.tcr_processing.TCRIO module
- class stcrpy.tcr_processing.TCRIO.TCRIO[source]
Bases: PDBIO
- save(tcr: TCR, save_as: str = None, tcr_only: bool = False, format: str = 'pdb')[source]
Save structure to a file.
- Parameters:
file (string or filehandle) – output file
select (object) – selects which entities will be written.
Typically select is a subclass of Select; it should have the following methods:
accept_model(model)
accept_chain(chain)
accept_residue(residue)
accept_atom(atom)
These methods should return 1 if the entity is to be written out, 0 otherwise.
stcrpy.tcr_processing.TCRParser module
Created on 3 April 2024 @author: Nele Quast, based on leem
TCRParser object which is based on ABDB’s AntibodyParser and BioPython’s PDB parser.
- class stcrpy.tcr_processing.TCRParser.TCRParser(PERMISSIVE=True, get_header=True, QUIET=False)[source]
Bases: PDBParser, MMCIFParser
- get_tcr_structure(id, file, prenumbering=None, ali_dict={}, crystal_contacts=[])[source]
Post processing of the TCRPDB.Bio.PDB structure object into a TCR context.
- Parameters:
id – a string to identify the structure
file – the path to the .pdb file
prenumbering (optional) – prenumbering for the chains in the structure.
stcrpy.tcr_processing.TCRStructure module
Created on 10 May 2017 @author: leem Based on the ABDB.AbPDB.AntibodyStructure class.
stcrpy.tcr_processing.TCRchain module
stcrpy.tcr_processing.annotate module
Created on 10 May 2017 @author: leem
Implementation to call anarci (built-in to STCRDab) to annotate structures.
- stcrpy.tcr_processing.annotate.align_numbering(numbering, sequence_list, alignment_dict={})[source]
Align the sequence that has been numbered to the sequence you input. The numbered sequence should be “in” the input sequence. If not, supply an alignment dictionary (align the sequences and use get_alignment_dict(ali1, ali2)).
- stcrpy.tcr_processing.annotate.align_scTCR_numbering(numbering, sequence_list, sequence_str)[source]
Align the sequence that has been numbered to a scTCR structure.
- Parameters:
numbering – numbered list of residues; this is usually a two-element list/tuple from TCRDB.anarci.number
sequence_list – list of residues (e.g. from a structure) in its original numbering
sequence_str – string form of sequence_list
- stcrpy.tcr_processing.annotate.annotate(chain)[source]
Annotate the sequence of a chain object from TCRDB.TcrPDB. For example, if you have chains B, A and X, you want to force the annotator to return the annotation for B and A but not for X (the antigen).
Returns a dictionary which has the residue ids as key and the annotation as value (or False), and the chain type, which is B/A/G/D/MH1/GA/GB/B2M or False.
- stcrpy.tcr_processing.annotate.call_anarci(seq, allow={'A', 'B', 'B2M', 'D', 'G', 'GA', 'GA1', 'GA1L', 'GA2', 'GA2L', 'GB', 'MH1', 'MR1', 'MR2'})[source]
Use the ANARCI program to number the sequence.
- Parameters:
seq – An amino acid sequence that you wish to number.
- Returns:
numbering, chain type, germline information
- stcrpy.tcr_processing.annotate.cleanup_scTCR_numbering(numbering_dict, sequence_list)[source]
The scTCR numbering method, while useful for sequences with two domains, can have gaps in between (e.g. CD1 molecule of 4lhu). This is to close the gaps in the numbering so that residues that were unnumbered by anarci don’t move around during structural parsing (when they’re probably just connections between domains).
- Parameters:
numbering_dict – numbered dictionary from align_scTCR_numbering
sequence_list – sequence list from the structure for alignment.
- stcrpy.tcr_processing.annotate.easy_alignment(seq1, seq2)[source]
Function to align two sequences by checking if one is in the other. This function will conserve gaps.
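A containment-based alignment can be sketched as below. This is not the package's `easy_alignment`: the name `easy_alignment_sketch`, the padded-string return format, and returning None when neither sequence contains the other are all assumptions, and the sketch does not attempt to reproduce the gap handling mentioned above.

```python
def easy_alignment_sketch(seq1, seq2):
    """Align two strings when one is a substring of the other.

    Returns both sequences padded with "-" to equal length, or None when
    neither contains the other. The return format is an assumption.
    """
    if seq2 in seq1:
        start = seq1.index(seq2)
        tail = len(seq1) - start - len(seq2)
        return seq1, "-" * start + seq2 + "-" * tail
    if seq1 in seq2:
        # Reuse the branch above with the roles swapped, then swap back.
        ali2, ali1 = easy_alignment_sketch(seq2, seq1)
        return ali1, ali2
    return None


print(easy_alignment_sketch("abcdef", "cde"))
```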
- stcrpy.tcr_processing.annotate.extract_sequence(chain, selection=False, return_warnings=False, ignore_hets=False, backbone=False)[source]
Get the amino acid sequence of the chain. Residues containing HETATOMs are no longer skipped: they are checked to be amino acids and, if so, the single letter code is returned.
This works provided the residues in the chain are in the correct order.
- Parameters:
selection – a selection object to select certain residues
return_warnings – Flag to return a list of warnings or not
backbone – Flag whether to only show residues with a complete backbone (in the structure) or not.
- Returns:
The sequence as a list of (resid, aa) tuples, and the sequence as a string.
- stcrpy.tcr_processing.annotate.get_alignment_dict(ali1, ali2)[source]
Get a dictionary which tells you the index in sequence 2 that should align with the index in sequence 1 (key)
ali1: ----bcde-f--- seq1: bcdef
ali2: ---abcd--f--- seq2: abcdf
alignment_dict = { 0:1, 1:2, 2:3, 4:4 }
If the index is aligned with a gap, do not include it in the dictionary.
e.g. 1 in alignment_dict --> True
e.g. 3 in alignment_dict --> False
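The index mapping can be sketched as a single pass over the two aligned strings. This is an illustrative re-implementation, not the package's code; the name `get_alignment_dict_sketch` and the `gap` parameter are assumptions. The input is the docstring example written with plain hyphens as gap characters.

```python
def get_alignment_dict_sketch(ali1, ali2, gap="-"):
    """Map each index in sequence 1 to the aligned index in sequence 2."""
    if len(ali1) != len(ali2):
        raise ValueError("aligned sequences must have the same length")
    alignment = {}
    i1 = i2 = 0  # running indices into the ungapped sequences
    for a, b in zip(ali1, ali2):
        if a != gap and b != gap:
            alignment[i1] = i2
        if a != gap:
            i1 += 1
        if b != gap:
            i2 += 1
    return alignment


result = get_alignment_dict_sketch("----bcde-f---", "---abcd--f---")
print(result)
```

Index 3 of sequence 1 (the "e") faces a gap in sequence 2, so it is absent from the result.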
- stcrpy.tcr_processing.annotate.interpret(x)[source]
Function to interpret an annotation in the form H100A into the form ( 100, ‘A’ )
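One way to perform this split is with a regular expression. The sketch below is not the package's `interpret`: the name `interpret_sketch` is hypothetical, a single leading chain letter is assumed, and returning a blank " " when there is no insertion code follows the Bio.PDB residue ID convention as an assumption.

```python
import re


def interpret_sketch(annotation):
    """Split e.g. "H100A" into (100, "A")."""
    match = re.fullmatch(r"[A-Za-z](\d+)([A-Za-z]?)", annotation.strip())
    if match is None:
        raise ValueError("unrecognised annotation: %r" % annotation)
    number, insertion = match.groups()
    # Assumption: a missing insertion code becomes " ", as in Bio.PDB IDs.
    return int(number), insertion or " "


print(interpret_sketch("H100A"))
```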