Atom Specification Language (ASL)
To provide a flexible way to define sets of atoms in complex macromolecular systems
To serve as the basis of the Sets facility in Maestro
To allow atom specification from textual input, which some users find faster than picking atoms from the main window.
entry > molecule > chain > residue > atom
The atom specification language is made up of five classes. Each is listed below, with its minimum acceptable abbreviation shown outside the square brackets.
e[ntry]
This is the top level class in the language. The term entry is used to mean all the atoms in the Workspace associated with a single entry in the current project.
m[olecule]
The term molecule is used in the normal chemical sense meaning all atoms that are connected by a single covalent path.
c[hain]
This corresponds to a chain as specified in the PDB file format. Note that this chain may be a subset of a molecule (e.g. when chains are linked by disulfide linkages).
r[esidue]
An arbitrary collection of one or more covalently bound atoms within a molecule, such as the monomer units in a polymer or the amino acids in a protein.
a[tom]
A single atom.
Each class is optional. If a class is absent, all entities of that type are matched.
A complete specification is a class name and some property specified by a property name and property list. The syntax is:
class.property propertylist
All names of properties and characters in property lists are treated in a case insensitive manner. Wildcards are supported for atom and set names. A '*' matches zero or more characters and a '?' matches any single character. You can include comments in a specification by placing a '#' character before the text you wish to hide.
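The wildcard rules above can be illustrated with a minimal Python sketch. It is not Maestro's implementation; the function name asl_wildcard_match is hypothetical, and the sketch assumes patterns contain only '*' and '?'.

```python
from fnmatch import fnmatchcase


def asl_wildcard_match(pattern: str, name: str) -> bool:
    """Case-insensitive match: '*' is zero or more characters, '?' is exactly one."""
    # fnmatchcase also understands '[...]' character classes, which ASL may not;
    # this sketch assumes the pattern uses only '*' and '?'.
    return fnmatchcase(name.lower(), pattern.lower())


if __name__ == "__main__":
    names = ["C15", "CA", "O66", "H10A", "c2"]
    # Names starting with C (case-insensitive).
    print([n for n in names if asl_wildcard_match("C*", n)])
    # Names whose third character is 0 (zero).
    print([n for n in names if asl_wildcard_match("??0*", n)])
```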
Items in a property list may be separated by a comma, white space, or both. Ranges (lower-upper) may be used where appropriate. Unterminated ranges are taken to include all available numbers. For example, if there are four molecules in the system, then the specifications:
mol. 2, 3, 4
mol. >=2
mol. 2-4
mol. >1
are equivalent. In a similar manner, the following specifications are equivalent:
mol. 1, 2, 3
mol. <=3
mol. 1-3
mol. <4
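As a rough illustration of why these property lists are equivalent, the following Python sketch expands each form against the numbers that actually exist. The function expand_numeric_list is a hypothetical helper, not part of Maestro, and it assumes non-negative integers with no white space between a comparison sign and its number.

```python
import re


def expand_numeric_list(text: str, available: list[int]) -> set[int]:
    """Expand items such as '2, 3, 4', '>=2', '2-4' or '>1' against the available numbers."""
    selected: set[int] = set()
    # Items may be separated by commas, white space, or both.
    for item in re.split(r"[,\s]+", text.strip()):
        if not item:
            continue
        comparison = re.fullmatch(r"(>=|<=|>|<)(\d+)", item)
        closed_range = re.fullmatch(r"(\d+)-(\d+)", item)
        if comparison:
            op, bound = comparison.group(1), int(comparison.group(2))
            tests = {
                ">=": lambda n: n >= bound,
                "<=": lambda n: n <= bound,
                ">": lambda n: n > bound,
                "<": lambda n: n < bound,
            }
            # An unterminated range covers every available number on that side.
            selected |= {n for n in available if tests[op](n)}
        elif closed_range:
            low, high = int(closed_range.group(1)), int(closed_range.group(2))
            selected |= {n for n in available if low <= n <= high}
        else:
            selected |= {n for n in available if n == int(item)}
    return selected


if __name__ == "__main__":
    molecules = [1, 2, 3, 4]  # four molecules in the system
    for spec in ("2, 3, 4", ">=2", "2-4", ">1"):
        print(spec, "->", sorted(expand_numeric_list(spec, molecules)))
```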
There are predefined ASL labels for class and property designations. The labels, which are typically just the words for the actual values they represent, can be abbreviated. For example, the ASL expression atom.ptype could be abbreviated a.pt.
The standard ASL class and property designations are listed below. All labels shown below have their minimum acceptable abbreviations shown outside the square brackets:
e[ntry]
i[d]
The word id can be completely omitted, because this is the default. Thus entry.id 234 and entry. 234 are equivalent. Note, however, that the '.' is still required even if id is not included. A valid property list is a list of entry ids. Wildcard characters are permitted. For example:
entry.i 123
entry.id 123?
n[ame]
A valid property list is a list of entry names. Wildcard characters are permitted. For example:
entry.n e1
entry.name recep, lig*
m[odulo]
This property can be used to select every nth entry from a set of entries. For example, the expression:
e.m 10 1
selects entry number 1, 11, 21, and so on.
idm[odulo]
This property can be used to select every nth entry based on their entry ID value. For example, the expression:
entry.idmodulo 10 1
selects entries whose entry ID is 1, 11, 21, and so on.
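The difference between modulo and idmodulo can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical, and reading the two numbers as "step, then starting value" is an assumption that is consistent with the 1, 11, 21 examples above but is not stated explicitly.

```python
def matches_modulo(value: int, n: int, offset: int) -> bool:
    """True for offset, offset + n, offset + 2n, ... (assumed reading of 'n offset')."""
    return value >= offset and (value - offset) % n == 0


def select_by_position(entry_ids: list[int], n: int, offset: int) -> list[int]:
    # entry.modulo: count entries by their 1-based position in the set.
    return [eid for pos, eid in enumerate(entry_ids, start=1)
            if matches_modulo(pos, n, offset)]


def select_by_id(entry_ids: list[int], n: int, offset: int) -> list[int]:
    # entry.idmodulo: test the entry ID value itself.
    return [eid for eid in entry_ids if matches_modulo(eid, n, offset)]


if __name__ == "__main__":
    entry_ids = list(range(5, 30))  # hypothetical project: 25 entries with IDs 5..29
    print(select_by_position(entry_ids, 10, 1))  # entries number 1, 11 and 21
    print(select_by_id(entry_ids, 10, 1))        # entries whose ID is 11 or 21
```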
m[olecule]
[number]
Number is the default molecule property, so the word number can be completely omitted (i.e. mol.number 1 and mol. 1 are equivalent). Note, however, that the '.' is still required even if the property name is not included. A valid property list for this property is a set of numbers or a range. For example:
mol. 1-4
mol. 1,2,3,4
m[odulo]
This property can be used to select every nth molecule. For example, the expression:
mol.modulo 10 1
selects molecule number 1, 11, 21, and so on.
e[ntrynum]
This property can be used to select a molecule by its entry-relative numbering. For example, the expression:
mol.entrynum 1
selects the first molecule in each entry.
a[toms]
This property can be used to select a molecule by its size in terms of number of atoms. For example, the expression:
mol.atoms 300
selects the molecules that have exactly 300 atoms. The expression
mol.atoms 300-500
selects molecules with between 300 and 500 atoms.
w[eight]
This property can be used to select a molecule by its molecular weight. For example, the expression:
mol.weight 218.10
selects a molecule with a molecular weight of exactly 218.10. The expression:
mol.weight >= 300.0
selects molecules with molecular weight greater than or equal to 300.0.
m[odulo]
This property can be used to select every nth molecule. For example, the expression:
mol.mod 10 1
selects molecule 1, 11, 21 etc.
c[hain]
This class designation allows you to specify atoms using chain attributes.
[name]
Because names are the only chain properties that are specifiable by ASL, the word name can be omitted. For example, the expressions chain.name A and chain. A are equivalent. Note, however, that the '.' is required even if the property name is not included. A valid property list for this property is a single character representing a PDB chain name. Some examples of equivalent acceptable expressions are:
chain.name A
chain. A
c. A
r[esidue]
This ASL class designation allows you to specify atoms based on residue properties. Combine with one of the following property specifications.
[number]
Residue numbers can be specified in ASL expressions by specifying only a property list, not the property itself. For example,
res. 1 2 3
is a valid specification that returns atoms in residues 1, 2, or 3.
Residue numbers can be negative, so a range must be clearly distinguished from a negative number. For example,
res. 2-4
selects residues 2 and -4, but either of the following specifies all residues from 2 to 4:
res. 2- 4
res. 2 - 4
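One way to picture this rule is a tokenizer that treats a hyphen directly attached to a following digit as a minus sign, and any other hyphen as a range operator. The Python sketch below is a guess that reproduces the residue examples above; the function parse_residue_numbers is hypothetical and malformed input is not handled.

```python
import re

# A signed integer is tried first; a hyphen that cannot start a number is a range operator.
TOKEN = re.compile(r"-?\d+|-")


def parse_residue_numbers(text: str) -> list[int]:
    """Expand a residue-number property list, honoring negative residue numbers."""
    tokens = TOKEN.findall(text)
    numbers: list[int] = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            # number, range operator, number: an inclusive range.
            low, high = int(tokens[i]), int(tokens[i + 2])
            numbers.extend(range(low, high + 1))
            i += 3
        else:
            numbers.append(int(tokens[i]))
            i += 1
    return numbers


if __name__ == "__main__":
    for spec in ("2-4", "2- 4", "2 - 4", "1 2 3"):
        print(repr(spec), "->", parse_residue_numbers(spec))
```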
pt[ype]
The ptype property designator can be used to specify atoms based on the three-letter residue PDB code. This is the default for non-numeric characters in the property list, so the expressions res. arg and res.ptype arg are equivalent. A valid property list for ptype consists of three-character tokens. For example:
res.ptype gly,val,ala
res. gly val ala
m[type]
The mtype designator can be used to specify residues using MacroModel one-letter representations. A valid property list for mtype contains the MacroModel one-character tokens. For example:
res.mtype g,v,a
res.m g,v,a
sec[ondary_structure]
This ASL property allows you to specify atoms based on the secondary structure type of the residue. The property list must consist of one or more of the following descriptor types:
h[elix]: returns atoms in helix regions
s[trand]: returns atoms in strand regions
l[oop]: returns atoms in loop regions
Syntax examples:
residue.sec helix
residue.sec hel, str
res.sec h l
po[larity]
This ASL property allows you to specify atoms based on residue polarity. The property list must consist of one or more of the following descriptor types:
h[ydrophobic]: returns atoms in hydrophobic residues
pol[ar]: returns atoms in polar residues
pos[itive]: returns atoms in residues with positive formal charges
n[egative]: returns atoms in residues with negative formal charges
Syntax examples:
residue.polarity hydrophobic
residue.pol pos,neg
res.pol h pos neg
pos[ition]
The position descriptor allows you to specify atoms by fractional position of the residue. The property list must include two real numbers representing a fractional range of residue numbers. For example, if there are 100 residues numbered 1 to 100, the expression:
residue.pos 0.0 0.1
returns residues 1 to 10.
i[nscode]
The inscode descriptor allows you to specify atoms by the insertion code of the residue. A property list should include one-character tokens representing insertion codes. For example:
residue.inscode a
residue.num 25 residue.inscode a
a[tom]
The atom class designator allows you to specify atoms according to their characteristics. Use this designator in combination with one of the properties described below. Note that property lists containing either ptypes or numbers may be used without explicit property specification. For example, the following are valid:
atom. 1,2,3
atom. CA
These return, respectively, atoms 1, 2, and 3, and any alpha carbons.
pt[ype]
The ptype property specification allows you to designate atoms by their PDB atom names. A valid property list consists of acceptable PDB names. PDB name is the default property for non-numeric property list components and, as such, the word ptype may be omitted. However, the '.' is required. The following expressions are equivalent and return the backbone atoms in a structure:
atom.ptype N,CA,C,O
atom. N,CA,C,O
a. n,ca,c,o
Note: see below for a discussion of how PDB atom names are specified and matched. Wildcards as described above can be applied to ptypes.
na[me]
The name property specification allows you to designate atoms by name. The property list must contain valid atom names. A valid atom name could be a string of any length that:
Examples:
atom.name the_36th_carbon
atom.na C15, O:66, H-77
atom.na C*
(returns atoms with name starting with C)
atom.nam ??0*
(returns atoms whose names contain 0 (zero) as the third character)
n[umber]
The number property specification allows you to designate atoms by number. The property list must contain integers or a range of integers. Atom number is the default property when property lists contain integers, so the number property designator can be omitted. The following expressions are equivalent and return the atoms numbered 1, 2, 3, and 4.
atom.num 1,2,3,4
a. 1 2 3 4
atom. 1-4
mo[lnum]
The molnum property delineator facilitates atom numbering by molecule. The property list must be a list or a range of integers. For example, the expression:
atom.molnum 1
returns the first atom in each molecule, while the expression
atom.molnum 1-10
returns the first 10 atoms in each molecule.
en[trynum]
The entrynum property delineator facilitates atom numbering by entry. The property list must be a list or a range of integers. For example, the expression:
atom.entrynum 1
returns the first atom in each entry, while the expression
atom.entrynum 1-10
returns the first 10 atoms in each entry.
m[type]
This property specification allows you to specify atoms using MacroModel atom types. A valid property list for the mtype property consists of MacroModel atom types. The following expression is valid and specifies sp2 carbons and oxygens.
atom.mtype C2,O2
e[lement]
The element property allows you to specify atoms by element type. A valid property list for the element property contains standard periodic table symbols. The following expression is valid and defines all carbons and oxygens.
atom.ele C,O
a[ttachments]
This designator allows you to specify atoms by the number of bonds they have. The property list must contain a single integer in the range 1-6, but >, <, and = signs may also be used. The expression:
atom.att 1
returns all terminal atoms. The expression:
atom.att <=2
returns all atoms with 2 or fewer bonds.
ato[micnumber]
This designator allows you to specify atoms by their atomic number. The property list must contain integers only. Ranges of integers and >, <, and = signs may also be used. The expression:
atom.atomicnum 1
returns all hydrogen atoms. The expression:
atom.ato 1-6
returns all atoms in the range H to C.
c[harge]
The charge designator allows you to identify atoms by their partial charges. A valid property list contains a value or range of floating point values. The expression:
atom.charge 0.400
returns atoms with partial charges of 0.400. The expression:
atom.charge -0.6--0.4
returns atoms with partial charges -0.6 to -0.4, while
atom.charge <0.0
returns atoms with negative partial charges, and
atom.charge >=0.5
returns atoms with charges of 0.5 or greater.
f[ormalcharge]
The formalcharge designator allows you to identify atoms by their formal charges. A valid property list contains a value or range of integer values. The expression:
atom.formalcharge 0
returns atoms with formal charges of 0. The expression:
atom.f -2 - -1
returns atoms with formal charges -2 to -1, while
atom.formal <0
returns atoms with negative formal charges, and
atom.formalcharge >=1
returns atoms with formal charges of 1 or greater.
d[isplayed]
The displayed designator specifies atoms depending on whether or not they are currently displayed in Maestro. This descriptor requires no property list. For example, the expression:
atom.displayed
returns the set of all displayed atoms.
s[elected]
The selected designator specifies atoms depending on whether or not they are currently selected in Maestro. This descriptor requires no property list. For example, the expression:
atom.selected
returns the set of all atoms that are selected in the Workspace.
Some structures may have additional properties available. These are referenced directly by their data names appended onto the atom. class. These properties are of integer, real, boolean, or string type, and the data names begin with i_, r_, b_, or s_ respectively. It is possible to use these atom properties in conjunction with any other ASL expression. Any atoms that don't have these properties associated with them will never match.
Some examples of using the ASL to address these properties are:
atom.i_my_integer_prop 1-4
atom.b_my_boolean_prop
atom.r_my_real_prop < 4.0
atom.s_my_string_prop LIG_
A number of operators are supported:
Boolean AND (set intersection)
The Boolean and operator returns the set of atoms that meets all the conditions defined in the ASL specifications. The syntax for use of the and operator is:
spec1 and spec2
where spec1 and spec2 are valid atom specifications. For example, the expression:
mol. 1 and atom. CA
returns the set of all the alpha carbons of molecule 1. The expression:
res.num 1-100 and res. ala
returns all alanines in residues with numbers in the range 1-100.
Boolean OR (set union)
The Boolean or operator returns the set of atoms that meets the requirements of the first specification OR the requirements of the second. The syntax for this operation is:
spec1 or spec2
where spec1 and spec2 are valid atom specifications. For example, the expression:
mol. 1 or atom.ptype CA
returns the set of all atoms that are in molecule number 1, or are alpha carbons. The expression:
res.num 1-100 or res.ptype ala
returns atoms in all residues with numbers in the range 1-100 and any alanines.
Boolean NOT
The Boolean not operator returns the set of all atoms that do not match the given specification. The syntax for this operation is:
not spec1
where spec1 is a valid atom specification. This returns the set of atoms that are not part of those defined by spec1. For example, the expression:
not atom. CA,C,N,O
returns a set containing all side chain atoms.
fillres and fillmol
The operators fillres and fillmol can be used to "fill out" a given atom set so that the new set is defined by residue or molecular boundaries. For example:
fillres atom.num 1,100,40
returns all the atoms in residues of which atoms 1,100 and 40 are members. In a similar way:
fillmol atom.num 1,100,40
will return all the atoms in molecules of which atoms 1,100 and 40 are members.
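The "fill out" behavior can be sketched in Python as set expansion over a grouping key. The Atom record and the helper names are hypothetical, and identifying a residue by the pair (molecule, residue number) is an assumption made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    number: int
    residue: int
    molecule: int


def fill(atoms, selected_numbers, key):
    """Return the numbers of all atoms sharing a group (per `key`) with a selected atom."""
    groups = {key(a) for a in atoms if a.number in selected_numbers}
    return {a.number for a in atoms if key(a) in groups}


def fillres(atoms, selected_numbers):
    # Assumption: a residue is identified by (molecule, residue number).
    return fill(atoms, selected_numbers, lambda a: (a.molecule, a.residue))


def fillmol(atoms, selected_numbers):
    return fill(atoms, selected_numbers, lambda a: a.molecule)


if __name__ == "__main__":
    # Hypothetical system: molecule 1 has residues 1 and 2, molecule 2 has residue 1.
    atoms = [Atom(1, 1, 1), Atom(2, 1, 1), Atom(3, 2, 1),
             Atom(4, 2, 1), Atom(5, 1, 2), Atom(6, 1, 2)]
    print(sorted(fillres(atoms, {1})))  # atoms of the residue containing atom 1
    print(sorted(fillmol(atoms, {1})))  # atoms of the molecule containing atom 1
```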
within and beyond
The operators within and beyond can be used to define sets of atoms based on their proximity to atoms in a previously defined set. The syntax for these operators is:
within distance spec1
beyond distance spec2
where spec1 and spec2 are valid atom specifications. When used with the within operator, the distance is inclusive, i.e. atoms that are less than or equal to the specified distance from atoms defined by spec1 are returned. This includes the atoms being used for reference.
In an expression containing the beyond operator, only atoms farther than the specified distance from atoms in spec2 are returned. For example, the expression:
within 5.0 mol. 1
returns the set of all atoms that are within 5 Å of molecule 1. The expression:
beyond 5.0 mol. 2
returns all atoms that are farther than 5 Å from molecule 2. Thus, for the same distance and reference set, within and beyond return complementary sets.
The combination of fillres and within or beyond is especially powerful. For instance, the expression:
fillres within 5.0 mol. 1
produces a set containing the atoms of all complete residues that have atoms within 5 Å of molecule 1, whereas the expression:
within 5.0 mol. 1
returns only the atoms that are within 5 Å of molecule 1, including the atoms of molecule 1 itself. The expression:
fillres beyond 5.0 mol. 1
produces a set containing the atoms of all complete residues that have atoms beyond 5 Å of molecule 1. This set overlaps with that generated by fillres within 5.0 mol. 1: both sets include residues that span the 5 Å boundary.
The and operator, when used with within and beyond, can be used to allow more specificity. For example, the expression:
mol. 2 and within 5.0 mol. 1
returns the set of all atoms of molecule 2 that are within 5 Å of molecule 1.
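The distance semantics described for within and beyond can be sketched in Python with plain coordinate dictionaries. This is an illustration under simple assumptions (Cartesian coordinates in Å, brute-force distance checks); the function names and the coordinates are hypothetical.

```python
import math


def within(coords, distance, reference):
    """Atoms at most `distance` from any reference atom (the reference atoms included)."""
    return {atom for atom, pos in coords.items()
            if any(math.dist(pos, coords[ref]) <= distance for ref in reference)}


def beyond(coords, distance, reference):
    """Atoms farther than `distance` from every reference atom."""
    return set(coords) - within(coords, distance, reference)


if __name__ == "__main__":
    # Hypothetical coordinates: atoms 1-2 form molecule 1, atoms 3-5 form molecule 2.
    coords = {1: (0.0, 0.0, 0.0), 2: (1.5, 0.0, 0.0),
              3: (4.0, 0.0, 0.0), 4: (9.0, 0.0, 0.0), 5: (6.5, 0.0, 0.0)}
    mol1, mol2 = {1, 2}, {3, 4, 5}
    near = within(coords, 5.0, mol1)
    print(sorted(near))                          # includes molecule 1 itself
    print(sorted(beyond(coords, 5.0, mol1)))     # the complement of the set above
    print(sorted(mol2 & near))                   # mol. 2 and within 5.0 mol. 1
```

Atom 5 sits exactly 5.0 Å from atom 2 and is included, which reflects the inclusive distance test for within.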
withinbonds
The withinbonds operator finds all atoms within a certain number of bonds of the reference set. The syntax is:
withinbonds num_bonds spec
For example, the ASL expression
withinbonds 4 atom. 1
finds all the atoms that are within four bonds of atom 1.
beyondbonds
The beyondbonds operator is used to find all atoms beyond a certain number of bonds of the reference set. The syntax is:
beyondbonds num_bonds spec
For example, the ASL expression
beyondbonds 4 atom. 1
finds all the atoms that are in the same molecule as atom 1 but beyond four bonds of atom 1.
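Both bond-count operators can be pictured as a breadth-first search over the bond graph. The Python sketch below is illustrative only; the function names are hypothetical, and including the reference atoms themselves in the withinbonds result is an assumption made by analogy with within.

```python
from collections import deque


def bond_distances(bonds, reference):
    """Breadth-first search: fewest bonds from the reference set to each reachable atom."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distances = {atom: 0 for atom in reference}
    queue = deque(reference)
    while queue:
        atom = queue.popleft()
        for nxt in neighbors.get(atom, ()):
            if nxt not in distances:
                distances[nxt] = distances[atom] + 1
                queue.append(nxt)
    return distances


def withinbonds(bonds, num_bonds, reference):
    return {a for a, d in bond_distances(bonds, reference).items() if d <= num_bonds}


def beyondbonds(bonds, num_bonds, reference):
    # Only atoms reachable through bonds (same molecule) can be "beyond".
    return {a for a, d in bond_distances(bonds, reference).items() if d > num_bonds}


if __name__ == "__main__":
    # Hypothetical system: a six-atom chain 1-2-3-4-5-6 and a separate molecule 7-8.
    bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (7, 8)]
    print(sorted(withinbonds(bonds, 4, [1])))
    print(sorted(beyondbonds(bonds, 4, [1])))
```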
The order of priority of operators is (in decreasing order):
not/fillres/fillmol
and/or
within/beyond
If two equal-priority operators are used in a single ASL expression, they are evaluated in the order in which they are encountered, left to right. For example, the expression:
within 5.0 mol. 1 or mol. 2
returns the set of all atoms that are within 5.0 Å of either molecule 1 or molecule 2 ('or' has higher priority). The expression:
not atom.ptype CA,C,O,N and mol. 1
returns the set of molecule 1 side chain atoms. The following expression will define all alpha carbons and atoms in hydrophobic residues of molecule 1:
atom.ptype CA or mol. 1 and not res.pol polar
Parentheses can be used to override the order of evaluation. For example, the expression
not (atom.ptype CA,C,O,N or mol. 1)
produces all atoms that are neither in the backbone nor in molecule 1.
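The evaluation order can be checked with ordinary Python sets. The sketch below is illustrative, with hypothetical atom numbers; note that Python's own set operators give & a higher precedence than |, unlike ASL, where and and or have equal priority and are applied left to right.

```python
ALL_ATOMS = set(range(1, 9))  # hypothetical Workspace with atoms 1..8


def asl_not(atoms):
    """not: every Workspace atom outside the given set."""
    return ALL_ATOMS - atoms


def evaluate_left_to_right(first, *steps):
    """Apply equal-priority 'and'/'or' steps in the order they are encountered."""
    result = set(first)
    for op, operand in steps:
        result = result & operand if op == "and" else result | operand
    return result


if __name__ == "__main__":
    backbone = {1, 2, 5, 6}
    alpha_carbons = {1, 5}
    mol1 = {1, 2, 3, 4}
    polar = {2, 5, 6}

    # not atom.ptype CA,C,O,N and mol. 1  ->  (not backbone) and mol1
    print(sorted(evaluate_left_to_right(asl_not(backbone), ("and", mol1))))
    # not (atom.ptype CA,C,O,N or mol. 1)  ->  neither backbone nor molecule 1
    print(sorted(asl_not(backbone | mol1)))
    # atom.ptype CA or mol. 1 and not res.pol polar  ->  (CA or mol1) and (not polar)
    print(sorted(evaluate_left_to_right(alpha_carbons, ("or", mol1),
                                        ("and", asl_not(polar)))))
```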
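When no operator is specified, the following operations are assumed:
Between the items of a property list, the or operator is assumed. For example, atom.ptype CA,CB matches atoms named CA or CB.
Between consecutive specifications, the and operator is assumed. For example, the following expressions are equivalent, and both return the alpha carbons in chain A of molecule 1.
mol. 1 chain. A atom.ptype CA
mol. 1 and chain. A and atom.ptype CA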
The names of existing sets may be used in expressions if they are prefixed with the word set. For example, if two sets having the names S1 and S2 are defined as:
S1: mol. 1
S2: atom.ptype C,O,N,CA
the following would be valid atom specifications:
set S1 and set S2
set S1 or set S2
within 5.0 set S1
The following strategy must be used to specify atoms using PDB atom names. Before matching an unquoted name that begins with a non-numeric character, a blank character is inserted in front of it, and it is padded with blanks on the right so that there is a total of four characters.
Examples:
CA,C is matched as " CA "," C  "
CG1,CG2 is matched as " CG1"," CG2"
Initial blank characters are not added to unquoted names that begin with numbers. However, right-padding characters are added so that there is a total of four characters.
Example:
CA,1HB,2HB is matched as " CA ","1HB ","2HB "
Names with either double or single quotes are treated as is, except that they are right-padded if necessary with blanks so that each name has a total of four characters.
Examples:
CA,"CA" is matched as " CA ","CA  "
" N A" is matched as " N A"
The difference in the first example is significant: the first matches an alpha carbon, the second matches a calcium.
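The padding rules above are mechanical enough to sketch in Python. The function pad_pdb_atom_name is a hypothetical helper that handles one name at a time; it follows the three rules as stated and makes no claim about how Maestro implements them.

```python
def pad_pdb_atom_name(token: str) -> str:
    """Turn one property-list item into the four-character PDB name used for matching."""
    quoted = len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'"
    if quoted:
        # Quoted names are taken as is (quotes stripped).
        name = token[1:-1]
    elif token[:1].isdigit():
        # Unquoted names starting with a digit get no leading blank.
        name = token
    else:
        # Other unquoted names get one leading blank.
        name = " " + token
    # Right-pad to four characters. The text does not say what happens to unquoted
    # names that are already four characters long; this sketch leaves the result
    # unchanged (and therefore longer than four characters) in that case.
    return name.ljust(4)


if __name__ == "__main__":
    for item in ("CA", "C", "CG1", "1HB", '"CA"', '" N A"'):
        print(f"{item!r:10} -> {pad_pdb_atom_name(item)!r}")
```

With these rules, the unquoted CA becomes " CA " (an alpha carbon), while the quoted "CA" becomes "CA  " (a calcium).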
Atoms not yet present in a structure:
If a structure in the Workspace has only 100 atoms when the ASL definition:
atom.num 1,8,44,101,103
is issued, Maestro simply matches the atoms numbered 1,8 and 44. If additional atoms are subsequently added to the structure, the atoms bearing the numbers 101 and 103 are added to the previously defined set.
Aliasing:
Maestro allows you to define your own aliases, using either the Command Input Area or the Command Aliases panel. Maestro converts all aliases into their corresponding commands before performing operations involving the aliased commands. It has been left to the user to ensure that aliases produce sensible results. Some aliases are supplied with the distribution. They are:
Operator: and
Aliases: intersection, INTERSECTION, &
Operator: or
Aliases: UNION, union, |
Operator: not
Aliases: !
Class Designator: mol.
Aliases: MOL, mol
Class Designator: atom.
Aliases: ATOM, atom
Class Designator: res.
Aliases: RES, res
Class Designator: chain.
Aliases: CHAIN, chain
ASL Definition: atom. ca,c,n,h,o
Aliases: BACKBONE, backbone
ASL Definition: not (atom.pt ca,c,n,h,o)
Aliases: SIDECHAIN, sidechain
ASL Definition: "/H2-O3-H2/ or atom.mtype OW"
Aliases: WATER, water
The order in which entries are included into and excluded from the Workspace affects the molecule numbers. For example, if you have two entries called "A" and "B" and you include into an empty Workspace first A and then B, the molecule numbers will be 1 for A and 2 for B; however, if you first include B and then A, the molecule numbers will be 1 for B and 2 for A.
This means that an expression such as mol. 1 matches different atoms in each of the above cases. In most cases it makes more sense to use entry ids or entry names.
For example, if you have an inhibitor and a receptor that are in different entries and wish to have a ribbon appear on only the receptor, use the entry name in the ASL expression, not the molecule number. This ensures that when the receptor is included, it, and only it, is used to generate the ribbons. Different inclusion orders of entries in the Workspace then result in the same matching atoms. So for ribbons with a receptor called receptor, it would be more useful to use entry.name receptor as the ASL definition.
This section gives some examples of the use of the ASL in real-life situations. Note that while these examples all use lower-case, the ASL expressions themselves are not case sensitive.
Defining a set to refer to a ligand and/or receptor.
The exact command will depend on the nature of your system. If the ligand and the receptor are separate entries then it will suffice to use
set ligand entry.name ligand_name
where ligand_name is the name of the entry that contains the ligand. Similarly
set receptor entry.name receptor_name
for the receptor with entry called receptor_name.
In order to define sets that will work with multiple ligands it is also possible to define the ligand as everything that is not part of the receptor. A definition of:
set ligand not set receptor
identifies the ligand as anything that is not part of the receptor.
If the ligand and the receptor are part of the same entry then molecule numbers are the best way to define the ligand and the receptor. Assuming the receptor is molecule 1 and the ligand molecule 2:
set ligand mol.num 2
set receptor mol.num 1
Note however that the use of molecule numbers in set definitions should be avoided where possible as these depend on the order in which the project entries are included into the Workspace. If it is possible to use entry names, then these should be used.
The subsequent examples assume that sets for the receptor and the ligand have been defined using one of the methods defined above.
The set of atoms within a given distance of the ligand.
One common task is to do something with the set of atoms within a given distance of the ligand, for example to display only those atoms or to include them in a substructure region for a MacroModel calculation. These examples use the displayonlyatom command, but the ASL that follows can be used with any other command that uses ASL.
To only display atoms within 5.0 Å of the ligand:
displayonlyatom within 5.0 set ligand
A common variation is to display complete residues which have any of their atoms within a given distance of the ligand:
displayonlyatom fillres within 5.0 set ligand
It is also possible to restrict the expression so that it only applies to receptor atoms within a given distance of the ligand. Here the Boolean and operator is used to restrict the displayed atoms to the receptor only:
displayonlyatom set receptor and fillres within 5.0 set ligand
Because this is a lengthy expression it's often convenient to make this into a set itself:
set active_site set receptor and fillres within 5.0 set ligand
An equivalent form of this is:
set active_site (! set ligand) & fillres within 5.0 set ligand
Sidechain and backbone.
The ASL has standard aliases for the definition of side chain and backbone atoms in proteins. For example to only display the atoms of the backbone:
displayonlyatom backbone
These aliases can be used with operators to build up more complicated expressions. For example to only display the side chains of the receptor:
displayonlyatom sidechain and set receptor
To display only the side chains of the receptor residues that have atoms within 5.0 Å of the ligand:
displayonlyatom sidechain and set receptor and fillres within 5.0 set ligand
Atoms of a given type.
There are a variety of ways to specify atoms of a given type. For example to specify all carbons, nitrogens and oxygens the following is used:
atom.ele C,N,O
To specify non-hydrogen atoms:
not atom.ele H
To specify the alpha carbons in a protein:
atom.ptype CA
To specify all sp2 carbons there are two choices. The first relies on knowing that the MacroModel atom type for such an atom is C2 and using:
atom.mtype C2
The other (assuming no formally charged or radical carbons are present) uses the number of attachments to the atom:
atom.ele C and atom.att 3
To specify polar hydrogens:
atom.ele H and not /C0-H0/
or
atom.ele H and not atom.mtype H1
Water molecules.
The ASL has a standard alias water. For example, to delete all water molecules the Maestro command is:
delete atom water
Restricting an operation to the atoms that are currently displayed in the Workspace.
Often you will be working with only a subset of the atoms in the Workspace displayed. If an operation is to be performed only on the atoms that are displayed, then the atom.displayed property can be used. For example, to change the color of all the atoms currently displayed in the Workspace to green and to leave the undisplayed Workspace atoms alone:
coloratom color=green atom.disp
To only do it for the atoms that are displayed and in the receptor:
coloratom color=green atom.disp and set receptor
Specifying molecules.
All molecules with between 30 and 100 atoms:
mol.atoms 30-100
All molecules with over 100 atoms:
mol.atoms > 100
All molecules with a molecular weight over 300:
mol.weight > 300.0
All molecules that contain a halogen:
fillmol atom.ele F,Cl,Br,I
Specifying atoms based on a linear-substructure notation.
The ASL supports the use of a SMILES-like linear substructure notation to specify atoms with a particular bonding arrangement. The atoms are referred to by MacroModel atom types, but there are wildcard types that can be used to allow the expression to apply to any atoms of a given element type.
Some examples:
Any five-membered ring:
/00-00-00-00-00-1/
Aromatic six-membered carbon rings (C2 is sp2 carbon):
/C2-C2*C2-C2*C2-C2*1/
Amide groups:
/C2(*O2)-N2/
Methyl groups:
/C3(-H1)(-H1)(-H1)/
Water:
/H2-O3-H2/
Guanidinium group:
/N2(-H3)-C2(*N4(-H4)(-H4))-N2(-H3)(-H3)/
Using wildcard characters.
Most string-type property values can use wildcard characters. Some examples:
All PDB atom names beginning with C:
atom.ptype C*
All forms of the histidine residue:
res.ptype HI*
All entries whose names begin with lig:
entry.name lig*
Using SMARTS expressions.
The ASL supports the use of a SMARTS expression. All atoms that match this expression are considered part of the set. The syntax for this is:
smarts. smarts_expression
Some examples:
All three-carbon chains:
smarts. CCC
All ring nitrogens:
smarts. [R] and atom.ele N
All six-membered carbon rings:
smarts. C1CCCCC1