Methods

Methodology of the DR_bind DNA-binding residue prediction server

The DR_bind DNA-binding residue prediction webserver implements a structure-based DNA-binding residue prediction method based on (a) electrostatics, (b) conservation, and (c) geometry with the following rationale: (a) DNA-binding residues contain electropositive atoms, which would be in an unfavorable electrostatic environment in the absence of DNA or water; thus replacing one of these residues with a negatively charged Asp⁻/Glu⁻ would alleviate the electrostatic repulsion among the electropositive atoms in the gas phase. (b) DNA-binding residues and residues in the vicinity, which form a cluster of spatially interacting residues, are usually highly conserved within the same family due to their critical functional roles. (c) DNA-binding residues have been observed to be located on surface patches, as opposed to clefts/cavities for RNA-binding residues and enzyme substrates.

Definitions

An amino acid residue (aa) X is considered accessible for interacting with DNA if the percent ratio of its side-chain solvent-accessible surface area in the protein to that in the tripeptide, −Gly−X−Gly−, is >5%. MOLMOL was used to compute the relative solvent-accessible surface area of each aa from the protein structure using a solvent probe radius of 1.4 Å.
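The accessibility criterion above can be illustrated with a minimal sketch. It assumes the two side-chain surface areas (in Å²) have already been computed by an external tool such as MOLMOL; the function names and the numbers in the usage lines are hypothetical and do not come from the server. A residue with no side chain (Gly) has a zero reference area, which this sketch simply rejects, since the source does not say how that case is handled.

```python
def relative_side_chain_accessibility(sasa_in_protein, sasa_in_tripeptide):
    """Percent ratio of side-chain SASA in the protein to that in Gly-X-Gly."""
    if sasa_in_tripeptide <= 0:
        # Assumption: a zero reference area (e.g. Gly) is treated as an error here.
        raise ValueError("reference tripeptide SASA must be positive")
    return 100.0 * sasa_in_protein / sasa_in_tripeptide


def is_accessible(sasa_in_protein, sasa_in_tripeptide, cutoff_percent=5.0):
    """True if the relative accessibility is strictly greater than the cutoff."""
    ratio = relative_side_chain_accessibility(sasa_in_protein, sasa_in_tripeptide)
    return ratio > cutoff_percent


# Hypothetical areas in square angstroms, for illustration only.
print(is_accessible(12.0, 120.0))  # relative accessibility 10%
print(is_accessible(5.0, 100.0))   # exactly 5%, which is not > 5%
```

The strict inequality mirrors the ">5%" wording of the definition.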

Geometry

Since DNA-binding sites are found on the protein surface, surface patches were generated by taking the Cα atom of each residue as the origin of a patch and including in that patch all residues whose atoms were within 10 Å of the origin. Non-identical patches with >5 solvent-accessible residues were used in computing the average electrostatic energy change and conservation (see below).
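The patch construction can be sketched as follows. This is an illustration under stated assumptions, not the server's code: a residue is taken to belong to a patch if any one of its atoms lies within the cutoff of the origin Cα (the text does not say whether "any" or "all" atoms is meant), coordinates are supplied as plain tuples, and the names `build_patches`, `residues`, and `accessible` are hypothetical.

```python
import math


def build_patches(residues, accessible, radius=10.0, accessible_cutoff=5):
    """Generate non-identical surface patches.

    residues:   dict mapping residue id -> dict of atom name -> (x, y, z)
    accessible: iterable of residue ids judged solvent accessible
    Returns a list of frozensets of residue ids.
    """
    accessible = set(accessible)
    patches = []
    seen = set()
    for origin_id, atoms in residues.items():
        origin = atoms.get("CA")
        if origin is None:
            continue  # no C-alpha atom, so no patch origin
        # Assumption: one atom within the radius is enough for membership.
        members = frozenset(
            rid
            for rid, other_atoms in residues.items()
            if any(math.dist(origin, xyz) <= radius for xyz in other_atoms.values())
        )
        if members in seen:
            continue  # keep only non-identical patches
        seen.add(members)
        if len(members & accessible) > accessible_cutoff:
            patches.append(members)
    return patches


# Toy input: seven residues 1 angstrom apart plus one isolated residue.
toy = {i: {"CA": (float(i), 0.0, 0.0)} for i in range(7)}
toy[99] = {"CA": (100.0, 0.0, 0.0)}
print(build_patches(toy, accessible=toy.keys()))
```

In the toy input every origin among the first seven residues yields the same member set, so only one patch survives, and the isolated residue is dropped because its patch has too few accessible residues.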

Electrostatics

Given an l-residue DNA-binding protein structure, all Asp/Glu residues were deprotonated, while Arg/Lys residues were protonated; His residues were protonated or deprotonated depending on the availability of hydrogen-bond acceptors in the structure. Next, l mutant structures were generated, one per residue position, by replacing Ala, Asn, Asp, Cys, Gly, Ser, Thr, or Val in the wild-type structure with Asp⁻ and any other residue with Glu⁻. The side-chain replacements were carried out using SCWRL, followed by energy minimization using AMBER, with heavy constraints on all heavy atoms, to relieve any bad contacts. Based on the wild-type/mutant structures, the gas-phase (ε = 1) electrostatic energy of the wild-type (E^elec_wt) or mutant (E^elec_mut) protein in the folded state relative to that in an extended reference state (E′^elec_wt or E′^elec_mut) was computed using AMBER with the all-hydrogen-atom AMBER force field. In this extended reference state, the residues do not interact with one another; hence, the electrostatic energy difference between the wild-type (E′^elec_wt) and mutant (E′^elec_mut) unfolded proteins is equal to the difference between the electrostatic energies of the native residue at position i (E′^elec_i) and the corresponding mutant Asp⁻/Glu⁻ (E′^elec_D/E). The change in the gas-phase electrostatic energy, ΔΔ^elec_i, upon mutation of residue i to Asp⁻/Glu⁻ is given by:

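$$\Delta\Delta^{\mathrm{elec}}_i = \left(E^{\mathrm{elec}}_{\mathrm{mut},i} - E^{\mathrm{elec}}_{\mathrm{wt}}\right) - \left(E'^{\mathrm{elec}}_{\mathrm{D/E}} - E'^{\mathrm{elec}}_{i}\right) \tag{1}$$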
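As a small illustration of Eq. (1), the sketch below picks the mutant residue type from the native one and combines four precomputed energies. The energies themselves would come from AMBER and are not computed here; the function names and the numbers in the usage line are hypothetical, and the three-letter residue codes are an assumed input convention.

```python
# Residues replaced by Asp(-); every other residue type is replaced by Glu(-).
REPLACED_BY_ASP = {"ALA", "ASN", "ASP", "CYS", "GLY", "SER", "THR", "VAL"}


def mutant_residue(native):
    """Return the acidic residue that replaces the given native residue."""
    return "ASP" if native.upper() in REPLACED_BY_ASP else "GLU"


def delta_delta_elec(e_mut_folded, e_wt_folded, e_ref_acidic, e_ref_native):
    """Eq. (1): folded-state energy change minus the reference-state change.

    e_mut_folded: folded mutant protein energy (mutation at position i)
    e_wt_folded:  folded wild-type protein energy
    e_ref_acidic: isolated Asp(-)/Glu(-) energy in the extended reference state
    e_ref_native: isolated native residue energy in the extended reference state
    """
    return (e_mut_folded - e_wt_folded) - (e_ref_acidic - e_ref_native)


print(mutant_residue("SER"), mutant_residue("LYS"))
# Made-up energies in arbitrary units, for illustration only.
print(delta_delta_elec(-120.0, -100.0, -30.0, -5.0))
```

A negative value means the acidic substitution is electrostatically favorable in the folded protein relative to the reference state, which is the signal the method associates with DNA-binding positions.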

The average electrostatic energy change <ΔΔ^elec>_i of the N^aa_i residues comprising surface patch i was computed from:

$$\langle \Delta\Delta^{\mathrm{elec}} \rangle_i = \frac{1}{N^{\mathrm{aa}}_i} \sum_{j} \Delta\Delta^{\mathrm{elec}}_j \tag{2}$$

where the summation in Eq. (2) is over all residues in patch i.

Conservation

For a given DNA-binding protein, the conservation score, C_i, of residue i was obtained from the ConSurf-DB database or the ConSurf server. The C_i score is an integer ranging from 1 (for a rapidly evolving, highly variable residue) to 9 (for a slowly evolving, conserved residue). The average conservation <C>_i of the N^aa_i residues comprising surface patch i was computed from:

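$$\langle C \rangle_i = \frac{1}{N^{\mathrm{aa}}_i} \sum_{j} C_j \tag{3}$$

where the summation is over all residues j in patch i.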
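Eqs. (2) and (3) are both arithmetic means over the residues of a patch, so one helper can serve for either quantity. The sketch below assumes per-residue values are held in a dictionary keyed by residue id; `patch_average` and the toy values are hypothetical.

```python
def patch_average(per_residue_value, patch):
    """Mean of a per-residue quantity over all residues in one patch."""
    members = list(patch)
    if not members:
        raise ValueError("patch has no residues")
    return sum(per_residue_value[r] for r in members) / len(members)


# Toy per-residue values, for illustration only.
dd_elec = {1: -2.0, 2: -4.0, 3: 9.0}
conservation = {1: 9, 2: 6, 3: 3}

print(patch_average(dd_elec, [1, 2]))          # average energy change, Eq. (2)
print(patch_average(conservation, [1, 2, 3]))  # average conservation, Eq. (3)
```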

DNA-binding residue prediction

To determine the DNA-binding residues in a given protein, the distinct patches were ranked according to their <ΔΔ^elec>_i values so that the top-ranked patch had the most favorable (most negative) <ΔΔ^elec>_i, whereas the bottom-ranked patch had the least favorable <ΔΔ^elec>_i. Among the top 10% of the <ΔΔ^elec>_i-ranked surface patches, the three patches with the largest <C>_i values were selected, and their constituent solvent-accessible residues were predicted to bind DNA.
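The selection procedure can be sketched as a two-stage ranking. Several details are assumptions rather than statements from the source: the top 10% is rounded up to a whole number of patches with a minimum of one, ties keep their input order, and the names `predict_binding_residues`, `mean_dd_elec`, and `mean_cons` are hypothetical.

```python
def predict_binding_residues(patches, mean_dd_elec, mean_cons, accessible,
                             top_percent=10, n_selected=3):
    """Return (indices of selected patches, predicted residue ids).

    patches:      list of collections of residue ids
    mean_dd_elec: per-patch average electrostatic energy change
    mean_cons:    per-patch average conservation score
    accessible:   residue ids judged solvent accessible
    """
    accessible = set(accessible)
    # Stage 1: rank by average energy change, most negative first.
    order = sorted(range(len(patches)), key=lambda k: mean_dd_elec[k])
    # Assumption: round the top fraction up, using integer arithmetic.
    n_top = max(1, (len(patches) * top_percent + 99) // 100)
    top = order[:n_top]
    # Stage 2: among those, keep the patches with the largest conservation.
    chosen = sorted(top, key=lambda k: mean_cons[k], reverse=True)[:n_selected]
    predicted = set()
    for k in chosen:
        predicted |= set(patches[k]) & accessible
    return chosen, predicted


# Toy example: 40 patches, each holding one accessible and one buried residue.
patches = [{k, 100 + k} for k in range(40)]
mean_dd_elec = [float(k) for k in range(40)]      # patch 0 is most favorable
mean_cons = [5.0, 8.0, 7.0, 9.0] + [9.0] * 36
print(predict_binding_residues(patches, mean_dd_elec, mean_cons, range(40)))
```

With 40 toy patches the first stage keeps four, and the second stage drops the least conserved of those four, so only the accessible residues of the remaining three are reported.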

DR_bind is hosted at The Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan.