Nucleic Acids Research Advance Access published online on October 11, 2007
Nucleic Acids Research, doi:10.1093/nar/gkm839
Database Issue
LigASite—a database of biologically relevant binding sites in proteins with known apo-structures
1 Center for Structural Biology and Bioinformatics, Université Libre de Bruxelles (U. L. B.), Bld du Triomphe – CP 263, 1050 Bruxelles, Belgium
2 Biomolecular Structure and Modelling Unit, University College of London, Gower Street, London WC1E 6BT, UK
3 Structural Biology and Biochemistry Program, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
* To whom correspondence should be addressed. Tel: +44 (0) 20 7679 3890; Fax: +44 (0) 20 7679 7193; Email: benoit{at}biochem.ucl.ac.uk.
Received August 13, 2007. Revised September 24, 2007. Accepted September 25, 2007.
Better characterization of binding sites in proteins and the ability to accurately predict their location and energetic properties are major challenges which, if addressed, would have many valuable practical applications. Unfortunately, reliable benchmark datasets of binding sites in proteins are still sorely lacking. Here, we present LigASite (LIGand Attachment SITE), a gold-standard dataset of binding sites in 550 proteins of known structure. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. Both a redundant and a culled non-redundant version of the dataset are available at http://www.scmbb.ulb.ac.be/Users/benoit/LigASite. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.
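To illustrate what combining binding-site information from all holo-structures of a protein might look like, the following is a minimal sketch only. It assumes that each binding site is represented as a collection of (chain, residue number) pairs and that "combined" means a set union; the abstract does not specify either, and the function name, data layout, and example identifiers are hypothetical rather than part of LigASite.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical residue representation: (chain ID, residue number).
Residue = Tuple[str, int]


def combine_binding_sites(holo_sites: Dict[str, Iterable[Residue]]) -> Set[Residue]:
    """Return the union of binding-site residues over all holo-structures.

    Assumption: a plain set union is used as the combination rule; the
    actual LigASite procedure may differ.
    """
    combined: Set[Residue] = set()
    for residues in holo_sites.values():
        combined.update(residues)
    return combined


# Made-up example data for one protein with two holo-structures.
holo_sites = {
    "holoA": [("A", 10), ("A", 45)],
    "holoB": [("A", 45), ("A", 102)],
}

combined = combine_binding_sites(holo_sites)
print(sorted(combined))  # [('A', 10), ('A', 45), ('A', 102)]
```

Under this assumption, a residue contacted by a ligand in any one holo-structure is counted as part of the protein's binding site, so residue 45 of chain A appears once even though both hypothetical structures report it.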
Present address: Biomolecular Structure and Modelling Unit, University College of London, Gower Street, London WC1E 6BT, UK