
Nucleic Acids Research 2005 33(Database Issue):D580-D582; doi:10.1093/nar/gki006

Nucleic Acids Research, 2005, Vol. 33, Database issue D580-D582
© 2005, the authors
Nucleic Acids Research, Vol. 33, Database issue © Oxford University Press 2005; all rights reserved

The Stanford Microarray Database accommodates additional microarray platforms and data formats

Catherine A. Ball1,*, Ihab A. B. Awad2, Janos Demeter1, Jeremy Gollub1, Joan M. Hebert3, Tina Hernandez-Boussard1, Heng Jin1, John C. Matese4, Michael Nitzberg1, Farrell Wymore1, Zachariah K. Zachariah1, Patrick O. Brown1,5 and Gavin Sherlock2

1 Department of Biochemistry and 2 Department of Genetics, 3 Stanford University School of Medicine, Stanford, CA, USA, 4 Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA and 5 Howard Hughes Medical Institute, Stanford, CA, USA

* To whom correspondence should be addressed. Tel: +1 650 724 3028; Fax: +1 650 724 3701; Email: ball{at}genome.stanford.edu

Received September 3, 2004; Accepted September 13, 2004


    ABSTRACT
The Stanford Microarray Database (SMD) (http://smd.stanford.edu) is a research tool for hundreds of Stanford researchers and their collaborators. SMD also functions as a resource for the entire biological research community by providing unrestricted access to microarray data published by SMD users and by disseminating its source code. In addition to storing GenePix (Axon Instruments) and ScanAlyze output from spotted microarrays, SMD has recently added the ability to store, retrieve, display and analyze the complete raw data produced by several additional microarray platforms and image analysis software packages. We can now also accept data from Affymetrix GeneChips (MAS5/GCOS or dChip), Agilent Catalog or Custom arrays (using Agilent's Feature Extraction software) and data created by SpotReader (Niles Scientific). We have implemented software that allows us to accept MAGE-ML documents from array manufacturers and to submit MIAME-compliant data in MAGE-ML format directly to ArrayExpress and GEO. This greatly increases the ease with which data from SMD can be published in adherence to accepted standards, and it also increases the accessibility of published microarray data to the general public. We have introduced a new tool to facilitate data sharing among our users, so that datasets can be shared before, during or after data analysis. The latest version of the source code for the complete database package was released in November 2004 (http://smd.stanford.edu/download/), allowing researchers around the world to deploy their own installations of SMD.


    INTRODUCTION
The Stanford Microarray Database (SMD) (1) (http://smd.stanford.edu) was initially developed in 1999 to serve a small team of researchers using spotted DNA microarrays for human and yeast research at Stanford University. Since then, it has become a research tool for a much larger scientific community using multiple microarray platforms to study a myriad of biomedical research problems. SMD now supports the research of more than 1000 users in over 260 laboratories at Stanford and around the world. These users have entered data generated from more than 50 000 microarrays used to study the biology of 34 organisms, published more than 190 papers referring to data in SMD and have made the complete raw data from more than 7000 microarrays freely available via the SMD website. The public data can be selected, viewed, downloaded and analyzed by the public using most of the tools that are available to registered SMD users. The source code for SMD has been downloaded and installed at several academic and private locations.

Here, we discuss some of the recent developments at SMD that have enabled us to accept microarray data from additional platforms and image analysis software, export data in MAGE-ML format and permit greater data sharing between researchers. In addition, we present information about the latest release of the SMD source code.


    RESULTS
Microarray platforms and software
Until 2003, all data in SMD came from two-channel cDNA microarrays and were extracted from scanned images using either GenePix (www.axon.com) or ScanAlyze (http://rana.lbl.gov/EisenSoftware.htm). The increased interest among SMD users in other microarray platforms and other data acquisition software provided impetus to accommodate new data formats in SMD. Specifically, we have added the ability to accept data from Agilent arrays acquired by Agilent's Feature Extraction software; gene expression data from Affymetrix arrays acquired by the Affymetrix Microarray Analysis Suite v5.0 (MAS5, Affymetrix), GeneChip Operating System (GCOS, Affymetrix) or DNA-Chip Analyzer (dChip) (2); and two-color data acquired using SpotReader (http://www.nilesscientific.com/) software.

To accept data from additional microarray platforms and software packages, we had to overcome several hurdles. First, the data models used to describe both the microarray designs and the associated results had to be re-designed and re-implemented. We decided to store all fields available from every software package that we supported, which can number up to several dozen, rather than store only the data fields common to many different software packages. Second, we modified the software for entry and retrieval of array designs and for microarray data entry, retrieval, display and analysis. The design and implementation are object-oriented and relatively easy to extend, so that additional platforms and data formats can be accommodated in the future. These features are available in the 11/04 software release.
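
The decision to keep every field a package emits, combined with an object-oriented design that can be extended to new formats, can be illustrated with a minimal Python sketch. The class names, the registry, the format names and the column names below are hypothetical and are not taken from the SMD source code; the sketch only shows how a new format might be added without changing existing parsers.

```python
import csv
import io


class ResultParser:
    """Base class: one subclass per supported software package (hypothetical)."""

    registry = {}          # maps a format name to its parser class
    format_name = None     # set by each subclass

    def __init_subclass__(cls, **kwargs):
        # Register every subclass automatically under its format name.
        super().__init_subclass__(**kwargs)
        if cls.format_name:
            ResultParser.registry[cls.format_name] = cls

    def parse(self, text):
        """Return one dict per row, keeping every column found in the file."""
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        return [dict(row) for row in reader]


class TwoColorParser(ResultParser):
    format_name = "two_color_example"


class SingleChannelParser(ResultParser):
    format_name = "single_channel_example"


def load_results(format_name, text):
    """Look up the parser registered for a format and parse the raw text."""
    parser_cls = ResultParser.registry[format_name]
    return parser_cls().parse(text)


# Made-up tab-delimited input with illustrative column names.
sample = "SPOT\tCH1_MEAN\tCH2_MEAN\tFLAG\n1\t1200\t800\t0\n2\t300\t950\t-50\n"
rows = load_results("two_color_example", sample)
```

In this sketch, supporting another package would only require a new subclass with its own `format_name`, and no column is discarded at load time.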

Data export to public repositories
In close collaboration with the staff of ArrayExpress, a public repository for microarray data (3) (http://www.ebi.ac.uk/arrayexpress/), we have constructed and implemented a pipeline that converts sets of microarray data within SMD into MAGE-ML files. These files support the MIAME standards for information content (4,5) and can be deposited directly in ArrayExpress and, more recently, GEO (6; http://www.ncbi.nlm.nih.gov/geo). To create this pipeline, we took advantage of software developed by members of the microarray informatics community (MAGE-stk; http://mged.sourceforge.net/). The pipeline translates data from SMD into the MAGE-OM structure, exports it as MAGE-ML, and then tests its formatting with the ArrayExpress MAGE-ML validator (ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/MAGEvalidator-DISTRIB/). The MAGE-ML files are then transferred via FTP to ArrayExpress and GEO, where they can be immediately entered.
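
The order of the stages (translate, export, validate, transfer) can be sketched in Python as follows. This is an illustration of the control flow only, under stated assumptions: the function names are hypothetical, the XML tags are invented and are not MAGE-ML, the validation is a simple stand-in rather than the ArrayExpress validator, and the transfer step is passed in as a callable instead of performing a real FTP upload.

```python
import xml.etree.ElementTree as ET


def translate(records):
    """Stage 1: map database rows onto a simple in-memory object model."""
    return {"experiment": records["name"], "arrays": list(records["arrays"])}


def export_xml(model):
    """Stage 2: serialize the object model as XML (illustrative tags only)."""
    root = ET.Element("Experiment", name=model["experiment"])
    for array_id in model["arrays"]:
        ET.SubElement(root, "Array", id=str(array_id))
    return ET.tostring(root, encoding="unicode")


def validate(document):
    """Stage 3: stand-in check; well-formed XML with at least one array."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    return len(root.findall("Array")) > 0


def run_pipeline(records, transfer):
    """Run the stages in order; call `transfer` only for valid documents."""
    document = export_xml(translate(records))
    if not validate(document):
        raise ValueError("document failed validation; nothing transferred")
    transfer(document)
    return document


# Collect "transferred" documents in a list instead of uploading them.
sent = []
doc = run_pipeline({"name": "time_course", "arrays": [101, 102]}, sent.append)
```

Placing validation before the transfer step means that a malformed document is never sent to a repository.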

Several new tables and user interfaces were designed and added to allow SMD users to provide the annotation of their experiments and biological samples that is required for MIAME compliance. These include tools to annotate protocols, experimental factors and experimental designs. Using MGED Ontology terms (http://mged.sourceforge.net/ontologies/MGEDontology.php), SMD users can describe the overall experimental design and how each member of a set of microarrays fits into that design, such as time points in a time series experiment or tumor samples from many individuals in a molecular survey of a type of cancer. Users can also describe the biological properties and parameters of each sample, including HIPAA-compliant clinical data, and the procedures and protocols used to treat the sample, extract the RNA or DNA, amplify and label the nucleic acids, hybridize the arrays, and scan and acquire data from the resulting microarray image. In this way, each set of microarrays associated with a publication in SMD can be adequately annotated and easily submitted to a public data repository.

Tools for collaboration
The large community of researchers using SMD participates in widespread and active collaborations. SMD's tools for sharing data facilitate collaboration and accurate communication of experimental details. To complement existing tools for organizing and annotating shared data, we have implemented a ‘data repository’ that allows users to save and share the results of data analysis (Figure 1). Users are able to save the results of data selection, filtering, transformation and analysis at each of these different steps and then specify other users or groups who should be able to view the datasets. In addition to providing a means to help users save their work and facilitate collaboration, the data repository provides a jumping-off point for various analysis tools, such as hierarchical clustering and singular value decomposition.
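
The sharing model described here, in which a user saves a dataset at a given analysis step and names the users or groups allowed to view it, can be sketched in Python. The class names, fields and example user names are hypothetical and do not reflect the actual SMD schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owner: str
    step: str                       # e.g. "selected", "filtered", "clustered"
    shared_users: set = field(default_factory=set)
    shared_groups: set = field(default_factory=set)

    def can_view(self, user, user_groups=()):
        """The owner, named users and members of named groups may view."""
        return (
            user == self.owner
            or user in self.shared_users
            or bool(self.shared_groups & set(user_groups))
        )


class Repository:
    def __init__(self):
        self._datasets = []

    def save(self, dataset):
        self._datasets.append(dataset)

    def visible_to(self, user, user_groups=()):
        """Names of all saved datasets that the given user may view."""
        return [d.name for d in self._datasets if d.can_view(user, user_groups)]


repo = Repository()
repo.save(Dataset("raw_selection", owner="alice", step="selected"))
repo.save(Dataset("filtered_set", owner="alice", step="filtered",
                  shared_users={"bob"}))
repo.save(Dataset("clustered_set", owner="alice", step="clustered",
                  shared_groups={"lab_a"}))
```

With these example entries, `repo.visible_to("bob")` would list only the dataset shared with that user, while the owner sees all three.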



Figure 1. Screenshot of SMD's Microarray Data Repository. The repository allows users to store datasets created within SMD, upload datasets from other sources, share them with colleagues and collaborators and jump off to a variety of data retrieval, selection, transformation and analysis options. The interface provides the option to view any other repository for which a user has permission to view data. SMD users can store datasets before and after filtering and centering in pre-clustering (.pcl) files. Using repository options, a user can view, download, cluster and filter data, delete data from the repository, edit descriptions and permissions for a dataset, perform singular value decomposition and collapse the data based on human genes into UniGene clusters, sections of chromosome or other ‘synthetic genes’ constructed by the researcher. Clustered data can also be stored, and users have additional options to view a heat-map cluster using GeneXplorer (8), to view spot-image clusters, or to view both heat-map and spot-image clusters side by side.

 
Availability
An updated version of the source code for SMD, including all the functions described in this paper, was made publicly available in November 2004 (http://smd.stanford.edu/download/). The source code is freely available under the liberal, open-source MIT license (http://www.opensource.org/licenses/mit-license.php).

Future directions
In the future, we will update and release our software with greater frequency and with better support for users who are upgrading their SMD installations and must therefore migrate data from one database schema to another. In this way, other installations of SMD will be able to benefit from improvements and new tools in a timely manner and with less work to deploy. We also plan to provide web services through BioMOBY (7) to allow for more creative and flexible use of SMD by researchers or data miners who wish to automate complex queries of the available data. Finally, techniques for determining data quality in an automated fashion remain an active area of research at SMD. Automatically recognizing problematic data will allow researchers to reject it from analysis, correct errors and, ideally, identify and prevent potential sources of poor-quality data. The large body of data in SMD will provide excellent training and test sets for quality assurance studies.


    ACKNOWLEDGEMENTS
 
This work was supported by grants from the US National Institutes of Health to P.O.B. (5R01 CA77097) and to G.S. (1R01 HG002732).


    Notes
 
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions{at}oupjournals.org.


    REFERENCES

  1. Gollub,J., Ball,C.A., Binkley,G., Demeter,J., Finkelstein,D.B., Hebert,J.M., Hernandez-Boussard,T., Jin,H., Kaloper,M., Matese,J.C. et al. (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res., 31, 94–96.

  2. Li,C. and Hung Wong,W. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, RESEARCH0032.

  3. Brazma,A., Parkinson,H., Sarkans,U., Shojatalab,M., Vilo,J., Abeygunawardena,N., Holloway,E., Kapushesky,M., Kemmeren,P., Lara,G.G. et al. (2003) ArrayExpress—a public repository for microarray gene expression data at the EBI. Nucleic Acids Res., 31, 68–71.

  4. Spellman,P.T., Miller,M., Stewart,J., Troup,C., Sarkans,U., Chervitz,S., Bernhart,D., Sherlock,G., Ball,C., Lepage,M. et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol., 3, RESEARCH0046.

  5. Brazma,A., Hingamp,P., Quackenbush,J., Sherlock,G., Spellman,P., Stoeckert,C., Aach,J., Ansorge,W., Ball,C.A., Causton,H.C. et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genet., 29, 365–371.

  6. Edgar,R., Domrachev,M. and Lash,A.E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30, 207–210.

  7. Wilkinson,M.D. and Links,M. (2002) BioMOBY: an open source biological web services proposal. Brief. Bioinform., 3, 331–341.

  8. Rees,C.A., Demeter,J., Matese,J.M., Botstein,D. and Sherlock,G. (2004) GeneXplorer: an interactive web application for microarray data visualization and analysis. BMC Bioinformatics, 5, 141.

