Nucleic Acids Research Advance Access originally published online on May 3, 2007
Nucleic Acids Research 2007 35(10):e72; doi:10.1093/nar/gkl1133
Nucleic Acids Research, 2007, Vol. 35, No. 10 e72
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Three methods for optimization of cross-laboratory and cross-platform microarray expression data
Phillip Stafford¹ and Marcel Brun²
¹Biodesign Institute, Arizona State University, Center for Innovations in Medicine, Tempe, AZ, USA and ²Translational Genomics Research Institute, Computational Biology Division, Phoenix, AZ, USA
Received April 21, 2006. Revised December 8, 2006. Accepted December 13, 2006.
ABSTRACT
Microarray gene expression data becomes more valuable as our
confidence in the results grows. Guaranteeing data quality becomes
increasingly important as microarrays are being used to diagnose
and treat patients (14). The MAQC Quality Control Consortium,
the FDA's Critical Path Initiative, NCI's caBIG and others are
implementing procedures that will broadly enhance data quality.
As GEO continues to grow, its usefulness is constrained by the
level of correlation across experiments and general applicability.
Although RNA preparation and array platform play important roles
in data accuracy, pre-processing is a user-selected factor that
has an enormous effect. Normalization of expression data is
necessary, but the methods have specific and pronounced effects
on precision, accuracy and historical correlation. As a case
study, we present a microarray calibration process using normalization
as the adjustable parameter. We examine the impact of eight
normalizations across both Agilent and Affymetrix expression
platforms on three expression readouts: (1) sensitivity and
power, (2) functional/biological interpretation and (3) feature
selection and classification error. The reader is encouraged
to measure their own discordant data, whether cross-laboratory,
cross-platform or across any other variance source, and to use
their results to tune the adjustable parameters of their laboratory
to ensure increased correlation.
BACKGROUND
Expression arrays have progressed to a point where low technical
variance, low background noise and a high degree of accuracy
have encouraged the development of array-based medical devices
that predict drug response, relapse potential or general prognosis
(24). Normalization is a critical pre-processing step
for most array technologies, due to the known biases. As normalization
methods get more sophisticated and perhaps more specialized,
the list of pros and cons for each grows. The array user should
be aware of the bottom-line consequences of the normalization
methods available today.
Affymetrix (Affymetrix Inc, Santa Clara, CA, USA) and Agilent (Agilent Technologies, Santa Clara, CA, USA) are leaders in expression array manufacturing. They use quite different approaches to construction, layout, optimization, hybridization, image acquisition and data extraction. Much of the difference that we see is attributable to the method of in situ probe synthesis: photolithography (light-directed) versus liquid-based (ink-jet) oligonucleotide synthesis. Reports have found both poor (5–10) and good (6,11–22) cross-platform correlation, but the MAQC consortium has generally found that proper sample preparation is sufficient to dramatically enhance multi-lab and multi-platform correlations (16,23,24). Quality control rules (25–27) tell us that one could fix a high-quality RNA source and identify all other variables that could cause discordant data. With that logic, we propose a system that fixes the RNA source and changes data normalization methods in order to estimate their effect on data precision, classifier error and biological interpretation. The system we developed is a simple analysis that both graphically and quantitatively shows how adjustable parameters (in this case normalization) affect discordance. Although many publications have proposed somewhat esoteric methods for measuring cross-platform reproducibility, we believe that a simple, easy-to-understand analysis will not only highlight most sources of variance, but will also enable the user to visualize how process-control techniques improve reproducibility.
Normalization methods and cross-platform comparisons
How two arrays from different manufacturers correlate with each other depends in large part on how they respond to factors that cause ectopic hybridization. Agilent arrays have mostly full-length 60-mer probes versus mostly <25-mers on Affymetrix arrays, the difference primarily being due to the stepwise yield between shadow-masking and liquid in situ synthesis. Long oligo probes tend to disallow mishybridization due to increased hybridization and wash stringency; 25-mers and shorter are less well adapted to discriminate short mishybridization products, often showing up in partially degraded samples. Normalization cannot fix data obtained from degraded samples, but the analyses we propose enable one to spot patterns that implicate degraded RNA, and to pick a normalization method that may mitigate the most egregious effects.
For Affymetrix arrays, dChip PM and dChip PM–MM (28) are very popular model-based approaches (MBEI) that rely on a weighted average of PM–MM (Perfect Match–MisMatch) differences, or on an adjusted PM value. dChip can either include or exclude mismatch data and then normalize using an invariant set method or quantile:quantile; both accommodate deviations in intensity-dependent variance quite well. GC-RMA and RMA (Robust Multi-Array Averaging) (29–32) apply a type of variance stabilization that sums probes from all experiments in an analysis set and computes an average. GC-RMA weights the stronger G:::C bonds over A::T, yielding moderately higher precision in cases where the thermodynamics of the probe:target complex play a major role in hybridization. MAS5 (Microarray Suite 5) is a fairly conservative method that represents the manufacturer's suggested correction for mishybridization that occurs on the order of single mismatch destabilization energy, ΔG = 3.2 kcal. Signals from a mismatch probe are subtracted from a perfect match probe and total signal is calculated using a one-step Tukey's biweight estimate after the highest and lowest probe values are discarded. RAW Affymetrix data, summarized by taking the median of all PM probes, make an excellent control for our comparisons since the biases that make normalization so important become abundantly clear. Many of these algorithms are included in the Affycomp library in Bioconductor (33).
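The one-step Tukey biweight mentioned above can be sketched as follows. This is a minimal illustration and not the manufacturer's implementation: the function name `tukey_biweight`, the tuning constant `c = 5` and the small offset `epsilon` are assumptions chosen for the example, and the input is assumed to be a list of log-scale probe values for one probeset.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight location estimate (illustrative sketch).

    c and epsilon are assumed tuning values, not taken from the article.
    """
    values = list(values)
    center = statistics.median(values)
    # Median absolute deviation from the median, used as a robust scale.
    mad = statistics.median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Points far from the median (|u| >= 1) get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log-scale probe values; the last one is an outlier.
probe_values = [1.0, 1.1, 0.9, 1.0, 5.0]
signal = tukey_biweight(probe_values)
```

The outlying probe receives zero weight, so the estimate stays near the bulk of the probes, whereas an ordinary mean would be pulled upward.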
Agilent arrays were originally optimized for two-color analysis but a one-color protocol is now available that includes a different panel of spike-in reagents for better optimization of single-color mode (34). Although this method would have been appropriate for a truly cross-platform comparison, we instead wanted to estimate error separately for each of the two Agilent channels, so we extracted each channel separately from a two-color experiment. Mean signal (MEAN) is most similar to RAW Affymetrix data, background subtracted (BSUB) is most similar to MAS5 and dChip PMMM and processed (PROCESSED) is most similar to GC-RMA and dChip PM. These three normalization steps are all found in Agilent's feature extraction output file.
Practical aspects of expression profiling
We define an expression profiling system as the array, scanner, RNA preparation techniques and the general laboratory infrastructure. Thus, when we use the term biosignature, we are really referring to the entire system that was involved in the generation of the data. Agendia's 70-gene Agilent-based MammaPrint® (24), aka the Amsterdam Signature, Veridex's 76-gene signature, aka the Rotterdam Signature, Genomic Health's 21-gene RT-PCR-based Oncotype DX™ (1,35) and a 41-gene expression set by Ahr et al. (36,37) have no gene in common, although all classify breast cancer profiles. Given zero-error measurements, a perfect signature could be found, but in reality gene-specific imprecision exists depending on the platform. In the clinic, misclassification can be potentially life threatening when false negatives predominate, and costly and uncomfortable for the patient when false positives predominate. With proper calibration and selection of platform-neutral gene expression profiles, one can expect good classification performance on a given expression platform, if one can validate biosignatures on public expression data (38–40). Shyamsundar et al. (41) addressed the calibration problem by correlating fluorescence intensity to copy number using genomic DNA (present at two copies per gene) as a baseline. Although mid- and high-concentration endpoints would have been valuable, it remains a promising calibration method.
One of the most relevant tests of expression data quality is how well one can identify genes that participate in gene regulatory and metabolic networks that change between healthy and diseased samples. Cancer is often cited as the archetype of a process that redirects transcriptional signals, originally designed to maintain homeostasis, into new developmental pathways specializing in proliferation and survival (15). Classification of biological samples into distinct subtypes based only on the transcriptome is often able to predict disease progression, drug response and even survival (3,42–44). Gene Ontology analysis has been shown to correlate well with changes in cellular physiology due to disease (45). Similarly, whole regulatory pathway analysis is informative when filtered for false positives. Pathway software includes Stratagene's Pathway Architect, Ingenuity's IPA, GeneGo's Metacore and open source Cytoscape, GenMapp, Kegg and Biocarta.
Three analytical methods
We propose three tests that generally provide expression data performance values; in our case we used these tests to decide which normalization method is most appropriate for the task at hand. We created an experimental design that compares three functionally different normal tissues: human liver, lung and spleen. The design was kept intentionally simple in order to facilitate this example, but note that the selection of functionally divergent tissues places a burden on normalization methods that assume minor changes in expression across samples. However, we believe this design is better suited to the current task than more complex designs (11–17,46–52). The analysis is simple enough to do without specialized software, and high quality RNA samples are readily obtained (Stratagene, La Jolla, CA). The tissue samples are normal healthy human samples rather than diseased tissues (15,53–55), and provide a large range of differential expression values. These results should be comparable to the same analysis performed in any lab, which is our definition of standardized methodology. We describe each analysis in terms of the gene i, tissue j and case k. The gene is the individual probe or averaged probeset targeting a single gene transcript. The tissue is one of three human commercial samples, in this case spleen, lung or liver. The case is one of three possible ratios, spleen:lung, lung:liver and spleen:liver. At least three replicates per tissue per platform were run, with three normalization methods for Agilent and five for Affymetrix. Twenty-four distinct data sets (Table 1) were analyzed.
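The gene, tissue and case indexing described above can be illustrated with a short sketch. It assumes, purely for illustration, that replicate intensities for one gene are averaged per tissue before the ratio is taken; the function name `case_log2_ratios` and the intensity values are hypothetical.

```python
import math

# The three pairwise cases k named in the text.
CASES = [("spleen", "lung"), ("lung", "liver"), ("spleen", "liver")]


def case_log2_ratios(intensities):
    """Return the log2 ratio for each case k for a single gene i.

    intensities maps tissue j -> list of replicate intensities.
    Averaging replicates first is an assumption made for this sketch.
    """
    means = {tissue: sum(reps) / len(reps) for tissue, reps in intensities.items()}
    return {
        f"{num}:{den}": math.log2(means[num] / means[den])
        for num, den in CASES
    }


# Hypothetical intensities for one gene, three replicates per tissue.
gene_i = {
    "spleen": [400.0, 400.0, 400.0],
    "lung": [100.0, 100.0, 100.0],
    "liver": [200.0, 200.0, 200.0],
}
ratios = case_log2_ratios(gene_i)
```

Each gene therefore contributes three log2 ratios, one per case, for every normalization and platform.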
RESULTS
Power analysis and distributional tests (statistical)
Data was structured as follows: data sets were log10(intensity)
and log2(ratio) transformed as needed.
Figure 1 summarizes
the reproducibility and dispersion for each platform and tissue
combination across most of the twenty-four conditions. Agilent
CY3 was left out for brevity, but plots were very similar to
the CY5 data. The first three columns are the intensity replicates
(e.g. liver sample 1 versus liver sample 2) and graphically
illustrate technical variability as a function of fluorescence
intensity. Background-subtracted methods in general tended to
show the highest apparent dispersion (MAS5 and dChip PM–MM)
while GC-RMA, dChip PM and most of the Agilent data showed much
less scatter. The fourth, fifth and sixth columns show the MvA
(Bland–Altman) plots, indicating the degree of correlation
between variance and intensity. Only the Affymetrix MAS5 and
GC-RMA data have substantial scatter, indicating a disjunction
between intensity and variance. The ratio replicate plots in
columns seven, eight and nine indicate how precisely each pair
of tissue samples can be used in ratios for each of the three
pairwise cases. MAS5 and dChip PM–MM show comparatively
high scatter, indicating higher variability across replicate
ratio calculations, especially at ratios near one. The dChip
PM and RAW plots, and to a lesser extent MAS5, highlight the
problem of using either under-normalized or imperfectly estimated
mismatch data as a reliable estimate of background. The Agilent
data shows a slight trend to higher dispersion with the BSUB
and PROCESSED signals showing the impact of subtracting background.
The boxplots shown in
Figure 2 (top) indicate the relative data
spread, another graphical estimate of precision. Agilent MEAN
and Affymetrix GC-RMA and RAW show the lowest quartile ranges,
suggesting high precision. The bottom plots show the relative
compression of un-normalized signals, explaining the illusion
of precision due to the low dynamic range of near-RAW data.
Figure 3 shows the effect of normalization on hierarchical clustering
(Euclidean distance, average linkage, 1000 ANOVA-selected genes,
GeneSpring 7.2, Agilent Technologies, Palo Alto, CA). Affymetrix
data tends to form clusters based on the (relatively greater)
effect of normalization while Agilent data tends to cluster
by tissue regardless of the channel or normalization. The Venn
diagram shows the overlap of genes for each cluster experiment;
there were 699 common genes out of 1000 based on RefSeq. Precision
estimates such as these are always imperfect in some way, but
when taken together they provide a good estimate of relative
precision. Sensitivity was calculated in several ways. We first
estimated the power using normal.sample.size() in S+ or power.t.test()
in R. We computed δ (the minimum detectable fold change) at an
arbitrary threshold of one potential false positive per array,
or α = 1/Nprobes. The p-value threshold used throughout this
article is often 1/Nprobes, or p = 5.3 × 10⁻⁵ for Agilent
and p = 4.5 × 10⁻⁵ for Affymetrix. Calculations of delta
used N = 3 replicates and power (1 − β) = 0.80 for every pairwise
gene expression value across each unique tissue j and case k, per
platform and per normalization.
Figure 4 shows the sorted δ (black
curve) calculated for each probe i and case k, with the actual ratios
between the two tissues plotted as blue bars. If the absolute log2
ratio of gene i between the two tissues exceeds δ, then gene i is
significant by definition, as indicated
by the red circles. Some circles lie below the δ curve because
the significance was calculated by a t-test using log10 intensities
rather than the log2 ratios in the power calculation. This is
formalized below in Equation (1.1):
| (1.1) |
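The threshold and minimum detectable difference described above can be approximated with a short sketch. It is not the S+ or R routine named in the text: it uses the textbook normal approximation for a two-sample comparison, which is crude at N = 3, and the function names and the standard deviation of 0.1 log2 units are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def alpha_per_array(n_probes):
    """Threshold at which one false positive is expected per array."""
    return 1.0 / n_probes


def min_detectable_log2_diff(sd, n, alpha, power=0.80):
    """Normal-approximation minimum detectable difference (two-sided test).

    sd is the per-group standard deviation in log2 units and n the
    number of replicates per group; both are supplied by the caller.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * sd * (2.0 / n) ** 0.5


def exceeds_delta(log2_ratio, delta):
    """True when the observed absolute log2 ratio is larger than delta."""
    return abs(log2_ratio) > delta


alpha_agilent = alpha_per_array(18703)      # probe count given in the text
alpha_affymetrix = alpha_per_array(22215)   # probe count given in the text
delta = min_detectable_log2_diff(sd=0.1, n=3, alpha=alpha_agilent)
```

A probe with a smaller replicate standard deviation yields a smaller delta, which is why the more precise normalizations can call smaller fold changes significant.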
Table 2 shows the results from three methods
for calculating sensitivity. Column 1 shows the mean delta ±
the standard deviation computed by calculating power from every
possible pairwise case k; column 2 shows the average minimum-detectable
fold-change (MDFC) across replicate measures at the 95th percentile.
Equation (1.2) is the method for averaging delta for each case k.
Column 3 shows the median MDFC across replicate measures at
the 95th percentile. Equation (1.3) clarifies the calculation
for delta across the ith gene and the kth sample, where
m = 22215 for Affymetrix and 18703 for Agilent.
| (1.2) |
| (1.3) |
Mean and median fold-change values across ratio replicates
were averaged across all cases k for all ratio calculations used
in sensitivity calculations. Sensitivity estimates correlate
well with the replicate scatterplots in Figure 1. Agilent methods
BSUB and PROCESSED have the highest sensitivity followed by
Agilent MEAN, Affymetrix GC-RMA and dChip PM, with the worst
precision and sensitivity seen with MAS5 and dChip PM–MM
normalizations. The fact that dChip PM produced better sensitivity
results than dChip PM–MM is likely due to the scatter
that the mismatch subtraction causes, similar to the problem
that MAS5 has. Algorithms that use background subtraction methods
cause low-intensity imprecision when MM > PM. This effect
is manifested in MAS5 and dChip PM–MM data by a minimum
detectable fold change near 2-fold, while GC-RMA and Agilent
data show 1.3-fold or less MDFC.

Figure 1. Graphical view of precision. Intensity replicates (left three columns) are log10 scatter plots of technical replicates for each normalization and tissue. Low scatter indicates higher precision. MvA plots (center three columns) are Bland–Altman charts showing variability (M = log2(S1/S2)) as a function of the average intensity (A = log2 sqrt(S1 × S2)), where S1 and S2 are the two replicate samples for each normalization and tissue. Linearity and low spread indicate high precision without intensity-sourced bias. Ratio replicates (right three columns) are log2 plots of tissue:tissue ratio replicates for each combination of tissue.
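The M and A values plotted in an MvA chart can be computed as in the sketch below. It uses the conventional definitions M = log2(S1/S2) and A = log2 of the geometric mean of S1 and S2; the function name `mva_values` and the example intensities are hypothetical.

```python
import math


def mva_values(sample_1, sample_2):
    """Return (M, A) lists for two replicate intensity vectors.

    M is the log2 ratio and A the log2 geometric-mean intensity,
    computed probe by probe.
    """
    m_values = [math.log2(s1 / s2) for s1, s2 in zip(sample_1, sample_2)]
    a_values = [0.5 * math.log2(s1 * s2) for s1, s2 in zip(sample_1, sample_2)]
    return m_values, a_values


# Hypothetical replicate intensities for two probes.
m_vals, a_vals = mva_values([8.0, 4.0], [2.0, 4.0])
```

Perfect replicates give M = 0 at every A, so scatter around the horizontal axis is a direct view of technical variability at each intensity.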
Figure 2. Intensity plots using boxplots (top) and line-plots (bottom). Top: boxplots of each array are colored by normalization type. Top boxplots show Agilent data arranged from left to right from the CY3 and CY5 channels, respectively. Lower boxplots show Affymetrix data. Lower figures show the log10-transformed intensity values as line-plots. High intensity genes are colored red, low intensity genes are colored green. All data is log10-transformed and median normalized.
Figure 3. Hierarchical grouping of 1000 genes selected using a Model I ANOVA for tissue differences ignoring the normalization class. Data was clustered using Euclidean distance to create the gene and experiment trees. Colored bars at the bottom of each dendrogram indicate the normalization method, tissue type or channel where appropriate. Vertical colored bars represent the Euclidean-based k-means gene clusters. Gene overlap was determined sequentially, using probename to RefSeq to HUGO Gene Symbol inside GeneSpring (translate genome function).
Figure 4. Power calculations indicate limits of detection. The log2 ratio between the three tissues is plotted as blue bars along the X-axis. The X-axis is the probe number sorted by the calculated delta, the Y-axis is the log2 fold-change. Red circles indicate statistical significance at P < 0.00001. The black curve is each probe's delta (the minimum detectable difference expressed as a log2 ratio) calculated by computing the post-hoc power for each probe at α = 0.05, β = 0.20 and N = 3 per tissue. The lower the delta, the less difference must be seen between tissues for a ratio to be significant. Wider delta curves imply that a ratio must be large in order to reach significance. The delta curves roughly recapitulate the precision seen in Figure 1, but also provide a graphical view of the distribution and magnitude of ratios versus proportion of significant genes. GC-RMA tends to show ratios close to the calculated delta; MAS5 shows many high ratios but fewer actual significant genes, implying false positives are a concern. PM only shows good stability across the tissue replicates. The Agilent data shows a uniform distribution of high and low ratios and many significant genes, implying low false positives and, due to the number of significant genes, likely low false negatives. Raw Affymetrix data has seemingly high precision but analysis shows high false negatives and ratios that often disagree in magnitude and direction with other highly correlative probes across both Affymetrix and Agilent data.
Biological interpretation (Gene Ontology)
We tested Gene Ontology functions by computing lists of genes
differentially expressed across each pair of tissues (Table 3).
Each gene list was tested for unusual abundance using GO categories,
as calculated in GeneSpring 7.2 with corroborative results obtained
from OntoExpress (56). Nearly identical results were obtained
across the Agilent normalizations (columns 3, 5 and 7), less
so among the Affymetrix normalizations, with dChip PM identifying
functions that are quite unique. MAS5 and GC-RMA showed the
greatest similarity to the Agilent results, suggesting that
differentially expressed genes identified using GC-RMA and the
Agilent samples led to a common biological interpretation. Subsequently,
we wanted to see the extent of overlap given a common set of
genes across the two platforms. We converted probe name to RefSeq,
then to HUGO Gene Symbol, then to HUGO gene name and selected
the intersection between the two platforms. We also used GeneSpring's
Translate Genome function, and obtained a similar overlap. Using
this common genome of probes, we selected the 1000 most significant
genes from a Model I ANOVA (Figure 5). The highest overlap across
the two platforms exists between Affymetrix dChip PM–MM
and Agilent PROCESSED (243 genes out of 1000, Figure 5G) which,
given the precision results, was a little surprising. Overall
the overlap among MAS5, PM–MM and RAW (127 genes, Figure 5K)
is higher than across dChip PM and GC-RMA (39 genes, Figure 5I).
The Agilent normalizations were very similar to each other,
with MEAN having the largest unique set of genes (288, Figure 5B)
among the three normalizations. An interesting finding is the
relatively high overlap between the Affymetrix background subtraction
methods (dChip PM–MM and MAS5) versus the Agilent data
(Figure 5C). In contrast, the more precise measures of dChip
PM and GC-RMA versus the Agilent data (Figure 5I) showed very
little overlap, again suggesting that the most aggressive and
platform-specific normalizations improved precision at the cost
of accuracy. The highest overlap between GO functions was found
between MAS5 or dChip PM–MM and Agilent PROCESSED, again
suggesting that high Type I error may not affect a GO analysis
as dramatically as Type II errors. Using more detailed GO nodes
did not clarify the differences between our normalizations,
nor did it change the rank of best-to-worst. We
feel this functional analysis is suitable as a 10 000-foot view
of biological consistency. However, we wished to examine another
biological analysis, and GenMapp, Biocarta, Kegg and Cytoscape
all yield sufficient discrimination to quantify biological differences
based on gene lists. We performed pathway analysis of 100 significant
genes from each list (Table 4) using http://. Interestingly,
once again we see that MAS5 and to a lesser extent dChip PM–MM
match the Agilent data well, with Affymetrix RAW consistently
identifying pathways outside consensus. By comparing the pathways
from Table 4, we find that the pathways tend to validate the
GO analysis from a different biological and mathematical perspective.
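Testing a gene list for unusual abundance of a GO category is commonly done with a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation under the assumption that this is the kind of test applied; the text does not state the exact statistic used by GeneSpring or OntoExpress, and the function name and the counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits_in_list, list_size, hits_in_genome, genome_size):
    """P(X >= hits_in_list) under the hypergeometric distribution.

    X is the number of category members found when list_size genes are
    drawn without replacement from genome_size genes, of which
    hits_in_genome belong to the category.
    """
    upper = min(list_size, hits_in_genome)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(hits_in_genome, x) * comb(genome_size - hits_in_genome, list_size - x)
        for x in range(hits_in_list, upper + 1)
    )
    return tail / total


# Hypothetical counts: 2 of 2 selected genes fall in a category that
# holds 5 of the 10 genes on a toy array.
p_value = enrichment_pvalue(2, 2, 5, 10)
```

Because the test depends only on counts, two normalizations can reach the same GO conclusion from gene lists that overlap only partially.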

Figure 5. Overlap between Agilent and Affymetrix data. Using a Model I ANOVA we identified 1000 genes that are most differentially expressed across the three tissues tested. This analysis identifies the influence of normalization on the amount of overlap. (A) shows the most unmodified data (MEAN and RAW) versus a strong background subtraction method (MAS5). (B) is a comparison among the Agilent normalization methods. (C) and (D) compare highly processed Affymetrix data with Agilent methods. (E) and (F) compare four Affymetrix normalization methods to RAW data. (G) and (L) show the highest Affymetrix/Agilent overlaps occur between PROCESSED or BSUB and PM-MM normalizations. (H), (I), (J) and (K) illustrate the various overlaps between and among Agilent and Affymetrix normalizations.
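The overlap counts behind a two-set Venn diagram can be sketched as below. The probe identifiers, the mapping table and the function name `gene_overlap` are hypothetical; the sketch only illustrates mapping platform-specific probe names to a shared gene symbol before intersecting the lists.

```python
def gene_overlap(probes_a, probes_b, probe_to_symbol):
    """Count genes unique to each list and shared between them.

    Probes without an entry in probe_to_symbol are dropped, mirroring a
    restriction to genes common to both platforms.
    """
    genes_a = {probe_to_symbol[p] for p in probes_a if p in probe_to_symbol}
    genes_b = {probe_to_symbol[p] for p in probes_b if p in probe_to_symbol}
    return {
        "only_a": len(genes_a - genes_b),
        "shared": len(genes_a & genes_b),
        "only_b": len(genes_b - genes_a),
    }


# Hypothetical probe-to-symbol table covering both platforms.
mapping = {
    "A_1": "ALB", "A_2": "SFTPC", "A_3": "CD19",
    "X_1": "ALB", "X_2": "CD19", "X_3": "GAPDH",
}
counts = gene_overlap(["A_1", "A_2", "A_3"], ["X_1", "X_2", "X_3", "X_9"], mapping)
```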
Table 3. Gene Ontology analysis of genes selected by t-test at p < 5.3 × 10⁻⁵ for Agilent and p < 4.5 × 10⁻⁵ for Affymetrix
Feature selection and classification (Error based)
We demonstrate how feature selection and classification can
be compromised by comparing classifier error rates across platforms
and normalizations (57). We used a two-feature sequential forward
floating search (58,59) with bolstering error estimation to
score the feature sets, and linear discriminant analysis (LDA)
as the classification rule (60). Overall error was estimated
using cross validation with 500 replicates to reduce internal
variability. Initially, we applied the selection routine to
whole data sets containing the full complement of genes, obtaining
in all cases zero misclassification error. In order to introduce
some variability, we iteratively removed 500 of the most significant
(by t-test) probes until fewer than 500 probes remained for both
platforms; removal was done within the cross-validation step
to reduce error. In Figure 6, we show the error rates per normalization
and per case for lung:spleen, liver:spleen and liver:lung, and
in Table 5 we compute the area under each curve as a relative
rank of error. The Y-axis is the classifier error; the X-axis
is the percentage of probes removed per iteration. In all cases
the trends are generally consistent; Agilent data (dashed lines)
are generally below the dChip PM–MM and RAW Affymetrix
normalization methods, and are similar to GC-RMA. It is likely
that a rapid increase in error indicates that the best predictive
genes were removed fairly quickly, implying that good predictive
features are not necessarily those with high statistical significance.
Another characteristic of this group is the instability in error
after approximately 40% of the most significant probes were removed. The error
rate for MAS5 shows a linear increase in error suggesting that
this gene list contains features that contribute evenly to classification,
whereas other groups rise and fall quite suddenly. This variability
in error is likely not due to cross-validation since we performed
500 replicates, sufficient to converge to a stable error estimate.
This instability likely results from the disconnect between
a classifier error and the distributional tests we used in the
removal step. A random removal method with more replication
might have yielded a better estimate of error, but the computation
time would be excessive. The areas under the curve (Table 5,
columns 2, 4, 6) show Agilent MEAN data to be marginally better
than PROCESSED and BSUB, but the confidence intervals overlap,
indicating that these three normalizations are equivalent. MAS5
and RAW tended to show the highest Affymetrix error while GC-RMA
showed the lowest, again reflecting improvements caused by technical
precision, but also the effect of bias, since the RAW data was much more
precise than the MAS5 data. The percent of total genes that
are significant at p < 5.3 × 10⁻⁵ for Agilent and
p < 4.5 × 10⁻⁵ for Affymetrix reflects the pool of genes
tested in the classifier. The Affymetrix RAW data, which is known
to be biased, also contains many significant genes, showing that
our classifier is not compromised by inaccurate and biased signals.
The RAW classification resulted in high error, seen in Figure 6.
GC-RMA had lower misclassification than any group or platform,
but we were less convinced that this was the best normalization
scheme for these tissues since the GO and pathway GC-RMA results
differed from consensus. We wanted to determine the probe position
for the best and worst correlated probes for the best normalizations
for classifier error: GC-RMA and PROCESSED (12,18). We sorted
the probes for the best and worst correlation across Agilent's
CY5 PROCESSED data and Affymetrix's GC-RMA data for liver and
spleen. We determined the probe location by identifying the
probe sequence (or exemplar) on Human Build 36 using BLAT. In
nearly all of the best and worst correlated cases, discrepancy
occurred when the probes were physically separated (Figure 7),
but the degree to which this was the case varied. Within an
Affymetrix probeset, physical distance often resulted in poor
intra-probe correlation as well.

Figure 6. Classifier error rates for tissue comparisons for Agilent and Affymetrix platforms and the associated normalizations. For each iteration, 500 of the most significantly differentially expressed genes were removed until fewer than 500 genes remained. A two-feature forward floating search with bolstering error estimation scored the features; linear discriminant analysis was the classifier rule. Overall error was estimated using cross validation with 500 replicates. (A) shows the lung versus spleen error rates. (B) shows the liver versus spleen and (C) the liver versus lung error. Dashed lines in all cases correspond to the Agilent normalization methods, solid lines correspond to the Affymetrix normalizations. Area under the curve was used to establish the rank order.
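The removal schedule and the area-under-the-curve ranking can be sketched as follows. This covers only the bookkeeping, not the feature search, bolstered error estimation or LDA classifier; the function names are hypothetical and the trapezoid rule is an assumed choice for computing the area.

```python
def removal_schedule(n_probes, step=500):
    """Yield the number of probes left after each removal of `step` probes.

    Removal stops once fewer than `step` probes remain.
    """
    remaining = n_probes
    while remaining >= step:
        remaining -= step
        yield remaining


def area_under_curve(x, y):
    """Trapezoid-rule area under an error curve sampled at points x."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )


agilent_steps = list(removal_schedule(18703))     # probe count from the text
affymetrix_steps = list(removal_schedule(22215))  # probe count from the text

# Hypothetical error curve: fraction of probes removed versus error.
auc = area_under_curve([0.0, 0.5, 1.0], [0.0, 0.5, 0.5])
```

A lower area indicates that classification error stays low for longer as the most significant probes are stripped away.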
Figure 7. Probe distance comparisons. Probe location for the 11 Affymetrix 25-mers and the single Agilent 60-mer are plotted along the target gene on the X-axis. Color in this case indicates the average log2 ratio between liver and spleen for two single normalizations, GC-RMA (Affymetrix) and PROCESSED CY5 (Agilent). Other normalizations and tissues produced similar results. Red indicates high relative signal in liver, green indicates high relative signal in spleen. Length of the probe is proportional to the amount of gene sequence shown in the diagram, which in turn is defined by the distance between the most distant probes. Blue triangles indicate introns; numbers along the bottom of each graph indicate the amount of gene up- and downstream of the current window. Y-axis (temp) is the Tm for each probe calculated in standard salt conditions. Left column contains genes that correlated well across Agilent CY5 PROCESSED and Affymetrix GC-RMA. Right column contains genes with poor correlation. Other normalization/tissue combinations produced lists of different genes that were either well or poorly correlated, but the pattern seen here was conserved.
Table 5. Area under the error curves (Figure 6) and the corresponding proportion of significant genes at p < 5.3 × 10⁻⁵ for Agilent and p < 4.5 × 10⁻⁵ for Affymetrix (called % < Pcrit), N = 3 samples
MATERIALS AND METHODS
Commercial RNA from Stratagene (La Jolla, CA; liver #540017,
lung #540019, spleen #540187) enabled us to minimize variability
in RNA quality. We ran each set of replicates on the same day
and in the same laboratory, and followed the manufacturer's
hybridization and scanning protocols precisely. We used three
pooled human tissues j (liver, lung, spleen) and all three pairwise
cases k using three normalization methods for Agilent and five
for Affymetrix, yielding the twenty-four separate measurements
per gene i (Table 1). Affymetrix data was processed using default
values in Microarray Suite 5.0; Affy data had low noise (RawQ
< 15), low background (<600) and low 3' to 5' ratio of
actin and GAPDH (ratio <2). Agilent arrays were scanned on
an Agilent scanner and processed using default values in Feature
Extraction version 8.1. Raw Affy Images were processed using
default settings, resulting in .CHP and .CEL files. CEL files
were used to generate MAS5, dChip, GC-RMA and RAW data using
AffyComp package in Bioconductor. dChip can use or ignore MM
data when building its model; we selected both PM and PMMM
settings, and invariant set normalization in dChip
2006 (February 16, 2006 build). GC-RMA requires that an entire
data set (experiment) be defined in order to estimate a grand
mean and variance estimate, so we included all tissues as a
defined experiment. All Agilent normalizations were performed
using default settings in Feature Extraction (
61). Expression
data was loaded into GeneSpring 7.2 to perform median normalization
on 22215 probes for Affymetrix and 18703 probes for Agilent.
Clustering, ANOVA,
t-tests and Venn diagrams were all done in
GeneSpring.
t-tests always used a homoscedastic Welch correction
with no familywise error rate (FWER) correction. Significance
level was set at the critical values of
p < 5.3
x 10
5 for Agilent and
p < 4.5
x 10
5 for Affymetrix, which
is that value where one false positive is expected. Ratio calculations,
power analysis, regression and other statistical calculations
were done in
S + 7.0.4. Feature selection and classification
were done on custom C and C++ programs run in parallel using
MPI messaging on an IBM 1350 Linux cluster running RedHat EL3.
Each of the 512 nodes contained 2 Intel Xeon 2.4 GHz CPUs with
4G RAM. Processing time ranged between 1 and 30 h depending
on the number of genes used per analysis and cross-validation
method.
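The two critical values are consistent with dividing one expected false positive by the number of probes tested on each platform. The following is a minimal Python sketch of that arithmetic, assuming each probe is tested once and every null hypothesis is true; the function name `critical_p` is ours and is not part of any package used in this study.

```python
def critical_p(n_probes: int, expected_false_positives: float = 1.0) -> float:
    """Per-test p threshold at which the expected number of false positives
    among n_probes true-null tests equals expected_false_positives."""
    if n_probes <= 0:
        raise ValueError("n_probes must be positive")
    return expected_false_positives / n_probes


# Probe counts as stated in the text above.
p_agilent = critical_p(18703)
p_affymetrix = critical_p(22215)
print(f"Agilent:    p < {p_agilent:.1e}")
print(f"Affymetrix: p < {p_affymetrix:.1e}")
```

Because the threshold scales inversely with the number of probes, a filtered probe list would call for a recomputed threshold rather than reuse of these values.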
 |
DISCUSSION
|
|---|
Intra-lab and intra-platform correlation and calibration can
optimize data quality and reduce lab- and platform-dependent
biases. In industrial Six Sigma Quality Control, the most influential
parameters affecting process quality are identified so that faults
can be reduced in order of importance. In the case of expression data,
poorly correlated data is often caused by poor RNA quality. This
holds even given the differences in probe location (Figure 7)
and platform idiosyncrasies. Array users may be unable to obtain
the advertised performance figures for a commercial microarray
when working with difficult-to-extract tissues, such as plant cells. We
propose that precision, power and pathway analysis can pinpoint
samples that lie outside a consensus, especially in large experiments
or with public data. Clustering has faced a backlash as a
graphical interpretation of data, but taken in context and with
an understanding of its limitations, it presents array data
in a richly informative way. Degraded RNA causes signal compression
and high background, both of which show clearly in clustering analysis.
Power and sample size calculations also pinpoint degraded RNA
or poor labeling by showing greatly reduced sensitivity and
delta values.
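As a rough illustration of how the detectable delta depends on replicate noise, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison with equal group sizes. It is not the S+ calculation used in this study. The function name, the 80% power target and the example standard deviations are assumptions chosen for illustration, and with only three replicates the normal approximation is optimistic compared with a t-based calculation.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_delta(sd: float, n_per_group: int,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest mean difference (same units as sd, e.g. log2 signal)
    detectable at the given alpha and power, using the approximation
    delta = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n)."""
    if sd < 0 or n_per_group < 1:
        raise ValueError("sd must be >= 0 and n_per_group >= 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd * sqrt(2.0 / n_per_group)


# Hypothetical replicate standard deviations (log2 units), N = 3 per tissue.
for label, sd in [("intact RNA", 0.15), ("degraded RNA", 0.45)]:
    delta = min_detectable_delta(sd, n_per_group=3, alpha=4.5e-5)
    print(f"{label}: minimum detectable delta = {delta:.2f} log2 units")
```

In this approximation the detectable delta grows in direct proportion to the replicate standard deviation, which is why noisy or degraded samples stand out in a power analysis.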
Classification has become a much-used method in disease prognosis and diagnosis (62); it is therefore important to understand the causes of misclassification. Microarray normalization methods, especially loess (63) and model-based methods (28), often introduce large non-linear changes in an attempt to improve the reliability of measuring relative differences across samples (64). High-precision methods like GC-RMA can affect the classifier, resulting in very low error, but classifiers are less affected by highly biased data than significance tests are. As seen in Figure 3, highly aggressive normalizations combined with very different tissues can cause mis-clustering. However, genes identified as either up or down between tissues across normalization methods can be quite comparable if one quantizes to the level of up, down and unchanged by using the appropriate confidence interval. Agilent data is almost unaffected by channel and normalization effects, but the Agilent normalizations were much more subtle than the Affymetrix methods. Normalized expression data often exaggerates the magnitude of ratios and inflates false positives over comparable qRT-PCR data (54,65). That effect alone will change the rank of genes, and will change the biological pathways identified (Table 4). It is increasingly difficult to identify biomarkers that work independently of the platform (44,55,66,67), but appropriate normalization choice may ameliorate this effect somewhat. Affymetrix MAS5 and Agilent MEAN share 256 genes, MAS5 and BSUB share 261 genes and PMMM and PROCESSED share 243 genes, the highest overlap between platforms. These low-precision but high-accuracy methods, while often underpowered, can also provide genes that are more platform-neutral. Although the background subtraction methods generally produce the most false positives, their conservative nature tends to avoid strong and potentially inaccurate biases (Tables 3 and 4).
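The quantization to up, down and unchanged can be sketched as follows. This is a minimal illustration, not the procedure used in this study: a gene is called changed only when a confidence interval around its mean log2 ratio excludes zero. The function name, the default multiplier (the two-sided 95% Student t value for three replicates) and the toy ratios are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def call_direction(log2_ratios, crit=4.303):
    """Quantize replicate log2 ratios to 'up', 'down' or 'unchanged'.

    A gene is called changed only when the interval
    mean +/- crit * sd / sqrt(n) excludes zero. The default crit is the
    two-sided 95% Student t value for n = 3 (2 degrees of freedom);
    pass a different value for other n or confidence levels.
    """
    n = len(log2_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    m = mean(log2_ratios)
    half_width = crit * stdev(log2_ratios) / sqrt(n)
    if m - half_width > 0:
        return "up"
    if m + half_width < 0:
        return "down"
    return "unchanged"


# Toy liver:spleen log2 ratios, invented for illustration only.
toy = {
    "geneA": [2.1, 2.3, 1.9],
    "geneB": [-1.6, -1.4, -1.5],
    "geneC": [0.4, -0.3, 0.1],
}
calls = {gene: call_direction(values) for gene, values in toy.items()}
print(calls)
```

Reducing each gene to one of three states discards magnitude, which is the point: normalizations that disagree on the size of a ratio can still agree on its direction.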
Based on these outcomes, we recommend MAS5 or dChip PMMM and Agilent PROCESSED normalizations for feature selection and classification, and for biological pathway analysis, especially when identifying platform-neutral biosignatures. If comparisons across laboratories or expression platforms are to be made, the most conservative estimate of Affymetrix data is best. We caution the user that the power of detection drops considerably with MAS5 and dChip PMMM, and more technical replicates should be used to obtain the same detection limit as GC-RMA or dChip. Most public expression databases provide the MAS5-normalized data (e.g. the SOFT file format from GEO), but increasingly the .CEL files are being made available. We recommend GC-RMA normalization when large data sets are used, high sensitivity is needed, and samples are not very different from one another. GC-RMA provides a signal that has been shown to have good sensitivity and accuracy in the context of distinguishing disease subtypes or other subtle phenotypes. When a moderate-to-small number of samples is used, dChip PM is an excellent choice since it strikes the best compromise between variance reduction methods and background subtraction methods. If single-color analysis is needed, extracting one of the two Agilent channels works well, but Agilent recognized the need for a single-color product and now offers one-channel protocols.
In Figure 7, we show the relationship between probe distance and the correlation of liver:spleen ratios between Agilent CY5 (PROCESSED) and Affymetrix (GC-RMA). In general, the best correlation occurred when the probes were relatively close to one another, and the worst correlations occurred when the probes were distant, an effect previously reported (12,18,20). This effect also occurs within a probeset on the Affymetrix platform, but there it is not as pronounced. The effect is easy to measure since the probe sequences for these arrays are available from the manufacturer. When contrasting qRT-PCR and array data, one should carefully design RT primers that are uniformly spaced across the gene, rather than a single probe placed at the same location as the microarray probe. This principle reveals array limitations, but also gives the best RT results.
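One simple way to put a number on probe separation is sketched below. It assumes that probe start coordinates along the target transcript are already known, for example from aligning the manufacturer's probe sequences, and it takes the probe lengths (25-mers and a 60-mer) from the Figure 7 caption. The function name, the choice of midpoint-to-midpoint distance and the coordinates are hypothetical.

```python
from statistics import mean


def probe_distance(affy_starts, agilent_start, affy_len=25, agilent_len=60):
    """Distance in bases between the midpoint of the Agilent probe and the
    mean midpoint of the Affymetrix probes on the same transcript."""
    if not affy_starts:
        raise ValueError("need at least one Affymetrix probe start")
    affy_mid = mean(start + affy_len / 2 for start in affy_starts)
    agilent_mid = agilent_start + agilent_len / 2
    return abs(agilent_mid - affy_mid)


# Hypothetical transcript coordinates, invented for illustration only.
near = probe_distance([100, 140, 180], agilent_start=150)
far = probe_distance([100, 140, 180], agilent_start=500)
print(f"near pair: {near} bases, far pair: {far} bases")
```

Genes could then be grouped by this distance and the cross-platform correlation of their ratios compared between groups.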
In summary, we provide three simple, qualitative methods of analysis to identify discrepancies in expression data sets. Precision and sensitivity measurements are useful in finding the minimal detectable fold-change and raw performance values for an array platform (or qRT-PCR). Biological comparisons such as the Gene Ontology and pathway analyses are a valuable way of examining and comparing the actual biological interpretation. Differences in pathways indicate consistency problems. This inconsistency can be quantified by counting the differentially expressed genes between platforms that move in different directions. Finally, classifier error provides a way of identifying misleading transcriptional signals. When sufficiently large numbers of informative genes exist, one can identify a platform-neutral set of genes that provide both low error across multiple platforms and low classifier error by utilizing the selection criteria mentioned above. Taken together, precision, biological interpretation and multiple platform data sets will allow better selection of genes that yield clinically useful biosignatures.
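Counting genes that move in different directions on two platforms can be sketched in a few lines. The example assumes each platform's results have already been reduced to per-gene calls of 'up', 'down' or 'unchanged'; the function name and the toy calls are hypothetical and serve only to show the bookkeeping.

```python
def compare_calls(calls_a, calls_b):
    """Compare per-gene direction calls ('up', 'down', 'unchanged') from two
    platforms. Only genes present on both platforms are considered, and only
    genes called changed on both count as concordant or discordant."""
    shared = calls_a.keys() & calls_b.keys()
    concordant = 0
    discordant = 0
    for gene in shared:
        a, b = calls_a[gene], calls_b[gene]
        if "unchanged" in (a, b):
            continue  # not differentially expressed on both platforms
        if a == b:
            concordant += 1
        else:
            discordant += 1
    return {"shared": len(shared),
            "concordant": concordant,
            "discordant": discordant}


# Toy calls, invented for illustration only.
affymetrix = {"g1": "up", "g2": "down", "g3": "up", "g4": "unchanged"}
agilent = {"g1": "up", "g2": "up", "g3": "unchanged", "g5": "down"}
summary = compare_calls(affymetrix, agilent)
print(summary)
```

The discordant count gives a single figure for how often two platforms, or two normalizations, contradict each other on direction.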
 |
ACKNOWLEDGEMENTS
|
|---|
This project has been funded in whole or in part with Federal
funds from the National Institute of Allergy and Infectious
Diseases, National Institutes of Health, Department of Health
and Human Services, under Contract No. HHSN266200500040C.
Conflict of interest statement. None declared.
 |
REFERENCES
|
|---|
- Cobleigh MA, Tabesh B, Bitterman P, Baker J, Cronin M, Liu ML, Borchik R, Mosquera JM, Walker MG, et al. Tumor gene expression and prognosis in breast cancer patients with 10 or more positive lymph nodes. Clin. Cancer Res (2005) 11:8623-8631.
- Weigelt B, Hu Z, He X, Livasy C, Carey LA, Ewend MG, Glas AM, Perou CM, van 't Veer LJ. Molecular portraits and 70-gene prognosis signature are preserved throughout the metastatic process of breast cancer. Cancer Res (2005) 65:9155-9158.
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature (2002) 415:530-536.
- van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med (2002) 347:1999-2009.
- Kuo WP, Jennssen T-K, Butte AJ, Ohno-Machado L, Kohane IS. Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics (2002) 18:405-412.
- Shippy R, Sendera TJ, Lockner R, Palaniappan C, Kaysser-Kranich T, Watts G, Alsobrook J. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics (2004) 5. doi:10.1186/1471-2164-11851161.
- Yauk CL, Berndt ML, Williams A, Douglas GR. Comprehensive comparison of six microarray technologies. Nucleic Acids Res (2004) 32. doi:10.1093.
- Park PJ, Cao YA, Lee SY, Kim J-W, Chang MS, Hart R, Choi S. Current issues for DNA microarrays: platform comparison, double linear amplification, and universal RNA reference. J. Biotechnol (2004) 112:225-245.
- Parmigiani G, Garrett E, Anbazhagan R, Gabrielson E. Molecular classification of lung cancer: a cross-platform comparison of gene expression data sets. Chest (2004) 125:103S.
- Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E. A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clin. Cancer Res (2004) 10:2922-2927.
- Pan F, Ma X, Zhou XJ. Integrative differential gene expression analysis for cross-platform microarray datasets. J. Biomed. Inform (2006) 22:1665-1667.
- Kuo WP, Liu F, Trimarchi J, Punzo C, Lombardi M, Sarang J, Whipple ME, Maysuria M, Serikawa K, et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol (2006) 24:832-840.
- Jong K, Marchiori E, van der Vaart A, Chin SF, Carvalho B, Tijssen M, Eijk PP, van den Ijssel P, Grabsch H, et al. Cross-platform array comparative genomic hybridization meta-analysis separates hematopoietic and mesenchymal from epithelial tumors. Oncogene (2006) doi: 10.1038/sj.onc.1209919.
- Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, et al. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol (2006) 24:1162-1169.
- Warnat P, Eils R, Brors B. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinformatics (2005) 6:265.
- Shi L, Tong W, Fang H, Scherf U, Han J, Puri RK, Frueh FW, Goodsaid FM, Guo L, et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics (2005) 6(Suppl 2):S12.
- Schlingemann J, Habtemichael N, Ittrich C, Toedt G, Kramer H, Hambek M, Knecht R, Lichter P, Stauber R, et al. Patient-based cross-platform comparison of oligonucleotide microarray expression profiles. Lab. Invest (2005) 85:1024-1039.
- Mecham BH, Klus GT, Strovel J, Augustus M, Byrne D, Bozso P, Wetmore DZ, Mariani TJ, Kohane IS, et al. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucleic Acids Res (2004) 32:e74.
- Hardiman G. Microarray platforms - comparisons and contrasts. Pharmacogenomics (2004) 5:487-502.
- Carter SL, Eklund AC, Mecham BH, Kohane IS, Szallasi Z. Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements. BMC Bioinformatics (2005) 6:107.
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol (2006) 24:1115-1122.
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J. Independence and reproducibility across microarray platforms. Nat. Methods (2005) 2:337-343.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol (2006) 24:1151-1161.
- Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu TM, Guo X, et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol (2006) 24:1132-1139.
- Model F, Konig T, Piepenbrock C, Adorjan P. Statistical process control for large scale microarray experiments. Bioinformatics (2002) 18:155S-163S.
- Adolphson A, Keighley T, Field J, Fisher N, Israel S, Phatak A, Saunders I, Sparks R, Veevers A. Commonwealth Scientific and Industrial Research Organization. (2002) Australia: Mathematical and Information Sciences.
- Crosier RB. A new two-sided cumulative sum quality control scheme. Technometrics (1986) 28:187-194.
- Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Nat. Acad. Sci. U.S.A (2001) 98:31-36.
- Wu Z, Irizarry RA, Gentleman R, Murillo FM, Spencer F. A model based background adjustment for oligonucleotide expression arrays. In: Working Paper 1 (2004) Johns Hopkins University, Dept. of Biostatistics Working Papers.
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods (2005) 2:329-330.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (2003) 4:249-264.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res (2003) 31:e15.
- Irizarry RA, Wu Z. Affycomp ver. 1.4.3. Bioconductor package for R. (2006).
- Xiao J, Lucas AB, D'Andrade PN, Visitacion M, Tangvoranuntakul P, Fulmer-Smentek SB. Performance of the Agilent microarray platform for one-color analysis of gene expression. Agilent Technical Note (2006) www.chem.agilent.com/temp/Rad607C5/00057115.pdf.
- Hornberger J, Cosler LE, Lyman GH. An economic analysis of targeting chemotherapy using a 21-gene RT-PCR assay in lymph-node-negative, estrogen-receptor-positive, early-stage breast cancer. Am. J. Manag. Care (2005) 11:313-324.
- Ahr A, Holtrich U, Solback C. Molecular classification of breast cancer patients by gene expression profiling. J. Pathol (2001) 195:312-320.
- Ahr A, Kam T, Solback C. Identification of high risk breast-cancer patients by gene expression profiling. Lancet (2002) 359:131-132.
- Detours V, Durmont JE, Bersini H, Maenhaur C. Integration and cross-validation of high-throughput gene expression data: comparing heterogenous data sets. FEBS Lett (2003) 546:98-102.
- Griffith OL, Pleasance ED, Fulton DL, Oveisi H, Ester M, Siddiqui AS, Jones SJM. Assessment and integration of publicly available SAGE, cDNA microarray, and oligonucleotide microarray expression data for global coexpression analyses. Genomics (2005) 86:476-488.
- Ghosh D, Barette TR, Rhodes DR, Chinnaiyan AM. Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer. Funct. Integr. Genomics (2004) 3:180-188.
- Shyamsundar R, Kim YH, Higgins JP, Montgomery K, Jorden M, Sethuraman A, van de Rijn M, Botstein D, Brown PO, et al. A DNA microarray survey of gene expression in normal human tissues. Genome Biol (2005) 6:r22.21-r22.29.
- Lee Y, Lee CK. Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics (2003) 19:1132-1139.
- Culhane AC, Perriere G, Considine EC, Cotter TG, Higgins DG. Between-group analysis of microarray data. Bioinformatics (2002) 18:1600-1608.
- Cho J-H, Lee D, Park JH, Kim K, Lee I-B. Optimal approach for classification of acute leukemia subtypes based on gene expression data. Biotechnol. Prog (2002) 18:847-854.
- Hvidsten TR, Laegreid A, Komorowski J. Learning rule-based models of biological process from gene expression time profiles using gene ontology. Bioinformatics (2003) 19:1116-1123.
- He YD, Dai H, Schadt EE, Cavet G, Edwards SW, Stepaniants SB, Duenwald S, Kleinhanz R, Jones AR, et al. Microarray standard data set and figures of merit for comparing data processing methods and experiment designs. Bioinformatics (2003) 19:956-965.
- Rise ML, Douglas SE, Sakhrani D, Williams J, Ewart KV, Rise M, Davidson WS, Koop BF, Devlin RH. Multiple microarray platforms utilized for hepatic gene expression profiling of GH transgenic coho salmon with and without ration restriction. J. Mol. Endocrinol (2006) 37:259-282.
- Edelman E, Porrello A, Guinney J, Balakumaran B, Bild A, Febbo PG, Mukherjee S. Analysis of sample set enrichment scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. Bioinformatics (2006) 22:e108-e116.
- Chudin E, Kruglyak S, Baker SC, Oeser S, Barker D, McDaniel TK. A model of technical variation of microarray signals. J. Comput. Biol (2006) 13:996-1003.
- Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL. Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics (2005) 21:3905-3911.
- Culhane AC, Perriere G, Higgins DG. Cross-platform comparison and visualisation of gene expression data using co-inertia analysis. BMC Bioinformatics (2003) 4:59.
- Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (2003) 19:185-193.
- Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Nat. Acad. Sci. U.S.A (2004) 101:9309-9314.
- Li J, Spletter ML, Johnson JA. Dissecting tBHQ induced ARE-driven gene expression through long and short oligonucleotide arrays. J. Physiol. Genom (2005) 21:43-58.
- Bloom G, Yang IV, Boulware D, Kwong KY, Coppola D, Eschrich S, Quackenbush J, Yeatman TJ. Multi-platform, multi-site, microarray-based human tumor classification. Am. J. Pathol (2004) 164.
- Draghici S. (2006) vortex.cs.wayne.edu/Projects.htm#Onto-Express.
- Parrish RS, Spencer HJ. Effect of normalization on significance testing for oligonucleotide microarrays. J. Biopharm. Stat (2004) 14:575-589.
- Sima C, Dougherty ER. What should be expected from feature selection in small-sample settings. Bioinformatics (2006) 22:2430-2436.
- Xiong M, Fang X, Zhao J. Biomarker identification by feature wrappers. Genome Res (2001) 11:1878-1887.
- Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER. Optimal number of features as a function of sample size for various classification rules. Bioinformatics (2005) 21:1509-1515.
- Dobbin K, Kawasaki E, Petersen D, Simon R. Characterizing dye bias in microarray experiments. Bioinformatics (2005) 21:2430-2437.
- Bittner ML, Meltzer MR, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature (2000) 406:536-540.
- Wolber PK, Shannon KW, Fulmer-Smentek SB, Collins PJ, Lenkov K, Troup CD, Connell SD, Ghosh S, D'Andrade PN, et al. Robust local normalization of gene expression microarray data. Agilent Technical Note (2002) 14.
- Fan W, Pritchard JI, Olsen JM, Khalid N, Zhao LP. A class of models for analyzing GeneChip gene expression analysis array data. BMC Genomics (2005) 6. doi:10.1186/1471-2164-11861116.
- de Reynies A, Geromin D, Cayuela J-M, Petel F, Dessen P, Sigaux F, Rickman DS. Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics (2006) 7. doi:10.1186/1471-2164-11871151.
- Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classification with gene expression profiles. J. Comput. Biol (2000) 7:559-583.
- Dobbin K, Simon R. Comparison of microarray designs for class comparison and class discovery. Bioinformatics (2002) 18:1438-1445.
