Source Identification of Volatile Organic Compounds in Houston, Texas
WEIXIANG ZHAO,† PHILIP K. HOPKE,*,† AND THOMAS KARL‡
Department of Chemical Engineering, Clarkson University,
Box 5708, Potsdam, New York 13699-5708, and The National
Center for Atmospheric Research, Atmospheric Chemistry
Division, P.O. Box 3000, Boulder, Colorado 80307
The complexity of the volatile organic compound (VOC)
mixture in the Houston area makes studies of the air quality
in that area very challenging. In this paper, a novel
factor analysis model, where the normal chemical mass
balance model was augmented by a parallel equation that
accounted for wind speed and direction, temperature,
and weekend/weekday effects, was fitted with a multilinear
engine (ME) to provide identification and apportionment
of the VOC sources at the La Porte Municipal Airport site
in Houston during the Texas Air Quality Study (TexAQS)
2000. The analysis determined the profiles and contributions
of nine sources and the corresponding wind speed,
wind direction, temperature, and weekend factors. The
reasonableness of these results not only suggests the high
resolving power of the expanded factor analysis model
for source apportionment but also provides novel and
effective auxiliary information for more specific source
identification. In addition, a new approach to estimate the
measurement uncertainty and the details of determining
the source number and dealing with missing values are also
presented as important parts of the data analysis process.
This study demonstrates the feasibility of the expanded
model to identify sources in complex VOC systems and
extract useful information for locating VOC emitters
and controlling their emissions in the Houston area.
Introduction
Volatile organic compounds (VOCs) are organic chemicals that easily vaporize at room temperature. Many VOCs have been found to have adverse effects on air quality and human health (1). For example, long-term exposure to benzene increases the risk of leukemia, and reactive VOCs such as primary olefins are important in the formation of tropospheric ozone. Motor vehicle exhaust, chemical manufacturing, paints, solvents, biogenic emissions, and many other sources create exposure to VOCs. Because identification of the potential sources of VOCs is a prerequisite for controlling VOC emissions and protecting air quality and public health, it has received increasing attention (2).
To identify the number of sources and their profiles, receptor models are widely used (3, 4). There are two principal approaches to receptor modeling. If the number and the profiles of sources are known, chemical mass balance (CMB) can be used to estimate the contribution of each source to the pollution (5), with regression methods providing quantitative results. However, in many cases, source information is unknown a priori, so factor analysis (multivariate analysis) needs to be used to extract the source information.
Hopke and co-workers (6), Heidam (7), Henry (8), and Barrie and Barrie (9) applied principal component analysis (PCA) to source identification, but Paatero and Tapper (10, 11) showed that PCA cannot provide a true minimal variance solution since it is based on an incorrect weighting. In view of the limitations of PCA, a new technique, positive matrix factorization (PMF), was developed for source identification and apportionment (12). The distinct advantages of PMF over PCA are that non-negativity constraints are built into PMF models and that PMF does not rely on the information from the correlation matrix but utilizes a point-by-point least-squares minimization scheme (12). It has been reported (13) that the source profiles produced by PMF describe the source structure better and more reasonably than those produced by PCA. Over the past few years, PMF has been applied to a number of particle composition data sets (e.g., 14, 15).
Recently, it was shown that PMF analysis can be expanded by using a more general model (16), and a new analysis tool called the multilinear engine (ME) was developed (17) to solve such problems. ME is very flexible and provides a general framework for fitting any multilinear model (18, 19), so it becomes possible to obtain not only the source profiles but also other interesting parametric factors that may be important for source identification and for pollution control and planning. For example, wind directional information can help locate the potential sources. It was reported (16, 19) that in some cases the expanded factor models could determine more sources than PMF.
The coexisting system of VOCs is complex. A small change in environmental conditions (e.g., temperature) may result in changes in VOC concentrations, and some VOCs may also be involved in chemical reactions during transport. Meanwhile, as a consequence of the high density of petroleum refineries, synthetic organic chemical plants, and various mobile sources, the formation rate and concentration of ozone in the Houston area are extremely high, and propene becomes a dominant reactive hydrocarbon (20). The specific VOC mixture in the Houston area represents a specific air quality problem compared with other metropolitan areas (21), which makes studies of the air quality in Houston very challenging. Henry et al. (22, 23) studied VOC source identification, including one study of the Houston area with data for the period June through November 1993, but these studies did not provide any information about the influences of environmental parametric factors (e.g., wind speed, wind direction, and temperature) on the observed pollutant concentrations, and only three sources in the Houston area were identified. Therefore, in the present study, an expanded factor analysis model is used to identify the VOC sources in the Houston area with the goals of (1) checking the feasibility of applying the expanded model to VOC sources, (2) observing the influences of environmental parametric factors on the observed concentrations, and (3) supplying convincing information that will be useful for air pollution control in the Houston area. ME is used as the optimization tool for data fitting in this study since it has proved to be effective in model fitting (16, 18, 19).
* Corresponding author phone: (315)2683861; fax: (315)2684410; e-mail: hopkepk@clarkson.edu.
† Clarkson University.
‡ The National Center for Atmospheric Research.
Environ. Sci. Technol. 2004, 38, 1338-1347
10.1021/es034999c CCC: $27.50 © 2004 American Chemical Society
Published on Web 01/28/2004
Expanded Factor Analysis Model
In general, the ordinary bilinear receptor model can be written as eq 1, where $X$ is the matrix of VOC concentrations, $F$ is the matrix of source profiles, $G$ is the matrix of source contributions, and $E$ is the residual matrix. Their elements $x_{ij}$, $f_{jp}$, and $g_{ip}$ can be understood, respectively, as the concentration of compound j measured in sample i, the concentration of compound j in the emission of source p, and the strength of source p in sample i (16, 19).
In this section, wind direction and wind speed will be used to illustrate the construction of the expanded factor analysis model (16, 19). In the bilinear model, the contribution of source p to the concentration of compound j in sample i, $u_{ijp}$, is represented by $u_{ijp} = g_{ip} f_{jp}$. In the expanded model, a parallel equation is developed in which the contribution $u_{ijp}$ is represented in another form (eq 2), where $d_i$ and $s_i$ are the values of wind direction and wind speed of sample i. The ranges of wind direction and wind speed are divided into a series of subranges having similar numbers of samples in each, so each wind direction or speed value belongs to a specific subrange. Thus, $D(d_i,p)$, an element of $D$, represents the action of source p on pollution in the wind direction range of $d_i$. For example, if source 3 strongly affects the observed concentration at the wind direction of 80°, $D(4,3)$ (the first index, 4, corresponds to 80° if the wind direction range 0-360° is evenly divided into 18 subranges) should be relatively large. $S(s_i,p)$ has a similar definition for wind speed. Thus, $z_{ip}$ can be considered as a multiplier that represents the combined action of wind direction and speed on the observed pollution. In different physical models, $z_{ip}$ can correspond to different expressions. In this study, $z_{ip}$ corresponds to the factors for wind speed, wind direction, temperature, and weekday/weekend.
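The lookup structure behind eq 2 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `sector_index` and `wind_multiplier` are hypothetical, the sectors are assumed to be of equal width with 0-based indices (the study itself uses subranges with similar sample counts), and the tables D and S are filled with random numbers rather than fitted values.

```python
import numpy as np

def sector_index(direction_deg, n_sectors=18):
    """Map wind directions in degrees to 0-based equal-width sector indices."""
    width = 360.0 / n_sectors
    return (np.asarray(direction_deg) % 360.0 // width).astype(int)

def wind_multiplier(D, S, dir_idx, speed_idx):
    """Return z with z[i, p] = D[dir_idx[i], p] * S[speed_idx[i], p] (eq 2)."""
    return D[dir_idx] * S[speed_idx]

# Toy tables (assumed values): 18 direction sectors, 5 speed classes, 3 sources.
rng = np.random.default_rng(0)
D = rng.uniform(0.1, 1.0, size=(18, 3))
S = rng.uniform(0.1, 1.0, size=(5, 3))

directions = np.array([10.0, 85.0, 359.0])   # wind direction of each sample
speed_idx = np.array([0, 2, 4])              # speed class of each sample
dir_idx = sector_index(directions)
z = wind_multiplier(D, S, dir_idx, speed_idx)  # shape (samples, sources)
```

Each row of `z` holds the multipliers of one sample for all sources; multiplying `z[i, p]` by the profile entry for compound j gives the contribution $u_{ijp}$ of eq 2.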
The expanded receptor model can then be expressed by eqs 3a and 3b, where $W(w_i,p)$ denotes the action of weekdays or weekends by source p on the observed concentration, $I$ is the number of samples, and $J$ is the number of measured chemical species. By fixing the weekday coefficient at unity, $W(w_i,p)$ is a vector with $n_p$ (the number of sources) elements. $T(t_i,p)$ represents the action of temperature $t_i$ for source p. The task of solving this expanded PMF model is to determine the values of $F$, $G$, $D$, $S$, $W$, and $T$ that fit the data as well as possible. The optimization problem can be defined as eq 4, in which $e_{ij}$ and $e'_{ij}$ are determined by eqs 3a and 3b and $\sigma_{ij}$ and $\sigma'_{ij}$ are the error estimates, which can be considered as special weights. Clearly, $z_{ip}$ is the combination of all influencing factors such as wind speed, temperature, and weekday/weekend factors, so with ME an expanded factor analysis model can provide not only the source profiles and contributions but also the strength of other factors affecting the observed concentrations. A prerequisite to applying this model is that the action (contribution) of the considered parametric factors can be expressed in linear terms. As an optimization method for factor analysis, ME has two problems to be solved, that is, how to determine the number of factors and how to avoid local optima. The methods for solving these two problems for this case are described in detail in the Results and Discussion section.
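To make the objective of eq 4 concrete, the following Python sketch evaluates Q for given factor matrices. It only computes the objective; it does not reproduce the ME solver. The function name `q_expanded`, the array shapes, and the synthetic inputs (including the value c1 = 0.1) are assumptions made for illustration.

```python
import numpy as np

def q_expanded(X, G, F, Z, sigma, sigma_prime):
    """Objective of eq 4: sum of squared scaled residuals of eqs 3a and 3b.

    X: (I, J) concentrations; G and Z: (I, N); F: (J, N) source profiles.
    """
    E = X - G @ F.T          # residuals e_ij of eq 3a
    E_prime = X - Z @ F.T    # residuals e'_ij of eq 3b
    return np.sum((E / sigma) ** 2) + np.sum((E_prime / sigma_prime) ** 2)

# Synthetic example (assumed values): 6 samples, 4 species, 2 sources.
rng = np.random.default_rng(1)
G = rng.uniform(0.0, 1.0, size=(6, 2))
Z = rng.uniform(0.0, 1.0, size=(6, 2))
F = rng.uniform(0.0, 1.0, size=(4, 2))
X = G @ F.T                    # data that eq 3a fits exactly
sigma = 0.1 + 0.2 * X          # form of eq 5 with a made-up c1 = 0.1
q = q_expanded(X, G, F, Z, sigma, 8.0 * sigma)
```

Because the toy data satisfy eq 3a exactly, only the second term contributes to `q` here; a solver would adjust all factors to reduce both terms jointly.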
Because eq 3b will generate a poorer fit to the data than eq 3a, the error estimate for eq 3b, $\sigma'_{ij}$, must be (much) larger than that for eq 3a, $\sigma_{ij}$ (16, 19). In this study $\sigma'_{ij}$ is 8 times $\sigma_{ij}$, and $\sigma_{ij}$ is given by eq 5, where $c_1$ denotes the uncertainty of measurement and $c_3$ is a constant, here set to 0.2. Because of the complex VOC mixture in the ship channel area and the potential interference at low concentration, the experimental uncertainties of the measurement technique used here were hard to assess. An approach using the fast Fourier transform (FFT) was applied to solve this problem. The procedure can be briefly described as follows. Xylene is used as an example from the species studied in this paper. Let c be the concentration series of xylene, which has 7292 measurements. The key steps to estimate the measurement uncertainty from the measurement series are as follows:
(a) Generate a random series r with the same length as c (7292 elements) and variance $\nu_r^2 = 1$.
(b) Perform FFT on c and r, calculate their magnitude spectra, and call them mfc and mfr.
(c) Plot mfc and mfr, respectively. It can be seen from Figure 1 that mfc consists of two parts; the part with low frequencies represents the useful information while the part with high frequencies represents the noise, and mfr does not show two different parts because it is generated by a random series.
(d) Select an interval of noise in mfc. Although it is difficult to determine the exact starting and ending points of the noise interval, the selected noise range should be sufficiently long to reflect the noise information. In this example, the selected interval was from mfc(1000) to mfc(6000). Then, calculate the mean value of the selected interval and name it m_mfc.
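$$X = GF^{T} + E \tag{1}$$

$$u_{ijp} = z_{ip} f_{jp} = D(d_i,p)\,S(s_i,p)\,f_{jp} \tag{2}$$

$$x_{ij} = \sum_{p=1}^{N} g_{ip} f_{jp} + e_{ij} \tag{3a}$$

$$x_{ij} = \sum_{p=1}^{N} z_{ip} f_{jp} + e'_{ij} = \sum_{p=1}^{N} D(d_i,p)\,S(s_i,p)\,W(w_i,p)\,T(t_i,p)\,f_{jp} + e'_{ij} \tag{3b}$$

$$\min Q = \sum_{i=1}^{I} \sum_{j=1}^{J} \left(e_{ij}/\sigma_{ij}\right)^{2} + \sum_{i=1}^{I} \sum_{j=1}^{J} \left(e'_{ij}/\sigma'_{ij}\right)^{2} \tag{4}$$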
FIGURE 1. Illustration of FFT-based uncertainty estimation. (a) The
concentration series of xylene, c; (b) the magnitude spectrum of
concentration series, mfc; (c) the random data series, r; (d) the
magnitude spectrum of random data series, mfr.
$$\sigma_{ij} = c_1 + c_3 x_{ij} \tag{5}$$
(e) Select the same range in mfr, calculate the mean value of this range, and name it m_mfr. It is also feasible to use the whole range of mfr since there is only noise in mfr.
(f) Calculate $\nu_c$ according to eq 6 and consider it as the estimate of the uncertainty of the xylene concentration series.
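Steps (a)-(f) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `fft_uncertainty` is hypothetical, the random series is assumed to be Gaussian (the text only requires unit variance), and the test series is synthetic rather than the measured xylene data.

```python
import numpy as np

def fft_uncertainty(c, lo, hi, seed=0):
    """Estimate the noise level of series c following steps (a)-(f).

    lo, hi: index range of the magnitude spectrum treated as noise.
    """
    c = np.asarray(c, dtype=float)
    rng = np.random.default_rng(seed)
    r = rng.standard_normal(c.size)     # (a) random series with unit variance
    mfc = np.abs(np.fft.fft(c))         # (b) magnitude spectrum of c
    mfr = np.abs(np.fft.fft(r))         # (b) magnitude spectrum of r
    m_mfc = mfc[lo:hi].mean()           # (d) mean magnitude of the noise range
    m_mfr = mfr[lo:hi].mean()           # (e) same range in the random spectrum
    nu_r = 1.0                          # since nu_r^2 = 1
    return nu_r * m_mfc / m_mfr         # (f) eq 6 solved for nu_c

# Synthetic check (assumed data): slow variation plus noise of std 0.1.
n = 7292
t = np.arange(n)
rng = np.random.default_rng(42)
signal = 1.0 + np.sin(2.0 * np.pi * 3 * t / n)
c = signal + 0.1 * rng.standard_normal(n)
nu_c = fft_uncertainty(c, 1000, 6000)   # should come out close to 0.1
```

In this synthetic case the slow variation occupies only the lowest frequency bins, so the selected range contains noise alone and the estimate recovers the injected noise level approximately. With real data the choice of `lo` and `hi` would follow the visual inspection described in step (d).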
Although this estimate cannot be guaranteed to be fully accurate because, strictly speaking, each measurement should have its own uncertainty, it is practical since it produces satisfactory analysis results. The estimated uncertainties for the compounds in this study, acrylonitrile, isoprene, benzene, toluene, styrene, c8-benzenes (with the dominant component xylene), c7-ketone, c9-benzenes, c10-benzenes, c13-benzenes, M43, M61, and M87, are 0.074, 0.102, 0.114, 0.147, 0.033, 0.114, 0.070, 0.082, 0.047, 0.016, 0.835, 0.360, and 0.089, respectively. Here M43, M61, and M87 denote the classes of compounds with mass/charge values of 43, 61, and 87, respectively. In this study, their dominant components are propene, acetic acid, and vinyl acetate, respectively.
Data Preprocessing
As part of TexAQS 2000, a proton transfer reaction mass spectrometer (PTR-MS) from the University of Innsbruck was placed in an air-conditioned trailer situated next to a 10-m sampling tower at the southwest side of the municipal airport at La Porte, TX, to identify and quantify the VOC mixture in that area. A map showing the sampling site is presented in Figure 2. The PTR-MS technique has been previously described in detail (24), so only a brief description is given here. The principle of the PTR-MS is the reaction of organic species in ambient air with $\mathrm{H_3O^+}$ ions, generated from the hollow cathode discharge of water vapor, to produce the protonated organic species ($\mathrm{RH^+}$). The concentration of the product ions can be calculated from a reaction dynamic equation (24). Only organic species with a proton affinity greater than that of water can be detected by the mass spectrometer. More details about the sampling procedure can be found in ref 20.
The sampling period for the data in this study was from 08/20/00 to 09/08/00. Most sampling frequencies were about 1/4-6 min$^{-1}$ (one sample every 4-6 min), but the frequencies for some periods were 1 min$^{-1}$. All the samples were used for analysis to ensure a sufficient number of samples. The concentrations of 14 VOCs (methanol, acrylonitrile, isoprene, benzene, toluene, styrene, c8-benzenes (xylenes), c7-ketone, c9-benzenes, c10-benzenes, c13-benzenes, M43 (propene), M61 (acetic acid), and M87 (vinyl acetate)) were selected for this study, with the detection limits being 100 pptv, 60 pptv, 20 pptv, 70 pptv, 70 pptv, 30 pptv, 70 pptv, 60 pptv, 30 pptv, 30 pptv, 30 pptv, 1 ppbv, 1 ppbv, and 200 pptv, respectively. Some of the reasons for this selection are that benzene, toluene, and xylene (BTX) compounds are usually considered of high importance for urban VOC reactivity and air quality; that propene is one of the dominant reactive hydrocarbons in Houston, which makes Houston a special case when compared to other U.S. cities; and that the toxicity of acrylonitrile directly affects the air quality in the vicinity of an emitter. The meteorological data such as temperatures were from the NOAA Aeronomy Lab, and winds were measured next to the VOC inlet at 10 m above ground. Because there were some missing and below-detection-limit values in the concentration measurements and meteorological data, 7292 samples were retained for analysis following the pretreatment below. Consecutive missing values (for example, c7-ketone has 2240 consecutive missing values) were deleted, and the values below detection limits were replaced with half of the detection limits (25).
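The pretreatment described above can be sketched in Python as follows. The function name `preprocess` is hypothetical, and dropping every sample that contains any missing value is a simplifying assumption; the study itself describes deleting runs of consecutive missing values. The two detection limits used are those quoted for benzene and styrene, and the concentrations are made up.

```python
import numpy as np

def preprocess(X, detection_limits):
    """Drop samples with missing values, then replace values below the
    detection limit with half of that limit (one limit per species)."""
    X = np.asarray(X, dtype=float)
    dl = np.asarray(detection_limits, dtype=float)
    complete = ~np.isnan(X).any(axis=1)   # rows without missing values
    X = X[complete]
    return np.where(X < dl, dl / 2.0, X)

# Two species with limits of 70 and 30 pptv; concentrations are assumed values.
X = np.array([[150.0, 45.0],
              [40.0, 10.0],
              [np.nan, 80.0],
              [70.0, 30.0]])
clean = preprocess(X, [70.0, 30.0])
```

Values exactly at the detection limit are kept unchanged in this sketch; only values strictly below the limit are substituted.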
Additionally, the data for wind direction, wind speed, and temperature were divided approximately evenly into 31, 5, and 5 levels, respectively, so that within each factor the effect of any level is not overwhelmed by that of the others.
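One way to obtain levels with roughly equal sample counts is to cut at empirical quantiles, as in the Python sketch below. This is an assumed approach; the text does not state how the level boundaries were chosen. The function name `equal_count_levels` and the input values are hypothetical.

```python
import numpy as np

def equal_count_levels(values, n_levels):
    """Assign each value to one of n_levels classes of roughly equal size."""
    values = np.asarray(values, dtype=float)
    # Interior quantile edges, e.g. 20%, 40%, 60%, 80% for 5 levels.
    probs = np.linspace(0.0, 1.0, n_levels + 1)[1:-1]
    edges = np.quantile(values, probs)
    return np.searchsorted(edges, values, side="right")

temperature = np.arange(100.0)              # made-up values
levels = equal_count_levels(temperature, 5)  # 0-based level of each sample
counts = np.bincount(levels)                 # number of samples per level
```

With heavily tied data the classes can become uneven, so the counts per level are worth checking before the levels are used as factor indices.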
Results and Discussion
Because of its long lifetime and multiple sources, methanol proves to be a ubiquitous compound. The initial trials including methanol showed that methanol had a significant contribution in each source profile, suppressing other compounds such as vinyl acetate and c13-benzenes. Such behavior typically suggests that there is a high variability in the amounts of methanol associated with its sources. In this case, increasing the uncertainty of methanol could not resolve this problem. Thus, methanol was excluded from the final analysis. Figure 3 shows the concentration time series for all compounds except methanol.
The determination of the number of sources is one of the major problems in any factor analysis. In this study, three rules were applied to decide the proper number of sources: (1) the resolved source profiles should be explainable, (2) the Q value defined in eq 4, plotted against the number of sources, is expected to show a change in slope from rapid to slow at the chosen number, and (3) there should be a satisfactory fit between the predicted concentrations and the measured values. In detail, when the source number increased from 6 to 7, 7 to 8, 8 to 9, and 9 to 10, the decreases in Q were 6907, 8325, 5723, and 4915, respectively. Clearly, there is a change in the slope at 9 sources.
In addition, there was a better fit between the predicted concentrations and the measured values at that source number. The fit between the predicted c13-benzenes concentration and the measured values improved exceptionally quickly when the source number changed from 9 to 10. (The correlation coefficient for 10 sources was 0.932 while that for 9 sources was 0.58.) However, more than 75% of the c13-benzenes concentration measurements were below the detection limit and replaced by half of the detection limit, so the exceptionally good fit might suggest that there was overfitting of c13-benzenes for the case of 10 sources. Neither a poor fit nor an overfit is acceptable, so the number of sources for this analysis was chosen to be 9. During the experiments, each candidate case was run at least three times to avoid local optima, and finally the case of 9 sources (excluding methanol) was selected as the best solution. The results are discussed below.

FIGURE 2. Sampling site (La Porte Municipal Airport) in Houston, Texas.

$$\frac{\nu_c}{\nu_r} = \frac{\text{m\_mfc}}{\text{m\_mfr}} \tag{6}$$
The profiles and time-resolved contributions of the nine sources are shown in Figures 4 and 5. To describe quantitatively the variation in the contribution of source i between weekdays and weekends, a ratio called KD is defined as eq 7:

$$\mathrm{KD}_i = \frac{\operatorname{mean}\{g_{i,j} \mid j \in \text{weekends}\}}{\operatorname{mean}\{g_{i,j} \mid j \in \text{weekdays}\}} \tag{7}$$

Here $g_{i,j}$ denotes the contribution of source i to sample j. The plots of the wind direction factor, wind speed factor, temperature factor, and weekend factor of the 9 sources are shown in Figures 6, 7, 8, and 9, respectively. In the wind directional plots, each column of matrix D is displayed in a polar plot to represent the factor values for the different wind directions (i.e., the longer the radius, the bigger the contribution from that direction). In addition, the emitter location plots of acrylonitrile, toluene-xylene, benzene, styrene, and propene are presented in Figures 10-14. Each emitter location plot was produced by superimposing the wind directional plot (blue area) onto the map, where the corresponding emitters in the observed area are displayed as circles, squares, or triangles. Emission rates for 2000 obtained from the Toxic Release Inventory (26) are shown on the plots with the corresponding colors, if available. The size of the blue area in each plot does not represent the distance between receptor site and emitter but denotes the strength of the identified source on the pollution at different wind directions.

FIGURE 3. Concentration series of each compound.

FIGURE 4. Profiles of the identified VOC sources in La Porte, Houston.

FIGURE 5. Time-resolved contribution of each identified VOC source in La Porte, Houston.

Activity can change considerably from weekdays to weekends. Some production factories do not operate on weekends, so the emissions of these sources vary accordingly. In addition, the pattern of motor vehicle use also changes, as fewer people commute to work and fewer heavy-duty diesel trucks are operated on weekends. Thus, the weekend factor should reflect changes in human activities. However, in this study, the number of weekend samples (there are only 3 weekends in this study, and moreover they contain many missing values) might not be sufficient to obtain a general conclusion, but they can provide an initial estimate of the influence of weekend factors on pollution. It can be seen from Figure 9 (the value for a source without a weekend effect should be around 1) that sources 3 and 9 have a significant weekend influence while sources 1, 2, 4, and 8 show only a weak weekend influence. The possible reasons for the weak weekend effect might be that (1) although relatively little of the isoprene in source 2 is biogenic, its emission should be independent of weekend/weekday and (2) the c9-benzenes, c10-benzenes, toluene, xylene, and other compounds in these 4 sources are from refineries that usually operate in continuous mode, and their emission rates on weekends may overwhelm the negative effect caused by the decreased number of motor vehicles.
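The KD ratio of eq 7 is a ratio of two group means and can be sketched in Python as follows. The function name `kd_ratio`, the samples-by-sources array layout, and the toy contributions are assumptions made for illustration; they are not values from the study.

```python
import numpy as np

def kd_ratio(G, is_weekend):
    """KD per source: mean weekend contribution / mean weekday contribution.

    G: (n_samples, n_sources) contributions; is_weekend: boolean per sample.
    """
    G = np.asarray(G, dtype=float)
    is_weekend = np.asarray(is_weekend, dtype=bool)
    return G[is_weekend].mean(axis=0) / G[~is_weekend].mean(axis=0)

# Toy contributions (assumed values): 4 samples, 2 sources.
G = np.array([[2.0, 1.0],
              [4.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])
is_weekend = np.array([False, False, True, True])
kd = kd_ratio(G, is_weekend)
```

A value near 1 indicates little difference between weekend and weekday contributions, while values well below or above 1 indicate a weekend decrease or increase for that source.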
Source 1 contains mainly acrylonitrile. Figure 10 shows that most acrylonitrile emitters are located to the northwest and south of the sampling site. These emitters likely include the boilers, dryer stacks, aeration tanks, ponds, and waste gas processing equipment of chemical or rubber plants (27). The wind directional plot for this source agrees with the emitter locations, as it shows a large contribution from the northwest and a peak at about 150°. The high peaks of the contribution plot for this source correspond to the nighttime period when southerly winds dominate. No information is available on the diurnal patterns of the source. In addition, the KD value of this source in Table 1 is 0.404, so this source is expected to have a significant weekend influence. However, such an influence does not agree with the result in Figure 9. As mentioned above, the limited number of weekend measurements may not be sufficient for the analysis, which may be the cause of this disagreement. Better results for the weekend factor might be obtained if a larger data set were available.
Source 2 shows isoprene and M87 (vinyl acetate). Isoprene is a typical biogenic VOC (28), but the contribution of biogenic isoprene is small in the immediate proximity of the La Porte site (29). A number of anthropogenic isoprene emitters (likely the rubber industry) are located to the north and south of the sampling site (20, 27). For M87, there are a number of vinyl acetate emitters to the north and south of the sampling site, and in particular several large vinyl acetate emitters are located to the north (27). These emitters are most likely the storage tanks and other equipment of chemical plants. The wind directional plot for this source in Figure 6 largely confirms the location of these emitters, as it shows some bulges in the south and a broad contribution from the north. The contribution of biogenic isoprene is small, but a number of anthropogenic isoprene emitters are located in similar directions as the vinyl acetate emitters. This might be one of the reasons why isoprene and M87 occur in the same source. The KD value of this source is 0.964, which agrees with the weekend factor result that there is a very weak weekend effect for this source.

FIGURE 6. Wind direction factor plots.

TABLE 1. Ratio of Mean Contribution of Weekend Samples to That of Weekday Samples^a

source no.  1      2 (M)  3 (M)  4 (M)  5 (M)  6 (M)  7 (M)  8      9 (M)
KD          0.404  0.964  0.599  1.039  0.579  0.662  0.738  1.410  0.493

^a The (M) after the number denotes that the KD value of this source agrees with the corresponding weekend factor result.

FIGURE 7. Wind speed factor plots.

FIGURE 8. Temperature factor plots.
Source 3 is characterized by c7-ketone. There are some possible emitters (synthetic organic manufacturing plants) to the southeast of the sampling site (20), so the wind directional plot shows a broad contribution from that direction. There are a number of peaks in the corresponding contribution plot, but none of these peaks were on a weekend. In addition, the KD value of this source is 0.599, consistent with the weekend factor result.

FIGURE 9. Weekend factor plot.

FIGURE 10. Locations of acrylonitrile emitters. The circles denote the acrylonitrile emitters. The blue area corresponds to the wind direction plot for this source. The red × at the center of the blue area is the sampling site.

FIGURE 11. Location of toluene-xylene emitters. The red squares and blue circles denote xylene and toluene emitters, respectively. The blue area corresponds to the wind direction plot for this source. The red × at the center of the blue area is the sampling site.

FIGURE 12. Location of benzene emitters. The green triangles denote benzene emitters. The blue area corresponds to the wind direction plot for this source. The red × at the center of the blue area is the sampling site.

FIGURE 13. Location of styrene emitters. The black circles denote styrene emitters. The blue area corresponds to the wind direction plot for this source. The red × at the center of the blue area is the sampling site.

FIGURE 14. Location of propene emitters. The circles denote the propene emitters. The blue area corresponds to the wind direction plot for this source. The red × at the center of the blue area is the sampling site.
Source 4 contains toluene and xylene. The emitters are operation units and equipment of the chemical and refining industry (e.g., tanks, boilers, reactors, pyrolysis furnaces) (27). Figure 11 shows that the emitters are mainly located to the northwest and south of the sampling site. The wind directional plot shows a large contribution from the north and a sharp spike in the south, in agreement with the locations of the major emitters. In addition, toluene and xylene can be generated by motor vehicles (highway 225 is just to the north and highway 146 to the east and southeast of the sampling site). This can be another cause of the shape of the wind directional plot. There are many peaks almost evenly distributed in the contribution plot, and the KD value of this source is 1.039, which agrees with the weak weekend effect in Figure 9. In addition, the peaks occurred mainly in the morning (6:00-8:00) and at night (22:00-24:00).
Source 5 is characterized by benzene, mostly from chemical plants or refineries. Figure 12 shows that most benzene emitters are distributed to the north and south of the sampling site (27). In particular, one benzene source is located on Bay Area Blvd (in the direction of 150°) (20). In addition, the motor vehicles on highways 225 and 146 may increase the concentration of benzene. The location information of the emitters is supported by the wind directional plot, which shows a broad contribution from the north and a spike in the direction of about 150°. The contribution plot for this source shows a number of peaks, none of which were on the weekends, and the KD value of this source is 0.579. This agrees with the weekend factor result. The significant variation between weekends and weekdays suggests that the contribution of mobile sources on weekdays might have a greater impact on benzene emission.
Source 6 is characterized by styrene. Figure 13 shows the location of the styrene emitters, most of which are likely to be units of petrochemical plants. There are many styrene emitters around the sampling site, and most of them are located to the northwest and south of the sampling site (27). In particular, there is a large styrene emitter at about 210°. The corresponding wind directional plot in Figure 6 agrees with the location information, as it shows a broad contribution from the north and also a sharp spike in the direction of 210°. Most of the high peaks in the contribution plot were at nighttime when the southerly wind was dominant. The KD value of this source is 0.662, which agrees with the weekend factor result that this source has a weekend effect.

FIGURE 15. Distribution plot for scaled residual errors.
Source 7 is represented by M61, whose main component is acetic acid. Acetic acid is a typical photochemical reaction product (30), so the wind direction plot shows a relatively smooth shape. Meanwhile, a number of anthropogenic acetic acid emitters (e.g., acetic acid storage tanks, boilers, and exhausted liquid tanks) are located to the north and south of the sampling site (27). The KD value of this source is 0.738, consistent with the weekend factor result. The photochemical sources should not have a weekend effect, so the variation between weekday and weekend may be due to changes in the emission rates of the anthropogenic acetic acid sources.
Source 8 is characterized by c9-benzenes and c10-
benzenes. A number of c9- and c10-benzenes emitters (e.g.,
the operation units of chemical or petrochemical plants) are
located to the north of the sampling site (27). The wind
directional plot for this source shows a broad contribution
from the northwest and a spike in the south. The motor
vehicles on highways 225 and 146 and Spencer Hwy may
increase the concentrations of c9- and c10-benzenes and
may be a reason for the spike in the south. The peaks in the
contribution plot are distributed over both weekdays and
weekends (August 27 and September 2), but the KD value of
this source is 1.410. Therefore, a weekend effect is expected.
However, the weekend factor result in Figure 9 shows only
a weak weekend influence. As before, this discrepancy may
arise from the limited weekend data.
Source 9 is represented by M43 (propene). Propene is
most likely emitted by the refineries along the ship channel
(20). Figure 14 shows a number of large emitters are located
to the north and northeast of the sampling site, which is
supported by the wind directional plot with a large contri-
bution from the northeast. Most high peaks in the contribu-
tion plot correspond to daytime periods when the dominant
wind is a northerly wind. The KD value of this source is 0.493,
so this source seems to have a significant weekend effect,
which agrees with the weekend factor result.
The 13 VOCs, except c13-benzenes, which only appear in small amounts, are distributed into reasonable source profiles, and the corresponding contribution and directional patterns are in general agreement with known source information. One reason for the absence of c13-benzenes may be that almost 75% of this compound's measurements were below the detection limit and were replaced with half of the detection limit. Therefore, it may not be possible to make any quantitative attributions for this compound. For the same reason, the scaled residual errors for c13-benzenes in Figure 15 are not satisfactory while the others have a reasonable distribution. Although the weekend data are not sufficient to draw a firm conclusion on the weekend effect for each source, the weekend factors of most sources (7 out of 9) are consistent with the defined KD values. These results suggest the feasibility of including the weekend effect analysis.
Wind speed and temperature are two potentially important
meteorological factors that can help interpret the
observed VOC concentrations. Figure 7 shows the wind speed
factor. For most sources, the wind speed factor values decrease
with increasing wind speed. This trend suggests a dilution
effect: the same emitted mass is released into a larger
volume of air as wind speed increases, and the concentration
therefore decreases (16). However, the factors of sources 2
and 9 increase with increasing wind speed, and source 3 shows
an almost flat curve. Possible reasons for this behavior
are that (1) for sources that may be composed
of point emitters (e.g., a high-concentration storage tank), the
plume may be more coherent at higher wind speed
(higher wind speed keeps the emitted VOCs together
rather than dispersing them) and (2) high-speed wind may
enhance the evaporation of some VOCs.
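The dilution argument can be illustrated with a simple steady-state box model, a common textbook idealization and not the expanded factor model fitted in this study. The function name, the box dimensions, and the emission rate below are assumptions made only for this sketch.

```python
def box_model_concentration(emission_rate, wind_speed, width, mixing_height):
    """Steady-state concentration for a well-mixed box ventilated by wind.

    emission_rate : mass emitted per unit time (e.g., g/s)
    wind_speed    : mean wind speed through the box (m/s), must be positive
    width         : crosswind width of the box (m)
    mixing_height : height of the mixed layer (m)
    """
    if wind_speed <= 0:
        raise ValueError("wind_speed must be positive")
    # The emitted mass is carried away in an air volume of
    # wind_speed * width * mixing_height per unit time.
    return emission_rate / (wind_speed * width * mixing_height)


# Hypothetical source: same emission rate, two different wind speeds.
c_slow = box_model_concentration(100.0, 2.0, 1000.0, 500.0)
c_fast = box_model_concentration(100.0, 4.0, 1000.0, 500.0)
ratio = c_slow / c_fast  # doubling the wind speed halves the concentration
```

Under this idealization the concentration is inversely proportional to wind speed, which matches the decreasing wind speed factors seen for most sources; the increasing curves for sources 2 and 9 are behavior that such a simple model does not capture.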
The influence of temperature on pollutants is more
complex than that of wind speed because increasing temperature
will not only speed up the vaporization of VOCs but
also change the chemical properties of VOCs and enhance
the reactions between VOCs and oxidants in the air. It is
therefore relatively difficult to summarize the effect of
temperature on the observed concentrations. For some sources
(e.g., Nos. 2, 3, and 7), the temperature factor values in Figure 8
increase with temperature. This trend might be the result of
increased vaporization. Another likely explanation for source 7
is that the increase in temperature enhances the rates of the
photochemical reactions. However, the temperature factors
for other sources do not show increasing trends.
The source identification of VOCs at the La Porte Airport site
has been successfully performed using an expanded factor
analysis model and the corresponding optimization tool, ME.
The profiles and contributions of the 9 identified sources
proved to be reasonable. In addition, wind direction, wind speed,
temperature, and weekend factors were determined.
The information on wind directions appears to agree with
the known emission inventories, and the wind speed factors
for most sources suggest a dilution effect. For many sources,
weekend and temperature factors help in interpreting their
influences on the observed concentrations. It is not clear
that this model is the best representation of the physical and
chemical influences of such factors on the observed
concentrations. However, the results do suggest this is a feasible
direction for such a study. In addition, the results suggest
that the error estimation obtained through the FFT is
reasonable in terms of finding an interpretable solution. It
appears that expanded modeling is feasible not only for
identifying VOC sources in complex systems such as the air
system in Houston but also for revealing the various important
features of these sources.
Acknowledgments
This work was supported by the United States Environmental
Protection Agency through cooperative agreement number
R-82806201 under a subcontract to Clarkson University by
The University of Texas at Austin (UT). Although the research
described in this article has been funded wholly or in part
by the United States Environmental Protection Agency, it
has not been subjected to the Agency’s required peer and
policy review and, therefore, does not necessarily reflect the
views of the Agency and no official endorsement should be
inferred. The meteorological data for this study were supplied
by NOAA Aeronomy Lab.
Literature Cited
(1) Ilgen, E.; Karfich, N.; Levsen, K.; Angerer, J.; Schneider, P.;
Heinrich, J.; Wichmann, H.; Dunemann, L.; Begerow, J. Atmos.
Environ. 2001, 35, 1235-1252.
(2) Fujita, E. M.; Watson, J. G.; Chow, J. C.; Magliano, K. L. Atmos.
Environ. 1995, 29, 3019-3035.
(3) Hopke, P. K. Trends Anal. Chem. 1985, 4, 104-106.
(4) Hopke, P. K. Receptor Modeling for Air Quality Management;
Elsevier: Amsterdam, 1991.
(5) Cooper, J. A.; Watson, J. G.; Huntzicker, J. J. Atmos. Environ.
1984, 8, 1347-1355.
(6) Hopke, P. K.; Gladney, E. S.; Gordon, G. E.; Zoller, W. H.; Jones,
A. G. Atmos. Environ. 1976, 10, 1015-1025.
(7) Heidam, N. Z. Atmos. Environ. 1981, 15, 1421-1427.
(8) Henry, R. C. Atmos. Environ. 1987, 21, 1815-1820.
(9) Barrie, L. A.; Barrie, M. J. J. Atmos. Chem. 1990, 11, 211-226.
(10) Paatero, P.; Tapper, U. Chemom. Intell. Lab. Syst. 1993, 18,
183-194.
(11) Paatero, P.; Tapper, U. Environmetrics 1994, 5, 111-126.
(12) Paatero, P. Chemom. Intell. Lab. Syst. 1997, 37, 23-35.
(13) Huang, S.; Rahn, K. A.; Arimoto, R. Atmos. Environ. 1999, 33,
2169-2185.
1346
9
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 38, NO. 5, 2004
(14) Lee, E.; Chan, C. K.; Paatero, P. Atmos. Environ. 1999, 33, 3201-
3212.
(15) Kim, E.; Larson, T. V.; Hopke, P. K.; Slaughter, C.; Sheppard, L.
E.; Claiborne, C. Atmos. Res. 2003, 66, 291-305.
(16) Paatero, P.; Hopke, P. K. Chemom. Intell. Lab. Syst. 2002, 60,
25-41.
(17) Paatero, P. J. Comput. Graphical Stat. 1999, 8, 854-888.
(18) Xie, Y.; Hopke, P. K.; Paatero, P.; Barrie, L. A.; Li, S. Atmos.
Environ. 1999, 33, 2549-2562.
(19) Kim, E.; Hopke, P. K.; Paatero, P.; Edgerton, E. S. Atmos.
Environ. 2004, in press.
(20) Karl, T.; Jobson, T.; Kuster, B.; Williams, E.; Stutz, J.; Goldan, P.;
Fall, R.; Fehsenfeld, F.; Lindinger, W. J. Geophys. Res. 2003, 108
(D16): Art. No. 4508.
(21) Goldan, P. D.; Parrish, D. D.; Kuster, W. C.; Trainer, M.; McKeen,
S. A.; Holloway, J.; Jobson, B. T.; Sueper, D. T.; Fehsenfeld, F.
C. J. Geophys. Res. 2000, 105, 9091-9105.
(22) Henry, R. C.; Spiegelman, C. H.; Collins, J. F.; Park, E. Proc. Natl.
Acad. Sci. U.S.A. 1997, 94, 6596-6599.
(23) Henry, R. C.; Lewis, C. W.; Collins, J. F. Environ. Sci. Technol.
1994, 28, 823-832.
(24) Salisbury, G.; Williams, J.; Holzinger, R.; Gros, V.; Mihalopoulos,
N.; Vrekoussis, M.; Sarda-Estève, R.; Berresheim, H.; von
Kuhlmann, R.; Lawrence, M.; Lelieveld, J. Atmos. Chem. Phys.
2003, 3, 925-940.
(25) Polissar, A. V.; Hopke, P. K.; Malm, W. C.; Sisler, J. F. J. Geophys.
Res. 1998, 103, 19045-19057.
(26) Toxic Release Inventory, U.S. Environmental Protection Agency,
http://www.epa.gov/tri/ (accessed July/August 2000).
(27) Texas Commission on Environmental Quality (TCEQ), 2001.
Development of Source Speciation Profiles from the TNRCC
Point Source Database (Final Report).
(28) Guenther, A.; Baugh, W.; Davis, K.; Hampton, G.; Harley, P.;
Klinger, L.; Vierling, L.; Zimmerman, P.; Allwine, E.; Dilts, S.;
Lamb, B.; Westberg, H.; Baldocchi, D.; Geron, C.; Pierce, T. J.
Geophys. Res. 1996, 101, 18555-18567.
(29) Wiedinmyer, C.; Guenther, A.; Estes, M.; Strange, I. W.; Yardwood,
G.; Allen, D. T. Atmos. Environ. 2001, 35, 6465-6477.
(30) Seinfeld, J. H.; Pandis, S. N. Atmospheric Chemistry and Physics:
From Air Pollution to Climate Change; John Wiley & Sons: New
York, 1998.
Received for review September 11, 2003. Revised manuscript
received December 8, 2003. Accepted December 18, 2003.
ES034999C