VNU Journal of Science, Earth Sciences 24 (2008) 153-159
Building land unit database for supporting land use planning
in Thai Binh Province by integrating ALES and GIS
Nhu Thi Xuan*, Dinh Thi Bao Hoa
College of Science, VNU
Received 5 November 2008; received in revised form 25 November 2008
Abstract. Information about land quality and land characteristics plays an important role in
ensuring effective land use planning. Information technology is one of the best tools for land use
planning, and building a land unit database is the first and most important step in applying it. The
land unit database consists of spatial data and attribute data, both of which should follow a defined
standard. This paper presents a procedure for building the land unit database and illustrates its
application to land suitability classification for paddy fields and crops in Thai Binh Province, by
comparing each land unit with the ecological requirements of each land use type.
Keywords: ALES and GIS; Land suitability; Land unit database.
1. Introduction
Thai Binh Province is located in the Red River Delta. The province is close to the
northern key economic triangle of Hanoi, Hai Phong and Quang Ninh, and it is also a
trade gateway between Hai Phong, Quang Ninh and the coastal provinces across the
country. Covering an area of about 1,535 km², Thai Binh makes up 0.5% of the total
area of Vietnam. The province borders the Gulf of Tonkin in the east, Nam Dinh and
Ha Nam provinces in the west and southwest, and Hai Duong, Hung Yen and Hai Phong
City in the north.
The terrain is flat, with a slope of less than 1% running from north to south.
Elevation varies from 1 to 2 m above mean sea level. The average annual temperature
of the area is 23.3 °C. Total annual radiation is quite high. The average annual
rainfall ranges from 1600 to 2000 mm. The rainy season lasts from April to October
and the dry season from November to March. Rainfall is concentrated in the rainy
season, which accounts for 80 to 90% of the annual total.
_______
* Corresponding author. Tel.: 84-913083269.
E-mail: xuannhu1954@yahoo.com
The sediment consists of red-brown mud and clay. The pH of the stabilized soil,
which is loam or heavy loam, ranges from 7.2 to 7.6. The soil is a soft mud, rich
in nutrients and suitable for paddy and other crops. The soil in Thai Binh is also
good for growing short-cycle food and industrial crops, tropical fruit trees,
flowers, etc.
Thai Binh has a population of 1.8 million people, of which 94.2% are rural and
5.8% are urban. The labor force numbers 1.73 million people, of whom 74.3% work in
agriculture and forestry, 17% in industry and construction, and 8.7% in trade and
services.
The total natural land area of the province is 153,596 ha, of which 94,187 ha is
under cultivation. Thai Binh has fertile land and a large agricultural labor force
experienced in cultivating 3-4 crops per year. The well-developed irrigation
system has helped paddy fields reach yields of up to 14-15 tons/ha.
The purpose of this

Get real by integrating ego and soul
By: Joe Tye

“The wonderful thing about letting go of the fear of failure is that it liberates us to try new experiences and new relationships. Once you know in your heart that Real inventiveness, creativity and even love are impossible without the lessons we learn from failure, you feel more courageous about making an effort.”
Toni Raiten-D’Antonio: The Velveteen Principles: A Guide to Becoming Real, Hidden Wisdom from a Children’s Classic

The Yin and Yang of Ego and Soul
For thousands of years, philosophers have written about how we humans are torn by conflicting inner drives. We want to be recognized, but we want to be left alone. We want material possessions, but we want our lives to be uncomplicated. We want to work hard at work that really matters, but we want to spend time sitting on a riverbank with a fishing pole. We are torn between temptation and virtue, almost as if there really is a little devil sitting on one shoulder and a little angel sitting on the other. How we resolve this inner conflict has everything to do with becoming Authentic (Core Action Value #1 in the course on The Twelve Core Action Values).

I think of it as a battle between Ego and Soul. Ego wants things, Soul wants time. Ego wants fame, Soul wants friends. Ego is insecure yet arrogant, Soul is centered yet humble. Ego is concerned about what other people think, Soul is concerned about others. When things go wrong, Ego points a finger, Soul looks in the mirror. Ego complains, Soul gives thanks. The voice of Ego is loud and demanding, the voice of Soul is soft and accepting.

Ego and Soul are the yang and yin of personality. It’s not that one is always bad and the other good; they can be complementary. When I start working on a new writing project, Ego is motivated by the prospect of having a bestselling book; Soul loves the feel of a good pen rolling across a clean sheet of paper and the thought that people I might never meet will be inspired by my words. The combined motivation produced by Ego and Soul together is more powerful than just one would be alone.

There are, of course, times when the two are in conflict. Ego might be secretly pleased to see a perceived rival fall on his face, while Soul wants to help him up, dust him off, and give him a gentle push in the direction of the winners’ circle. Ego might want to go to Las Vegas while Soul wants to help build a house with Habitat for Humanity. Ego might want to take a nap while Soul wants to go for a walk.

Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 755–762,
Sydney, July 2006.
© 2006 Association for Computational Linguistics
Unsupervised Topic Identification by Integrating Linguistic and
Visual Information Based on Hidden Markov Models
Tomohide Shibata
Graduate School of Information Science
and Technology, University of Tokyo
7-3-1 Hongo, Bunkyo-ku,
Tokyo, 113-8656, Japan
shibata@kc.t.u-tokyo.ac.jp
Sadao Kurohashi
Graduate School of Informatics,
Kyoto University
Yoshida-honmachi, Sakyo-ku,
Kyoto, 606-8501, Japan
kuro@i.kyoto-u.ac.jp
Abstract
This paper presents an unsupervised topic identification method integrating
linguistic and visual information based on Hidden Markov Models (HMMs). We employ
HMMs for topic identification, wherein a state corresponds to a topic and various
features including linguistic, visual and audio information are observed. Our
experiments on two kinds of cooking TV programs show the effectiveness of our
proposed method.
1 Introduction
Recent years have seen a rapid increase in multimedia content with the continuing
advance of information technology. To make the best use of multimedia content, it
is necessary to divide it into meaningful segments and annotate them. Because
manual annotation is extremely expensive and time-consuming, automatic annotation
techniques are required.
In the field of video analysis, there have been a number of studies on shot
analysis for video retrieval or summarization (highlight extraction) using Hidden
Markov Models (HMMs) (e.g., (Chang et al., 2002; Nguyen et al., 2005; Q. Phung
et al., 2005)). These studies first segmented videos into shots, within which the
camera motion is continuous, and extracted features such as color histograms and
motion vectors. Then, they classified the shots based on HMMs into several classes
(for baseball sports video, for example, pitch view, running overview or audience
view). To achieve high accuracy, these studies relied on handmade domain-specific
knowledge or trained HMMs with manually labeled data. Therefore, they cannot be
easily extended to new domains on a large scale. In addition, although linguistic
information, such as narration, speech of characters, and commentary, is
intuitively useful for shot analysis, it is not utilized by many of the previous
studies. Although some studies attempted to utilize linguistic information
(Jasinschi et al., 2001; Babaguchi and Nitta, 2003), the information used was
limited to keywords.
In the field of Natural Language Processing, Barzilay and Lee have recently
proposed a probabilistic content model for representing topics and topic shifts
(Barzilay and Lee, 2004). This content model is based on HMMs wherein a state
corresponds to a topic and generates sentences relevant to that topic according to
a state-specific language model; the model is learned from raw texts via analysis
of word distribution patterns.
In this paper, we describe an unsupervised topic identification method integrating
linguistic and visual information using HMMs. Among the various types of videos,
instruction videos (how-to videos) about sports, cooking, D.I.Y., and so on are
the most valuable; of these, we focus on cooking TV programs. In an example shown
in Figure 1, preparation, sauteing, and dishing up are automatically labeled in
sequence. Identified topics lead to video segmentation and can be utilized for
video summarization.
Inspired by Barzilay’s work, we employ HMMs for topic identification, wherein a
state corresponds to a topic, like preparation and frying, and various features,
which include

Real estate analysis: statistical tools

3.1.3 Panel data
Panel data have the dimensions of both time series and cross-sections – e.g. the monthly prices of a number of REITs in the United Kingdom, France and the Netherlands over two years. The estimation of panel regressions is an interesting and developing area, but will not be considered further in this text. Interested readers are directed to chapter 10 of Brooks (2008) and the references therein.

Fortunately, virtually all the standard techniques and analysis in econometrics are equally valid for time series and cross-sectional data. This book concentrates mainly on time series data and applications, however, since these are more prevalent in real estate. For time series data, it is usual to denote the individual observation numbers using the index t and the total number of observations available for analysis by T. For cross-sectional data, the individual observation numbers are indicated using the index i and the total number of observations available for analysis by N. Note that there is, in contrast to the time series case, no natural ordering of the observations in a cross-sectional sample. For example, the observations i might be on city office yields at a particular point in time, ordered alphabetically by city name. So, in the case of cross-sectional data, there is unlikely to be any useful information contained in the fact that Los Angeles follows London in a sample of city yields, since it is purely by chance that their names both begin with the letter ‘L’. On the other hand, in a time series context, the ordering of the data is relevant as the data are usually ordered chronologically. In this book, where the context is not specific to only one type of data or the other, the two types of notation (i and N or t and T) are used interchangeably.
3.1.4 Continuous and discrete data
As well as classifying data as being of the time series or cross-sectional type, we can also distinguish them as being either continuous or discrete, exactly as their labels would suggest. Continuous data can take on any value and are not confined to take specific numbers; their values are limited only by precision. For example, the initial yield on a real estate asset could be 6.2 per cent, 6.24 per cent, or 6.238 per cent, and so on. On the other hand, discrete data can take on only certain values, which are usually integers¹ (whole numbers), and are often defined to be count numbers – for instance, the number of people working in offices, or the number of industrial units transacted in the last quarter. In these cases, having 2,013.5 workers or 6.7 units traded would not make sense.

¹ Discretely measured data do not necessarily have to be integers. For example, until they became ‘decimalised’, many financial asset prices were quoted to the nearest 1/16th or 1/32nd of a dollar.

Real Estate Modelling and Forecasting

3.1.5 Cardinal, ordinal and nominal numbers
Another way in which we can classify numbers is according to whether they are cardinal, ordinal or nominal. This distinction is drawn in box 3.2.

Box 3.2 Cardinal, ordinal and nominal numbers
● Cardinal numbers are those for which the actual numerical values that a particular variable takes have meaning, and for which there is an equal distance between the numerical values.
● On the other hand, ordinal numbers can be interpreted only as providing a position or an ordering. Thus, for cardinal numbers, a figure of twelve implies a measure that is ‘twice as good’ as a figure of six. Examples of cardinal numbers would be the price of a REIT or of a building, and the number of houses in a street.
On the other hand, for an ordinal scale, a figure of twelve may be viewed as ‘better’ than a figure of

An overview of regression analysis

[Figure 4.1: Scatter plot of two variables, y and x]

to get the line that best ‘fits’ the data. The researcher would then be seeking to find the values of the parameters or coefficients, α and β, that would place the line as close as possible to all the data points taken together.

This equation (y = α + βx) is an exact one, however. Assuming that this equation is appropriate, if the values of α and β had been calculated, then, given a value of x, it would be possible to determine with certainty what the value of y would be. Imagine – a model that says with complete certainty what the value of one variable will be given any value of the other. Clearly this model is not realistic. Statistically, it would correspond to the case in which the model fitted the data perfectly – that is, all the data points lay exactly on a straight line. To make the model more realistic, a random disturbance term, denoted by u, is added to the equation, thus:

$$y_t = \alpha + \beta x_t + u_t \qquad (4.2)$$

where the subscript t (= 1, 2, 3, ...) denotes the observation number. The disturbance term can capture a number of features (see box 4.2).

Box 4.2 Reasons for the inclusion of the disturbance term
● Even in the general case when there is more than one explanatory variable, some determinants of $y_t$ will always in practice be omitted from the model. This might, for example, arise because the number of influences on y is too large to place in a single model, or because some determinants of y are unobservable or not measurable.
● There may be errors in the way that y is measured that cannot be modelled.
● There are bound to be random outside influences on y that, again, cannot be modelled.
For example, natural disasters could affect real estate performance in a way that cannot be captured in a model and cannot be forecast reliably. Similarly, many researchers would argue that human behaviour has an inherent randomness and unpredictability!

How, then, are the appropriate values of α and β determined? α and β are chosen so that the (vertical) distances from the data points to the fitted line are collectively minimised, so that the line fits the data as closely as possible. This could be done by ‘eyeballing’ the data and, for each set of variables y and x, one could form a scatter plot and draw on a line that looks as if it fits the data well by hand, as in figure 4.2. Note that it is the vertical distances that are usually minimised, rather than the horizontal distances or those taken perpendicular to the line. This arises as a result of the assumption that x is fixed in repeated samples, so that the problem becomes one of determining the appropriate model for y given (or conditional upon) the observed values of x.

This eyeballing procedure may be acceptable if only indicative results are required, but of course this method, as well as being tedious, is likely to be imprecise. The most common method used to fit a line to the data is known as ordinary least squares (OLS). This approach forms the workhorse of econometric model estimation, and is discussed in detail in this and subsequent chapters.

[Figure 4.2: Scatter plot of two variables with a line of best fit chosen by eye]

[Figure 4.3: Method of OLS fitting a line to the data by minimising the sum of squared residuals]

Two alternative estimation methods (for determining the appropriate values of the coefficients α and β) are the method of moments and the method of maximum likelihood.
A generalised version of the method of moments, due to Hansen (1982), is popular, although the method of maximum ...
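The OLS idea described in this section, choosing α and β to minimise the sum of squared vertical distances between the data points and the line, can be sketched in a few lines of code. The closed-form estimators used below (slope equal to the sample covariance of x and y divided by the sample variance of x, intercept equal to the mean of y minus the slope times the mean of x) are the standard textbook result and are not stated in the excerpt above. The data values and the function names `ols_fit` and `residual_sum_of_squares` are hypothetical and chosen only for illustration.

```python
def ols_fit(x, y):
    """Estimate alpha and beta in y_t = alpha + beta * x_t + u_t by OLS."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the sample means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary for the slope to be identified")
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def residual_sum_of_squares(x, y, alpha, beta):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data, not taken from the text.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(x_data, y_data)
rss_ols = residual_sum_of_squares(x_data, y_data, alpha_hat, beta_hat)
# A line chosen "by eye" (intercept 0, slope 2) for comparison.
rss_by_eye = residual_sum_of_squares(x_data, y_data, 0.0, 2.0)
print(alpha_hat, beta_hat, rss_ols, rss_by_eye)
```

For this toy data the estimates come out at roughly 0.05 for the intercept and 1.99 for the slope, and the OLS line has a smaller residual sum of squares than the line drawn by eye, which is what minimising the squared vertical distances guarantees.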