DSpace at VNU: Parallel dimensionality reduction transformation for time-series data

2009 First Asian Conference on Intelligent Information and Database Systems

Parallel Dimensionality Reduction Transformation for Time-Series Data

Hoang Chi Thanh
Department of Informatics, Hanoi University of Science, VNUH
334 - Nguyen Trai Rd., Hanoi, Vietnam
E-mail: thanhhc@vnu.vn

978-0-7695-3580-7/09 $25.00 © 2009 IEEE
DOI 10.1109/ACIIDS.2009.48

Abstract

Subsequence matching in large time-series databases has long been an interesting problem. Many methods have been proposed that cope with this problem to an adequate extent. One good idea is to reduce the dimensionality of the time-series data properly. In this paper, we propose a method to reduce the dimensionality of high-dimensional time-series data. The method is simpler than existing ones based on the discrete Fourier transform and the discrete cosine transform. Furthermore, our dimensionality reduction may be executed in parallel. It preserves planar geometric blocks and may be applied to minimum bounding rectangles as well.

Keywords: Time-series data, dimensionality reduction, matching problem, minimum bounding rectangle

1. Introduction

Time-series data are sequences of real numbers representing values at specific points in time. Typical examples are the bid and ask prices of stock items, exchange rates, weather data and human speech signals. The data stored in a database are called data sequences. The aim of the subsequence matching problem in a large time-series database is to find the data sequences in the database that are similar to a given query sequence. This problem has attracted a lot of interest because of its applications. Many methods have been proposed that cope with this problem to an adequate extent [1-5]. One good idea for increasing the matching speed is a proper dimensionality reduction for high-dimensional time-series data. In [6] the author proposed a data transformation based on the discrete Fourier transform. The authors of [7] presented a data transformation based on the discrete cosine transform.

In this paper we present another dimensionality reduction for high-dimensional time-series data. The method splits a high-dimensional time-series data into parts as equal in time scale as possible and then takes the average of each part. The reduction is simpler than the existing ones presented above, and it may be performed in parallel. So this method decreases the time for "narrowing" the data and speeds up the matching. We also use this dimensionality reduction for a special type of time-series data: minimum bounding rectangles.

This paper is organized as follows. In Section 2 we present a dimensionality reduction function for high-dimensional time-series data and some of its properties. Section 3 shows that this reduction function is safe when applied to minimum bounding rectangles. Some concluding remarks are given in the last section.

2. Dimensionality reduction for time-series data

Let T[1..n] be a time-series data. It consists of n real numbers, so it is called an n-dimensional data. The dimensionality n of time-series data is often so high that the data are difficult to store, search and match. The question is therefore how to "narrow" the data. In other words, we have to construct an operation that transforms a high-dimensional time-series data with hundreds or thousands of dimensions into a low-dimensional time-series data with only a few dimensions. Instead of working on the high-dimensional time-series data, one can then do the same work on the low-dimensional time-series data with high performance. To do so, we construct dimensionality reduction functions for time-series data. Each such function is a mapping $F : \mathbb{R}^n \to \mathbb{R}^m$.

Let F be any dimensionality reduction function transforming n-dimensional time-series data to m-dimensional time-series data, with m < n. We are interested only in those functions that satisfy the following requirement.

Definition 1: A dimensionality reduction function F is proper if for any pair of n-dimensional time-series data X and Y:

$$D_m(F(X), F(Y)) \le D_n(X, Y) \qquad (1)$$

where $D_n$ and $D_m$ are the distance functions of the n-dimensional space and the m-dimensional space, respectively.

So each proper dimensionality reduction function on time-series data is a shrinking mapping. The properness of a reduction function guarantees no false dismissals for range queries.

Let T[1..n] be an n-dimensional time-series data and let m be a positive integer such that m < n …

Definition 2: The m-dimensional time-series data $T_R[1..m]$ constructed as follows:

$$
T_R[i] =
\begin{cases}
\dfrac{1}{q+1} \displaystyle\sum_{k=(i-1)(q+1)+1}^{i(q+1)} T[k], & \text{if } 1 \le i \le d; \\[2ex]
\dfrac{1}{q} \displaystyle\sum_{k=d+(i-1)q+1}^{d+iq} T[k], & \text{if } d+1 \le i \le m
\end{cases}
\qquad (4)
$$

is called a reduced m-dimensional time-series data of the n-dimensional time-series data T[1..n].
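As an illustration of formula (4), the following is a minimal Python sketch of the reduction. It is not the author's implementation. The definitions of q and d do not survive in this excerpt, so the sketch assumes n = m·q + d with q = ⌊n/m⌋ and d = n mod m, which is what the summation limits in (4) imply: the first d parts have q + 1 points and the remaining m − d parts have q points. The function names `reduce_series` and `euclidean` are hypothetical, and the Euclidean distance is assumed for both $D_n$ and $D_m$, since the excerpt does not say which distance is used.

```python
import math


def reduce_series(t, m):
    """Reduce an n-point series to m segment means, following formula (4).

    Assumption: n = m*q + d with q = n // m and d = n % m, so the first d
    segments have q + 1 points and the remaining m - d segments have q points.
    """
    n = len(t)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    q, d = divmod(n, m)
    reduced = []
    start = 0  # 0-based index of the first point of the current segment
    for i in range(m):
        length = q + 1 if i < d else q
        segment = t[start:start + length]
        reduced.append(sum(segment) / length)
        start += length
    return reduced


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Seven points reduced to three: q = 2, d = 1, so segment lengths are 3, 2, 2.
print(reduce_series([1, 2, 3, 4, 5, 6, 7], 3))  # [2.0, 4.5, 6.5]

# Spot check of inequality (1) on one pair of series under the assumed metric.
x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0]
y = [2.0, 1.0, 6.0, 3.0, 9.0, 0.0, 4.0]
print(euclidean(reduce_series(x, 3), reduce_series(y, 3)) <= euclidean(x, y))  # True
```

Each reduced value depends only on its own disjoint slice of the input, so the loop iterations are independent and could be distributed over parallel workers. The sketch computes them sequentially for brevity.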
