DATA STREAMS: MODELS AND ALGORITHMS

... over data streams. Other problems include those of finding significant network differences over data streams [19] and finding quantiles [46,50] over data streams. Another interesting application is that of finding significant differences between data streams [32,33], which has applications in numerous change detection scenarios. Another recent application of sketches has been to XML and tree-structured data [82,83,87] ...

... elegant, it is computationally intensive, and it is therefore not suitable for the data stream case. We also note that the coefficient is defined according to the wavelet coefficient definition, i.e., half the difference between the left-hand and right-hand sides of the time series. While this choice ...
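The coefficient definition quoted in the excerpt above (half the difference between the left-hand and right-hand sides) can be illustrated with a small Haar-style decomposition. This is a minimal sketch, not the chapter's algorithm: the function name `haar_decompose`, the power-of-two length restriction, the sign convention (left minus right), and the sample series are all assumptions made for illustration.

```python
def haar_decompose(series):
    """Return (overall_average, coefficients) for a power-of-two-length series.

    Coefficients are listed from the coarsest level to the most detailed.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    averages = [float(x) for x in series]
    levels = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Assumed sign convention: half of (left average - right average).
        levels.append([(left - right) / 2 for left, right in pairs])
        # The pairwise averages feed the next, coarser level.
        averages = [(left + right) / 2 for left, right in pairs]

    coefficients = [c for level in reversed(levels) for c in level]
    return averages[0], coefficients


if __name__ == "__main__":
    # Hypothetical sample series, chosen only for illustration.
    average, coefficients = haar_decompose([8, 6, 2, 3, 4, 6, 6, 5])
    print(average, coefficients)
```

A series of length 8 yields 7 coefficients plus the overall average, and the coarsest coefficient equals half the difference between the average of the first four values and the average of the last four.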
4.3 Sketches and their Applications in Data Streams

In the previous sections we discussed the application of sketches to the problem of massive time series. Some of the methods, such as fixed window sketch computation, are inherently offline. This does not suffice in many scenarios in which it is desirable to continuously compute the sketch over the data stream ...

... coefficients. The technique in [16] improves the time and space efficiency for both updates and queries. The method of sketches can be effectively used for second moment and join estimation. First, we discuss the problem of second moment estimation [6] and illustrate how it can be used for ...

... dimensionality k by picking k random vectors of dimensionality d and calculating the dot product of the data point with each of these random vectors. Each component of the k random vectors is drawn from the normal distribution with zero mean and unit variance. In addition, each random vector is normalized to unit magnitude. It has been shown in [64] that proportional L2 distances between the data points are approximately ...

... disk-based index structures may be used to index and update frequency counts. We argue that many applications in the sketch-based literature that attempt to find specific properties of the frequency counts (e.g., second moments, join size estimation, heavy hitters) may in fact be implemented trivially by using simple main memory data structures, and the ability ...
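The random projection step described in the excerpt above (k random vectors of dimensionality d, Gaussian components, unit normalization, one dot product per vector) might be sketched as follows. The function names `random_unit_vectors` and `project`, the `seed` parameter, and the sample points are assumptions for illustration. The excerpt's statement about L2 distances is truncated, so this sketch performs only the projection and makes no claim about distance guarantees.

```python
import math
import random


def random_unit_vectors(k, d, seed=0):
    """Draw k vectors of dimensionality d with N(0, 1) components, scaled to unit length."""
    rng = random.Random(seed)  # seeded only so that the example is repeatable
    vectors = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def project(point, vectors):
    """Reduce a d-dimensional point to len(vectors) dot products."""
    return [sum(p * x for p, x in zip(point, v)) for v in vectors]


if __name__ == "__main__":
    # Hypothetical sizes and data, chosen only for illustration.
    d, k = 100, 10
    vectors = random_unit_vectors(k, d, seed=42)
    point = [float(i % 7) for i in range(d)]
    print(project(point, vectors))
```

Because the projection is linear, the sketch of a difference of two points equals the difference of their sketches, which is why distances can be compared directly in the reduced space.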
... of basis vectors in Figure 9.1 (in the same order as the corresponding wavelets illustrated) are as follows: The most detailed coefficients have only one +1 and one -1, whereas the coarsest coefficient has t/2 entries of +1 and t/2 entries of -1. Thus, in this case, we need $2^3 - 1 = 7$ wavelet vectors. In addition, ...

... since different coordinates will render larger coefficients across different measures. The technique in [25] uses a dynamic programming method to determine the optimal extended wavelet decomposition. However, this method is not time and space efficient. A method in [52] provides a fast algorithm whose space requirement is linear in the size of the synopsis and logarithmic ...

... $\lceil \ln(1/\delta) \rceil$ pairwise independent hash functions, each of which maps onto uniformly random integers in the range $[0, e/\epsilon]$, where $e$ is the base of the natural logarithm. Thus, we maintain a total of $\lceil \ln(1/\delta) \rceil$ hash tables, and there are a total of $O(\ln(1/\delta)/\epsilon)$ hash cells. This apparently provides a ...

... binary representation of that integer will have length L. The position (the least significant, rightmost bit is counted as 0) of the rightmost 1-bit of the binary representation of that integer is tracked, and the largest such value is retained. This value is logarithmically related to the number
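The bit-tracking step in the last excerpt can be sketched as below. The excerpt is truncated, so several pieces here are assumptions rather than the chapter's method: the class name `RightmostBitTracker`, the use of SHA-256 as the hash, the 32-bit default length, and the reading that the tracked value relates to the number of distinct elements. The `rough_estimate` method returns the raw power of two with no correction factor or averaging.

```python
import hashlib


def rightmost_one_position(value):
    """Position of the least significant 1-bit, counting the rightmost bit as 0."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    return (value & -value).bit_length() - 1


class RightmostBitTracker:
    """Retain the largest rightmost-1-bit position seen over hashed stream items."""

    def __init__(self, bits=32):
        if not 1 <= bits <= 64:
            raise ValueError("bits must be between 1 and 64")
        self.bits = bits          # assumed hash length L
        self.max_position = -1    # -1 means no usable hash value seen yet

    def _hash(self, item):
        # Assumed hash: SHA-256 truncated to `bits` bits.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % (1 << self.bits)

    def add(self, item):
        h = self._hash(item)
        if h:  # a hash of 0 has no 1-bit to track
            self.max_position = max(self.max_position, rightmost_one_position(h))

    def rough_estimate(self):
        """Raw, uncorrected estimate: 2 to the power of the retained position."""
        return 0 if self.max_position < 0 else 2 ** self.max_position


if __name__ == "__main__":
    tracker = RightmostBitTracker()
    for item in ["a", "b", "a", "c", "b", "a"]:  # hypothetical stream
        tracker.add(item)
    print(tracker.max_position, tracker.rough_estimate())
```

Repeated items hash to the same value, so duplicates never change the retained position. That property is what lets a single small counter summarize a stream with many repeats.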