AMultiresolutionSymbolicRepresentationofTimeSeries.ppt

Uploaded by: 本田雅阁 | Document ID: 2036793 | Upload date: 2019-02-07 | Format: PPT | Pages: 28 | Size: 675.51 KB

A Multiresolution Symbolic Representation of Time Series
Vasileios Megalooikonomou, Qiang Wang, Guo Li, Christos Faloutsos
Presented by Rui Li

Abstract
This work introduces a new representation of time series, the Multiresolution Vector Quantized (MVQ) approximation.
MVQ keeps both local and global information about the original time series in a hierarchical mechanism, processing the original time series at multiple resolutions.
The representation is symbolic: it employs key subsequences and potentially allows text-based retrieval techniques to be applied to the similarity analysis of time series.

Introduction
Two series should be considered similar if they have enough non-overlapping, time-ordered pairs of similar subsequences.
Instead of calculating the Euclidean distance, first extract key subsequences using the Vector Quantization (VQ) technique and encode each time series based on the frequency of appearance of each key subsequence. Then calculate similarities in terms of key subsequence matches.
Hierarchical mechanism: the original time series are processed at several different resolutions, and similarity analysis is performed using a weighted distance function that combines all the resolution levels.

Background
Much of the previous work focuses on avoiding false dismissals. However, in some cases too many false alarms may decrease the efficiency of retrieval.
The Euclidean distance is not always the optimal distance measure.
For large datasets, the computational complexity of the Euclidean distance calculation, O(N*n), is a problem.
The Euclidean distance (a point-based model) is vulnerable to shape transformations such as shifting and scaling.
A new framework that utilizes high-level features is proposed. It consists of codebook generation, time series encoding, and time series representation and retrieval.
To keep both local and global information, multiple codebooks with different resolutions are used.
For each resolution, VQ is applied to discover the vocabulary of subsequences (codewords).
In VQ, a codeword is used to represent a number of similar vectors. The Generalized Lloyd Algorithm is used to produce a "locally optimal" codebook from a training set.
To quantitatively measure the similarity between different time series encoded with a VQ codebook, the Histogram Model is employed. [The equation is missing from the extracted text.] Its two frequency terms refer to the appearance frequency of a codeword in time series t and q, respectively.

Proposed Method
MVQ approximation
Partition each time series into equi-length segments and represent each segment with the most similar key subsequence from a codebook.
Represent each time series by the appearance frequency of each codeword in it.
Apply this at several resolutions.

Codebook Generation
The dataset is preprocessed.
Each time series is partitioned into a number of segments, each of length l, and each segment forms a sample of the training set that is used to generate the codebook.
Each codeword corresponds to a key subsequence.
Example 1: Codewords of a 2-level codebook

Time Series Encoding
Every time series is decomposed into segments of length l.
For each segment, the closest codeword in the codebook is found and the corresponding index is used to represent this segment.
The appearance frequency of each codeword is counted.
The representation of a time series is a vector showing the appearance frequency of every codeword.

Time Series Summarization
The codewords stand for the most representative subsequences of the entire dataset.
Checking the appearance frequencies of the codewords is enough to get an overview of a time series.
Example 2
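The codebook generation and encoding steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Lloyd iteration here is the basic k-means-style variant, the preprocessing step is omitted, all function names (`segment`, `lloyd_codebook`, `encode`) are hypothetical, and the toy dataset is invented for the example.

```python
import random


def segment(series, length):
    """Split a series into non-overlapping segments of the given length,
    dropping any incomplete tail segment."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, seg):
    """Index of the codeword closest to seg (ties go to the lower index)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], seg))


def lloyd_codebook(training, size, iterations=20, seed=0):
    """Basic Lloyd iteration: assign segments to the nearest codeword,
    then move each codeword to the mean of its cell."""
    rng = random.Random(seed)
    codebook = [list(s) for s in rng.sample(training, size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for seg in training:
            cells[nearest(codebook, seg)].append(seg)
        for k, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def encode(series, codebook, length):
    """Return the codeword index of every segment and the
    appearance-frequency histogram over the codebook."""
    indices = [nearest(codebook, seg) for seg in segment(series, length)]
    hist = [0] * len(codebook)
    for k in indices:
        hist[k] += 1
    return indices, hist


# Invented toy dataset: two series built from a "low" and a "high" pattern.
dataset = [
    [0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0],
    [5, 5, 5, 5, 0, 0, 0, 0, 5, 5, 5, 5],
]
seg_len = 4  # plays the role of the segment length l on the slides
training = [seg for ts in dataset for seg in segment(ts, seg_len)]
codebook = lloyd_codebook(training, size=2)
indices, hist = encode(dataset[0], codebook, seg_len)
```

Here `indices` is the symbolic form of the first series (one codeword index per segment) and `hist` is its frequency vector. Running the same steps with a different `seg_len` would give the representation at another resolution.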

Distance Measure and Multiresolution Representation
Using only one codebook (single resolution) introduces problems:
The order among the indices of codewords is not kept, so some important global information is lost.
False alarms increase.
A hierarchical mechanism is introduced in which several different resolutions are involved. Higher resolutions capture local information; lower resolutions capture global information.
Example 3: Reconstruction of time series using different resolutions
By assigning different weights to different resolutions, a weighted similarity measure (the Hierarchical Histogram Model) is defined. [The equation is missing from the extracted text.]

Experiments
Best Matches Retrieval
SYNDATA: 6 classes; 100 time series for each class; 60 points for each time series.
CAMMOUSE: 1600 points for each time series.
Comparisons with other methods.

Clustering
SYNDATA
CAMMOUSE

Thank you!
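The idea of combining per-resolution histogram comparisons with weights can be sketched as follows. The equations on the slides did not survive extraction, so the formulas below are assumptions chosen for illustration: a normalized per-codeword frequency difference as the distance, `1 / (1 + distance)` as the similarity, and a plain weighted sum across resolutions. All function names, histograms, and weights are hypothetical.

```python
def histogram_distance(f_t, f_q):
    """Assumed distance: sum of per-codeword frequency differences,
    each normalized by the combined frequency of that codeword."""
    if len(f_t) != len(f_q):
        raise ValueError("histograms must come from the same codebook")
    return sum(abs(a - b) / (1 + a + b) for a, b in zip(f_t, f_q))


def histogram_similarity(f_t, f_q):
    """Assumed similarity in (0, 1]; identical histograms give 1.0."""
    return 1.0 / (1.0 + histogram_distance(f_t, f_q))


def hierarchical_similarity(hists_t, hists_q, weights):
    """Weighted sum of the per-resolution similarities.
    hists_t and hists_q hold one histogram per resolution level."""
    if not (len(hists_t) == len(hists_q) == len(weights)):
        raise ValueError("need one histogram and one weight per resolution")
    return sum(w * histogram_similarity(f_t, f_q)
               for w, f_t, f_q in zip(weights, hists_t, hists_q))


# Invented histograms at two resolutions: a coarse level with 2 codewords
# and a fine level with 4 codewords.
t = [[2, 1], [3, 0, 2, 1]]
q = [[2, 1], [0, 3, 2, 1]]
weights = [0.5, 0.5]

score_same = hierarchical_similarity(t, t, weights)
score_diff = hierarchical_similarity(t, q, weights)
```

In this toy case the two series agree at the coarse level and differ at the fine level, so `score_diff` is lower than `score_same` but not zero. Shifting weight toward one level changes how much local or global agreement counts.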
