Dr. Mahsa Ashouri Academic Seminar, 2020/09/21 (ROC 109/09/21)
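Department of Statistics, National Chengchi University
Speaker: Dr. Mahsa Ashouri (Postdoctoral Researcher, Institute of Statistical Science, Academia Sinica)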
Academic Seminar
Title: Tree-based Methods for Clustering Time Series Using Domain-Relevant Attributes
Time: Monday, September 21, 2020 (ROC year 109), 1:30 PM
We propose two new methods for clustering time series that capture temporal information (trend, seasonality, and autocorrelation) together with domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as an automated yet transparent tool for clustering a large collection of time series. Our approach addresses the challenge of using common time series models within the MOB framework by relying on the computationally advantageous ordinary least squares (OLS) approach. We compare the two methods. The single-step method clusters series by trend, seasonality, time series lags, and domain-relevant cross-sectional attributes, using a single linear regression model. The two-step method first clusters by trend, seasonality, and domain-relevant cross-sectional attributes, and then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach by considering one-step-ahead forecasting. We present empirical results comparing our approach with forecasting each series individually using an Autoregressive Integrated Moving Average (ARIMA) model, applied to a large set of Wikipedia article pageview time series. Our results show that the tree-based approach produces forecasts that are practically on par with those of ARIMA models, yet is significantly faster and more efficient, and is therefore suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
Finally, we propose a user-friendly web-based interactive tool for visualizing the clusters produced by the single-step approach and we illustrate the tool by applying it to an air quality dataset (PM2.5 index) collected in different monitoring stations in Taiwan.
Keywords: Time series, Clustering, Model-based partitioning tree, Linear regression, ARIMA, Forecasting, Web-based tool, Shiny.
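
As a rough illustration of the kind of OLS regression the abstract describes (trend, seasonality, and a lagged value in one linear model), the following Python sketch fits such a model to a single synthetic series and produces a one-step-ahead forecast. It is not the speaker's implementation: the MOB tree partitioning and the cross-sectional attributes are omitted, and the function names, the weekly period, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def build_design(y, period):
    """Design matrix for targets y[1..n-1]: intercept, linear trend,
    seasonal dummies, and the lag-1 value (all names are hypothetical)."""
    n = len(y)
    t = np.arange(1, n)                      # time index of each target value
    season = t % period
    # Dummies for seasons 1..period-1; season 0 is the baseline level.
    dummies = (season[:, None] == np.arange(1, period)[None, :]).astype(float)
    X = np.column_stack([np.ones(n - 1), t.astype(float), dummies, y[:-1]])
    return X, y[1:]


def fit_ols(y, period):
    """Estimate the regression coefficients by ordinary least squares."""
    X, target = build_design(y, period)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


def forecast_next(y, beta, period):
    """One-step-ahead forecast for time index len(y)."""
    n = len(y)
    dummies = (np.arange(1, period) == n % period).astype(float)
    x_next = np.concatenate([[1.0, float(n)], dummies, [y[-1]]])
    return float(x_next @ beta)


# Synthetic daily series (assumed): linear trend + weekly pattern + small noise.
rng = np.random.default_rng(0)
period = 7
n = 210
time = np.arange(n)
weekly = np.array([0.0, 1.0, 2.0, 1.5, 0.5, -1.0, -2.0])
y = 10.0 + 0.05 * time + weekly[time % period] + rng.normal(0.0, 0.1, n)

beta = fit_ols(y, period)
y_hat = forecast_next(y, beta, period)
print("one-step-ahead forecast:", round(y_hat, 2))
```

In a tree-based setting of the kind the abstract outlines, a model of this form would be fitted within each cluster rather than to one series; the sketch only shows why an OLS formulation is cheap to fit and yields coefficients that can be read directly.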