Forecasting European Union CPI Figures by Caoimhín Ó Ceallaigh

Ebook Introduction


From a young age I have learned that space, place and environment are paramount to our understanding of the world around us. As the son of a farmer along the Northern Irish border, I learned that understanding the spaces and places directly around us is key to annual tasks such as selecting the best land to harvest, battling floods in harsh winters and moving herds of livestock in warm summers. As an athlete I have learned that the better prepared for and exposed to an environment you are, the greater your advantage over your opponent.

On my first day at Maynooth University as an undergraduate student, Professor Gerry Kearns stated that our geographical learning would revolve around three elements: Space, Place and Environment. It was the desire to refine my knowledge and understanding of these elements that motivated my academic career, not only as an undergraduate but also as a postgraduate student, and that subsequently led to the creation of this book.

Reading the works of Yi-Fu Tuan and David Harvey from the 1960s, I found that although the two geographers studied different branches of geography, both suggest that in order to define what space is, we as geographers must interpret its elements within a framework of thought developed from quantifiable data. Following this philosophy, I too wanted to interpret space within a structure of thought built on quantifiable data, but I also wanted to push the limits of that data. As a geographer I feel it is our duty to push our research into new paradigms and to consult other disciplines that can help us analyse large amounts of data efficiently. Only then can we keep pace with a fast-moving world where data is constantly being recorded, and where the human being is now defined as a user who can zoom to their exact location on an interactive map and carry out basic GIS functions on a handheld device. Not only is this world moving at a fast pace, but the way it develops is also changing rapidly. In the past, spatial analysis was advanced through academic study and research; now we see large multinational corporations invest heavily in geospatial analysis tools in order to “get to know” their customer base better.

Having experienced these changes, I became motivated to plot and visualise a geography that had never been tackled before. As a postgraduate student at the National Centre for Geocomputation, under the supervision of Martin Charlton and Professor Chris Brunsdon, I developed a passion for spatial statistics when exposed to complex models such as Geographically Weighted Regression and Principal Component Analysis. These models came in the form of packages written in the programming language R. With its extensive range of statistical models and graphical techniques, all open source and freely available to the public, R allows research to grow and flourish. The data that I loaded into RStudio was the European Harmonised Index of Consumer Prices. As Baskin states, this type of index “is the centre piece in almost every economic issue”; it is a measure of inflation, that is, the change in the prices of goods over time. Thanks to the rOpenGov project, this governmental data can easily be accessed and downloaded from Eurostat, the statistical agency of the European Union.

Stored in Eurostat's database, fourteen million rows recording over twenty years of monthly inflation across EU member states became available to me. This data is presented to the public as an interactive map on the European Central Bank website. I posed the question: can this data be processed to have more value in the wider world, and can this quantifiable data define a new space? To try to answer these questions I set out to forecast each category of inflation for each nation. These predictions would be short-term, only one month ahead, and numerous research sources suggested this would be difficult, as inflation rates can be quite volatile. To produce the forecasts I needed a model that could repeatedly scan through each country's time series for the 13 categories of inflation, 146 in total. The most appealing model was the Autoregressive Integrated Moving Average (ARIMA) model proposed by Box and Jenkins in 1973. What made this model appealing was that it predicts a value in a response time series as “a linear combination of its own past values, past errors and current and past values of other time series” (The ARIMA Procedure, 2014). Using the “forecast” package in R, created by Rob J. Hyndman, the model can be automated to forecast large numbers of univariate time series.
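As a rough illustration of automated one-step-ahead forecasting over many series, the sketch below fits a deliberately simplified model to every series in a dictionary and forecasts the next month. The model is an AR(1), that is, ARIMA(1,0,0) with one autoregressive term, no differencing and no moving-average term, estimated by ordinary least squares. It is written in Python rather than R, and the function names and toy keys such as "IE/food" are hypothetical. This is not the author's code, nor the forecast package itself, whose `auto.arima()` function chooses the model orders automatically.

```python
from typing import Dict, List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Fit y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # a constant series has no slope to estimate
    c = mean_y - phi * mean_x
    return c, phi


def forecast_one_step(series: List[float]) -> float:
    """One-month-ahead forecast: a linear combination of the latest observed value."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


def forecast_all(panel: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the same automated procedure to every (country, category) series."""
    return {key: forecast_one_step(series) for key, series in panel.items()}


# Hypothetical toy data: keys and values are illustrative, not Eurostat figures.
panel = {
    "IE/food": [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375],
    "MT/restaurants_hotels": [2.0, 2.0, 2.0, 2.0],
}
print(forecast_all(panel))
```

The first toy series follows y_t = 1 + 0.5 y_{t-1} exactly, so its forecast is approximately 1.96875; the constant second series simply forecasts 2.0. Looping one simple fitting routine over a dictionary mirrors the idea of scanning through each country's series in turn.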

When the forecasts were plotted against the real values for the forecast month, June 2016, a large proportion were relatively accurate and began to define a space and a geography that had never been seen before. But this research did not end with a simple output from a statistical model. I wanted to allow those without a deep knowledge of statistics or computer science to see whether the prices of the goods they need would increase in the coming month, giving users the ability to prepare for the future by analysing past events and to better understand the economic spaces around them. This had to be displayed in a visually efficient and accessible manner. The answer came in the form of Shiny, a new extension for R that provides a framework for building web applications without requiring a deep knowledge of HTML, CSS or JavaScript. As a result, I developed the Inflex web application, which consisted of an interactive map displaying predicted inflation for all EU member states for the coming month.

But the ARIMA model, although accurate in some respects, failed when it came to non-stationary data. In these time series, properties such as the mean, variance and autocorrelation were not constant over time, owing to dramatic changes in inflation rates. When the residuals of these forecasts (the differences between the predicted values and the real observed values) were plotted, a story began to be told. For example, sudden increases in Malta's hotels and restaurants category produced unusual residuals, and when these were plotted, the start and end months of the Maltese tourist season became apparent. The residuals of the non-stationary forecasts began to reveal annual festivals and public holidays across Europe, allowing these places to become visible among large amounts of quantifiable data. This raised the question: how can a space be predicted if the dataset is non-stationary? When researching alternatives to the ARIMA model, the most proficient I found was the Artificial Neural Network model, which is loosely modelled on the way neurons in the human brain work together to produce a result. But in a world where datasets are larger and more complex than ever before, how can a dataset be quickly classified as stationary or non-stationary? As Zhang states, time series from the real world are rarely purely linear or purely nonlinear.

They often contain both types of pattern, so neither an ARIMA model nor an Artificial Neural Network alone can fully model such a dataset: an ARIMA model cannot capture nonlinear relationships, while a neural network model cannot handle both types of pattern equally well. If combined, however, the two models could work together to separate the complex structures within a dataset and find the best method for a given problem, while capturing the different patterns in the data to improve the accuracy of the forecasts. Having identified this opportunity, I aim to pursue the creation of a hybrid model that can identify the linear and nonlinear traits within a large dataset and improve forecast accuracy over single-model forecasting.
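The combination described above can be sketched, in the spirit of Zhang's hybrid approach, by treating a series as a linear part plus a nonlinear part: fit a linear model first, then model what it missed (its residuals) with a nonlinear method, and add the two forecasts together. This is a minimal sketch under stated assumptions, not the hybrid model the author intends to build. An AR(1) fitted by least squares stands in for ARIMA, and a simple k-nearest-neighbour regression on lagged residuals stands in for the neural network. All function names and the `lags` and `k` parameters are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Linear component: fit y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    return mean_y - phi * mean_x, phi


def linear_residuals(series: List[float], c: float, phi: float) -> List[float]:
    """Residuals e_t = y_t - (c + phi * y_{t-1}): what the linear model missed."""
    return [y - (c + phi * y_prev) for y_prev, y in zip(series[:-1], series[1:])]


def knn_residual_forecast(residuals: List[float], lags: int = 2, k: int = 3) -> float:
    """Nonlinear component (stand-in for a neural network): average the residual that
    followed each of the k past windows most similar to the latest `lags` residuals."""
    if len(residuals) <= lags:
        return 0.0  # too little history: fall back to the linear forecast alone
    target = residuals[-lags:]
    candidates = []
    for end in range(lags, len(residuals)):
        window = residuals[end - lags:end]
        distance = sum((a - b) ** 2 for a, b in zip(window, target))
        candidates.append((distance, residuals[end]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def hybrid_forecast(series: List[float], lags: int = 2, k: int = 3) -> Tuple[float, float, float]:
    """Return (linear part, nonlinear part, combined one-step-ahead forecast)."""
    c, phi = fit_ar1(series)
    linear_part = c + phi * series[-1]
    nonlinear_part = knn_residual_forecast(linear_residuals(series, c, phi), lags, k)
    return linear_part, nonlinear_part, linear_part + nonlinear_part
```

On a series that is exactly linear, the residuals are essentially zero and the hybrid reduces to the linear forecast. When the residuals follow a repeating pattern, such as alternating signs, the nearest-neighbour step picks up the structure that the linear model left behind.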

What can be seen from the story behind this research is that it is not simply a coding study in which a postgraduate student took a large dataset, processed it, compiled it and displayed the result. Instead, this research is a path of discovery across the fields of geostatistics, economics and computer science, using a model that had never been applied in a spatial manner to process economic data and predict a geography that had never been plotted before, with the objective of helping users in their busy lives prepare as best they can for the near future.

Personally, I enjoyed every minute of this research; the deeper I reached into the world of geostatistics, the more passionate I became. I would like to thank my family for their constant support throughout my academic studies, and everyone at the National Centre for Geocomputation for creating a wonderful and positive learning environment. Finally, I would like to thank my publisher and good friend Dr. Niall Mc Elwee at The Book Hub Publishing Group for all his help during my undergraduate, postgraduate and (hopefully!) future academic career, and for his foresight in developing his Clinic Series Encouraging New Writing.

*Caoimhín’s first ebook will be available from Book Hub Publishing on April 2nd 2017 and will be placed on the Amazon and Book Hub Publishing websites.