Documents. CDE Universitat de València

An overview of methods for treating selectivity in big data sources
The official statistics community is now seriously considering big data as a significant source for producing statistics. It holds the potential for providing faster, cheaper, more detailed and completely new types of statistics. However, the use of big data also brings several challenges. One of them is the non-probabilistic character of most big data sources, as they were very often not designed to produce statistics. The resulting selectivity bias is therefore a major concern when using big data. This paper presents a statistical approach to big data, searching for a definition that is meaningful from the statistical point of view and identifying its main statistical characteristics. It then argues that big data sources share many characteristics with Internet opt-in panel surveys and proposes these as a reference for addressing selectivity and coverage problems in big data. Coverage and the self-selection process are briefly discussed for mobile network data, Twitter, Google Trends and Wikipedia page views data. An overview of methods which can be used to address selectivity and eliminate, or mitigate, bias is then presented, covering both methods applied at the individual level, i.e. at the level of the statistical unit, and at the domain level, i.e. at the level of the produced statistics. Finally, the applicability of the methods to the various big data sources is briefly discussed and a framework for adjusting for selectivity in big data is proposed.
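One standard family of adjustments for self-selected samples is post-stratification: reweight the sample so that each group's contribution matches its known share in the target population. A minimal sketch of the idea, using invented group labels, population shares and outcome values (none of which are from the paper):

```python
# Minimal post-stratification sketch for a self-selected sample.
# All data below are hypothetical illustration values.
from collections import Counter

def poststratified_mean(values, groups, population_shares):
    """Reweight observations so each group's weight share equals
    its known share in the target population."""
    n = len(values)
    sample_shares = {g: c / n for g, c in Counter(groups).items()}
    weights = [population_shares[g] / sample_shares[g] for g in groups]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# "young" users are over-represented (4 of 6 observations) but make
# up only half of the population; weighting corrects the naive mean.
values = [10, 12, 11, 13, 30, 32]
groups = ["young"] * 4 + ["old"] * 2
naive = sum(values) / len(values)            # 18.0
adjusted = poststratified_mean(values, groups,
                               {"young": 0.5, "old": 0.5})  # 21.25
```

This corrects the sample composition only along the stratifying variable; bias driven by unobserved selection mechanisms is not removed, which is why the paper's overview also covers model-based, unit-level methods.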

Analysis of the most recent modelling techniques for big data with particular attention to Bayesian ones

In this report we describe various methods suited for the analysis of linear models with a very large number of explanatory variables, with a special emphasis on Bayesian approaches. We next consider some non-parametric and/or non-linear methods suited for applications with big data, such as random trees, random forests, cluster analysis, deep learning and neural networks. Finally, we survey techniques for summarizing the information in large (possibly sparse) datasets, forecast combination approaches, and techniques for the analysis of large mixed-frequency datasets.

Filtering techniques for big data and big data based uncertainty indexes

This work is concerned with outlier detection, signal extraction and decomposition techniques for big data. In the first part, also with the use of a numerical example, we investigate how the presence of outliers in big unstructured data might affect the aggregated time series. Any outliers must be removed prior to aggregation, and the resulting time series should be checked again for outliers at the lower frequency. In the second part, we explore the issue of seasonality, continuing the numerical example. Seasonal patterns are not easily identified in the high-frequency series but are evident in the aggregated time series. Finally, we construct uncertainty indexes based on Google Trends and compare them to the corresponding Reuters-based indexes, also checking for outliers and seasonal components.
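The order of operations described here — clean the high-frequency series first, then aggregate, then screen again at the lower frequency — can be sketched as follows. The daily values, the 4-observation aggregation period and the 3.5 threshold on MAD-scaled deviations are illustrative assumptions, not taken from the paper:

```python
# Sketch: remove high-frequency outliers before aggregating, so a
# single spike does not contaminate the aggregated series.
import statistics

def mad_outlier_mask(xs, threshold=3.5):
    """Flag points whose distance from the median exceeds
    `threshold` robust (MAD-based) standard deviations."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        return [False] * len(xs)
    return [abs(x - med) / (1.4826 * mad) > threshold for x in xs]

def aggregate(xs, period):
    """Sum consecutive blocks of `period` observations."""
    return [sum(xs[i:i + period]) for i in range(0, len(xs), period)]

daily = [10, 11, 9, 10, 500, 10, 11, 10, 9, 10, 11, 10]  # one spike
mask = mad_outlier_mask(daily)
cleaned = [x for x, bad in zip(daily, mask) if not bad]
raw_agg = aggregate(daily, 4)       # spike inflates one block
clean_agg = aggregate(cleaned, 4)   # aggregate after cleaning
```

The same screening (`mad_outlier_mask`) would then be rerun on the aggregated series, as a level shift or seasonal effect visible only at the lower frequency may produce new outliers there.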

Tourism statistics: Early adopters of big data?

This paper, originally prepared for the 6th UNWTO International Conference on Tourism Statistics, gives an overview of the different sources of big data and their potential relevance in compiling tourism statistics. It discusses the opportunities and risks that the use of new sources can create: new or faster data with better geographical granularity; synergies with other areas of statistics sharing the same sources; cost efficiency; user trust; partnerships with organisations holding the data; access to personal data; continuity of access and output; quality control and independence; selectivity bias; alignment with existing concepts and definitions; the need for new skills, and so on.

The global dimension of big data and the transnational nature of companies or networks holding the data call for a discussion in an international context, even though legal and ethical issues often have a strongly local component.

With the Regulation on the free flow of non-personal data, the Commission proposes a new principle that removes data localisation requirements while guaranteeing access rights for the competent authorities for regulatory control purposes. Together with the European rules on the protection of personal data contained in the General Data Protection Regulation (GDPR), the new measures create a common European data space, a key element of the Digital Single Market Strategy. (RAPID, MEMO/17/3191, 19.9.2017)

Big data conversion techniques including their main features and characteristics

Big data have high potential for nowcasting and forecasting economic variables. However, they are often unstructured, so there is a need to transform them into a limited number of time series which efficiently summarise the information relevant for nowcasting or short-term forecasting of the economic indicators of interest. Data structuring and conversion is a difficult task, as the researcher is called upon to translate the unstructured data and summarise them in a format which is both meaningful and informative for the nowcasting exercise. In this paper we consider techniques for converting unstructured big data into structured time series suitable for nowcasting purposes. We also include several empirical examples which illustrate the potential of big data in economics. Finally, we provide a practical application based on textual data analysis, in which we exploit a huge set of about 3 million news articles to construct an economic uncertainty indicator.
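The text-to-time-series conversion behind such an indicator can be sketched in miniature: tag each article with its period, flag articles containing uncertainty-related terms, and report the per-period share of flagged articles. The articles, dates and keyword list below are invented for the example; the actual indicator in the paper is built from about 3 million news articles:

```python
# Toy conversion of unstructured text into a structured monthly
# time series (share of articles mentioning uncertainty terms).
from collections import defaultdict

UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "unpredictable"}

def uncertainty_index(articles):
    """Return, per month, the share of articles containing at
    least one uncertainty-related term."""
    hits, totals = defaultdict(int), defaultdict(int)
    for month, text in articles:
        totals[month] += 1
        if set(text.lower().split()) & UNCERTAINTY_TERMS:
            hits[month] += 1
    return {m: hits[m] / totals[m] for m in sorted(totals)}

articles = [
    ("2016-06", "markets face deep uncertainty after the vote"),
    ("2016-06", "growth steady in the second quarter"),
    ("2016-07", "outlook remains unpredictable say analysts"),
    ("2016-07", "tourism arrivals rise again"),
    ("2016-07", "no major changes expected"),
]
index = uncertainty_index(articles)  # one value per month
```

The resulting dictionary is already a regular low-frequency series that can be fed into a nowcasting model alongside official indicators; real implementations would add tokenisation, deduplication and outlet weighting on top of this skeleton.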

The European Commission has today proposed policy and legal solutions to boost the EU data economy, as part of its Digital Single Market Strategy presented in May 2015. The Commission wishes to address this problem because the EU is currently not taking full advantage of the opportunities offered by data. To remedy this situation, it is necessary to tackle unjustified restrictions on the free movement of data across borders, as well as several legal uncertainties. The Communication presented today outlines policy and legal solutions to boost the data economy in Europe. The Commission has also launched two public consultations and a debate with the Member States and stakeholders with a view to defining the next steps. (RAPID, IP/17/5, 10.1.2017)

