Parallel query processing in a polystore

Authors:

Pavlos Kranas, Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Jiménez-Peris & Marta Patiño-Martinez

Abstract

The blooming of different data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store’s native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e. joins, on top of parallel retrieval of underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language that allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration and (ii) incorporating the approach within the LeanXcale distributed query engine, thus allowing for native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can take place to improve the performance of selective joins. We evaluate the performance benefits of exploiting parallelism in combination with high expressivity and optimization through our experimental validation.
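The bind join mentioned in point (iii) can be illustrated with a minimal sketch. This is not the CloudMdsQL or LeanXcale implementation; the names `bind_join`, `fetch_customers`, `CUSTOMERS`, and `requested_keys` are hypothetical, and an in-memory list stands in for a remote data store. The idea shown is only the general one: collect the distinct join keys from one side first, then push them as a filter into the query sent to the other side, so that a selective join retrieves only the matching rows.

```python
from collections import defaultdict

# Hypothetical stand-in for a remote data store (assumption, not from the paper).
CUSTOMERS = [
    {"cust_id": 10, "name": "Ada"},
    {"cust_id": 20, "name": "Grace"},
    {"cust_id": 30, "name": "Edsger"},
    {"cust_id": 40, "name": "Barbara"},
]

# Records which keys were pushed down, to show that only matching rows are requested.
requested_keys = []


def fetch_customers(keys):
    """Simulate a native subquery filtered by the bound keys (e.g. an IN list)."""
    requested_keys.extend(sorted(keys))
    return [row for row in CUSTOMERS if row["cust_id"] in keys]


def bind_join(left_rows, fetch_right, left_key, right_key):
    """Join left_rows with the rows returned by fetch_right, binding join keys first."""
    left_rows = list(left_rows)
    # Step 1: collect the distinct join keys from the (already retrieved) left side.
    keys = {row[left_key] for row in left_rows}
    if not keys:
        return []
    # Step 2: fetch only the right-side rows that match those keys, indexed by key.
    by_key = defaultdict(list)
    for row in fetch_right(keys):
        by_key[row[right_key]].append(row)
    # Step 3: combine each left row with its matching right rows.
    return [{**l, **r} for l in left_rows for r in by_key.get(l[left_key], [])]


orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 30},
    {"order_id": 3, "cust_id": 10},
]
joined = bind_join(orders, fetch_customers, "cust_id", "cust_id")
```

In this toy setting, only the customers referenced by `orders` are requested from the simulated store, rather than the full `CUSTOMERS` collection. In a real polystore the key list would be embedded in the native query or script sent to the underlying data store.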

Published in: Springer Distributed and Parallel Databases. An International Journal of Data Science, Engineering, and Management

Read the full article here: https://link.springer.com/article/10.1007/s10619-021-07322-5

