Guangxi Sciences

Cite this article:

YANG Ning, XU Jia, LV Pin, LI Taoshen. Distributed Aggregation Query Processing Technology for Out-of-order Data Streams Based on Hybrid Processing Model[J]. Guangxi Sciences, 2019, 26(4): 398-404.

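Distributed Aggregation Query Processing Technology for Out-of-order Data Streams Based on Hybrid Processing Model
YANG Ning^1, XU Jia^1,2, LV Pin^1,2, LI Taoshen^1,2,3
(1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004; 2. Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi 530004; 3. Nanning University, Nanning, Guangxi 530200)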

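Abstract:

Existing aggregate query processing techniques for out-of-order data streams cannot reduce query processing latency while also guaranteeing the eventual correctness of aggregate query results. To address this limitation, this study designs a distributed aggregate query processing technique for out-of-order data streams that combines a distributed stream processing module with a distributed batch processing module. On the one hand, the technique adaptively optimizes the buffer size used by the stream processing module according to a user-specified constraint on result quality, thereby keeping the query processing latency of stream processing as low as possible. On the other hand, it uses the historical stream data backed up in a distributed data storage system to process extremely late stream tuples in batch mode, thereby guaranteeing the eventual correctness of the aggregate query results. Tests on a real out-of-order data stream dataset show that, compared with the best existing cache-based out-of-order data stream processing technique, the proposed technique has significant advantages in average query processing latency, query result accuracy, and system scalability.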

Keywords: out-of-order data streams; hybrid processing model; aggregate queries; distributed query processing

DOI: 10.13656/j.cnki.gxkx.20190808.010

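Funding: Supported by the "Guangxi Bagui Scholars" Special Fund, the Key Project of the Guangxi Higher Education Undergraduate Teaching Reform Program (2017JGZ10), the Guangxi University Scientific Research Fund (XGZ141182, XGZ150322), and the Innovation Project of Guangxi Graduate Education (YCSW2018036).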

Distributed Aggregation Query Processing Technology for Out-of-order Data Streams Based on Hybrid Processing Model

YANG Ning^1, XU Jia^1,2, LV Pin^1,2, LI Taoshen^1,2,3

(1.School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, 530004, China;2.Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi, 530004, China;3.Nanning University, Nanning, Guangxi, 530200, China)

Abstract:

The existing aggregation query processing techniques for out-of-order data streams cannot guarantee the final correctness of the aggregated query results while reducing the query processing delay. To overcome this limitation, this paper designs a distributed aggregation query processing technique for out-of-order data streams that is based on both the distributed stream processing model and the distributed batch processing model. On the one hand, the proposed technique optimizes the buffer sizes used by the distributed stream processing model based on a user-given constraint on query result quality, thereby reducing the query processing delay of the stream processing as much as possible. On the other hand, it processes extremely late tuples in batch mode, using the historical stream data backed up in the distributed data storage system, so as to ensure the final correctness of the aggregated query results. Tests on a real out-of-order data stream dataset show that, compared with the current best cache-based out-of-order data stream processing technique, the proposed technique has significant advantages in average query processing delay, query result precision, and system scalability.

Key words: out-of-order data streams; hybrid processing model; aggregated queries; distributed query processing
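
The hybrid idea summarized in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the class `HybridWindowSum`, its fields, the tumbling-window SUM query, and the fixed `slack` value are all assumptions made for illustration. The abstract describes adapting the buffer size to a user-given quality constraint, which this sketch does not attempt, and a plain Python list stands in for the distributed data storage system.

```python
from collections import defaultdict


class HybridWindowSum:
    """Tumbling-window SUM over an out-of-order stream (illustrative sketch only)."""

    def __init__(self, window_size, slack):
        self.window_size = window_size
        self.slack = slack            # fixed buffering slack; assumed, not adaptive
        self.open = defaultdict(int)  # window start -> partial sum, not yet emitted
        self.emitted = {}             # window start -> result from the stream path
        self.stale = set()            # emitted windows that later received tuples
        self.history = []             # stand-in for the backed-up historical stream
        self.max_ts = None            # largest timestamp seen so far

    def on_tuple(self, ts, value):
        """Stream path: buffer tuples and emit a window once the watermark passes it."""
        self.history.append((ts, value))        # every tuple is backed up
        start = ts - ts % self.window_size
        if start in self.emitted:
            self.stale.add(start)               # extremely late: leave to the batch path
        else:
            self.open[start] += value
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        watermark = self.max_ts - self.slack
        for s in sorted(self.open):
            if s + self.window_size <= watermark:
                self.emitted[s] = self.open.pop(s)

    def batch_repair(self):
        """Batch path: recompute stale windows from the backed-up history."""
        corrected = {}
        for s in self.stale:
            end = s + self.window_size
            corrected[s] = sum(v for ts, v in self.history if s <= ts < end)
        self.emitted.update(corrected)
        self.stale.clear()
        return corrected


agg = HybridWindowSum(window_size=10, slack=5)
for ts, value in [(1, 1), (12, 2), (3, 3), (16, 4), (8, 5), (27, 1)]:
    agg.on_tuple(ts, value)
print(agg.emitted)         # low-latency results; window 0 misses the late tuple (8, 5)
print(agg.batch_repair())  # corrected result for window 0
```

In this sketch a smaller `slack` emits windows sooner but leaves more windows for the batch path to correct, which is the latency versus correctness trade-off the abstract refers to.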
