Guangxi Sciences

Cite this article:

WEI Xingbei, LI Taoshen, XU Jia, LV Pin, YANG Ning. QJoin: Quality-driven Join Processing Technique over Out-of-Order Data Streams[J]. Guangxi Sciences, 2020, 27(3): 266-275.


QJoin: Quality-driven Join Processing Technique over Out-of-Order Data Streams
WEI Xingbei¹, LI Taoshen¹,², XU Jia¹,², LV Pin¹,², YANG Ning¹
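(1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China; 2. Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi 530004, China)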

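Abstract:

Out-of-order arrival in data streams causes processing results to be lost, which makes stream analysis and processing considerably harder. This study examines quality-driven join processing over out-of-order data streams and proposes a quality-driven join processing technique for such streams (QJoin). QJoin uses buffer storage and a symmetric join strategy to ensure that each stream tuple is analyzed and processed immediately, which reduces the average waiting time of stream tuples. In addition, following the quality-driven principle, QJoin adaptively adjusts and optimizes the size of the in-memory buffer according to statistics collected during the most recent stage of join processing. In this way it reduces the amount of historical data cached in memory while still meeting the user's result-quality requirement, and it preserves the completeness of join processing for late tuples as far as possible. Experiments on a real data set show that, compared with the traditional out-of-order stream processing technique MP-K-slack, QJoin processes stream tuples immediately and significantly reduces memory overhead while meeting the user's result-quality requirement.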

Key words: quality-driven; join processing; out-of-order data streams; storage overhead; stream tuples; cache

DOI: 10.13656/j.cnki.gxkx.20200707.004

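Funding: Supported by the National Natural Science Foundation of China (61402494) and the General Program of the Guangxi Natural Science Foundation (2019JJA170045).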

QJoin: Quality-driven Join Processing Technique over Out-of-Order Data Streams

WEI Xingbei¹, LI Taoshen¹,², XU Jia¹,², LV Pin¹,², YANG Ning¹

(1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China; 2. Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi 530004, China)

Abstract:

Out-of-order arrival in data streams causes processing results to be lost, which poses great challenges for data stream analysis and processing. This study explores quality-driven join processing over out-of-order data streams and proposes a technique named QJoin. QJoin adopts cache storage and a symmetric join strategy so that each arriving stream tuple is analyzed and processed immediately, thereby reducing the average waiting time of stream tuples. Meanwhile, following the quality-driven principle, QJoin collects statistics during the most recent stage of join processing and uses them to adaptively adjust the size of the memory cache. This reduces the amount of historical data the system caches in memory while still meeting the user's result-quality requirement, and preserves the completeness of join processing for late tuples as far as possible. Experimental results on a real data set show that, compared with the traditional out-of-order data stream processing technique MP-K-slack, QJoin processes stream tuples in real time and significantly reduces the memory overhead of the system while meeting the user's result-quality requirement.

Key words: quality-driven; join processing; out-of-order data streams; storage consumption; stream tuples; cache
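
The abstract does not give QJoin's algorithm, so the following is only a minimal Python sketch of the general idea it describes: a symmetric join that probes the opposite stream's buffer as soon as a tuple arrives, plus an extra amount of retained history that lets late tuples still find their partners. All names here (`StreamTuple`, `SymmetricJoin`, `retention`, `adapt_retention`, `run`) are hypothetical, and the adjustment rule is a simple placeholder, not the statistics-based policy of the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamTuple:
    ts: int    # event timestamp (assumed non-negative)
    key: str   # join attribute
    side: str  # "L" or "R": which input stream the tuple belongs to


@dataclass
class SymmetricJoin:
    window: int         # two tuples join if their timestamps differ by <= window
    retention: int = 0  # extra history kept for late tuples (hypothetical knob)
    max_ts: int = 0     # largest timestamp seen so far on either stream
    buffers: dict = field(default_factory=lambda: {"L": [], "R": []})

    def process(self, t):
        """Probe the opposite buffer immediately, then store t and evict old history."""
        other = "R" if t.side == "L" else "L"
        results = [(t, u) if t.side == "L" else (u, t)
                   for u in self.buffers[other]
                   if u.key == t.key and abs(u.ts - t.ts) <= self.window]
        self.buffers[t.side].append(t)
        self.max_ts = max(self.max_ts, t.ts)
        # Tuples older than this horizon are dropped; a larger retention keeps
        # more history in memory but lets later-arriving tuples still match.
        horizon = self.max_ts - self.window - self.retention
        for side in ("L", "R"):
            self.buffers[side] = [u for u in self.buffers[side] if u.ts >= horizon]
        return results


def adapt_retention(retention, observed_quality, target_quality, step=1):
    """Placeholder policy: grow the buffer when quality is below target, else shrink."""
    if observed_quality < target_quality:
        return retention + step
    return max(0, retention - step)


def run(retention):
    """Count join results for a small stream whose last tuple arrives late."""
    join = SymmetricJoin(window=2, retention=retention)
    stream = [StreamTuple(1, "a", "L"), StreamTuple(2, "a", "R"),
              StreamTuple(10, "b", "R"), StreamTuple(3, "a", "L")]  # ts=3 is late
    return sum(len(join.process(t)) for t in stream)


print(run(0))   # 1: the late tuple's partner was already evicted
print(run(10))  # 2: the retained history lets the late tuple join
```

The two runs illustrate the trade-off named in the abstract: with no retained history the late tuple's result is lost, while a larger buffer recovers it at the cost of memory. A quality-driven scheme would pick the smallest retention that still meets the user's result-quality target.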
