A blog post on stream data processing

Published: 2021-01-07 16:31:06 | Category: Big Data | Source: compiled from the web

Summary: The world beyond batch: Streaming 101. A high-level tour of modern data-processing concepts. By Tyler Akidau, August 5, 2015. Three women wading in a stream gathering leeches (source: Wellcome Library, London). Editor's note: This is...

[1] One which I propose is not an inherent limitation of streaming systems, but simply a consequence of design choices made in most streaming systems thus far. The efficiency delta between batch and streaming is largely the result of the increased bundling and more efficient shuffle transports found in batch systems. Modern batch systems go to great lengths to implement sophisticated optimizations that allow for remarkable levels of throughput using surprisingly modest compute resources. There’s no reason the types of clever insights that make batch systems the efficiency heavyweights they are today couldn’t be incorporated into a system designed for unbounded data, providing users flexible choice between what we typically consider to be high-latency, higher-efficiency “batch” processing and low-latency, lower-efficiency “streaming” processing. This is effectively what we’ve done with Cloud Dataflow by providing both batch and streaming runners under the same unified model. In our case, we use separate runners because we happen to have two independently designed systems optimized for their specific use cases. Long-term, from an engineering perspective, I’d love to see us merge the two into a single system that incorporates the best parts of both, while still maintaining the flexibility of choosing an appropriate efficiency level. But that’s not what we have today. And honestly, thanks to the unified Dataflow Model, it’s not even strictly necessary; so it may well never happen.
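The bundling trade-off described in this footnote can be illustrated with a toy sketch. It is not Cloud Dataflow code and uses no real runner API: the names `bundles`, `run`, and `bundle_size` are hypothetical, and a "flush" merely stands in for one trip through a shuffle transport. The same counting logic runs in both modes; only the bundle size changes, which is one way to picture a single model offering a choice between higher-efficiency and lower-latency execution.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def bundles(records: Iterable[str], bundle_size: int) -> Iterator[List[str]]:
    """Group incoming records into bundles of at most bundle_size elements."""
    bundle: List[str] = []
    for record in records:
        bundle.append(record)
        if len(bundle) == bundle_size:
            yield bundle
            bundle = []
    if bundle:  # emit the final, possibly smaller, bundle
        yield bundle


def run(records: Iterable[str], bundle_size: int) -> Tuple[Counter, int]:
    """Count records per key; return the totals and the number of flushes.

    Each bundle is pre-combined locally and then merged in one step, so the
    flush count is a rough (assumed) proxy for shuffle overhead.
    """
    totals: Counter = Counter()
    flushes = 0
    for bundle in bundles(records, bundle_size):
        totals.update(Counter(bundle))  # one merge per bundle
        flushes += 1
    return totals, flushes


events = ["a", "b", "a", "c", "a", "b"]

# "Batch" style: one large bundle, results only after all input is read.
batch_totals, batch_flushes = run(events, bundle_size=len(events))

# "Streaming" style: one record per bundle, results updated immediately.
stream_totals, stream_flushes = run(events, bundle_size=1)
```

Both calls produce identical totals, while the batch-style run needs a single flush and the streaming-style run needs one per record. Intermediate bundle sizes sit between the two, which is the kind of flexible efficiency level the footnote has in mind.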

