
A blog post on stream data processing

Published: 2021-01-07 16:31:06 Category: Big Data Source: compiled from the web
Summary: The world beyond batch: Streaming 101 A high-level tour of modern data-processing concepts. By Tyler Akidau August 5, 2015 Three women wading in a stream gathering leeches (source: Wellcome Library, London). Editor's note: This is

It’s worth noting: these algorithms typically do have some element of time in their design (e.g., some sort of built-in decay). And since they process elements as they arrive, that element of time is usually processing-time based. This is particularly important for algorithms that provide some sort of provable error bounds on their approximations. If those error bounds are predicated on data arriving in order, they mean essentially nothing when you feed the algorithm unordered data with varying event-time skew. Something to keep in mind.

Approximation algorithms themselves are a fascinating subject, but as they are essentially another example of time-agnostic processing (modulo the temporal features of the algorithms themselves), they’re quite straightforward to use, and thus not worth further attention given our current focus.

Windowing

The remaining two approaches for unbounded data processing are both variations of windowing. Before diving into the differences between them, I should make it clear exactly what I mean by windowing, since I’ve only touched on it briefly. Windowing is simply the notion of taking a data source (either unbounded or bounded) and chopping it up along temporal boundaries into finite chunks for processing. The following diagram shows three different windowing patterns:

Figure 8: Example windowing strategies. Each example is shown for three different keys, highlighting the difference between aligned windows (which apply across all the data) and unaligned windows (which apply across a subset of the data). Image: Tyler Akidau.

  • Fixed windows: Fixed windows slice up time into segments with a fixed-size temporal length. Typically (as in Figure 8), the segments for fixed windows are applied uniformly across the entire data set, which is an example of aligned windows. In some cases, it’s desirable to phase-shift the windows for different subsets of the data (e.g., per key) to spread window completion load more evenly over time, which instead is an example of unaligned windows, since they vary across the data.
  • Sliding windows: A generalization of fixed windows, sliding windows are defined by a fixed length and a fixed period. If the period is less than the length, then the windows overlap. If the period equals the length, you have fixed windows. And if the period is greater than the length, you have a weird sort of sampling window that only looks at subsets of the data over time. As with fixed windows, sliding windows are typically aligned, though they may be unaligned as a performance optimization in certain use cases. Note that the sliding windows in Figure 8 are drawn as they are to give a sense of sliding motion; in reality, all five windows would apply across the entire data set.
  • Sessions: An example of dynamic windows, sessions are composed of sequences of events terminated by a gap of inactivity greater than some timeout. Sessions are commonly used for analyzing user behavior over time, by grouping together a series of temporally related events (e.g., a sequence of videos viewed in one sitting). Sessions are interesting because their lengths cannot be defined a priori; they are dependent upon the actual data involved. They’re also the canonical example of unaligned windows, since sessions are practically never identical across different subsets of data (e.g., different users).
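The three window types above can be sketched in a few lines of Python. This is only a minimal illustration, not code from any particular streaming system: the function names (`fixed_window`, `sliding_windows`, `sessions`), the use of plain integer timestamps, and the choice to align window starts to multiples of the size or period are all assumptions made for the example.

```python
def fixed_window(ts, size):
    """Return the [start, end) fixed window of length `size` that contains `ts`."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, length, period):
    """Return every [start, start + length) window that contains `ts`.

    Window starts are assumed to fall on multiples of `period`.
    """
    windows = []
    start = ts - ts % period  # latest window start at or before ts
    while start > ts - length:  # window still covers ts
        windows.append((start, start + length))
        start -= period
    return sorted(windows)


def sessions(timestamps, gap):
    """Split one key's timestamps into sessions.

    A new session begins whenever the inactivity between two consecutive
    events is greater than `gap`.
    """
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)  # still within the timeout: same session
        else:
            result.append([ts])  # gap exceeded (or first event): new session
    return result


print(fixed_window(7, 5))                     # (5, 10)
print(sliding_windows(7, length=4, period=2)) # [(4, 8), (6, 10)]: overlapping
print(sliding_windows(7, length=4, period=4)) # [(4, 8)]: same as fixed windows
print(sliding_windows(7, length=2, period=5)) # []: sampling windows skip ts=7
print(sessions([1, 2, 3, 10, 11, 30], gap=5)) # [[1, 2, 3], [10, 11], [30]]
```

Note that `sessions` needs to see the actual data before it can say where any window begins or ends, whereas the fixed and sliding windows are determined by the timestamp alone. Running `sessions` separately per key (e.g., per user) yields different boundaries for each key, which is what makes sessions unaligned.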

The two domains of time discussed, processing time and event time, are essentially the two we care about[2]. Windowing makes sense in both domains, so we’ll look at each in detail and see how they differ. Since processing-time windowing is vastly more common in existing systems, I’ll start there.

Windowing by processing time

Figure 9: Windowing into fixed windows by processing time. Data are collected into windows based on the order they arrive in the pipeline. Image: Tyler Akidau.
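As a rough illustration of what windowing by processing time means in practice, the sketch below groups the same events into fixed windows twice: once by the time they arrived in the pipeline and once by the time they occurred. The event records, their field names (`event`, `arrival`, `value`), and the helper `window_by` are hypothetical and exist only for this example.

```python
from collections import defaultdict


def window_by(events, size, time_of):
    """Group event values into fixed windows of length `size`.

    `time_of` picks which timestamp to window on; the result maps each
    window start to the values that landed in that window.
    """
    windows = defaultdict(list)
    for event in events:
        ts = time_of(event)
        windows[ts - ts % size].append(event["value"])
    return dict(windows)


# Hypothetical events, listed in arrival order. "event" is when the thing
# happened; "arrival" is when the pipeline observed it.
events = [
    {"value": "a", "event": 1, "arrival": 2},
    {"value": "b", "event": 12, "arrival": 13},
    {"value": "c", "event": 4, "arrival": 15},  # happened early, arrived late
]

by_processing_time = window_by(events, 10, lambda e: e["arrival"])
by_event_time = window_by(events, 10, lambda e: e["event"])

print(by_processing_time)  # {0: ['a'], 10: ['b', 'c']}
print(by_event_time)       # {0: ['a', 'c'], 10: ['b']}
```

With processing-time windows, the delayed event `c` is grouped with whatever else arrived around the same time, regardless of when it actually happened.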
