July 13th, 2017

Apache Spark 2.2 gets streaming, R language boosts

Big Data, Data Analytics, Data Storage and Management, Open Source, Others, Programming. By admin.

With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.

Structured Streaming, as that feature is called, lets Spark process streams of data with the same DataFrame and Dataset APIs it uses for batch jobs, treating a stream as a table that keeps growing. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.
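To make the "stream as a growing table" idea concrete, here is a minimal pure-Python sketch. It is not Spark code and does not use Spark's API (a real job would build the query with the DataFrame API, for example via `spark.readStream`). It only illustrates the model under a simplifying assumption: each micro-batch appends rows to a conceptual input table, and the running result is updated incrementally so that it always matches what a one-off batch query over all rows seen so far would return. The names `RunningWordCount` and `batch_word_count` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


class RunningWordCount:
    """Hypothetical toy model of a streaming query over an unbounded table.

    Not Spark's implementation: it only shows that processing new rows
    incrementally can give the same answer as re-running a batch query.
    """

    def __init__(self) -> None:
        self.result: Counter = Counter()  # current "result table"
        self.rows_seen = 0                # rows appended to the input table so far

    def process_batch(self, lines: Iterable[str]) -> Counter:
        # Only the newly arrived rows are scanned; earlier state is reused.
        for line in lines:
            self.rows_seen += 1
            self.result.update(line.split())
        return Counter(self.result)  # return a snapshot copy


def batch_word_count(lines: List[str]) -> Counter:
    # The same query expressed as a one-off batch job over every row.
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    batches = [["spark streams data", "spark sql"], ["kafka feeds spark"]]
    query = RunningWordCount()
    all_rows: List[str] = []
    for batch in batches:
        all_rows.extend(batch)
        snapshot = query.process_batch(batch)
        # After each micro-batch, the streaming result equals the batch result.
        assert snapshot == batch_word_count(all_rows)
    print(query.result["spark"])  # 3
```

The design point the sketch tries to capture is that the developer writes one query, and the engine decides how to run it over data that keeps arriving.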

Structured Streaming in 2.2 benefits from several other changes besides losing its experimental designation. It can now use Apache Kafka as both a source and a sink, reading data from Kafka and writing results back to it, with lower latency for Kafka connections than before.

Kafka, itself an Apache Software Foundation project, is a distributed messaging bus widely used in streaming applications. It has typically been paired with a different stream-processing framework, Apache Storm, but Storm handles only stream processing, whereas Spark covers both batch and streaming workloads and presents simpler APIs to the developer.

