Posts Tagged ‘Data Analytics’

With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.

Structured Streaming, as that feature is called, allows Spark to process streams of data in ways that are native to Spark’s batch-based data-handling metaphors. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.

Structured Streaming in 2.2 benefits … Read the rest

If there is one subset of machine learning that spurs the most excitement, that seems most like the intelligence in artificial intelligence, it’s deep learning. Deep learning frameworks—aka deep neural networks—power complex pattern-recognition systems that provide everything from automated language translation to image identification.

Deep learning holds enormous promise for analyzing unstructured data. There are just three problems: It’s hard to do, it requires large amounts of data, and it uses lots of processing power. Naturally, great minds are at … Read the rest

It’s tempting to think of machine learning as a magic black box. In goes the data; out come predictions. But there’s no magic in there—just data and algorithms, and models created by processing the data through the algorithms.

If you’re in the business of deriving actionable insights from data through machine learning, it helps for the process not to be a black box. The more you know what’s inside the box, the better you’ll understand every step of the process … Read the rest

An aggregate in mathematics is defined as a “collective amount, sum, or mass arrived at by adding or putting together all components, elements, or parts of an assemblage or group without implying that the resulting total is whole.” Aggregation has many uses in data science (examples include log aggregation, spatial aggregation, and network aggregation), but it always pertains to some form of summation or collection. In this article, we’ll look at the mechanics of aggregation in Apache Spark, a top-level … Read the rest

The more cores you can use, the better — especially with big data. But the easier a big data framework is to work with, the harder it is for the resulting pipelines, such as TensorFlow plus Apache Spark, to run in parallel as a single unit.

Researchers from MIT CSAIL, the home of envelope-pushing big data acceleration projects like Milk and Tapir, have paired with the Stanford InfoLab to create a possible solution. Written in the Rust language, Weld … Read the rest

Perhaps the most positive technical theme of 2016 was the long-delayed triumph of artificial intelligence, machine learning, and in particular deep learning. In this article we’ll discuss what that means and how you might make use of deep learning yourself.

Perhaps you noticed in the fall of 2016 that Google Translate suddenly went from producing, on average, word salad with a vague connection to the original language to emitting polished, coherent sentences more often than not, at least … Read the rest

Imagine if the files, processes, and events in your entire network of Windows, macOS, and Linux endpoints were recorded in a database in real time. Finding malicious processes, software vulnerabilities, and other evil artifacts would be as easy as asking the database. That’s the power of OSquery, a Facebook open source project that makes sifting through system and process information to uncover security issues as simple as writing a SQL query.
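To make the “ask the database” idea concrete, here is a minimal sketch that imitates this style of query with Python’s built-in sqlite3 module. It does not talk to OSquery itself. The `processes` table, its columns (`pid`, `name`, `path`, `on_disk`), and the sample rows are assumptions made up for illustration, loosely modeled on the kind of process table such a tool exposes, and the helper name `find_processes_without_binary` is hypothetical.

```python
import sqlite3

def find_processes_without_binary(rows):
    """Return (pid, name) for processes whose executable is no longer on disk.

    `rows` is an iterable of (pid, name, path, on_disk) tuples. The schema is
    an illustrative stand-in, not a guaranteed match for any real tool.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT, on_disk INTEGER)"
    )
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", rows)
    # A running process whose binary has been deleted is a common red flag.
    result = conn.execute(
        "SELECT pid, name FROM processes WHERE on_disk = 0 ORDER BY pid"
    ).fetchall()
    conn.close()
    return result

# Made-up sample data standing in for a live process listing.
SAMPLE_ROWS = [
    (101, "sshd", "/usr/sbin/sshd", 1),
    (202, "dropper", "/tmp/.x/dropper", 0),
    (303, "cron", "/usr/sbin/cron", 1),
]

suspicious = find_processes_without_binary(SAMPLE_ROWS)
print(suspicious)
```

The point of the sketch is the shape of the question: once system state looks like a table, a one-line `SELECT` with a `WHERE` clause replaces a custom script that walks the process list.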

Facebook ported OSquery to Windows in 2016, finally … Read the rest

In the beginning, life in the cloud was simple. Type in your credit card number and—voilà—you had root on a machine you didn’t have to unpack, plug in, or bolt into a rack.

That has changed drastically. The cloud has grown so complex and multifunctional that it’s hard to jam all the activity into one word, even a word as protean and unstructured as “cloud.” There are still root logins on machines to rent, but there are also services for … Read the rest

Artificial intelligence is affecting everything from automobiles to health care to home automation and even sports. It’s also going to have a measurable impact on software development, with developers becoming more like data scientists, an AI official with Nvidia believes.

AI and deep learning will mean changes in how software is written, said Jim McHugh, vice president and general manager for Nvidia’s DGX-1 supercomputer, which is used in deep learning and accelerated analytics. The long-standing paradigm of developers spending months … Read the rest