Archive for the ‘Data Analytics’ Category

Every day human beings eat, sleep, work, play, and produce data—lots and lots of data. According to IBM, the human race generates 2.5 quintillion (25 billion billion) bytes of data every day. That’s the equivalent of a stack of DVDs reaching to the moon and back, and encompasses everything from the texts we send and photos we upload to industrial sensor metrics and machine-to-machine communications.

That’s a big reason why “big data” has become such a common catch phrase. … Read the rest

You’ve probably encountered the term “machine learning” more than a few times lately. Often used interchangeably with artificial intelligence, machine learning is in fact a subset of AI, both of which can trace their roots to MIT in the late 1950s.

Machine learning is something you probably encounter every day, whether you know it or not. The Siri and Alexa voice assistants, Facebook’s and Microsoft’s facial recognition, Amazon and Netflix recommendations, the technology that keeps self-driving cars from crashing into … Read the rest

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time—not to mention the loss of talented technology professionals who become fed up at frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management … Read the rest

There’s now a JavaScript library for executing neural networks inside a webpage, using the hardware-accelerated graphics API available in modern web browsers.

Developed by a team of MIT graduate students, TensorFire can run TensorFlow-style machine learning models on any GPU, without requiring the GPU-specific middleware typically needed by machine learning libraries such as Keras-js.

TensorFire is another step towards making machine learning available to the broadest possible audience, using hardware and software people are already likely to possess, and … Read the rest

No one doubts that software engineering shapes every last facet of our 21st century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with … Read the rest

With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.

Structured Streaming, as that feature is called, allows Spark to process streams of data in ways that are native to Spark’s batch-based data-handling metaphors. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.

Structured Streaming in 2.2 benefits … Read the rest

If there is one subset of machine learning that spurs the most excitement, that seems most like the intelligence in artificial intelligence, it’s deep learning. Deep learning frameworks—aka deep neural networks—power complex pattern-recognition systems that provide everything from automated language translation to image identification.

Deep learning holds enormous promise for analyzing unstructured data. There are just three problems: It’s hard to do, it requires large amounts of data, and it uses lots of processing power. Naturally, great minds are at … Read the rest

It’s tempting to think of machine learning as a magic black box. In goes the data; out come predictions. But there’s no magic in there—just data and algorithms, and models created by processing the data through the algorithms.

If you’re in the business of deriving actionable insights from data through machine learning, it helps for the process not to be a black box. The more you know what’s inside the box, the better you’ll understand every step of the process … Read the rest

An aggregate in mathematics is defined as a “collective amount, sum, or mass arrived at by adding or putting together all components, elements, or parts of an assemblage or group without implying that the resulting total is whole.” While there are many uses for aggregation in data science–examples include log aggregation, spatial aggregation, and network aggregation–it always pertains to some form of summation or collection. In this article, we’ll look at the mechanics of aggregation in Apache Spark, a top-level … Read the rest

The more cores you can use, the better — especially with big data. But the easier a big data framework is to work with, the harder it is for the resulting pipelines, such as TensorFlow plus Apache Spark, to run in parallel as a single unit.

Researchers from MIT CSAIL, the home of envelope-pushing big data acceleration projects like Milk and Tapir, have paired with the Stanford InfoLab to create a possible solution. Written in the Rust language, WeldRead the rest