Archive for the ‘Data Storage and Management’ Category

From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, … Read the rest

The Apache Software Foundation has added a new machine learning project to its roster: Apache PredictionIO, an open source version of a project originally devised by a subsidiary of Salesforce.

What PredictionIO does for machine learning and Spark

Apache PredictionIO is built atop Spark and Hadoop, and serves Spark-powered predictions from data using customizable templates for common tasks. Apps send data to PredictionIO’s event server to train a model, then query the engine for predictions based on the model.
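As a rough illustration of that send-events-then-query flow, here is a minimal Python sketch that only builds the two HTTP requests without sending them. The host names, ports (7070 for the event server, 8000 for the deployed engine), the "rate" event, and the query fields are assumptions modeled on a typical recommendation template; they do not come from the excerpt above, and the helper names are hypothetical.

```python
import json
from urllib.request import Request

# Assumed defaults for a local install; adjust to your own deployment.
EVENT_SERVER = "http://localhost:7070"
ENGINE = "http://localhost:8000"


def _json_post(url, body):
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_event_request(access_key, user_id, item_id, rating):
    """Request that would record one hypothetical 'rate' event for training."""
    body = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    return _json_post(f"{EVENT_SERVER}/events.json?accessKey={access_key}", body)


def build_query_request(user_id, num):
    """Request that would ask the engine for `num` predictions for a user."""
    return _json_post(f"{ENGINE}/queries.json", {"user": user_id, "num": num})


event_req = build_event_request("MY_ACCESS_KEY", "u1", "i42", 4.0)
query_req = build_query_request("u1", 4)
print(event_req.get_method(), event_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

To actually talk to a running server, each request object could be passed to `urllib.request.urlopen`; the payload shape depends on the engine template in use.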

Spark, MLlib, … Read the rest

What is a database? Once upon a time, it was simple. The database was a modern Bob Cratchit putting data in tables made up of very straight columns filled with one row per entry. Long, endless rectangles of information stretching on into the future.

The relational database has been the bedrock of modern computing. The vast majority of websites are just a bunch of CSS lipstick painted on top of SQL. Everything that makes us special is just another row … Read the rest

Almost any application needs some form of persistence: a way to store the data outside of the application for safekeeping. The most basic way is to write data to the file system, but that can quickly become a slow and unwieldy way to solve the problem. A full-blown database provides a powerful way to index and retrieve data, but may also be overkill. Sometimes all you need is a quick way to take a freeform piece of information, associate it with … Read the rest

Today, Structured Query Language is the standard means of manipulating and querying data in relational databases, though with proprietary extensions among the products. The ease and ubiquity of SQL have even led the creators of many “NoSQL” or non-relational data stores, such as Hadoop, to adopt subsets of SQL or come up with their own SQL-like query languages.

But SQL wasn’t always the “universal” language for relational databases. From the beginning (circa 1980), SQL had certain strikes … Read the rest

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time—not to mention the loss of talented technology professionals who become fed up at frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management … Read the rest

Machine learning is still a pipe dream for most organizations, with Gartner estimating that fewer than 15 percent of enterprises successfully get machine learning into production. Even so, companies need to start experimenting now with machine learning so that they can build it into their DNA.

Easy? Not even close, says Ted Dunning, chief application architect at MapR, but “anybody who thinks that they can just buy magic bullets off the shelf has no business” buying machine learning technology in … Read the rest

Apache Kafka is on a roll. Last year it registered a 260 percent jump in developer popularity, as RedMonk’s Fintan Ryan highlights, a number that has only ballooned since then as IoT and other enterprise demands for real-time, streaming data become common. Kafka was hatched at LinkedIn, and its founding engineering team spun out to form Confluent, which has been a primary developer of the Apache project ever since.

But not the only one. Indeed, given the rising importance of … Read the rest

No one doubts that software engineering shapes every last facet of our 21st-century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with … Read the rest

Serverless computing may be the hottest thing in cloud computing today, but what, exactly, is it? In this two-part article you’ll get started with serverless computing: what it is, why it’s considered disruptive to traditional cloud computing, and how you might find yourself using it in Java-based programming. Following the overview, you’ll get a tutorial introduction to AWS Lambda, which is considered by many the premier Java-based solution for serverless computing today. In Part 1, you’ll use AWS Lambda … Read the rest