Archive for the ‘Data Storage and Management’ Category

Serverless computing may be the hottest thing in cloud computing today, but what, exactly, is it? This two-part tutorial starts with an overview of serverless computing: what it is, why it’s considered disruptive to traditional cloud computing, and how you might use it in Java-based programming.

Following the overview, you’ll get a hands-on introduction to AWS Lambda, which many consider the premier Java-based solution for serverless computing today. In Part 1, you’ll use AWS Lambda to build, … Read the rest

Until very recently, when you shopped for a database you had to choose: Scalability or consistency? SQL databases such as MySQL guarantee strong consistency, but don’t scale well horizontally. (Manual sharding for scalability is no one’s idea of fun.) NoSQL databases such as MongoDB scale beautifully, but offer only eventual consistency. (“Wait long enough, and you can read the right answer”—which isn’t any way to do financial transactions.)

Google Cloud Spanner, a fully managed relational database service running on … Read the rest

The modern ethos is that all data is valuable, that it should be stored forever, and that machine learning will one day magically find the value in it. You’ve probably seen that EMC picture about how there will be 44 zettabytes of data by 2020? Remember how everyone had Fitbits and Jawbone Ups for about a minute? Now Jawbone is out of business. Have you considered that this “all data is valuable” fad might be the corporate equivalent? Maybe we shouldn’t take a … Read the rest

PostgreSQL (aka Postgres) is old as dirt, yet over the past five years it has panned out as pure gold. MongoDB got the billion-dollar IPO and AWS launched the mind-bendingly cool Aurora Serverless, but it’s PostgreSQL that keeps having its moment—again and again and again.

Now the world’s fourth most popular database, according to DB-Engines’ multicomponent ranking, PostgreSQL has a ways to go before it surpasses Oracle, MySQL, and Microsoft SQL Server. Yet at its current pace, … Read the rest

From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, … Read the rest

The Apache Software Foundation has added a new machine learning project to its roster: Apache PredictionIO, an open source version of a project originally devised by a subsidiary of Salesforce.

What PredictionIO does for machine learning and Spark

Apache PredictionIO is built atop Spark and Hadoop, and serves Spark-powered predictions from data using customizable templates for common tasks. Apps send data to PredictionIO’s event server to train a model, then query the engine for predictions based on the model.
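The send-then-query flow described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: it assumes an event server on localhost port 7070 and a deployed engine on port 8000, uses a placeholder access key, and borrows the "rate" event and the user/num query fields from a recommendation-style template. Your template may expect different fields. The helper names build_event_request and build_query_request are hypothetical, and the functions only build the HTTP requests, so nothing is sent until you pass them to urllib.request.urlopen.

```python
import json
from urllib.request import Request

# Assumed addresses and key; adjust for your own deployment.
EVENT_SERVER = "http://localhost:7070"   # assumed event server address
ENGINE_SERVER = "http://localhost:8000"  # assumed deployed engine address
ACCESS_KEY = "YOUR_ACCESS_KEY"           # hypothetical placeholder


def build_event_request(user_id, item_id, rating):
    """Build a POST that records one 'user rates item' event for training."""
    event = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    return Request(
        f"{EVENT_SERVER}/events.json?accessKey={ACCESS_KEY}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(user_id, num):
    """Build a POST that asks the engine for `num` predictions for a user."""
    query = {"user": user_id, "num": num}  # field names depend on the template
    return Request(
        f"{ENGINE_SERVER}/queries.json",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually talk to a running server, pass either request to urllib.request.urlopen and decode the JSON response body.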

Spark, MLlib, … Read the rest

What is a database? Once upon a time, it was simple. The database was a modern Bob Cratchit putting data in tables made up of very straight columns filled with one row per entry. Long, endless rectangles of information stretching on into the future.

The relational database has been the bedrock of modern computing. The vast majority of websites are just a bunch of CSS lipstick painted on top of SQL. Everything that makes us special is just another row … Read the rest

Almost any application needs some form of persistence: a way to store data outside of the application for safekeeping. The most basic way is to write data to the file system, but that can quickly become a slow and unwieldy way to solve the problem. A full-blown database provides a powerful way to index and retrieve data, but may also be overkill. Sometimes all you need is a quick way to take a freeform piece of information, associate it with … Read the rest

Today, Structured Query Language is the standard means of manipulating and querying data in relational databases, though each product adds its own proprietary extensions. The ease and ubiquity of SQL have even led the creators of many “NoSQL” or non-relational data stores, such as Hadoop, to adopt subsets of SQL or come up with their own SQL-like query languages.

But SQL wasn’t always the “universal” language for relational databases. From the beginning (circa 1980), SQL had certain strikes … Read the rest

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time, not to mention the loss of talented technology professionals who become fed up with frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management … Read the rest