Posts Tagged ‘Data Storage and Management’

Machine learning is still a pipe dream for most organizations, with Gartner estimating that fewer than 15 percent of enterprises successfully get machine learning into production. Even so, companies need to start experimenting now with machine learning so that they can build it into their DNA.

Easy? Not even close, says Ted Dunning, chief application architect at MapR, but “anybody who thinks that they can just buy magic bullets off the shelf has no business” buying machine learning technology in … Read the rest

Apache Kafka is on a roll. Last year it registered a 260 percent jump in developer popularity, as Redmonk’s Fintan Ryan highlights, a number that has only ballooned since then as IoT and other enterprise demands for real-time, streaming data become common. Kafka was hatched at LinkedIn, and its founding engineering team spun out to form Confluent, which has been a primary developer of the Apache project ever since.

But not the only one. Indeed, given the rising importance of … Read the rest

No one doubts that software engineering shapes every last facet of our 21st century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with … Read the rest

Serverless computing may be the hottest thing in cloud computing today, but what, exactly, is it? In this two-part article you’ll get started with serverless computing: what it is, why it’s considered disruptive to traditional cloud computing, and how you might find yourself using it in Java-based programming. Following the overview, you’ll get a tutorial introduction to AWS Lambda, which is considered by many the premier Java-based solution for serverless computing today. In Part 1, you’ll use AWS Lambda … Read the rest

As with all relational databases, MySQL can prove to be a complicated beast, one that can crawl to a halt at a moment’s notice, leaving your applications in the lurch and your business on the line.

The truth is, common mistakes underlie most MySQL performance problems. To ensure your MySQL server hums along at top speed, providing stable and consistent performance, it is important to eliminate these mistakes, which are often obscured by some subtlety in your workload or a … Read the rest

Scaling a relational database isn’t easy. Scaling a relational database out to multiple replicas and regions over a network while maintaining strong consistency, without sacrificing performance, is really hard.

How hard? The CAP Theorem says that you can only have two of the following three properties: consistency, 100 percent availability, and tolerance to network partitions.

A network partition is a break that blocks all possible paths between some two points on the network. Partitions do happen, even if you … Read the rest

Everyone wants faster database queries, and both SQL developers and DBAs can turn to many time-tested methods to achieve that goal. Unfortunately, no single method is foolproof or ironclad. But even if there is no right answer to tuning every query, there are plenty of proven do’s and don’ts to help light the way. While some are RDBMS-specific, most of these tips apply to any relational database.

Whether you’re coding on SQL Server, Oracle, DB2, Sybase, MySQL, or some other … Read the rest

With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.

Structured Streaming, as that feature is called, allows Spark to process streams of data in ways that are native to Spark’s batch-based data-handling metaphors. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.

Structured Streaming in 2.2 benefits … Read the rest

DynamoDB, a fully managed NoSQL database, is an impressive piece of technology, and it’s amazing that AWS has opened it for the entire world to use. What took millions of dollars in R&D to build – a product that services millions of queries per second with low latency – can be effectively rented for dollars per hour by anyone with a credit card. For those who need a key-value store that can store massive amounts of data reliably, there aren’t … Read the rest

“The right tool for the right job.” If such wisdom holds true anywhere, it certainly holds true for the database a developer picks for a given application. Document databases, part of the family of data products collectively referred to as “NoSQL,” are for developers who want to focus on their application rather than the database technology.

With a document database, data is not stored in tables with distinct column types. Instead, it’s stored in freeform “documents” with … Read the rest