Archive for the ‘Big Data’ Category

If you looked at TensorFlow as a deep learning framework last year and decided that it was too hard or too immature to use, it might be time to give it another look.

Since I reviewed TensorFlow r0.10 in October 2016, Google’s open source framework for deep learning has become more mature, implemented more algorithms and deployment options, and become easier to program. TensorFlow is now up to version r1.4.1 (stable version and web documentation), r1.5 (release candidate), and pre-release … Read the rest

The modern ethos is that all data is valuable, that it should be stored forever, and that machine learning will one day magically find the value in it. You’ve probably seen that EMC picture about how there will be 44 zettabytes of data by 2020? Remember how everyone had Fitbits and Jawbone Ups for about a minute? Now Jawbone is out of business. Have you considered that this “all data is valuable” fad might be the corporate equivalent? Maybe we shouldn’t take a … Read the rest

From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, … Read the rest

Every day human beings eat, sleep, work, play, and produce data: lots and lots of data. According to IBM, the human race generates 2.5 quintillion (2.5 billion billion) bytes of data every day. That’s the equivalent of a stack of DVDs reaching to the moon and back, and encompasses everything from the texts we send and photos we upload to industrial sensor metrics and machine-to-machine communications.

That’s a big reason why “big data” has become such a common catchphrase. … Read the rest

H2O, now in its third major revision, provides access to machine learning algorithms by way of common development environments (Python, Java, Scala, R), big data systems (Hadoop, Spark), and data sources (HDFS, S3, SQL, NoSQL). H2O is meant to be used as an end-to-end solution for gathering data, building models, and serving predictions. For instance, models can be exported as Java code, allowing predictions to be served on many platforms and in many environments.

H2O can work as a … Read the rest

You’ve probably encountered the term “machine learning” more than a few times lately. Often used interchangeably with artificial intelligence, machine learning is in fact a subset of AI, both of which can trace their roots to MIT in the late 1950s.

Machine learning is something you probably encounter every day, whether you know it or not. The Siri and Alexa voice assistants, Facebook’s and Microsoft’s facial recognition, Amazon and Netflix recommendations, the technology that keeps self-driving cars from crashing into … Read the rest

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time, not to mention the loss of talented technology professionals who become fed up with frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management … Read the rest

No one doubts that software engineering shapes every last facet of our 21st-century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with … Read the rest

With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.

Structured Streaming, as that feature is called, allows Spark to process streams of data in ways that are native to Spark’s batch-based data-handling metaphors. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.

Structured Streaming in 2.2 benefits … Read the rest

MySQL is a bit of an attention hog. With relational databases supposedly put on deathwatch by NoSQL, MySQL should have been edging gracefully to the exit by now (or not so gracefully, like IBM’s DB2).

Instead, MySQL remains neck-and-neck with Oracle in the database popularity contest, despite nearly two decades less time in the market. More impressive still, while Oracle’s popularity keeps falling, MySQL is holding steady. Why?
