In today’s blog, I will be introducing you to a new open-source distributed SQL query engine, Presto. It was designed by the people at Facebook for running SQL queries over Big Data (petabytes of data). Quoting its formal definition:
“Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”
The folks at Facebook are at it again. They built a SQL engine specifically for analytical work; this is not an online transaction processing (OLTP) engine. It’s an engine for ad-hoc queries across SQL and NoSQL databases distributed all over the place.
Presto uses connectors for MySQL, Hadoop/Hive, MongoDB, Postgres and more. Still missing are some of the standards like Microsoft SQL Server and Teradata. However, this won’t be the story for long.
Presto is still very new as an open-source project, but you should take a look at the documentation to really appreciate the power of this new thing.
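To make the connector idea a little more concrete, here is a minimal sketch of what a query spanning two data sources might look like. Presto addresses tables as catalog.schema.table, where the catalog corresponds to a configured connector. The catalog names `hive` and `mysql`, and every schema, table, and column name below, are assumptions made up for illustration. The Python only assembles the SQL text; it does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Return SQL that joins a Hive table with a MySQL table in one query.

    All catalog, schema, table, and column names are hypothetical.
    """
    visits = qualified("hive", "web", "page_visits")  # assumed Hive-backed table
    users = qualified("mysql", "crm", "users")        # assumed MySQL-backed table
    return (
        "SELECT u.country, count(*) AS visits\n"
        f"FROM {visits} AS v\n"
        f"JOIN {users} AS u ON v.user_id = u.id\n"
        "GROUP BY u.country\n"
        "ORDER BY visits DESC"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The point of the sketch is only that both sources show up in a single SQL statement; how the catalogs are actually configured is covered in the Presto documentation.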
via An Introduction to Presto — DZone Big Data Zone
Check out the champion blog at HTTP://GeekMustHave.COM
This is a new channel and the Geek needs your help. Please click on the subscribe button,
watch the videos, click on the like button, and leave comments and questions.
The Geek is busy learning and building stuff, so don’t be upset if the response isn’t immediate.
Thank you, and now… “Let’s build something…”
Big Data in Healthcare Made Simple – DZone Big Data
Knowing how to use big data to improve patient care is beneficial for those working in the healthcare industry. Big data is valuable to the healthcare industry in dozens of ways. Physicians can use specific data about their patients taking a type of medication and their reaction to the medicine. Data can also be used to determine high-risk groups based upon common factors.
Read on to learn more.
via Big Data in Healthcare Made Simple — DZone Big Data Zone
Hadoop and health care is a pairing that can make patient outcomes much more positive. Every health care provider, from nurse aides, doctors, and pharmacists to large corporate medical providers and all the way up to State and Federal governments, could be using this, but many are not. Big data sounds impressive and is the “Bright Shiny” thing. The Hadoop elephant is slowly plodding through the ranks of these providers, and it scares them. The lack of “Structure” in the data makes some think it is not very usable, highly inaccurate and less intuitive to consume. The IT departments say “It’s not SQL, it’s not relational”. Others think it’s necessary to convert all the structured SQL-based databases to Hadoop document databases. Still others worry about how to combine two different beasts. This article from Richard Proctor outlines just some of the ways the elephant in the room should be a new tool for innovation in health care; it is a multi-part article and a worthy read.
I suggest reading a book before you dive headlong into the next shiny bright thing in databases. This book appears to be a good read; I’ve downloaded it to my Kindle and started it. So far it is very interesting and easy to read. If you don’t have any background in databases, I would suggest reading an entry-level book first. This review by I Programmer is a much better review than I would ever write; give it a quick 2-minute read. We are truly in the “next generation” of database evolution. You need to pay close attention to what is going on right now, or be stuck in the dBase/Paradox past again.
This is a very good article to read. It is academically based but still very relevant to business. Data Science is a term that was coined in 1974; it was one of the courses I took in college. Now it is back again, to define some skills you should consider learning to help manage and use any “Big Data” you may have.
I highly recommend reading this article on NoSQL databases. It is the best description of the different types of NoSQL that exist. There are four big NoSQL types: key-value store, document store, column-oriented database, and graph database. Each type solves a problem that can’t be solved with relational databases. Actual implementations are often combinations of these. OrientDB, for example, is a multi-model database, combining NoSQL types. OrientDB is a graph database where each node is a document.
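If the four types feel abstract, a rough sketch with plain Python data structures may help. This is not how any real NoSQL product stores data, and it does not use any OrientDB API; it only shows the different shapes, with made-up patient records. The last part imitates the "graph where each node is a document" idea by linking whole documents with edges. All names and values are assumptions for illustration.

```python
import json
from statistics import mean

# All names and records below are hypothetical illustrations.

# 1. Key-value store: the value is an opaque blob, reachable only by its key.
kv_store = {"patient:42": json.dumps({"name": "Ada", "age": 36})}

# 2. Document store: nested records whose fields can be queried directly.
documents = [
    {"_id": 42, "name": "Ada", "age": 36, "allergies": ["penicillin"]},
    {"_id": 43, "name": "Bob", "age": 51, "allergies": []},
]

# 3. Column-oriented (simplified): values grouped per column,
#    so an aggregate only needs to read one column.
columns = {"id": [42, 43], "name": ["Ada", "Bob"], "age": [36, 51]}

# 4. Graph with document nodes (the multi-model idea): edges link documents.
nodes = {42: documents[0], 43: documents[1], 7: {"_id": 7, "name": "Dr. Lee"}}
edges = [(42, "TREATED_BY", 7), (43, "TREATED_BY", 7)]


def patients_allergic_to(drug):
    """Document-style query: filter on a field inside each record."""
    return [d["name"] for d in documents if drug in d["allergies"]]


def average_age():
    """Column-style query: touch only the 'age' column."""
    return mean(columns["age"])


def patients_of(doctor_id):
    """Graph-style query: follow TREATED_BY edges back to patient documents."""
    return [nodes[src]["name"] for src, label, dst in edges
            if label == "TREATED_BY" and dst == doctor_id]
```

Each helper answers the kind of question its data shape is good at: a field filter for documents, a single-column aggregate for the column layout, and an edge walk for the graph.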
The book is a pre-order from Amazon and is expensive, but I think it will be a good read and reference. I placed my order already.