OMG I just had a flashback. They think my data-driven prototype is the “Real thing”

At some point, early in the development of an application, a prototype is usually created. Prototypes are useful for getting early feedback from the users, and for making user-interface problems more obvious. We used to call these prototypes ‘clays’, after the car industry's practice of creating a facsimile made from clay to showcase the design of a new car to general management.

It takes a lot of care to make a good prototype; it must look convincing. The clay cars had to look so real as to make you believe that you could hop right into one and drive off. Drawings or wireframes just don’t serve as well because the observer cannot make that imaginative leap.

With data-driven applications, the verisimilitude includes the data. A prototype application should have data in it that is so close to real data that even the dullest, most literal-minded manager could look beyond the detail, to the important matters.

Read this article to understand the scenario.  I was shocked it was the same conversation I’ve had with customers after using a high-resolution, “Data-Driven” prototype.  I called the following conversation the “Dirty Ferrari Syndrome (DFS)”.


My beautiful data-driven prototype was the Ferrari.  I showed the customer it could do 150 MPH and they said: “It’s dirty.”  I told them about the fine leather interior and they said: “It’s dirty.”  I told them that while it looks a little rough, if we clean it up and add a little polish it will be gorgeous.  They said: “Call us back when it’s cleaned up.”  I said we need your help, and they said: “That’s not our job.”

In your efforts to make the prototype look as “Real” as possible you walk into the uncanny valley where folks think they are looking at the Real thing, no matter how many stamps shouting “PROTOTYPE” there are.  I have reverted back to bare frameworks with data from characters in the movies.  We will see how this works out.

Via: Data-driven Prototypes SQL Server Central

Data Warehouses and the Flying Car Dilemma — DZone Big Data Zone


Traditional data warehouses and databases were built for workloads that manifested 20 years ago. They are sufficient for what they were built to do, but these systems are struggling to meet the demands of modern business with the volume, velocity, and user demand of data. IT departments are being challenged from both ends. On one…

You can’t just paste a set of wings on your Toyota and expect to fly to your next appointment.  Likewise, you can’t just tack some new technologies onto a legacy data warehouse and expect it to provide the insights your company needs to survive and prosper.  Equally, you can’t throw out the baby with the bath water.  The data warehouse probably cost quite a bit of money and effort and is embedded in the company's processes.

Twenty-plus years ago I helped to introduce data warehousing to some very large government and corporate customers.  When I look back at some of them, they are still using the same tools and processes after 20 years.  Would you continue to drive a 20-year-old car just because you paid too much for it?

This article has some great insights as to some of the alternatives.  Augment some and replace some.  The diagrams of traditional and alternative data warehouses are keepers.

via Data Warehouses and the Flying Car Dilemma — DZone Big Data Zone

Is the traditional data warehouse dead?



I think the ultimate question is: Can all the benefits of a traditional relational data warehouse be implemented inside of a Hadoop data lake with interactive querying via Hive LLAP or Spark SQL, or should I use both a data lake and a relational data warehouse in my big data solution?  The short answer is you should use both.  The rest of this post will dig into the reasons why.

The love affair with the NoSQL (Big Data) databases seems to be over.  Many of the projects using Hadoop and the other “not” relational databases have fallen by the wayside.  Some things, like structured data, are still done better on old-school relational database servers and accessed with SQL or some SQL tool.  As the amount of unstructured data increases, so will the use of NoSQL databases.

Via: Is the traditional data warehouse dead?

The Power BI Gateway; All You Need to Know

Power BI is a data analysis tool that connects to many data sources. If the data source for Power BI is located in an on-premises location, then the connection from cloud-based Power BI service, and on-premises located data source should be created with an application called Gateway. In this post, you will learn what the Gateway is, what are types of the gateway, their differences, installing the gateway, and scheduling a data set with that gateway.

If you are using a data warehouse like Teradata, Netezza, Oracle, etc., then using Power-BI with the BI-Gateway may be the best option.  The Microsoft BI-Gateway can be installed as a “Personal” gateway on your desktop or laptop.  This gives the data analytics folks the ability to develop and test BI reports and data visualizations without transferring large amounts of data around.  In a production environment, the gateway would probably be installed on a separate server to help distribute the workload.  The centralized server also appeals to the “Enterprise” and “Security” minded folks.

Here is the link to the Microsoft Power-BI Gateway.

Via: The Power BI Gateway; All You Need to Know


Data Driven Documents (D3), API Server (CData) Generate REST Server 80+ Data Sources.



D3.js is a JavaScript library for producing dynamic, interactive data visualizations in Web browsers, using the widely implemented SVG, HTML5, and CSS standards. The CData API Server enables you to generate REST APIs for 80+ data sources, including both on-premises and cloud-based databases. This article walks through setting up the CData API Server to create a REST…

I have used D3 in concert with C3 to create a data visualization front end to a data warehouse.  D3 has a ton of features but is difficult for the novice user to use out of the box.  The C3 library puts a layer of smarts on top of D3, making it much easier to get started with graphic visualizations.  I’ve also used a CData product to get an ODBC connection to Google Sheets in order to analyze and transfer results from a Google Forms survey to a data warehouse.  CData products are very good; just be willing to pay for that excellence.

This article is a good read because it combines the two to assemble a REST service and use it to provide the data feed to D3.
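As a rough sketch of the glue between a REST feed and a chart: C3 accepts data as `columns`, where each array starts with a series name followed by its values. The helper names `toC3Columns` and `buildChartConfig`, the sample rows, and the field names are assumptions for illustration and are not taken from the linked article.

```javascript
// Hypothetical rows, shaped like what a REST endpoint might return as JSON.
const sampleRows = [
  { month: 'Jan', claims: 120, payments: 95 },
  { month: 'Feb', claims: 140, payments: 110 },
  { month: 'Mar', claims: 90, payments: 130 },
];

// Convert an array of row objects into C3's "columns" layout:
// one array per key, with the key name first and the values after it.
function toC3Columns(rows, keys) {
  return keys.map((key) => [key, ...rows.map((row) => row[key])]);
}

// Build the plain config object that c3.generate() would receive in a browser.
function buildChartConfig(rows, xKey, valueKeys) {
  return {
    bindto: '#chart', // assumed id of the target element
    data: {
      x: xKey, // name of the column that holds the x-axis labels
      columns: toC3Columns(rows, [xKey, ...valueKeys]),
      type: 'bar',
    },
    axis: { x: { type: 'category' } },
  };
}

const config = buildChartConfig(sampleRows, 'month', ['claims', 'payments']);
console.log(JSON.stringify(config.data.columns));
```

In a browser, the rows would come from the REST service (for example via `fetch`) and the resulting object would be handed to `c3.generate(config)`.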

via Building Dynamic D3.js Web Apps With Database Data — DZone Web Dev Zone

API Gateways, the Rosetta Stone for data

Services in a microservices architecture share some common requirements regarding authentication and transportation when they need to be accessible by external clients. API Gateways provide a shared layer to handle differences between service protocols and fulfill the requirements of specific clients like desktop browsers, mobile devices, and legacy systems.

API Gateways are the middleman in the Application-Data relationship.  They serve as a community hall where folks go to meet and talk to one another.  This community hall has a universal translator, like on Star Trek, that makes data understood by all the people in the room.  Developers don’t worry about XML/JSON because the gateway understands them both.  DBAs don’t worry about formatting the data because the gateway loves to format stuff.
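The “universal translator” idea can be sketched in a few lines of plain JavaScript: one record comes out of the data layer, and the gateway picks the wire format from the client's `Accept` header. The function names, the sample record, and the naive XML rendering are assumptions for illustration; a real gateway would also handle nested data and full content negotiation.

```javascript
// Escape the characters that cannot appear as-is in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Render a flat record as a very simple XML document.
function toXml(rootName, record) {
  const fields = Object.entries(record)
    .map(([key, value]) => `<${key}>${escapeXml(value)}</${key}>`)
    .join('');
  return `<${rootName}>${fields}</${rootName}>`;
}

// Pick a representation based on the Accept header; JSON is the default.
function translate(record, acceptHeader) {
  const accept = (acceptHeader || '').toLowerCase();
  if (accept.includes('application/xml') || accept.includes('text/xml')) {
    return { contentType: 'application/xml', body: toXml('record', record) };
  }
  return { contentType: 'application/json', body: JSON.stringify(record) };
}

// The same record, served to two different kinds of client.
const row = { id: 7, name: 'Kirk & Spock' };
console.log(translate(row, 'application/xml').body);
console.log(translate(row, 'application/json').body);
```

The data side hands over one record and never needs to know which format the caller asked for.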

Have you ever been frustrated with Siri, OK Google, or Alexa?  Gateway quality varies from one vendor to another.  Writing your own in Node may be an alternative; I don’t know.  Let’s talk.

via Building an API Gateway using Node.js — RisingStack Engineering

Presto, Magico open source distributed SQL Engine

In today’s blog, I will be introducing you to a new open-source distributed SQL query engine, Presto. It is designed for running SQL queries over Big Data (petabytes of data). It was designed by the people at Facebook. Quoting its formal definition:

“Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”

The folks at Facebook are at it again.  They built a SQL engine especially for analytical work; this is not an online transaction processing (OLTP) engine.  It’s an engine for ad-hoc queries across SQL/NoSQL databases distributed all over the place.

They use connectors for MySQL, Hadoop/Hive, MongoDB, Postgres and more.  Missing are some of the standards like Microsoft SQL Server and Teradata.  However, this won’t be the story for long.

Presto is in its open source newness but you should take a look at the documentation to really appreciate the power of this new thing.

via An Introduction to Presto — DZone Big Data Zone

Backlogger web app released to GitHub, it’s a Scrum thing.

[Image: whiteboard sketch of the original Backlogger concept]

“Backlog” is one of those SCRUMmy terms used to identify features or functions that have been dreamed up or discussed for an application.  You collect these ideas into a list which is called the “Backlog”.  Then this list is reviewed (Sprint Review), the ideas are refined (Groomed), and an estimate of effort (Story Points) is assigned to each one.  Then folks get together and discuss which ones should be done in the next timeframe (Sprint).  To collect these ideas some companies use an issue tracking system or an off-the-shelf ticket system (Atlassian JIRA) and others just use a spreadsheet… gasp.

Sometimes all you need is a simple web application that all the participants can use to enter ANY of the ideas that come up.  Even things like “The buttons should be colored blue.”  I needed a simple project to help me learn some technologies that are new to me.  Hence the “Backlogger” was born.  The whiteboard above shows the original concept.

Technologies used in Backlogger

  • JavaScript
  • NodeJS
  • Bootstrap
  • Mongo without the headache, neDB
  • jsGrid

Design Requirements

The design requirements were meant to be as simple as possible so that the project could be done quickly.  They also needed to be flexible to allow for better learning.

  • Single Page Application
  • Open Source
  • No user logins, just a password, we are a big happy family
  • Self-contained application, no need for outside services or servers
  • Mongo database and Mongo queries
  • Allow for a maintainable list of the names of people who contributed ideas to the backlog
  • Allow for a maintainable list of functional areas to help group the ideas
  • One-time entry of an idea, no editing
  • The editing of an idea will be done during the grooming
  • Filters that help find ideas quickly
  • Ability to backup and wipe the database (Mongo Documents)
  • Simple report that can be printed directly
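A few of the requirements above (one-time entry with no editing, maintainable lists of people and functional areas, and quick filters) can be sketched with a plain in-memory store. This is not Backlogger's actual code: the class name `BacklogStore`, its methods, and the sample data are hypothetical, and an array stands in for the neDB datastore.

```javascript
class BacklogStore {
  constructor(people, areas) {
    this.people = new Set(people); // maintainable list of contributors
    this.areas = new Set(areas);   // maintainable list of functional areas
    this.ideas = [];
  }

  // One-time entry: the stored idea is frozen, so it cannot be edited later.
  addIdea({ person, area, text }) {
    if (!this.people.has(person)) throw new Error(`Unknown person: ${person}`);
    if (!this.areas.has(area)) throw new Error(`Unknown area: ${area}`);
    const idea = Object.freeze({ id: this.ideas.length + 1, person, area, text });
    this.ideas.push(idea);
    return idea;
  }

  // Mongo-style equality filter: every key in the query must match the idea.
  find(query = {}) {
    return this.ideas.filter((idea) =>
      Object.entries(query).every(([key, value]) => idea[key] === value)
    );
  }
}

// Sample data uses movie characters, in keeping with the prototype advice.
const store = new BacklogStore(['Ripley', 'Neo'], ['UI', 'Reports']);
store.addIdea({ person: 'Ripley', area: 'UI', text: 'The buttons should be colored blue.' });
store.addIdea({ person: 'Neo', area: 'Reports', text: 'Printable summary report.' });
console.log(store.find({ area: 'UI' }));
```

Swapping the array for a document store would mostly change `addIdea` and `find`; the validation against the two maintained lists would stay the same.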

GeekMustHave would like to thank Phoenix Learning Labs for the resources and funding to do this project.  GeekMustHave would also like to thank the MDHHS-DWIP team for the testing and feedback.

Open source, common components

What does it look like?


Backlogger is an Open Source project available on GitHub.


I’ve used “Backlogger” in one project so far but others who have seen it have expressed some interest in it.  That’s another reason why it’s Open Source.

Depending on the feedback I might do additional updates.  Maybe I need a “Backlogger” for the “Backlogger”?


Manning free programming eBooks

Just click the icon to the left, follow the instructions to sign up, and you’ll be added to Manning’s Deal of the Day mailing list. This means that during the month of December you’ll receive a special discount for a particular Manning publication in your email each day, good for that day only. So, remember to check your inbox regularly and act fast if you wish to get the deal!

Microsoft ups the BI game


Microsoft is trying to give the other “on-Demand” Business Intelligence vendors a run for their money.  The enterprise folks didn’t give Power-BI a second glance when they learned the only way to publish was to the cloud.  Cloud = Bad.  There was talk of SSRS being able to be the portal for all things Power-BI, and now some of this has come true.

Microsoft has an Azure appliance that you can “play with” while Microsoft tunes things up.  I think this is going to be a look-see for me in the near future.  I will post what I find.