So you want to be a (big) data hero?
Listen to tech vendors, and the tool – HANA or Hadoop – is guaranteed to make you a data hero. Watch IBM commercials, and every one of their employees must already be a data hero.
Data heroes are elusive, and get written up in books and blogs because they do some heroic, quirky things:
If there is a fork in the road – take the more difficult path
He could have just gone and talked to a handful of taxi drivers to hear what they did when it rained. But no. Oliver Senn, analyst at the Singapore-MIT Alliance for Research and Technology (SMART), first triangulated two months of weather data with 830 million GPS records of 80 million trips by over 16,000 Singapore taxicabs. Armed with data showing that most taxis stopped moving when it rained, he then went and talked to some drivers to find out why.
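The core of that triangulation is a join: line up rain periods against taxi GPS pings and compare how often cabs sit idle in wet versus dry conditions. A minimal sketch in Python, with all data invented for illustration (the real analysis ran over hundreds of millions of records):

```python
# Toy version of joining weather data with taxi GPS pings.
# All values below are invented; they only illustrate the join logic.

rain_hours = {8, 9}  # hours of the day when it rained (stand-in weather feed)

# One GPS ping per taxi per hour: (taxi_id, hour_of_day, speed_kmh)
pings = [
    ("T1", 8, 0.0), ("T2", 8, 0.0), ("T3", 8, 32.0),
    ("T1", 10, 28.0), ("T2", 10, 25.0), ("T3", 10, 30.0),
]

def idle_fraction(pings, hours):
    """Fraction of pings in the given hours with near-zero speed."""
    sample = [p for p in pings if p[1] in hours]
    if not sample:
        return 0.0
    return sum(1 for p in sample if p[2] < 1.0) / len(sample)

all_hours = {hour for _, hour, _ in pings}
wet = idle_fraction(pings, rain_hours)
dry = idle_fraction(pings, all_hours - rain_hours)
print(f"idle when raining: {wet:.0%}, idle when dry: {dry:.0%}")
```

The interesting finding is the gap between the two fractions; the data tells you *that* taxis stop in the rain, but only the driver interviews told Senn *why*.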
Try and boil the ocean
Don’t just sign up to pretty up reports for your CFO. Sign up to solve world hunger. The Climate Corp provides crop insurance for farmers by analyzing 22 weather data sets at sub-zip-code level every few hours, calculating about 10,000 scenarios that could happen to a grower over the next two years. TaKaDu helps water companies around the world monitor leakage. It is an industry which routinely mislays 25-30% of its product in the process of delivering it. Old infrastructure and theft are common leak culprits, but neither is easy or cheap to monitor, so analysis which pinpoints the best payback areas makes a lot of sense.
Beg, borrow, steal primary data
The National Hurricane Center uses ocean-based buoy sensors to collect moisture, temperature, wind speed and other storm data from below, satellites to collect data from above, and Hurricane Hunters, drones and dropsondes to gather data from within. Then it crunches all that data in multiple models. It is primary data it cannot buy from Nielsen. Union Pacific has rail-side sensors which collect noise and temperature data from wheels to watch for ball bearings that may be failing. Same thing – it cannot buy that data from Bloomberg. It is becoming increasingly clear that primary data is expensive and excruciating to collect, but hugely important for “aha” analysis.
Continually improve and showcase how you are improving
In the 1980s, the National Hurricane Center’s average storm-track forecast error 48 hours out was 225 nautical miles. Today, that error is a little under 100 nautical miles. At the end of each season, it also audits its forecasts and publishes “verification” data. How many analysts go back, audit, and then share how well they forecasted?
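The verification habit itself is simple: keep every forecast, compare each one to what actually happened, and publish the season average. A minimal sketch, with the per-forecast error figures invented for illustration (chosen so the averages echo the 225 nm and ~100 nm numbers above):

```python
# Toy end-of-season "verification": average the 48-hour track errors
# for each era. Individual error values are invented for illustration.

errors_1980s_nm = [250, 210, 215, 225]  # illustrative 1980s-era errors (nm)
errors_today_nm = [110, 95, 90, 105]    # illustrative modern errors (nm)

def average_error(errors):
    """Season-average track error: the headline verification statistic."""
    return sum(errors) / len(errors)

print(f"1980s average: {average_error(errors_1980s_nm):.0f} nm")
print(f"today average: {average_error(errors_today_nm):.0f} nm")
```

The point is less the arithmetic than the discipline: the audit only works if every forecast is retained and scored, not just the ones that turned out well.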
Turn analysis into a sport
Follow Kaggle’s crowdsourced philosophy: “There are countless approaches to solving any predictive modeling problem. No single participant (or in-house expert, or consultant) can try them all. By exposing the problem to a large number of participants trying different techniques, competitions can very quickly advance the frontier of what's possible using a given dataset. Competitive pressures drive participants to keep trying new ideas.”
Finally, don’t take yourself too seriously
As Nate Silver, author and election predictor extraordinaire, tells Time: "Peggy Noonan or Dick Morris – these people take themselves seriously. And that's harmful to people trying to be informed."
January 09, 2013 in Industry Commentary