Having recently committed myself to earning my living as a Data Scientist, I’ve been reading anything I can find to guide my self-education. So I just spent the last hour reading and mulling over DJ Patil’s article/report Building Data Science Teams (BDST henceforth) which is available free from various outlets; I read the Kindle version. (Disclaimer: DJ is a friend and occasional drinking buddy.)
Patil takes up the necessary and generally thankless task of writing a “big-think piece”. It’s necessary because, with all the recent talk about Data Scientists, it would be easy for some to see “Data Science” as a recent entry in the long history of business fads. Business fads tend to indulge in one of two sins: oversimplifying, or being so general that anything counts. Both are sins to which Data Science advocates are susceptible. An advocate of Data Science might oversimplify by giving a recipe or IT shopping list for doing Data Science; Patil avoids this triteness by emphasizing the wide diversity of the good teams he’s built or worked with. Alternatively, one could sin by making anything count as Data Science. Commenter “Verbose” anticipates this problem with his erudite critique of BDST: “Same technical data analysis, new bullshit name”. Patil avoids this second sin as well and provides a blueprint for others to do the same.
When discussing the etymology of “Data Scientist” Patil writes that “Research Scientist” would not be as appropriate a term for this profession:
“Research scientist” was a reasonable job title used by companies like Sun, HP, Xerox, Yahoo, and IBM. However, we felt that most research scientists worked on projects that were futuristic and abstract, and the work was done in labs that were isolated from the product development teams. It might take years for lab research to affect key products, if it ever did. Instead, the focus of our teams was to work on data applications that would have an immediate and massive impact on the business. (emphasis added) The term that seemed to fit best was data scientist: those who use both data and science to create something new.
Elsewhere Patil provides a solid definition of Data Scientist, but this paragraph encapsulates the concept just as well: a Data Scientist uses data and science to have an immediate and massive impact on the business. Just moving data around? Not data science. Having an impact in the vague future? Not data science. Improving entirely on the margins? Not (all of) data science. Holding the Data Scientist’s feet to the fire, by asking “How does this immediately and massively impact our business?”, provides accountability and hence focus for the team.
Writing a “big-think piece” is also a thankless task: the breadth of the topic means that it’s easy for critics to find something to criticize as being presented too simply. Such criticism ignores the contribution of providing a view of the new discipline from space, showing it all of a piece, and showing (if briefly) how the disparate parts fit together. I was glad for a look at a map for this road I’m traveling.
Overall, BDST is short, shorter than I would have liked. (I’m glad that Patil is sharing more of his experience through other venues.) The advice Patil gives about Data Science, forging teams, and hiring Data Scientists seems both specific and useful; I’ll post again as I have occasion to use his advice.
Recommendation: “Building Data Science Teams” is short, but it has enough good ideas to be required reading for anyone in business intelligence, internal data analysis, or applied computational modeling and prediction.