Review of “Building Data Science Teams” by DJ Patil

Having recently committed myself to earning my living as a Data Scientist, I’ve been reading anything I can find to guide my self-education. So I just spent the last hour reading and mulling over DJ Patil’s article/report Building Data Science Teams (BDST henceforth), which is available free from various outlets; I read the Kindle version. (Disclaimer: DJ is a friend and occasional drinking buddy.)

Patil takes up the necessary and generally thankless task of writing a “big-think piece”. It’s necessary because, with all the recent talk about Data Scientists, it would be easy for some to see “Data Science” as a recent entry in the long history of business fads. Business fads tend to indulge in one of two sins: oversimplifying, or being so general that anything counts. Both are sins to which Data Science advocates are susceptible. An advocate of Data Science might oversimplify by giving a recipe or IT shopping list for doing Data Science; Patil avoids this triteness by emphasizing the wide diversity of the good teams he’s built or worked with. Alternatively, one could sin by making anything count as Data Science. Commenter “Verbose” anticipates this problem with his erudite critique of BDST: “Same technical data analysis, new bullshit name”. Patil avoids this second sin as well, and provides a blueprint for others to do the same.

When discussing the etymology of “Data Scientist”, Patil writes that “Research Scientist” would not be as appropriate a term for this profession:

“Research scientist” was a reasonable job title used by companies like Sun, HP, Xerox, Yahoo, and IBM. However, we felt that most research scientists worked on projects that were futuristic and abstract, and the work was done in labs that were isolated from the product development teams. It might take years for lab research to affect key products, if it ever did. Instead, the focus of our teams was to work on data applications that would have an immediate and massive impact on the business. (emphasis added) The term that seemed to fit best was data scientist: those who use both data and science to create something new.

Elsewhere Patil provides a solid definition of Data Scientist, but this paragraph encapsulates the concept just as well:  A Data Scientist uses data and science to have an immediate and massive impact on the business.  Just moving data around?  Not data science.  Have an impact in the vague future?  Not data science.  Improving entirely on the margins?  Not (all of) data science.  Holding the Data Scientist’s feet to the fire — asking “How does this immediately and massively impact our business?” — provides accountability and hence focus for the team.

Writing a “big-think piece” is also a thankless task: the breadth of the topic means that it’s easy for critics to find something to criticize as being presented too simply.  This ignores the contribution of providing a view of the new discipline from space, showing all of it as a piece, and showing (if briefly) how the disparate parts fit together.  I was glad for a look at a map for this road I’m traveling.

Overall BDST is short, shorter than I would have liked.  (I’m glad that Patil is sharing more of his experience through other venues.)  The advice Patil gives about Data Science, forging teams, and hiring Data Scientists seems both specific and useful; I’ll post again as I have occasion to use his advice.

Recommendation: “Building Data Science Teams” is short, but it has enough good ideas to be required reading for anyone in business intelligence, internal data analysis, or applied computational modeling and prediction.

Planned Serendipity

Yesterday I got back from a great APSA meeting in Seattle. My undergraduate students were despondent at my having to cancel class Thursday so I could attend. A few were curious about what happens at a scientific conference and asked about the structure. I explained that there would be several thousand political scientists at this conference and that most of the planned interaction would take place in panels.

In theory, a panel takes place in a room where 3-300 (median = 10) people watch three to five papers get presented by their authors.  Then a discussant, who reads the papers in advance, comments on the papers both to draw connections among them and to stimulate conversation among the attendees.  Then the audience asks questions and offers feedback to the authors.  The whole panel takes about 1 3/4 hours.

Although panels comprise most of the scheduled events at a conference, they are not the best reason for scientists to attend conferences and they are far from the most rewarding part of a conference.  Panels are often poorly attended.  The papers in a panel often have very little to do with each other.  The discussant may not receive the papers until days or moments before the panel, if at all, and even so the comments may focus more on typography than on big ideas.

Panels are a party game. They are an excuse to get smart people, who are interested in similar things, together in a room talking. Put a bunch of clever folks together and strange, wonderful, unpredictable things happen.  A conference is mass planned serendipity.

The largest conference benefits to my research have happened when I have not been at panels: between panels, skipping panels, into the evening and the night. It’s the networking, but not “networking” in the Machiavellian, salesperson sense. It’s the comment on my paper that someone was a little too shy to offer in front of everyone, the comment that helps me recast the paper so it will place higher. It’s running into the same person at three panels and finally discovering we would love to work together on some research. It’s the dinner outing that leads to an invited talk or an interview. It’s the shared coffee followed up with a Facebook friending that leads to a new real friendship.

All of this is made possible by panels, but it’s not the direct result of the panels.  So when someone tells me they didn’t go to a lot of panels, I understand that they probably got a lot of professional good out of the conference.

Also, I had a lot of fun at the Space Needle.

Regain your confidence (intervals)

Next time you see someone “misinterpret” a confidence interval, wait a second. They’re actually probably okay.

Bayes fixes small n, doesn’t it?

What is a methods-careful practitioner to do when the number of observations (n) is small?  I don’t know how many times I’ve been told by a well-meaning Bayesian some variation of

Bayesian estimation addresses the “small n problem”

This is right and wrong.

Truth and Choices: Computational v. Analytical formal models

How do we show a statement about politics is true? Analytic formal modelers suggest one way:


We all carry the scars

I served in the US Navy for a few months in 1986, five years in the early 90s, and another year and a half in the reserves. I was never asked to shoot someone. I never pulled a trigger when the weapon was aimed at a person. I served during, but not “in” the first Gulf War. I served during “peacetime”, or at least that’s how I thought about it. However, over the last few months I have been thinking more about my time in uniform, realizing the lasting and deep effects that experience had on me.

Change of Intuition about the Definition of Insanity

My dad and I went to the recent Brown/Whitman California gubernatorial debate here at UC Davis. It was fun seeing “democracy” live and up close. One of the candidates twice repeated an old saw:

One definition of insanity is doing the same thing over and over and expecting different results.


Dimensionality matters: three implications of ideology being multidimensional

Left or right? Liberal or conservative? Blue or red?  We know the terms bandied about in the punditverse, but it’s easy to forget that there is more than one way to divide the world into two political ideologies.


The better the question, the worse the answer

Justin Wolfers wrote recently about the level of interaction between economics and other social sciences. In particular, he wonders why economic work is not well represented in a list of the books most cited in social science research. It’s a good question: I find many of the tools and techniques developed by economists useful in my work studying political phenomena, and I do cite economic research.

One particularly thoughtful commenter on Wolfers’ post notes that economics combines the controversy of addressing everyday issues with the general inaccessibility of chemistry. This conflict may make some people resist the conclusions of economists, i.e., strong prior + incomprehensible evidence = small amount of updating.
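To make that last bit of arithmetic concrete, here is a minimal sketch in Python. It rests on assumptions that are mine, not the commenter’s: beliefs are normal distributions, and “incomprehensible” evidence is modeled as a signal the reader treats as very noisy. The function name `update_belief` and all the numbers are made up for illustration.

```python
def update_belief(prior_mean, prior_sd, evidence, evidence_sd):
    """Normal prior plus one normal observation with known noise.

    Returns the posterior mean and standard deviation. The posterior mean
    is a precision-weighted average of the prior mean and the evidence.
    """
    prior_precision = 1.0 / prior_sd ** 2
    evidence_precision = 1.0 / evidence_sd ** 2
    posterior_precision = prior_precision + evidence_precision
    posterior_mean = (
        prior_precision * prior_mean + evidence_precision * evidence
    ) / posterior_precision
    return posterior_mean, posterior_precision ** -0.5


# Hypothetical numbers: the reader starts at 0 and the evidence says 10.
# Strong prior (sd 0.5), evidence treated as very noisy (sd 5).
strong_prior_mean, _ = update_belief(0.0, 0.5, 10.0, 5.0)

# Weak prior (sd 5), evidence treated as clear (sd 0.5).
weak_prior_mean, _ = update_belief(0.0, 5.0, 10.0, 0.5)

print(f"strong prior, murky evidence: {strong_prior_mean:.3f}")  # 0.099
print(f"weak prior, clear evidence:   {weak_prior_mean:.3f}")    # 9.901
```

With the same evidence, the reader with the strong prior barely moves, while the reader with the weak prior ends up almost exactly where the evidence points.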


You’re Asking the Wrong Question, Fortunately

Today I got up, finishing a decision I started last night about how much to sleep before today. I will choose my attire to fit the weather and strike the right tone in the classes I will teach. I will go to work and spend the day making optimal decisions about how to allocate my time and effort, considering my immediate goals (teaching effectively and preparing for an experiment) and longer-term goals like getting along with my peers and building my tenure packet. I will come home along a route that balances safety, convenience, fuel economy, and curiosity. I will talk with my wife, play with my daughter, read to my son, all with an eye toward building both their individual lives and my relationships with them. I may make a few allocation decisions about improving our house or saving for retirement. I will decide whether to work out tomorrow morning, then begin the decision about how much to sleep before tomorrow.
