Winning with Data Science: A Handbook for Business Leaders by Howard Friedman and Akshay Swaminathan

In “Winning with Data Science: A Handbook for Business Leaders,” Howard Friedman and Akshay Swaminathan expertly guide you through using data science to drive business success. This indispensable handbook equips you with the knowledge and strategies to apply data science effectively, propelling your organization to new heights of achievement.

Dive into this groundbreaking book and discover the secrets to leveraging data science for unparalleled business growth.

Genres

Business, Data Science, Leadership, Technology, Analytics, Strategy, Management, Decision Making, Innovation, Competitive Advantage

“Winning with Data Science” is a comprehensive guide that equips business leaders with the tools and insights to effectively integrate data science into their organizations. The authors, Howard Friedman and Akshay Swaminathan, draw upon their extensive experience to provide a clear and practical framework for harnessing the power of data science.

The book covers essential topics such as understanding the fundamentals of data science, building and managing data science teams, implementing data-driven decision-making, and developing a data-centric culture within the organization. The authors emphasize the importance of aligning data science initiatives with business objectives and provide strategic guidance on selecting and executing high-impact projects.

Throughout the book, real-world case studies and examples illustrate how leading companies have successfully leveraged data science to gain a competitive edge. The authors also address common challenges and offer practical solutions for overcoming obstacles in data science implementation.

Review

“Winning with Data Science” is an indispensable resource for business leaders seeking to capitalize on the transformative potential of data science. Howard Friedman and Akshay Swaminathan have crafted a highly accessible and actionable guide that demystifies the complexities of data science and provides a clear roadmap for success.

One of the book’s greatest strengths lies in its ability to bridge the gap between technical concepts and business application. The authors skillfully translate complex data science principles into relatable and relevant insights for decision-makers. They strike a perfect balance between providing a solid foundation in data science fundamentals and offering practical strategies for implementation.

The book’s structure is well-organized and logical, allowing readers to easily navigate the content and focus on the areas most relevant to their needs. The authors’ writing style is engaging and conversational, making even the most technical concepts accessible to a wide audience.

Another notable aspect of the book is its emphasis on the human element of data science. The authors stress the importance of building diverse and collaborative data science teams, fostering a data-driven culture, and ensuring ethical considerations in data usage. This holistic approach sets the book apart from other technical guides and underscores the critical role of leadership in driving successful data science initiatives.

While the book provides a comprehensive overview of data science in business, some readers may crave more in-depth technical details or specific industry applications. However, the authors’ focus on high-level strategies and best practices ensures that the book remains relevant and valuable across various business contexts.

Overall, “Winning with Data Science” is a must-read for any business leader looking to harness the power of data science for competitive advantage. With its compelling insights, practical guidance, and real-world examples, this book is an essential resource for navigating the data-driven landscape of modern business.

Recommendation

Modern business leaders need to know data science basics. As authors and data scientists Howard Steven Friedman and Akshay Swaminathan explain, a company’s business leaders and its data science team need to collaborate in pursuit of larger business goals. For that to occur, leaders must be able to communicate their needs to their data scientists and understand the options they provide. Friedman and Swaminathan’s practical guide to data science offers a host of concrete examples to help business people and entrepreneurs master the fundamentals and put their new knowledge to work.

Take-Aways

  • Collaborate with your data scientist team to create an optimal data workflow.
  • Data science for business requires hands-on project management.
  • Data scope and limitations form a data science project’s foundation.
  • Data analysis seeks to determine the causal relationships between phenomena.
  • You have to consider the population to which you’re applying a “prediction model.”
  • It’s best to start with simple models and then figure out how to improve them.
  • Natural language processing (NLP), “geospatial analysis,” and “computer vision (CV)” can be critical data analysis tools.
  • Data science can raise important ethical issues.

Summary

Collaborate with your data scientist team to create an optimal data workflow.

Steve, a team leader at a top financial firm, wanted to improve the Recoveries Department’s operations by maximizing the amount of money it collected and minimizing costs. Hiring more staff wasn’t possible, so the company needed a process that would let it use data-informed insights to prioritize some accounts over others and streamline employee workloads. With these goals in mind, Steve approached the data science team. After discussing his priorities and the Recoveries Department’s constraints, the data scientists collaborated with Steve to develop a “data workflow” plan for the problem-solving effort.

“Basic data workflow consists of five stages: data collection, storage, preparation, exploration, and modeling, including experimentation and prediction.”

Collecting data from diverse sources and bringing that data to a single location is typically an automated process called “extract, transform, and load” (ETL). Companies can extract data from existing databases, or they can create relational databases from customer purchase histories, phone calls, and the like. Data transformation involves preparing data for analysis, eliminating inconsistencies and anomalies, and standardizing its format. Data storage may rely on physical devices such as hard disk drives or solid-state drives, but many companies embrace cloud-based solutions, given cloud computing’s advantages: scalable computing power, easy data sharing across an organization, lower costs, and added security.
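To make the ETL pattern concrete, here is a minimal sketch in Python with pandas. The file names and columns (customer_id, amount, date, duration) are hypothetical illustrations, not examples from the book; the sketch simply extracts two raw sources, standardizes their formats, and loads a combined table into shared storage:

```python
import pandas as pd

# Extract: pull records from two hypothetical sources.
purchases = pd.read_csv("purchase_history.csv")   # assumed columns: customer_id, amount, date
calls = pd.read_csv("call_center_log.csv")        # assumed columns: customer_id, duration, date

# Transform: standardize formats and remove inconsistencies and anomalies.
for df in (purchases, calls):
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["customer_id"] = df["customer_id"].astype(str).str.strip()
purchases = purchases[purchases["amount"] >= 0]   # drop anomalous negative amounts

# Load: combine into one table in a single shared location.
call_totals = (calls.groupby("customer_id")["duration"].sum()
                    .rename("total_call_minutes").reset_index())
combined = purchases.merge(call_totals, on="customer_id", how="left")
combined.to_csv("customer_data_clean.csv", index=False)
```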

In the end, however, data scientists can’t accomplish business goals with data unless the data’s quality is assured. Cleaning data sets properly requires that data scientists know what matters most. In general, data sets should not include repeated records, and all values should fall within a predetermined range. Data cleaners should also note any missing data and determine whether it’s needed for accurate analysis.
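A short sketch of those quality checks, continuing the hypothetical table from above (the range bounds are invented for illustration):

```python
import pandas as pd

df = pd.read_csv("customer_data_clean.csv")   # the hypothetical table loaded above

# Data sets should not include repetitions: drop exact duplicate rows.
before = len(df)
df = df.drop_duplicates()
print(f"Removed {before - len(df)} duplicate rows")

# All values should fall within a predetermined range (bounds assumed here).
out_of_range = df[(df["amount"] < 0) | (df["amount"] > 1_000_000)]
print(f"{len(out_of_range)} rows fall outside the expected amount range")

# Note missing data and decide whether it's needed for accurate analysis.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share[missing_share > 0])
```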

Data science for business requires hands-on project management.

Kamala, who held an MD and an MBA from Stanford University (along with enormous student debt), got involved in a medical technology start-up founded by a data scientist. Despite her ambitious hopes for a lucrative IPO, the start-up went under in less than a year. She bounced back with a job at a mid-sized health insurance company, which tasked her with developing a methodology for data-driven decisions about which drugs and procedures are optimal in terms of both health outcomes and costs.

“A key lesson that Kamala took from her med-tech start-up failure was the importance of project management.”

Kamala brought that lesson to bear in her new role: The reason her start-up never got beyond the ivory tower was that its data science team spent all its time pursuing interesting ideas instead of figuring out how to become profitable. It needed a disciplined project manager.

So, when Kamala brought in a data science team to help with her health insurance project, she started by assigning one of her experienced team members to handle project management. Data science projects are like any other project, and they tend to have four distinct phases: “concept, planning, implementation, and closeout.” The concept phase clarifies the project’s demands and hoped-for outcomes. Planning involves determining the necessary resources, what people will actually do and when, what it will all cost, and what the final results will be. The implementation phase requires more specific thinking about human resources and skills, how to ensure consistent quality, and how to manage and mitigate risks. The closeout phase involves everyone reviewing and signing off on the now-completed project.

Data scope and limitations form a data science project’s foundation.

The principal expenditure for Kamala’s health insurance company was reimbursement for the healthcare its members received. The company found that 80% of its disbursements went to 20% of its patients, and it tasked Kamala with finding out whether it could serve those patients more cost-effectively. Kamala assembled a data science team and took some initial steps toward understanding the company’s claims data.

“A good first step in understanding data is an exploratory data analysis.”

An exploratory data analysis can help you determine your data’s scope, including the time period the data covers, the criteria for inclusion, and whether any potentially relevant data was left out. It can also provide a broad perspective on the kinds of data in the database and on the database’s limitations. Some types of data might simply not be in the database, such as reimbursed gym visits. The data science team can then use the results of the exploratory analysis to provide a statistical overview of the data. This overview can help answer questions such as, “Did the number of patients on the plan increase over time?” and “Did the number of claims increase, and if so, by how much?”
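As an illustration, a first-pass exploratory analysis of a hypothetical claims table might look like the following sketch (the file name and columns are assumptions, not the book’s):

```python
import pandas as pd

# Hypothetical claims extract; file and column names are assumed.
claims = pd.read_csv("claims.csv", parse_dates=["claim_date"])

# Scope: what time period does the data actually cover?
print("Coverage:", claims["claim_date"].min(), "to", claims["claim_date"].max())

# Broad statistical overview of the kinds of data in the database.
print(claims.describe(include="all").transpose())

# Did the number of patients on the plan increase over time?
print(claims.groupby(claims["claim_date"].dt.year)["patient_id"].nunique())

# Did the number of claims increase, and if so, by how much?
claims_per_year = claims.groupby(claims["claim_date"].dt.year).size()
print(claims_per_year.pct_change())   # year-over-year growth
```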

Data analysis seeks to determine the causal relationships between phenomena.

If you’re in a leadership role, you will have to make important decisions with wide implications based on data and the analyses your company’s data scientists provide. It’s important to find ways to avoid drawing the wrong conclusions based on flawed or misleading data. To ensure analyses are accurate, you must work with your data science team to ensure underlying biases aren’t distorting the data and that evidence quality is high.

“Determining causality is one of the main goals of data analysis. Causality refers to the underlying causal relationships between phenomena.”

Causality is most obvious in medicine. Suppose, for example, researchers want to know whether eating a particular food leads to worse health outcomes. Ideally, you’d have a set of identical people, or “clones,” some of whom eat the questionable food and some of whom do not, and see what happens. Using clones would isolate and confirm the causal factor. Since cloning isn’t possible in the real world, experiments seeking to establish causality typically randomize participants into groups with similar baseline characteristics. These studies use a control group: Some people receive a placebo instead of the drug being tested.

Issues of direct causality arise in business contexts, too. For instance, the people who run a social media platform will want to know whether certain features directly cause people to stay on the platform longer. Randomized A/B testing can apply to such cases, but even when randomization isn’t possible, data analysis can still establish causality in other ways. For example, you could analyze the behavior of two similar groups of people at a key inflection point, say, hospitalization rates of people who are a few months away from qualifying for Medicare coverage versus those who have recently qualified.
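A minimal sketch of a randomized A/B analysis on simulated data, using a two-proportion z-test; the retention rates and lift are invented, and statsmodels is an assumed tool rather than one the book prescribes:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)

# Simulated experiment: 10,000 users randomly assigned to the new feature (B) or not (A).
n = 10_000
in_treatment = rng.integers(0, 2, size=n).astype(bool)
base_rate, lift = 0.30, 0.02          # hypothetical retention rates, invented for this sketch
retained = rng.random(n) < np.where(in_treatment, base_rate + lift, base_rate)

# Random assignment means a difference in retention can be attributed
# to the feature rather than to pre-existing differences between groups.
successes = [retained[in_treatment].sum(), retained[~in_treatment].sum()]
trials = [in_treatment.sum(), (~in_treatment).sum()]
z_stat, p_value = proportions_ztest(successes, trials)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```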

Problematic data-driven conclusions can emerge from biases in the data. For instance, data can be biased because the participants in a study or survey weren’t randomly selected, because of shoddy data collection techniques, because the people who respond to a particular survey aren’t representative, or because the publication of certain kinds of data skews in one direction or another. Being aware of these biases can help people identify misleading statistics.

You have to consider the population to which you’re applying a “prediction model.”

Making predictions is one of data science’s fundamental aims. A prediction model is, in itself, a fairly straightforward concept: You feed data into a model, and the model uses it to make a prediction. The prediction might be something continuous, like how much alcohol someone will drink or how much money they will spend, or something discrete, such as whether someone will purchase a home or suffer a cardiac arrest. Still, models differ depending on their purpose. Some models attempt to explain a phenomenon. Others simply aim to provide an accurate prediction.
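In code, the continuous/discrete distinction maps onto regression versus classification. The sketch below uses synthetic data and scikit-learn, both assumptions made for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                     # hypothetical patient features

# Continuous outcome (e.g., annual spending in dollars): a regression model.
spending = 1_000 + 100 * X[:, 0] + rng.normal(scale=50, size=500)
regressor = LinearRegression().fit(X, spending)
print("Predicted spending:", regressor.predict(X[:1])[0].round(2))

# Discrete outcome (e.g., hospitalized yes/no): a classification model.
hospitalized = (X[:, 1] + rng.normal(size=500)) > 1.5
classifier = LogisticRegression().fit(X, hospitalized)
print("P(hospitalization):", classifier.predict_proba(X[:1])[0, 1].round(3))
```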

“Feature selection — how to identify which features to include in your model — is an important challenge that must be addressed in any predictive modeling.”

The first thing you need to do when creating a model is articulate the outcome you want it to predict. In health insurance, the predicted outcome might be hospitalization under a given set of circumstances. Once you’ve established a relevant population and outcome, you can set about creating a model. The features you select serve as the model’s foundation: A model that predicts hospitalizations needs variables that actually carry predictive signal for hospitalization.

A common feature selection method uses “filters,” which keep the variables that correlate most strongly with the desired outcome. A more sophisticated method is the “wrapper,” which analyzes how each feature affects the model’s overall performance by adding or removing features one at a time. “Embedded” feature selection methods build feature selection into the model’s training algorithm. There isn’t necessarily one right solution to the feature selection problem; the data science team should experiment with different options and see which works best. If the model’s performance isn’t what you and your data science team hoped for, you can fine-tune it by adding features and providing access to more data.
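A compact sketch of all three approaches with scikit-learn; the dataset is synthetic, and keeping five features is an arbitrary choice for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 candidate features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Filter: keep the features most strongly associated with the outcome.
filter_sel = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper: refit the model repeatedly, dropping the weakest feature each round.
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: L1 regularization shrinks uninformative coefficients to exactly zero.
embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print("Filter keeps:  ", np.flatnonzero(filter_sel.get_support()))
print("Wrapper keeps: ", np.flatnonzero(wrapper_sel.get_support()))
print("Embedded keeps:", np.flatnonzero(embedded.coef_[0] != 0))
```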

It’s best to start with simple models and then figure out how to improve them.

Suppose your new model for making healthcare decisions is working out fairly well: You’re able to make better decisions in less time than before, with good health outcomes and lower costs. Still, even if fewer than half of the model’s decisions are incorrect at this point, it would be great to bring that error rate as close as possible to 0%.

“In data science, it’s always best to start simple and evaluate how much room there is for improvement…. If a simple model isn’t able to work well, chances are that a complex model won’t do much better.”

“Residuals” measure the difference between an outcome’s actual value and the value the model predicts. Examining residuals can suggest ways to improve a model, for instance by adjusting the outcome variable. Another way to improve a predictive model is to adjust its features; without well-engineered features, you won’t get accurate predictions. One simple way to add features without gathering new data is to create an interaction term between two existing variables. For example, you might have the model consider the interaction between age and gender, which can reveal whether spending changes differently for men and women as patients grow older. Finally, instead of experimenting with feature engineering, you can reevaluate the data the model was trained on in the first place. If the original model wasn’t trained on a truly representative data set, it might be better to use “weighted regression,” weighting the data so that it better reflects the target population.
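The following sketch illustrates all three levers on synthetic data: residuals from a baseline model, an age-gender interaction feature, and sample weights. Every number and column name here is invented for the example:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(20, 80, size=500).astype(float),
    "is_female": rng.integers(0, 2, size=500).astype(float),
})
# Invented ground truth in which gender modifies the effect of age on spending.
df["spend"] = 20 * df["age"] + 15 * df["age"] * df["is_female"] + rng.normal(scale=200, size=500)

# Residuals: actual outcome minus the model's predicted value.
base = LinearRegression().fit(df[["age", "is_female"]], df["spend"])
resid = df["spend"] - base.predict(df[["age", "is_female"]])
print("Baseline residual std:", round(resid.std(), 1))

# Interaction feature: lets the age effect differ by gender.
df["age_x_female"] = df["age"] * df["is_female"]
better = LinearRegression().fit(df[["age", "is_female", "age_x_female"]], df["spend"])
resid2 = df["spend"] - better.predict(df[["age", "is_female", "age_x_female"]])
print("With interaction:     ", round(resid2.std(), 1))

# Weighted regression: up-weight under-represented rows if the training
# data doesn't reflect the target population (weights here are invented).
weights = np.where(df["is_female"] == 1, 2.0, 1.0)
weighted = LinearRegression().fit(df[["age", "is_female"]], df["spend"], sample_weight=weights)
```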

Natural language processing (NLP), “geospatial analysis,” and “computer vision (CV)” can be critical data analysis tools.

Artificial intelligence (AI) and deep learning are all the rage these days. Still, what’s most important for data scientists is to stay focused and complete the task at hand. Generative AI, which can produce text, images, audio, and other media, will transform business in many ways, but it’s important not to get seduced by buzzwords and to keep solving the problem in front of you.

“Businesspeople like us … are less concerned about understanding the details of the algorithms that are used and more concerned about solving the problems.”

AI applications may be helpful to your business, depending on its business model. Natural language processing (NLP), which takes unstructured written and spoken language as input, is crucial for a business like Google. A business like Uber, on the other hand, will find geospatial analysis more useful, since it incorporates location information from various sources into its models. Finally, Tesla’s self-driving cars rely on computer vision (CV), which uses visual inputs to determine the location, dimensions, and status of objects in the visual field. CV tends to be a more specialized field. The most common algorithms for CV are convolutional neural networks (CNNs) for static images and recurrent neural networks (RNNs) for video, both of which involve multiple layers of processing.

Data science can raise important ethical issues.

When students enter medical school, they learn the Hippocratic oath, which includes the moral commitment to “do no harm” and to make morally sound decisions. It’s obvious why doctors take such an oath: They are in a singular position to help people but also to harm them. It is becoming ever clearer, however, that people can use data science and artificial intelligence in extremely harmful ways. It may be time for data scientists to adopt something like the Hippocratic oath.

“The National Academies of Sciences, Engineering, and Medicine released a data science oath, adapted from the Hippocratic oath.”

The data science oath created by the National Academies of Sciences, Engineering, and Medicine contains three principal points:

  1. Data scientists should acknowledge when they don’t know something and must be willing to bring in other people to solve problems.
  2. Data scientists must prioritize data privacy and security.
  3. Data scientists should keep in mind that data isn’t just numbers but represents or reflects the lives of real people and that issues such as algorithmic bias can lead to unintended, real-world consequences.

DJ Patil, the former chief data scientist of the US Office of Science and Technology Policy, and two colleagues released their own data ethics checklist, which asks data scientists to consider the possibility of data bias, whether users have consented to the use of their data, and how user data is being protected. Both the data science oath and the data ethics checklist focus on fairness, privacy, transparency, and the broader social effects of data use.

About the Authors

Howard Steven Friedman is a data scientist with experience in both the private and public sectors. He is an adjunct professor at Columbia University. His previous books include Ultimate Price and Measure of a Nation. Akshay Swaminathan is a data scientist who focuses on health systems. He is a Knight-Hennessy scholar at the Stanford University School of Medicine.