5 Books to stay relevant as a Data Scientist

These 5 books will help you build the skills you need to improve as a Data Scientist

Ivo Bernardo
6 min read · Aug 27, 2024
Photo by Susan Q Yin @ Unsplash.com

Being a Data Scientist in 2024 is hard. The role was labeled the sexiest job of the 21st century in 2012, but becoming a Data Scientist today is not as straightforward as it was 12 years ago. While the foundational concepts (math, statistics and programming) are still very relevant, other skills are becoming important to stand out as the field matures.

Even as a seasoned professional, it can seem impossible to keep up with all the new developments. But even in this vortex, are there new fundamental skills you can learn to stay relevant in the field? I believe so.

This list is based on my experience with DareData and my involvement in leading Machine Learning system deployment projects over the past five years. Some skills, like LLMs and GenAI, have recently emerged, while others, such as communication, have been overshadowed but remain critical in the evolving AI landscape.

The goal of this post is to give you some ideas on books to read and topics to tackle to improve as a data scientist. As the field continues to evolve, I think that learning these skills will give you a better chance of staying relevant in the future.

These 5 books touch upon 5 different topics:

  • The transition from data scientist to machine learning engineer, and how to think about ML deployments.
  • The value of data and analytics, and how to improve your communication and leadership in data-related work.
  • The rise of LLMs and learning how to work with generative AI models.
  • The importance of communication for the data roles of the future.
  • Ethics in data science and machine learning.

Let’s start!

ML Deployment — Chip Huyen’s Designing Machine Learning Systems

The age of “notebook” data science is probably over (outside of research labs). Companies no longer want to play around with models that they can’t deploy in production.

This book details some of the most important concepts in MLOps (Machine Learning Operations):

  • Model Deployment
  • Monitoring and Drift
  • Framing ML Problems
  • Data Engineering Basics
  • Feature Engineering
  • Infrastructure and Tooling
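To make one of these topics more concrete, here is a minimal sketch of what a drift check for a single numeric feature could look like, using the Population Stability Index (PSI). It is not taken from the book: the function name, the 10-bin default and the 0.25 alert threshold (a commonly quoted rule of thumb) are my own assumptions for illustration, and a real monitoring setup would likely rely on a dedicated tool.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature.

    Illustrative helper (hypothetical name), assuming equal-width bins
    built from the range of the reference sample.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when all reference values are identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Usage: compare training-time values of a feature with recent production values.
training_values = [i / 100 for i in range(1000)]
production_values = [v + 5 for v in training_values]  # simulated shift

psi = population_stability_index(training_values, production_values)
if psi > 0.25:  # assumed rule-of-thumb threshold, tune it for your use case
    print("Feature distribution has shifted; consider investigating or retraining.")
```

A check like this would typically run on a schedule for each important feature, which is the kind of habit that separates a model in a notebook from a model in production.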

The transition from Data Scientist to Machine Learning Engineer starts with learning MLOps concepts. There’s no question that a Data Scientist who knows how to handle scalability and deployment is more valuable in the market.

Value of Data — Competing on Analytics: The New Science of Winning

Creating value from data is probably one of data science’s hardest tasks. It requires business knowledge, extreme attention to detail and knowledge of foundational data science concepts.

There’s a growing trend of data scientists specializing in specific industries, allowing them to bring more value than generalists. It all ties back to a greater ability to extract value from data. Knowing the nuances of how to embed Machine Learning into businesses is a skill that is not going anywhere.

The hype around Generative AI has made this even more essential. How can we ensure these models truly add value? How do we invest in use cases that will genuinely improve the lives of employees and customers? These questions are difficult to answer.

Competing on Analytics is a good book to start with, particularly if you haven’t really delved into how to use data in a management context.

LLMs — Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play

Based on market demand, working with GenAI and LLMs will be a great addition to your career. For Data Scientists who have only worked with traditional ML, this is a great opportunity to expand your skillset in a field where you will naturally fit in.

Data Scientists are likely among the professionals best placed to pick up the new tasks and roles that demand GenAI skills. Studying Natural Language Processing and GenAI will help you transition from classical ML to generative ML.

76% of companies are planning to deploy or are already deploying GenAI solutions. The market for these solutions will only continue to grow, and even if you consider this hype unjustified, it’s hard to deny that there’s growing excitement around the subject. Even if we are currently experiencing a ‘dot-com’ bubble moment with generative AI, it’s worth remembering that internet jobs didn’t vanish after that bubble (they even exploded), and GenAI-related jobs are likely to increase in the future as well.

If you would like a good technical resource on GenAI, David Foster’s book is a good start, with a nice blend of technical applications and architectural frameworks.

Communication — Made to Stick: Why Some Ideas Survive and Others Die

One of the most underrated skills in the world is the ability to communicate, particularly storytelling.

Since the dawn of time, humans have been wired to read and engage with stories. For Data Scientists, communication has often been the “hidden trait” of a great professional, and the data scientists who combine technical and communication skills are normally the ones most admired by the community and their peers.

Made to Stick is a classic book on communication, particularly because it focuses on storytelling and on ideas that won (and why they won). Our ability to use language and convey ideas to other human beings will still be relevant and important for many years to come.

In an age where AI becomes the norm, communication will stand out as one of the most human-centric skills. Leveraging it is a must for most job roles, but it will become essential for the people who work at the intersection between machines and the rest of the world.

Ethics — Weapons of Math Destruction

Finally, with the rise of AI, algorithms and robots, job roles related to data ethics will likely increase in the future.

Cathy O’Neil’s book is one of the first to address ethics within the context of algorithms and machine learning. What’s really important in this book is that you will encounter real stories within the context of traditional ML (not GenAI). Although many people believe that ethical concerns are only being raised now due to the threat of job loss, the reality is that these issues have been present in the industry for many years.

Many new job roles tied to ethics and security will emerge within the context of AI. Most of them will show up because we will face several situations where there’s no right or wrong answer. Working in this field promises to be incredibly exciting and fulfilling in the future, as it has the potential to positively impact the lives of millions.

Thank you for taking the time to read this post. As a recap, here are the 5 skills we’ve addressed in this post (that will help you stay relevant as a data scientist):

  • Machine learning operations
  • Extracting value from data
  • Large language models and generative AI
  • Communication
  • Ethics within the context of Data Science

Are there any other topics that you feel will be important for the data scientist of the future? These five are based on my own experience, and I’m probably missing some skills that I haven’t seen yet. Let me know in the comments! (If you would like, leave some extra books or resources as well.)

If you want to read or watch more content related to AI and DS, subscribe to my YouTube channel “The Data Journey”:

https://www.youtube.com/@TheDataJourney42


Ivo Bernardo

I write about data science and analytics | Partner @ DareData | Instructor @ Udemy | also on thedatajourney.substack.com/ and youtube.com/@TheDataJourney42