The Variable

LLMs for Data Science: What You Need to Know

More and more data professionals now encounter large language models as integral components in core data science workflows. From basic data analysis to complex extraction processes, LLMs play an increasingly visible role in areas where they used to have a minimal footprint (if any).

How should you navigate this rapid change? What can data scientists do to avoid human-LLM turf wars and instead use these tools to produce better, more streamlined results? This week, we zoom in on data-specific use cases with articles that show how agents, prompts, and other LLM-powered tools can enhance, rather than jeopardize, the value of your work.

Before we jump in: in case you missed it, we recently published our latest Author Spotlight, an insight-filled Q&A with longtime TDS contributor Egor Howell discussing his career journey and offering advice for aspiring ML engineers.

LangChain for EDA: Build a CSV Sanity-Check Agent in Python

Tired of performing the same exploratory data-analysis chores time after time? Sarah Schürch walks us through an automation project — powered by Python and LangChain — that produces agents with the ability to display columns, detect missing values, and retrieve descriptive statistics, among other time-saving benefits.

The End-to-End Data Scientist’s Prompt Playbook

If you're skeptical about LLMs' place in the data scientist's toolkit, Sara Nobrega's latest exploration of prompting techniques, especially in the area of stakeholder communication, might just change your mind.

Extracting Structured Data with LangExtract: A Deep Dive into LLM-Orchestrated Workflows

Subha Ganapathi offers a hands-on guide to building modular workflows for structured intelligence, ensuring schema alignment and fact completeness.

From our partner, Tonic

Keep PII Out of AI

Keeping sensitive data out of AI inputs and embeddings is paramount for privacy. Make the process easy with Tonic Textual.

Demo Today

This Week's Most-Read Stories

The articles our community has been buzzing about in recent days cover MCP, the future of data generalists, and more:

Using LangGraph and MCP Servers to Create My Own Voice Assistant, by Benjamin Lee
Built over 14 days and run entirely locally, with no API keys, cloud services, or subscription fees.

The Generalist: The New All-Around Type of Data Professional?, by Loizos Loizou
Is over-specialization ending and are data generalists on the rise?

The Machine Learning Lessons I’ve Learned This Month, by Pascal Janetzky
August 2025: logging, lab notebooks, overnight runs.

Contribute to TDS

We love publishing articles from new authors, so if you’ve recently written an interesting project walkthrough, tutorial, or theoretical reflection on any of our core topics, why not share it with us?

Submit Your Article

Meet Our New Authors

Explore excellent work from some of our recently added contributors:

James Gibbins is a data scientist with a multidisciplinary background, and has been publishing a popular series on hyperparameter tuning.
Erika G. Gonçalves joins us with expertise in applied math and statistics, as well as deep industry experience; her first article looks under the hood of AI applications.
Sean Moran has led multiple AI/ML initiatives at large-scale enterprises. His TDS debut looks into a potential future where scientific innovation might be heavily AI-assisted.

