How to prevent data leakage in pandas & scikit-learn
What is data leakage, why is it problematic, and how can you prevent it when working on a supervised Machine Learning problem in Python?
Learn how to “discretize” or “bin” your continuous features using Python’s scikit-learn, and find out why I usually don’t recommend doing so.
Are you trying to understand the differences between Jupyter Notebook, JupyterLab, IPython, Colab, and related terms? You’re in the right place!
Discover the benefits of virtual environments and learn the six conda commands you need to know to get started!
What is the Anaconda distribution and why do people use it? How is it related to conda & Miniconda? As a Data Scientist, which should I use?
Use Python to solve this classic probability puzzle that has stumped mathematicians and Nobel Prize winners!
Learn how to use the power of GPT to interact with your private documents. All using Python, all 100% private, all 100% free!
Learn how to use pandas, requests, and regular expressions (“regex”) to create a dataset of every Python version and its release date!
Are you working with datetime data in pandas? Learn how to become “timezone-aware” so that your dataset cooperates with Daylight Saving Time!
Learn how to use Python’s f-strings for substitution and formatting, and then combine those features to solve a real-world pandas problem!
Need help with your code? Learn my step-by-step process for asking great Stack Overflow questions that will get answered quickly!
Watch me answer 59 of YOUR scikit-learn questions in 90 minutes! Topics include class imbalance, preprocessing, categorical features, data leakage, and more…
Learn how to use the “merge” function in pandas so that you can combine multiple datasets into one DataFrame. Includes examples of the four types of joins.
In this 28-minute video, you’ll learn how to properly encode your categorical features using scikit-learn’s OneHotEncoder, ColumnTransformer, and Pipeline.
There are two ways to select a Series from a DataFrame: “dot notation” and “bracket notation” (square brackets). Find out which one you should use, and why!
50+ tricks that will save you time and energy every time you use pandas! New tricks added daily. Up-to-date with the latest version of pandas (0.25).
Work faster, write better pandas code, and impress your friends! These are the most useful tricks I’ve learned from 5 years of teaching Python’s pandas library.
Learn how to use Python’s pandas library to effectively explore, clean, and visualize your data. Become more fluent at using pandas to answer data science questions.
Comparing free services for running an interactive Jupyter Notebook in the cloud: Binder, Kaggle Kernels, Google Colab, Azure Notebooks, CoCalc, Datalore.
pandas is a very popular Python library for data analysis, manipulation, and visualization, but it still hasn’t reached version 1.0. What’s next for pandas?