Introduction to Pandas DataFrames

What Is Pandas?

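Pandas is Python’s most widely used library for manipulating and analyzing tabular data. Its core object is the DataFrame, a two-dimensional labeled data structure with named columns and an index of row labels. Think of it as a programmable spreadsheet: anything you would do by hand in a sheet can be scripted, repeated, and applied to far larger tables.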

Creating DataFrames

import pandas as pd

# From a dictionary
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"]
})

# From a CSV file (placeholder path; note that this reassigns df)
df = pd.read_csv("data.csv")

# From JSON (placeholder path; this also reassigns df)
df = pd.read_json("data.json")
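
To try the CSV route without a file on disk, here is a minimal sketch that reads the same three rows from an in-memory string. The CSV text and the variable names (csv_text, people) are made up for illustration; only read_csv and io.StringIO are real APIs.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "data.csv"
csv_text = "name,age,city\nAda,30,London\nBob,25,Paris\nCat,35,Tokyo\n"

# read_csv accepts a file-like object as well as a path
people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)          # (3, 3)
print(list(people.columns))  # ['name', 'age', 'city']
```

The first line of the text becomes the column names, and the age column is parsed as integers.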

Exploring Data

df.head()          # First 5 rows
df.tail(3)         # Last 3 rows
df.shape           # (rows, columns)
df.dtypes          # Column data types
df.describe()      # Summary statistics (numeric columns by default)
df.info()          # Dtypes, non-null counts, memory usage
df.columns         # Column names
df.nunique()       # Unique values per column
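
As a rough illustration of what these calls return, the sketch below runs a few of them on the three-row example table. The variable names are assumptions chosen for this example.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"],
})

n_rows, n_cols = people.shape          # shape is an attribute, not a method
unique_counts = people.nunique()       # Series indexed by column name
age_stats = people["age"].describe()   # Series with count, mean, std, min, ...

print(n_rows, n_cols)        # 3 3
print(age_stats["mean"])     # 30.0
```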

Selecting Data

# Single column (returns Series)
df["name"]

# Multiple columns (returns DataFrame)
df[["name", "age"]]

# By position
df.iloc[0]         # First row
df.iloc[0:3, 1:3]  # Rows 0-2, columns 1-2

# By label
df.loc[0, "name"]  # Row labeled 0, column "name"
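
The difference between position and label only shows up when the index is not the default 0, 1, 2. The sketch below uses a hypothetical index of 10, 20, 30 to make it visible, and also shows that iloc slices exclude the end position while loc slices include the end label.

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["Ada", "Bob", "Cat"], "age": [30, 25, 35]},
    index=[10, 20, 30],  # hypothetical non-default row labels
)

first_by_position = people.iloc[0, 0]      # position 0, first column
first_by_label = people.loc[10, "name"]    # label 10, column "name"

by_position = people.iloc[0:2]   # positions 0 and 1 (end excluded)
by_label = people.loc[10:30]     # labels 10, 20 and 30 (end included)

print(first_by_position, first_by_label)   # Ada Ada
print(len(by_position), len(by_label))     # 2 3
```

With this index, people.loc[0] would raise a KeyError because no row carries the label 0.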

Filtering Rows

# Single condition
adults = df[df["age"] >= 18]

# Multiple conditions
young_londoners = df[(df["age"] < 30) & (df["city"] == "London")]

# Using isin()
selected = df[df["city"].isin(["London", "Tokyo"])]

# String methods
starts_with_a = df[df["name"].str.startswith("A")]
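
Each filter above is a boolean Series used as a mask. A small sketch, with made-up mask names, of how masks combine with & (and), | (or), and ~ (not):

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"],
})

# Naming the masks avoids the parentheses that inline comparisons need
is_over_28 = people["age"] > 28           # True, False, True
in_london = people["city"] == "London"    # True, False, False

both = people[is_over_28 & in_london]
either = people[is_over_28 | in_london]
neither = people[~(is_over_28 | in_london)]

print(list(both["name"]))      # ['Ada']
print(list(either["name"]))    # ['Ada', 'Cat']
print(list(neither["name"]))   # ['Bob']
```

When the comparisons are written inline, each one needs its own parentheses because & and | bind more tightly than comparison operators in Python.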

Adding and Modifying Columns

# New column
df["senior"] = df["age"] >= 65

# Computed column (approximate: assumes the current year is 2026)
df["birth_year"] = 2026 - df["age"]

# Vectorized string method (unlike apply(str.lower), passes missing values through)
df["name_lower"] = df["name"].str.lower()

# Conditional column
df["group"] = df["age"].apply(lambda x: "young" if x < 30 else "adult")
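
For a simple two-way condition, a vectorized alternative to a row-by-row lambda is numpy.where. This is one possible sketch, not the only approach; it assumes NumPy is imported alongside pandas.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Bob", "Cat"], "age": [30, 25, 35]})

# np.where(condition, value_if_true, value_if_false) works on the whole column
people["group"] = np.where(people["age"] < 30, "young", "adult")

print(list(people["group"]))   # ['adult', 'young', 'adult']
```

On large tables this usually runs faster than apply with a Python function, since the comparison runs once over the whole column.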

Grouping and Aggregation

# Group by city, get mean age
df.groupby("city")["age"].mean()

# Multiple aggregations
df.groupby("city").agg(
    avg_age=("age", "mean"),
    count=("name", "count"),
    max_age=("age", "max")
)
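
Grouping only becomes interesting when a key repeats. The sketch below adds a hypothetical fourth row (Dan, 40, London) to the example table so that one group has two members, then runs the same named aggregation.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat", "Dan"],   # "Dan" is an invented extra row
    "age": [30, 25, 35, 40],
    "city": ["London", "Paris", "Tokyo", "London"],
})

# Named aggregation: output_column=(input_column, function)
summary = people.groupby("city").agg(
    avg_age=("age", "mean"),
    count=("name", "count"),
    max_age=("age", "max"),
)

print(summary.loc["London", "avg_age"])   # 35.0
print(summary.loc["London", "count"])     # 2
```

The result is indexed by city, one row per group, with the groups sorted by key by default.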

Handling Missing Data

# Find missing values
df.isnull().sum()

# Drop rows with any missing values (returns a new DataFrame; df is unchanged)
df.dropna()

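# Fill missing values (assign the result back; calling fillna with
# inplace=True on a selected column is unreliable in recent pandas)
df["age"] = df["age"].fillna(df["age"].median())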
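
A small worked sketch of the three steps on a table with one missing age. The data and variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30.0, np.nan, 35.0],   # Bob's age is missing
})

missing_counts = people.isnull().sum()   # missing values per column
dropped = people.dropna()                # new DataFrame without Bob's row

# median() skips missing values, so the fill value is the median of 30 and 35
filled = people.assign(age=people["age"].fillna(people["age"].median()))

print(missing_counts["age"])     # 1
print(len(dropped))              # 2
print(filled.loc[1, "age"])      # 32.5
```

Neither dropna nor fillna changes the original table here; people still contains the missing value afterwards.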

Summary

Pandas is a core tool for anyone working with tabular data in Python. The patterns above cover a large share of everyday data work. Master selection, filtering, and groupby, and you can handle most routine data tasks confidently.