Introduction to Pandas DataFrames
Get started with Pandas — loading data, selecting columns, filtering rows, grouping, and the operations you will use every day.
Updated Feb 12, 2026
Reviewed Feb 6, 2026
What Is Pandas?
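Pandas is Python’s primary library for data manipulation and analysis. Its core object is the DataFrame, a two-dimensional labeled data structure: every column has a name and a data type, and every row has an index label. Think of it as a spreadsheet you drive from code, where operations apply to whole columns at once instead of cell by cell.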
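To make "two-dimensional labeled data structure" concrete, here is a minimal sketch that inspects the row labels, column labels, and a single column. The sample values are illustrative only.

```python
import pandas as pd

# Illustrative sample data (hypothetical values).
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
})

print(df.index.tolist())    # row labels: [0, 1, 2]
print(df.columns.tolist())  # column labels: ['name', 'age']
print(df.shape)             # (3, 2)

# Each column is a one-dimensional Series with its own dtype.
ages = df["age"]
print(type(ages).__name__)  # Series
print(ages.dtype)           # an integer dtype
```

When no index is supplied, pandas labels the rows 0, 1, 2, and so on.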
Creating DataFrames
import pandas as pd
# From a dictionary
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"]
})
# From a CSV file
df = pd.read_csv("data.csv")
# From JSON
df = pd.read_json("data.json")
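The file names above assume files that exist on disk. As a self-contained sketch, the same `pd.read_csv` call can be tried against an in-memory string; the CSV content here is hypothetical and `io.StringIO` only stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = """name,age,city
Ada,30,London
Bob,25,Paris
Cat,35,Tokyo
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (3, 3)
print(df.columns.tolist())  # ['name', 'age', 'city']
print(df["age"].tolist())   # [30, 25, 35]
```

The first line of the CSV is used as the header row by default, and the numeric column is parsed as numbers rather than strings.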
Exploring Data
df.head() # First 5 rows
df.tail(3) # Last 3 rows
df.shape # (rows, columns)
df.dtypes # Column data types
df.describe() # Statistical summary
df.info() # Column types, non-null counts, memory usage
df.columns # Column names
df.nunique() # Unique values per column
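As a rough illustration of what a few of these calls return, the sketch below runs them on a small hypothetical DataFrame.

```python
import pandas as pd

# Illustrative sample data (hypothetical values).
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"],
})

print(df.shape)                # (3, 3)
print(df.nunique().to_dict())  # {'name': 3, 'age': 3, 'city': 3}

# On a mixed DataFrame, describe() summarizes only numeric columns by default.
summary = df.describe()
print(summary.columns.tolist())     # ['age']
print(summary.loc["mean", "age"])   # 30.0
```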
Selecting Data
# Single column (returns Series)
df["name"]
# Multiple columns (returns DataFrame)
df[["name", "age"]]
# By position
df.iloc[0] # First row
df.iloc[0:3, 1:3] # Rows 0-2, columns 1-2
# By label
df.loc[0, "name"] # Row labeled 0, column "name"
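With the default index, labels and positions are both 0, 1, 2, so `loc` and `iloc` can look interchangeable. The following sketch uses hypothetical string labels to show where they differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 25, 35], "city": ["London", "Paris", "Tokyo"]},
    index=["ada", "bob", "cat"],  # hypothetical string row labels
)

print(df.loc["bob", "age"])  # 25 (selected by label)
print(df.iloc[1, 0])         # 25 (selected by position)

# Label slices include the end label; position slices exclude the end.
by_label = df.loc["ada":"bob"]
by_position = df.iloc[0:1]
print(len(by_label))     # 2
print(len(by_position))  # 1
```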
Filtering Rows
# Single condition
adults = df[df["age"] >= 18]
# Multiple conditions
young_londoners = df[(df["age"] < 30) & (df["city"] == "London")]
# Using isin()
selected = df[df["city"].isin(["London", "Tokyo"])]
# String methods
starts_with_a = df[df["name"].str.startswith("A")]
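Each filter above works by building a boolean mask and passing it to `df[...]`. Here is a small sketch, on hypothetical data, that prints a mask and combines conditions with OR (`|`) and NOT (`~`) in addition to the AND (`&`) shown earlier.

```python
import pandas as pd

# Illustrative sample data (hypothetical values).
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat"],
    "age": [30, 25, 35],
    "city": ["London", "Paris", "Tokyo"],
})

# A mask is a Series of True/False values aligned with the rows.
mask = df["age"] < 30
print(mask.tolist())  # [False, True, False]

# OR uses |, NOT uses ~; each condition needs its own parentheses.
paris_or_older = df[(df["city"] == "Paris") | (df["age"] > 32)]
print(paris_or_older["name"].tolist())  # ['Bob', 'Cat']

not_london = df[~(df["city"] == "London")]
print(not_london["name"].tolist())  # ['Bob', 'Cat']
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operators `&`, `|`, and `~` are used instead.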
Adding and Modifying Columns
# New column
df["senior"] = df["age"] >= 65
# Computed column
df["birth_year"] = 2026 - df["age"]
# Transform every value (the .str accessor passes missing values through)
df["name_lower"] = df["name"].str.lower()
# Conditional column
df["group"] = df["age"].apply(lambda x: "young" if x < 30 else "adult")
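The lambda-based conditional column calls a Python function once per row. As an alternative sketch (a swapped-in technique, not something the article prescribes), NumPy's `np.where` computes the same column in one vectorized step. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (hypothetical values).
df = pd.DataFrame({"name": ["Ada", "Bob", "Cat"], "age": [30, 25, 35]})

# Vectorized: pick "young" where the condition holds, "adult" elsewhere.
df["group"] = np.where(df["age"] < 30, "young", "adult")

# Row-by-row version for comparison.
via_apply = df["age"].apply(lambda x: "young" if x < 30 else "adult")

print((df["group"] == via_apply).all())  # True
```

On small DataFrames the difference is negligible; the vectorized form tends to matter more as row counts grow.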
Grouping and Aggregation
# Group by city, get mean age
df.groupby("city")["age"].mean()
# Multiple aggregations
df.groupby("city").agg(
    avg_age=("age", "mean"),
    count=("name", "count"),
    max_age=("age", "max")
)
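To show what the named aggregation returns, here is a worked sketch on hypothetical data in which each city appears more than once.

```python
import pandas as pd

# Hypothetical data with repeated cities so each group holds two rows.
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat", "Dan"],
    "age": [30, 25, 35, 40],
    "city": ["London", "Paris", "London", "Paris"],
})

summary = df.groupby("city").agg(
    avg_age=("age", "mean"),
    count=("name", "count"),
    max_age=("age", "max"),
)
print(summary.loc["London", "avg_age"])  # 32.5
print(summary.loc["Paris", "max_age"])   # 40

# The group keys become the index; reset_index() turns them back into a column.
flat = summary.reset_index()
print(flat.columns.tolist())  # ['city', 'avg_age', 'count', 'max_age']
```

Each keyword argument names an output column, and the tuple gives the source column and the aggregation to apply to it.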
Handling Missing Data
# Find missing values
df.isnull().sum()
# Drop rows with any missing values (returns a new DataFrame; df is unchanged)
df_clean = df.dropna()
# Fill missing values (assign the result back to the column)
df["age"] = df["age"].fillna(df["age"].median())
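The sketch below runs the three steps on hypothetical data with one missing age. It assigns the filled column back to `df["age"]`, because calling `fillna(..., inplace=True)` on a selected column may not update the original DataFrame in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    "name": ["Ada", "Bob", "Cat", "Dan"],
    "age": [30, np.nan, 35, 25],
})

print(int(df["age"].isnull().sum()))  # 1

# dropna() returns a new DataFrame; df itself is unchanged.
complete = df.dropna()
print(len(complete), len(df))  # 3 4

# The median ignores missing values: median of 30, 35, 25 is 30.0.
df["age"] = df["age"].fillna(df["age"].median())
print(df["age"].tolist())  # [30.0, 30.0, 35.0, 25.0]
```

The column is floating point because NaN cannot be stored in a plain integer column.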
Summary
Pandas is essential for anyone working with data in Python. The patterns above cover a large share of everyday data work. Master selection, filtering, and groupby, and you can handle most data tasks confidently.