Pandas Interview Questions and Answers

Sanjay Kumar PhD
7 min read · Jan 2, 2025


Image generated by the author using DALL-E

1. What is Pandas in Python, and why is it used?

Question: What is the primary purpose of the Pandas library in Python, and what are its core data structures?

Answer:
Pandas is an open-source Python library used for data manipulation and analysis. It provides easy-to-use data structures and tools for handling structured data, such as tabular data in spreadsheets or SQL tables.

Core Data Structures:

  • Series: A one-dimensional labeled array capable of holding any data type.
  • DataFrame: A two-dimensional labeled data structure with rows and columns, similar to a spreadsheet or SQL table.

Pandas is widely used for tasks like data cleaning, transformation, aggregation, and merging.

2. Explain the difference between a Pandas Series and a DataFrame.

Question: How do Series and DataFrames differ in Pandas?

Series:

  • One-dimensional labeled array.
  • Homogeneous data: all elements share one dtype (an object-dtype Series can still hold mixed Python objects).
  • Example: A single column from a DataFrame.

DataFrame:

  • Two-dimensional data structure.
  • Heterogeneous data (each column can have a different type).
  • Example: A spreadsheet with labeled rows and columns.
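A minimal sketch contrasting the two structures, using a small made-up table (the column names are illustrative only):

```python
import pandas as pd

# A DataFrame: two-dimensional, and each column can have its own dtype
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "age": [28, 35, 41],
})

# Selecting a single column returns a Series: one-dimensional, one dtype
ages = df["age"]

print(type(df).__name__, df.shape)   # DataFrame (3, 2)
print(type(ages).__name__, ages.ndim)  # Series 1
print(df.dtypes)                     # one dtype per column
```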

3. How do you handle missing data in Pandas?

Question: What methods are available in Pandas to deal with missing data?

  • Identify Missing Data: Use .isnull() to check for missing values.
  • Drop Missing Values: Use .dropna() to remove rows or columns with missing data.
  • Fill Missing Values: Use .fillna() to fill missing data with a specified value (e.g., mean, median).
  • Interpolate: Use .interpolate() for linear or other types of interpolation.
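The four approaches can be sketched on a small made-up DataFrame; filling with the column mean is just one illustrative choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [21.0, np.nan, 23.0, np.nan, 25.0],
    "city": ["A", "B", None, "D", "E"],
})

missing_per_column = df.isnull().sum()   # count of NaN/None per column
dropped = df.dropna()                    # drop every row that has any missing value

# Fill per column: numeric gaps with the mean, text gaps with a placeholder
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})

interpolated = df["temp"].interpolate()  # linear interpolation by default
print(interpolated.tolist())             # [21.0, 22.0, 23.0, 24.0, 25.0]
```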

4. How can you merge or concatenate DataFrames in Pandas?

Question: Describe how to combine multiple DataFrames in Pandas.

Answer:

  • Concatenation: Use pd.concat() to combine DataFrames either along rows (axis=0) or columns (axis=1).
  • Merging: Use pd.merge() to combine DataFrames based on common columns or indices.
  • Join: Use .join() for merging on indices.
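A short sketch of all three, assuming hypothetical quarterly sales tables that share an "id" column:

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [120, 90]})
regions = pd.DataFrame({"id": [1, 2, 3, 4], "region": ["N", "S", "N", "E"]})

# Concatenation: stack the rows; ignore_index builds a fresh 0..n-1 index
all_sales = pd.concat([q1, q2], axis=0, ignore_index=True)

# Merging: match rows on the shared "id" column (inner join by default)
with_region = pd.merge(all_sales, regions, on="id")

# Join: aligns on the index, so move "id" into the index on both sides first
joined = all_sales.set_index("id").join(regions.set_index("id"))
```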

5. How do you filter rows in a DataFrame?

Question: What techniques are available in Pandas to filter data based on conditions?
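The usual techniques are boolean indexing, .query(), and .isin(). A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [28, 35, 41, 19],
    "city": ["Paris", "Oslo", "Paris", "Rome"],
})

# Boolean indexing: wrap each condition in parentheses and combine with & or |
adults_in_paris = df[(df["age"] > 21) & (df["city"] == "Paris")]

# .query(): the same filter written as an expression string
same_result = df.query("age > 21 and city == 'Paris'")

# .isin(): keep rows whose value appears in a list
oslo_or_rome = df[df["city"].isin(["Oslo", "Rome"])]

print(adults_in_paris["name"].tolist())  # ['Ana', 'Cara']
```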

6. Explain the groupby operation in Pandas.

Question: What is the purpose of the groupby function in Pandas?

Answer:
The groupby() function is used to group data based on one or more columns and perform aggregate functions like sum(), mean(), or custom functions.
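A sketch of grouping by one and by several columns, assuming a made-up sales table; the named-aggregation form of .agg() computes several statistics in one call:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "product": ["a", "b", "a", "a", "b"],
    "sales": [10, 20, 30, 40, 50],
})

# One grouping column, one aggregate
total_by_region = df.groupby("region")["sales"].sum()

# Several grouping columns, several named aggregates
summary = df.groupby(["region", "product"]).agg(
    total=("sales", "sum"),
    average=("sales", "mean"),
)
print(total_by_region)  # N -> 30, S -> 120
print(summary)
```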

7. What are some common methods to summarize data in Pandas?

Question: How can you generate summary statistics for a DataFrame?

Answer:

  • Use .describe() to get statistics like count, mean, std, min, max, etc.
  • Use .info() to understand the structure of the DataFrame.
  • Use .value_counts() to count occurrences of unique values in a Series.
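The three summary methods on a small made-up score table:

```python
import pandas as pd

df = pd.DataFrame({
    "score": [70, 85, 85, 90, 60],
    "grade": ["C", "B", "B", "A", "D"],
})

stats = df.describe()     # numeric columns: count, mean, std, min, quartiles, max
df.info()                 # prints dtypes, non-null counts and memory usage
grade_counts = df["grade"].value_counts()  # most frequent value first

print(stats.loc["mean", "score"])  # 78.0
print(grade_counts["B"])           # 2
```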

8. How do you sort data in Pandas?

Question: How can you sort rows or columns in Pandas?

Answer:

  • Use .sort_values() to sort rows by the values in one or more columns.
  • Use .sort_index() to sort by the row index, or by the column labels with axis=1.
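A minimal sketch of both methods; the shuffled index is deliberate so that sorting by index has a visible effect:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Cara", "Ana", "Ben"], "age": [41, 28, 35]},
    index=[2, 0, 1],
)

by_age_desc = df.sort_values("age", ascending=False)  # oldest first
by_name_then_age = df.sort_values(["name", "age"])     # multi-column sort
by_index = df.sort_index()                             # rows ordered by index label
columns_sorted = df.sort_index(axis=1)                 # columns ordered by name
```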

9. What are common file formats Pandas can read from or write to?

Question: Which file formats are supported by Pandas for reading and writing data?

Answer:

Reading:

  • CSV: pd.read_csv()
  • Excel: pd.read_excel()
  • SQL: pd.read_sql()
  • JSON: pd.read_json()

Writing:

  • CSV: .to_csv()
  • Excel: .to_excel()
  • SQL: .to_sql()
  • JSON: .to_json()
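A round trip through JSON, kept in memory so no file is needed; orient="records" (one object per row) is just one of several layouts to_json() supports:

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Oslo"]})

# Write to a JSON string, one object per row
json_text = df.to_json(orient="records")

# Read it back; wrapping the text in StringIO avoids passing literal JSON directly
restored = pd.read_json(io.StringIO(json_text), orient="records")
print(json_text)
```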

10. What is the purpose of the .apply() method in Pandas?

Question: How does the .apply() method enhance functionality in Pandas?

Answer:
The .apply() method allows applying a function to each element or row/column of a DataFrame or Series.
Example:
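A sketch of .apply() at element, row, and column level, using a made-up price table:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 3, 2]})

# Series.apply: the function is called once per element
with_tax = df["price"].apply(lambda p: round(p * 1.2, 2))

# DataFrame.apply with axis=1: called once per row (each row arrives as a Series)
totals = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# DataFrame.apply with the default axis=0: called once per column
ranges = df.apply(lambda col: col.max() - col.min())
```

For simple arithmetic like the row-wise total, the vectorized df["price"] * df["qty"] gives the same result and is usually much faster; .apply() is most useful when the logic cannot be expressed with built-in operations.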

11. How can you select specific rows and columns in Pandas?

Question: What are the ways to index and slice data in Pandas?
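Common ways to select rows and columns, sketched on a made-up table with string index labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cara"], "age": [28, 35, 41], "city": ["Paris", "Oslo", "Rome"]},
    index=["a", "b", "c"],
)

one_column = df["name"]                             # a Series
two_columns = df[["name", "city"]]                  # a DataFrame (double brackets)
first_two_rows = df.iloc[:2]                        # rows by position
rows_by_label = df.loc[["a", "c"], ["name", "age"]] # rows and columns by label
older = df.loc[df["age"] > 30, "name"]              # boolean mask plus a column label
```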

12. Explain the difference between .iloc and .loc.

Question: How does .iloc differ from .loc in Pandas?

Answer:

  • .iloc: Uses integer-based indexing (row/column positions).
  • .loc: Uses label-based indexing (row/column labels).
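The difference is easiest to see when the integer index labels do not match the positions. In this sketch the index is reversed on purpose:

```python
import pandas as pd

# Labels 3, 2, 1, 0 sit at positions 0, 1, 2, 3
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

print(s.iloc[0])   # 10, the element at position 0
print(s.loc[0])    # 40, the element whose label is 0

# Slicing: .iloc excludes the stop position, .loc includes the stop label
print(s.iloc[0:2].tolist())  # [10, 20]
print(s.loc[3:2].tolist())   # [10, 20]
```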

13. How can you modify column names in a DataFrame?

Question: What are ways to rename columns in Pandas?
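Three common approaches, sketched on a made-up frame with messy column names:

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Ana"], "AGE": [28]})

# rename() with a mapping: change only the listed columns; returns a new DataFrame
renamed = df.rename(columns={"First Name": "first_name"})

# rename() with a function: apply it to every column label
normalized = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# Assign a complete list of names (its length must equal the number of columns)
df.columns = ["first_name", "age"]
```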

14. What are vectorized operations in Pandas, and why are they important?

Question: Why are vectorized operations faster than loops in Pandas?

Answer:

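  • Vectorized operations apply an operation to an entire array or Series in a single call instead of element by element.
  • The work runs in optimized compiled code (largely C via NumPy), which avoids per-element Python interpreter overhead and makes them much faster than Python loops.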
  • Example
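A rough comparison of a Python loop and a vectorized addition over two made-up columns. The exact timings depend on the machine, so only the results are meant to match:

```python
import time
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(1_000_000), "b": np.arange(1_000_000)})

# Loop version: one Python-level addition per row
start = time.perf_counter()
loop_result = [x + y for x, y in zip(df["a"], df["b"])]
loop_seconds = time.perf_counter() - start

# Vectorized version: one call, executed in compiled NumPy code
start = time.perf_counter()
vector_result = df["a"] + df["b"]
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f}s, vectorized: {vector_seconds:.4f}s")
```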

15. How can you handle duplicate data in Pandas?

Question: What methods are available to deal with duplicate rows in a DataFrame?

Answer:
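The main tools are .duplicated() to flag repeats and .drop_duplicates() to remove them. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "city": ["Paris", "Paris", "Oslo", "Rome", "Milan"],
})

flags = df.duplicated()        # True for rows that repeat an earlier row exactly
deduped = df.drop_duplicates() # keep the first occurrence of each full row

# Consider only the "id" column and keep the last occurrence of each id
by_id = df.drop_duplicates(subset="id", keep="last")
```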

16. Explain the concept of broadcasting in Pandas.

Question: What is broadcasting in Pandas, and how is it used?

Answer:
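Broadcasting lets an operation combine objects of different shapes, such as a DataFrame with a scalar or with a Series, by applying the smaller operand across the larger one. Pandas first aligns the operands by index or column labels, so a Series is matched to DataFrame columns by label, and labels that do not match produce NaN.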
Example:
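A sketch of scalar broadcasting, Series broadcasting in both directions, and label alignment, using a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Scalar: applied to every element
scaled = df * 2

# Series across rows: the Series index is matched to the DataFrame's column labels
offsets = pd.Series({"a": 100, "b": 1000})
shifted = df + offsets

# Series down the columns: use a method with axis=0 to match on the row index
row_means = df.mean(axis=1)
centered = df.sub(row_means, axis=0)

# Label alignment: labels present on only one side produce NaN
partial = df + pd.Series({"a": 1, "c": 5})
```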

17. How can you reshape data in Pandas?

Question: What functions allow reshaping a DataFrame?

Answer:
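The main reshaping functions are melt(), pivot(), stack(), and unstack(). A sketch that converts a made-up wide table to long form and back:

```python
import pandas as pd

wide = pd.DataFrame({
    "city": ["Paris", "Oslo"],
    "2023": [100, 80],
    "2024": [110, 90],
})

# melt(): wide -> long, one row per (city, year) pair
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# pivot(): long -> wide again
back_to_wide = long.pivot(index="city", columns="year", values="sales")

# stack()/unstack(): move a level between the columns and the row index
stacked = wide.set_index("city").stack()  # Series with a (city, year) MultiIndex
unstacked = stacked.unstack()             # back to one column per year
```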

18. What is the difference between apply() and applymap()?

Question: When should you use apply() vs. applymap()?
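Use apply() when the function needs a whole row or column (or, on a Series, one element at a time), and applymap() when a DataFrame function should run on every individual cell. In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map, so this sketch picks whichever exists:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.234, 5.678], "b": [9.876, 5.432]})

# apply(): the function receives a whole column (axis=0) or a whole row (axis=1)
column_sums = df.apply(sum)       # one value per column
row_max = df.apply(max, axis=1)   # one value per row

# Element-wise: DataFrame.map on pandas >= 2.1, applymap on older versions
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
rounded = elementwise(lambda x: round(x, 1))
```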

19. How do you work with time series data in Pandas?

Question: What are some key functions for handling time series data?

Answer:
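Key tools include pd.to_datetime(), the .dt accessor, a DatetimeIndex, .resample(), and .rolling(). A sketch on made-up daily sales (2024-01-01 is a Monday):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"],
    "sales": [10, 20, 30, 40],
})

df["date"] = pd.to_datetime(df["date"])    # parse strings into datetime64 values
df["weekday"] = df["date"].dt.day_name()   # .dt accessor for date parts
ts = df.set_index("date")["sales"]         # a DatetimeIndex enables time-based ops

weekly = ts.resample("W").sum()            # weekly totals, weeks ending on Sunday
rolling = ts.rolling(window=2).mean()      # moving average over 2 observations
january = ts.loc["2024-01"]                # partial-string date selection
```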

20. How can you perform mathematical operations on columns?

Question: What are some ways to perform arithmetic operations on DataFrame columns?

Answer:
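Arithmetic works directly on columns, with scalars, through NumPy functions, or through methods such as .sub() that can be chained. A sketch on a made-up table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [2, 0, 5]})

df["revenue"] = df["price"] * df["qty"]              # column with column
df["price_with_tax"] = df["price"] * 1.2             # column with scalar
df["log_price"] = np.log(df["price"])                # NumPy ufuncs accept a Series
df["discounted"] = df["price"].sub(5).clip(lower=0)  # chained methods
total_revenue = df["revenue"].sum()                  # reduction over one column
```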

21. What are common ways to visualize data using Pandas?

Question: How can you create visualizations directly in Pandas?

Answer:
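DataFrame.plot() is a thin wrapper around matplotlib, so matplotlib must be installed. This sketch selects the non-interactive "Agg" backend so it runs without a display; the data and the output file name are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display required
import pandas as pd

df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "sales": [100, 120, 90, 140],
    "costs": [80, 85, 70, 95],
})

# Line chart of two columns against the month labels; returns a matplotlib Axes
ax = df.plot(x="month", y=["sales", "costs"], kind="line", title="Sales vs costs")

# Other chart types use the same interface: bar, barh, hist, box, scatter, pie, ...
bar_ax = df.plot.bar(x="month", y="sales")

ax.figure.savefig("sales_vs_costs.png")  # hypothetical output file name
```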

22. How do you save and load DataFrames?

Question: What are common methods to persist DataFrames?

Answer:
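A sketch of two common options, CSV and pickle, writing into a temporary folder so nothing is left behind; the file names are illustrative:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Oslo"]})

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "cities.csv")
    pkl_path = os.path.join(folder, "cities.pkl")

    # CSV: portable text; index=False avoids saving the index as an extra column
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # Pickle: preserves dtypes and the index exactly, but is Python-specific
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)
```

CSV is the most portable choice, while pickle restores the DataFrame exactly; only unpickle files from trusted sources. Columnar formats such as Parquet (.to_parquet() / pd.read_parquet()) need an extra engine like pyarrow.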

23. What is the astype() method, and why is it used?

Question: How does the astype() method work in Pandas?
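astype() converts a Series or DataFrame to another dtype. A sketch on made-up string data. pd.to_numeric is shown as a related alternative because astype() raises an error on unparseable values such as "n/a":

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "price": ["9.99", "15.50", "n/a"],
    "grade": ["A", "B", "A"],
})

df["id"] = df["id"].astype(int)               # string -> integer
df["grade"] = df["grade"].astype("category")  # string -> categorical

# astype(float) would raise on "n/a"; to_numeric with errors="coerce" yields NaN instead
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Several columns at once via a dict of column -> dtype
converted = df.astype({"id": "float64"})
```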

24. What is the difference between .map(), .apply(), and .applymap()?

Question: When should you use .map(), .apply(), or .applymap() in Pandas?
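In short: .map() is a Series method that transforms each element and also accepts a dict or Series as a lookup table; .apply() calls a function per element on a Series or per row/column on a DataFrame; .applymap() runs a function on every cell of a DataFrame (renamed DataFrame.map in pandas 2.1). A sketch of the first two:

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])

# Series.map with a dict: a lookup table; values missing from the dict become NaN
legs = s.map({"cat": 4, "dog": 4})

# Series.map and Series.apply with a function: both call it once per element
lengths_map = s.map(len)
lengths_apply = s.apply(len)

# DataFrame.apply: the function receives whole columns (or rows with axis=1)
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
col_totals = df.apply(lambda col: col.sum())
```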

25. How do you merge DataFrames with different key columns?

Question: How do you perform a merge on DataFrames when the keys differ?
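Use left_on and right_on to name the key column on each side. A sketch with hypothetical order and customer tables:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust_id": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [10, 20, 40], "name": ["Ana", "Ben", "Dev"]})

merged = pd.merge(
    orders,
    customers,
    left_on="cust_id",       # key column in the left frame
    right_on="customer_id",  # differently named key in the right frame
    how="left",              # keep every order, even without a matching customer
)

# Both key columns are kept in the result; drop the redundant one
merged = merged.drop(columns="customer_id")
```

When one side's key lives in the index rather than a column, left_index=True or right_index=True replaces the corresponding *_on argument.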

26. What is the difference between pd.concat() and pd.merge()?

Question: How do concat and merge differ in Pandas?
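pd.concat() glues objects together along an axis and aligns only on index or column labels, while pd.merge() performs a database-style join that matches rows on key values. This sketch uses the same two made-up frames to show the difference:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "a"], "y": [20, 10]})

# concat along columns pairs rows by index label (0 with 0, 1 with 1),
# ignoring what the "key" column contains
side_by_side = pd.concat([left, right], axis=1)

# merge matches rows on the values of "key"
matched = pd.merge(left, right, on="key")
```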

27. How can you pivot a DataFrame in Pandas?

Question: What is the purpose of the pivot() function?

Answer:
pivot() reshapes a DataFrame by specifying index, columns, and values.
Example:
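A minimal sketch reshaping made-up temperature readings from long to wide form:

```python
import pandas as pd

long = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "city": ["Paris", "Oslo", "Paris", "Oslo"],
    "temp": [5, -3, 7, -1],
})

# One row per date, one column per city, cells filled from "temp"
wide = long.pivot(index="date", columns="city", values="temp")
print(wide)
```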

28. How is pivot_table() different from pivot()?

Question: What advantages does pivot_table() have over pivot()?

Answer:

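  • pivot_table() aggregates values with the aggfunc parameter (the default is the mean), whereas pivot() only reshapes and cannot aggregate.
  • When an index/column pair appears more than once, pivot() raises a ValueError, while pivot_table() combines the duplicates with aggfunc.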
    Example:
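A sketch with a deliberately duplicated ("N", "a") pair in made-up sales data, showing pivot() failing and pivot_table() aggregating:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["a", "a", "a", "b"],  # ("N", "a") appears twice
    "amount": [10, 30, 5, 8],
})

# pivot() cannot put two values into one cell, so it raises ValueError here
pivot_failed = False
try:
    sales.pivot(index="region", columns="product", values="amount")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table() sums the duplicates; fill_value replaces empty cells
table = sales.pivot_table(
    index="region", columns="product", values="amount",
    aggfunc="sum", fill_value=0,
)
```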

29. How can you remove a column from a DataFrame?

Question: What are different ways to drop a column in Pandas?
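Four common ways to remove a column, sketched on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

without_b = df.drop(columns=["b"])  # returns a new DataFrame; df is unchanged
df.drop(columns="c", inplace=True)  # modifies df in place
popped = df.pop("d")                # removes "d" from df and returns it as a Series
del df["a"]                         # removes "a" in place
```

df.drop("b", axis=1) is an equivalent spelling of the first call.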

30. How can you handle categorical data in Pandas?

Question: What are some techniques to work with categorical data?

Answer:
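Common techniques are the category dtype (optionally ordered), the .cat accessor, and one-hot encoding with pd.get_dummies(). A sketch on made-up size labels:

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Ordered categorical: stores small integer codes plus the list of categories
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)
codes = df["size"].cat.codes                  # small=0, medium=1, large=2
bigger_than_small = df[df["size"] > "small"]  # comparisons follow category order

# One-hot encoding: one indicator column per category
dummies = pd.get_dummies(df["size"], prefix="size")
```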

31. What is the purpose of the .groupby() method?

Question: Explain the stages of a groupby operation in Pandas.

Answer:

  1. Splitting: Divide the data into groups.
  2. Applying: Perform an operation on each group (e.g., sum, mean).
  3. Combining: Combine the results into a single DataFrame or Series.

Example:
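A sketch that makes the three stages visible on made-up team scores. In practice a single groupby call performs all three; iterating over the groups is shown only to expose the split step:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y", "x"],
    "points": [3, 5, 4, 1, 2],
})

grouped = df.groupby("team")

# 1. Split: each group is a sub-DataFrame keyed by its team label
for team, group in grouped:
    print(team, group["points"].tolist())  # x [3, 4, 2] then y [5, 1]

# 2. Apply + 3. Combine: per-group results joined into one Series
per_team_mean = grouped["points"].mean()

# transform() combines differently: the result keeps the original row shape
per_row_share = df["points"] / grouped["points"].transform("sum")
```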

32. How do you check the memory usage of a DataFrame?

Question: What methods help you inspect memory usage in Pandas?

Answer:
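The main tools are .info(memory_usage="deep") and .memory_usage(deep=True). Without deep=True, object (string) columns count only their pointers, so the figure is an underestimate. A sketch on randomly generated data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.arange(10_000),
    "city": rng.choice(["Paris", "Oslo", "Rome"], size=10_000),
})

df.info(memory_usage="deep")             # summary including deep memory usage
per_column = df.memory_usage(deep=True)  # bytes per column, plus the index
total_bytes = per_column.sum()

# Repetitive strings usually take far less memory as a category
as_category = df.astype({"city": "category"})
saved = per_column["city"] - as_category.memory_usage(deep=True)["city"]
```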

33. How do you filter rows based on string values?

Question: How can you filter rows that contain specific strings?

Answer:
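The .str accessor provides string tests that return boolean masks. A sketch on made-up product names; na=False treats missing values as "no match" so the mask stays boolean:

```python
import pandas as pd

df = pd.DataFrame({"product": ["Apple iPhone", "apple pie", "Banana", None]})

# Substring match (a regex by default), ignoring case
has_apple = df[df["product"].str.contains("apple", case=False, na=False)]

# Literal substring match without regex interpretation
has_pie = df[df["product"].str.contains("pie", regex=False, na=False)]

# Prefix check; .str.endswith works the same way for suffixes
starts_b = df[df["product"].str.startswith("B", na=False)]
```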

34. What is the difference between .at and .iat?

Question: When should you use .at versus .iat?
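.at is the single-cell counterpart of .loc (labels), and .iat is the single-cell counterpart of .iloc (integer positions). Both are intended for fast scalar reads and writes. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"price": [9.5, 12.0]}, index=["apple", "melon"])

by_label = df.at["melon", "price"]  # row label, column label
by_position = df.iat[1, 0]          # row position, column position

df.at["apple", "price"] = 10.0      # scalar assignment by label
```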

35. How do you reset the index of a DataFrame?

Question: What method allows you to reset the index of a DataFrame?

Answer:
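.reset_index() replaces the index with a default 0..n-1 range. By default the old index is kept as a column; drop=True discards it. A sketch after sorting a made-up table, a typical moment to reset:

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 75, 82]}, index=["c", "a", "b"])

sorted_df = df.sort_values("score")

moved = sorted_df.reset_index()                # old index becomes an "index" column
discarded = sorted_df.reset_index(drop=True)   # old index is thrown away
```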
