System Design Interview Questions and Answers

Sanjay Kumar PhD
7 min read · Jan 9, 2025

Image generated by the author using DALL·E

1. Design a Parking Lot Management System

Features:

  • Vehicle Management: Entry/exit logs, license plate recognition.
  • Parking Slot Allocation: Dynamic slot assignment, floor preferences.
  • Payment System: Hourly/daily rates, prepaid/postpaid options.
  • Admin Panel: Slot overview, reports, maintenance scheduling.

Architecture:

Frontend:

  • Mobile/web app for customers (book slots, view history).
  • Admin dashboard (manage slots, generate reports).

Backend:

  • REST API for slot booking, vehicle logs, and payments.
  • Database for storing parking slots, vehicles, transactions (SQL or NoSQL based on scale).

IoT Integration:

  • Sensors for slot occupancy.
  • Cameras for license plate recognition.

Challenges & Solutions:

  • Concurrency: Use locks or distributed transactions to prevent overbooking.
  • Scalability: Implement horizontal scaling for high-traffic parking lots.
  • Real-Time Updates: Use WebSockets or Server-Sent Events for live updates.
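
The concurrency point can be illustrated with a minimal single-process sketch. It assumes an in-memory allocator guarded by a `threading.Lock`; the class and method names (`ParkingLot`, `park`, `leave`) are hypothetical, and a real deployment would more likely rely on database row locks or a distributed lock.

```python
import threading
from typing import Optional


class ParkingLot:
    """In-memory slot allocator; a lock makes each assignment atomic."""

    def __init__(self, slot_ids):
        self._free = list(slot_ids)      # slots not yet assigned
        self._assigned = {}              # license plate -> slot id
        self._lock = threading.Lock()

    def park(self, plate: str) -> Optional[str]:
        """Assign a free slot to the vehicle, or return None if the lot is full."""
        with self._lock:                 # only one thread allocates at a time
            if plate in self._assigned:
                return self._assigned[plate]
            if not self._free:
                return None
            slot = self._free.pop(0)
            self._assigned[plate] = slot
            return slot

    def leave(self, plate: str) -> None:
        """Release the vehicle's slot, if it holds one."""
        with self._lock:
            slot = self._assigned.pop(plate, None)
            if slot is not None:
                self._free.append(slot)
```

Because the check for a free slot and the assignment happen under the same lock, two concurrent requests can never be given the same slot.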

2. Design an API Rate Limiter

Features:

  • Rate Limiting: Throttling requests based on user or IP.
  • Burst Handling: Temporary allowance for bursts.
  • Quota Management: Monthly/annual limits per user tier.

Implementation:

Algorithms:

  • Token Bucket: Each request consumes a token; tokens refill at a fixed rate.
  • Leaky Bucket: Requests are queued and processed at a fixed rate.
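
As a rough illustration of the token bucket, here is a minimal in-process sketch. The `TokenBucket` class and its parameters are hypothetical names, and the injectable `clock` exists only to make the behavior easy to test; a production limiter would usually keep this state in a shared store.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate       # tokens added per second
        self._clock = clock
        self._tokens = capacity              # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `capacity=2` and `refill_rate=1`, a client can send two requests back to back and then roughly one per second.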

Storage:

  • Use Redis to maintain per-user request counters; it is fast and supports atomic operations (e.g., INCR).

Middleware:

  • Integrate with an API gateway, or use a web server module such as ngx_http_limit_req_module (Nginx).

Challenges & Solutions:

  • Distributed System: Keep counters in a shared store, or use consistent hashing so that each client is always routed to the same rate-limiter node.
  • User Differentiation: Assign different limits for free vs. premium users.
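
One way to picture the consistent hashing idea is the sketch below. It assumes SHA-256 as the hash function and 100 virtual nodes per server, both arbitrary choices for illustration; `HashRing` and `node_for` are hypothetical names.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes to even out the distribution."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, client_id: str) -> str:
        """Return the first node clockwise from the client's hash position."""
        idx = bisect.bisect_right(self._points, _hash(client_id)) % len(self._ring)
        return self._ring[idx][1]
```

When a limiter node is removed, only the clients that were mapped to that node move elsewhere; all other clients keep their node and therefore their counters.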

3. Handle Security in Distributed Systems

Key Aspects:

Authentication:

  • OAuth 2.0 or OpenID Connect.
  • Use short-lived JWTs so that a leaked token expires quickly.

Encryption:

  • TLS for communication.
  • Encrypt sensitive data at rest (e.g., AES-256).

Access Control:

  • Implement Role-Based Access Control (RBAC) or Attribute-Based Access Control (ABAC).

Auditing:

  • Log all critical actions (e.g., admin changes, data access).

Example:

  • For microservices: Use a service mesh (e.g., Istio) for encrypted communication between services.
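
To make the short-lived JWT idea under Authentication concrete, the following sketch issues and verifies an HS256-signed token using only the standard library. The function names and the `now` parameter are hypothetical and exist for illustration; in practice a vetted JWT library and managed keys would be the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(subject: str, secret: bytes, ttl_seconds: int = 300, now=None) -> str:
    """Create a token that expires `ttl_seconds` after `now`."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": now + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: bytes, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None                                   # not three dot-separated parts
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}", secret)):
        return None                                   # wrong key or tampered content
    claims = json.loads(_b64url_decode(payload))
    return claims if claims["exp"] > now else None    # reject expired tokens
```

The verifier always recomputes an HMAC-SHA256 signature and ignores the `alg` header, so a caller cannot downgrade the algorithm.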

4. Handle Millions of Events Per Second

Features:

  • Ingest, process, and store large volumes of events in real time.
  • Provide fault-tolerant and low-latency operations.

Architecture:

Data Ingestion:

  • Use a partitioned log such as Kafka for high-throughput ingestion; RabbitMQ is an option when volumes are lower and flexible routing matters more.

Processing:

  • Use Apache Flink or Apache Storm for real-time event processing.

Storage:

  • Use wide-column stores like Cassandra for scalable writes.

Visualization:

  • Integrate with tools like Grafana for dashboards.

Challenges & Solutions:

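  • Backpressure: Rely on Kafka's pull-based consumers so that slow processors read at their own pace, and monitor consumer lag to decide when to add partitions or consumers.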
  • Latency: Ensure processing clusters are geographically close to data sources.

5. Design an Online Booking System like Airbnb

Key Features:

Search & Listings:

  • Elasticsearch for fast query results.
  • Ranking based on user preferences and ratings.

Availability & Booking:

  • Lock slots in the database during the transaction.
  • Use distributed locks for consistency.
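
The locking step can be sketched in a single process as follows. This assumes one in-memory calendar per listing and a `threading.Lock` standing in for a database or distributed lock; `ListingCalendar` and `book` are hypothetical names.

```python
import threading
from datetime import date


class ListingCalendar:
    """Holds the bookings of one listing and rejects overlapping stays."""

    def __init__(self):
        self._bookings = []                  # list of (check_in, check_out) pairs
        self._lock = threading.Lock()

    def book(self, check_in: date, check_out: date) -> bool:
        """Reserve [check_in, check_out) if it overlaps no existing booking."""
        if check_in >= check_out:
            raise ValueError("check_out must be after check_in")
        with self._lock:                     # check and insert as one atomic step
            for start, end in self._bookings:
                if check_in < end and start < check_out:
                    return False             # date ranges overlap
            self._bookings.append((check_in, check_out))
            return True
```

Treating the check-out day as exclusive lets one guest arrive on the day another leaves.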

Payment Integration:

  • Secure payment gateways like Stripe or PayPal.

Notifications:

  • Email/SMS alerts for booking confirmations.

Architecture:

  • Frontend: Progressive Web App (PWA) for users and hosts.
  • Backend: REST/GraphQL APIs.
  • Database: Shard by geographic region for scalability.

6. High Availability in Critical Applications

Strategies:

Redundancy:

  • Use Active-Active deployments for databases.
  • Implement multi-region replication.

Load Balancing:

  • Use tools like AWS Elastic Load Balancer.

Health Monitoring:

  • Implement tools like Prometheus for alerting.

Data Backups:

  • Frequent snapshots of databases for disaster recovery.

7. Photo-Sharing Service like Instagram

Features:

  • Image upload, processing, and sharing.
  • User feeds and notifications.

Architecture:

Frontend:

  • Web/mobile app.
  • Image optimization on the client side.

Backend:

  • Image Storage: Keep images in object storage (e.g., AWS S3) and serve them through a CDN (e.g., CloudFront).
  • Metadata: Store in relational DB (e.g., MySQL).
  • Feed Generation: Precompute feeds for efficiency.
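
Precomputing feeds is often done as "fan-out on write": each new photo is pushed into the stored feed of every follower. The sketch below assumes in-memory dictionaries in place of a cache or database, and `FeedService` with its methods is a hypothetical name.

```python
from collections import defaultdict, deque


class FeedService:
    """Keeps a bounded, newest-first feed per user, updated at publish time."""

    def __init__(self, feed_size: int = 100):
        self._followers = defaultdict(set)                       # author -> followers
        self._feeds = defaultdict(lambda: deque(maxlen=feed_size))

    def follow(self, follower: str, author: str) -> None:
        self._followers[author].add(follower)

    def publish(self, author: str, photo_id: str) -> None:
        """Push the new photo to the front of every follower's feed."""
        for follower in self._followers[author]:
            self._feeds[follower].appendleft(photo_id)

    def feed(self, user: str):
        """Reading a feed is a simple lookup; no joins or sorting at read time."""
        return list(self._feeds[user])
```

The trade-off is more work per upload for accounts with many followers in exchange for cheap reads.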

Search:

  • Use Elasticsearch for tag-based image search.

8. CAP Theorem

Explanation:

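  • A distributed system cannot guarantee all three of the following at once. Since network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability:
  • Consistency: Every read reflects the most recent write, so all nodes appear to hold the same data at the same time.
  • Availability: Every request to a non-failing node receives a non-error response, even if it does not reflect the latest write.
  • Partition Tolerance: The system continues to operate despite network partitions.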

Examples:

  1. CP Systems (e.g., HBase): Prioritize consistency over availability.
  2. AP Systems (e.g., DynamoDB with its default eventually consistent reads): Prioritize availability over consistency.

9. Big Data in Real-Time

Architecture:

Data Ingestion:

  • Kafka for event streaming.

Processing:

  • Flink for event aggregation.

Storage:

  • HDFS or S3 for archival.
  • Cassandra for real-time analytics.

Challenges:

  • Data Skew: Choose partition keys that spread load evenly, for example by adding a salt to hot keys.
  • Latency: Use low-latency databases like Redis for temporary storage.

10. Distributed File Storage System

Features:

  • High durability and availability.
  • Efficient data retrieval.

Architecture:

  • Metadata Server: Stores file system metadata (e.g., file locations).
  • Chunk Servers: Store actual file data.
  • Replication: Replicate data across multiple nodes.

Example:

  • Google’s GFS: Divides files into fixed-size 64 MB chunks and replicates each chunk across several chunk servers (three replicas by default).
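
A toy version of the chunking and replication steps might look like this. The round-robin placement is an assumption made for illustration and is not how GFS chooses replicas; `split_into_chunks` and `place_replicas` are hypothetical helpers.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Cut the file content into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, servers, replication_factor: int = 3):
    """Return the metadata mapping: chunk index -> servers holding a replica."""
    if replication_factor > len(servers):
        raise ValueError("not enough servers for the requested replication factor")
    placement = {}
    for chunk_index in range(num_chunks):
        # Consecutive servers, starting at a rotating offset, so replicas
        # of one chunk always land on distinct machines.
        placement[chunk_index] = [
            servers[(chunk_index + r) % len(servers)]
            for r in range(replication_factor)
        ]
    return placement
```

The returned dictionary plays the role of the metadata server's chunk map, while the chunks themselves would live on the chunk servers.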

11. Design an Ad-Serving Platform

Key Features:

Real-Time Bidding (RTB):

  • Advertisers bid on ad space in real-time.
  • Advertisers use a demand-side platform (DSP) to manage their bids.

Targeting:

  • Behavioral targeting based on user history, preferences, and location.
  • Contextual targeting based on content.

Ad Delivery:

  • Optimize delivery to reduce latency using a Content Delivery Network (CDN).

Architecture:

Frontend:

  • User-facing application to display ads.

Backend:

  • Ad Server: Stores ad creatives and serves them based on rules.
  • User Profile Store: Stores data for targeting (e.g., demographics, history).
  • Tracking System: Tracks clicks, impressions, and conversions.

Storage:

  • NoSQL database (e.g., MongoDB or DynamoDB) for storing ad metadata.

Real-Time Processing:

  • Use Kafka for clickstream data ingestion.
  • Spark for fraud detection and analytics.

Challenges:

  • Fraud Detection: Use anomaly detection models to identify invalid clicks/impressions.
  • Latency: Aim for response times under 100 ms for a seamless user experience.

12. Strategies for Fraud Detection in Online Transactions

Techniques:

Rule-Based Systems:

  • Define rules (e.g., transactions > $10,000 from a new IP).
  • Quick to implement, but rules adapt poorly to new fraud patterns.
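
The example rule above could be expressed roughly as follows. The `Transaction` fields, the `known_ips` lookup, and the `RULES` list are hypothetical; a real system would load rules from configuration and pull IP history from a profile store.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    user_id: str
    amount: float
    ip: str


def large_amount_from_new_ip(txn: Transaction, known_ips: dict, threshold: float = 10_000) -> bool:
    """Flag transactions above the threshold that come from an IP not seen for this user."""
    return txn.amount > threshold and txn.ip not in known_ips.get(txn.user_id, set())


# Each rule is a plain function; new rules are added by extending this list.
RULES = [large_amount_from_new_ip]


def is_suspicious(txn: Transaction, known_ips: dict) -> bool:
    """A transaction is flagged if any rule matches."""
    return any(rule(txn, known_ips) for rule in RULES)
```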

Machine Learning Models:

  • Supervised learning models (e.g., Random Forest, XGBoost) for classification.
  • Anomaly detection models for unsupervised fraud detection.

Behavioral Analysis:

  • Monitor typical user behaviors (e.g., locations, transaction times).

Architecture:

Ingestion:

  • Use Kafka to stream transaction events.

Processing:

  • Real-time scoring using ML models deployed via Flask/FastAPI.

Storage:

  • Store flagged transactions in a NoSQL database for further investigation.

Challenges:

  • False Positives: Optimize models to reduce legitimate transaction rejections.
  • Real-Time Analysis: Ensure decisions are made within milliseconds.

13. Real-Time Analytics System

Key Features:

  • Event Ingestion: Handle high-volume data from multiple sources.
  • Processing: Aggregate, transform, and analyze data in real time.
  • Visualization: Present insights via dashboards.

Architecture:

Data Ingestion:

  • Kafka for event streaming.

Real-Time Processing:

  • Apache Flink or Apache Spark Streaming for aggregations.

Storage:

  • Real-time data: Redis or Memcached.
  • Historical data: Data lake (e.g., AWS S3) or Druid.

Visualization:

  • Tools like Tableau, Grafana, or Power BI.

Challenges:

  • High Throughput: Partition data streams to distribute the load.
  • Fault Tolerance: Use checkpoints in Flink to recover from failures.

14. Design a Trending Topics Feature for a Platform Like Twitter

Key Features:

Topic Detection:

  • Use NLP techniques like Named Entity Recognition (NER).
  • Hashtag frequency analysis.

Trend Ranking:

  • Rank by tweet velocity, user engagement, and geographic location.

Localization:

  • Tailor trends to specific regions or languages.

Architecture:

Data Ingestion:

  • Stream tweets using Kafka.

Processing:

  • Use Spark Streaming for calculating tweet frequencies.
  • ML models for sentiment analysis and topic classification.

Storage:

  • Store trending topics in a cache (e.g., Redis) for low-latency reads.

Challenges:

  • Real-Time Updates: Use sliding window aggregations to compute trends.
  • Spam Detection: Filter out bots and fake trends using behavioral analytics.
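
A sliding window aggregation for hashtag counts can be sketched as below. It assumes events arrive in non-decreasing timestamp order and fit in memory, which a streaming engine would not require; `TrendingWindow` is a hypothetical name.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts hashtags seen during the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()               # (timestamp, hashtag), oldest first
        self._counts = Counter()

    def add(self, timestamp: float, hashtag: str) -> None:
        self._events.append((timestamp, hashtag))
        self._counts[hashtag] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that have fallen out of the window and adjust counts."""
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old = self._events.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]

    def top(self, k: int, now: float):
        """Return the k most frequent hashtags within the window ending at `now`."""
        self._evict(now)
        return self._counts.most_common(k)
```

Ranking by raw count is the simplest option; velocity or engagement weighting would replace the counter increment with a weighted score.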

15. Design an Email Sending Service

Key Features:

Email Queueing:

  • Queue emails for asynchronous sending.
  • Retry mechanism for failures.
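
The queue-and-retry flow might look like the sketch below. It assumes an in-memory queue, exponential backoff, and a dead-letter list for messages that keep failing; `process_queue` and its parameters are hypothetical, and the `sleep` argument is injectable so the example can run without waiting.

```python
import time
from collections import deque


def process_queue(messages, send, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Send every queued message, retrying failures with exponential backoff.

    Returns a (sent, dead_letter) pair of lists.
    """
    queue = deque((message, 1) for message in messages)   # (message, attempt number)
    sent, dead_letter = [], []
    while queue:
        message, attempt = queue.popleft()
        try:
            send(message)
            sent.append(message)
        except Exception:
            if attempt >= max_attempts:
                dead_letter.append(message)               # give up, keep for inspection
            else:
                sleep(base_delay * 2 ** (attempt - 1))    # 1s, 2s, 4s, ...
                queue.append((message, attempt + 1))      # retry after other messages
    return sent, dead_letter
```

A real worker would schedule the retry for later instead of blocking on `sleep`, so that one failing message does not delay the rest of the queue.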

Spam Prevention:

  • Validate email addresses and enforce SPF/DKIM/DMARC.

Tracking:

  • Open rates, click-through rates (CTR), and delivery statuses.

Architecture:

Frontend:

  • Interface for users to compose and schedule emails.

Backend:

  • Email Sending Service: Send through an email delivery provider such as SendGrid or AWS SES (via their APIs or SMTP endpoints).
  • Queueing: Use RabbitMQ or Kafka for email queueing.

Storage:

  • Relational DB for email logs and tracking data.

Challenges:

  • Rate Limiting: Throttle sends per sender and per recipient domain to stay within provider limits and avoid being flagged as spam.
  • Deliverability: Use domain reputation management tools.

16. Ensure Data Consistency in Microservices Architecture

Strategies:

Eventual Consistency:

  • Accept that different services may temporarily hold different states.
  • Use event-driven communication (e.g., Kafka).

Distributed Transactions:

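  • Implement the Saga pattern: split a cross-service transaction into a sequence of local transactions, one per service.
  • Compensating Transactions: If a step fails, run compensating actions that undo the steps already completed.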

Data Validation:

  • Use data reconciliation jobs to identify and fix inconsistencies.

Example:

  • For an e-commerce platform, ensure order, inventory, and payment services are consistent by publishing events to a central event bus.
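
The Saga pattern with compensating transactions can be reduced to a small orchestrator. This sketch assumes each step is an in-process pair of callables; in a real system the steps would be calls or events sent to separate services, and `run_saga` is a hypothetical helper.

```python
def run_saga(steps) -> bool:
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of all completed steps run in
    reverse order and the saga reports failure.
    """
    completed = []                       # compensations for steps that succeeded
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

For the e-commerce case, the steps could be "create order", "reserve stock", and "charge payment", with "cancel order" and "release stock" as the compensations that run if the payment fails.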

17. Design a Calendar System

Key Features:

Event Creation:

  • Single and recurring events.
  • Notifications and reminders.

Time Zone Management:

  • Handle users in different time zones.

Sharing:

  • Allow sharing and collaboration on events.

Architecture:

Frontend:

  • Calendar UI with drag-and-drop functionality.

Backend:

  • Store events in a relational database (e.g., PostgreSQL).
  • Use a job scheduler (e.g., Quartz) for reminders.

Challenges:

  • Manage conflicts for shared events.
  • Efficiently query events for a specific time range.
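
Range queries and time zones interact, so a small sketch may help. It assumes events are stored as timezone-aware datetimes and filtered in memory; a database would apply the same overlap condition with an index on the start time. `Event` and `events_in_range` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    title: str
    start: datetime      # timezone-aware
    end: datetime        # timezone-aware


def events_in_range(events, range_start: datetime, range_end: datetime):
    """Return events overlapping [range_start, range_end), ordered by start time."""
    return sorted(
        (e for e in events if e.start < range_end and range_start < e.end),
        key=lambda e: e.start,
    )


# Usage: events stored in UTC, queried for "January 9" as seen at UTC+05:30.
utc = timezone.utc
local = timezone(timedelta(hours=5, minutes=30))
stored = [
    Event("standup", datetime(2025, 1, 9, 4, 0, tzinfo=utc), datetime(2025, 1, 9, 4, 30, tzinfo=utc)),
    Event("review", datetime(2025, 1, 9, 20, 0, tzinfo=utc), datetime(2025, 1, 9, 21, 0, tzinfo=utc)),
]
today = events_in_range(stored, datetime(2025, 1, 9, tzinfo=local), datetime(2025, 1, 10, tzinfo=local))
```

Aware datetimes compare by absolute instant, so the query bounds can be expressed in the user's zone while the stored events stay in UTC. Here the "review" event falls on the next local day and is excluded.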

18. Zero-Downtime Deployments

Strategies:

Blue-Green Deployment:

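  • Deploy the new version to an idle, production-identical environment (green) while the current version (blue) keeps serving traffic.
  • Switch traffic to the new version after validation; keep the old environment available for a quick rollback.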

Canary Deployment:

  • Gradually release updates to a small subset of users.
  • Roll back quickly if issues are detected.
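
One common way to pick the "small subset of users" is deterministic hashing, sketched below. The `in_canary` function, the salt value, and the 0.01% bucket granularity are assumptions for illustration.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this user falls into the first `percent` of hash buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # bucket in 0..9999
    return bucket < percent * 100                          # e.g. 5% -> buckets 0..499
```

The same user always gets the same answer, and raising `percent` only adds users, so nobody flips back and forth between versions during the rollout. Changing the salt reshuffles who is selected for the next release.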

Rolling Updates:

  • Deploy updates incrementally across servers.

Example:

  • Use Kubernetes rolling updates with health checks to ensure no downtime.

19. Track User Actions on a Website

Features:

Action Logging:

  • Log page views, clicks, and interactions.

Storage:

  • Store raw events for analytics and debugging.

Architecture:

Ingestion:

  • Use JavaScript trackers to send events to a collection endpoint, which publishes them to Kafka.

Processing:

  • Aggregate events using Spark/Flink.

Storage:

  • Use ClickHouse for scalable event storage.

Visualization:

  • Build dashboards in Grafana or Tableau.

Challenges:

  • Data Volume: Partition data (for example by event date, with user ID as a sort or shard key) so that queries scan only the relevant slices.
  • GDPR Compliance: Anonymize user data for privacy.
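
As a partial illustration of the privacy point, the sketch below replaces the user ID with a keyed hash and drops direct identifiers before an event is stored. The field names (`user_id`, `email`, `ip`) and function names are hypothetical. Note that keyed hashing is pseudonymization, which GDPR still treats as personal data; full anonymization requires more than this.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a user ID with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, user_id.encode(), hashlib.sha256).hexdigest()


def scrub_event(event: dict, secret_key: bytes) -> dict:
    """Return a copy of the event without direct identifiers."""
    cleaned = {k: v for k, v in event.items() if k not in {"email", "ip"}}
    cleaned["user_id"] = pseudonymize(event["user_id"], secret_key)
    return cleaned
```

Because the hash is stable for a given key, per-user aggregation still works on the scrubbed events.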

20. Optimize for Read-Heavy vs. Write-Heavy Systems

Read-Heavy:

Caching:

  • Use Redis/Memcached for frequently accessed data.

Read Replicas:

  • Add database replicas to distribute read queries.

Denormalization:

  • Pre-compute and store aggregated data for faster reads.

Write-Heavy:

Efficient Schema Design:

  • Use append-only logs for writes.

Event Sourcing:

  • Write events instead of updating the state directly.
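
Event sourcing can be shown with a toy account ledger. The `AccountLedger` class and its event names are hypothetical; a real system would persist the log and usually add snapshots so that replay stays cheap.

```python
class AccountLedger:
    """State is never updated in place; it is derived by replaying events."""

    def __init__(self):
        self._events = []                    # append-only event log

    def deposit(self, amount: int) -> None:
        self._events.append(("deposited", amount))

    def withdraw(self, amount: int) -> None:
        if amount > self.balance():
            raise ValueError("insufficient funds")
        self._events.append(("withdrawn", amount))

    def balance(self) -> int:
        """Fold over the log to rebuild the current balance."""
        total = 0
        for kind, amount in self._events:
            total += amount if kind == "deposited" else -amount
        return total

    def history(self):
        """Return a copy of the log, e.g. for auditing."""
        return list(self._events)
```

Writes are plain appends, which suits write-heavy workloads, and the full history remains available for audits or for rebuilding read models.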

Partitioning:

  • Partition data by write-intensive keys to balance load.
