System Design Interview Questions and Answers
1. Design a Parking Lot Management System
Features:
- Vehicle Management: Entry/exit logs, license plate recognition.
- Parking Slot Allocation: Dynamic slot assignment, floor preferences.
- Payment System: Hourly/daily rates, prepaid/postpaid options.
- Admin Panel: Slot overview, reports, maintenance scheduling.
Architecture:
Frontend:
- Mobile/web app for customers (book slots, view history).
- Admin dashboard (manage slots, generate reports).
Backend:
- REST API for slot booking, vehicle logs, and payments.
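- Database for parking slots, vehicles, and transactions. A relational database is a common default because bookings and payments benefit from ACID transactions; NoSQL may suit high-volume sensor or log data.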
IoT Integration:
- Sensors for slot occupancy.
- Cameras for license plate recognition.
Challenges & Solutions:
- Concurrency: Use row-level locks, optimistic version checks, or distributed transactions so two requests cannot claim the same slot.
- Scalability: Implement horizontal scaling for high-traffic parking lots.
- Real-Time Updates: Use WebSockets or Server-Sent Events for live updates.
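The concurrency point above can be sketched in a single process. This is a minimal illustration, assuming an in-memory lot guarded by one lock; the class name ParkingLot and its methods are hypothetical, and a real deployment would enforce the same check in the database.

```python
import threading

class ParkingLot:
    """In-memory lot; one lock guards the free-slot set."""

    def __init__(self, slot_ids):
        self._free = set(slot_ids)
        self._assigned = {}              # license plate -> slot id
        self._lock = threading.Lock()

    def allocate(self, plate):
        """Atomically assign a free slot, or return None when the lot is full."""
        with self._lock:
            if plate in self._assigned:
                return self._assigned[plate]
            if not self._free:
                return None
            slot = self._free.pop()
            self._assigned[plate] = slot
            return slot

    def release(self, plate):
        """Free the slot held by this plate, if any."""
        with self._lock:
            slot = self._assigned.pop(plate, None)
            if slot is not None:
                self._free.add(slot)
            return slot
```

Because the check and the assignment happen under the same lock, concurrent callers can never receive the same slot.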
2. Design an API Rate Limiter
Features:
- Rate Limiting: Throttling requests based on user or IP.
- Burst Handling: Temporary allowance for bursts.
- Quota Management: Monthly/annual limits per user tier.
Implementation:
Algorithms:
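- Token Bucket: Each request consumes a token; tokens refill at a fixed rate up to the bucket capacity, which bounds the burst size.
- Leaky Bucket: Requests are queued and processed at a fixed rate; requests that arrive when the queue is full are rejected.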
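A token bucket can be sketched in a few lines. This is an illustrative single-process version, not a production limiter; the class name TokenBucket and the injectable clock parameter are assumptions made so the example can be tested without sleeping.

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1):
        """Return True and consume tokens if the request fits, else False."""
        now = self._clock()
        # Refill for the elapsed time, never beyond capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With rate=2 and capacity=3, a client can send three requests at once and then two more per second.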
Storage:
- Use Redis to maintain counters for user requests due to its speed and atomic operations.
Middleware:
- Integrate with an API gateway, or use a web-server module such as Nginx's ngx_http_limit_req_module (which uses the leaky bucket method).
Challenges & Solutions:
- Distributed System: Keep counters in a shared store such as Redis, or use consistent hashing to route each user's requests to one limiter node, so limits hold across servers.
- User Differentiation: Assign different limits for free vs. premium users.
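Consistent hashing for routing rate-limit keys can be sketched as follows. The HashRing class, the node names, and the choice of 100 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

When a node is added, only the keys that now fall on its ring points move; every other key keeps its previous owner, so most counters stay where they are.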
3. Handle Security in Distributed Systems
Key Aspects:
Authentication:
- OAuth 2.0 or OpenID Connect.
- Use short-lived JWTs as access tokens, and verify the signature and expiry on every request.
Encryption:
- TLS for communication.
- Encrypt sensitive data at rest (e.g., AES-256).
Access Control:
- Implement Role-Based Access Control (RBAC) or Attribute-Based Access Control (ABAC).
Auditing:
- Log all critical actions (e.g., admin changes, data access).
Example:
- For microservices: Use a service mesh (e.g., Istio) for encrypted communication between services.
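To make the short-lived token idea concrete, here is a minimal sketch of issuing and verifying an HS256-signed JWT with only the standard library. The function names and the 300-second default lifetime are assumptions; a real service would normally use a vetted JWT library and also validate the header's algorithm.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input, secret):
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())

def issue_token(subject, secret, ttl_seconds=300, now=None):
    """Build header.payload.signature with an expiry claim."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": now + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"

def verify_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None
    if not hmac.compare_digest(_sign(f"{header}.{payload}", secret), signature):
        return None
    claims = json.loads(_b64url_decode(payload))
    return claims if claims["exp"] > now else None
```

A short lifetime limits how long a leaked token remains useful, at the cost of more frequent renewal.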
4. Handle Millions of Events Per Second
Features:
- Ingest, process, and store large volumes of events in real time.
- Provide fault-tolerant and low-latency operations.
Architecture:
Data Ingestion:
- Use Kafka for high-throughput, partitioned event streams; RabbitMQ is generally a better fit for lower-volume task queues.
Processing:
- Use Apache Flink or Apache Storm for real-time event processing.
Storage:
- Use a wide-column store like Cassandra for scalable writes.
Visualization:
- Integrate with tools like Grafana for dashboards.
Challenges & Solutions:
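- Backpressure: Kafka consumers pull at their own pace, so the log buffers bursts; monitor consumer lag, apply client quotas to producers, and rely on the stream processor's built-in backpressure (e.g., Flink).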
- Latency: Ensure processing clusters are geographically close to data sources.
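The backpressure idea can be illustrated with a bounded in-process buffer: when the consumer falls behind, the producer blocks instead of piling up unbounded work. This is a sketch of the principle only, not of Kafka or Flink internals; run_pipeline and its doubling "processing" step are hypothetical.

```python
import queue
import threading

def run_pipeline(events, buffer_size=4):
    """Feed events through a bounded buffer; return (results, max observed depth)."""
    buf = queue.Queue(maxsize=buffer_size)
    processed = []
    max_depth = 0

    def consumer():
        while True:
            item = buf.get()
            if item is None:             # sentinel: producer is done
                return
            processed.append(item * 2)   # stand-in for real processing

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:
        buf.put(event)                   # blocks while the buffer is full
        max_depth = max(max_depth, buf.qsize())
    buf.put(None)
    worker.join()
    return processed, max_depth
```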
5. Design an Online Booking System like Airbnb
Key Features:
Search & Listings:
- Elasticsearch for fast query results.
- Ranking based on user preferences and ratings.
Availability & Booking:
- Lock slots in the database during the transaction.
- Use distributed locks for consistency.
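The booking check can be sketched as an overlap test performed under a lock. This is a single-process illustration with hypothetical names (ListingCalendar, book); in a real system the lock would be a database row lock or a distributed lock keyed by listing.

```python
import threading
from datetime import date

class ListingCalendar:
    def __init__(self):
        self._bookings = []              # (check_in, check_out, guest)
        self._lock = threading.Lock()

    def book(self, check_in, check_out, guest):
        """Reserve [check_in, check_out) if it overlaps no existing booking."""
        if check_in >= check_out:
            raise ValueError("check_out must be after check_in")
        with self._lock:
            for start, end, _ in self._bookings:
                # Two half-open ranges overlap when each starts before the other ends.
                if check_in < end and start < check_out:
                    return False
            self._bookings.append((check_in, check_out, guest))
            return True
```

Treating the range as half-open lets one guest check in on the day another checks out.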
Payment Integration:
- Secure payment gateways like Stripe or PayPal.
Notifications:
- Email/SMS alerts for booking confirmations.
Architecture:
- Frontend: Progressive Web App (PWA) for users and hosts.
- Backend:
- REST/GraphQL APIs.
- Database: Shard by geographic region for scalability.
6. High Availability in Critical Applications
Strategies:
Redundancy:
- Use Active-Active deployments for databases.
- Implement multi-region replication.
Load Balancing:
- Use a managed load balancer such as AWS Elastic Load Balancing (ELB).
Health Monitoring:
- Use tools like Prometheus with Alertmanager for metrics collection and alerting.
Data Backups:
- Frequent snapshots of databases for disaster recovery.
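Load balancing and health monitoring work together: the balancer should skip backends that fail their probes. A minimal sketch, assuming a round-robin policy and a hypothetical mark method that a health checker would call:

```python
class RoundRobinBalancer:
    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark(self, backend, healthy):
        """Record the result of a health probe."""
        if healthy:
            self._healthy.add(backend)
        else:
            self._healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```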
7. Photo-Sharing Service like Instagram
Features:
- Image upload, processing, and sharing.
- User feeds and notifications.
Architecture:
Frontend:
- Web/mobile app.
- Image optimization on the client side.
Backend:
- Image Storage: Keep originals in object storage (e.g., AWS S3) and serve them through a CDN (e.g., CloudFront).
- Metadata: Store in relational DB (e.g., MySQL).
- Feed Generation: Precompute feeds for efficiency.
Search:
- Use Elasticsearch for tag-based image search.
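Precomputing feeds is often done by fan-out on write: each new post is pushed into every follower's stored feed. The sketch below assumes an in-memory store and a hypothetical FeedService class; accounts with very large follower counts usually need a different strategy.

```python
from collections import defaultdict, deque

class FeedService:
    def __init__(self, feed_size=100):
        self._followers = defaultdict(set)   # author -> set of followers
        # user -> newest-first post ids, capped at feed_size entries
        self._feeds = defaultdict(lambda: deque(maxlen=feed_size))

    def follow(self, follower, author):
        self._followers[author].add(follower)

    def publish(self, author, post_id):
        """Push the new post id onto every follower's precomputed feed."""
        for follower in self._followers[author]:
            self._feeds[follower].appendleft(post_id)

    def feed(self, user):
        """Reading a feed is a simple lookup; no joins at read time."""
        return list(self._feeds[user])
```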
8. CAP Theorem
Explanation:
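- A distributed system cannot guarantee all three of the following at once; in practice, when a network partition occurs it must give up either consistency or availability:
- Consistency: All nodes see the same data at the same time.
- Availability: Every request receives a non-error response, though it may not reflect the most recent write.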
- Partition Tolerance: The system continues to operate despite network partitions.
Examples:
- CP Systems (e.g., HBase): Prioritize consistency over availability.
- AP Systems (e.g., Cassandra, or DynamoDB with its default eventually consistent reads): Prioritize availability over consistency.
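The trade-off can be shown with a toy model: during a partition, a CP replica refuses to answer, while an AP replica answers with possibly stale data. The Replica class and replicate function are hypothetical and greatly simplified.

```python
class Replica:
    def __init__(self, mode):
        self.mode = mode             # "CP" or "AP"
        self.value = None
        self.partitioned = False     # True when cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning a stale value.
            raise ConnectionError("cannot confirm latest value during partition")
        return self.value            # AP: always answer, possibly with stale data

def replicate(value, replicas):
    """Apply a write to every replica that is currently reachable."""
    for replica in replicas:
        if not replica.partitioned:
            replica.value = value
```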
9. Big Data in Real-Time
Architecture:
Data Ingestion:
- Kafka for event streaming.
Processing:
- Flink for event aggregation.
Storage:
- HDFS or S3 for archival.
- Cassandra for real-time analytics.
Challenges:
- Data Skew: Use key partitioning strategies to balance load.
- Latency: Use low-latency databases like Redis for temporary storage.
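One common way to handle data skew is key salting: a hot key is split into several sub-keys so its events spread over multiple partitions, and the partial results are merged downstream. The helper names and the bucket counts below are illustrative assumptions.

```python
import hashlib

def partition_for(key, num_partitions):
    """Stable hash partitioning (the built-in hash() varies between runs)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def salted_key(key, sequence, salt_buckets):
    """Spread one hot key over salt_buckets sub-keys; consumers merge them later."""
    return f"{key}#{sequence % salt_buckets}"

def strip_salt(key):
    """Recover the original key when merging partial aggregates."""
    return key.rsplit("#", 1)[0]
```

Salting trades a cheap extra merge step for a more even load across partitions.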
10. Distributed File Storage System
Features:
- High durability and availability.
- Efficient data retrieval.
Architecture:
- Metadata Server: Stores file system metadata (e.g., file locations).
- Chunk Servers: Store actual file data.
- Replication: Replicate data across multiple nodes.
Example:
- Google’s GFS: Divides files into chunks and stores them across nodes.
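Chunking and replica placement can be sketched as below. The rotating placement rule and the server names are assumptions for illustration; real systems also weigh rack locality, free space, and load.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_chunks(num_chunks, servers, replicas=3):
    """Map chunk index -> list of distinct servers, rotating through the pool."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replica count")
    return {
        idx: [servers[(idx + r) % len(servers)] for r in range(replicas)]
        for idx in range(num_chunks)
    }
```

The mapping returned by place_chunks is the kind of information the metadata server would hold, while the chunk servers hold the bytes.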
11. Design an Ad-Serving Platform
Key Features:
Real-Time Bidding (RTB):
- Advertisers bid on ad space in real-time.
- Advertisers place bids through a demand-side platform (DSP); publishers offer inventory through a supply-side platform (SSP).
Targeting:
- Behavioral targeting based on user history, preferences, and location.
- Contextual targeting based on content.
Ad Delivery:
- Optimize delivery to reduce latency using a Content Delivery Network (CDN).
Architecture:
Frontend:
- User-facing application to display ads.
Backend:
- Ad Server: Stores ad creatives and serves them based on rules.
- User Profile Store: Stores data for targeting (e.g., demographics, history).
- Tracking System: Tracks clicks, impressions, and conversions.
Storage:
- NoSQL database (e.g., MongoDB or DynamoDB) for storing ad metadata.
Real-Time Processing:
- Use Kafka for clickstream data ingestion.
- Spark for fraud detection and analytics.
Challenges:
- Fraud Detection: Use anomaly detection models to identify invalid clicks/impressions.
- Latency: Aim for response times under 100 ms for a seamless user experience.
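The bidding step can be sketched as a single auction function. The second-price rule used here (the winner pays the runner-up's bid) is an assumption chosen for illustration; exchanges differ, and many use first-price auctions instead.

```python
def run_auction(bids, floor_price=0.0):
    """bids: dict of advertiser -> bid amount. Return (winner, price) or None.

    Assumed second-price rule: the winner pays the runner-up bid,
    or the floor price when there is no runner-up.
    """
    eligible = sorted(
        ((amount, name) for name, amount in bids.items() if amount >= floor_price),
        reverse=True,
    )
    if not eligible:
        return None
    winner = eligible[0][1]
    price = eligible[1][0] if len(eligible) > 1 else floor_price
    return winner, price
```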
12. Strategies for Fraud Detection in Online Transactions
Techniques:
Rule-Based Systems:
- Define rules (e.g., transactions > $10,000 from a new IP).
- Quick to implement but slow to adapt to new fraud patterns.
Machine Learning Models:
- Supervised learning models (e.g., Random Forest, XGBoost) for classification.
- Anomaly detection models for unsupervised fraud detection.
Behavioral Analysis:
- Monitor typical user behaviors (e.g., locations, transaction times).
Architecture:
Ingestion:
- Use Kafka to stream transaction events.
Processing:
- Real-time scoring using ML models deployed via Flask/FastAPI.
Storage:
- Store flagged transactions in a NoSQL database for further investigation.
Challenges:
- False Positives: Tune models and thresholds to reduce rejections of legitimate transactions.
- Real-Time Analysis: Ensure decisions are made within milliseconds.
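The example rule mentioned earlier (a transaction over $10,000 from a new IP) can be sketched as a small rule engine. The RuleEngine class and the in-memory IP history are hypothetical; a real system would read history from a profile store.

```python
class RuleEngine:
    def __init__(self, amount_threshold=10_000):
        self.amount_threshold = amount_threshold
        self._known_ips = {}         # user_id -> set of IPs seen before

    def check(self, user_id, amount, ip):
        """Return the names of triggered rules, then remember the IP."""
        seen = self._known_ips.setdefault(user_id, set())
        flags = []
        if amount > self.amount_threshold and ip not in seen:
            flags.append("large_amount_from_new_ip")
        seen.add(ip)
        return flags
```

Rules like this are cheap enough to run inline before the ML scoring step.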
13. Real-Time Analytics System
Key Features:
- Event Ingestion: Handle high-volume data from multiple sources.
- Processing: Aggregate, transform, and analyze data in real time.
- Visualization: Present insights via dashboards.
Architecture:
Data Ingestion:
- Kafka for event streaming.
Real-Time Processing:
- Apache Flink or Apache Spark Streaming for aggregations.
Storage:
- Real-time data: Redis or Memcached.
- Historical data: Data lake (e.g., AWS S3) or Druid.
Visualization:
- Tools like Tableau, Grafana, or Power BI.
Challenges:
- High Throughput: Partition data streams to distribute the load.
- Fault Tolerance: Use checkpoints in Flink to recover from failures.
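The checkpoint idea can be sketched without any framework: the job periodically snapshots its state together with its input offset, and after a crash it rewinds to the last snapshot and replays from there. This is a conceptual illustration with a hypothetical CountingJob class, not Flink's actual mechanism.

```python
import copy

class CountingJob:
    """Counts events per key and snapshots (offset, state) every few events."""

    def __init__(self, checkpoint_every=3):
        self.checkpoint_every = checkpoint_every
        self.counts = {}
        self.offset = 0                  # index of the next event to process
        self._checkpoint = (0, {})

    def process(self, log, fail_at=None):
        """Consume the log from the current offset; fail_at simulates a crash."""
        while self.offset < len(log):
            if self.offset == fail_at:
                raise RuntimeError("simulated crash")
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                self._checkpoint = (self.offset, copy.deepcopy(self.counts))

    def recover(self):
        """Discard uncheckpointed work and rewind to the last snapshot."""
        self.offset, snapshot = self._checkpoint
        self.counts = copy.deepcopy(snapshot)
```

Because state and offset are restored together, replayed events are counted exactly once.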
14. Design a Trending Topics Feature for a Platform Like Twitter
Key Features:
Topic Detection:
- Use NLP techniques like Named Entity Recognition (NER).
- Hashtag frequency analysis.
Trend Ranking:
- Rank by tweet velocity, user engagement, and geographic location.
Localization:
- Tailor trends to specific regions or languages.
Architecture:
Data Ingestion:
- Stream tweets using Kafka.
Processing:
- Use Spark Streaming for calculating tweet frequencies.
- ML models for sentiment analysis and topic classification.
Storage:
- Store trending topics in a cache (e.g., Redis) for low-latency reads.
Challenges:
- Real-Time Updates: Use sliding window aggregations to compute trends.
- Spam Detection: Filter out bots and fake trends using behavioral analytics.
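A sliding window aggregation over hashtags can be sketched as follows. The TrendingWindow class is hypothetical and assumes events arrive in timestamp order; a streaming engine would handle late and out-of-order data.

```python
from collections import Counter, deque

class TrendingWindow:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self._events = deque()           # (timestamp, hashtag), oldest first
        self._counts = Counter()

    def add(self, timestamp, hashtag):
        self._events.append((timestamp, hashtag))
        self._counts[hashtag] += 1

    def top(self, now, k=3):
        """Evict events older than the window, then return the k most frequent tags."""
        while self._events and self._events[0][0] <= now - self.window:
            _, old = self._events.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]
        return self._counts.most_common(k)
```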
15. Design an Email Sending Service
Key Features:
Email Queueing:
- Queue emails for asynchronous sending.
- Retry mechanism for failures.
Spam Prevention:
- Validate email addresses and enforce SPF/DKIM/DMARC.
Tracking:
- Open rates, click-through rates (CTR), and delivery statuses.
Architecture:
Frontend:
- Interface for users to compose and schedule emails.
Backend:
- Email Sending Service: Use a delivery provider such as SendGrid or AWS SES, via its SMTP interface or HTTP API.
- Queueing: Use RabbitMQ or Kafka for email queueing.
Storage:
- Relational DB for email logs and tracking data.
Challenges:
- Rate Limiting: Use a rate limiter to prevent spamming.
- Deliverability: Use domain reputation management tools.
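The queue-and-retry behavior can be sketched with exponential backoff and a dead-letter list. The function names, the 2-second base delay, and the 4-attempt limit are assumptions; time is simulated with due timestamps so the example does not sleep.

```python
import heapq

def backoff_delay(attempt, base=2.0, cap=300.0):
    """Delay in seconds before retry number `attempt` (1-based), capped."""
    return min(cap, base * 2 ** (attempt - 1))

def drain(messages, send, max_attempts=4):
    """Try to send every message; return (delivered, dead_lettered)."""
    # Heap entries: (due_time, attempts_so_far, message)
    heap = [(0.0, 0, msg) for msg in messages]
    heapq.heapify(heap)
    delivered, dead = [], []
    while heap:
        due, attempts, msg = heapq.heappop(heap)
        if send(msg):
            delivered.append(msg)
        elif attempts + 1 >= max_attempts:
            dead.append(msg)             # give up; keep for inspection
        else:
            retry_at = due + backoff_delay(attempts + 1)
            heapq.heappush(heap, (retry_at, attempts + 1, msg))
    return delivered, dead
```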
16. Ensure Data Consistency in Microservices Architecture
Strategies:
Eventual Consistency:
- Accept that different services may temporarily hold different states.
- Use event-driven communication (e.g., Kafka).
Distributed Transactions:
- Implement the Saga pattern to manage transactions.
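- Compensating Transactions: When a step fails, undo the steps that already completed by running their compensating actions (e.g., refund a payment, release reserved stock).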
Data Validation:
- Use data reconciliation jobs to identify and fix inconsistencies.
Example:
- For an e-commerce platform, ensure order, inventory, and payment services are consistent by publishing events to a central event bus.
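The Saga pattern can be sketched as a list of steps, each paired with a compensating action. This is a minimal in-process orchestration under assumed names (run_saga); real sagas persist their progress and run across services.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables.

    Runs actions in order. If one fails, runs the compensations of the
    steps that already completed, in reverse order, and returns False.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

For the e-commerce example, the steps would be create order, reserve stock, and charge payment, with cancel, release, and refund as their compensations.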
17. Design a Calendar System
Key Features:
Event Creation:
- Single and recurring events.
- Notifications and reminders.
Time Zone Management:
- Handle users in different time zones.
Sharing:
- Allow sharing and collaboration on events.
Architecture:
Frontend:
- Calendar UI with drag-and-drop functionality.
Backend:
- Store events in a relational database (e.g., PostgreSQL).
- Use a job scheduler (e.g., Quartz) for reminders.
Challenges:
- Manage conflicts for shared events.
- Efficiently query events for a specific time range.
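Querying a time range gets harder with recurring events, because occurrences are usually not stored individually. The sketch below expands a weekly event into the occurrences that overlap a range; expand_weekly is a hypothetical helper that works in UTC and ignores daylight-saving shifts in the user's local time.

```python
from datetime import datetime, timedelta, timezone

def expand_weekly(first_start, duration, range_start, range_end):
    """Yield (start, end) for each weekly occurrence overlapping [range_start, range_end)."""
    week = timedelta(weeks=1)
    start = first_start
    if start + duration <= range_start:
        # Jump close to the range instead of stepping week by week.
        skipped = (range_start - duration - start) // week
        start += skipped * week
    while start < range_end:
        if start + duration > range_start:
            yield start, start + duration
        start += week
```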
18. Zero-Downtime Deployments
Strategies:
Blue-Green Deployment:
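- Run two identical production environments; deploy the new version to the idle one (green) while the live one (blue) keeps serving traffic.
- Switch traffic to green after validation, and keep blue available for a fast rollback.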
Canary Deployment:
- Gradually release updates to a small subset of users.
- Roll back quickly if issues are detected.
Rolling Updates:
- Deploy updates incrementally across servers.
Example:
- Use Kubernetes rolling updates with health checks to ensure no downtime.
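Canary routing is often done by hashing a stable identifier into buckets, so each user consistently sees the same version while the percentage is raised. A minimal sketch, assuming a hypothetical routes_to_canary helper and 100 buckets:

```python
import hashlib

def routes_to_canary(user_id, canary_percent):
    """Deterministically place a user in one of 100 buckets; low buckets get the canary."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent
```

Raising canary_percent only adds users to the canary group; nobody already on the new version is moved back.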
19. Track User Actions on a Website
Features:
Action Logging:
- Log page views, clicks, and interactions.
Storage:
- Store raw events for analytics and debugging.
Architecture:
Ingestion:
- Use JavaScript trackers to send events to Kafka.
Processing:
- Aggregate events using Spark/Flink.
Storage:
- Use ClickHouse for scalable event storage.
Visualization:
- Build dashboards in Grafana or Tableau.
Challenges:
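- Data Volume: Partition events by date and sort or shard by user ID so that time-range scans and per-user queries both stay efficient.
- GDPR Compliance: Pseudonymize or anonymize user identifiers, collect consent before tracking, and support deletion requests.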
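One way to reduce the privacy exposure of stored events is to replace user IDs with a keyed hash and drop direct identifiers before storage. The function names and the dropped field list are assumptions. Note that keyed hashing is pseudonymization rather than full anonymization, so the data may still count as personal data under GDPR.

```python
import hashlib
import hmac

def pseudonymize(user_id, secret_key):
    """Replace a user ID with a keyed hash so events still group by user."""
    return hmac.new(secret_key, user_id.encode(), hashlib.sha256).hexdigest()

def scrub_event(event, secret_key, drop_fields=("email", "ip")):
    """Return a copy for the analytics store: hashed ID, direct identifiers removed."""
    clean = {k: v for k, v in event.items() if k not in drop_fields}
    clean["user_id"] = pseudonymize(event["user_id"], secret_key)
    return clean
```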
20. Optimize for Read-Heavy vs. Write-Heavy Systems
Read-Heavy:
Caching:
- Use Redis/Memcached for frequently accessed data.
Read Replicas:
- Add database replicas to distribute read queries.
Denormalization:
- Pre-compute and store aggregated data for faster reads.
Write-Heavy:
Efficient Schema Design:
- Use append-only logs for writes.
Event Sourcing:
- Write events instead of updating the state directly.
Partitioning:
- Partition data by write-intensive keys to balance load.
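Event sourcing and append-only writes can be sketched together: every change is appended as an event, and the current state is derived by replaying the log. The Account class is a hypothetical example; real systems add snapshots so that replays stay short.

```python
class Account:
    def __init__(self):
        self.events = []             # append-only log; entries are never edited

    def deposit(self, amount):
        self.events.append(("deposited", amount))

    def withdraw(self, amount):
        if amount > self.balance():
            raise ValueError("insufficient funds")
        self.events.append(("withdrew", amount))

    def balance(self, upto=None):
        """Rebuild state by replaying the log (optionally only the first `upto` events)."""
        total = 0
        for kind, amount in self.events[:upto]:
            total += amount if kind == "deposited" else -amount
        return total
```

Writes are cheap appends, and the log doubles as an audit trail and a way to reconstruct past states.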