PySpark Basics
Core concepts: RDD, DataFrame, Transformations vs Actions.
Common functions: map, flatMap (RDD); filter, groupBy, join (RDD and DataFrame); withColumn, select, agg (DataFrame).
Spark Architecture: Driver, Executors, Cluster Manager.
Optimizations: Catalyst Optimizer, DAG, RDD Lineage, Broadcast join.
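Partitioning: why partitions matter (parallelism, shuffle cost); repartition (full shuffle, can raise or lower the partition count) vs coalesce (lowers the partition count without a full shuffle).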
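Transformations vs actions, sketched: the toy class below is a pure-Python stand-in, not PySpark itself, written so the idea can be run without a cluster. `LazyDataset`, `split_words`, and the sample lines are hypothetical names made up for illustration. The method names mirror the RDD calls `map`, `flatMap`, `filter`, `collect`, and `count`. Transformations only record a step in the lineage; nothing executes until an action is called.

```python
from itertools import chain


class LazyDataset:
    """Toy model of lazy evaluation (an illustration, not the PySpark API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # recorded lineage: (name, function) pairs

    def _with(self, name, fn):
        # Transformations return a new dataset and compute nothing.
        return LazyDataset(self._source, self._steps + ((name, fn),))

    def map(self, fn):
        return self._with("map", fn)

    def flatMap(self, fn):
        return self._with("flatMap", fn)

    def filter(self, pred):
        return self._with("filter", pred)

    def lineage(self):
        return [name for name, _ in self._steps]

    def collect(self):
        # Action: replay the recorded lineage over the source data.
        data = iter(self._source)
        for name, fn in self._steps:
            if name == "map":
                data = map(fn, data)
            elif name == "flatMap":
                data = chain.from_iterable(map(fn, data))
            else:  # "filter"
                data = filter(fn, data)
        return list(data)

    def count(self):
        # Action: also replays the whole lineage (no caching in this toy).
        return len(self.collect())


calls = []  # records when the user function actually runs


def split_words(line):
    calls.append(line)
    return line.split()


lines = LazyDataset(["spark is lazy", "actions trigger work"])
words = lines.flatMap(split_words).map(str.upper).filter(lambda w: len(w) > 4)

calls_before_action = len(calls)   # no work has happened yet
result = words.collect()           # the action triggers execution
calls_after_collect = len(calls)   # split_words ran once per input line
```

Each action replays the lineage from the source in this sketch, which is also why caching matters in Spark when the same dataset feeds several actions.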
🔹 SQL Basics
Joins (inner, left, right, full, self join).
Group By / Aggregations (SUM, COUNT, AVG).
Window functions (ROW_NUMBER, RANK) with OVER (PARTITION BY ... ORDER BY ...).
Subqueries (scalar vs correlated).
CTE (WITH clause).
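A small runnable sketch of the SQL topics above, using Python's built-in sqlite3 module so no database server is needed. The `dept` and `emp` tables and their rows are made-up sample data, and the window-function query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (1, 'eng'), (2, 'ops'), (3, 'hr');
INSERT INTO emp VALUES
  (1, 'ana', 1, 120), (2, 'bo', 1, 100), (3, 'cy', 2, 90), (4, 'di', NULL, 80);
""")

# LEFT JOIN + GROUP BY: every department appears, even with no employees.
per_dept = conn.execute("""
    SELECT d.name, COUNT(e.id), COALESCE(SUM(e.salary), 0)
    FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# Window function: rank employees by salary inside each department.
ranked = conn.execute("""
    SELECT name, dept_id,
           ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
    FROM emp
    WHERE dept_id IS NOT NULL
    ORDER BY dept_id, rn
""").fetchall()

# CTE + correlated subquery: employees paid above their own department's average.
above_dept_avg = conn.execute("""
    WITH staffed AS (SELECT * FROM emp WHERE dept_id IS NOT NULL)
    SELECT s.name
    FROM staffed s
    WHERE s.salary > (SELECT AVG(s2.salary)
                      FROM staffed s2
                      WHERE s2.dept_id = s.dept_id)
    ORDER BY s.name
""").fetchall()

# Scalar subquery: employees paid above the overall average (evaluated once).
above_overall_avg = conn.execute("""
    SELECT name FROM emp
    WHERE salary > (SELECT AVG(salary) FROM emp)
    ORDER BY name
""").fetchall()

conn.close()
```

The correlated subquery is re-evaluated for each outer row because it references `s.dept_id`, while the scalar subquery does not depend on the outer row at all.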