
Comprehensive Detailed Explanation
We have a KQL table (TaxiData) with columns:
VendorID
tpep_pickup_datetime (timestamp)
payment_type
total_amount
The requirement:
Create a new column FirstPickupDateTime
It should contain the first pickup timestamp per hour
Partitioning should be done by payment_type
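Step 1: Which windowing function?
row_cumsum → running cumulative sum of a numeric column (not needed here).
row_rank_dense → returns the dense rank of the current row; it produces a rank number, not a timestamp.
row_rank_min → returns the minimal rank of the current row; it also produces a rank number, not a timestamp, and its signature is row_rank_min(Term, restart), so it cannot take the two timespan arguments used in this query.
row_window_session → groups ordered datetime values into sessions and returns the value at the start of each session. ✅ Correct.
So, the correct function is row_window_session: of the four options, it is the only one that returns the first pickup timestamp of each window directly.
Step 2: Which comparison operator?
The fourth argument of row_window_session is a restart condition: a new session starts whenever it evaluates to true.
To partition by payment_type, the session must restart when the value differs from the previous row's value: payment_type != prev(payment_type).
So the correct operator is != (not equals).
Step 3: Partitioning
The KQL query should partition by:
Sort order → sorting by payment_type first, then by tpep_pickup_datetime, keeps each payment type in one contiguous, time-ordered run
1h (MaxDistanceFromFirst) → ends a session once a pickup is an hour or more past the session's first pickup
payment_type != prev(payment_type) → restarts the session at each payment type boundary
Completed KQL Query
TaxiData
| sort by payment_type asc, tpep_pickup_datetime asc
| extend FirstPickupDateTime = row_window_session(tpep_pickup_datetime, 1h, 1h, payment_type != prev(payment_type))
This adds FirstPickupDateTime to every row, holding the first pickup timestamp of that row's one-hour, per-payment-type session. No where filter is needed.
The third argument (MaxDistanceBetweenNeighbors) is set to 1h so that only the one-hour limit applies; a value of 0m would start a new session on practically every row.
Note that each one-hour window is measured from the first pickup in the session, not aligned to clock hours as bin(tpep_pickup_datetime, 1h) would be.
Why This Works
row_window_session → returns the session start value, which is the first pickup in each window.
!= → restarts the session whenever payment_type changes from the previous row.
1h → limits each session to one hour from its first pickup.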
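The windowing behaviour can be tried on a handful of rows before running it against the real table. The following is a minimal sketch, assuming an inline datatable that stands in for TaxiData: every value in it is made up, and the int and real column types are assumptions rather than the actual schema. It uses row_window_session with a restart condition on payment_type.

```kusto
// Hypothetical sample rows standing in for TaxiData (all values are made up).
datatable (VendorID:int, tpep_pickup_datetime:datetime, payment_type:int, total_amount:real) [
    1, datetime(2024-01-01 08:05:00), 1, 12.5,
    2, datetime(2024-01-01 08:40:00), 1, 7.0,
    1, datetime(2024-01-01 09:10:00), 1, 21.3,
    2, datetime(2024-01-01 08:15:00), 2, 9.8,
    1, datetime(2024-01-01 08:50:00), 2, 15.0
]
// Sort by the partition key first so each payment_type forms one contiguous run.
| sort by payment_type asc, tpep_pickup_datetime asc
// Emit the session start for every row; restart whenever payment_type changes.
| extend FirstPickupDateTime = row_window_session(tpep_pickup_datetime, 1h, 1h, payment_type != prev(payment_type))
```

With these rows, the 08:05 and 08:40 pickups for payment_type 1 should share 08:05 as their FirstPickupDateTime, the 09:10 pickup should start its own session, and both payment_type 2 pickups should share 08:15. If clock-aligned hours and one output row per bucket are acceptable, summarize FirstPickupDateTime = min(tpep_pickup_datetime) by payment_type, bin(tpep_pickup_datetime, 1h) is a possible aggregate-based cross-check.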
References
Kusto row_rank_min() and row_window_session() functions
KQL window functions