Monday, 26 August 2013

How do I select every pair of consecutive events in Hive?

Imagine I have a Hive table T with consecutive events:
n
---
1
2
3
4
...
I need to write a query that selects every pair of consecutive events from
this table; for the rows above, that means (1, 2), (2, 3), (3, 4), and so on.
Currently I have a solution like
select t1.n, min(t2.n) from t t1 join t t2 where t1.n < t2.n group by t1.n;
This is very inefficient even for a relatively small table (thousands of
rows), because it produces a temporary Cartesian product of the table with
itself (i.e. O(n^2) complexity).
I would like to find a less expensive (hopefully linear) solution to the
same problem.
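
One possible direction, sketched here under assumptions rather than tested against the real table: if the cluster runs Hive 0.11 or later, the LEAD windowing function can pair each row with the one that follows it, without a self-join. The table t and column n come from the question; the aliases next_n and pairs are made up for illustration.

```sql
-- Sketch only: assumes Hive 0.11+ (windowing functions) and table t(n).
-- LEAD(n) returns the value of n from the next row in the given order,
-- or NULL for the last row, which has no successor.
SELECT n, next_n
FROM (
  SELECT n,
         LEAD(n) OVER (ORDER BY n) AS next_n
  FROM t
) pairs
WHERE next_n IS NOT NULL;  -- drop the last row, which has no pair
```

This replaces the quadratic join with a sort, so the cost is O(n log n) rather than strictly linear. Note also that an OVER clause with no PARTITION BY orders all rows in a single partition, which may limit parallelism on very large tables.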
