start_duration = max(previous end_duration + 1, current row's date);
end_duration = min(prescription end_date, start_duration + duration - 1);
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+
|prescription_uid|patient_uid|ndc      |label      |dispensation_uid|date      |duration|start_date|end_date  |
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+
|0               |0          |16714-128|sinvastatin|0               |2015-06-10|30      |2015-06-01|2015-12-01|
|0               |0          |16714-128|sinvastatin|1               |2015-07-15|30      |2015-06-01|2015-12-01|
|0               |0          |16714-128|sinvastatin|2               |2015-08-01|30      |2015-06-01|2015-12-01|
|0               |0          |16714-128|sinvastatin|3               |2015-10-01|30      |2015-06-01|2015-12-01|
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+
EXPECTED RESULT:
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+--------------------+------------------+--------------+------------+
|prescription_uid|patient_uid|ndc      |label      |dispensation_uid|date      |duration|start_date|end_date  |first_start_duration|first_end_duration|start_duration|end_duration|
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+--------------------+------------------+--------------+------------+
|0               |0          |16714-128|sinvastatin|0               |2015-06-10|30      |2015-06-01|2015-12-01|2015-06-10          |2015-07-09        |2015-06-10    |2015-07-09  |
|0               |0          |16714-128|sinvastatin|1               |2015-07-15|30      |2015-06-01|2015-12-01|2015-06-10          |2015-07-09        |2015-07-15    |2015-08-13  |
|0               |0          |16714-128|sinvastatin|2               |2015-08-01|30      |2015-06-01|2015-12-01|2015-06-10          |2015-07-09        |2015-08-14    |2015-09-12  |
|0               |0          |16714-128|sinvastatin|3               |2015-10-01|30      |2015-06-01|2015-12-01|2015-06-10          |2015-07-09        |2015-10-01    |2015-10-30  |
+----------------+-----------+---------+-----------+----------------+----------+--------+----------+----------+--------------------+------------------+--------------+------------+
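To make the arithmetic behind start_duration and end_duration concrete, here is a minimal sketch in plain Python (not Spark) that walks the four sample dispensations in date order and applies the two formulas row by row. The function name coverage_periods and the variable names are made up for illustration; the only inputs taken from the question are the dates, the 30-day durations and the prescription end_date of 2015-12-01. Applied literally, the formulas give end dates 2015-07-09, 2015-08-13, 2015-09-12 and 2015-10-30.

```python
from datetime import date, timedelta

# Sample rows for one prescription: (dispensation date, duration in days).
PRESCRIPTION_END = date(2015, 12, 1)
dispensations = [
    (date(2015, 6, 10), 30),
    (date(2015, 7, 15), 30),
    (date(2015, 8, 1), 30),
    (date(2015, 10, 1), 30),
]


def coverage_periods(rows, prescription_end):
    """Apply the recurrence row by row; rows must be sorted by date.

    Hypothetical helper, written only to illustrate the formulas.
    """
    periods = []
    previous_end = None
    for dispensed_on, duration in rows:
        # start_duration = max(previous end_duration + 1, current row's date)
        start = dispensed_on
        if previous_end is not None:
            start = max(previous_end + timedelta(days=1), dispensed_on)
        # end_duration = min(prescription end_date, start_duration + duration - 1)
        end = min(prescription_end, start + timedelta(days=duration - 1))
        periods.append((start, end))
        previous_end = end
    return periods


periods = coverage_periods(dispensations, PRESCRIPTION_END)
# The first row's period is what first_start_duration / first_end_duration hold.
first_start_duration, first_end_duration = periods[0]

for start, end in periods:
    print(start, end)
```

Each row depends on the end_duration computed for the row before it, which is why the sketch carries previous_end through a loop instead of looking only at the raw input columns.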
https://stackoverflow.com/questions/64396803/how-to-apply-window-function-in-memory-transformation-with-new-column-scala/64405160#64405160
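Split your problem into two parts.
1. Use lag (and lead) to fetch the previous row's value and create a new column from it.
2. Use min (for end_duration) and max (for start_duration) to compute the final values.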
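The two steps can be sketched in plain Python as follows. This is only an illustration of the idea, not Spark code: the lag helper imitates SQL LAG(col, 1) on a list, and provisional_end (date + duration - 1) is an assumed intermediate column that the answer does not name. Note the limitation: a single lag only sees the previous row's unshifted end, so it matches the row-by-row formula for this sample, but a chain of several overlapping dispensations would need a running computation instead.

```python
from datetime import date, timedelta


def lag(values, default=None):
    """Shift a column down by one row, like SQL LAG(col, 1). Hypothetical helper."""
    return [default] + values[:-1]


# Sample columns from the question, already sorted by dispensation date.
dates = [date(2015, 6, 10), date(2015, 7, 15), date(2015, 8, 1), date(2015, 10, 1)]
durations = [30, 30, 30, 30]
prescription_end = date(2015, 12, 1)

# Step 1: build a provisional end per row (an assumption of this sketch),
# then lag it so each row can see the previous row's value.
provisional_end = [d + timedelta(days=n - 1) for d, n in zip(dates, durations)]
previous_end = lag(provisional_end)

# Step 2: max for start_duration, min for end_duration.
start_duration = [
    d if prev is None else max(prev + timedelta(days=1), d)
    for d, prev in zip(dates, previous_end)
]
end_duration = [
    min(prescription_end, s + timedelta(days=n - 1))
    for s, n in zip(start_duration, durations)
]

for s, e in zip(start_duration, end_duration):
    print(s, e)
```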