Data Engineering/spark pyspark (2)
incastle의 콩나물
pyspark Array<string> column fill na with empty list
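The key is to use F.array().cast("array<string>"). No matter what I tried, filling the nulls in a list column kept raising errors, and I think I wasted about two hours on it.

    import pyspark.sql.functions as F

    # An empty array literal, cast to the same type as the target column
    fill_array = F.array().cast("array<string>")
    # Use the empty array where txt_set is null, otherwise keep the existing value
    fill_rule = F.when(F.col('txt_set').isNull(), fill_array).otherwise(F.col('txt_set'))
    cntn_tb = cntn_tb.withColumn('txt_set', fill_rule)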
2021. 7. 12. 18:10
Getting to know PySpark (1)
spark.apache.org/docs/latest/api/python/getting_started/quickstart.html (Quickstart, PySpark 3.1.1 documentation): "This is a short introduction and quickstart for the PySpark DataFrame API. PySpark DataFrames are lazily evaluated. They are implemented on top of RDDs. When Spark transforms data, it does not immediately compute the transformation but plans how to compute ..." Translating this document..
2021. 3. 28. 17:30