Fixing dates in a PySpark DataFrame: set to a minimum value

Question

I have a DataFrame with a timestamp field, RECEIPTDATEREQUESTED:timestamp. For some reason, it contains dates that are earlier than 1900-01-01. I don't want these. What I want to do is, for every value in the DataFrame column where RECEIPTDATEREQUESTED < '1900-01-01 00:00:00', set the timestamp to either 1900-01-01 or null. I have tried several ways of doing this, but it seems some simple way must exist. I thought something like this might work, but

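import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import TimestampType

def testdate(date_value):
    # The format string must also cover the time part,
    # otherwise strptime raises ValueError on this input.
    oldest = datetime.datetime.strptime('1900-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')
    # Null timestamps arrive as None; comparing None with a datetime
    # raises TypeError, so pass them through unchanged.
    if date_value is None:
        return None
    # Clamp anything older than the cutoff to the cutoff itself.
    if date_value < oldest:
        return oldest
    return date_value

udf_testdate = udf(testdate, TimestampType())
bdf = olddf.withColumn("RECEIPTDATEREQUESTED", udf_testdate(col("RECEIPTDATEREQUESTED")))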
data-cleaning pyspark
2021-11-23 20:05:00

Best answer

You can use a conditional expression with when and otherwise to set RECEIPTDATEREQUESTED to either null or 1900-01-01 00:00:00 whenever the value is < '1900-01-01 00:00:00'.


from pyspark.sql import functions as F

data = [("1000-01-01 00:00:00",), 
        ("1899-12-31 23:59:59",),
        ("1900-01-01 00:00:00",), 
        ("1901-01-01 00:00:00",)]

df = spark.createDataFrame(data, ("RECEIPTDATEREQUESTED",))\
          .withColumn("RECEIPTDATEREQUESTED", F.to_timestamp(F.col("RECEIPTDATEREQUESTED")))


# Fill null

df.withColumn("RECEIPTDATEREQUESTED", 
              F.when(F.col("RECEIPTDATEREQUESTED") < "1900-01-01 00:00:00", F.lit(None))
               .otherwise(F.col("RECEIPTDATEREQUESTED")))\
  .show(200, False)

# Fill default value

df.withColumn("RECEIPTDATEREQUESTED", 
              F.when(F.col("RECEIPTDATEREQUESTED") < "1900-01-01 00:00:00", F.lit("1900-01-01 00:00:00").cast("timestamp"))
               .otherwise(F.col("RECEIPTDATEREQUESTED")))\
  .show(200, False)

Output

Fill null

+--------------------+
|RECEIPTDATEREQUESTED|
+--------------------+
|null                |
|null                |
|1900-01-01 00:00:00 |
|1901-01-01 00:00:00 |
+--------------------+

Fill 1900-01-01 00:00:00

+--------------------+
|RECEIPTDATEREQUESTED|
+--------------------+
|1900-01-01 00:00:00 |
|1900-01-01 00:00:00 |
|1900-01-01 00:00:00 |
|1901-01-01 00:00:00 |
+--------------------+
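
If it helps to see why this works, here is a rough plain-Python model of what the when/otherwise expression does for a single row. It is only a sketch and not how Spark runs it internally: the names OLDEST and clamp_or_null are made up for illustration, and None stands in for a Spark null. The one detail worth noting is that in Spark SQL a comparison against null evaluates to null, which when treats as "not matched", so rows that are already null fall through to otherwise and stay null.

```python
from datetime import datetime
from typing import Optional

# Hypothetical cutoff, mirroring '1900-01-01 00:00:00' in the answer.
OLDEST = datetime(1900, 1, 1, 0, 0, 0)


def clamp_or_null(value: Optional[datetime],
                  replacement: Optional[datetime] = None) -> Optional[datetime]:
    """Model of F.when(col < OLDEST, replacement).otherwise(col) for one row."""
    # A null input never satisfies the condition, so it is returned as-is,
    # just like the otherwise(...) branch returning the original column.
    if value is not None and value < OLDEST:
        return replacement
    return value


# Same sample values as the DataFrame in the answer.
rows = [
    datetime(1000, 1, 1),
    datetime(1899, 12, 31, 23, 59, 59),
    datetime(1900, 1, 1),
    datetime(1901, 1, 1),
]

filled_with_null = [clamp_or_null(r) for r in rows]
filled_with_default = [clamp_or_null(r, OLDEST) for r in rows]
```

Unlike the UDF in the question, the when/otherwise version is built entirely from native column expressions, so rows do not have to be passed through Python functions.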
2021-11-23 20:53:37

Yes, thank you. I'm not sure how or why it works, so I'll google it some more, but thank you.
roguecode
