Ako vypočítať rozdiel medzi v hodinách dve časové a vylúčiť víkendy

0

Otázka

Mám dataframe, ako je tento:

     Folder1                   Folder2                 
0   2021-11-22 12:00:00      2021-11-24 10:00:00
1   2021-11-23 10:30:00      2021-11-25 18:30:00    
2   2021-11-12 10:30:00      2021-11-15 18:30:00    
3   2021-11-23 10:00:00            NaN         

         

Pomocou tohto kódu:

def strfdelta(td: pd.Timestamp):
    seconds = td.total_seconds()
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"
            
df["Folder1"] = pd.to_datetime(df["Folder1"])
df["Folder2"] = pd.to_datetime(df["Folder2"])

bm1 = df["Folder1"].notna() & df["Folder2"].notna()
bm2 = df["Folder1"].notna() & df["Folder2"].isna()

df["Time1"] = (df.loc[bm1, "Folder2"] - df.loc[bm1, "Folder1"]).apply(strfdelta)
df["Time2"] = (datetime.now() - df.loc[bm2, "Folder1"]).apply(strfdelta)

Mám toto df:

     Folder1                   Folder2                           Time1     Time2
0   2021-11-22 12:00:00      2021-11-24 10:00:00                46:00:00    NaN
1   2021-11-23 10:30:00      2021-11-25 18:30:00                56:00:00    NaN
2   2021-11-12 10:30:00      2021-11-15 18:30:00                80:00:00    NaN
3   2021-11-23 10:00:00            NaN                             NaN     03:00:00

V podstate, to je to, čo chcem, ale, ako sa môžem vylúčiť víkendy hodín pri výpočte rozdiel medzi časové z Folder1 a Folder2? Čo by som mal zmeniť mať df, ako je tento:

     Folder1                   Folder2                           Time1     Time2
0   2021-11-22 12:00:00      2021-11-24 10:00:00                46:00:00    NaN
1   2021-11-23 10:30:00      2021-11-25 18:30:00                56:00:00    NaN
2   2021-11-12 10:30:00      2021-11-15 18:30:00                32:00:00    NaN
3   2021-11-23 10:00:00            NaN                            NaN     03:00:00

Takže, v riadku s indexom 2, 13.11 a 14.11 boli víkendy, takže v Čase 1 rozdiel by mal byť 32 namiesto 80

dataframe datetime pandas python
2021-11-23 12:10:15
3

Najlepšiu odpoveď

2

Myslím, že by ste pákový efekt na pandy.date_range funkciu v kombinácii s pandy.tseries.znižovanie.CustomBusinessHour , ako je tento:

# import pandas and numpy
import pandas as pd
import numpy as np

# construct dataframe
df = pd.DataFrame()
df["Folder1"] = pd.to_datetime(
    pd.Series(
        [
            "2021-11-22 12:00:00",
            "2021-11-23 10:30:00",
            "2021-11-12 10:30:00",
            "2021-11-23 10:00:00",
        ]
    )
)
df["Folder2"] = pd.to_datetime(
    pd.Series(
        [
            "2021-11-24 10:00:00", 
            "2021-11-25 18:30:00", 
            "2021-11-15 18:30:00", 
            np.NaN
        ]
    )
)

# define custom business hours
cbh = pd.tseries.offsets.CustomBusinessHour(start="0:00", end="23:59")

# actual calculation
df["Time1"] = df[~(df["Folder1"].isnull() | df["Folder2"].isnull())].apply(
    lambda row: len(
        pd.date_range(
            start=row["Folder1"], 
            end=row["Folder2"], 
            freq=cbh)),
    axis=1,
)

df.head()

Ktoré pre mňa výnosov:

print(df.head())
              Folder1             Folder2  Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00   46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00   56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00   32.0
3 2021-11-23 10:00:00                 NaT    NaN

Ako bonus si môžete urobiť svoj Time2 výpočet efektívnejšie pomocou to, ako dobre:

df["Time2"] = df[df["Folder2"].isnull()].apply(
    lambda row: len(
        pd.date_range(
            start=row["Folder1"],
            end=datetime.datetime.now(),
            freq=cbh)),
    axis=1,
)

Ktoré pre mňa výnosy (v 14:45 SEČ):

print(df.head())
              Folder1             Folder2  Time1  Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00   46.0    NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00   56.0    NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00   32.0    NaN
3 2021-11-23 10:00:00                 NaT    NaN    5.0
2021-11-23 13:54:11

Ahoj @Jonathan, ďakujem za odpoveď, to funguje v poriadku, ale potrebujem stĺpcov v Time1 a Time2 byť uvedené tiež v minútach a sekundách, formát, pretože potrebujem vypočítať SLA, takže namiesto 46.0, možno som si to v formáte ako 46:00:00, pretože mám prípadoch, kde hodnota v Time1 je, ako je táto: 12:34:23.
user14073111

Ah chýbala, že. V tom prípade môžete možno kombinovať ho s týmto: stackoverflow.com/a/40276658/2186184
Jonathan

Navrhoval by som prepracovať svoje strfdelta funkcia vrátiť datetime.timedelta namiesto reťazec pomocou datetime.timedelta(seconds=seconds_input)
Jonathan
1
df = pd.DataFrame({'Folder1': ['2021-11-22 12:00:00', '2021-11-23 10:30:00', '2021-11-12 10:30:00', '2021-11-23 10:00:00'],
                   'Folder2': ['2021-11-24 10:00:00', '2021-11-25 18:30:00', '2021-11-15 18:30:00', None]})
df[['Folder1','Folder2']] = df[['Folder1','Folder2']].astype('datetime64')

def strfdelta(t1, t2):
    hd = pd.date_range(t1, t2, freq='W-SAT').append(pd.date_range(t1, t2, freq='W-SUN'))
    sec = (t2-t1).total_seconds() - len(hd)*24*3600
    return f"{int(sec//3600):02d}:{int((sec%3600)//60):02d}:{int(sec%60):02d}"

now = pd.to_datetime('now')
df['Time1'] = df.fillna(now).apply(lambda x: strfdelta(x['Folder1'], x['Folder2']), axis=1)
print(df) 

Tlačí:

              Folder1             Folder2     Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00  46:00:00
1 2021-11-23 10:30:00 2021-11-25 18:30:00  56:00:00
2 2021-11-12 10:30:00 2021-11-15 18:30:00  32:00:00
3 2021-11-23 10:00:00                 NaT  20:58:26
2021-11-23 13:35:17

Niektoré úpravy na pridanie Time2 stĺpec, to je to, čo som chcel, ďakujeme!
user14073111
1
df['Folder1']=pd.to_datetime(df['Folder1'])
df['Folder2']=pd.to_datetime(df['Folder2']).fillna(df['Folder1'])

df['missing']=df.apply(lambda x: pd.date_range(start=x['Folder1'], end=x['Folder2'], freq='D'), axis=1)#Create column with missing date periods



df=(df.assign(time=np.where((df['missing'].apply(lambda x: x.strftime('%w'))).map(set).astype(str).str.contains('0|6'),#Where missing periods have a Saturday or Sunday
                            
                            (df['Folder2']-df['Folder1']).astype('timedelta64[h]')-48,# When above condition is met, subtract two 48 hours from the two days columns difference
                            (df['Folder2']-df['Folder1']).astype('timedelta64[h]'))#When condition not met substract just the two date columns)
             ).drop(columns=['missing']) )             
print(df)



Folder1             Folder2  time
0 2021-11-22 12:00:00 2021-11-24 10:00:00  46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00  56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00  32.0
3 2021-11-23 10:00:00 2021-11-23 10:00:00   0.0
2021-11-24 06:59:50

V iných jazykoch

Táto stránka je v iných jazykoch

Русский
..................................................................................................................
Italiano
..................................................................................................................
Polski
..................................................................................................................
Română
..................................................................................................................
한국어
..................................................................................................................
हिन्दी
..................................................................................................................
Français
..................................................................................................................
Türk
..................................................................................................................
Česk
..................................................................................................................
Português
..................................................................................................................
ไทย
..................................................................................................................
中文
..................................................................................................................
Español
..................................................................................................................