DataFrame-level Preprocessing Examples

Basic examples of preprocessing methods

The following methods selectively use these libraries:

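Impute Missing Values

impute_missing() - Fills missing values in the target columns using one of several methods. The example below uses KNN imputation, with auxiliary columns supplying the features for the neighbor search.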

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic_cleaned = titanic.preproc.impute_missing(
    target_cols='age',
    auxiliary_cols=['fare', 'pclass'],
    method='knn',
    n_neighbors=5
)
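For readers who want to see what KNN-based imputation does in general, here is a minimal sketch using scikit-learn's KNNImputer on a small hand-made frame. It is an illustration of the technique only and is not claimed to be how impute_missing() is implemented; the toy data, the variable names, and the choice of scikit-learn are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data shaped like the Titanic columns used above
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],
    "pclass": [3, 1, 3, 1, 3, 1],
})

# Target column plus the auxiliary columns that define "nearness"
cols = ["age", "fare", "pclass"]

# Each missing age is replaced by the mean age of the 2 nearest rows,
# where distance is computed on the columns that are present
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(
    imputer.fit_transform(df[cols]), columns=cols, index=df.index
)

# Only the target column is overwritten; auxiliary columns stay untouched
df_filled = df.copy()
df_filled["age"] = imputed["age"]
print(df_filled)
```

The features here are not scaled, so fare dominates the distance. In practice, standardizing the auxiliary columns first is usually worth considering.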

Restore Full Index

restore_full_index() - Restores a full index for a DataFrame by filling in missing dates and categories.

It creates a full MultiIndex by generating all possible combinations of dates
(within the range of the date column) and unique values of the grouping columns.

from frameon import FrameOn as fo
import pandas as pd
from IPython.display import display  # built in to notebooks; needed when run as a script

data = {
    'Date': ['2023-01-01', '2023-01-03'],
    'Category': ['A', 'B'],
    'Value': [10, 20]
}
df = fo(pd.DataFrame(data))
df['Date'] = pd.to_datetime(df['Date'])
display(df)
df_restored = df.preproc.restore_full_index(
    date_cols='Date',
    group_cols=['Category'],
    freq='D',
    fill_value=0
)
display(df_restored)

        Date Category  Value
0 2023-01-01        A     10
1 2023-01-03        B     20

        Date Category  Value
0 2023-01-01        A     10
1 2023-01-01        B      0
2 2023-01-02        A      0
3 2023-01-02        B      0
4 2023-01-03        A      0
5 2023-01-03        B     20
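
The description of the full MultiIndex can be sketched in plain pandas as follows. This is an assumption about one way to obtain the same result with pd.MultiIndex.from_product and reindex, not a statement of how restore_full_index() works internally; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-03"]),
    "Category": ["A", "B"],
    "Value": [10, 20],
})

# Every date between the first and last observed date, at daily frequency
all_dates = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")

# Every unique value of the grouping column, in order of appearance
all_categories = df["Category"].unique()

# Cartesian product of dates and categories
full_index = pd.MultiIndex.from_product(
    [all_dates, all_categories], names=["Date", "Category"]
)

# Align the data to the full index; combinations absent from the data get 0
restored = (
    df.set_index(["Date", "Category"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(restored)
```

For the two-row input this yields six rows (three dates times two categories), with Value set to 0 for every combination that was absent from the original data.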