DataFrame-level Preprocessing Examples
Basic examples of preprocessing methods
The following methods selectively use these libraries:
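Impute Missing Values
impute_missing() - Fills missing values in the target columns using one of several methods. The example below imputes age on the Titanic dataset with method='knn' and n_neighbors=5, passing fare and pclass as auxiliary columns.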
from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic_cleaned = titanic.preproc.impute_missing(
    target_cols='age',
    auxiliary_cols=['fare', 'pclass'],
    method='knn',
    n_neighbors=5
)
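To make the method='knn' option concrete, here is a minimal sketch of k-nearest-neighbor imputation in plain pandas and NumPy. It is not frameon's implementation, which this page does not show. The helper name `knn_impute`, the standardization step, and the toy data are assumptions made for illustration, and the sketch further assumes that the auxiliary columns contain no missing values.

```python
import numpy as np
import pandas as pd


def knn_impute(df, target_col, auxiliary_cols, n_neighbors=5):
    """Fill NaNs in target_col with the mean target value of the nearest rows.

    Hypothetical helper: distance is Euclidean over standardized auxiliary
    columns, which are assumed to be complete (no NaNs).
    """
    out = df.copy()
    missing = out[target_col].isna().to_numpy()
    # Nothing to fill, or nothing to learn from: return the copy unchanged.
    if not missing.any() or missing.all():
        return out

    # Standardize so that no single auxiliary column dominates the distance.
    feats = out[auxiliary_cols].astype(float)
    scale = feats.std(ddof=0).replace(0, 1)  # avoid dividing by zero
    feats = ((feats - feats.mean()) / scale).to_numpy()

    known_feats = feats[~missing]
    known_vals = out.loc[~missing, target_col].to_numpy(dtype=float)
    k = min(n_neighbors, len(known_vals))

    filled = []
    for row in feats[missing]:
        dist = np.linalg.norm(known_feats - row, axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        filled.append(known_vals[nearest].mean())

    out[target_col] = out[target_col].astype(float)
    out.loc[missing, target_col] = filled
    return out


# Toy data (made up for this sketch, not taken from the Titanic dataset).
toy = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "fare": [7.25, 71.28, 8.05, 53.10, 51.86],
    "pclass": [3, 1, 3, 1, 1],
})
result = knn_impute(toy, "age", ["fare", "pclass"], n_neighbors=1)
print(result)
```

With n_neighbors=1, the row with the missing age copies the age of the single most similar row by fare and pclass. Larger values average over more neighbors, which smooths the estimate.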
Restore Full Index
restore_full_index() - Restores a full index for a DataFrame by filling in missing dates and categories.
It builds a full MultiIndex from all possible combinations of dates
(within the range of the date column, at the given frequency) and unique values of the grouping columns.
Combinations that are absent from the original data appear as new rows, with fill_value in the remaining columns.
from frameon import FrameOn as fo
import pandas as pd
# display() is available by default in Jupyter; in a plain script, uncomment the next line.
# from IPython.display import display

data = {
    'Date': ['2023-01-01', '2023-01-03'],
    'Category': ['A', 'B'],
    'Value': [10, 20]
}
df = fo(pd.DataFrame(data))
df['Date'] = pd.to_datetime(df['Date'])
display(df)

# Add a row for every (date, category) pair in the daily range; new rows get Value = 0.
df_restored = df.preproc.restore_full_index(
    date_cols='Date',
    group_cols=['Category'],
    freq='D',
    fill_value=0
)
display(df_restored)
| | Date | Category | Value |
|---|---|---|---|
| 0 | 2023-01-01 | A | 10 |
| 1 | 2023-01-03 | B | 20 |

| | Date | Category | Value |
|---|---|---|---|
| 0 | 2023-01-01 | A | 10 |
| 1 | 2023-01-01 | B | 0 |
| 2 | 2023-01-02 | A | 0 |
| 3 | 2023-01-02 | B | 0 |
| 4 | 2023-01-03 | A | 0 |
| 5 | 2023-01-03 | B | 20 |
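For readers who want to see the mechanics, the same restored table can be reproduced in plain pandas. This is a sketch of the idea described above (a full MultiIndex built from all date and category combinations, followed by a reindex), not necessarily how frameon implements restore_full_index() internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-03"]),
    "Category": ["A", "B"],
    "Value": [10, 20],
})

# Every date between the first and last observed date, at daily frequency.
all_dates = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")

# Every (date, category) combination, in date-major order.
full_index = pd.MultiIndex.from_product(
    [all_dates, df["Category"].unique()],
    names=["Date", "Category"],
)

# Reindex onto the full grid; combinations that were absent get Value = 0.
restored = (
    df.set_index(["Date", "Category"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(restored)
```

Note that reindexing this way requires the (Date, Category) pairs in the input to be unique; duplicated pairs would need to be aggregated first.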