DataFrame-level Explore Examples

Basic usage examples for the DataFrame-level explore methods.

Used Libraries

Some of the examples below return Plotly figures. The following setup selects the notebook renderer for them:

import plotly.io as pio
pio.renderers.default = "notebook"

Info Method

info() - Shows basic dataframe statistics

from frameon import load_dataset, FrameOn as fo

iris = fo(load_dataset('iris'))
iris.explore.info()
Dataframe Overview
Summary                        Column Types
Rows               150         Text         0
Features           6           Categorical  1
Missing cells      ---         Int          1
Exact Duplicates   3 (2%)      Float        4
Fuzzy Duplicates   5 (3%)      Datetime     0
Memory Usage (Mb)  <1 Mb
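
For readers who want to cross-check a few of the Summary figures, the following is a minimal plain-pandas sketch. It is not frameon's implementation: the frame is synthetic, the way frameon counts exact duplicates is an assumption here, and fuzzy duplicates are not reproduced at all.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a real dataset; the values are made up.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.1, 6.3],
    "sepal_width": [3.5, np.nan, 3.5, 2.9],
    "species": ["setosa", "setosa", "setosa", "virginica"],
})

overview = {
    "Rows": len(df),
    "Features": df.shape[1],
    "Missing cells": int(df.isna().sum().sum()),
    # duplicated() flags every repeat of an earlier row, not the first occurrence.
    # Whether frameon counts duplicates the same way is an assumption.
    "Exact Duplicates": int(df.duplicated().sum()),
    "Memory Usage (Mb)": df.memory_usage(deep=True).sum() / 1024**2,
}

for name, value in overview.items():
    print(f"{name}: {value}")
```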

Anomalies Report

anomalies_report() - Detects and reports anomalies

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_report(anomaly_type='missing')
Missings by Column
             Count  Percent
age          177    19.9%
embarked     2      0.2%
deck         688    77.2%
embark_town  2      0.2%
Co-occurring Missings (Pairwise)
             age                embarked             deck  embark_town
age
embarked     0
deck         < 23.0% / ^ 89.3%  0
embark_town  0                  < 100.0% / ^ 100.0%  0
Missings distribution across categories
Column  Category  Total  Anomaly  Anomaly Rate  Total %  Anomaly %  % Diff
deck    nan       688    688      100.0%        77.2%    97.0%      19.8%
class   Third     491    481      98.0%         55.1%    67.8%      12.7%
alive   no        549    490      89.3%         61.6%    69.1%      7.5%
Sample Missings Rows
     survived  pclass  sex     age   sibsp  parch  fare     embarked  class  who    adult_male  deck  embark_town  alive  alone
5    0         3       male    nan   0      0      8.4583   Q         Third  man    True        nan   Queenstown   no     True
25   1         3       female  38.0  1      5      31.3875  S         Third  woman  False       nan   Southampton  yes    False
797  1         3       female  31.0  0      0      8.6833   S         Third  woman  False       nan   Southampton  yes    True
93   0         3       male    26.0  1      2      20.575   S         Third  man    True        nan   Southampton  no     False
333  0         3       male    16.0  2      0      18.0     S         Third  man    True        nan   Southampton  no     False
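
The "Missings by Column" part of the report can be approximated in plain pandas. The sketch below runs on a small synthetic frame (made-up values, hypothetical column choice) and only illustrates the kind of count and percentage shown above. It makes no claim about how frameon computes them internally.

```python
import numpy as np
import pandas as pd

# Synthetic frame; the values are made up for illustration.
df = pd.DataFrame({
    "age": [22.0, np.nan, 38.0, np.nan],
    "deck": [np.nan, np.nan, "C", np.nan],
    "fare": [7.25, 8.46, 71.28, 8.05],
})

missing_count = df.isna().sum()
missing_by_column = pd.DataFrame({
    "Count": missing_count,
    "Percent": (missing_count / len(df) * 100).round(1),
})

# Keep only the columns that actually contain missing values.
missing_by_column = missing_by_column[missing_by_column["Count"] > 0]
print(missing_by_column)
```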

Detect Anomalies

detect_anomalies() - Finds rows with anomalies

from frameon import load_dataset, FrameOn as fo

tips = fo(load_dataset('tips'))
tips.explore.detect_anomalies(anomaly_type='outlier', method='quantile')
Outliers by Column (method: quantile, threshold: 0.05)
            Count  Percent
total_bill  26     10.66%
tip         25     10.25%
size        13     5.33%
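
With a threshold of 0.05, roughly 10% of total_bill and tip are flagged, which is consistent with a two-tailed rule: values below the 5% quantile or above the 95% quantile. The sketch below implements that rule in plain pandas under this assumption. The helper name quantile_outliers is hypothetical and is not part of frameon.

```python
import pandas as pd


def quantile_outliers(series: pd.Series, threshold: float = 0.05) -> pd.Series:
    """Return a boolean mask of values outside the [threshold, 1 - threshold] quantile band."""
    lower = series.quantile(threshold)
    upper = series.quantile(1 - threshold)
    return (series < lower) | (series > upper)


# Synthetic data: the integers 1..100.
values = pd.Series(range(1, 101), name="total_bill")
mask = quantile_outliers(values, threshold=0.05)

print(f"{mask.sum()} outliers ({mask.mean() * 100:.2f}%)")
```

Columns with many tied values, such as size in the output above, can end up with fewer flagged rows, because values equal to a quantile boundary are not strictly outside the band.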

Anomalies Correlation Matrix

anomalies_corr_matrix() - Shows anomaly correlations

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
fig = titanic.explore.anomalies_corr_matrix(anomaly_type='missing')
fig.show()
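
One common way to quantify how missingness in one column relates to missingness in another is to correlate 0/1 missing-value indicators. The sketch below does this in plain pandas on synthetic data. Whether anomalies_corr_matrix() uses the same measure is an assumption, and the plotting step is left out.

```python
import numpy as np
import pandas as pd

# Synthetic frame; the values are made up for illustration.
df = pd.DataFrame({
    "embarked": ["S", None, "C", "Q", None, "S"],
    "embark_town": ["Southampton", None, "Cherbourg", "Queenstown", None, "Southampton"],
    "age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
})

# 1 where a value is missing, 0 otherwise.
indicators = df.isna().astype(int)

# Columns without missing values have zero variance, so drop them
# before correlating to avoid NaN rows and columns.
indicators = indicators.loc[:, indicators.sum() > 0]

missing_corr = indicators.corr()
print(missing_corr.round(2))
```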

Anomalies Combinations

anomalies_combinations() - Shows frequent anomaly combinations

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_combinations(anomaly_type='missing')
Co-occurring Missings (Pairwise)
             age                embarked             deck  embark_town
age
embarked     0
deck         < 23.0% / ^ 89.3%  0
embark_town  0                  < 100.0% / ^ 100.0%  0
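
Compared with the detect_simultaneous_anomalies() output in this document (age 89.27%, deck 22.97%), the two percentages in the deck/age cell appear to be the share of the row column's missings and the share of the header column's missings that co-occur. The sketch below computes such pairwise shares in plain pandas on synthetic data. It is an illustration under that reading, not frameon's code.

```python
import numpy as np
import pandas as pd

# Synthetic frame; the values are made up for illustration.
df = pd.DataFrame({
    "age": [np.nan, np.nan, 30.0, 41.0, np.nan, 25.0],
    "deck": [np.nan, "B", np.nan, np.nan, np.nan, np.nan],
})

indicators = df.isna().astype(int)

# Entry (a, b) counts the rows where both a and b are missing;
# the diagonal holds each column's own missing count.
co_counts = indicators.T.dot(indicators)

# Divide every row by that row's own missing count, so entry (a, b)
# becomes the percentage of a's missing rows in which b is also missing.
counts = indicators.sum()
row_share = co_counts.div(counts, axis=0) * 100

print(co_counts)
print(row_share.round(1))
```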

Anomalies by Categories

anomalies_by_categories() - Shows anomalies by category

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_by_categories(anomaly_type='missing')
Missings distribution across categories
Column       Category     Total  Anomaly  Anomaly Rate  Total %  Anomaly %  % Diff
deck         nan          688    688      100.0%        77.2%    97.0%      19.8%
class        Third        491    481      98.0%         55.1%    67.8%      12.7%
alive        no           549    490      89.3%         61.6%    69.1%      7.5%
sex          male         577    483      83.7%         64.8%    68.1%      3.4%
who          man          537    450      83.8%         60.3%    63.5%      3.2%
class        Second       184    169      91.8%         20.7%    23.8%      3.2%
embark_town  Southampton  644    529      82.1%         72.3%    74.6%      2.3%
embarked     S            644    529      82.1%         72.3%    74.6%      2.3%
embarked     Q            77     75       97.4%         8.6%     10.6%      1.9%
embark_town  Queenstown   77     75       97.4%         8.6%     10.6%      1.9%
who          child        83     70       84.3%         9.3%     9.9%       0.6%
embarked     nan          2      2        100.0%        0.2%     0.3%       0.1%
embark_town  nan          2      2        100.0%        0.2%     0.3%       0.1%
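
The figures above are consistent with these definitions: a row counts as an anomaly if any of its cells is missing, Anomaly Rate is Anomaly / Total within the category, Total % is the category's share of all rows, Anomaly % is its share of all anomalous rows, and % Diff is Anomaly % minus Total %. The sketch below applies these inferred definitions in plain pandas to synthetic data. The helper missing_by_category is hypothetical and is not a frameon function.

```python
import numpy as np
import pandas as pd

# Synthetic frame; the values are made up for illustration.
df = pd.DataFrame({
    "class": ["Third", "Third", "First", "Third", "First", "Second"],
    "age": [np.nan, 20.0, 30.0, np.nan, 41.0, np.nan],
})


def missing_by_category(frame: pd.DataFrame, category_col: str) -> pd.DataFrame:
    """Compare each category's share of anomalous rows with its share of all rows."""
    # A row is anomalous if at least one of its cells is missing.
    is_anomaly = frame.isna().any(axis=1)
    # Note: groupby drops rows whose category itself is missing.
    groups = is_anomaly.groupby(frame[category_col])
    out = pd.DataFrame({"Total": groups.size(), "Anomaly": groups.sum()})
    out["Anomaly Rate"] = out["Anomaly"] / out["Total"] * 100
    out["Total %"] = out["Total"] / len(frame) * 100
    out["Anomaly %"] = out["Anomaly"] / is_anomaly.sum() * 100
    out["% Diff"] = out["Anomaly %"] - out["Total %"]
    return out.sort_values("% Diff", ascending=False)


by_class = missing_by_category(df, "class")
print(by_class.round(1))
```

A large positive % Diff means the category is over-represented among rows with missing values relative to its share of the whole dataset.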

Detect Simultaneous Anomalies

detect_simultaneous_anomalies() - Finds rows with multiple anomalies

from frameon import load_dataset, FrameOn as fo

titanic = fo(load_dataset('titanic'))
titanic.explore.detect_simultaneous_anomalies(
    anomaly_type='missing',
    columns=['age', 'deck']
)
Simultaneous Missing Anomalies Analysis
Column  Individual Anomalies  % of Total  % in Simultaneous
age     177                   19.87%      89.27%
deck    688                   77.22%      22.97%
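
The table can be read as follows: a row is "simultaneous" when every selected column is missing in it, and "% in Simultaneous" is the share of a column's own missings that fall in such rows. The sketch below reproduces that reading in plain pandas on synthetic data. The helper simultaneous_missing is hypothetical, and the interpretation is inferred from the output rather than taken from frameon's source.

```python
import numpy as np
import pandas as pd

# Synthetic frame; the values are made up for illustration.
df = pd.DataFrame({
    "age": [np.nan, np.nan, 30.0, 41.0, np.nan, 25.0],
    "deck": [np.nan, "B", np.nan, np.nan, np.nan, np.nan],
})


def simultaneous_missing(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Summarise rows in which all of the given columns are missing at once."""
    missing = frame[columns].isna()
    simultaneous = missing.all(axis=1)  # True where every selected column is missing
    individual = missing.sum()
    # Note: a column with no missing values would divide by zero here.
    return pd.DataFrame({
        "Individual Anomalies": individual,
        "% of Total": (individual / len(frame) * 100).round(2),
        "% in Simultaneous": (simultaneous.sum() / individual * 100).round(2),
    })


summary = simultaneous_missing(df, ["age", "deck"])
print(summary)
```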

Anomalies Over Time

anomalies_over_time() - Plots anomalies across all columns over time using resampling

from frameon import load_dataset, FrameOn as fo

taxis = fo(load_dataset('taxis'))
fig = taxis.explore.anomalies_over_time(
    anomaly_type='missing',
    time_column='pickup'
)
fig.show()
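
As a rough illustration of "resampling" in this context, the sketch below counts missing values per column per day in plain pandas. The data and the daily frequency are assumptions made for the example, and the aggregation frameon actually plots may differ.

```python
import numpy as np
import pandas as pd

# Synthetic trips; the values are made up for illustration.
df = pd.DataFrame({
    "pickup": pd.to_datetime([
        "2019-03-01 08:00", "2019-03-01 17:30",
        "2019-03-02 09:15", "2019-03-02 12:00",
        "2019-03-03 23:45",
    ]),
    "payment": ["cash", None, "card", None, None],
    "tip": [1.0, np.nan, 2.5, 3.0, 0.0],
})

# Index by the time column, turn each cell into a 0/1 missing indicator,
# then sum the indicators within each calendar day.
missing_per_day = (
    df.set_index("pickup")
      .isna()
      .astype(int)
      .resample("D")
      .sum()
)
print(missing_per_day)
```

Summing the result across columns with missing_per_day.sum(axis=1) gives a single daily series when one overall curve is preferred.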