DataFrame-level Explore Examples#
Basic examples of explore methods.
Used Libraries#
Some of the examples below rely on Plotly; set its default renderer before running them:
import plotly.io as pio
pio.renderers.default = "notebook"
Info Method#
info() - Shows basic dataframe statistics
from frameon import load_dataset, FrameOn as fo
iris = fo(load_dataset('iris'))
iris.explore.info()
| Summary | | Column Types | |
|---|---|---|---|
| Rows | 150 | Text | 0 |
| Features | 6 | Categorical | 1 |
| Missing cells | --- | Int | 1 |
| Exact Duplicates | 3 (2%) | Float | 4 |
| Fuzzy Duplicates | 5 (3%) | Datetime | 0 |
| Memory Usage (Mb) | <1 Mb | | |
Anomalies Report#
anomalies_report() - Detects and reports anomalies
from frameon import load_dataset, FrameOn as fo
titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_report(anomaly_type='missing')
| | Count | Percent |
|---|---|---|
| age | 177 | 19.9% |
| embarked | 2 | 0.2% |
| deck | 688 | 77.2% |
| embark_town | 2 | 0.2% |
| | age | embarked | deck | embark_town |
|---|---|---|---|---|
| age | | | | |
| embarked | 0 | | | |
| deck | < 23.0% / ^ 89.3% | 0 | | |
| embark_town | 0 | < 100.0% / ^ 100.0% | 0 | |
| Column | Category | Total | Anomaly | Anomaly Rate | Total % | Anomaly % | % Diff |
|---|---|---|---|---|---|---|---|
| deck | nan | 688 | 688 | 100.0% | 77.2% | 97.0% | 19.8% |
| class | Third | 491 | 481 | 98.0% | 55.1% | 67.8% | 12.7% |
| alive | no | 549 | 490 | 89.3% | 61.6% | 69.1% | 7.5% |
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 3 | male | nan | 0 | 0 | 8.4583 | Q | Third | man | True | nan | Queenstown | no | True |
| 25 | 1 | 3 | female | 38.0 | 1 | 5 | 31.3875 | S | Third | woman | False | nan | Southampton | yes | False |
| 797 | 1 | 3 | female | 31.0 | 0 | 0 | 8.6833 | S | Third | woman | False | nan | Southampton | yes | True |
| 93 | 0 | 3 | male | 26.0 | 1 | 2 | 20.575 | S | Third | man | True | nan | Southampton | no | False |
| 333 | 0 | 3 | male | 16.0 | 2 | 0 | 18.0 | S | Third | man | True | nan | Southampton | no | False |
Detect Anomalies#
detect_anomalies() - Finds rows with anomalies
from frameon import load_dataset, FrameOn as fo
tips = fo(load_dataset('tips'))
tips.explore.detect_anomalies(anomaly_type='outlier', method='quantile')
| | Count | Percent |
|---|---|---|
| total_bill | 26 | 10.66% |
| tip | 25 | 10.25% |
| size | 13 | 5.33% |
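As a rough illustration of what a quantile-based outlier rule can look like, the sketch below flags values outside a lower and upper quantile band. The 5% and 95% thresholds, the function name `quantile_outliers`, and the toy data are assumptions made for this example; the source does not say which quantiles `method='quantile'` uses, and this is not frameon's implementation.

```python
from statistics import quantiles

def quantile_outliers(values, lower=0.05, upper=0.95):
    """Return indices of values outside the [lower, upper] quantile band."""
    # quantiles(n=100) yields the 99 percentile cut points (1st to 99th)
    cuts = quantiles(values, n=100, method="inclusive")
    low_cut = cuts[round(lower * 100) - 1]
    high_cut = cuts[round(upper * 100) - 1]
    return [i for i, v in enumerate(values) if v < low_cut or v > high_cut]

# Toy data (assumption): the values 1.0 .. 100.0
bills = [float(x) for x in range(1, 101)]
outlier_idx = quantile_outliers(bills)
print(len(outlier_idx), "of", len(bills), "values flagged")
```

With symmetric 5% tails, roughly 10% of a column gets flagged, which is in the same range as the shares reported for `total_bill` and `tip` above, though that alone does not confirm the thresholds.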
Anomalies Correlation Matrix#
anomalies_corr_matrix() - Shows anomaly correlations
from frameon import load_dataset, FrameOn as fo
titanic = fo(load_dataset('titanic'))
fig = titanic.explore.anomalies_corr_matrix(anomaly_type='missing')
fig.show()
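One common way to correlate anomalies is to turn each column into a 0/1 "is missing" indicator and compute the Pearson correlation between the indicators. The sketch below does this in plain Python on toy data. Whether `anomalies_corr_matrix()` uses Pearson correlation is an assumption; the helper `pearson` and the sample values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy columns (assumption): None marks a missing value
age = [22.0, None, 26.0, None, 35.0, None]
deck = ["C", None, None, None, "E", "B"]

# 1 where the value is missing, 0 otherwise
age_flag = [int(v is None) for v in age]
deck_flag = [int(v is None) for v in deck]

r = pearson(age_flag, deck_flag)
print(f"missingness correlation age/deck: {r:.3f}")
```

A value near 1 would mean the two columns tend to be missing in the same rows; a value near 0 would mean the gaps are unrelated.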
Anomalies Combinations#
anomalies_combinations() - Shows frequent anomaly combinations
from frameon import load_dataset, FrameOn as fo
titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_combinations(anomaly_type='missing')
| | age | embarked | deck | embark_town |
|---|---|---|---|---|
| age | | | | |
| embarked | 0 | | | |
| deck | < 23.0% / ^ 89.3% | 0 | | |
| embark_town | 0 | < 100.0% / ^ 100.0% | 0 | |
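The cells of the matrix above appear to give the overlap between two columns as a share of each column's own anomaly count: `<` relative to the row label and `^` relative to the column label. That reading is an interpretation of the output, not documented behaviour. The sketch below reproduces the `deck`/`age` cell under that reading. The overlap of 158 rows is back-calculated from the percentages shown on this page, and `overlap_cell` is a hypothetical helper.

```python
def overlap_cell(row_anomalies, col_anomalies):
    """Format the overlap of two sets of row indices as shares of each set."""
    both = len(row_anomalies & col_anomalies)
    if both == 0:
        return "0"
    row_share = 100 * both / len(row_anomalies)
    col_share = 100 * both / len(col_anomalies)
    return f"< {row_share:.1f}% / ^ {col_share:.1f}%"

# Toy index sets sized like the reported counts (assumption):
# 688 rows missing deck, 177 missing age, 158 missing both
deck_missing = set(range(688))
age_missing = set(range(530, 707))

cell = overlap_cell(deck_missing, age_missing)
print(cell)
```

Read this way, about 23% of the rows missing `deck` are also missing `age`, while about 89% of the rows missing `age` are also missing `deck`.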
Anomalies by Categories#
anomalies_by_categories() - Shows anomalies by category
from frameon import load_dataset, FrameOn as fo
titanic = fo(load_dataset('titanic'))
titanic.explore.anomalies_by_categories(anomaly_type='missing')
| Column | Category | Total | Anomaly | Anomaly Rate | Total % | Anomaly % | % Diff |
|---|---|---|---|---|---|---|---|
| deck | nan | 688 | 688 | 100.0% | 77.2% | 97.0% | 19.8% |
| class | Third | 491 | 481 | 98.0% | 55.1% | 67.8% | 12.7% |
| alive | no | 549 | 490 | 89.3% | 61.6% | 69.1% | 7.5% |
| sex | male | 577 | 483 | 83.7% | 64.8% | 68.1% | 3.4% |
| who | man | 537 | 450 | 83.8% | 60.3% | 63.5% | 3.2% |
| class | Second | 184 | 169 | 91.8% | 20.7% | 23.8% | 3.2% |
| embark_town | Southampton | 644 | 529 | 82.1% | 72.3% | 74.6% | 2.3% |
| embarked | S | 644 | 529 | 82.1% | 72.3% | 74.6% | 2.3% |
| embarked | Q | 77 | 75 | 97.4% | 8.6% | 10.6% | 1.9% |
| embark_town | Queenstown | 77 | 75 | 97.4% | 8.6% | 10.6% | 1.9% |
| who | child | 83 | 70 | 84.3% | 9.3% | 9.9% | 0.6% |
| embarked | nan | 2 | 2 | 100.0% | 0.2% | 0.3% | 0.1% |
| embark_town | nan | 2 | 2 | 100.0% | 0.2% | 0.3% | 0.1% |
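The percentage columns in this table appear to follow from simple ratios. The sketch below reproduces the `class = Third` row under that reading. The formulas are inferred from the figures shown, not taken from frameon's source, and the dataset totals (891 rows, 709 rows with at least one missing value) are back-calculated assumptions; `category_stats` is a hypothetical helper.

```python
def category_stats(total, anomaly, n_rows, n_anomaly_rows):
    """Derive the percentage columns for one category from raw counts."""
    anomaly_rate = 100 * anomaly / total         # share of the category that is anomalous
    total_pct = 100 * total / n_rows             # category share among all rows
    anomaly_pct = 100 * anomaly / n_anomaly_rows # category share among anomalous rows
    return {
        "Anomaly Rate": round(anomaly_rate, 1),
        "Total %": round(total_pct, 1),
        "Anomaly %": round(anomaly_pct, 1),
        "% Diff": round(anomaly_pct - total_pct, 1),
    }

# Counts from the 'class = Third' row; n_rows and n_anomaly_rows are assumptions
stats = category_stats(total=491, anomaly=481, n_rows=891, n_anomaly_rows=709)
for name, value in stats.items():
    print(f"{name}: {value}%")
```

Under this reading, a large positive `% Diff` marks a category that is over-represented among rows with anomalies compared with its share of the whole dataset.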
Detect Simultaneous Anomalies#
detect_simultaneous_anomalies() - Finds rows with multiple anomalies
from frameon import load_dataset, FrameOn as fo
titanic = fo(load_dataset('titanic'))
titanic.explore.detect_simultaneous_anomalies(
    anomaly_type='missing',
    columns=['age', 'deck']
)
| Column | Individual Anomalies | % of Total | % in Simultaneous |
|---|---|---|---|
| age | 177 | 19.87% | 89.27% |
| deck | 688 | 77.22% | 22.97% |
Anomalies Over Time#
anomalies_over_time() - Plots anomalies across all columns over time, using resampling
from frameon import load_dataset, FrameOn as fo
taxis = fo(load_dataset('taxis'))
fig = taxis.explore.anomalies_over_time(
    anomaly_type='missing',
    time_column='pickup'
)
fig.show()
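To make the idea of resampling concrete, the sketch below groups records into daily buckets by their timestamp and counts missing values per column in each bucket. It is a dependency-free stand-in, not frameon's implementation: the daily frequency, the helper `missing_per_day`, and every column name other than `pickup` are assumptions.

```python
from collections import Counter
from datetime import datetime

def missing_per_day(records, time_key, columns):
    """Count missing (None) values per column for each calendar day."""
    counts = {col: Counter() for col in columns}
    for rec in records:
        day = rec[time_key].date()  # daily bucket; other frequencies need a different key
        for col in columns:
            if rec.get(col) is None:
                counts[col][day] += 1
    return counts

# Hypothetical taxi-like records; 'payment' and 'zone' are made-up column names
trips = [
    {"pickup": datetime(2019, 3, 1, 8, 15), "payment": "cash", "zone": None},
    {"pickup": datetime(2019, 3, 1, 21, 40), "payment": None, "zone": None},
    {"pickup": datetime(2019, 3, 2, 9, 5), "payment": "card", "zone": "Midtown"},
    {"pickup": datetime(2019, 3, 2, 17, 30), "payment": None, "zone": "SoHo"},
]

daily = missing_per_day(trips, "pickup", ["payment", "zone"])
for col, per_day in daily.items():
    for day in sorted(per_day):
        print(col, day, per_day[day])
```

Plotting one line per column from these per-bucket counts gives a chart of the kind the method name suggests.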