New Features in overviewpy
Mid Year, New overviewpy?
It’s been a while since the first release of overviewpy, and I’m excited to share what’s new in v0.2.0. It’s a big one! The package has grown substantially from its humble beginnings as a table-generating function into a full(er) toolkit for exploratory data analysis of panel data. Let me walk you through the changes.
A New API: The Overview Class
The biggest structural change in v0.2.0 is that overviewpy now has a proper object-oriented interface. Instead of calling standalone functions, you create an Overview object once and then call methods on it:
from overviewpy.overview import Overview
import pandas as pd
data = {
    'id': ['RWA', 'RWA', 'RWA', 'GAB', 'GAB', 'FRA'],
    'year': [2022, 2023, 2021, 2023, 2020, 2019]
}
df = pd.DataFrame(data)
overview = Overview(df=df, id_col='id', time='year')
This makes for cleaner, more readable code: you define your data and columns once, and every method call flows naturally from there. A big thank you to Barrett Smith for bringing this idea to life. The OOP approach was his initiative, and it introduced a whole new way of using overviewpy. And speaking of those calls, there are a lot of new ones!
Method Calls in overviewpy
overview_tab: Automatic NA Handling
overview_tab hasn’t changed in what it produces (a compact two-column summary of which ids appear in which time periods), but it now handles missing values for you automatically. Rows with NAs in id_col or time are dropped, and a UserWarning is issued to let you know, so no extra preprocessing is needed.
df_overview = overview.overview_tab()
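To make the idea concrete, here is a minimal plain-Python sketch of the logic described above: drop incomplete rows with a UserWarning, then collapse each id's years into compact ranges. It is not overviewpy's implementation. The helper names `collapse_years` and `sketch_overview_tab` are hypothetical, and `None` stands in for a missing value.

```python
import warnings
from collections import defaultdict


def collapse_years(years):
    """Turn a collection of years into a string like '2015, 2019' or '2021-2023'."""
    ordered = sorted(set(years))
    runs = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year == prev + 1:
            prev = year            # still inside a consecutive run
            continue
        runs.append((start, prev))
        start = prev = year        # a gap starts a new run
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def sketch_overview_tab(rows):
    """rows: iterable of (id, year) pairs; None marks a missing value (assumption)."""
    rows = list(rows)
    complete = [(u, y) for u, y in rows if u is not None and y is not None]
    dropped = len(rows) - len(complete)
    if dropped:
        warnings.warn(f"Dropped {dropped} row(s) with missing id or time.", UserWarning)
    years_by_id = defaultdict(list)
    for unit, year in complete:
        years_by_id[unit].append(year)
    return {unit: collapse_years(years) for unit, years in sorted(years_by_id.items())}


# Same ids and years as the example data frame, plus one incomplete row.
rows = [("RWA", 2022), ("RWA", 2023), ("RWA", 2021),
        ("GAB", 2023), ("GAB", 2020), ("FRA", 2019), (None, 2018)]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    table = sketch_overview_tab(rows)
print(table)  # {'FRA': '2019', 'GAB': '2020, 2023', 'RWA': '2021-2023'}
```

Consecutive years collapse into a range, while isolated years are listed individually, which is what makes gaps in coverage easy to read off the table.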

[Image: Table with two columns (id and time) showing aggregated time stamps.]
overview_summary: Know Your Data at a Glance
New in v0.2.0 is overview_summary, which returns a per-column summary of any data frame: non-null count, number of unique values, and a few sample values. This feature was also introduced by Barrett Smith - thank you so much for adding it!
overview.overview_summary()

[Image: Table showing the summary output of the data: column names, non-missing counts, unique-value counts, and sample values.]
Think of it as a friendly first look at any unfamiliar dataset.
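If you are curious what such a per-column summary boils down to, the following is a rough plain-Python sketch under stated assumptions. The function `sketch_summary`, its `n_samples` parameter, and the output keys are all hypothetical, and the real method may format its result differently.

```python
def sketch_summary(columns, n_samples=3):
    """columns: dict mapping column name -> list of values (None = missing)."""
    summary = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        unique = list(dict.fromkeys(present))   # unique values in first-seen order
        summary.append({
            "column": name,
            "non_null": len(present),
            "unique": len(unique),
            "samples": unique[:n_samples],
        })
    return summary


data = {
    "id": ["RWA", "RWA", "RWA", "GAB", "GAB", "FRA"],
    "year": [2022, 2023, 2021, 2023, 2020, 2019],
}
result = sketch_summary(data)
for row in result:
    print(row)
```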
overview_na: Missing Values at a Glance
overview_na plots an overview of missing values across your dataset. New in v0.2.0 are several extra parameters:
- you can now switch between percentages and absolute counts,
- flip to a row-wise view,
- and control the y-axis scale.
overview.overview_na(perc=True, row_wise=False)

[Image: Bar plot showing NA values per column.]
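The numbers behind such a plot are easy to reason about. Here is a hedged plain-Python sketch of how the `perc` and `row_wise` switches could change what gets counted. The function `sketch_na_counts` is hypothetical and only mirrors the two parameter names from the call above; it says nothing about how overviewpy computes or draws the plot.

```python
def sketch_na_counts(rows, columns, perc=True, row_wise=False):
    """rows: list of dicts; a value of None counts as missing (assumption)."""
    if row_wise:
        # One entry per row: how many of its columns are missing.
        counts = {i: sum(row.get(c) is None for c in columns)
                  for i, row in enumerate(rows)}
        total = len(columns)
    else:
        # One entry per column: how many rows are missing a value.
        counts = {c: sum(row.get(c) is None for row in rows) for c in columns}
        total = len(rows)
    if perc and total:   # skip the percentage step for empty input
        return {k: 100 * v / total for k, v in counts.items()}
    return counts


# Toy rows, invented for illustration only.
rows = [
    {"id": "RWA", "year": 2021, "gdp": None},
    {"id": "RWA", "year": 2022, "gdp": 10.0},
    {"id": "GAB", "year": None, "gdp": None},
    {"id": "FRA", "year": 2019, "gdp": 40.0},
]
cols = ["id", "year", "gdp"]
print(sketch_na_counts(rows, cols))                             # {'id': 0.0, 'year': 25.0, 'gdp': 50.0}
print(sketch_na_counts(rows, cols, perc=False, row_wise=True))  # {0: 1, 1: 0, 2: 2, 3: 0}
```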
overview_plot: Visualize Observation Coverage
Ever wanted to see at a glance which units are observed in which time periods? overview_plot generates a connected dot-plot that maps each id against its time coverage:
overview.overview_plot()

[Image: Plot of connected dots, where each connected run marks a consecutive time period covered in the data.]
This plot proved valuable in overviewR, and hopefully it gives Python users an equally intuitive way to spot gaps or irregular coverage in their data.
overview_heat: A Heatmap of Your Sample
For a more granular take on coverage, overview_heat produces a heatmap of observation counts (or percentages) for each id-time combination. The darker the cell, the more observations you have, which makes sparse periods immediately visible:
overview.overview_heat()
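Conceptually, the heatmap colours a matrix of counts with one row per id and one column per time period. The sketch below builds that matrix in plain Python. `sketch_heat_matrix` is a hypothetical helper written for this illustration, and it covers only absolute counts, not the percentage option.

```python
from collections import Counter


def sketch_heat_matrix(rows):
    """rows: list of (id, time) pairs; returns sorted ids, sorted times, and counts."""
    counts = Counter(rows)
    ids = sorted({unit for unit, _ in rows})
    times = sorted({t for _, t in rows})
    # Combinations that never occur count as 0, i.e. the lightest cells.
    matrix = [[counts[(unit, t)] for t in times] for unit in ids]
    return ids, times, matrix


# The example data from above, plus one duplicated observation for RWA in 2023.
rows = [("RWA", 2022), ("RWA", 2023), ("RWA", 2021), ("RWA", 2023),
        ("GAB", 2023), ("GAB", 2020), ("FRA", 2019)]
ids, times, matrix = sketch_heat_matrix(rows)
for unit, counts_row in zip(ids, matrix):
    print(unit, counts_row)
```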

[Image: Heatmap indicating the coverage of each id x time combination in your data.]
overview_overlap: Compare Two Data Frames
How much do two datasets actually overlap? overview_overlap answers that question visually, with either a bar chart or a Venn diagram comparing the id coverage of two data frames:
overview.overview_overlap(df2=other_df, id_col2='country')
[Image: Grouped bar plots indicating the overlap of two datasets.]
overview.overview_overlap(df2=other_df, id_col2='country', plot_type='venn')

[Image: Venn diagram indicating the overlap across two datasets.]
This is especially useful when merging datasets and you want to know what you’d be losing.
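The underlying question is a set comparison, which a few lines of plain Python can illustrate. `sketch_overlap` is a hypothetical helper and the second id list is invented for the example. An inner merge on id would keep only the ids listed under "both".

```python
def sketch_overlap(ids_a, ids_b):
    """Split the ids of two datasets into three groups."""
    a, b = set(ids_a), set(ids_b)
    return {
        "only_first": sorted(a - b),   # lost in an inner merge
        "both": sorted(a & b),         # kept in an inner merge
        "only_second": sorted(b - a),  # lost in an inner merge
    }


first_ids = ["RWA", "RWA", "RWA", "GAB", "GAB", "FRA"]
second_ids = ["FRA", "BEL", "ARG", "GAB"]   # made-up ids for the second dataset
overlap = sketch_overlap(first_ids, second_ids)
print(overlap)
# {'only_first': ['RWA'], 'both': ['FRA', 'GAB'], 'only_second': ['ARG', 'BEL']}
```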
overview_crossplot & overview_crosstab: Split by Conditions
Sometimes you want to examine how observations are distributed across two conditions simultaneously. overview_crossplot creates a scatter plot divided into four quadrants by two user-defined thresholds, while overview_crosstab produces a 2×2 cross-table of the same split:
overview.overview_crossplot(var1='gdp', var2='pop', threshold1=25000, threshold2=2.75e7)

[Image: Scatter plot showing the distribution of observation points based on two conditions.]
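The 2×2 table can be pictured as counting how many observations land in each quadrant. Below is a minimal plain-Python sketch of that counting step. `sketch_crosstab` is hypothetical, the rows are invented toy values, and treating a value equal to the threshold as "above" is an assumption, since the post does not say how ties are handled.

```python
def sketch_crosstab(rows, var1, var2, threshold1, threshold2):
    """Count rows per quadrant; keys are (var1 >= threshold1, var2 >= threshold2)."""
    table = {(hi1, hi2): 0 for hi1 in (False, True) for hi2 in (False, True)}
    for row in rows:
        key = (row[var1] >= threshold1, row[var2] >= threshold2)
        table[key] += 1
    return table


# Toy observations, invented for illustration only.
rows = [
    {"gdp": 30000, "pop": 6.7e7},
    {"gdp": 800, "pop": 1.3e7},
    {"gdp": 7000, "pop": 2.3e6},
    {"gdp": 45000, "pop": 1.1e7},
    {"gdp": 2000, "pop": 2.0e8},
]
crosstab = sketch_crosstab(rows, "gdp", "pop", threshold1=25000, threshold2=2.75e7)
for (high_gdp, high_pop), n in crosstab.items():
    print(f"gdp>=25000: {high_gdp!s:5}  pop>=2.75e7: {high_pop!s:5}  n={n}")
```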
overview_markdown & overview_latex: Export-Ready Output
Once you have your overview_tab summary, you can now export it directly to Markdown or LaTeX, with no manual formatting needed.
print(overview.overview_markdown())
## Time and scope of the sample
| Sample | Time frame |
|---|---|
| ARG | 2002 |
| BEL | 2013-2014 |
| FRA | 2015, 2019 |
| GAB | 2020, 2023 |
| RWA | 2021-2023 |
overview_markdown is great for dropping tables straight into a blog post or README, while overview_latex produces publication-ready output you can paste into a paper:
overview.overview_latex(save_out=True, file_path='sample_table.tex')
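To demystify the Markdown export a little, here is a small plain-Python sketch that renders an id-to-time-frame mapping in the same layout as the output shown above. `sketch_markdown` is a hypothetical stand-in, not the package's own code, and the heading text is simply copied from that output.

```python
def sketch_markdown(table, title="Time and scope of the sample"):
    """table: dict mapping id -> time-frame string, e.g. {'RWA': '2021-2023'}."""
    lines = [f"## {title}", "| Sample | Time frame |", "|---|---|"]
    lines += [f"| {unit} | {span} |" for unit, span in table.items()]
    return "\n".join(lines)


table = {
    "ARG": "2002",
    "BEL": "2013-2014",
    "FRA": "2015, 2019",
    "GAB": "2020, 2023",
    "RWA": "2021-2023",
}
markdown = sketch_markdown(table)
print(markdown)
```

Because the result is a plain string, it can be written to a file or pasted into a README without further processing.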
Getting Started
Install the latest version from PyPI:
pip install overviewpy
For the full API reference and worked examples, check out the documentation.
Did you get a chance to give it a try? I’d love to hear your feedback, so feel free to open an issue or start a discussion on GitHub!