Mid Year, New overviewpy? 🎉

It’s been a while since the first release of overviewpy, and I’m excited to share what’s new in v0.2.0. It’s a big one! The package has grown substantially from its humble beginnings as a table-generating feature into a full(er) toolkit for exploratory data analysis of panel data. Let me walk you through what’s new.

A New API: The Overview Class

The biggest structural change in v0.2.0 is that overviewpy now has a proper object-oriented interface. Instead of calling standalone functions, you create an Overview object once and then call methods on it:

from overviewpy.overview import Overview
import pandas as pd

data = {
    'id': ['RWA', 'RWA', 'RWA', 'GAB', 'GAB', 'FRA'],
    'year': [2022, 2023, 2021, 2023, 2020, 2019]
}

df = pd.DataFrame(data)
overview = Overview(df=df, id_col='id', time='year')

This makes for cleaner, more readable code: you define your data and columns once, and every method call flows naturally from there. A big thank you to Barrett Smith for bringing this idea to life. The OOP approach was his initiative, and it introduced a whole new way of using overviewpy. And speaking of those calls, there are a lot of new ones!

Method Calls in overviewpy

overview_tab: Automatic NA Handling

overview_tab hasn’t changed in what it produces (a compact two-column summary of which ids appear in which time periods), but it now handles missing values for you automatically. Rows with NAs in id_col or time are dropped and a UserWarning tells you so, which means no extra preprocessing is needed.

df_overview = overview.overview_tab()

[Image: Table showing two columns (id and time) with aggregated time stamps.]
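To make the idea concrete, here is a minimal pandas sketch of what "drop NAs with a warning, then collapse the time stamps" could look like. It is not overviewpy's actual implementation: tab_sketch and collapse_years are hypothetical helper names, the time_frame column name is made up, and the row with a missing id is invented for the example.

```python
import warnings

import pandas as pd


def collapse_years(years):
    """Collapse years into a string such as '2020, 2023' or '2021-2023'."""
    years = sorted(set(int(y) for y in years))
    runs = []
    start = prev = years[0]
    for year in years[1:]:
        if year == prev + 1:
            # Still inside a consecutive run.
            prev = year
            continue
        runs.append((start, prev))
        start = prev = year
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def tab_sketch(df, id_col, time_col):
    """Hypothetical stand-in for the idea behind overview_tab."""
    cleaned = df.dropna(subset=[id_col, time_col])
    dropped = len(df) - len(cleaned)
    if dropped:
        warnings.warn(
            f"Dropped {dropped} row(s) with missing id or time.", UserWarning
        )
    return (
        cleaned.groupby(id_col)[time_col]
        .apply(collapse_years)
        .reset_index(name="time_frame")
    )


df = pd.DataFrame({
    "id": ["RWA", "RWA", "RWA", "GAB", "GAB", "FRA", None],  # last id is missing
    "year": [2022, 2023, 2021, 2023, 2020, 2019, 2018],
})

# Record warnings so the example can show that one was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = tab_sketch(df, "id", "year")

print(result)
print([str(w.message) for w in caught])
```

The row without an id disappears from the result, and the warning tells you how many rows were affected.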

overview_summary: Know Your Data at a Glance

New in v0.2.0 is overview_summary, which returns a per-column summary of any data frame: non-null count, number of unique values, and a few sample values. This feature was also introduced by Barrett Smith - thank you so much for adding it!

overview.overview_summary()

[Image: A table showing the summary output of the data (the column names, the non-missing values, the unique values, and sample values).]

Think of it as a friendly first look at any unfamiliar dataset.
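If you are curious what such a per-column summary boils down to, the following is a rough pandas approximation. It is a sketch under my own assumptions, not the code behind overview_summary: summary_sketch, n_samples, and the output column names are all hypothetical, and the real output may be laid out differently.

```python
import pandas as pd


def summary_sketch(df, n_samples=3):
    """Hypothetical per-column summary: non-null count, unique count, samples."""
    rows = []
    for col in df.columns:
        non_null = df[col].dropna()
        rows.append({
            "column": col,
            "non_null": int(non_null.shape[0]),
            "unique": int(non_null.nunique()),
            # First few distinct values, in order of appearance.
            "samples": non_null.drop_duplicates().head(n_samples).tolist(),
        })
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "id": ["RWA", "RWA", "GAB", None],  # invented data with one missing id
    "year": [2022, 2023, 2023, 2020],
})
summary = summary_sketch(df)
print(summary)
```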

overview_na: Missing Values at a Glance

overview_na plots an overview of missing values across your dataset. New in v0.2.0 are several extra parameters:

  • you can now switch between percentages and absolute counts,
  • flip to a row-wise view,
  • and control the y-axis scale.

overview.overview_na(perc=True, row_wise=False)

[Image: A bar plot showing NA values per column.]
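The numbers behind such a plot are easy to reproduce with plain pandas. The sketch below only computes them and does not plot anything. na_share is a hypothetical helper; its perc and row_wise arguments mirror the call above for readability, but how overviewpy computes its values internally is an assumption on my part.

```python
import pandas as pd


def na_share(df, perc=True, row_wise=False):
    """Count missing values per column (default) or per row."""
    axis = 1 if row_wise else 0
    counts = df.isna().sum(axis=axis)
    if not perc:
        return counts
    # Percentages are relative to the number of cells in each column or row.
    total = df.shape[1] if row_wise else df.shape[0]
    return counts / total * 100


df = pd.DataFrame({
    "id": ["RWA", "GAB", None, "FRA"],  # invented data
    "year": [2022, None, None, 2019],
    "gdp": [1.0, 2.0, 3.0, 4.0],
})

col_perc = na_share(df)                               # percent missing per column
row_counts = na_share(df, perc=False, row_wise=True)  # absolute counts per row
print(col_perc)
print(row_counts)
```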

overview_plot: Visualize Observation Coverage

Ever wanted to see at a glance which units are observed in which time periods? overview_plot generates a connected dot-plot that maps each id against its time coverage:

overview.overview_plot()

[Image: Connected dots in a plot indicating the consecutive time periods represented in the data.]

We found this plot valuable in overviewR, and hopefully it gives the Python community an equally intuitive way to spot gaps or irregular coverage in their data.

overview_heat: A Heatmap of Your Sample

For a more granular take on coverage, overview_heat produces a heatmap of observation counts (or percentages) for each id–time combination. The darker the cell, the more observations you have, which makes sparse periods immediately visible:

overview.overview_heat()

[Image: Heatmap indicating the coverage of an id × time combination in your data.]
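The grid of counts that a heatmap like this visualizes can be sketched with pandas.crosstab. This only illustrates the underlying table with invented data (note the duplicated RWA/2022 row); it is not how overview_heat is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["RWA", "RWA", "RWA", "GAB", "GAB", "FRA"],
    "year": [2022, 2022, 2021, 2023, 2020, 2019],  # RWA/2022 appears twice
})

# One row per id, one column per year, cells hold the number of observations.
counts = pd.crosstab(df["id"], df["year"])
print(counts)
```

Cells with a zero are the id–time combinations that never show up in the data, which is exactly where the heatmap would stay light.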

overview_overlap: Compare Two Data Frames

How much do two datasets actually overlap? overview_overlap answers that question visually, with either a bar chart or a Venn diagram comparing the id coverage of two data frames:

overview.overview_overlap(df2=other_df, id_col2='country')

[Image: Grouped bar plots indicating the overlap of two datasets.]

overview.overview_overlap(df2=other_df, id_col2='country', plot_type='venn')

[Image: A Venn diagram indicating the overlap across two datasets.]

This is especially useful when merging datasets and you want to know what you’d be losing.
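For a quick numeric check before (or instead of) plotting, the same comparison can be expressed with Python sets. The snippet below is a stand-alone sketch: other_df and its country column are filled with made-up values, and the three bucket names are my own, not something overviewpy returns.

```python
import pandas as pd

df = pd.DataFrame({"id": ["RWA", "GAB", "FRA"]})
other_df = pd.DataFrame({"country": ["FRA", "GAB", "BEL", "ARG"]})  # made-up data

ids_first = set(df["id"].dropna())
ids_second = set(other_df["country"].dropna())

overlap = {
    "both": sorted(ids_first & ids_second),          # survive an inner merge
    "only_first": sorted(ids_first - ids_second),    # lost from the first frame
    "only_second": sorted(ids_second - ids_first),   # lost from the second frame
}
print(overlap)
```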

overview_crossplot & overview_crosstab: Split by Conditions

Sometimes you want to examine how observations are distributed across two conditions simultaneously. overview_crossplot creates a scatter plot divided into four quadrants by two user-defined thresholds, while overview_crosstab produces a 2×2 cross-table of the same:

overview.overview_crossplot(var1='gdp', var2='pop', threshold1=25000, threshold2=2.75e7)

[Image: A scatter plot showing the distribution of observation points based on two conditions.]
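To give a feel for the 2×2 table, here is a small pandas sketch that splits observations at two thresholds and counts them per quadrant. The gdp and pop values are invented, and treating a value exactly at the threshold as "high" is my assumption; overview_crosstab may draw that line differently and will format its output in its own way.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["RWA", "GAB", "FRA", "BEL"],
    "gdp": [3000, 20000, 45000, 52000],   # invented values
    "pop": [1.4e7, 2.4e6, 6.8e7, 1.2e7],  # invented values
})
threshold1, threshold2 = 25000, 2.75e7

# Label each observation as "high" or "low" relative to its threshold.
labels = {True: "high", False: "low"}
gdp_group = (df["gdp"] >= threshold1).map(labels)
pop_group = (df["pop"] >= threshold2).map(labels)

# Rows: gdp group, columns: pop group, cells: number of observations.
table = pd.crosstab(gdp_group, pop_group)
print(table)
```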

overview_markdown & overview_latex: Export-Ready Output

Once you have your overview_tab summary, you can now export it directly to Markdown or LaTeX, with no manual formatting needed.

print(overview.overview_markdown())
## Time and scope of the sample

| Sample | Time frame |
|---|---|
| ARG | 2002 |
| BEL | 2013-2014 |
| FRA | 2015, 2019 |
| GAB | 2020, 2023 |
| RWA | 2021-2023 |

overview_markdown is great for dropping tables straight into a blog post or README, while overview_latex produces publication-ready output you can paste into a paper:

overview.overview_latex(save_out=True, file_path='sample_table.tex')
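For a sense of how little is needed to get from a summary to the Markdown table shown earlier, here is a hand-rolled sketch in plain Python. to_markdown_table is a hypothetical helper written for this post, not part of overviewpy, and it only reproduces the table body, not the heading line.

```python
def to_markdown_table(rows, headers=("Sample", "Time frame")):
    """Render (id, time frame) pairs as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    lines += [
        "| " + " | ".join(str(cell) for cell in row) + " |" for row in rows
    ]
    return "\n".join(lines)


rows = [("FRA", "2015, 2019"), ("RWA", "2021-2023")]
md = to_markdown_table(rows)
print(md)
```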

Getting Started

Install the latest version from PyPI:

pip install overviewpy

For the full API reference and worked examples, check out the documentation.

Did you get a chance to give it a try? I’d love to hear your feedback, so feel free to open an issue or start a discussion on GitHub!