overviewR (v 0.0.10) is on CRAN and comes with new features 🚀
The package is meant to serve as a Swiss army knife for exploratory data analysis. The basic functions allow you to investigate sample coverage across different time points, missing values across your variables, and the overlap between two data sets.
Here are the changes in a nutshell:
First we start by installing the newest version and other packages that might be helpful.
# Load the newest CRAN version
install.packages("overviewR", force = TRUE)
library(overviewR) # Easily Extracting Information About Your Data
library(dplyr)
library(magrittr) # A Forward-Pipe Operator for R
overview_tab now allows you to use multiple time arguments. Here are some examples of how to use the function:
Time can be a character vector containing one time variable (it can come in a
YYYY-MM-DD format and can either come as an integer or in the
overview_tab(dat = toydata, id = ccode, time = year)
# A tibble: 5 × 2
# Groups: ccode
  ccode time_frame
  <chr> <chr>
1 AGO   1990 - 1992
2 BEN   1995 - 1999
3 FRA   1993, 1996, 1999
4 GBR   1991, 1993, 1995, 1997, 1999
5 RWA   1990 - 1995
It can also be a list containing multiple time variables (time = list(year = NULL, month = NULL, day = NULL)).
overview_tab(
  dat = toydata,
  id = ccode,
  time = list(
    year = toydata$year,
    month = toydata$month,
    day = toydata$day
  ),
  complex_date = TRUE
)
# A tibble: 5 × 2
# Groups: ccode
  ccode time_frame
  <chr> <chr>
1 AGO   1990-01-01, 1990-02-02, …
2 BEN   1995-01-01, 1995-02-02, …
3 FRA   1993-01-01, 1993-02-02, …
4 GBR   1991-01-01, 1991-02-02, …
5 RWA   1990-01-01 - 1990-01-12, …
You can use colors in overview_plot to identify time periods. Here, we introduce a dummy variable that indicates whether the year was before 1995 or not. We use this dummy to color the time lines using the color argument.
# Code whether a year was before 1995
toydata %<>%
  dplyr::mutate(before = ifelse(year < 1995, 1, 0))

# Plot using the `color` argument
overview_plot(dat = toydata, id = ccode, time = year, color = before)
You can also change the dot size in overview_plot using the dot_size argument.
# Plot using the `dot_size` argument
overview_plot(dat = toydata, id = ccode, time = year, dot_size = 5)
overview_crosstab now has a visual counterpart in overview_crossplot.
overview_crossplot(
  toydata,
  id = ccode,
  time = year,
  cond1 = gdp,
  cond2 = population,
  threshold1 = 25000,
  threshold2 = 27000,
  color = TRUE,
  label = TRUE
)
With overview_overlap, you can now visually compare the overlap in time and id variables across two data sets.
# Subset one data set for comparison
toydata2 <- toydata %>%
  dplyr::filter(year > 1992)

overview_overlap(
  dat1 = toydata,
  dat2 = toydata2,
  dat1_id = ccode,
  dat2_id = ccode,
  plot_type = "bar" # This is the default
)
data.table under the hood
And, last but not least, overview_na now also works if you’re using data.table objects 🥳 (Thanks to my old team @ Kienbaum for being patient enough to explain and let me learn the (not so intuitive) syntax 👩🏼‍💻)
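To see the data.table support in action, here is a minimal sketch. It assumes that the toydata data set shipped with overviewR is available and that the data.table package is installed. The name toydata_dt is only illustrative and does not come from the package.

```r
library(overviewR)
library(data.table)

# Load the toy data that ships with overviewR
data("toydata", package = "overviewR")

# Convert the data to a data.table (toydata_dt is an illustrative name)
toydata_dt <- data.table::as.data.table(toydata)

# Plot missing values per variable, this time on a data.table object
overview_na(dat = toydata_dt)
```

The call looks exactly the same as with a regular data frame, so existing code should not need to change when you switch to data.table.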
Here’s a more detailed overview of what each function can do:
||Multiple time arguments|