| Type: | Package |
| Title: | Descriptive Analysis and Visualization for Panel Data |
| Version: | 0.1.1 |
| Description: | Provides a comprehensive set of tools for describing and visualizing panel data structures, as well as for summarizing and visualizing variables within a panel data context. |
| License: | GPL-3 |
| URL: | https://github.com/dtereshch/paneldesc, https://dtereshch.github.io/paneldesc-guides/ |
| BugReports: | https://github.com/dtereshch/paneldesc/issues |
| LazyData: | true |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-03-19 10:02:55 UTC; dmitrii-work |
| Author: | Dmitrii Tereshchenko |
| Maintainer: | Dmitrii Tereshchenko <dtereshch@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-23 17:30:10 UTC |
Panel Data Factor Variable Decomposition
Description
This function performs one-way tabulations and decomposes counts into between and within components for categorical (factor) variables in panel data.
Usage
decompose_factor(
  data,
  select = NULL,
  index = NULL,
  format = "wide",
  digits = 3
)
Arguments
data |
A data.frame containing panel data in a long format. |
select |
A character vector specifying which categorical (factor) variables to analyze. If not specified, all factor variables in the data.frame will be used. |
index |
A character vector of length 1 or 2 specifying the names of the entity and (optionally) time variables. The first element is the entity variable; if a second element is provided, it is used as the time variable. Not required if data has panel attributes. |
format |
A character string specifying the output format: "wide" or "long". Default = "wide". |
digits |
An integer indicating the number of decimal places to round shares. Default = 3. |
Details
The output format is controlled by the format parameter.
When format = "wide" (default), returns a data.frame with columns:
variable: The name of the analyzed variable
category: The category level of the variable
count_overall: Overall frequency (person-time observations)
share_overall: Overall share (count_overall / total_obs)
count_between: Between-entity frequency (number of entities ever having this category)
share_between: Between-entity share (count_between / total_entities)
share_within: Within-entity share (average share of time entities have this category)
When format = "long", returns a data.frame with columns:
variable: The name of the analyzed variable
category: The category level of the variable
dimension: Type of decomposition: "overall", "between", or "within"
count: Frequency count (NA for within dimension)
share: Share proportion (0 to 1)
The object has class "panel_summary" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List containing additional information: count_entities.
Value
A data.frame with categorical panel data decomposition statistics.
References
For Stata users: This corresponds to the xttab command.
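The overall, between, and within quantities listed in Details can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy data frame and its column names (id, status) are hypothetical, and the within share is assumed to follow the Stata xttab convention of averaging only over entities that ever show the category.

```r
# Hypothetical toy panel: 3 entities observed 3 times each
toy <- data.frame(
  id     = rep(1:3, each = 3),
  status = factor(c("a", "a", "b",  "a", "a", "a",  "b", "b", "b"))
)

tab <- table(toy$id, toy$status)   # rows: entities, columns: categories

count_overall <- colSums(tab)                 # entity-time observations per category
count_between <- colSums(tab > 0)             # entities ever in the category
row_share     <- prop.table(tab, margin = 1)  # each entity's share of time per category

# Assumption: average only over entities that ever show the category
share_within <- sapply(colnames(tab), function(k) {
  mean(row_share[tab[, k] > 0, k])
})

decomp <- data.frame(
  category      = colnames(tab),
  count_overall = as.vector(count_overall),
  share_overall = round(as.vector(count_overall) / sum(tab), 3),
  count_between = as.vector(count_between),
  share_between = round(as.vector(count_between) / nrow(tab), 3),
  share_within  = round(as.vector(share_within), 3)
)
decomp
```

Between shares can sum to more than 1 because an entity that switches category is counted under every category it ever held.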
See Also
See also decompose_numeric(), summarize_transition().
Examples
data(production)
# Basic usage
decompose_factor(production, index = "firm")
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
decompose_factor(panel)
# Selecting specific variables
decompose_factor(production, select = "industry", index = "firm")
# Returning results in a long format
decompose_factor(production, index = "firm", format = "long")
# Custom rounding
decompose_factor(production, index = "firm", digits = 2)
# Accessing attributes
out_dec_fac <- decompose_factor(production, index = "firm")
attr(out_dec_fac, "metadata")
attr(out_dec_fac, "details")
Panel Data Numeric Variable Decomposition
Description
This function decomposes the variance of numeric variables in panel data into between and within components.
Usage
decompose_numeric(
  data,
  select = NULL,
  index = NULL,
  detail = TRUE,
  format = "long",
  digits = 3
)
Arguments
data |
A data.frame containing panel data in a long format. |
select |
A character vector specifying which numeric variables to analyze. If not specified, all numeric variables in the data.frame will be used. |
index |
A character vector of length 1 or 2 specifying the names of the entity and (optionally) time variables. The first element is the entity variable; if a second element is provided, it is used as the time variable. Not required if data has panel attributes. |
detail |
A logical flag indicating whether to return detailed Stata-like output. Default = TRUE. |
format |
A character string specifying the output format: "long" or "wide". Default = "long". |
digits |
An integer indicating the number of decimal places to round statistics. Default = 3. |
Details
The output format is controlled by two parameters: format and detail.
When format = "long" and detail = TRUE (default), returns a data.frame with:
variable: The name of the analyzed variable
dimension: Type of decomposition: "overall", "between", or "within"
mean: Mean value (only for "overall" row)
std: Standard deviation
min: Minimum value
max: Maximum value
count: Number of observations or entities
When format = "long" and detail = FALSE, returns a data.frame with:
variable: The name of the variable
dimension: Type of decomposition: "overall", "between", or "within"
mean: Mean value
std: Standard deviation
When format = "wide" and detail = TRUE, returns a data.frame with:
variable: The name of the variable
mean: Overall mean
std_overall: Overall standard deviation
min_overall: Overall minimum
max_overall: Overall maximum
count_overall: Number of observations
std_between: Between-entity standard deviation
min_between: Minimum of entity means
max_between: Maximum of entity means
count_between: Number of entities
std_within: Within-entity standard deviation
min_within: Within-entity minimum (transformed)
max_within: Within-entity maximum (transformed)
count_within: Average observations per entity
When format = "wide" and detail = FALSE, returns a data.frame with:
variable: The name of the variable
mean: Overall mean
std_overall: Overall standard deviation
std_between: Between-entity standard deviation
std_within: Within-entity standard deviation
The object has class "panel_summary" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List containing additional information: count_entities.
Value
A data.frame with panel data decomposition statistics.
References
For Stata users: This corresponds to the xtsum command.
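As a rough illustration of the overall, between, and within statistics described in Details, the following base-R sketch reproduces an xtsum-style table on a hypothetical toy panel (columns id and x are invented for the example). It assumes the usual xtsum convention that within values are deviations from the entity mean with the overall mean added back; the package's own computation may differ in detail.

```r
# Hypothetical toy panel: 3 entities, 2 observations each
toy <- data.frame(
  id = rep(1:3, each = 2),
  x  = c(1, 3, 4, 6, 8, 10)
)

overall_mean <- mean(toy$x)
entity_mean  <- ave(toy$x, toy$id)                  # entity mean repeated on each row
between_vals <- tapply(toy$x, toy$id, mean)         # one mean per entity
within_vals  <- toy$x - entity_mean + overall_mean  # assumed "transformed" within values

xtsum_like <- data.frame(
  dimension = c("overall", "between", "within"),
  std   = c(sd(toy$x), sd(between_vals), sd(within_vals)),
  min   = c(min(toy$x), min(between_vals), min(within_vals)),
  max   = c(max(toy$x), max(between_vals), max(within_vals)),
  count = c(nrow(toy), length(between_vals), nrow(toy) / length(between_vals))
)
xtsum_like$std <- round(xtsum_like$std, 3)
xtsum_like
```

Adding the overall mean back keeps the within minimum and maximum on the same scale as the original variable, which is why those columns are described as transformed.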
See Also
See also decompose_factor(), summarize_numeric(), plot_heterogeneity().
Examples
data(production)
# Basic usage
decompose_numeric(production, index = "firm")
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
decompose_numeric(panel)
# Selecting specific variables
decompose_numeric(production, select = c("sales", "labor"), index = "firm")
# Returning results in a wide format without excessive details
decompose_numeric(production, index = "firm", detail = FALSE, format = "wide")
# Custom rounding
decompose_numeric(production, index = "firm", digits = 2)
# Accessing attributes
out_dec_num <- decompose_numeric(production, index = "firm")
attr(out_dec_num, "metadata")
attr(out_dec_num, "details")
Panel Data Balance Description
Description
This function provides summary statistics on the panel data structure, with a focus on balance and data completeness.
Usage
describe_balance(data, index = NULL, detail = FALSE, digits = 3)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
detail |
A logical flag indicating whether to return additional statistics (5th, 25th, 50th, 75th, and 95th percentiles). Default = FALSE. |
digits |
An integer specifying the number of decimal places for rounding mean values. Default = 3. |
Details
The statistics for entities describe the distribution of the number of entities observed per time period (cross‑sectional size per period). The statistics for periods describe the distribution of the number of time periods observed per entity (temporal length per entity).
The returned data.frame always contains the following columns:
dimension: Either "entities" or "periods".
mean: Mean number of entities per period (or periods per entity).
std: Standard deviation.
min: Minimum value.
max: Maximum value.
When detail = TRUE, five additional percentile columns are included:
p5: 5th percentile.
p25: 25th percentile (first quartile).
p50: 50th percentile (median).
p75: 75th percentile (third quartile).
p95: 95th percentile.
All statistics are rounded to the number of decimal places specified by digits.
The object has class "panel_description" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List containing the full presence matrix.
Value
A data.frame with panel data summary statistics for entities and periods.
Note
An entity-time combination is considered present if the corresponding row contains at least one non‑NA value in any substantive variable (all columns except the entity and time identifiers).
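The presence rule in this Note and the two distributions described in Details can be sketched in base R as follows. This is only an illustration on a hypothetical toy data frame (firm, year, y, z), not the package's code.

```r
# Hypothetical toy panel; firm 2 has no row for year 3
toy <- data.frame(
  firm = c(1, 1, 1, 2, 2, 3, 3, 3),
  year = c(1, 2, 3, 1, 2, 1, 2, 3),
  y    = c(5, NA, 7, 1, 2, 3, 4, NA),
  z    = c(1, NA, 1, 1, 1, 1, 1, 2)
)

# A row counts as present if any substantive variable is non-NA
substantive <- setdiff(names(toy), c("firm", "year"))
has_data    <- rowSums(!is.na(toy[substantive])) > 0

# Presence matrix: entities in rows, periods in columns
presence <- table(factor(toy$firm)[has_data], factor(toy$year)[has_data]) > 0

entities_per_period <- colSums(presence)  # cross-sectional size per period
periods_per_entity  <- rowSums(presence)  # temporal length per entity

describe <- function(v) c(mean = mean(v), std = sd(v), min = min(v), max = max(v))
balance  <- rbind(entities = describe(entities_per_period),
                  periods  = describe(periods_per_entity))
round(balance, 3)
```

Firm 1 in year 2 has a row, but every substantive value in it is NA, so the combination is treated as absent.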
See Also
See also describe_dimensions(), describe_periods(), describe_patterns(), plot_periods().
Examples
data(production)
# Basic usage
describe_balance(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
describe_balance(panel)
# Returning detailed statistics
describe_balance(production, index = c("firm", "year"), detail = TRUE)
# Custom rounding
describe_balance(production, index = c("firm", "year"), digits = 2)
# Accessing attributes
out_des_bal <- describe_balance(production, index = c("firm", "year"))
attr(out_des_bal, "metadata")
attr(out_des_bal, "details")
Panel Data Dimensions Description
Description
This function provides basic dimension counts for panel data: number of rows, unique entities, unique time periods, and substantive variables.
Usage
describe_dimensions(data, index = NULL)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
Details
The returned data.frame has the following structure:
rows: Total number of rows in the data frame.
entities: Number of distinct values in the entity variable.
periods: Number of distinct values in the time variable.
variables: Number of substantive variables (all columns except entity and time).
The object has class "panel_description" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List with the actual vectors of entities, periods, and substantive variables.
Value
A data.frame containing panel dimension counts.
See Also
See also describe_balance(), describe_periods().
Examples
data(production)
# Basic usage
describe_dimensions(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
describe_dimensions(panel)
# Accessing attributes
out_des_dim <- describe_dimensions(production, index = c("firm", "year"))
attr(out_des_dim, "metadata")
attr(out_des_dim, "details")
Incomplete Entities Description
Description
This function provides a descriptive table of entities with incomplete observations (missing values).
Usage
describe_incomplete(data, index = NULL, detail = FALSE)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 1 or 2 specifying the names of the entity and (optionally) time variables. The first element is the entity variable; if a second element is provided, it is used as the time variable. Not required if data has panel attributes. |
detail |
A logical flag indicating whether to include detailed missing counts for each variable. Default = FALSE. |
Details
The returned data.frame has the following structure:
[entity]: The entity identifier (name matches input entity variable)
na_count: Total number of missing observations for the entity
variables: Number of variables with at least one missing value for that entity
When detail = TRUE, additional columns are included for each substantive variable, showing the number of NAs in that variable for the entity.
The data.frame is sorted first by the number of variables with NAs (descending), then by the total number of NAs (descending).
The object has class "panel_description" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List containing total entity counts and the IDs of incomplete entities.
Value
A data.frame with incomplete entities description.
Note
The interpretation of incomplete entities may differ depending on whether the panel is balanced or unbalanced. In a balanced panel, each entity has the same number of time periods, so the total possible observations per entity are equal. In an unbalanced panel, entities may have different numbers of time periods, so the number of missing values should be interpreted relative to the entity's total observations. The function does not adjust for the number of time periods per entity; the missing counts reflect absolute counts of NAs in the data. Users should consider the panel structure when interpreting the results.
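A minimal base-R sketch of the per-entity missing counts and the sort order is given below, together with one possible way to relate the counts to each entity's number of rows, as this Note suggests. The toy data, the column names, and the na_share column are all hypothetical; the function itself reports absolute counts only.

```r
# Hypothetical toy data: firm 2 is complete, firms 1 and 3 are not
toy <- data.frame(
  firm = c(1, 1, 2, 2, 3, 3),
  y    = c(NA, 1, 2, 3, NA, NA),
  z    = c(NA, 1, 1, 1, 1, 1)
)
vars <- c("y", "z")

# NA counts per entity and variable (rows: entities, columns: variables)
na_by_var <- rowsum(+is.na(toy[vars]), group = toy$firm)

res <- data.frame(
  firm      = as.numeric(rownames(na_by_var)),
  na_count  = rowSums(na_by_var),            # total NAs for the entity
  variables = rowSums(na_by_var > 0),        # variables with at least one NA
  rows      = as.vector(table(toy$firm))     # rows observed for the entity
)

# Hypothetical extra column: NAs relative to the entity's available cells
res$na_share <- res$na_count / (res$rows * length(vars))

# Keep incomplete entities; sort by variables, then by total NAs (both descending)
incomplete <- res[res$na_count > 0, ]
incomplete <- incomplete[order(-incomplete$variables, -incomplete$na_count), ]
rownames(incomplete) <- NULL
incomplete
```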
See Also
See also summarize_missing(), describe_patterns(), describe_periods().
Examples
data(production)
# Basic usage with entity only
describe_incomplete(production, index = "firm")
# With time variable (check duplicates)
describe_incomplete(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
describe_incomplete(panel)
# Returning detailed results
describe_incomplete(production, index = "firm", detail = TRUE)
# Accessing attributes
out_des_inc <- describe_incomplete(production, index = c("firm", "year"))
attr(out_des_inc, "metadata")
attr(out_des_inc, "details")
Entities Presence Patterns Description
Description
This function describes entity presence patterns in panel data over time.
Usage
describe_patterns(
  data,
  index = NULL,
  delta = NULL,
  limits = NULL,
  detail = TRUE,
  format = "wide",
  digits = 3
)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
delta |
An optional integer giving the expected interval between time periods. |
limits |
Either a single integer (show that many most frequent patterns) or a vector of two integers (show patterns with ranks between the two values, inclusive). If not specified, all patterns are shown. |
detail |
A logical flag indicating whether to return detailed patterns. Default = TRUE. |
format |
A character string specifying the output format: "wide" or "long". Default = "wide". |
digits |
An integer specifying the number of decimal places for rounding share column. Default = 3. |
Details
The output format is controlled by format and detail.
When format = "wide" and detail = TRUE (default):
pattern: Pattern number (ranked by frequency).
[time1], [time2], ...: Presence (1) / absence (0) for each time period.
count: Number of entities sharing this pattern.
share: Proportion of entities with this pattern (rounded to digits).
When format = "wide" and detail = FALSE, only the pattern and presence columns are returned.
When format = "long" and detail = TRUE:
pattern: Pattern number.
[time]: Time period identifier (name equals the original time variable).
presence: Presence (1) / absence (0).
count: Number of entities with this pattern.
share: Proportion of entities with this pattern.
When format = "long" and detail = FALSE, only pattern, time, and presence columns are returned.
Effect of delta:
If delta is supplied, the function checks that all observed time points are separated by multiples of delta.
If gaps are detected, a message lists the missing periods (unless the interval was inherited from panel attributes),
and columns for those missing periods are added to the presence matrix – and therefore to the output data.frame – with all zeros.
This ensures that the patterns reflect the full regular sequence of time periods.
The object has class "panel_description" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List with the full presence matrix, pattern‑entity mapping, and the pattern matrix.
Value
A data.frame with presence patterns.
Note
An entity-time combination is considered present if the corresponding row contains at least one non‑NA value in any substantive variable (i.e., all columns except the entity and time identifiers).
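To make the notion of a ranked presence pattern concrete, here is a small base-R sketch that starts from a hypothetical presence matrix (entities in rows, periods in columns) and builds a table shaped like the wide, detailed output. It only illustrates the idea and is not taken from the package source.

```r
# Hypothetical presence matrix: 6 entities, 3 periods
presence <- matrix(
  c(1, 1, 1,
    1, 1, 1,
    0, 1, 1,
    1, 1, 1,
    0, 1, 1,
    1, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(as.character(1:6), c("2001", "2002", "2003"))
)

# Collapse each entity's row into a pattern string such as "011"
key  <- apply(presence, 1, paste, collapse = "")
freq <- sort(table(key), decreasing = TRUE)   # most frequent pattern first

patterns <- data.frame(
  pattern = seq_along(freq),
  presence[match(names(freq), key), , drop = FALSE],  # one representative row per pattern
  count   = as.vector(freq),
  share   = round(as.vector(freq) / length(key), 3),
  check.names = FALSE
)
rownames(patterns) <- NULL
patterns

# Analogue of limits = 2: keep the two most frequent patterns
patterns[patterns$pattern <= 2, ]
```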
See Also
See also plot_patterns(), describe_periods(), describe_balance().
Examples
data(production)
# Basic usage
describe_patterns(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
describe_patterns(panel)
# Specifying time interval
describe_patterns(production, index = c("firm", "year"), delta = 1)
# Showing only the top 3 patterns
describe_patterns(production, index = c("firm", "year"), limits = 3)
# Showing patterns ranked 4 to 6
describe_patterns(production, index = c("firm", "year"), limits = c(4, 6))
# Returning results in a long format without excessive details
describe_patterns(production, index = c("firm", "year"), detail = FALSE, format = "long")
# Custom rounding
describe_patterns(production, index = c("firm", "year"), digits = 2)
# Accessing attributes
out_des_pat <- describe_patterns(production, index = c("firm", "year"))
attr(out_des_pat, "metadata")
attr(out_des_pat, "details")
Time Periods Completeness Description
Description
This function calculates, for each time period, the number of entities that have at least one non‑missing value in any substantive variable, and the corresponding share of all entities.
Usage
describe_periods(data, index = NULL, delta = NULL, digits = 3)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
delta |
An optional integer giving the expected interval between time periods. |
digits |
An integer specifying the number of decimal places for rounding the share column. Default = 3. |
Details
The returned data.frame contains the following columns:
[time]: Time period identifier (name matches the input time variable).
count: Number of distinct entities observed in that period, i.e., entities with at least one row containing a non‑NA value in substantive variables.
share: Proportion of entities observed in that period (0 to 1), rounded to digits.
Effect of delta:
If delta is supplied, the function checks that all observed time points
are separated by multiples of delta.
If gaps are detected, a message lists the missing periods
(unless the interval was inherited from panel attributes).
For each missing period, a row is added to the output with count = 0 and share = 0,
ensuring that the output covers the full regular time sequence.
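The gap handling described for delta can be sketched in base R as below. The per-period counts, the number of entities, and all object names are hypothetical, and the sketch shows one plausible way to check spacing and fill gaps rather than the package's actual code.

```r
# Hypothetical per-period counts with a gap at 2002
counts <- data.frame(
  year  = c(2000, 2001, 2003, 2004),
  count = c(3, 3, 2, 3)
)
delta      <- 1
n_entities <- 3

# All observed periods must be separated by multiples of delta
regular <- all(diff(sort(counts$year)) %% delta == 0)

# Full regular sequence and the periods missing from the data
full    <- seq(min(counts$year), max(counts$year), by = delta)
missing <- setdiff(full, counts$year)
if (length(missing) > 0) {
  message("Missing periods: ", paste(missing, collapse = ", "))
}

# Add a zero-count row for each missing period
filled <- merge(data.frame(year = full), counts, all.x = TRUE)
filled$count[is.na(filled$count)] <- 0
filled$share <- round(filled$count / n_entities, 3)
filled
```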
The object has class "panel_description" and two additional attributes:
metadata: List containing the function name and the arguments used.
details: List with a named list entities giving, for each period, the vector of entities observed.
Value
A data.frame with entities presence summary by time period.
See Also
See also plot_periods(), describe_balance(), describe_patterns().
Examples
data(production)
# Basic usage
describe_periods(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
describe_periods(panel)
# Specifying time interval
describe_periods(production, index = c("firm", "year"), delta = 1)
# Custom rounding
describe_periods(production, index = c("firm", "year"), digits = 2)
# Accessing attributes
out_des_per <- describe_periods(production, index = c("firm", "year"))
attr(out_des_per, "metadata")
attr(out_des_per, "details")
Panel Data Structure Setting and Balancing
Description
This function adds panel structure attributes to a data.frame, storing entity and time variable names, and optionally checks the expected interval between time periods. It can also balance the panel with a chosen method.
Usage
make_panel(data, index, delta = NULL, balance = NULL)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. |
delta |
An optional integer giving the expected interval between time periods. |
balance |
One of "rows", "entities", or "periods", selecting the balancing method (see Details). |
Details
This function adds attributes to a data.frame to mark it as panel data.
The returned object has class "panel_data" and includes the following attributes:
metadata: List containing the function name and the arguments used (entity, time, delta, and balance if provided).
details: List with diagnostic vectors:
entities: Unique values of the entity variable.
periods: Sorted unique values of the time variable.
periods_restored, periods_missing: If delta is supplied and gaps are detected, the full sequence and missing periods.
Effect of delta:
If delta is supplied, the function checks that all observed time points are separated by multiples of delta.
If gaps are detected, a message lists the missing periods and the full sequence is stored in details$periods_restored.
Balancing the panel (presence definition as in describe_patterns):
balance = "rows": Create a row for every entity‑time combination. If delta is supplied, the full time sequence (including missing periods) is used. Missing combinations get NA in all other columns.
balance = "entities": Keep only entities present in all time periods.
balance = "periods": Keep only time periods where all entities are present.
Value
The input data.frame with additional attributes, after possibly filtering or expanding rows.
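The three balancing methods described in Details can be approximated in base R as follows. This is a simplified sketch on a hypothetical toy panel: it assumes one row per entity‑time combination and treats every existing row as present, whereas the function applies the presence definition used by describe_patterns.

```r
# Hypothetical toy panel: firm 2 is not observed in year 3
toy <- data.frame(
  firm = c(1, 1, 1, 2, 2, 3, 3, 3),
  year = c(1, 2, 3, 1, 2, 1, 2, 3),
  y    = c(10, 11, 12, 20, 21, 30, 31, 32)
)
n_firms   <- length(unique(toy$firm))
n_periods <- length(unique(toy$year))

# "rows": one row per entity-time combination, NA where unobserved
grid     <- expand.grid(firm = unique(toy$firm), year = sort(unique(toy$year)))
bal_rows <- merge(grid, toy, all.x = TRUE)
bal_rows <- bal_rows[order(bal_rows$firm, bal_rows$year), ]

# "entities": keep entities observed in every period
bal_entities <- toy[ave(toy$year, toy$firm, FUN = length) == n_periods, ]

# "periods": keep periods in which every entity is observed
bal_periods <- toy[ave(toy$firm, toy$year, FUN = length) == n_firms, ]
```

Each method trades something different: "rows" keeps all information but introduces NAs, while "entities" and "periods" drop observations to obtain a complete rectangle.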
See Also
See also describe_dimensions(), describe_balance(), describe_periods().
Examples
data(production)
# Basic usage
panel <- make_panel(production, index = c("firm", "year"))
# Specifying time interval
panel <- make_panel(production, index = c("firm", "year"), delta = 1)
# Creating balanced panels
panel_bal_ent <- make_panel(production, index = c("firm", "year"), balance = "entities")
panel_bal_per <- make_panel(production, index = c("firm", "year"), balance = "periods")
panel_bal_row <- make_panel(production, index = c("firm", "year"), balance = "rows", delta = 1)
# Accessing attributes
attr(panel, "metadata")
attr(panel, "details")
Heterogeneity Visualization
Description
This function creates visualizations of heterogeneity among groups.
Usage
plot_heterogeneity(data, select, group = NULL, colors = c("darkblue", "gray"))
Arguments
data |
A data.frame containing variables for analysis. |
select |
A character string specifying the numeric variable of interest. |
group |
A character string or vector of character strings specifying the grouping variable(s). If data has panel attributes and group is not specified, both the entity and time variables will be used as grouping variables. |
colors |
A character vector of two colors: first for mean line and points, second for individual points. Default = c("darkblue", "gray"). |
Details
This function creates one or more plots (depending on the number of grouping variables) showing the heterogeneity among groups. Each plot displays individual observations (points) and group means (connected line).
The returned list contains the following components:
metadata: List containing the function name, selection, group, and colors.
details: List containing group-level statistics for each grouping variable, each containing means, standard deviations, and counts per group.
Value
Invisibly returns a list with summary statistics and metadata.
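The group-level statistics mentioned in Details (means, standard deviations, and counts per group) can be computed by hand in base R, as in the sketch below. The toy data and the layout of group_stats are hypothetical and need not match the structure stored in the returned list.

```r
# Hypothetical toy data: one numeric variable, one grouping variable
toy <- data.frame(
  year  = rep(1:3, each = 2),
  labor = c(1, 3, 2, 6, 5, 5)
)

# One row per group; tapply() returns groups in sorted order
group_stats <- data.frame(
  year  = sort(unique(toy$year)),
  mean  = as.vector(tapply(toy$labor, toy$year, mean)),
  sd    = as.vector(tapply(toy$labor, toy$year, sd)),
  count = as.vector(tapply(toy$labor, toy$year, length))
)
group_stats
```

The mean column corresponds to the connected line in the plot, while the spread of individual points around it is what the sd column summarizes.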
See Also
See also decompose_numeric(), summarize_numeric().
Examples
data(production)
# Basic usage with regular data.frame
plot_heterogeneity(production, select = "labor", group = "year")
# Using multiple grouping variables
plot_heterogeneity(production, select = "sales", group = c("firm", "industry", "year"))
# With panel_data object (uses both entity and time)
panel <- make_panel(production, index = c("firm", "year"))
plot_heterogeneity(panel, select = "labor")
# Custom colors
plot_heterogeneity(production, select = "sales", group = "year",
colors = c("black", "gray"))
# Accessing list components
out_plo_het <- plot_heterogeneity(panel, select = "capital", group = "year")
out_plo_het$metadata
out_plo_het$details
Missing Values Heatmap by Period
Description
This function creates a heatmap showing the number of missing values for each variable across all time periods in panel data.
Usage
plot_missing(data, select = NULL, index = NULL, colors = c("darkblue", "gray"))
Arguments
data |
A data.frame containing panel data in a long format. |
select |
A character vector specifying which variables to include. If not specified, all substantive variables (except entity and time) are used. |
index |
A character vector of length 2 giving the names of the entity and time variables. Not required if data has panel attributes. |
colors |
A character vector of two colors defining the gradient for the heatmap. The first color represents the largest number of missing values, the second color the smallest number. Default = c("darkblue", "gray"). |
Details
The function creates a heatmap where rows are variables and columns are time periods.
Cell color reflects the number of missing values in that variable for that period,
using a continuous gradient from colors[1] (most missing) to colors[2] (least missing).
Rows are ordered as the variables appear (first at the top). Columns are ordered chronologically.
The returned list contains:
metadata: List containing the function call, select, entity/time variables, and colors.
details: List with the missing count matrix (variables × periods).
Value
Invisibly returns a list with summary statistics and metadata.
Note
The interpretation of missing counts may differ depending on whether the panel is balanced or unbalanced. In a balanced panel, each time period contains the same number of entities, so the raw NA counts per period are directly comparable across periods. In an unbalanced panel, the number of entities varies by period, so the raw NA counts should be interpreted relative to the number of observations available in each period. The function does not standardize the counts by period size; users should account for the panel structure when interpreting the results.
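For an unbalanced panel, one way to follow the advice in this Note is to divide the raw counts by the number of rows in each period. The base-R sketch below builds a variables × periods count matrix and a standardized version; the toy data and object names are hypothetical, and the standardization is a suggestion for users, not something the function does.

```r
# Hypothetical unbalanced toy panel: period 2 has only two rows
toy <- data.frame(
  year    = c(1, 1, 1, 2, 2, 3, 3, 3),
  labor   = c(NA, 1, 1, NA, NA, 1, 1, 1),
  capital = c(1, 1, 1, 1, NA, NA, 1, 1)
)
vars <- c("labor", "capital")

# Raw NA counts: variables in rows, periods in columns
na_counts <- t(rowsum(+is.na(toy[vars]), group = toy$year))

# Share of rows with a missing value in each period
rows_per_period <- as.vector(table(toy$year))
na_shares <- round(sweep(na_counts, 2, rows_per_period, "/"), 3)

na_counts
na_shares
```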
See Also
See also summarize_missing(), plot_patterns(), plot_periods().
Examples
data(production)
# Basic usage
plot_missing(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
plot_missing(panel)
# Selecting specific variables
plot_missing(production, select = c("labor", "capital"), index = c("firm", "year"))
# Custom colors
plot_missing(production, index = c("firm", "year"), colors = c("black", "white"))
# Access the returned list
out_plo_mis <- plot_missing(production, index = c("firm", "year"))
out_plo_mis$metadata
out_plo_mis$details
Entities Presence Patterns Visualization
Description
This function creates a heatmap showing the presence/absence pattern of each entity over time.
Usage
plot_patterns(
  data,
  index = NULL,
  delta = NULL,
  limits = NULL,
  colors = c("darkblue", "white")
)
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
delta |
An optional integer giving the expected interval between time periods. |
limits |
Either a single integer (show that many most frequent patterns) or a vector of two integers (show patterns with ranks between the two values, inclusive). If not specified, all patterns are shown. |
colors |
A character vector of two colors for present and missing observations. Default = c("darkblue", "white"). |
Details
The function creates a heatmap where rows are entities and columns are time periods. Present cells are colored with the first color, missing cells with the second. Rows are ordered by pattern frequency: the most frequent pattern is at the top. Within each pattern block, entities appear in their original order.
Effect of delta:
If delta is supplied, the function checks for regular spacing and adds missing periods
(with all zeros) to the plot.
A message lists missing periods unless the interval was inherited from panel attributes.
The heatmap will therefore show columns for the full regular time sequence,
with missing periods appearing entirely white (or the color for missing).
The returned list contains:
metadata: List containing the function name and the arguments used.
details: List with the sorted presence matrix, pattern‑entity mapping, pattern count, and the pattern matrix (unique patterns as rows).
Value
Invisibly returns a list with summary statistics and metadata.
Note
An entity-time combination is considered present if the corresponding row contains at least one non‑NA value in any substantive variable (all columns except the entity and time identifiers).
See Also
See also describe_patterns(), plot_periods(), plot_missing().
Examples
data(production)
# Basic usage
plot_patterns(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
plot_patterns(panel)
# Specifying time interval
plot_patterns(production, index = c("firm", "year"), delta = 1)
# Show only the top 3 patterns
plot_patterns(production, index = c("firm", "year"), limits = 3)
# Show patterns ranked 4 to 6
plot_patterns(production, index = c("firm", "year"), limits = c(4, 6))
# Custom colors
plot_patterns(production, index = c("firm", "year"), colors = c("black", "white"))
# Accessing list components
out_plo_pat <- plot_patterns(production, index = c("firm", "year"))
out_plo_pat$metadata
out_plo_pat$details
Time Coverage Distribution Visualization
Description
This function calculates summary statistics and creates a histogram showing the distribution of time periods covered by each entity in panel data.
Usage
plot_periods(data, index = NULL, colors = c("darkblue", "white"))
Arguments
data |
A data.frame containing panel data in a long format. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
colors |
A character vector of length 2 specifying the fill color and line color for the histogram. First color is for fill, second color is for the border line. Default = c("darkblue", "white"). |
Details
The function creates a histogram of the number of time periods covered by each entity. The x‑axis shows coverage (periods per entity), the y‑axis shows the count of entities.
The returned list contains:
metadata: List containing the function name and the arguments used.
details: List with the coverage vector per entity and the histogram data used for plotting.
Value
Invisibly returns a list with summary statistics and metadata.
Note
An entity-time combination is considered present if the corresponding row contains at least one non‑NA value in any substantive variable (all columns except the entity and time identifiers).
See Also
See also describe_periods(), plot_patterns(), plot_missing().
Examples
data(production)
# Basic usage
plot_periods(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
plot_periods(panel)
# Custom colors
plot_periods(production, index = c("firm", "year"), colors = c("gray", "black"))
# Accessing list components
out_plo_per <- plot_periods(production, index = c("firm", "year"))
out_plo_per$metadata
out_plo_per$details
Simulated Unbalanced Panel Data for Cobb-Douglas Production Function Analysis
Description
A simulated dataset containing firm-level panel data with industry affiliation, entry, exit, random missing values, and ownership information. The data follows industry-specific production structures with occasional industry and ownership changes.
Usage
production
Format
A data frame with 180 rows (30 firms × 6 years) and 7 variables:
- firm
integer; firm identifier (1 to 30)
- year
integer; year identifier (1 to 6)
- sales
numeric; firm sales/output generated from a Cobb-Douglas production function with industry-specific parameters and technology shocks. Contains random missing values (~2%).
- capital
numeric; capital input, log‑normally distributed with firm-specific effects and industry-specific time trends. Contains random missing values (~2%).
- labor
numeric; labor input, log‑normally distributed with firm-specific effects and industry-specific time trends. Contains random missing values (~2%).
- industry
factor; industry affiliation with three levels: "Industry 1", "Industry 2", "Industry 3". Some firms change industry over time.
- ownership
factor; ownership type with three levels: "private", "public", "mixed". Ownership is mostly stable over time, changing with a probability of 5% per year.
Details
The dataset exhibits several realistic features of firm-level panel data:
- 50% of firms (15 firms) have complete data for all 6 years.
- 50% of firms (15 firms) have entry and exit patterns with different start and end years.
- Three industry categories with different production function parameters.
- About 20% of firms change industry affiliation at least once.
- Ownership changes occur with 5% probability per year.
- Industry-specific Cobb‑Douglas parameters:
  - Industry 1: \alpha = 0.25, \beta = 0.65, A = 2.0 (labor‑intensive)
  - Industry 2: \alpha = 0.35, \beta = 0.55, A = 2.2 (balanced, high productivity)
  - Industry 3: \alpha = 0.30, \beta = 0.60, A = 1.8 (standard)
- Additional random missing values (approx. 2%) in sales, capital, and labor.
- Firm-specific effects and industry-specific time trends in inputs.
- Technology shocks affecting output.
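To make the industry parameters concrete, the following is a minimal sketch of a Cobb-Douglas output function evaluated under each industry's technology. It is not the code used to simulate the dataset. The function name cobb_douglas and the reading of \alpha as the capital elasticity and \beta as the labor elasticity are assumptions; the source lists the parameters without saying which input each belongs to.

```r
# Cobb-Douglas output: A * K^alpha * L^beta, with an optional multiplicative shock.
# Assumption: alpha is the capital elasticity and beta the labor elasticity.
cobb_douglas <- function(capital, labor, A, alpha, beta, shock = 0) {
  A * capital^alpha * labor^beta * exp(shock)
}

# Parameters as listed for the three industries.
params <- data.frame(
  industry = c("Industry 1", "Industry 2", "Industry 3"),
  alpha    = c(0.25, 0.35, 0.30),
  beta     = c(0.65, 0.55, 0.60),
  A        = c(2.0, 2.2, 1.8)
)

# Output for the same inputs under each industry's technology.
params$output <- cobb_douglas(10, 20, params$A, params$alpha, params$beta)

# Returns to scale: alpha + beta equals 0.9 for every industry.
params$returns_to_scale <- params$alpha + params$beta
```

With the listed values, the exponents sum to 0.9 in all three industries, so each technology has decreasing returns to scale and the industries differ only in the input mix and the productivity term A.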
Source
Simulated data for econometric analysis and demonstration purposes.
Examples
data(production)
head(production)
table(production$ownership)
Round numeric values if needed
Description
Round numeric values if needed
Usage
round_if_needed(x, digits)
Arguments
x |
A vector (typically numeric). |
digits |
Number of decimal places. |
Value
Rounded vector if numeric and not all NA, otherwise unchanged.
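The documented behavior can be sketched as follows. The name round_if_needed_sketch is hypothetical and the body is an assumption based only on the Value description above, not the package's actual source.

```r
# Round only when the input is numeric and has at least one non-NA value;
# anything else is returned unchanged.
round_if_needed_sketch <- function(x, digits) {
  if (is.numeric(x) && !all(is.na(x))) round(x, digits) else x
}

rounded   <- round_if_needed_sketch(c(1.2345, NA), 2)   # numeric: rounded
untouched <- round_if_needed_sketch(c("a", "b"), 2)     # character: unchanged
```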
Sort unique values preserving original type where sensible
Description
Sort unique values preserving original type where sensible
Usage
sort_unique_preserve(x)
Arguments
x |
A vector. |
Value
Sorted unique values.
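One plausible reading of this helper, shown here as a sketch under the assumption that base R's type-preserving behavior is what is meant, is a composition of unique() and sort(). The name sort_unique_sketch is hypothetical.

```r
# sort(unique(x)) keeps integers as integers and factors as factors;
# factors are ordered by their level order, not alphabetically.
sort_unique_sketch <- function(x) sort(unique(x))

ints <- sort_unique_sketch(c(3L, 1L, 3L, 2L))
fct  <- sort_unique_sketch(factor(c("b", "a", "b"), levels = c("b", "a")))
```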
Missing Values Summary for Panel Data
Description
This function calculates summary statistics for missing values (NAs) in panel data, providing both overall and detailed period-specific missing value counts.
Usage
summarize_missing(
data,
select = NULL,
index = NULL,
detail = FALSE,
digits = 3
)
Arguments
data |
A data.frame containing panel data in a long format. |
select |
A character vector specifying which variables to analyze for missing values. If not specified, all variables (except entity and time) will be used. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
detail |
A logical flag indicating whether to return detailed period-specific NA counts. Default = FALSE. |
digits |
An integer indicating the number of decimal places to round the share column. Default = 3. |
Details
When detail = FALSE, returns columns:
- variable
Variable name.
- na_count
Total number of missing values in that variable.
- na_share
Proportion of missing values (rounded to digits).
- entities
Number of distinct entities that have at least one missing value in that variable.
- periods
Number of distinct time periods that have at least one missing value in that variable.
When detail = TRUE, additional columns for each time period contain the number of missing values in that variable for that period.
The object has class "panel_summary" and two additional attributes:
- metadata
List containing the function name and the arguments used.
- details
List with counts of variables with/without NAs, and their names.
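The summary columns described above can be reproduced in base R on a toy panel. This is a sketch of the definitions, not the package's implementation; the data and column names are assumptions made for the example.

```r
toy <- data.frame(
  firm  = c(1, 1, 1, 2, 2, 2),
  year  = c(1, 2, 3, 1, 2, 3),
  sales = c(10, NA, 12, NA, NA, 9),
  labor = c(3, 3, 3, 2, NA, 2)
)

# Analyze every column except the entity and time identifiers.
vars <- setdiff(names(toy), c("firm", "year"))

na_summary <- do.call(rbind, lapply(vars, function(v) {
  miss <- is.na(toy[[v]])
  data.frame(
    variable = v,
    na_count = sum(miss),                           # total NAs
    na_share = round(mean(miss), 3),                # share of rows that are NA
    entities = length(unique(toy$firm[miss])),      # entities with any NA
    periods  = length(unique(toy$year[miss]))       # periods with any NA
  )
}))
```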
Value
A data.frame with missing value summary statistics.
Note
The interpretation of missing counts may differ depending on whether the panel is balanced or unbalanced. In a balanced panel, each time period contains the same number of entities, so the raw NA counts per period are directly comparable across periods. In an unbalanced panel, the number of entities varies by period, so the raw NA counts should be interpreted relative to the number of observations available in each period. The function does not standardize the counts by period size; users should account for the panel structure when interpreting the results.
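As an illustration of this point, the sketch below builds a small unbalanced panel in which every period has the same raw NA count but a very different NA rate. The data are made up for the example, and the normalization is something the user would do by hand, since the function reports raw counts only.

```r
# Unbalanced toy panel: 3 firms in year 1, 2 in year 2, 1 in year 3.
toy <- data.frame(
  firm  = c(1, 2, 3, 1, 2, 1),
  year  = c(1, 1, 1, 2, 2, 3),
  sales = c(NA, 5, 6, NA, 7, NA)
)

na_by_year  <- tapply(is.na(toy$sales), toy$year, sum)   # raw NA counts per period
obs_by_year <- tapply(toy$sales, toy$year, length)       # observations per period
na_rate     <- na_by_year / obs_by_year                  # NA share within each period
```

Each year has exactly one missing value, yet the share of missing observations rises from one third to one half to all of them as the number of observed firms shrinks.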
See Also
See also plot_missing(), describe_incomplete(), describe_patterns(), describe_periods().
Examples
data(production)
# Basic usage
summarize_missing(production, index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
summarize_missing(panel)
# Selecting specific variables
summarize_missing(production, select = c("labor", "capital"), index = c("firm", "year"))
# Returning detailed results
summarize_missing(production, index = c("firm", "year"), detail = TRUE)
# Custom rounding
summarize_missing(production, index = c("firm", "year"), digits = 2)
# Accessing attributes
out_sum_mis <- summarize_missing(production, index = c("firm", "year"))
attr(out_sum_mis, "metadata")
attr(out_sum_mis, "details")
Summary Statistics for Numeric Variables
Description
This function calculates summary statistics for numeric variables, either overall or grouped by a single grouping variable.
Usage
summarize_numeric(
data,
select = NULL,
group = NULL,
detail = FALSE,
digits = 3
)
Arguments
data |
A data.frame containing variables for analysis. |
select |
A character vector specifying which numeric variables to analyze. If not specified, all numeric variables in the data.frame will be used. |
group |
A character string specifying the grouping variable name. If not specified, overall statistics will be returned. |
detail |
A logical flag indicating whether to return additional statistics (25th, 50th, and 75th percentiles). Default = FALSE. |
digits |
An integer specifying the number of decimal places for rounding statistics. Default = 3. |
Details
The returned data.frame contains columns depending on the arguments:
When no grouping variable is specified (overall):
- variable
The name of the numeric variable.
- count
Number of non‑NA observations.
- mean
Arithmetic mean.
- std
Standard deviation.
- min
Minimum value.
- max
Maximum value.
When detail = TRUE, additional columns are included:
- p25
25th percentile (first quartile).
- p50
50th percentile (median).
- p75
75th percentile (third quartile).
When a grouping variable is specified, statistics are calculated for each group, and the data.frame includes a column named after the grouping variable, followed by the same statistics columns as above.
The object has class "panel_summary" and two additional attributes:
- metadata
List containing the function name and the arguments used.
- details
List with counts of variables, groups, and total observations.
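The overall and grouped statistics described above can be sketched in base R. The helper summarize_one is hypothetical and only mirrors the documented column names; it is not the package's code, and it assumes NAs are dropped before computing each statistic.

```r
toy <- data.frame(
  year  = c(1, 1, 1, 2, 2, 2),
  sales = c(10, 12, NA, 20, 22, 24)
)

# Basic statistics on the non-NA values of one numeric vector.
summarize_one <- function(x, digits = 3) {
  x <- x[!is.na(x)]
  round(c(count = length(x), mean = mean(x), std = sd(x),
          min = min(x), max = max(x)), digits)
}

# Overall statistics, then the same statistics within each year.
overall <- summarize_one(toy$sales)
by_year <- do.call(rbind, lapply(split(toy$sales, toy$year), summarize_one))
```

The grouped result is a matrix with one row per group; adding quantile(x, c(0.25, 0.5, 0.75)) inside the helper would give the extra percentile columns.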
Value
A data.frame with descriptive statistics summary.
See Also
See also decompose_numeric(), plot_heterogeneity().
Examples
data(production)
# Basic usage
summarize_numeric(production)
# Selecting specific variables
summarize_numeric(production, select = "sales")
summarize_numeric(production, select = c("capital", "labor"))
# Grouped statistics
summarize_numeric(production, group = "year")
# Detailed statistics
summarize_numeric(production, detail = TRUE)
# Custom rounding
summarize_numeric(production, digits = 2)
# Accessing attributes
out_sum_num <- summarize_numeric(production)
attr(out_sum_num, "metadata")
attr(out_sum_num, "details")
Transition Summary
Description
Calculates transition counts and shares between states of a categorical (factor) variable across consecutive time periods within entities for panel data.
Usage
summarize_transition(data, select, index = NULL, format = "wide", digits = 3)
Arguments
data |
A data.frame containing panel data in a long format. |
select |
A character string specifying the factor variable to analyze transitions for. |
index |
A character vector of length 2 specifying the names of the entity and time variables. Not required if data has panel attributes. |
format |
A character string specifying the output format: "wide" or "long". Default = "wide". |
digits |
An integer indicating the number of decimal places to round transition shares. Default = 3. |
Details
The structure depends on format:
When format = "wide", a transition matrix as a data.frame:
- from_to
The originating state (row label).
- [state1], [state2], ...
Columns for each destination state, containing the share of transitions from the row state to the column state (rounded).
When format = "long", a data.frame with columns:
- from
Originating state.
- to
Destination state.
- count
Number of observed transitions.
- share
Proportion of transitions originating in state "from" that end in state "to" (rounded).
The object has class "panel_summary" and two additional attributes:
- metadata
List containing the function name and the arguments used.
- details
List with the vector of all category levels.
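The wide and long outputs described above correspond to a row-normalized transition table and its stacked form. The sketch below computes both in base R on toy data. It is not the package's implementation, and it assumes that consecutive rows within an entity are consecutive periods, so gaps in the time variable are not handled.

```r
toy <- data.frame(
  firm  = c(1, 1, 1, 1, 2, 2, 2),
  year  = c(1, 2, 3, 4, 1, 2, 3),
  state = factor(c("A", "A", "A", "B", "B", "B", "A"), levels = c("A", "B"))
)

# Sort by entity and time, then pair each row with the next row of the same entity.
toy <- toy[order(toy$firm, toy$year), ]
n <- nrow(toy)
same_firm <- toy$firm[-1] == toy$firm[-n]

from <- toy$state[-n][same_firm]
to   <- toy$state[-1][same_firm]

# Wide form: counts and row shares; long form: one row per (from, to) pair.
counts <- table(from = from, to = to)
shares <- round(prop.table(counts, margin = 1), 3)
long   <- as.data.frame(counts, responseName = "count")
```

Each row of shares sums to one (up to rounding), because the shares are conditional on the originating state.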
Value
A data.frame containing transition summaries.
See Also
See also decompose_factor().
Examples
data(production)
# Basic usage
summarize_transition(production, select = "industry", index = c("firm", "year"))
# With panel_data object
panel <- make_panel(production, index = c("firm", "year"))
summarize_transition(panel, select = "industry")
# Returning results in a long format
summarize_transition(production, select = "industry",
index = c("firm", "year"), format = "long")
# Custom rounding
summarize_transition(production, select = "industry", index = c("firm", "year"), digits = 2)
# Accessing attributes
out_sum_tra <- summarize_transition(production, select = "industry", index = c("firm", "year"))
attr(out_sum_tra, "metadata")
attr(out_sum_tra, "details")