Prepare data for UpSet plots

upset_data(
  data,
  intersect,
  min_size = 0,
  max_size = Inf,
  min_degree = 0,
  max_degree = Inf,
  n_intersections = NULL,
  keep_empty_groups = FALSE,
  warn_when_dropping_groups = FALSE,
  warn_when_converting = "auto",
  sort_sets = "descending",
  sort_intersections = "descending",
  sort_intersections_by = "cardinality",
  sort_ratio_numerator = "exclusive_intersection",
  sort_ratio_denominator = "inclusive_union",
  group_by = "degree",
  mode = "exclusive_intersection",
  size_columns_suffix = "_size",
  encode_sets = FALSE,
  max_combinations_datapoints_n = 10^10,
  intersections = "observed"
)
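In the default mode (exclusive_intersection), each observation is assigned to exactly one intersection: the exact combination of sets it belongs to. The degree of an intersection is the number of sets taking part in it. The sketch below illustrates that idea in base R. It uses a hypothetical helper, exclusive_sizes(), and toy data. It is not ComplexUpset's implementation, and the real upset_data() returns a richer structure than this single data frame.

```r
# Hypothetical helper (not part of ComplexUpset): counts observations per
# exclusive intersection, i.e. per exact combination of sets a row belongs to.
exclusive_sizes <- function(data, intersect) {
  # Coerce the membership columns (logical or 0/1) to a logical matrix
  membership <- as.matrix(data[intersect])
  storage.mode(membership) <- "logical"

  # Label each row with the sets it belongs to, e.g. "Action-Comedy"
  labels <- apply(membership, 1, function(row) {
    if (any(row)) paste(intersect[row], collapse = "-") else "(none)"
  })

  counts <- table(labels)
  out <- data.frame(
    intersection = names(counts),
    size = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Degree = number of sets participating in the intersection
  degree_by_label <- tapply(rowSums(membership), labels, max)
  out$degree <- as.integer(degree_by_label[out$intersection])
  out
}

# Toy input: one row per movie, one logical column per genre (set)
movies <- data.frame(
  Action = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  Comedy = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  Drama  = c(FALSE, FALSE, TRUE, TRUE, FALSE)
)
exclusive_sizes(movies, c("Action", "Comedy", "Drama"))
```

Because every row lands in exactly one exclusive intersection, the sizes always add up to the number of rows.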

Arguments

data: a data frame including binary (logical or 0/1) columns representing membership in sets (classes)
intersect: the names of the columns used to compose the intersections
min_size: minimum number of observations in an intersection for it to be included
max_size: maximum number of observations in an intersection for it to be included
min_degree: minimum degree of an intersection (the number of sets it involves) for it to be included
max_degree: maximum degree of an intersection (the number of sets it involves) for it to be included
n_intersections: the exact number of intersections to display; the n largest intersections that meet the size and degree criteria are shown
keep_empty_groups: whether empty sets should be kept (including sets that are empty only after filtering by size)
warn_when_dropping_groups: whether a warning should be issued when empty sets are being removed
warn_when_converting: whether a warning should be issued when the input columns are not boolean and have to be converted
sort_sets: whether to sort the rows in the intersection matrix (descending sort by default); one of: 'ascending', 'descending', FALSE
sort_intersections: whether to sort the columns in the intersection matrix (descending sort by default); one of: 'ascending', 'descending', FALSE
sort_intersections_by: the sorting criterion: the size of the intersection ('cardinality', default), 'degree', 'ratio', or a combination of these (e.g. c('degree', 'cardinality'))
sort_ratio_numerator: the mode used for the numerator when sorting by ratio
sort_ratio_denominator: the mode used for the denominator when sorting by ratio
group_by: the mode of grouping intersections; one of: 'degree', 'sets'
mode: region selection mode for sorting and trimming by size. See get_size_mode() for accepted values.
size_columns_suffix: suffix for the columns that store the sizes (change it if it conflicts with existing column names in your data)
encode_sets: whether set names (column names in the input data) should be encoded as numbers; set to TRUE to work around R's 10 kB limit on variable names for datasets with a very large number of sets. Defaults to TRUE for upset() and FALSE for upset_data()
max_combinations_datapoints_n: a fail-safe limit preventing accidental use of intersections='all' with a high number of sets and observations
intersections: which intersections to compute: only those present in the data ('observed', default) or all possible intersections ('all'). Computing all intersections is not feasible for a large number of sets; use min_degree and max_degree to narrow down the selection. 'all' is only useful for modes other than the default exclusive_intersection. You can also provide a list with a custom selection of intersections (their order is respected when sort_intersections=FALSE)
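The size and degree filters, n_intersections, and a multi-key sort_intersections_by can be illustrated on a precomputed table of intersection sizes. The sketch below uses a hypothetical helper, select_intersections(), and toy data. It assumes inclusive bounds (size >= min_size, and so on), that filtering happens before the n largest intersections are taken, and that the first key in sort_by takes priority. These are assumptions about the semantics, not ComplexUpset's code.

```r
# Hypothetical table of exclusive intersections (one row per intersection)
intersections <- data.frame(
  intersection = c("A", "B", "C", "A-B", "B-C", "A-B-C"),
  size   = c(40L, 25L, 3L, 12L, 7L, 1L),
  degree = c(1L, 1L, 1L, 2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the filtering/sorting arguments; the
# semantics (inclusive bounds, key priority) are assumptions.
select_intersections <- function(x, min_size = 0, max_size = Inf,
                                 min_degree = 0, max_degree = Inf,
                                 n_intersections = NULL,
                                 sort_by = "cardinality",
                                 decreasing = TRUE) {
  # 1. Keep intersections that satisfy both the size and degree bounds
  keep <- x$size >= min_size & x$size <= max_size &
    x$degree >= min_degree & x$degree <= max_degree
  x <- x[keep, , drop = FALSE]

  # 2. Optionally keep only the n largest of the remaining intersections
  if (!is.null(n_intersections)) {
    x <- head(x[order(x$size, decreasing = TRUE), , drop = FALSE], n_intersections)
  }

  # 3. Sort by one or more keys; earlier keys take priority
  keys <- lapply(sort_by, function(k) {
    switch(k,
      cardinality = x$size,
      degree = x$degree,
      stop("unsupported sort key: ", k)
    )
  })
  x <- x[do.call(order, c(keys, list(decreasing = decreasing))), , drop = FALSE]
  rownames(x) <- NULL
  x
}

select_intersections(intersections, min_size = 5, n_intersections = 3,
                     sort_by = c("degree", "cardinality"))
```

With min_size = 5 and n_intersections = 3, the two smallest intersections are dropped first, the three largest of the remaining four are kept, and sorting by degree, then cardinality, puts the only degree-2 intersection first.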