Package 'oneclust' reference manual

Title:	Maximum Homogeneity Clustering for Univariate Data
Description:	Implements, via dynamic programming, the maximum homogeneity clustering algorithm for one-dimensional data described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479>.
Authors:	Nan Xiao [aut, cre]
Maintainer:	Nan Xiao <[email protected]>
License:	GPL-3
Version:	0.3.0
Built:	2025-03-06 04:18:34 UTC
Source:	https://github.com/nanxstats/oneclust

Masataka Okabe and Kei Ito's Color Universal Design palette

Description

Masataka Okabe and Kei Ito's Color Universal Design palette

Usage

cud(x, shift = TRUE, reverse = FALSE)

Arguments

`x`	Vector, color index.
`shift`	Start from the second color in the CUD palette?
`reverse`	Reverse the color order?

Value

A vector of color hex values.

Examples

barplot(rep(1, 7), col = cud(1:7))
barplot(rep(1, 8), col = cud(1:8, shift = FALSE))
barplot(rep(1, 8), col = cud(1:8, shift = FALSE, reverse = TRUE))
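How `shift` and `reverse` might interact can be sketched in base R. This is a hypothetical re-implementation for illustration only, not the package's source: the name `cud_sketch`, the color order (the commonly cited Okabe-Ito order starting with black), and the choice to apply `shift` before `reverse` are all assumptions.

```r
# Hypothetical sketch; not the oneclust implementation of cud().
# Assumption: the palette follows the commonly cited Okabe-Ito order.
okabe_ito <- c(
  "#000000", # black
  "#E69F00", # orange
  "#56B4E9", # sky blue
  "#009E73", # bluish green
  "#F0E442", # yellow
  "#0072B2", # blue
  "#D55E00", # vermillion
  "#CC79A7"  # reddish purple
)

cud_sketch <- function(x, shift = TRUE, reverse = FALSE) {
  pal <- okabe_ito
  # Assumption: shifting drops the first color so indexing starts at the second.
  if (shift) pal <- pal[-1]
  # Assumption: reversal is applied after the shift.
  if (reverse) pal <- rev(pal)
  pal[x]
}

cud_sketch(1:3)
cud_sketch(1:3, shift = FALSE, reverse = TRUE)
```

Under these assumptions, seven colors are available with `shift = TRUE` and eight with `shift = FALSE`, which matches the index ranges used in the examples above.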

Maximum homogeneity clustering for one-dimensional data

Description

Maximum homogeneity clustering for one-dimensional data

Usage

oneclust(x, k, w = NULL, sort = TRUE)

Arguments

`x`	Numeric vector, samples to be clustered.
`k`	Integer, number of clusters.
`w`	Numeric vector, sample weights (optional). Note that the weights here should be sampling weights (for example, a certain proportion of the population), not frequency weights (for example, number of occurrences).
`sort`	Should `x` (and `w`) be sorted before clustering? Default is `TRUE`. If `FALSE`, the original order of the data is respected.

Value

A list containing:

cluster - cluster id of each sample.
cut - index of the optimal cut points.

References

Fisher, Walter D. 1958. On Grouping for Maximum Homogeneity. Journal of the American Statistical Association 53 (284): 789–98.

Examples

set.seed(42)
x <- sample(c(
  rnorm(50, sd = 0.2),
  rnorm(50, mean = 1, sd = 0.3),
  rnorm(100, mean = -1, sd = 0.25)
))
oneclust(x, 3)
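The dynamic-programming idea behind Fisher's method can be sketched in base R. This is a minimal, unweighted illustration under stated assumptions, not the package's implementation: the function name `fisher_dp_sketch` and the `starts` component are hypothetical, the criterion is the total within-cluster sum of squares over contiguous groups of the sorted data, and the triple loop runs in O(k n^2) time.

```r
# Hypothetical sketch; not the oneclust implementation.
fisher_dp_sketch <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  ord <- order(x)
  xs <- x[ord]

  # Prefix sums let us evaluate the sum of squares of xs[i:j] in O(1).
  s1 <- c(0, cumsum(xs))
  s2 <- c(0, cumsum(xs^2))
  ssq <- function(i, j) {
    (s2[j + 1] - s2[i]) - (s1[j + 1] - s1[i])^2 / (j - i + 1)
  }

  # cost[j, g]: minimal total sum of squares for xs[1:j] split into g groups.
  # back[j, g]: first index of the last group in that optimal split.
  cost <- matrix(Inf, n, k)
  back <- matrix(NA_integer_, n, k)
  for (j in seq_len(n)) cost[j, 1] <- ssq(1, j)
  if (k >= 2) {
    for (g in 2:k) {
      for (j in g:n) {
        for (i in g:j) {
          cand <- cost[i - 1, g - 1] + ssq(i, j)
          if (cand < cost[j, g]) {
            cost[j, g] <- cand
            back[j, g] <- i
          }
        }
      }
    }
  }

  # Backtrack from the last group to recover each group's start index.
  starts <- integer(k)
  j <- n
  for (g in k:1) {
    i <- if (g == 1) 1L else back[j, g]
    starts[g] <- i
    j <- i - 1L
  }

  # Map group ids from sorted order back to the original order of x.
  cluster <- integer(n)
  cluster[ord] <- findInterval(seq_len(n), starts)
  list(cluster = cluster, starts = starts)
}

toy <- c(10, 0.1, 5.1, 0, 9.9, 5, 0.2)
fisher_dp_sketch(toy, 3)
```

Here `starts` holds positions in the sorted data where each group begins; this is a convention of the sketch and may differ from how `oneclust()` reports `cut`.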

Simulate the levels and their sizes in a high-cardinality feature

Description

Simulate the levels and their sizes in a high-cardinality feature

Usage

sim_postcode_levels(nlevels = 100L, seed = 1001)

Arguments

`nlevels`	Number of levels to generate.
`seed`	Random seed.

Value

A data frame of postal codes and sizes.

Note

The code is derived from the example described in the "rare levels" vignette in the vtreat package.

Examples

df_levels <- sim_postcode_levels(nlevels = 500, seed = 42)
head(df_levels)

Simulate a high-cardinality feature and a binary response

Description

Simulate a high-cardinality feature and a binary response

Usage

sim_postcode_samples(
  df_levels,
  n = 2000L,
  threshold = 1000,
  prob = c(0.3, 0.1),
  seed = 1001
)

Arguments

`df_levels`	Data frame of levels and their sizes, for example as returned by `sim_postcode_levels()`.
`n`	Number of samples.
`threshold`	The threshold for determining if a postal code is rare.
`prob`	Occurrence probability vector of the class 1 event in rare and non-rare postal codes.
`seed`	Random seed.

Value

A data frame of samples with postal codes, response labels, and level rarity status.

Note

The code is derived from the example described in the "rare levels" vignette in the vtreat package.

Examples

df_levels <- sim_postcode_levels(nlevels = 500, seed = 42)
df_postcode <- sim_postcode_samples(
  df_levels,
  n = 10000, threshold = 3000, prob = c(0.2, 0.1), seed = 43
)
head(df_postcode)
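How `threshold` and `prob` could combine to produce the response can be sketched in base R. This is a hypothetical illustration, not the package's code: the function `sim_samples_sketch`, its arguments, the output column names, the size-proportional sampling of levels, and the rule that a level is rare when its size is strictly below `threshold` are all assumptions.

```r
# Hypothetical sketch; not the oneclust implementation.
sim_samples_sketch <- function(levels, sizes, n, threshold, prob, seed) {
  set.seed(seed)
  # Assumption: each level is drawn with probability proportional to its size.
  idx <- sample(seq_along(levels), size = n, replace = TRUE, prob = sizes)
  # Assumption: a level is rare when its size is strictly below the threshold.
  is_rare <- sizes[idx] < threshold
  # prob[1] is the class 1 probability for rare levels, prob[2] for the rest.
  p <- ifelse(is_rare, prob[1], prob[2])
  data.frame(
    postcode = levels[idx],
    label = rbinom(n, size = 1, prob = p),
    is_rare = is_rare,
    stringsAsFactors = FALSE
  )
}

toy <- sim_samples_sketch(
  levels = c("z0001", "z0002", "z0003"),
  sizes = c(50, 500, 5000),
  n = 200, threshold = 1000, prob = c(0.3, 0.1), seed = 1
)
head(toy)
```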
