LikertMakeR

LikertMakeR synthesises and correlates Likert-scale and related rating-scale data. You decide the mean and standard deviation, and (optionally) the correlations among vectors, and the package will generate data with those same predefined properties.

The package generates a column of values that simulate the same properties as a rating scale. If multiple columns are generated, then you can use LikertMakeR to rearrange the values so that the new variables are correlated exactly in accord with a user-predefined correlation matrix.

Functions can be combined to generate synthetic rating-scale data from a predefined Cronbach’s Alpha.

Purpose

The package should be useful for teaching in the Social Sciences, and for scholars who wish to “replicate” rating-scale data for further analysis and visualisation when only summary statistics have been reported.

Motivation

I was prompted to write the functions in LikertMakeR after reviewing too many journal article submissions where authors presented questionnaire results with only means and standard deviations (often only the means), with no apparent understanding of scale distributions, and their impact on scale properties.

I hope this tool will help researchers, teachers, students, and reviewers think more carefully about rating-scale distributions, and about the effects of variance, scale boundaries, and the number of items in a scale. Researchers can also use LikertMakeR to prepare analyses ahead of a formal survey.

Rating scale properties

A Likert scale is the mean, or sum, of several ordinal rating scales. Typically, they are bipolar (usually “agree-disagree”) responses to propositions that are determined to be moderately-to-highly correlated and that capture some facet of a theoretical construct.

Rating scales, such as Likert scales, are not continuous or unbounded.

For example, a 5-point Likert scale that is constructed with, say, five items (questions) will have a summed range of between 5 (all rated ‘1’) and 25 (all rated ‘5’) with all integers in between, and the mean range will be ‘1’ to ‘5’ with intervals of 1/5=0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated ‘1’) and 56 (all rated ‘7’) with all integers in between, and the mean range will be ‘1’ to ‘7’ with intervals of 1/8=0.125.
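The arithmetic above can be checked with a few lines of base R. This is only an illustrative sketch: `scale_values()` is a hypothetical helper written for this example and is not part of LikertMakeR.

```r
# Possible values of a scale built from `items` rating items, each scored
# as an integer from `lowerbound` to `upperbound`.
# NOTE: scale_values() is a hypothetical helper, not a LikertMakeR function.
scale_values <- function(lowerbound, upperbound, items) {
  sums <- seq(lowerbound * items, upperbound * items, by = 1)
  list(
    sums = sums,          # every attainable summed score
    means = sums / items, # the same scores expressed as item means
    step = 1 / items      # spacing between adjacent mean scores
  )
}

# 5-point scale, five items: sums run from 5 to 25, means move in steps of 0.2
five_point <- scale_values(lowerbound = 1, upperbound = 5, items = 5)
range(five_point$sums)
five_point$step

# 7-point scale, eight items: sums run from 8 to 56, means move in steps of 0.125
seven_point <- scale_values(lowerbound = 1, upperbound = 7, items = 8)
range(seven_point$sums)
seven_point$step
```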

Rating-scale boundaries define minima and maxima for any scale values. If the mean is close to one boundary, then data points will gather more closely to that boundary. If the mean is not in the middle of a scale, then the data will always be skewed, as shown in the following plots.

Off-centre means always give skewed distribution in bounded rating scales
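
The same effect can be seen numerically. The sketch below is an assumption-laden illustration in base R, not the code that produced the plots: it draws from Beta distributions with arbitrarily chosen shape parameters, stretches the draws to a 1-5 scale, rounds them to whole scale points, and reports the mean and sample skewness. `skewness()` and `simulate_item()` are hypothetical helpers.

```r
set.seed(1) # only so the sketch is repeatable

# Sample skewness: the third standardised moment (hypothetical helper)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Draw from a Beta distribution, stretch to the scale range, and round to
# whole scale points (hypothetical helper, shape parameters are assumptions)
simulate_item <- function(n, shape1, shape2, lowerbound = 1, upperbound = 5) {
  round(lowerbound + (upperbound - lowerbound) * rbeta(n, shape1, shape2))
}

low_mean  <- simulate_item(10000, shape1 = 2, shape2 = 5) # mean near lower bound
mid_mean  <- simulate_item(10000, shape1 = 3, shape2 = 3) # mean near the centre
high_mean <- simulate_item(10000, shape1 = 5, shape2 = 2) # mean near upper bound

# Positive skew for the low mean, negative for the high mean, near zero in the middle
sapply(
  list(low = low_mean, mid = mid_mean, high = high_mean),
  function(x) c(mean = mean(x), skewness = skewness(x))
)
```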



LikertMakeR functions

  • lfast() generates a vector of values with predefined mean and standard deviation.

  • lcor() takes a dataframe of rating-scale values and rearranges the values in each column so that the columns are correlated to match a predefined correlation matrix.

  • makeCorrAlpha() constructs a random correlation matrix of given dimensions from a predefined Cronbach’s Alpha.

  • makeItems() is a wrapper function for lfast() and lcor() to generate synthetic rating-scale data with predefined first and second moments and a predefined correlation matrix.

  • makeItemsScale() generates a random dataframe of scale items based on a predefined summated scale with a desired Cronbach’s Alpha.

  • correlateScales() creates a dataframe of correlated summated scales, as one might find in a completed survey questionnaire and possibly use in a Structural Equation Model.

  • Helper Functions

    • alpha() calculates Cronbach’s Alpha from a given correlation matrix or a given dataframe.

    • eigenvalues() calculates eigenvalues of a correlation matrix, reports on positive-definite status of the matrix and, optionally, displays a scree plot to visualise the eigenvalues.


Using LikertMakeR

Download and Install LikertMakeR

from CRAN

> ```
>
> install.packages("LikertMakeR")
> library(LikertMakeR)
>
> ```

development version from GitHub.

> ```
> 
> library(devtools)
> install_github("WinzarH/LikertMakeR")
> library(LikertMakeR)
>
> ```

Generate synthetic rating-scale data

lfast()

  • lfast() applies a simple evolutionary algorithm which draws repeated random samples from a scaled Beta distribution. It produces a vector of values with mean and standard deviation typically correct to two decimal places.

To synthesise a rating scale with lfast(), the user must input the following parameters:

  • n: sample size

  • mean: desired mean

  • sd: desired standard deviation

  • lowerbound: desired lower bound

  • upperbound: desired upper bound

  • items: number of items making the scale - default = 1
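
To make the idea of repeated sampling from a scaled Beta distribution concrete, here is a minimal base-R sketch. It is not the package's implementation: `sketch_lfast()`, its `tries` argument, the moment-matching step, and the error measure are all assumptions made for illustration. It simply draws many candidate vectors and keeps the one whose mean and standard deviation are closest to the targets.

```r
# Hypothetical sketch of "draw repeatedly, keep the best"; NOT lfast() itself.
sketch_lfast <- function(n, mean, sd, lowerbound, upperbound,
                         items = 1, tries = 200) {
  span <- upperbound - lowerbound
  # Moment-matching Beta shape parameters for the 0-1 version of the scale
  m <- (mean - lowerbound) / span
  v <- (sd / span)^2
  common <- m * (1 - m) / v - 1
  stopifnot(common > 0) # otherwise the requested sd is too large for the bounds
  shape1 <- m * common
  shape2 <- (1 - m) * common

  best <- NULL
  best_error <- Inf
  for (i in seq_len(tries)) {
    # Stretch to the scale, then round to the nearest attainable value (1/items)
    candidate <- lowerbound + span * rbeta(n, shape1, shape2)
    candidate <- round(candidate * items) / items
    # Assumed error measure: absolute miss on the mean plus absolute miss on the sd
    error <- abs(base::mean(candidate) - mean) + abs(stats::sd(candidate) - sd)
    if (error < best_error) {
      best <- candidate
      best_error <- error
    }
  }
  best
}

set.seed(42)
x <- sketch_lfast(n = 512, mean = 2.5, sd = 0.75,
                  lowerbound = 1, upperbound = 5, items = 4)
c(mean = mean(x), sd = sd(x))
```

A plain best-of-many search like this is cruder than an evolutionary algorithm, but it shows why the achieved moments are close to, rather than exactly equal to, the requested ones.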

An earlier version of LikertMakeR had a function, lexact(), which was slow and no more accurate than the latest version of lfast(). So, lexact() is now deprecated.

lfast() example

a four-item, five-point Likert scale
nItems <- 4
mean <- 2.5
sd <- 0.75

x1 <- lfast(
  n = 512,
  mean = mean,
  sd = sd,
  lowerbound = 1,
  upperbound = 5,
  items = nItems
)
#> best solution in 623 iterations
Example: 4-item, 1-5 Likert scale


an 11-point likelihood-of-purchase scale
x2 <- lfast(256, 3, 2.5, 0, 10)
#> best solution in 1760 iterations
Example: likelihood-of-purchase scale



Correlating rating scales

The function, lcor(), rearranges the values in the columns of a data-set so that they are correlated at a specified level. It does not change the values - it swaps their positions within each column so that univariate statistics do not change, but their correlations with other vectors do.

lcor()

lcor() systematically selects pairs of values in a column and swaps their places, and checks to see if this swap improves the correlation matrix. If the revised dataframe produces a correlation matrix closer to the target correlation matrix, then the swap is retained. Otherwise, the values are returned to their original places. This process is iterated across each column.

To create the desired correlated data, the user must define the following parameters:

  • data: a starter data set of rating-scales. Number of columns must match the dimensions of the target correlation matrix.

  • target: the target correlation matrix.
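
The swap-and-keep idea described above can be sketched in base R as follows. This is a simplified illustration, not the lcor() source: `sketch_lcor()` is a hypothetical function, it picks swaps at random rather than systematically, and the squared-difference distance and the number of attempts are assumptions.

```r
# Hypothetical sketch of value-swapping; NOT the lcor() implementation.
sketch_lcor <- function(data, target, attempts = 3000) {
  current <- as.matrix(data)
  # Distance between the achieved and the target correlation matrix
  distance <- function(m) sum((cor(m) - target)^2)
  best <- distance(current)
  for (i in seq_len(attempts)) {
    column <- sample.int(ncol(current), 1)
    rows <- sample.int(nrow(current), 2)
    candidate <- current
    candidate[rows, column] <- current[rev(rows), column] # swap two values
    candidate_distance <- distance(candidate)
    if (candidate_distance < best) { # keep the swap only if it helps
      current <- candidate
      best <- candidate_distance
    }
  }
  as.data.frame(current)
}

set.seed(11)
start <- data.frame(
  x1 = sample(1:5, 64, replace = TRUE),
  x2 = sample(1:5, 64, replace = TRUE),
  x3 = sample(1:5, 64, replace = TRUE)
)
target <- matrix(c(1.00, 0.85, 0.75,
                   0.85, 1.00, 0.65,
                   0.75, 0.65, 1.00), nrow = 3)

result <- sketch_lcor(start, target)
cor(result) |> round(2)
```

Because values are only moved within their own column, each column keeps exactly the same set of values, so its mean and standard deviation are unchanged.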

lcor() example

Let’s generate some data: three 5-point Likert scales, each with five items.

## generate uncorrelated synthetic data
n <- 128
lowerbound <- 1
upperbound <- 5
items <- 5

mydat3 <- data.frame(
  x1 = lfast(n, 2.5, 0.75, lowerbound, upperbound, items),
  x2 = lfast(n, 3.0, 1.50, lowerbound, upperbound, items),
  x3 = lfast(n, 3.5, 1.00, lowerbound, upperbound, items)
)
#> best solution in 120 iterations
#> best solution in 683 iterations
#> best solution in 7086 iterations

The first six observations from this dataframe are:

#>    x1  x2  x3
#> 1 3.0 2.2 1.4
#> 2 2.4 3.0 5.0
#> 3 3.6 4.2 4.4
#> 4 1.6 3.4 3.0
#> 5 2.6 1.0 4.8
#> 6 2.6 2.0 1.4

And the first and second moments (to 3 decimal places) are:

#>         x1    x2    x3
#> mean 2.502 3.000 3.502
#> sd   0.750 1.498 1.001

We can see that the first and second moments of the data are very close to what is expected.

As we should expect, randomly-generated synthetic data have low correlations:

#>       x1    x2    x3
#> x1  1.00 -0.12 -0.11
#> x2 -0.12  1.00  0.00
#> x3 -0.11  0.00  1.00

Now, let’s define a target correlation matrix:

## describe a target correlation matrix
tgt3 <- matrix(
  c(
    1.00, 0.85, 0.75,
    0.85, 1.00, 0.65,
    0.75, 0.65, 1.00
  ),
  nrow = 3
)

So now we have a dataframe with desired first and second moments, and a target correlation matrix.

## apply lcor() function
new3 <- lcor(data = mydat3, target = tgt3)

Values in each column of the new dataframe do not change from the original; the values are rearranged.

The first ten observations from this dataframe are:

#>     V1 V2  V3
#> 1  1.0  1 1.2
#> 2  4.0  5 3.6
#> 3  4.0  5 5.0
#> 4  1.0  1 1.2
#> 5  4.0  5 5.0
#> 6  1.8  1 1.4
#> 7  1.4  1 1.4
#> 8  1.2  1 1.4
#> 9  1.4  1 1.6
#> 10 4.0  5 5.0

And the new data frame is correlated close to our desired correlation matrix; here presented to 3 decimal places:

#>      V1   V2   V3
#> V1 1.00 0.85 0.75
#> V2 0.85 1.00 0.65
#> V3 0.75 0.65 1.00

Generate a correlation matrix from Cronbach’s Alpha

makeCorrAlpha()

makeCorrAlpha() constructs a random correlation matrix of given dimensions and predefined Cronbach’s Alpha.

To create the desired correlation matrix, the user must define the following parameters:

  • items: or “k” - the number of rows and columns of the desired correlation matrix.

  • alpha: the target value for Cronbach’s Alpha

  • variance: a notional variance coefficient to affect the spread of values in the correlation matrix. Default = ‘0.5’. A value of ‘0’ produces a matrix where all off-diagonal correlations are equal. Setting ‘variance = 1.0’ gives a wider range of values. Setting ‘variance = 2.0’, or above, may be feasible but increases the likelihood of a non-positive-definite matrix.
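
For the zero-variance case, the single off-diagonal value can be worked out by hand, assuming Alpha is the standardised form computed from the mean inter-item correlation, alpha = k·r / (1 + (k − 1)·r). Solving for r gives r = alpha / (k − alpha·(k − 1)). The base-R sketch below builds such a matrix; `equal_cor_matrix()` is a hypothetical helper, not a package function, and the package may reach the same matrix by a different route.

```r
# Hypothetical helper: correlation matrix with one common off-diagonal value,
# chosen so that standardised Cronbach's Alpha equals `alpha`.
equal_cor_matrix <- function(items, alpha) {
  r <- alpha / (items - alpha * (items - 1))
  m <- matrix(r, nrow = items, ncol = items)
  diag(m) <- 1
  m
}

flat <- equal_cor_matrix(items = 4, alpha = 0.85)
r <- flat[1, 2]
r                    # common correlation, roughly 0.586
4 * r / (1 + 3 * r)  # plugging r back into the formula recovers 0.85
```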

makeCorrAlpha() is volatile

Random values generated by makeCorrAlpha() are highly volatile. makeCorrAlpha() may not generate a feasible (positive-definite) correlation matrix, especially when

  • variance is high relative to

    • desired Alpha, and

    • desired correlation dimensions

makeCorrAlpha() will inform the user if the resulting correlation matrix is positive definite, or not.

If the returned correlation matrix is not positive-definite, a feasible solution may still be possible, and often is. The user is encouraged to try again, possibly several times, to find one.

makeCorrAlpha() examples

Four variables, alpha = 0.85, variance = default
## define parameters
items <- 4
alpha <- 0.85
# variance <- 0.5 ## by default

## apply makeCorrAlpha() function
set.seed(42)

cor_matrix_4 <- makeCorrAlpha(items, alpha)
#> correlation values consistent with desired alpha in 59 iterations
#> The correlation matrix is positive definite

makeCorrAlpha() produced the following correlation matrix (to three decimal places):

#>       [,1]  [,2]  [,3]  [,4]
#> [1,] 1.000 0.425 0.433 0.507
#> [2,] 0.425 1.000 0.693 0.694
#> [3,] 0.433 0.693 1.000 0.766
#> [4,] 0.507 0.694 0.766 1.000
test output with Helper functions
## using helper function alpha()

alpha(cor_matrix_4)
#> [1] 0.8500063
## using helper function eigenvalues()

eigenvalues(cor_matrix_4, 1)

#> cor_matrix_4  is positive-definite
#> [1] 2.7842025 0.6581071 0.3291732 0.2285172

twelve variables, alpha = 0.90, variance = 1

## define parameters
items <- 12
alpha <- 0.90
variance <- 1.0

## apply makeCorrAlpha() function
set.seed(42)

cor_matrix_12 <- makeCorrAlpha(items = items, alpha = alpha, variance = variance)
#> correlation values consistent with desired alpha in 4312 iterations
#> Correlation matrix is not yet positive definite
#>         
#> Working on it
#> improved at swap - 12
#> improved at swap - 67
#> improved at swap - 79
#> improved at swap - 80
#> improved at swap - 115
#> improved at swap - 121
#> improved at swap - 128
#> improved at swap - 130
#> improved at swap - 134
#> improved at swap - 137
#> improved at swap - 146
#> improved at swap - 151
#> improved at swap - 160
#> improved at swap - 162
#> improved at swap - 166
#> improved at swap - 174
#> improved at swap - 183
#> improved at swap - 188
#> improved at swap - 191
#> improved at swap - 208
#> improved at swap - 263
#> improved at swap - 304
#> improved at swap - 399
#> improved at swap - 400
#> improved at swap - 402
#> improved at swap - 445
#> improved at swap - 485
#> improved at swap - 542
#> stopped at swap - 542
#> The correlation matrix is positive definite

makeCorrAlpha() produced the following correlation matrix (to two decimal places):

#>        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
#>  [1,]  1.00 -0.51 -0.67 -0.32 -0.30 -0.29 -0.27 -0.14 -0.07 -0.04 -0.03  0.00
#>  [2,] -0.51  1.00  0.06  0.31  0.43  0.26  0.28  0.20  0.26  0.06  0.25  0.34
#>  [3,] -0.67  0.06  1.00  0.61  0.36  0.62  0.57  0.47  0.45  0.46  0.47  0.33
#>  [4,] -0.32  0.31  0.61  1.00  0.48  0.50  0.60  0.36  0.39  0.53  0.64  0.59
#>  [5,] -0.30  0.43  0.36  0.48  1.00  0.42  0.56  0.62  0.62  0.62  0.56  0.63
#>  [6,] -0.29  0.26  0.62  0.50  0.42  1.00  0.81  0.66  0.70  0.70  0.70  0.70
#>  [7,] -0.27  0.28  0.57  0.60  0.56  0.81  1.00  0.57  0.71  0.72  0.72  0.73
#>  [8,] -0.14  0.20  0.47  0.36  0.62  0.66  0.57  1.00  0.71  0.79  0.79  0.78
#>  [9,] -0.07  0.26  0.45  0.39  0.62  0.70  0.71  0.71  1.00  0.80  0.83  0.84
#> [10,] -0.04  0.06  0.46  0.53  0.62  0.70  0.72  0.79  0.80  1.00  0.88  0.89
#> [11,] -0.03  0.25  0.47  0.64  0.56  0.70  0.72  0.79  0.83  0.88  1.00  0.97
#> [12,]  0.00  0.34  0.33  0.59  0.63  0.70  0.73  0.78  0.84  0.89  0.97  1.00
test output
## calculate Cronbach's Alpha
alpha(cor_matrix_12)
#> [1] 0.9000045

## calculate eigenvalues of the correlation matrix
eigenvalues(cor_matrix_12, 1) |> round(3)

#> cor_matrix_12  is positive-definite
#>  [1] 6.964 1.743 1.087 0.658 0.567 0.377 0.254 0.159 0.127 0.051 0.014 0.001

Generate a dataframe of rating scales from a correlation matrix and predefined moments

makeItems()

makeItems() generates a dataframe of random discrete values from a scaled Beta distribution so the data replicate a rating scale, and are correlated close to a predefined correlation matrix.

Generally, means, standard deviations, and correlations are correct to two decimal places.

makeItems() is a wrapper function for

  • lfast(), which takes repeated samples selecting a vector that best fits the desired moments, and

  • lcor(), which rearranges values in each column of the dataframe so they closely match the desired correlation matrix.

To create the desired dataframe, the user must define the following parameters:

  • n: number of observations

  • means: a vector of length ‘k’ of desired means of each variable

  • sds: a vector of length ‘k’ of desired standard deviations of each variable

  • lowerbound: a vector of length ‘k’ of values for the lower bound of each variable (For example, ‘1’ for a 1-5 rating scale)

  • upperbound: a vector of length ‘k’ of values for the upper bound of each variable (For example, ‘5’ for a 1-5 rating scale)

  • cormatrix: a target correlation matrix with ‘k’ rows and ‘k’ columns.

makeItems() examples

## define parameters
n <- 128
dfMeans <- c(2.5, 3.0, 3.0, 3.5)
dfSds <- c(1.0, 1.0, 1.5, 0.75)
lowerbound <- rep(1, 4)
upperbound <- rep(5, 4)

corMat <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply makeItems() function
df <- makeItems(
  n = n,
  means = dfMeans,
  sds = dfSds,
  lowerbound = lowerbound,
  upperbound = upperbound,
  cormatrix = corMat
)
#> Variable  1
#> reached maximum of 16384 iterations
#> Variable  2
#> reached maximum of 16384 iterations
#> Variable  3
#> best solution in 2371 iterations
#> Variable  4
#> reached maximum of 16384 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables

## test the function
head(df)
#>   V1 V2 V3 V4
#> 1  4  5  5  5
#> 2  5  5  5  5
#> 3  5  5  5  5
#> 4  3  5  5  5
#> 5  3  5  5  5
#> 6  3  5  5  5
tail(df)
#>     V1 V2 V3 V4
#> 123  2  1  1  2
#> 124  2  4  3  4
#> 125  3  2  5  4
#> 126  3  3  3  4
#> 127  3  2  2  3
#> 128  3  2  3  3

### means should be correct to two decimal places
dfmoments <- data.frame(
  mean = apply(df, 2, mean) |> round(3),
  sd = apply(df, 2, sd) |> round(3)
) |> t()

dfmoments
#>         V1    V2    V3    V4
#> mean 2.500 3.000 3.000 3.500
#> sd   1.004 1.004 1.501 0.753

### correlations should be correct to two decimal places
cor(df) |> round(3)
#>       V1   V2   V3    V4
#> V1 1.000 0.25 0.35 0.448
#> V2 0.250 1.00 0.70 0.750
#> V3 0.350 0.70 1.00 0.850
#> V4 0.448 0.75 0.85 1.000

Generate a dataframe from Cronbach’s Alpha and predefined moments

This is a two-step process:

  1. apply makeCorrAlpha() to generate a correlation matrix from desired alpha,

  2. apply makeItems() to generate rating-scale items from the correlation matrix and desired moments

Required parameters are:

  • k: number of items/columns

  • alpha: a target Cronbach’s Alpha.

  • n: number of observations

  • lowerbound: a vector of length ‘k’ of values for the lower bound of each variable

  • upperbound: a vector of length ‘k’ of values for the upper bound of each variable

  • means: a vector of length ‘k’ of desired means of each variable

  • sds: a vector of length ‘k’ of desired standard deviations of each variable

Step 1: Generate a correlation matrix

## define parameters
k <- 6
myAlpha <- 0.85

## generate correlation matrix
set.seed(42)
myCorr <- makeCorrAlpha(items = k, alpha = myAlpha)
#> correlation values consistent with desired alpha in 15193 iterations
#> The correlation matrix is positive definite

## display correlation matrix
myCorr |> round(3)
#>        [,1]   [,2]  [,3]  [,4]  [,5]  [,6]
#> [1,]  1.000 -0.153 0.116 0.430 0.438 0.473
#> [2,] -0.153  1.000 0.480 0.498 0.528 0.585
#> [3,]  0.116  0.480 1.000 0.602 0.625 0.641
#> [4,]  0.430  0.498 0.602 1.000 0.662 0.677
#> [5,]  0.438  0.528 0.625 0.662 1.000 0.684
#> [6,]  0.473  0.585 0.641 0.677 0.684 1.000

### checking Cronbach's Alpha
alpha(cormatrix = myCorr)
#> [1] 0.8500101

Step 2: Generate dataframe

## define parameters
n <- 256
myMeans <- c(2.75, 3.00, 3.00, 3.25, 3.50, 3.5)
mySds <- c(1.00, 0.75, 1.00, 1.00, 1.00, 1.5)
lowerbound <- rep(1, k)
upperbound <- rep(5, k)

## Generate Items
myItems <- makeItems(
  n = n, means = myMeans, sds = mySds,
  lowerbound = lowerbound, upperbound = upperbound,
  cormatrix = myCorr
)
#> Variable  1
#> best solution in 972 iterations
#> Variable  2
#> best solution in 17 iterations
#> Variable  3
#> best solution in 973 iterations
#> Variable  4
#> best solution in 4866 iterations
#> Variable  5
#> best solution in 336 iterations
#> Variable  6
#> best solution in 16769 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables

## resulting data frame
head(myItems)
#>   V1 V2 V3 V4 V5 V6
#> 1  3  1  1  1  1  1
#> 2  3  2  1  1  1  1
#> 3  2  2  1  1  2  1
#> 4  1  5  5  5  5  5
#> 5  1  5  5  5  5  5
#> 6  1  5  5  5  5  5
tail(myItems)
#>     V1 V2 V3 V4 V5 V6
#> 251  1  4  3  3  3  1
#> 252  2  4  3  2  4  3
#> 253  4  2  2  4  2  4
#> 254  3  3  4  2  4  5
#> 255  4  3  3  4  4  5
#> 256  4  2  4  4  4  5

## means and standard deviations
myMoments <- data.frame(
  means = apply(myItems, 2, mean) |> round(3),
  sds = apply(myItems, 2, sd) |> round(3)
) |> t()
myMoments
#>          V1    V2    V3    V4    V5    V6
#> means 2.750 3.000 3.000 3.250 3.500 3.500
#> sds   0.998 0.751 1.002 0.998 0.998 1.498

## Cronbach's Alpha of data frame
alpha(NULL, myItems)
#> [1] 0.8499588

Summary plots of new data frame

Summary of dataframe from makeItems() function



Generate a dataframe of rating-scale items from a summated rating scale

makeItemsScale()

  • makeItemsScale() generates a dataframe of rating-scale items from a summated rating scale and desired Cronbach’s Alpha.

To create the desired dataframe, the user must define the following parameters:

  • scale: a vector or dataframe of the summated rating scale. Should range from (‘lowerbound’ * ‘items’) to (‘upperbound’ * ‘items’)

  • lowerbound: lower bound of the scale item (example: ‘1’ in a ‘1’ to ‘5’ rating)

  • upperbound: upper bound of the scale item (example: ‘5’ in a ‘1’ to ‘5’ rating)

  • items: k, or number of columns to generate

  • alpha: desired Cronbach’s Alpha. Default = ‘0.8’

  • variance: quantile for selecting the combination of items that give summated scores. Must lie between ‘0’ (minimum variance) and ‘1’ (maximum variance). Default = ‘0.5’.
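
The core constraint here is that each row of generated items must add up to the corresponding summated score while every item stays within its bounds. The base-R sketch below shows only that constraint; it ignores Cronbach’s Alpha and the variance setting entirely, and `split_sum()` is a hypothetical helper rather than anything exported by the package.

```r
# Hypothetical helper: split one summated score into `items` integer ratings
# that stay within the bounds and add up to `total`.
split_sum <- function(total, lowerbound, upperbound, items) {
  stopifnot(total >= lowerbound * items, total <= upperbound * items)
  values <- rep(lowerbound, items)
  while (sum(values) < total) {
    open <- which(values < upperbound)        # items that can still go up
    pick <- open[sample.int(length(open), 1)] # choose one of them at random
    values[pick] <- values[pick] + 1
  }
  values
}

set.seed(7)
summated <- c(9, 8, 12, 17, 13, 19)
items_sketch <- t(sapply(summated, split_sum,
                         lowerbound = 1, upperbound = 5, items = 4))
cbind(items_sketch, summated)
```

Many different item combinations give the same total, which is why a further step is needed to choose and arrange combinations so that the items reach a desired Alpha.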

makeItemsScale() Example:

generate a summated scale
## define parameters
n <- 256
mean <- 3.00
sd <- 0.85
lowerbound <- 1
upperbound <- 5
items <- 4

## apply lfast() function
meanScale <- lfast(
  n = n, mean = mean, sd = sd,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items
)
#> best solution in 900 iterations

## sum over all items
summatedScale <- meanScale * items
Summated scale distribution


create items with makeItemsScale()

## apply makeItemsScale() function
newItems_1 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound,
  upperbound = upperbound,
  items = items
)
#> generate 256 rows
#> rearrange 4 values within each of 256 rows
#> Complete!
#> desired Cronbach's alpha = 0.8 (achieved alpha = 0.8004)

### First 10 observations and summated scale
head(cbind(newItems_1, summatedScale), 10)
#>    V1 V2 V3 V4 summatedScale
#> 1   4  1  1  3             9
#> 2   2  2  2  2             8
#> 3   5  1  2  4            12
#> 4   5  4  4  4            17
#> 5   5  2  3  3            13
#> 6   5  5  5  4            19
#> 7   4  1  2  4            11
#> 8   5  4  4  5            18
#> 9   5  3  4  5            17
#> 10  4  1  4  3            12

### correlation matrix
cor(newItems_1) |> round(2)
#>      V1   V2   V3   V4
#> V1 1.00 0.56 0.61 0.51
#> V2 0.56 1.00 0.60 0.33
#> V3 0.61 0.60 1.00 0.39
#> V4 0.51 0.33 0.39 1.00

### default Cronbach's alpha = 0.80
alpha(data = newItems_1) |> round(4)
#> [1] 0.8004

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_1), 1) |> round(3)

#> cor(newItems_1)  is positive-definite
#> [1] 2.517 0.717 0.403 0.364

makeItemsScale() with same summated values and higher alpha

## apply makeItemsScale() function
newItems_2 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound,
  upperbound = upperbound,
  items = items,
  alpha = 0.9
)
#> generate 256 rows
#> rearrange 4 values within each of 256 rows
#> Complete!
#> desired Cronbach's alpha = 0.9 (achieved alpha = 0.8778)

### First 10 observations and summated scale
head(cbind(newItems_2, summatedScale), 10)
#>    V1 V2 V3 V4 summatedScale
#> 1   4  1  2  2             9
#> 2   3  1  2  2             8
#> 3   3  3  3  3            12
#> 4   5  3  5  4            17
#> 5   4  2  4  3            13
#> 6   5  4  5  5            19
#> 7   4  1  4  2            11
#> 8   5  4  5  4            18
#> 9   5  4  4  4            17
#> 10  4  1  4  3            12

### correlation matrix
cor(newItems_2) |> round(2)
#>      V1   V2   V3   V4
#> V1 1.00 0.58 0.68 0.64
#> V2 0.58 1.00 0.58 0.66
#> V3 0.68 0.58 1.00 0.73
#> V4 0.64 0.66 0.73 1.00

### requested Cronbach's alpha = 0.90
alpha(data = newItems_2) |> round(4)
#> [1] 0.8778

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_2), 1) |> round(3)

#> cor(newItems_2)  is positive-definite
#> [1] 2.929 0.457 0.366 0.248

same summated values with lower alpha that may require higher variance

## apply makeItemsScale() function
newItems_3 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound,
  upperbound = upperbound,
  items = items,
  alpha = 0.6,
  variance = 0.7
)
#> generate 256 rows
#> rearrange 4 values within each of 256 rows
#> Complete!
#> desired Cronbach's alpha = 0.6 (achieved alpha = 0.5989)

### First 10 observations and summated scale
head(cbind(newItems_3, summatedScale), 10)
#>    V1 V2 V3 V4 summatedScale
#> 1   1  1  3  4             9
#> 2   1  4  2  1             8
#> 3   2  4  4  2            12
#> 4   3  5  4  5            17
#> 5   2  5  2  4            13
#> 6   4  5  5  5            19
#> 7   1  4  3  3            11
#> 8   5  5  5  3            18
#> 9   4  5  5  3            17
#> 10  2  3  2  5            12

### correlation matrix
cor(newItems_3) |> round(2)
#>      V1   V2   V3   V4
#> V1 1.00 0.45 0.45 0.09
#> V2 0.45 1.00 0.25 0.17
#> V3 0.45 0.25 1.00 0.22
#> V4 0.09 0.17 0.22 1.00

### requested Cronbach's alpha = 0.60
alpha(data = newItems_3) |> round(4)
#> [1] 0.5989

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_3), 1) |> round(3)

#> cor(newItems_3)  is positive-definite
#> [1] 1.862 0.946 0.742 0.450

Create a multidimensional dataframe of correlated scale items

correlateScales()

Correlated rating-scale items generally are summed or averaged to create a measure of an “unobservable”, or “latent”, construct.

correlateScales() takes several such dataframes of rating-scale items and rearranges their rows so that the scales are correlated according to a predefined correlation matrix. Univariate statistics for each dataframe of rating-scale items do not change, but their correlations with rating-scale items in other dataframes do.

To run correlateScales(), parameters are:

  • dataframes: a list of ‘k’ dataframes to be rearranged and combined

  • scalecors: target correlation matrix - should be a symmetric k*k positive-semi-definite matrix, where ‘k’ is the number of dataframes
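
A small base-R sketch can show why moving whole rows changes between-scale correlations but nothing inside a scale. It is an illustration under simplifying assumptions, not the correlateScales() algorithm: the two scales are hypothetical, and the rows are arranged to push the correlation between scale totals as high as possible rather than to hit a chosen target.

```r
set.seed(3) # only so the sketch is repeatable
n <- 100

# Two hypothetical two-item scales, generated independently
scale_a <- data.frame(a1 = sample(1:5, n, replace = TRUE),
                      a2 = sample(1:5, n, replace = TRUE))
scale_b <- data.frame(b1 = sample(1:5, n, replace = TRUE),
                      b2 = sample(1:5, n, replace = TRUE))

total_a <- rowSums(scale_a)
cor_before <- cor(total_a, rowSums(scale_b))

# Sort the rows of scale_b by their total, then place them so that the
# respondent with the i-th smallest scale_a total gets the i-th smallest
# scale_b row. Rows move as whole units; no value changes.
sorted_b <- scale_b[order(rowSums(scale_b)), ]
rearranged_b <- sorted_b[rank(total_a, ties.method = "first"), ]
rownames(rearranged_b) <- NULL

cor_after <- cor(total_a, rowSums(rearranged_b))
c(before = cor_before, after = cor_after)

# Item means and within-scale correlations are the same before and after
all.equal(colMeans(scale_b), colMeans(rearranged_b))
all.equal(cor(scale_b), cor(rearranged_b))
```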

As with other functions in LikertMakeR, correlateScales() focuses on item and scale moments (mean and standard deviation) rather than on covariance structure. If you wish to simulate data for teaching or experimenting with Structural Equation Modelling, then I recommend the sim.item() and sim.congeneric() functions from the psych package.

correlateScales() examples

three attitudes and a behavioural intention

create dataframes of Likert-scale items
n <- 128
lower <- 1
upper <- 5

### attitude #1

#### generate a correlation matrix
cor_1 <- makeCorrAlpha(items = 4, alpha = 0.80)
#> correlation values consistent with desired alpha in 5326 iterations
#> The correlation matrix is positive definite

#### specify moments as vectors
means_1 <- c(2.5, 2.5, 3.0, 3.5)
sds_1 <- c(0.75, 0.85, 0.85, 0.75)

#### apply makeItems() function
Att_1 <- makeItems(
  n = n, means = means_1, sds = sds_1,
  lowerbound = rep(lower, 4), upperbound = rep(upper, 4),
  cormatrix = cor_1
)
#> Variable  1
#> reached maximum of 16384 iterations
#> Variable  2
#> best solution in 817 iterations
#> Variable  3
#> best solution in 438 iterations
#> Variable  4
#> reached maximum of 16384 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables

### attitude #2

#### generate a correlation matrix
cor_2 <- makeCorrAlpha(items = 5, alpha = 0.85)
#> correlation values consistent with desired alpha in 16653 iterations
#> The correlation matrix is positive definite

#### specify moments as vectors
means_2 <- c(2.5, 2.5, 3.0, 3.0, 3.5)
sds_2 <- c(0.75, 0.85, 0.75, 0.85, 0.75)

#### apply makeItems() function
Att_2 <- makeItems(
  n, means_2, sds_2,
  rep(lower, 5), rep(upper, 5),
  cor_2
)
#> Variable  1
#> reached maximum of 16384 iterations
#> Variable  2
#> best solution in 155 iterations
#> Variable  3
#> reached maximum of 16384 iterations
#> Variable  4
#> best solution in 1203 iterations
#> Variable  5
#> reached maximum of 16384 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables

### attitude #3

#### generate a correlation matrix
cor_3 <- makeCorrAlpha(items = 6, alpha = 0.90)
#> correlation values consistent with desired alpha in 1311 iterations
#> The correlation matrix is positive definite

#### specify moments as vectors
means_3 <- c(2.5, 2.5, 3.0, 3.0, 3.5, 3.5)
sds_3 <- c(0.75, 0.85, 0.85, 1.0, 0.75, 0.85)

#### apply makeItems() function
Att_3 <- makeItems(
  n, means_3, sds_3,
  rep(lower, 6), rep(upper, 6),
  cor_3
)
#> Variable  1
#> reached maximum of 16384 iterations
#> Variable  2
#> best solution in 83 iterations
#> Variable  3
#> best solution in 225 iterations
#> Variable  4
#> reached maximum of 16384 iterations
#> Variable  5
#> reached maximum of 16384 iterations
#> Variable  6
#> best solution in 856 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables

### behavioural intention
intent <- lfast(n, mean = 4.0, sd = 3, lowerbound = 0, upperbound = 10) |>
  data.frame()
#> best solution in 4313 iterations
names(intent) <- "int"
check properties of item dataframes
## Attitude #1
A1_moments <- data.frame(
  means = apply(Att_1, 2, mean) |> round(2),
  sds = apply(Att_1, 2, sd) |> round(2)
) |> t()

### Attitude #1 moments
A1_moments
#>         V1   V2   V3   V4
#> means 2.50 2.50 3.00 3.50
#> sds   0.75 0.85 0.85 0.75

### Attitude #1 correlations
cor(Att_1) |> round(2)
#>      V1   V2   V3   V4
#> V1 1.00 0.33 0.44 0.49
#> V2 0.33 1.00 0.52 0.57
#> V3 0.44 0.52 1.00 0.64
#> V4 0.49 0.57 0.64 1.00

### Attitude #1 cronbach's alpha
alpha(cor(Att_1)) |> round(3)
#> [1] 0.799

## Attitude #2
A2_moments <- data.frame(
  means = apply(Att_2, 2, mean) |> round(2),
  sds = apply(Att_2, 2, sd) |> round(2)
) |> t()

### Attitude #2 moments
A2_moments
#>         V1   V2   V3   V4   V5
#> means 2.50 2.50 3.00 3.00 3.50
#> sds   0.75 0.85 0.75 0.85 0.75

### Attitude #2 correlations
cor(Att_2) |> round(2)
#>      V1   V2   V3   V4   V5
#> V1 1.00 0.38 0.39 0.38 0.46
#> V2 0.38 1.00 0.47 0.50 0.64
#> V3 0.39 0.47 1.00 0.65 0.65
#> V4 0.38 0.50 0.65 1.00 0.80
#> V5 0.46 0.64 0.65 0.80 1.00

### Attitude #2 cronbach's alpha
alpha(cor(Att_2)) |> round(3)
#> [1] 0.85

## Attitude #3
A3_moments <- data.frame(
  means = apply(Att_3, 2, mean) |> round(2),
  sds = apply(Att_3, 2, sd) |> round(2)
) |> t()

### Attitude #3 moments
A3_moments
#>         V1   V2   V3 V4   V5   V6
#> means 2.50 2.50 3.00  3 3.50 3.50
#> sds   0.75 0.85 0.85  1 0.75 0.85

### Attitude #3 correlations
cor(Att_3) |> round(2)
#>      V1   V2   V3   V4   V5   V6
#> V1 1.00 0.29 0.36 0.44 0.47 0.49
#> V2 0.29 1.00 0.51 0.59 0.66 0.67
#> V3 0.36 0.51 1.00 0.67 0.68 0.78
#> V4 0.44 0.59 0.67 1.00 0.78 0.79
#> V5 0.47 0.66 0.68 0.78 1.00 0.80
#> V6 0.49 0.67 0.78 0.79 0.80 1.00

### Attitude #3 cronbach's alpha
alpha(cor(Att_3)) |> round(3)
#> [1] 0.9


## Behavioural Intention

intent_moments <- data.frame(
  mean = apply(intent, 2, mean) |> round(3),
  sd = apply(intent, 2, sd) |> round(3)
) |> t()

### Intention moments
intent_moments
#>        int
#> mean 4.000
#> sd   3.001
correlateScales parameters
### target scale correlation matrix
scale_cors <- matrix(
  c(
    1.0, 0.7, 0.6, 0.5,
    0.7, 1.0, 0.4, 0.3,
    0.6, 0.4, 1.0, 0.2,
    0.5, 0.3, 0.2, 1.0
  ),
  nrow = 4
)

### bring dataframes into a list
data_frames <- list("A1" = Att_1, "A2" = Att_2, "A3" = Att_3, "Int" = intent)

apply the correlateScales() function

### apply correlateScales() function
my_correlated_scales <- correlateScales(
  dataframes = data_frames,
  scalecors = scale_cors
)
#> scalecors  is positive-definite
#> New dataframe successfully created

plot the new correlated scale items

Check the properties of our derived dataframe
## data structure
str(my_correlated_scales)
#> 'data.frame':    128 obs. of  16 variables:
#>  $ A1_1 : num  1 3 4 1 1 4 4 1 3 4 ...
#>  $ A1_2 : num  1 5 4 1 2 4 4 1 4 4 ...
#>  $ A1_3 : num  1 5 5 1 1 5 5 2 5 4 ...
#>  $ A1_4 : num  2 5 5 2 2 5 5 2 5 5 ...
#>  $ A2_1 : num  1 4 4 1 1 4 4 1 4 3 ...
#>  $ A2_2 : num  1 4 5 1 1 4 4 1 4 2 ...
#>  $ A2_3 : num  2 4 5 2 2 4 4 2 4 4 ...
#>  $ A2_4 : num  1 5 5 1 1 4 4 2 4 4 ...
#>  $ A2_5 : num  2 5 5 2 2 5 5 2 5 4 ...
#>  $ A3_1 : num  1 4 4 1 1 4 4 2 4 3 ...
#>  $ A3_2 : num  1 4 4 1 2 4 4 1 4 3 ...
#>  $ A3_3 : num  1 5 5 1 1 5 5 2 4 3 ...
#>  $ A3_4 : num  1 5 5 1 1 5 4 1 4 4 ...
#>  $ A3_5 : num  2 5 5 2 2 5 5 2 5 4 ...
#>  $ A3_6 : num  1 5 5 1 2 5 5 2 5 4 ...
#>  $ Int_1: num  0 10 10 0 0 10 9 0 9 9 ...
## eigenvalues of dataframe correlations
Cor_Correlated_Scales <- cor(my_correlated_scales)
eigenvalues(cormatrix = Cor_Correlated_Scales, scree = TRUE) |> round(2)

#> Cor_Correlated_Scales  is positive-definite
#>  [1] 6.97 2.35 1.12 0.82 0.73 0.70 0.58 0.53 0.45 0.38 0.33 0.30 0.27 0.19 0.14
#> [16] 0.13

#### Eigenvalues of predictor variable items only
Cor_Attitude_items <- cor(my_correlated_scales[, -16])
eigenvalues(cormatrix = Cor_Attitude_items, scree = TRUE) |> round(2)

#> Cor_Attitude_items  is positive-definite
#>  [1] 6.82 2.31 0.87 0.79 0.73 0.63 0.53 0.51 0.38 0.33 0.32 0.30 0.19 0.14 0.14
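
A further check is to compare the correlations among scale scores with the target matrix. The sketch below assumes that scale scores are simple row sums of each scale's items and that columns follow the "scale prefix, underscore, item number" naming pattern seen in the str() output. To stay self-contained it uses a random stand-in dataframe (toy_items, a hypothetical name) rather than my_correlated_scales, so its correlations will be near zero; substitute the real dataframe to compare against scale_cors.

```r
# Stand-in for my_correlated_scales: random item responses that only mimic
# the column-naming pattern (scale prefix, underscore, item number).
set.seed(42)
n <- 128
item_names <- c(
  paste0("A1_", 1:4), paste0("A2_", 1:5), paste0("A3_", 1:6), "Int_1"
)
toy_items <- as.data.frame(
  matrix(
    sample(1:5, n * length(item_names), replace = TRUE),
    nrow = n, dimnames = list(NULL, item_names)
  )
)

# Scale label = column name with the trailing "_<item number>" removed
scale_of <- sub("_[0-9]+$", "", names(toy_items))

# Summated score per respondent for each scale
# (drop = FALSE keeps single-item scales such as "Int" as a dataframe)
scale_scores <- sapply(unique(scale_of), function(s) {
  rowSums(toy_items[, scale_of == s, drop = FALSE])
})

# Correlations among scale scores, to set beside the target matrix
scale_score_cors <- cor(scale_scores)
round(scale_score_cors, 2)
```

Using row means instead of row sums gives the same correlations, because the two differ only by a constant multiplier within each scale.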

Helper functions

LikertMakeR includes two additional functions that may be of help when examining parameters and output.

  • alpha() calculates Cronbach’s Alpha from a given correlation matrix or a given dataframe

  • eigenvalues() calculates eigenvalues of a correlation matrix, reports on whether the matrix is positive-definite, and optionally produces a scree plot.

alpha()

alpha() accepts, as input, either a correlation matrix or a dataframe. If both are submitted, then the correlation matrix is used by default, with a message to that effect.

alpha() examples

## define parameters
df <- data.frame(
  V1 = c(4, 2, 4, 3, 2, 2, 2, 1),
  V2 = c(3, 1, 3, 4, 4, 3, 2, 3),
  V3 = c(4, 1, 3, 5, 4, 1, 4, 2),
  V4 = c(4, 3, 4, 5, 3, 3, 3, 3)
)

corMat <- matrix(
  c(
    1.00, 0.35, 0.45, 0.75,
    0.35, 1.00, 0.65, 0.55,
    0.45, 0.65, 1.00, 0.65,
    0.75, 0.55, 0.65, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply function examples
alpha(cormatrix = corMat)
#> [1] 0.8395062
alpha(data = df)
#> [1] 0.8026658
alpha(NULL, df)
#> [1] 0.8026658
alpha(corMat, df)
#> Alert: 
#> Both cormatrix and data present.
#>                 
#> Using cormatrix by default.
#> [1] 0.8395062
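
As a hand check on the correlation-matrix result, standardised Cronbach's Alpha can be computed from the mean inter-item correlation with the textbook formula alpha = k * r_bar / (1 + (k - 1) * r_bar), where k is the number of items. The sketch below is base R only; std_alpha() is a hypothetical helper name, and this is not necessarily how alpha() is implemented internally.

```r
corMat <- matrix(
  c(
    1.00, 0.35, 0.45, 0.75,
    0.35, 1.00, 0.65, 0.55,
    0.45, 0.65, 1.00, 0.65,
    0.75, 0.55, 0.65, 1.00
  ),
  nrow = 4, ncol = 4
)

# Hypothetical helper: standardised alpha from a correlation matrix
std_alpha <- function(cormatrix) {
  k <- ncol(cormatrix)                             # number of items
  r_bar <- mean(cormatrix[lower.tri(cormatrix)])   # mean off-diagonal correlation
  (k * r_bar) / (1 + (k - 1) * r_bar)
}

std_alpha(corMat)
#> [1] 0.8395062
```

Here the six off-diagonal correlations average 3.40 / 6 ≈ 0.567, so alpha = (4 × 0.567) / (1 + 3 × 0.567) ≈ 0.84.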

eigenvalues()

eigenvalues() calculates eigenvalues of a correlation matrix, reports on whether the matrix is positive-definite, and optionally produces a scree plot.

eigenvalues() examples

## define parameters
correlationMatrix <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply function
evals <- eigenvalues(cormatrix = correlationMatrix)
#> correlationMatrix  is positive-definite

print(evals)
#> [1] 2.7484991 0.8122627 0.3048151 0.1344231

eigenvalues() function with optional scree plot

evals <- eigenvalues(correlationMatrix, scree = TRUE)

#> correlationMatrix  is positive-definite
print(evals)
#> [1] 2.7484991 0.8122627 0.3048151 0.1344231
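
For readers who want to see what such a check involves, the same quantities can be obtained with base R's eigen(). This is a minimal sketch, not the package's own code: a symmetric matrix is positive-definite when all of its eigenvalues are greater than zero, and the eigenvalues of a correlation matrix sum to the number of variables.

```r
correlationMatrix <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

# Eigenvalues only; symmetric = TRUE tells eigen() the input is symmetric
ev <- eigen(correlationMatrix, symmetric = TRUE, only.values = TRUE)$values

# Positive-definite if every eigenvalue is strictly positive
is_pd <- all(ev > 0)

# The trace of a correlation matrix equals the number of variables
total <- sum(ev)

# A basic scree plot, drawn only in an interactive session
if (interactive()) {
  plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
}
```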

Alternative methods & packages

LikertMakeR is intended for synthesising & correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don’t need your data to be close to exact, then other options may be faster or more flexible.

Different approaches include:

  • sampling from a truncated normal distribution

  • sampling with a predetermined probability distribution

  • marginal model specification

sampling from a truncated normal distribution

Data are sampled from a normal distribution, truncated to the rating-scale boundaries, and then rounded to the discrete values found in rating scales.

See Heinz (2021) for an excellent and short example using the following packages:

  • truncnorm

  • faux

See also the rLikert() function from the excellent latent2likert package, Lalovic (2024), for an approach using optimal discretization and a skew-normal distribution. latent2likert converts continuous latent variables into ordinal categories to generate Likert-scale item responses.
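
The sample, truncate, and round idea in this section can be sketched in base R without extra packages. The helper below, r_trunc_rating(), is a hypothetical name for illustration; it draws from a truncated normal by inverse-CDF sampling rather than using truncnorm or faux, and it is not the method from Heinz (2021).

```r
# Hypothetical helper: draw n values from a normal distribution truncated
# to [lower - 0.5, upper + 0.5], then round to whole scale points.
r_trunc_rating <- function(n, mean, sd, lower, upper) {
  # Inverse-CDF sampling: uniform draws restricted to the CDF range
  # that corresponds to the truncation bounds
  p_lo <- pnorm(lower - 0.5, mean, sd)
  p_hi <- pnorm(upper + 0.5, mean, sd)
  x <- qnorm(runif(n, p_lo, p_hi), mean, sd)
  # Round to integers; the clamp guards against half-point edge cases
  pmin(pmax(round(x), lower), upper)
}

set.seed(42)
x <- r_trunc_rating(n = 128, mean = 3.5, sd = 1.0, lower = 1, upper = 5)

table(x)
c(mean = mean(x), sd = sd(x))
```

Because truncation and rounding both alter the distribution, the resulting mean and standard deviation will generally differ somewhat from the values requested.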

sampling with a predetermined probability distribution

  • The following code generates a vector of values with approximately the given probabilities. It is good for simulating a single item.
n <- 128
sample(1:5, n,
  replace = TRUE,
  prob = c(0.1, 0.2, 0.4, 0.2, 0.1)
)
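
With this approach the mean and standard deviation are implied by the chosen probabilities rather than set directly. A short sketch of that arithmetic, using the same probabilities as above (the variable names are illustrative only):

```r
scale_points <- 1:5
prob <- c(0.1, 0.2, 0.4, 0.2, 0.1)

# Moments implied by the probability distribution
expected_mean <- sum(scale_points * prob)
expected_sd <- sqrt(sum(prob * (scale_points - expected_mean)^2))

# One random sample, whose moments only approximate the implied ones
set.seed(42)
x <- sample(scale_points, 128, replace = TRUE, prob = prob)

round(
  c(
    expected_mean = expected_mean, expected_sd = expected_sd,
    sample_mean = mean(x), sample_sd = sd(x)
  ),
  3
)
```

For these probabilities the implied mean is 3 and the implied variance is 1.2 (standard deviation about 1.095); any single sample will scatter around those values.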

Factor Models: Classical Test Theory (CTT)

The psych package has several excellent functions for simulating rating-scale data based on factor loadings. These focus on factor and item correlations rather than item moments. Highly recommended.

Also:

simsem has many functions for simulating and testing data for application in structural equation modelling. See examples at https://simsem.org/
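
To illustrate the general idea of simulating items from factor loadings, here is a minimal base-R sketch of a one-factor model. It does not use psych or simsem and is not how those packages are implemented; the loadings and cut points are hypothetical values chosen for illustration.

```r
set.seed(2024)
n <- 200
loadings <- c(0.8, 0.7, 0.6, 0.5)   # hypothetical standardised loadings

# Common factor score for each respondent
factor_score <- rnorm(n)

# Latent item = loading * factor + unique part, scaled to unit variance
latent_items <- sapply(loadings, function(l) {
  l * factor_score + sqrt(1 - l^2) * rnorm(n)
})

# Cut the continuous latent items into five ordered categories (1 to 5)
cut_points <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)   # hypothetical thresholds
items <- apply(latent_items, 2, findInterval, vec = cut_points)
colnames(items) <- paste0("item_", seq_along(loadings))

# Item moments are a by-product of loadings and thresholds, not inputs
round(colMeans(items), 2)
round(cor(items), 2)
```

This shows the contrast drawn above: the inputs are loadings and thresholds, and the item means and standard deviations follow from them rather than being specified in advance.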

General data simulation

simpr provides a general, simple, and tidyverse-friendly framework for generating simulated data, fitting models on simulations, and tidying model results.


References

Grønneberg, S., Foldnes, N., & Marcoulides, K. M. (2022). covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas. Journal of Statistical Software, 102(3), 1–45. doi:10.18637/jss.v102.i03

Heinz, A. (2021). Simulating Correlated Likert-Scale Data In R: 3 Simple Steps (blog post). https://glaswasser.github.io/simulating-correlated-likert-scale-data/

Lalovic, M. (2024). latent2likert: Converting Latent Variables into Likert Scale Responses. R package version 1.2.2. https://latent2likert.lalovic.io/

Matta, T. H., Rutkowski, L., Rutkowski, D., & Liaw, Y. L. (2018). lsasim: an R package for simulating large-scale assessment data. Large-scale Assessments in Education, 6, 15. doi:10.1186/s40536-018-0068-8

Pornprasertmanit, S., Miller, P., & Schoemann, A. (2021). simsem: R package for simulated structural equation modeling. https://simsem.org/

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. To be published by Springer. (Working draft available at https://personality-project.org/r/book/)

Touloumis, A. (2016). Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal, 8(2), 79-91. doi:10.32614/RJ-2016-034