Recent studies show that e-commerce sales increased by 43%, rising from $571.2 billion in 2019 to $815.4 billion in 2020, because of COVID-19 (Brewster, 2022). Product pictures, as the first thing customers see online, play an important role in purchasing. This study proposes an experiment that compares product photos with different backgrounds on two websites to show which kind of background has more impact on customers. The authors would like to know whether products photographed against real-world backgrounds have higher purchase rates and add-to-cart rates than products photographed against white backgrounds. While designing the experiment, the researchers reviewed the related literature; many articles claim that, apart from price, visual impact is an important factor affecting purchasing decisions. Researchers of online customer behavior consider interesting pictures to be more attractive to customers than the numbers shown on a website. The authors therefore ask whether products shown against real-world backgrounds have more visual impact on customers and lead to higher purchase rates and add-to-cart rates. The authors plan to design two experimental websites for a start-up furniture company, one with white-background product photos and one with real-world-background product photos. All other factors of the two websites will be held the same. There will be around 3,000 participants in this experiment. Customers will be randomly and evenly directed to one of the websites, and the purchase rate and add-to-cart rate will be recorded.
To measure quantitatively how backgrounds affect the purchase rate and the add-to-cart rate, this experiment will use a one-tailed two-sample proportion test, simulated separately for the two research questions. Each website will be randomly assigned 1,500 customers. Customers’ purchase rate and add-to-cart rate will be recorded, and the experiment will last three months in order to collect enough data. Customers’ decisions will not be controlled or guided. The product photos with real-world backgrounds will be the treatment group, and the product photos with white backgrounds will be the control group. Each research question will be simulated under two scenarios: one with no effect between the two groups and one with the expected effect. In the simulated expected-effect scenarios, product photos with real-world backgrounds have higher purchase rates and add-to-cart rates than product photos with white backgrounds; that is, the real-world images show the higher proportion. On that basis, we recommend that e-commerce merchants use real-world backgrounds rather than white backgrounds.
The COVID-19 pandemic pushed people to shop more online, so the number of people choosing e-commerce has increased sharply since 2019. Many studies show that internet transaction volumes have grown significantly. In the furniture industry, e-commerce also grew by 11% in 2021 (Statista Research Department, 2022). As people became more inclined to shop online, the attractiveness of products shown online became more important to both merchants and customers. The visual impact of product images can be changed by many details, so it is meaningful to know which changes to product images attract more customers to click and purchase. Many factors in product images can affect their attractiveness, such as clarity and close-up details; the background, as the second-largest part of an image after the product itself, is an important factor that deserves to be researched.
Present-day product images use basically two kinds of backgrounds:
real-world backgrounds and white backgrounds. The null hypothesis will
be that products with real-world background images do not influence
consumer behavior. The alternative hypothesis will be that products with
real-world background images influence consumer behavior. The sample
will be randomly and evenly selected from students in different states
in the U.S. The experiment will be run in cooperation with a furniture
shopping website, and there will be two kinds of images with contrasting
backgrounds for the same products. The purchase rate and the add-to-cart
rate will both be analyzed to test the hypotheses. Data on the size and
price of the furniture will also be collected to test whether there are
other relationships. This study aims to answer the following questions:
Research Question 1: Does a product with a real-world
background image have a higher purchase rate than a product with a white
background?
Description: We compare the purchase rates of products
whose photos have different backgrounds. RT represents
the purchase rate of the product in the treatment group, which uses
product photos with a real-world background, and RC
represents the purchase rate of the product in the control group, which
uses product photos with a white background. The suggested effect size
is 10 percentage points.
Null Hypothesis (H0a): RT <= RC
Alternative Hypothesis (H1a): RT > RC
Research Question 2: Does a product with a real-world
background image have a higher add-to-cart rate than a product
with a white background?
Description: We compare the add-to-cart rates of
products whose photos have different backgrounds. RT
represents the add-to-cart rate of the product in the treatment group,
which uses product photos with a real-world background, and
RC represents the add-to-cart rate of the product in the
control group, which uses product photos with a white background. The
suggested effect size is 10 percentage points.
Null Hypothesis (H0b): RT <= RC
Alternative Hypothesis (H1b): RT > RC
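The one-sided hypotheses above can be checked with a pooled two-proportion z test. The sketch below is a minimal illustration under stated assumptions: the counts are hypothetical, the helper name `one_sided_prop_z` is not from the study, and the continuity correction that `prop.test` applies by default is switched off so that the two results are directly comparable.

```r
# Pooled z test for H0: RT <= RC against H1: RT > RC.
# x_t, x_c: number of successes; n_t, n_c: group sizes (hypothetical inputs).
one_sided_prop_z <- function(x_t, n_t, x_c, n_c) {
  r_t <- x_t / n_t
  r_c <- x_c / n_c
  pooled <- (x_t + x_c) / (n_t + n_c)
  se <- sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
  z <- (r_t - r_c) / se
  list(effect = r_t - r_c, z = z, p.value = pnorm(z, lower.tail = FALSE))
}

# Hypothetical counts, for illustration only.
manual <- one_sided_prop_z(x_t = 375, n_t = 1500, x_c = 225, n_c = 1500)

# The same test via base R, without the continuity correction.
builtin <- prop.test(x = c(375, 225), n = c(1500, 1500),
                     alternative = "greater", correct = FALSE)
```

A small p-value leads to rejecting the null hypothesis in favor of RT > RC; the squared z statistic equals the X-squared value that `prop.test` reports.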
E-commerce is now a common form of retail trade and a quickly
expanding industry. Competition on B2C online shopping websites is
becoming increasingly fierce. A simple search for a particular product
on an online shopping platform can return thousands of products
supplied by different sellers. A major advantage of online shopping is
convenience. However, a significant disadvantage is that the product
cannot be touched or felt. Therefore, images play an important role in
online shopping. Much of the existing literature indicates that
consumers’ buying decisions can be affected, directly or indirectly, by
the attention that consumers pay to the product. Other than the price,
visual aspects of the product can also affect a consumer’s purchase
decision to a large extent (Gonchigjav, 2020). It was found that as many
as 87.6% of respondents considered the product image to be central to
their shopping experience (Ergonode, 2022). Researchers studying online
consumer behavior believe that most digital content is consumed by
scrolling: until an interesting graphic catches the user’s eye, it is
difficult to keep them interested for even a short period of time
(Goswami, 2011).
Photos that clearly illustrate the functional characteristics of a
product and are presented in an arrangement that suits a certain
category or industry have a greater impact. Good product images boost
sales and help to develop the overall brand image as professional,
innovative, and detail-oriented, which is particularly essential in the
case of furniture. Among different visual marketing strategies,
designing product photos is regarded as a crucial strategy for
communicating product information effectively, building a positive
first impression of the product, and shaping consumers’ buying
decisions (Jeon & Yoh, 2014). Color could be a variable that affects
consumers’ perception. When product color and background color are
presented together in the consumer’s field of vision, these colors
constitute the visual effect of the product-background color
combination (Huang, 2004). For products shot against a real-life
background, studies show a relationship between the real-world scenario
and attractiveness to consumers. In the experiment by Kim et al. (2014),
product presentations differed significantly in attractiveness,
informativeness, satisfaction, and repurchase intention after
controlling for apparel items and model. Product presentation in
everyday life had greater mean values than product presentation with a
posing model. In other words, background could be an important variable
in the decision to purchase.
Population of Interest
The authors of this research plan are mainly interested in how the
backgrounds of product photos influence purchasing patterns. The
population of interest is everyone in the United States who is willing
to buy furniture online.
Sample Selection
The authors plan to select participants randomly and evenly from
college students in different states in the US using an interval
sampling method. The company automatically directs every other customer
who comes to buy products to the experimental website. The study will
use IP tracking to help make sure that each user stays consistently in
either the control group or the treatment group. The sample is designed
to include more than 3,000 participants. The study will be run in
cooperation with an online furniture-selling website, ideally a start-up
company. Start-up companies are preferred because large companies might
already have their own technical departments doing similar work, and
because start-up companies would be willing to cooperate, since this
study may boost their sales. The online furniture-selling company would
help to make two versions of the website that contain the same products,
one with real-world background pictures of the products (treatment
group) and the other with pure white backgrounds (control group). Both
websites are real, which means that customers can actually place orders
through them. The only difference between the two websites is the
background of the product pictures; everything else is controlled to be
the same. The participants will be randomly assigned to view one of the
two websites. This study aims to collect the following data for
analysis: the percentage of participants who add products to their
shopping cart (the add-to-cart rate) and the purchase rate. The outcome
will be based on purchases of the products within the first 48 hours
after the first view. The control group in this study is the product
pictures with white backgrounds, and the experimental group is the
product pictures with real-world backgrounds.
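The interval assignment and the consistency requirement described above could be sketched as follows. This is a minimal illustration rather than the company's implementation: the function `assign_group`, the in-memory `state` store, and the example visitor identifiers are all hypothetical, and a real deployment would need persistent storage and proper privacy handling of visitor identifiers.

```r
# Hypothetical in-memory state: arrival counter and visitor ID -> group map.
state <- new.env()
state$count <- 0L
state$groups <- character(0)

assign_group <- function(visitor_id) {
  # Returning visitors keep the group they were first given.
  if (visitor_id %in% names(state$groups)) {
    return(state$groups[[visitor_id]])
  }
  # New visitors alternate: odd arrivals -> Treatment, even -> Control.
  state$count <- state$count + 1L
  group <- if (state$count %% 2L == 1L) "Treatment" else "Control"
  state$groups[visitor_id] <- group
  group
}

# Hypothetical visit sequence; the third visit is a returning visitor.
visits <- c("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.4")
groups <- vapply(visits, assign_group, character(1))
```

Alternating on first arrival keeps the two groups within one customer of each other in size, while the lookup prevents a returning visitor from seeing both versions of the site.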
Operational Procedures
The study will be carried out by the study group members, who are also
the authors of this research. Since this is experimental research, the
participants are all the consumers who enter the website, and the system
will randomly direct each of them to one of the two website pages, where
they continue shopping as in a normal shopping experience. One of the
websites is designed to have product pictures with white backgrounds.
The other website is designed to have product pictures with real-life
backgrounds. By comparing the behavior of the two groups of customers,
we expect to learn whether background pictures affect customers’
behavior. The two websites are constructed by the startup company, and
we cooperate with them under a reciprocal agreement: they provide the
construction of the website and the data collection, and we conduct the
analysis of the data. The experiment will be conducted directly on the
startup’s online furniture store website. In addition, the experiment
will run 24 hours a day throughout the experiment period.
Brief Schedule
The whole research will be conducted in three phases. The first phase is
the experiment, run in cooperation with a startup company. The second
phase is the data collection. The last phase is the analysis and the
drawing of conclusions. The whole study will take five months to
complete. The experiment will start on July 15th, 2022, and end on
October 15th, 2022, so it will take three months to complete. This
duration was discussed with the startup company and was chosen because
historical sales data reveal a sales peak during the summer season. The
experiment is estimated to take three months; however, this depends on
whether we have collected enough samples, and the experiment would be
extended until we collect the expected sample size. The data collection
is estimated to be finished in one month, in the middle of November.
Lastly, in the final phase we will analyze the results using relevant
statistical methods and draw conclusions. We will also address
limitations and problems that occurred and directions for future
research.
Data Collection
When the experiment period is over, we will collect the add-to-cart
rate and the purchase rate of the customers in the two groups that
participated in the experiment. We use interval sampling to direct
customers to one of the two websites, so that customers are distributed
equally between the two groups. We will then use both Excel and R to
record and analyze the data.
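As an illustration of how the two rates might be computed from the collected data, the sketch below uses a small made-up visitor table. The column names (`first_view`, `cart_time`, `purchase_time`), the helper `within_window`, and the data are hypothetical; the 48-hour window follows the outcome definition given under Sample Selection.

```r
# Hypothetical per-visitor records; NA means the event never happened.
visitors <- data.frame(
  group = c("Treatment", "Treatment", "Control", "Control"),
  first_view = as.POSIXct(c("2022-07-15 10:00", "2022-07-15 11:00",
                            "2022-07-15 10:30", "2022-07-15 12:00"),
                          tz = "UTC"),
  cart_time = as.POSIXct(c("2022-07-15 10:20", NA,
                           "2022-07-18 09:00", NA), tz = "UTC"),
  purchase_time = as.POSIXct(c("2022-07-16 09:00", NA, NA, NA), tz = "UTC")
)

# TRUE when the event happened within `hours` of the first view.
within_window <- function(event, start, hours = 48) {
  !is.na(event) & difftime(event, start, units = "hours") <= hours
}

visitors$added_to_cart <- within_window(visitors$cart_time, visitors$first_view)
visitors$purchased <- within_window(visitors$purchase_time, visitors$first_view)

# Per-group add-to-cart rate and purchase rate.
cart_rate <- tapply(visitors$added_to_cart, visitors$group, mean)
purchase_rate <- tapply(visitors$purchased, visitors$group, mean)
```

In this toy table the control visitor who added an item to the cart almost three days after the first view is not counted, because the event falls outside the 48-hour window.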
Data Security
As we are collaborating with a startup company, we will access the data
stored in the company’s database, which will use asymmetric encryption
to protect the privacy of the customer data.
Variables
We have two research questions and will simulate two scenarios for each. We will first estimate the statistical power and then simulate the sample groups. For each research question, we randomly assign 3,000 samples to two groups of 1,500 each. Each outcome is binary and is drawn from a binomial distribution with a single trial. For research question 1, we set the proportion to 0.15 in the control group in both scenarios and to 0.15 and 0.25 in the treatment group in scenarios 1 and 2, respectively. For research question 2, we set the proportion to 0.25 in the control group in both scenarios, since the add-to-cart proportion is higher than the purchase proportion, and to 0.25 and 0.35 in the treatment group in scenarios 1 and 2, respectively. We will use a one-sided two-sample proportion test on each simulated data set and a one-sided two-sample t test on the binary outcomes in the repeated simulations. We will then repeat the simulation 1,000 times. After the simulations are done, we will analyze the mean effects, confidence bounds, and the percentages of Type I and Type II errors. We will compare these indicators in a summary table.
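The simulation plan described above can be sketched compactly in base R. This is a simplified illustration, not the study's code: the helper `simulate_rejection_rate` is a hypothetical name, it uses `prop.test` throughout, and it runs fewer repetitions than the planned analysis to keep the example fast.

```r
# Share of simulated experiments in which the one-sided test rejects H0.
simulate_rejection_rate <- function(p_control, p_treatment,
                                    n_per_group = 1500, reps = 200,
                                    alpha = 0.05) {
  p_values <- replicate(reps, {
    # Number of successes in each group for one simulated experiment.
    x_t <- rbinom(1, size = n_per_group, prob = p_treatment)
    x_c <- rbinom(1, size = n_per_group, prob = p_control)
    prop.test(x = c(x_t, x_c), n = c(n_per_group, n_per_group),
              alternative = "greater")$p.value
  })
  mean(p_values < alpha)
}

set.seed(1031)
# Scenario 1 (no effect): the rejection rate estimates the Type I error rate.
false_positive_rate <- simulate_rejection_rate(0.15, 0.15)
# Scenario 2 (expected effect): the rejection rate estimates the power,
# and one minus it estimates the Type II error rate.
true_positive_rate <- simulate_rejection_rate(0.15, 0.25)
```

Under the no-effect scenario the rejection rate should sit near the significance level, and under the expected-effect scenario it should be close to 1 for groups of this size.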
The total sample size is 3,000, with 1,500 in each group. We chose this sample size to obtain higher statistical power. We use a power analysis to estimate the statistical power, which is 0.86 at a significance level of 0.05.
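The power for a difference between two proportions can also be computed directly with base R's `power.prop.test`, as a cross-check. The sketch below assumes the purchase-rate scenario from the simulation plan (0.15 versus 0.25) with 1,500 customers per group. It is an alternative calculation under a normal approximation, whereas the reported figure comes from a t-test power calculation with a standardized effect size of d = 0.1, so the two numbers need not agree.

```r
# One-sided power for detecting 0.15 vs 0.25 with 1500 customers per group.
power_check <- power.prop.test(n = 1500, p1 = 0.15, p2 = 0.25,
                               sig.level = 0.05,
                               alternative = "one.sided")
estimated_power <- power_check$power
```

A standardized effect size (Cohen's d) and a raw difference in proportions are measured on different scales, which is why it may be worth stating explicitly which one the "10%" effect size refers to.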
There are some possible recommendations related to the study. First, if we fail to reject the null hypothesis, which means that the pictures with real-world backgrounds do not work better than those with white backgrounds, we would suggest continuing to use white-background photos for online shopping. However, if the statistical test shows that the null hypothesis should be rejected, we may conclude that pictures with real-world backgrounds may generate more profit for retail companies; we would then suggest adding more real-life pictures and possibly hiring photographers to achieve this goal.
Our experiment was designed to randomly select participants and redirect them to either the control-group website, which shows product photos with blank backgrounds, or the treatment-group website, which shows product photos with real-life backgrounds. Although this setting helps us learn the real reactions of the company’s target customers to different backgrounds of product photos, we are not able to collect or control some characteristics of the sample, such as customers’ demographic information (age, income, and marital status), which can affect consumers’ purchasing behavior. Also, as each participant will be linked to only one type of website, we cannot measure the effect of different backgrounds of product photos on the click rate. Additionally, since furniture is more expensive and depreciates more slowly than other products, the average purchase rate of furniture would be lower, which leads to a small effect size and therefore requires a larger sample size to have enough statistical power. As a result, collecting enough sample data would be a costly part of the research, considering the budget limit of a startup company.
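To illustrate why a low baseline purchase rate drives up the required sample size, the sketch below solves for the per-group sample size with base R's `power.prop.test`. The rates used (2% versus 3%) and the 80% power target are hypothetical values chosen only for illustration; they are not estimates from the study.

```r
# Solve for n per group when the baseline rate and the lift are both small.
needed <- power.prop.test(p1 = 0.02, p2 = 0.03, power = 0.80,
                          sig.level = 0.05, alternative = "one.sided")
n_per_group <- ceiling(needed$n)
```

Under these assumed rates, the required group size exceeds the 1,500 customers per group planned here, which is the kind of cost pressure the limitation above describes.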
library(pwr)
library(data.table)
library(DT)
library(dplyr)
library(kableExtra)
pwr.test = pwr.t2n.test(n1 = 1500, n2 = 1500, d = 0.1, sig.level = 0.05,
alternative = "greater")
pwr.test
##
## t test power calculation
##
## n1 = 1500
## n2 = 1500
## d = 0.1
## sig.level = 0.05
## power = 0.8628341
## alternative = greater
n = 3000
set.seed(1031)
# Randomly assigning 3000 participants to two groups gives 1500 in each.
purchase_rate_S1.dat = data.table(Group = c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2)))
purchase_rate_S1.dat[Group == "Control", PR := round(x = rbinom(n = 1500, size = 1, prob= 0.15))]
purchase_rate_S1.dat[Group == "Treatment", PR := round(x = rbinom(n = 1500, size = 1, prob = 0.15))]
datatable(data = purchase_rate_S1.dat)
table(purchase_rate_S1.dat)
## PR
## Group 0 1
## Control 1282 218
## Treatment 1286 214
#Number of people in Treatment group with Purchase Rate as 1
purchase_rate_treatment_S1 = purchase_rate_S1.dat%>%
filter(PR==1, Group == 'Treatment')%>%
nrow()
#Number of people in Control group with Purchase Rate as 1
purchase_rate_control_S1 = purchase_rate_S1.dat%>%
filter(PR==1, Group == 'Control')%>%
nrow()
Applying the two sample proportion test
purchase_rate_S1 = prop.test(x = c(purchase_rate_treatment_S1,purchase_rate_control_S1), n = c(n/2, n/2),alternative = 'greater'); purchase_rate_S1
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(purchase_rate_treatment_S1, purchase_rate_control_S1) out of c(n/2, n/2)
## X-squared = 0.024338, df = 1, p-value = 0.562
## alternative hypothesis: greater
## 95 percent confidence interval:
## -0.02442018 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.1426667 0.1453333
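# Analyze one experiment with a one-sided two-sample t test on the
# binary purchase indicator PR (Treatment vs. Control).
analyze.experiment <- function(the.dat) {
require(data.table)
setDT(the.dat)
the.test <- t.test(x = the.dat[Group == "Treatment",
PR], y = the.dat[Group == "Control", PR], alternative = "greater")
# Estimated effect: difference between the two group proportions.
the.effect <- the.test$estimate[1] - the.test$estimate[2]
# Upper limit of the one-sided interval (Inf when alternative = "greater").
upper.bound <- the.test$conf.int[2]
p <- the.test$p.value
result <- data.table(effect = the.effect, upper_ci = upper.bound,
p = p)
return(result)
}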
analyze.experiment(purchase_rate_S1.dat)
## effect upper_ci p
## 1: -0.002666667 Inf 0.5823553
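With `alternative = "greater"`, the interval returned by `t.test` is one-sided, so its upper limit is always `Inf`, as the `upper_ci` column above shows; the finite limit is the lower one. The sketch below demonstrates this on simulated data. The variable names and the simulated proportions are illustrative assumptions, not part of the original analysis.

```r
set.seed(1031)
# Simulated binary outcomes (1 = purchase), 1500 customers per group.
treatment <- rbinom(n = 1500, size = 1, prob = 0.25)
control <- rbinom(n = 1500, size = 1, prob = 0.15)

one_sided <- t.test(x = treatment, y = control, alternative = "greater")
lower_bound <- one_sided$conf.int[1]  # finite lower limit of the effect
upper_bound <- one_sided$conf.int[2]  # Inf for a "greater" alternative
```

Reporting the lower limit instead would give a more informative summary of how large the effect is likely to be at minimum.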
B <- 1000
n <- 3000
RNGversion(vstr = 3.6)
set.seed(1031)
Experiment <- 1:B
Group <- c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2))
sim.dat_r1s1 <- as.data.table(expand.grid(Experiment = Experiment, Group = Group))
setorderv(x = sim.dat_r1s1, cols = c("Experiment", "Group"), order = c(1,1))
sim.dat_r1s1[Group == "Control", PR := round(x = rbinom(n = .N, size = 1, prob= 0.15), digits = 1)]
sim.dat_r1s1[Group == "Treatment", PR := round(x = rbinom(n = .N, size = 1, prob = 0.15), digits = 1)]
dim(sim.dat_r1s1)
## [1] 3000000 3
exp.results_r1s1 <- sim.dat_r1s1[, analyze.experiment(the.dat = .SD),
keyby = "Experiment"]
DT::datatable(data = round(x = exp.results_r1s1[1:100, ], digits = 3),
rownames = F)
pvalue = mean(exp.results_r1s1$p)
table_r1s1 <- data.table(Research_Question = "Question 1",
Scenario = "No Effect",
Mean_Effect_in_Simulated_Data = mean(exp.results_r1s1$effect),
Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect = mean(exp.results_r1s1$upper_ci),
Percentage_of_False_Positives = exp.results_r1s1[, mean(p < 0.05)],
Percentage_of_True_Negative = 1-exp.results_r1s1[, mean(p < 0.05)],
Percentage_of_False_Negative = "",
Percentage_of_True_Positives = ""
)
table_r1s1
## Research_Question Scenario Mean_Effect_in_Simulated_Data
## 1: Question 1 No Effect 0.00047
## Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect
## 1: Inf
## Percentage_of_False_Positives Percentage_of_True_Negative
## 1: 0.052 0.948
## Percentage_of_False_Negative Percentage_of_True_Positives
## 1:
#Purchase Rate - > Scenario 2: An expected effect
n <- 3000
set.seed(1031)
purchase_rate_S2.dat <- data.table(Group = c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2)))
purchase_rate_S2.dat[Group == "Control", PR := round(x = rbinom(n = 1500, size = 1, prob= 0.15))]
purchase_rate_S2.dat[Group == "Treatment", PR := round(x = rbinom(n = 1500, size = 1, prob = 0.25))]
datatable(data = purchase_rate_S2.dat)
table(purchase_rate_S2.dat)
## PR
## Group 0 1
## Control 1282 218
## Treatment 1120 380
# Number of people in Treatment group with Purchase Rate as 1
purchase_rate_treatment_S2 = purchase_rate_S2.dat%>%
filter(PR==1, Group== 'Treatment')%>%
nrow()
# Number of people in Control group with Purchase Rate as 1
purchase_rate_control_S2 = purchase_rate_S2.dat%>%
filter(PR==1, Group == 'Control')%>%
nrow()
Applying the two sample proportion test
purchase_rate_S2 = prop.test(x = c(purchase_rate_treatment_S2 ,purchase_rate_control_S2), n = c(n/2, n/2),alternative = 'greater'); purchase_rate_S2
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(purchase_rate_treatment_S2, purchase_rate_control_S2) out of c(n/2, n/2)
## X-squared = 54.138, df = 1, p-value = 9.347e-14
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.083559 1.000000
## sample estimates:
## prop 1 prop 2
## 0.2533333 0.1453333
analyze.experiment(purchase_rate_S2.dat)
## effect upper_ci p
## 1: 0.108 Inf 5.304356e-14
B <- 1000
n <- 3000
RNGversion(vstr = 3.6)
set.seed(1031)
Experiment <- 1:B
Group <- c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2))
sim.dat_r1s2 <- as.data.table(expand.grid(Experiment = Experiment, Group = Group))
setorderv(x = sim.dat_r1s2, cols = c("Experiment", "Group"), order = c(1,1))
sim.dat_r1s2[Group == "Control", PR := round(x = rbinom(n = .N, size = 1, prob= 0.15), digits = 1)]
sim.dat_r1s2[Group == "Treatment", PR := round(x = rbinom(n = .N, size = 1, prob = 0.25), digits = 1)]
dim(sim.dat_r1s2)
## [1] 3000000 3
exp.results_r1s2 <- sim.dat_r1s2[, analyze.experiment(the.dat = .SD),
keyby = "Experiment"]
DT::datatable(data = round(x = exp.results_r1s2[1:100, ], digits = 3),
rownames = F)
pvalue = mean(exp.results_r1s2$p)
table_r1s2 <- data.table(Research_Question = "Question 1",
Scenario = "Expected Effect",
Mean_Effect_in_Simulated_Data = mean(exp.results_r1s2$effect),
Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect = mean(exp.results_r1s2$upper_ci),
Percentage_of_False_Positives = "",
Percentage_of_True_Negative = "",
Percentage_of_False_Negative = 1-exp.results_r1s2[, mean(p < 0.05)],
Percentage_of_True_Positives = exp.results_r1s2[, mean(p < 0.05)]
)
table_r1s2
## Research_Question Scenario Mean_Effect_in_Simulated_Data
## 1: Question 1 Expected Effect 0.1002933
## Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect
## 1: Inf
## Percentage_of_False_Positives Percentage_of_True_Negative
## 1:
## Percentage_of_False_Negative Percentage_of_True_Positives
## 1: 0 1
n <- 3000
set.seed(1031)
# Randomly assigning 3000 participants to two groups gives 1500 in each.
add_to_cart_S1.dat <- data.table(Group = c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2)))
add_to_cart_S1.dat[Group == "Control", ACR := round(x = rbinom(n = 1500, size = 1, prob= 0.25))]
add_to_cart_S1.dat[Group == "Treatment", ACR := round(x = rbinom(n = 1500, size = 1, prob = 0.25))]
datatable(data = add_to_cart_S1.dat)
table(add_to_cart_S1.dat)
## ACR
## Group 0 1
## Control 1127 373
## Treatment 1120 380
#Number of people in Treatment group with Add to Cart Rate as 1
add_to_cart_treatment_S1 = add_to_cart_S1.dat%>%
filter(ACR==1, Group== 'Treatment')%>%
nrow()
#Number of people in Control group with Add to Cart Rate as 1
add_to_cart_control_S1 = add_to_cart_S1.dat%>%
filter(ACR==1, Group == 'Control')%>%
nrow()
Applying the two sample proportion test
add_to_cart_S1 = prop.test(x = c(add_to_cart_treatment_S1,add_to_cart_control_S1), n = c(n/2, n/2),alternative = 'greater'); add_to_cart_S1
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(add_to_cart_treatment_S1, add_to_cart_control_S1) out of c(n/2, n/2)
## X-squared = 0.06383, df = 1, p-value = 0.4003
## alternative hypothesis: greater
## 95 percent confidence interval:
## -0.02204163 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.2533333 0.2486667
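# Analyze one experiment with a one-sided two-sample t test on the
# binary add-to-cart indicator ACR (Treatment vs. Control).
analyze.experiment <- function(the.dat) {
require(data.table)
setDT(the.dat)
the.test <- t.test(x = the.dat[Group == "Treatment",
ACR], y = the.dat[Group == "Control", ACR], alternative = "greater")
# Estimated effect: difference between the two group proportions.
the.effect <- the.test$estimate[1] - the.test$estimate[2]
# Upper limit of the one-sided interval (Inf when alternative = "greater").
upper.bound <- the.test$conf.int[2]
p <- the.test$p.value
result <- data.table(effect = the.effect, upper_ci = upper.bound,
p = p)
return(result)
}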
analyze.experiment(add_to_cart_S1.dat)
## effect upper_ci p
## 1: 0.004666667 Inf 0.384137
B <- 1000
n <- 3000
RNGversion(vstr = 3.6)
set.seed(1031)
Experiment <- 1:B
Group <- c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2))
sim.dat_r2s1 <- as.data.table(expand.grid(Experiment = Experiment, Group = Group))
setorderv(x = sim.dat_r2s1, cols = c("Experiment", "Group"), order = c(1,1))
sim.dat_r2s1[Group == "Control", ACR := round(x = rbinom(n = .N, size = 1, prob= 0.25), digits = 1)]
sim.dat_r2s1[Group == "Treatment", ACR := round(x = rbinom(n = .N, size = 1, prob = 0.25), digits = 1)]
dim(sim.dat_r2s1)
## [1] 3000000 3
exp.results_r2s1 <- sim.dat_r2s1[, analyze.experiment(the.dat = .SD),
keyby = "Experiment"]
DT::datatable(data = round(x = exp.results_r2s1[1:100, ], digits = 3),
rownames = F)
pvalue = mean(exp.results_r2s1$p)
table_r2s1 <- data.table(Research_Question = "Question 2",
Scenario = "No Effect",
Mean_Effect_in_Simulated_Data = mean(exp.results_r2s1$effect),
Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect = mean(exp.results_r2s1$upper_ci),
Percentage_of_False_Positives = exp.results_r2s1[, mean(p < 0.05)],
Percentage_of_True_Negative = 1-exp.results_r2s1[, mean(p < 0.05)],
Percentage_of_False_Negative = "",
Percentage_of_True_Positives = ""
)
table_r2s1
## Research_Question Scenario Mean_Effect_in_Simulated_Data
## 1: Question 2 No Effect 0.0001926667
## Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect
## 1: Inf
## Percentage_of_False_Positives Percentage_of_True_Negative
## 1: 0.06 0.94
## Percentage_of_False_Negative Percentage_of_True_Positives
## 1:
n <- 3000
set.seed(1031)
# Randomly assigning 3000 participants to two groups gives 1500 in each.
add_to_cart_S2.dat <- data.table(Group = c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2)))
add_to_cart_S2.dat[Group == "Control", ACR := round(x = rbinom(n = 1500, size = 1, prob= 0.25))]
add_to_cart_S2.dat[Group == "Treatment", ACR := round(x = rbinom(n = 1500, size = 1, prob = 0.35))]
datatable(data = add_to_cart_S2.dat)
table(add_to_cart_S2.dat)
## ACR
## Group 0 1
## Control 1127 373
## Treatment 972 528
#Number of people in Treatment group with Add to Cart Rate as 1
add_to_cart_treatment_S2 = add_to_cart_S2.dat%>%
filter(ACR==1, Group== 'Treatment')%>%
nrow()
#Number of people in Control group with Add to Cart Rate as 1
add_to_cart_control_S2 = add_to_cart_S2.dat%>%
filter(ACR==1, Group == 'Control')%>%
nrow()
Applying the two sample proportion test
add_to_cart_S2 = prop.test(x = c(add_to_cart_treatment_S2 ,add_to_cart_control_S2), n = c(n/2, n/2),alternative = 'greater');add_to_cart_S2
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(add_to_cart_treatment_S2, add_to_cart_control_S2) out of c(n/2, n/2)
## X-squared = 37.621, df = 1, p-value = 4.297e-10
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.07530971 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.3520000 0.2486667
analyze.experiment(add_to_cart_S2.dat)
## effect upper_ci p
## 1: 0.1033333 Inf 3.001506e-10
B <- 1000
n <- 3000
RNGversion(vstr = 3.6)
set.seed(1031)
Experiment <- 1:B
Group <- c(rep.int(x = "Treatment", times = n/2), rep.int(x = "Control", times = n/2))
sim.dat_r2s2 <- as.data.table(expand.grid(Experiment = Experiment, Group = Group))
setorderv(x = sim.dat_r2s2, cols = c("Experiment", "Group"), order = c(1,1))
sim.dat_r2s2[Group == "Control", ACR := round(x = rbinom(n = .N, size = 1, prob= 0.25), digits = 1)]
sim.dat_r2s2[Group == "Treatment", ACR := round(x = rbinom(n = .N, size = 1, prob = 0.35), digits = 1)]
dim(sim.dat_r2s2)
## [1] 3000000 3
exp.results_r2s2 <- sim.dat_r2s2[, analyze.experiment(the.dat = .SD),
keyby = "Experiment"]
DT::datatable(data = round(x = exp.results_r2s2[1:100, ], digits = 3),
rownames = F)
pvalue = mean(exp.results_r2s2$p)
table_r2s2 <- data.table(Research_Question = "Question 2",
Scenario = "Expected Effect",
Mean_Effect_in_Simulated_Data = mean(exp.results_r2s2$effect),
Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect = mean(exp.results_r2s2$upper_ci),
Percentage_of_False_Positives = "",
Percentage_of_True_Negative = "",
Percentage_of_False_Negative = 1-exp.results_r2s2[, mean(p < 0.05)],
Percentage_of_True_Positives = exp.results_r2s2[, mean(p < 0.05)]
)
table_r2s2
## Research_Question Scenario Mean_Effect_in_Simulated_Data
## 1: Question 2 Expected Effect 0.09999867
## Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect
## 1: Inf
## Percentage_of_False_Positives Percentage_of_True_Negative
## 1:
## Percentage_of_False_Negative Percentage_of_True_Positives
## 1: 0 1
Results = rbind(table_r1s1,table_r1s2,table_r2s1,table_r2s2);
Results %>%
kbl() %>%
kable_styling()
Research_Question | Scenario | Mean_Effect_in_Simulated_Data | Ninety_Five_Percent_Confidence_Interval_of_Mean_Effect | Percentage_of_False_Positives | Percentage_of_True_Negative | Percentage_of_False_Negative | Percentage_of_True_Positives |
---|---|---|---|---|---|---|---|
Question 1 | No Effect | 0.0004700 | Inf | 0.052 | 0.948 | | |
Question 1 | Expected Effect | 0.1002933 | Inf | | | 0 | 1 |
Question 2 | No Effect | 0.0001927 | Inf | 0.06 | 0.94 | | |
Question 2 | Expected Effect | 0.0999987 | Inf | | | 0 | 1 |