Bar Graphs

Bar graphs are perhaps the most commonly used kind of data visualization. They’re typically used to display numeric values (on the y-axis) for different categories (on the x-axis). For example, a bar graph would be good for showing the prices of four different kinds of items.

 

Making a Basic Bar Graph

Step 1: To make a basic bar graph, you need a data frame, which is R's structure for storing data tables. A data frame is a list of vectors of equal length.
For example, the following variable df is a data frame containing three vectors: n, s, and b.

> n = c(2, 3, 5)
> s = c("aa", "bb", "cc")
> b = c(TRUE, FALSE, TRUE)
> df = data.frame(n, s, b)       # df is a data frame
Step 2: Once you have a data frame where one column gives the x position of each bar and another column gives the height (y) of each bar, use ggplot() with geom_bar(stat="identity"). Then specify which variables go on the x- and y-axes.
In ggplot2 2.2.0 and later, geom_col() is a shorthand for geom_bar(stat="identity").
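The post shows this step only as an RStudio screenshot. Below is a minimal sketch that reuses the df data frame from Step 1, with s on the x-axis and n as the bar heights. The choice of columns is our assumption:

```r
library(ggplot2)

# Same data frame as in Step 1 (straight quotes, so R can parse the strings)
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

# stat = "identity" uses the y values directly as bar heights
# instead of counting rows
p <- ggplot(df, aes(x = s, y = n)) +
  geom_bar(stat = "identity")

# Display the plot (needed when running as a script rather than in the console)
print(p)
```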
                           
Here we can see a snapshot of RStudio, an integrated development environment (IDE) for R. The usual RStudio screen has four panes:
1. The R script(s) and data view.
2. Workspace, history & Files.
3. Console.
4. Plots, packages, help & Viewer.
Write your code in the R script pane, select all of it, and run it by clicking the highlighted Run button. The console then shows the code being executed, and the Plots tab shows the graph with its x- and y-axes.
Step 3: By default, bar graphs use a very dark grey for the bars. To use a different fill color, set fill. Also by default, the bars have no outline. To add one, set colour.
Here I have used blue for the fill and black for the outline.
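Here is a sketch of those fill and outline settings, again assuming the df data frame from Step 1. Only the s and n columns are rebuilt here:

```r
library(ggplot2)

df <- data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc"))

# fill sets the bar color; colour sets the outline color.
# Both are *set* to constants outside aes(), so no legend is created.
p <- ggplot(df, aes(x = s, y = n)) +
  geom_bar(stat = "identity", fill = "blue", colour = "black")
print(p)
```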
Making Bar Graphs of Counts
You can also group the bars by mapping a second variable, for example to the fill color.
Step 1: Install and load the package that contains the data set you want to plot. The diamonds data set used here comes with ggplot2, so loading ggplot2 with library(ggplot2) is enough.
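Then use the following command to draw a bar graph of the diamonds data set:
ggplot(diamonds, aes(x=cut)) + geom_bar()
# Equivalent to using geom_bar(stat="count"); ggplot2 versions before 2.0.0 used stat="bin"
Because no y variable is given, geom_bar() counts the number of diamonds in each cut category and uses those counts as the bar heights.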
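To see what geom_bar() is counting, you can compute the counts yourself with table() and plot them with stat="identity". This sketch should give the same bar heights as the automatic version:

```r
library(ggplot2)

# geom_bar() counts how many diamonds fall into each cut category
p_auto <- ggplot(diamonds, aes(x = cut)) + geom_bar()

# The same counts computed by hand: one row per cut level, column Freq
counts <- as.data.frame(table(cut = diamonds$cut))
p_manual <- ggplot(counts, aes(x = cut, y = Freq)) +
  geom_bar(stat = "identity")

print(counts)
print(p_manual)
```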

Step 2: You can also look at the diamonds data set itself. Typing diamonds in the console and pressing Enter prints it, but with 53,940 rows the full table is far too large to read that way. Instead, use head(diamonds) to show the first few rows of the table, or tail(diamonds) to show the last few.
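A short sketch of peeking at the data. The n argument of head() and tail() controls how many rows are shown, and the default is 6:

```r
library(ggplot2)  # provides the diamonds data set

# Print only a small part of the table instead of the whole thing
first_rows <- head(diamonds)         # first 6 rows by default
last_rows  <- tail(diamonds, n = 3)  # last 3 rows

print(first_rows)
print(last_rows)
```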
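Step 3: In the example above, the variable on the x-axis (cut) is discrete. To see what happens with a continuous variable, rewrite the command, replacing cut with carat on the x-axis. Older versions of ggplot2 drew a histogram in this case. Current versions draw one bar for each unique carat value, and geom_histogram() is the function to use for a binned histogram.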
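Here is a sketch contrasting the two behaviors in current ggplot2. The binwidth of 0.1 carat for the histogram is an arbitrary choice for illustration:

```r
library(ggplot2)

# Continuous x with geom_bar(): one (very thin) bar per unique carat value
p_bar <- ggplot(diamonds, aes(x = carat)) + geom_bar()

# A true histogram groups carat values into bins of width 0.1
p_hist <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 0.1)

print(p_bar)
print(p_hist)
```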
Using Colors in Bar Graph
Step 1: Map the appropriate variable to the fill aesthetic. We’ll use the uspopchange data set for this example. It contains the percentage change in population for the US states from 2000 to 2010. We’ll take the 10 fastest-growing states and graph their percentage change, coloring the bars by region (Northeast, South, North Central, or West). To do that, map region to the fill parameter, as highlighted. The change goes on the y-axis and the state abbreviation (Abb) goes on the x-axis. All of these values come from the subset table created for the top 10 states.
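The highlighted code itself is not reproduced in the text. This sketch follows the description and assumes gcookbook is installed. The post names the Abb and Change columns, but the Region column name is our assumption:

```r
library(ggplot2)
library(gcookbook)  # provides the uspopchange data set

# Keep the 10 states with the largest percentage change, 2000 to 2010
upc <- head(uspopchange[order(-uspopchange$Change), ], 10)

# Mapping (inside aes()): the Region column controls the bar fill color
p <- ggplot(upc, aes(x = Abb, y = Change, fill = Region)) +
  geom_bar(stat = "identity")
print(p)
```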
Step 2: To change the default colors, use scale_fill_brewer() or scale_fill_manual().
Note that setting a fixed value (for example, one outline colour for every bar) happens outside of aes(), while mapping a variable (for example, fill=Region) happens inside aes(). Run the highlighted command and the result appears in the Plots window.
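Here is a sketch of both color options, under the same assumptions as the previous example (gcookbook installed, a Region column). The hex colors and the "Pastel1" palette are arbitrary example choices. reorder() sorts the states by Change:

```r
library(ggplot2)
library(gcookbook)  # provides the uspopchange data set

upc <- head(uspopchange[order(-uspopchange$Change), ], 10)

# Mapping happens inside aes() (fill = Region); setting happens outside aes()
# (colour = "black" gives every bar the same outline and adds no legend)
base <- ggplot(upc, aes(x = reorder(Abb, Change), y = Change, fill = Region)) +
  geom_bar(stat = "identity", colour = "black") +
  xlab("State")

# Option 1: hand-picked colors, named by region
p_manual <- base +
  scale_fill_manual(values = c("Northeast" = "#CC6666", "South" = "#669933",
                               "North Central" = "#9999CC", "West" = "#FFCC66"))

# Option 2: a ColorBrewer palette
p_brewer <- base + scale_fill_brewer(palette = "Pastel1")

print(p_manual)
print(p_brewer)
```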
Coloring Positive and Negative Bars Differently
Step 1: We’ll use a subset of the climate data and create a new column called pos, which indicates whether the value is positive or negative:
library(ggplot2)
library(gcookbook) # For the data set
csub <- subset(climate, Source=="Berkeley" & Year >= 1900)
csub$pos <- csub$Anomaly10y >= 0
Once we have the data, we can make the graph and map pos to the fill color. Notice that we use position="identity" with the bars. This stops ggplot2 from trying to stack them; older versions of ggplot2 warned that stacking is not well defined for negative numbers:
ggplot(csub, aes(x=Year, y=Anomaly10y, fill=pos)) +
  geom_bar(stat="identity", position="identity")
Then enter this code in the script and click Run.
Step 2: We can change the colors with scale_fill_manual() and remove the legend with guide="none" (the older guide=FALSE is deprecated), as shown in Figure 3-12. We’ll also add a thin black outline around each of the bars by setting colour and specifying linewidth, which is the thickness of the outline, in millimeters (ggplot2 versions before 3.4.0 use size instead):
ggplot(csub, aes(x=Year, y=Anomaly10y, fill=pos)) +
  geom_bar(stat="identity", position="identity", colour="black", linewidth=0.25) +
  scale_fill_manual(values=c("#CCEEFF", "#FFDDDD"), guide="none")
After we run this command we get our result in the Plots window.
Adjusting Bar Width and Spacing
Step 1: To make the bars narrower or wider, set width in geom_bar(). The default value is 0.9; larger values make the bars wider, and smaller values make the bars narrower.
code 1 (for standard-width bars)
code 2 (for narrower bars)
code 3 (for wider bars; the maximum width is 1)
code 4 (for a grouped bar graph with narrow bars)
code 5 (with some space between the bars)
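The five variants above appear in the post only as images. Below is a sketch of each, in the same order. The single-series examples use group means computed from base R's PlantGrowth data, which is our choice of example data. The grouped examples use cabbage_exp from gcookbook:

```r
library(ggplot2)
library(gcookbook)  # provides cabbage_exp

# Mean plant weight per treatment group (example data of our choosing)
pg_mean <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# Code 1: standard width (default width = 0.9)
p1 <- ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_bar(stat = "identity")

# Code 2: narrower bars
p2 <- ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_bar(stat = "identity", width = 0.5)

# Code 3: maximum width; adjacent bars touch
p3 <- ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_bar(stat = "identity", width = 1)

# Code 4: grouped, narrow bars; the dodge width defaults to the bar width,
# so the two bars in each group touch
p4 <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", width = 0.5, position = "dodge")

# Code 5: a dodge width (0.7) larger than the bar width (0.5)
# leaves a gap between the bars in each group
p5 <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", width = 0.5, position = position_dodge(0.7))

print(p5)
```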
Adding Labels to a Bar Graph
Step 1: As with each example, install the package and load it so that you can use its functions and data sets. Install a package once with install.packages("package_name"), then load it in each R session with library(package_name).
Here we are using the gcookbook package.
To display the labels, use the geom_text() function and map the label text to its label aesthetic. Parameters such as vjust and colour control where the labels appear and how they look.
code:
library(ggplot2)
library(gcookbook) # For the data set
# Below the top
ggplot(cabbage_exp, aes(x=interaction(Date, Cultivar), y=Weight)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=Weight), vjust=1.5, colour="white")
# Above the top
ggplot(cabbage_exp, aes(x=interaction(Date, Cultivar), y=Weight)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=Weight), vjust=-0.2)
The image below shows the plot produced by the italicized code above.

Step 2: For grouped bar graphs, you also need to specify position=position_dodge() and give it a value for the dodging width. The default dodge width is 0.9. Because the bars are narrower, you might need to use size to specify a smaller font so the labels fit. The default value of size is 5, so we’ll make it smaller by using 3.

If we add position="dodge" to the bars and remove interaction() from the x-axis, so that x is just Date, we get grouped bars. The bars are colored because Cultivar is mapped to fill.
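Here is a sketch of grouped bars with labels, following the description in Step 2. The labels are dodged by 0.9 to match the default dodge width of the bars, and size = 3 shrinks them from the default of 5:

```r
library(ggplot2)
library(gcookbook)  # provides cabbage_exp

# Grouped bars: Date on the x-axis, one bar per Cultivar side by side
p <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = "dodge") +
  # Dodge the labels by the same width as the bars so each label sits
  # over its own bar
  geom_text(aes(label = Weight), vjust = 1.5, colour = "white",
            position = position_dodge(0.9), size = 3)
print(p)
```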

 

Step 3: Putting labels on stacked bar graphs requires finding the cumulative sum for each stack. To do this, first make sure the data is sorted properly; if it isn’t, the cumulative sum might be calculated in the wrong order. We’ll use the arrange() function from the plyr package. Current versions of library(ggplot2) do not attach plyr, so load it explicitly with library(plyr).

First, sort the data so that the cumulative sum is computed in the right order:

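library(plyr)
# Sort by Date, then by Cultivar in descending order. Since ggplot2 2.2.0,
# the first factor level (c39) is stacked on top, so it must be summed last
ce <- arrange(cabbage_exp, Date, desc(Cultivar))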

After the data is sorted, compute the cumulative sum within each date:

# Get the cumulative sum
ce <- ddply(ce, "Date", transform, label_y=cumsum(Weight))

Then plot the bar graph, placing each label at its cumulative sum:

ggplot(ce, aes(x=Date, y=Weight, fill=Cultivar)) +
  geom_bar(stat="identity") +
  geom_text(aes(y=label_y, label=Weight), vjust=1.5, colour="white")

 

In the image above, the plot on the right-hand side shows the stacked graph with a label near the top of each bar segment. The ddply() function from the plyr package follows R's split-apply-combine approach. It takes a data frame as input and splits it into pieces (here, one per Date). It then applies a function to each piece (here, transform() adds the label_y column) and combines the results back into a single data frame.
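In ggplot2 2.2.0 and later, position_stack() accepts a vjust argument, which places labels inside the stacks without computing cumulative sums by hand. This sketch centers each label in its segment, which is a different placement from the labels near the top shown above:

```r
library(ggplot2)
library(gcookbook)  # provides cabbage_exp

# The text layer inherits fill = Cultivar, so it is grouped and stacked
# exactly like the bars; vjust = 0.5 puts each label at its segment's middle
p <- ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Weight), position = position_stack(vjust = 0.5),
            colour = "white")
print(p)
```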

Making a Cleveland Dot Plot

Cleveland dot plots are sometimes used instead of bar graphs because they reduce visual clutter and are easier to read.

Step 1: The simplest way to create a dot plot is to use geom_point().
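The post shows the code only in a screenshot. Below is a minimal sketch. The text names the name, avg, and lg columns, but the data set (tophitters2001 from gcookbook) and the choice of its first 25 rows are our assumptions:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the tophitters2001 data set

# First 25 players; name is a character column
tophit <- tophitters2001[1:25, ]

# One point per player: batting average (avg) on x, player name on y
p <- ggplot(tophit, aes(x = avg, y = name)) + geom_point()
print(p)
```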

 

 

Step 2: Here we order the dots by avg. By default, the items on an axis are ordered in whatever way suits the data type. name is a character vector, so it’s ordered alphabetically. If it were a factor, it would use the order defined in the factor levels. In this case, we want name to be sorted by a different variable, avg.

The reorder() function takes the name column, turns it into a factor, and sorts the factor levels by avg.

To improve the appearance, the theming system is then used to make the horizontal grid lines dashed.
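Here is a sketch of the sorted and themed dot plot, under the same data set assumption as in Step 1. Removing the vertical grid lines and the grey60 shade are our own styling choices:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the tophitters2001 data set

tophit <- tophitters2001[1:25, ]

# reorder() turns name into a factor whose levels are sorted by avg
p <- ggplot(tophit, aes(x = avg, y = reorder(name, avg))) +
  geom_point(size = 3) +
  theme_bw() +
  theme(
    panel.grid.major.x = element_blank(),  # drop vertical grid lines
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed")
  )
print(p)
```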

 

Step 3: Another way to separate the two groups (the leagues in the lg variable) is to use facets. The facets are displayed in a different order from the sorting order. To change the display order, you must change the order of the factor levels in the lg variable.

You can change the color of the points by changing the set number in the palette name (for example, from "Set1" to "Set2").
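Here is a sketch of the faceted version, under the same data set assumption as before. Putting NL before AL in the lg levels is an arbitrary example of changing the facet display order. palette = "Set1" is the setting to change for other colors:

```r
library(ggplot2)
library(gcookbook)  # assumed source of the tophitters2001 data set

tophit <- tophitters2001[1:25, ]

# Order names by league first, then by batting average within each league
nameorder <- tophit$name[order(tophit$lg, tophit$avg)]
tophit$name <- factor(tophit$name, levels = nameorder)

# The facet display order follows the factor levels of lg, not the sort order
tophit$lg <- factor(tophit$lg, levels = c("NL", "AL"))

p <- ggplot(tophit, aes(x = avg, y = name)) +
  geom_segment(aes(yend = name), xend = 0, colour = "grey50") +
  geom_point(size = 3, aes(colour = lg)) +
  scale_colour_brewer(palette = "Set1", guide = "none") +
  theme_bw() +
  theme(panel.grid.major.y = element_blank()) +
  facet_grid(lg ~ ., scales = "free_y", space = "free_y")
print(p)
```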

 
