I am a new R user, so please forgive me if my question seems simple. Despite searching the R cookbook and a statistical analysis handbook, I have not been able to create a particular graph the way I want it.
I am trying to plot two columns, age and income. Age is an integer value (40, 34, 50, ...), whereas income is a binary value (<=50k, >50k). There are 32561 rows of data with different ages. I want to call plot(age, income), with age on the X-axis and the binary income variable on the Y-axis. Of course this produces two parallel lines of points, because income is a binary variable, which is fine.
The information I am trying to get from this plot is the number of people of each age who fall into each income bucket. To show that, I would like the size of each circle to reflect the number of people of that age within each income class. For example, if at age 25 there were 700 people in the <=50k bracket and 150 in the other bracket, the two points would have different sizes: the 700 people in the <=50k bucket would be represented by a large circle and the 150 by a much smaller one. I want to do this for all ages.
I hope this makes sense. Please tell me if any clarification is needed. Thanks! I'm sure you will hear from me again in the future.
Questions like this are easier to answer when they include example data, but in this case the problem was simple enough to make some up:
    age = rep(c(20, 30, 40, 50, 60), 20)
    income = c(rep(">50k", 80), rep("<=50k", 20))
    df1 = data.frame(age = age, income = income)

First, we prepare a summary statistic: the number of people in each combination of age and income:

    library(plyr)
    # one row per age/income pair, with the number of people in 'count'
    df1_summary = ddply(df1, .(age, income), summarise, count = length(age))

Then we plot the summary, mapping the count to the point size:

    library(ggplot2)
    ggplot(df1_summary, aes(age, income, size = count)) + geom_point()
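If plyr is not available, the same counting step can be sketched in base R. The block below is an illustration, not part of the original answer: the sample data repeats the made-up example, the name `df1_counts` and the scaling factor for `cex` are assumptions, and the plot uses base graphics rather than ggplot2.

```r
# Made-up sample data, same shape as the example in the answer
age    <- rep(c(20, 30, 40, 50, 60), 20)
income <- c(rep(">50k", 80), rep("<=50k", 20))
df1    <- data.frame(age = age, income = income)

# Count the rows in each age/income combination:
# add a column of ones and sum it within each group
df1_counts <- aggregate(count ~ age + income,
                        data = cbind(df1, count = 1),
                        FUN = sum)

# One circle per combination. cex scales the radius, so using the
# square root of the count makes the circle area proportional to it.
# The factor 4 is an arbitrary choice for the largest circle.
income_f <- factor(df1_counts$income)
plot(df1_counts$age, as.integer(income_f),
     cex = 4 * sqrt(df1_counts$count / max(df1_counts$count)),
     yaxt = "n", ylim = c(0.5, nlevels(income_f) + 0.5),
     xlab = "age", ylab = "income")
axis(2, at = seq_along(levels(income_f)), labels = levels(income_f))
```

With ggplot2 loaded, geom_count() is another option worth trying: it counts the observations at each location itself, so ggplot(df1, aes(age, income)) + geom_count() should give a similar picture without a separate summary step.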