Friday, November 21, 2014

Popular software skills in Data Science job postings


This exercise was done to find the software skills most often required in data science job postings on popular job websites. The skill requirements text was extracted, cleansed and mined. R was used for the work, with its packages tm (text mining, used here for cleansing) and arules (association rule mining). The findings are presented below.
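The post does not include its cleansing code, and the original work used R's tm package. As a rough Python analogue (an assumption, not the author's method), the sketch below lowercases a posting's requirements text, turns punctuation into spaces and keeps only terms from a skill vocabulary. The `SKILLS` list and the sample posting are hypothetical. The names are borrowed from skills that appear in the results.

```python
import re

# Hypothetical skill vocabulary (an assumption; the post does not publish its list).
# Multi-word skills such as "big data" are matched as whole phrases.
SKILLS = ["r", "python", "sql", "java", "javascript", "hadoop", "hive", "pig",
          "spark", "mapreduce", "sas", "matlab", "perl", "ruby", "tableau", "big data"]

def clean(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def extract_skills(text: str) -> set[str]:
    """Return the set of vocabulary skills mentioned in one posting."""
    # Pad with spaces so each skill matches only as a whole word or phrase
    # ("java" does not match inside "javascript").
    cleaned = f" {clean(text)} "
    return {skill for skill in SKILLS if f" {skill} " in cleaned}

posting = "Experience with R, Python and SQL; Hadoop/Hive a plus. Big Data exposure preferred."
print(sorted(extract_skills(posting)))
```

Matching a single-letter name such as R on whole words is crude: text like "R&D" is cleansed to "r d" and would also count as R, so a real pipeline would need extra rules for such cases.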

The following is the word cloud created from the software skill requirements in the job postings.

[Figure: word cloud of the software skill requirements in the job postings]

The following chart shows the relative frequency of each skill. A frequency of 0.5 means that the skill appears in 50% of the postings.

[Chart: relative frequency of each skill across the job postings]

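As seen, R, Python and SQL are the top three skills, appearing in about 71%, 63% and 57% of the postings respectively (rules 1 to 6 in the table below). Java, found in about 41% of the postings, continues to be a favorite programming language. Interestingly, SQL (57%) outranks Hadoop (47%) in the skill list.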
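The frequency used here counts each skill at most once per posting and divides by the number of postings. Below is a minimal Python sketch of that calculation. It assumes each posting has already been reduced to a set of skills. The five postings are made up for illustration and are not the study's data.

```python
from collections import Counter

def skill_frequencies(postings: list[set[str]]) -> dict[str, float]:
    """Return the fraction of postings that mention each skill, most frequent first."""
    n = len(postings)
    # Each posting is a set, so a skill is counted at most once per posting.
    counts = Counter(skill for skills in postings for skill in skills)
    return {skill: count / n for skill, count in counts.most_common()}

# Made-up postings, each already reduced to its set of skills.
postings = [
    {"R", "python", "sql"},
    {"R", "hadoop"},
    {"R", "python"},
    {"sql", "java"},
    {"R", "sas"},
]
freqs = skill_frequencies(postings)
print(freqs["R"], freqs["python"], freqs["java"])  # 0.8 0.4 0.2
print(next(iter(freqs)))  # R (the most frequent skill comes first)
```

A value of 0.8 here means the skill appears in 80% of the postings, the same reading as the chart's 0.5 meaning 50%.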

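Association rule mining (ARM) was then used to find which skills occur together. The rules found on this skill set are listed below. For a rule {A} => {B}, support is the fraction of postings that list both A and B, confidence is the fraction of postings listing A that also list B, and lift is the confidence divided by the overall frequency of B. A lift above 1 means the two skills appear together more often than they would if they were independent. Rules 1 to 6 have an empty left-hand side ({}), so their support and confidence are simply the overall frequency of each skill.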
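Here is a minimal Python sketch of the three measures for a single rule. The post itself used R's arules, whose implementation is not reproduced here. The 49 postings below are synthetic: they are built so that 31 list Python, 35 list R and 26 list both. These counts are an assumption, chosen because they reproduce the support, confidence and lift of rule 81 ({python} => {R}). The post does not state how many postings were analyzed.

```python
def rule_measures(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs => rhs.

    transactions: list of sets, one set of skills per job posting.
    lhs, rhs: sets of skills; an empty lhs gives confidence == support.
    """
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)           # postings containing lhs
    n_rhs = sum(1 for t in transactions if rhs <= t)           # postings containing rhs
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)  # postings containing both
    support = n_both / n
    confidence = n_both / n_lhs      # raises ZeroDivisionError if lhs never occurs
    lift = confidence / (n_rhs / n)  # > 1: together more often than if independent
    return support, confidence, lift

# Synthetic data: 26 postings with both skills, 5 with Python only,
# 9 with R only and 9 with neither (49 in total).
postings = ([{"python", "R"}] * 26 + [{"python"}] * 5
            + [{"R"}] * 9 + [set()] * 9)
support, confidence, lift = rule_measures(postings, {"python"}, {"R"})
print(f"{support:.7f} {confidence:.7f} {lift:.7f}")  # 0.5306122 0.8387097 1.1741935
```

Calling `rule_measures(postings, set(), {"R"})` returns the overall frequency of R as both support and confidence, with a lift of 1. This is the same pattern as rules 1 to 6.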

    lhs                    rhs          support    confidence lift
1   {}                  => {sas}        0.3469388  0.3469388  1.0000000
2   {}                  => {java}       0.4081633  0.4081633  1.0000000
3   {}                  => {hadoop}     0.4693878  0.4693878  1.0000000
4   {}                  => {sql}        0.5714286  0.5714286  1.0000000
5   {}                  => {python}     0.6326531  0.6326531  1.0000000
6   {}                  => {R}          0.7142857  0.7142857  1.0000000
7   {tableau}           => {R}          0.1020408  1.0000000  1.4000000
8   {javascript}        => {java}       0.1224490  1.0000000  2.4500000
9   {java}              => {javascript} 0.1224490  0.3000000  2.4500000
10  {javascript}        => {sql}        0.1020408  0.8333333  1.4583333
11  {javascript}        => {python}     0.1020408  0.8333333  1.3172043
12  {big data}          => {hadoop}     0.1020408  0.7142857  1.5217391
13  {spark}             => {hive}       0.1224490  0.8571429  3.2307692
14  {hive}              => {spark}      0.1224490  0.4615385  3.2307692
15  {spark}             => {hadoop}     0.1224490  0.8571429  1.8260870
16  {spark}             => {R}          0.1020408  0.7142857  1.0000000
17  {perl}              => {sql}        0.1224490  1.0000000  1.7500000
18  {perl}              => {python}     0.1224490  1.0000000  1.5806452
19  {perl}              => {R}          0.1020408  0.8333333  1.1666667
20  {mapreduce}         => {hive}       0.1020408  0.5555556  2.0940171
21  {hive}              => {mapreduce}  0.1020408  0.3846154  2.0940171
22  {mapreduce}         => {hadoop}     0.1632653  0.8888889  1.8937198
23  {hadoop}            => {mapreduce}  0.1632653  0.3478261  1.8937198
24  {mapreduce}         => {R}          0.1224490  0.6666667  0.9333333
25  {ruby}              => {java}       0.1020408  0.6250000  1.5312500
26  {ruby}              => {sql}        0.1632653  1.0000000  1.7500000
27  {ruby}              => {python}     0.1428571  0.8750000  1.3830645
28  {ruby}              => {R}          0.1020408  0.6250000  0.8750000
29  {pig}               => {hive}       0.1428571  0.7777778  2.9316239
30  {hive}              => {pig}        0.1428571  0.5384615  2.9316239
31  {pig}               => {java}       0.1020408  0.5555556  1.3611111
32  {pig}               => {hadoop}     0.1428571  0.7777778  1.6570048
33  {hadoop}            => {pig}        0.1428571  0.3043478  1.6570048
34  {pig}               => {sql}        0.1224490  0.6666667  1.1666667
35  {pig}               => {python}     0.1224490  0.6666667  1.0537634
36  {pig}               => {R}          0.1632653  0.8888889  1.2444444
37  {matlab}            => {hive}       0.1224490  0.4615385  1.7396450
38  {hive}              => {matlab}     0.1224490  0.4615385  1.7396450
39  {matlab}            => {java}       0.1020408  0.3846154  0.9423077
40  {matlab}            => {hadoop}     0.1224490  0.4615385  0.9832776
41  {matlab}            => {sql}        0.1428571  0.5384615  0.9423077
42  {matlab}            => {python}     0.2040816  0.7692308  1.2158809
43  {python}            => {matlab}     0.2040816  0.3225806  1.2158809
44  {matlab}            => {R}          0.2448980  0.9230769  1.2923077
45  {R}                 => {matlab}     0.2448980  0.3428571  1.2923077
46  {hive}              => {java}       0.1020408  0.3846154  0.9423077
47  {hive}              => {hadoop}     0.2040816  0.7692308  1.6387960
48  {hadoop}            => {hive}       0.2040816  0.4347826  1.6387960
49  {hive}              => {sql}        0.2040816  0.7692308  1.3461538
50  {sql}               => {hive}       0.2040816  0.3571429  1.3461538
51  {hive}              => {python}     0.1632653  0.6153846  0.9727047
52  {hive}              => {R}          0.2040816  0.7692308  1.0769231
53  {sas}               => {java}       0.1224490  0.3529412  0.8647059
54  {java}              => {sas}        0.1224490  0.3000000  0.8647059
55  {sas}               => {hadoop}     0.1428571  0.4117647  0.8772379
56  {hadoop}            => {sas}        0.1428571  0.3043478  0.8772379
57  {sas}               => {sql}        0.2040816  0.5882353  1.0294118
58  {sql}               => {sas}        0.2040816  0.3571429  1.0294118
59  {sas}               => {python}     0.2040816  0.5882353  0.9297913
60  {python}            => {sas}        0.2040816  0.3225806  0.9297913
61  {sas}               => {R}          0.3061224  0.8823529  1.2352941
62  {R}                 => {sas}        0.3061224  0.4285714  1.2352941
63  {java}              => {hadoop}     0.1428571  0.3500000  0.7456522
64  {hadoop}            => {java}       0.1428571  0.3043478  0.7456522
65  {java}              => {sql}        0.2653061  0.6500000  1.1375000
66  {sql}               => {java}       0.2653061  0.4642857  1.1375000
67  {java}              => {python}     0.3469388  0.8500000  1.3435484
68  {python}            => {java}       0.3469388  0.5483871  1.3435484
69  {java}              => {R}          0.3265306  0.8000000  1.1200000
70  {R}                 => {java}       0.3265306  0.4571429  1.1200000
71  {hadoop}            => {sql}        0.2448980  0.5217391  0.9130435
72  {sql}               => {hadoop}     0.2448980  0.4285714  0.9130435
73  {hadoop}            => {python}     0.2448980  0.5217391  0.8246844
74  {python}            => {hadoop}     0.2448980  0.3870968  0.8246844
75  {hadoop}            => {R}          0.3265306  0.6956522  0.9739130
76  {R}                 => {hadoop}     0.3265306  0.4571429  0.9739130
77  {sql}               => {python}     0.4081633  0.7142857  1.1290323
78  {python}            => {sql}        0.4081633  0.6451613  1.1290323
79  {sql}               => {R}          0.4081633  0.7142857  1.0000000
80  {R}                 => {sql}        0.4081633  0.5714286  1.0000000
81  {python}            => {R}          0.5306122  0.8387097  1.1741935
82  {R}                 => {python}     0.5306122  0.7428571  1.1741935
83  {java,javascript}   => {sql}        0.1020408  0.8333333  1.4583333
84  {javascript,sql}    => {java}       0.1020408  1.0000000  2.4500000
85  {java,sql}          => {javascript} 0.1020408  0.3846154  3.1410256
86  {java,javascript}   => {python}     0.1020408  0.8333333  1.3172043
87  {javascript,python} => {java}       0.1020408  1.0000000  2.4500000
88  {hive,spark}        => {hadoop}     0.1020408  0.8333333  1.7753623
89  {hadoop,spark}      => {hive}       0.1020408  0.8333333  3.1410256
90  {hadoop,hive}       => {spark}      0.1020408  0.5000000  3.5000000
91  {perl,sql}          => {python}     0.1224490  1.0000000  1.5806452
92  {perl,python}       => {sql}        0.1224490  1.0000000  1.7500000
93  {python,sql}        => {perl}       0.1224490  0.3000000  2.4500000
94  {perl,sql}          => {R}          0.1020408  0.8333333  1.1666667
95  {perl,R}            => {sql}        0.1020408  1.0000000  1.7500000
96  {perl,python}       => {R}          0.1020408  0.8333333  1.1666667
97  {perl,R}            => {python}     0.1020408  1.0000000  1.5806452
98  {hive,mapreduce}    => {hadoop}     0.1020408  1.0000000  2.1304348
99  {hadoop,mapreduce}  => {hive}       0.1020408  0.6250000  2.3557692
100 {hadoop,hive}       => {mapreduce}  0.1020408  0.5000000  2.7222222
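Rules like these come from a level-wise search in the style of the Apriori algorithm, which arules provides as `apriori()`. Below is a simplified Python sketch of that idea, not arules' implementation. It works in three steps:

1. Find itemsets that meet a minimum support.
2. Grow them one item at a time, pruning any candidate that has an infrequent subset.
3. Keep rules with a single-item right-hand side that meet a minimum confidence.

The default thresholds of 0.1 support and 0.3 confidence are assumptions. They are consistent with the smallest values in the table, but the post does not state the settings used. Unlike the table, the sketch does not emit rules with an empty left-hand side. It also sorts the rules by lift, which puts the strongest associations first.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_len):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    support = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates and k <= max_len:
        # Count each candidate and keep the ones that meet the threshold.
        counted = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        frequent = {c: s for c, s in counted.items() if s >= min_support}
        support.update(frequent)
        # Join pairs of frequent k-itemsets into (k+1)-itemsets, keeping a
        # candidate only if every k-item subset is frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return support

def mine_rules(transactions, min_support=0.1, min_confidence=0.3, max_len=3):
    """Rules lhs => rhs with a single-item right-hand side, sorted by lift."""
    support = frequent_itemsets(transactions, min_support, max_len)
    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue  # no rules with an empty left-hand side in this sketch
        for rhs in itemset:
            lhs = itemset - {rhs}
            confidence = s / support[lhs]  # lhs is frequent by the Apriori property
            if confidence >= min_confidence:
                lift = confidence / support[frozenset([rhs])]
                rules.append((set(lhs), rhs, s, confidence, lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules

# Hypothetical postings, not the study's data.
postings = [
    {"hadoop", "hive", "pig"},
    {"hadoop", "hive"},
    {"hadoop", "pig"},
    {"R", "python"},
    {"R", "python", "sql"},
    {"python", "sql"},
    {"R"},
    {"hadoop"},
]
for lhs, rhs, s, conf, lift in mine_rules(postings, min_support=0.25, min_confidence=0.6):
    print(f"{sorted(lhs)} => {rhs}: support={s:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

On this toy data the sketch returns six rules. The first two are {python} => sql and {sql} => python, each with a lift of about 2.67.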





30 comments:

  1. Hi Kumaran,

    How did you conduct the study? How many sites did you visit, and was there a time range for selecting them? I'm also interested in knowing what keywords you used for the search.

    Thanks,

    Brad

  2. Are R and Python enough for analysis?

  3. Sorting the ARM output would make it more meaningful. Thanks.

  12. How was the data for this analysis obtained? Can you throw some light on that?

  13. To be a Data Scientist, any one programming language is enough, but it is better to have either R or Python for Data Science.
