Friday, November 21, 2014

Popular software skills in Data Science job postings


This exercise was done to identify the most popular software skills required in data science job postings found on popular job websites. The skill-requirements text was extracted, cleansed and mined. R and its packages tm and arules were used to cleanse and analyze the data. The findings were as follows:
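The extraction and cleansing step can be sketched as follows. This is a minimal Python sketch, not the author's R/tm code. The `SKILLS` vocabulary and the `clean` and `extract_skills` helpers are hypothetical, and matching whole space-delimited terms is an assumed approach. The sketch lowercases the text, replaces punctuation with spaces, collapses whitespace, and keeps only the vocabulary terms found in each posting.

```python
import re

# Hypothetical skill vocabulary: the post does not publish the exact list used.
SKILLS = {
    "r", "python", "sql", "hadoop", "java", "sas", "hive", "pig", "spark",
    "matlab", "perl", "ruby", "mapreduce", "javascript", "tableau", "big data",
}


def clean(text):
    """Lowercase, turn punctuation into spaces and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_skills(text, vocabulary=SKILLS):
    """Return the set of vocabulary terms that appear as whole words in one posting."""
    padded = f" {clean(text)} "  # pad so terms at the edges also match
    return {skill for skill in vocabulary if f" {skill} " in padded}


print(sorted(extract_skills("Experience with R, Python and SQL; Hadoop/Hive a plus.")))
```

Matching the single letter "r" as a whole word is fragile (for example, "R&D" would also count as R), so a real pipeline would probably need extra rules for that term.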

The following word cloud was created from the software skill requirements in the job postings:



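A word cloud scales each term by how often it occurs in the text. Below is a minimal Python sketch of that counting step. The helpers `skill_term_counts` and `font_sizes` and the toy postings are hypothetical, only single-word terms are handled, and the linear size scaling is an assumption (word cloud libraries may scale differently).

```python
import re
from collections import Counter


def skill_term_counts(postings, vocabulary):
    """Total occurrences of each vocabulary term across all postings.

    Repeated mentions inside one posting all count. Only single-word terms
    are handled, which is an assumption of this sketch.
    """
    counts = Counter()
    for text in postings:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(token for token in tokens if token in vocabulary)
    return counts


def font_sizes(counts, min_size=10, max_size=60):
    """Map counts linearly onto font sizes (an assumed scaling rule)."""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        term: min_size + (count - low) * (max_size - min_size) / span
        for term, count in counts.items()
    }


# Hypothetical toy postings.
postings = ["R and Python. R preferred.", "Python, SQL", "SQL and R"]
counts = skill_term_counts(postings, {"r", "python", "sql"})
print(counts.most_common())
print(font_sizes(counts))
```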
The following chart shows the relative frequency of each skill. A frequency of 0.5 means the skill appears in 50% of the postings.



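The relative frequency in the chart is the share of postings that mention a skill at least once. A minimal Python sketch of that calculation is below. The function name `skill_frequencies` and the four toy postings are hypothetical, and each posting is assumed to be already reduced to its set of skills.

```python
from collections import Counter


def skill_frequencies(posting_skills):
    """Share of postings that mention each skill at least once.

    posting_skills: one collection of skills per posting.
    Returns (skill, frequency) pairs, most common first; ties sorted by name.
    """
    n = len(posting_skills)
    doc_freq = Counter(skill for skills in posting_skills for skill in set(skills))
    return sorted(
        ((skill, count / n) for skill, count in doc_freq.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical toy data: four postings, already reduced to skill sets.
postings = [{"r", "python", "sql"}, {"r", "sql"}, {"python", "java"}, {"r"}]
for skill, freq in skill_frequencies(postings):
    print(f"{skill:8s} {freq:.2f}")
```

Here python gets a frequency of 0.5 because it appears in two of the four toy postings, which is the same reading as in the chart.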

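As seen, R, Python and SQL are the top three skills, appearing in about 71%, 63% and 57% of the postings respectively (the support values of rules 6, 5 and 4 below). Java continues to be a favorite programming language. Interestingly, SQL outranks Hadoop (57% vs. 47%) in the skill list.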

Association rule mining (ARM) was done to find which skills occur together. The following rules were found in this skill set:

 lhs             rhs            support confidence      lift
1   {}           => {sas}        0.3469388  0.3469388 1.0000000
2   {}           => {java}       0.4081633  0.4081633 1.0000000
3   {}           => {hadoop}     0.4693878  0.4693878 1.0000000
4   {}           => {sql}        0.5714286  0.5714286 1.0000000
5   {}           => {python}     0.6326531  0.6326531 1.0000000
6   {}           => {R}          0.7142857  0.7142857 1.0000000
7   {tableau}    => {R}          0.1020408  1.0000000 1.4000000
8   {javascript} => {java}       0.1224490  1.0000000 2.4500000
9   {java}       => {javascript} 0.1224490  0.3000000 2.4500000
10  {javascript} => {sql}        0.1020408  0.8333333 1.4583333
11  {javascript} => {python}     0.1020408  0.8333333 1.3172043
12  {big data}   => {hadoop}     0.1020408  0.7142857 1.5217391
13  {spark}      => {hive}       0.1224490  0.8571429 3.2307692
14  {hive}       => {spark}      0.1224490  0.4615385 3.2307692
15  {spark}      => {hadoop}     0.1224490  0.8571429 1.8260870
16  {spark}      => {R}          0.1020408  0.7142857 1.0000000
17  {perl}       => {sql}        0.1224490  1.0000000 1.7500000
18  {perl}       => {python}     0.1224490  1.0000000 1.5806452
19  {perl}       => {R}          0.1020408  0.8333333 1.1666667
20  {mapreduce}  => {hive}       0.1020408  0.5555556 2.0940171
21  {hive}       => {mapreduce}  0.1020408  0.3846154 2.0940171
22  {mapreduce}  => {hadoop}     0.1632653  0.8888889 1.8937198
23  {hadoop}     => {mapreduce}  0.1632653  0.3478261 1.8937198
24  {mapreduce}  => {R}          0.1224490  0.6666667 0.9333333
25  {ruby}       => {java}       0.1020408  0.6250000 1.5312500
26  {ruby}       => {sql}        0.1632653  1.0000000 1.7500000
27  {ruby}       => {python}     0.1428571  0.8750000 1.3830645
28  {ruby}       => {R}          0.1020408  0.6250000 0.8750000
29  {pig}        => {hive}       0.1428571  0.7777778 2.9316239
30  {hive}       => {pig}        0.1428571  0.5384615 2.9316239
31  {pig}        => {java}       0.1020408  0.5555556 1.3611111
32  {pig}        => {hadoop}     0.1428571  0.7777778 1.6570048
33  {hadoop}     => {pig}        0.1428571  0.3043478 1.6570048
34  {pig}        => {sql}        0.1224490  0.6666667 1.1666667
35  {pig}        => {python}     0.1224490  0.6666667 1.0537634
36  {pig}        => {R}          0.1632653  0.8888889 1.2444444
37  {matlab}     => {hive}       0.1224490  0.4615385 1.7396450
38  {hive}       => {matlab}     0.1224490  0.4615385 1.7396450
39  {matlab}     => {java}       0.1020408  0.3846154 0.9423077
40  {matlab}     => {hadoop}     0.1224490  0.4615385 0.9832776
41  {matlab}     => {sql}        0.1428571  0.5384615 0.9423077
42  {matlab}     => {python}     0.2040816  0.7692308 1.2158809
43  {python}     => {matlab}     0.2040816  0.3225806 1.2158809
44  {matlab}     => {R}          0.2448980  0.9230769 1.2923077
45  {R}          => {matlab}     0.2448980  0.3428571 1.2923077
46  {hive}       => {java}       0.1020408  0.3846154 0.9423077
47  {hive}       => {hadoop}     0.2040816  0.7692308 1.6387960
48  {hadoop}     => {hive}       0.2040816  0.4347826 1.6387960
49  {hive}       => {sql}        0.2040816  0.7692308 1.3461538
50  {sql}        => {hive}       0.2040816  0.3571429 1.3461538
51  {hive}       => {python}     0.1632653  0.6153846 0.9727047
52  {hive}       => {R}          0.2040816  0.7692308 1.0769231
53  {sas}        => {java}       0.1224490  0.3529412 0.8647059
54  {java}       => {sas}        0.1224490  0.3000000 0.8647059
55  {sas}        => {hadoop}     0.1428571  0.4117647 0.8772379
56  {hadoop}     => {sas}        0.1428571  0.3043478 0.8772379
57  {sas}        => {sql}        0.2040816  0.5882353 1.0294118
58  {sql}        => {sas}        0.2040816  0.3571429 1.0294118
59  {sas}        => {python}     0.2040816  0.5882353 0.9297913
60  {python}     => {sas}        0.2040816  0.3225806 0.9297913
61  {sas}        => {R}          0.3061224  0.8823529 1.2352941
62  {R}          => {sas}        0.3061224  0.4285714 1.2352941
63  {java}       => {hadoop}     0.1428571  0.3500000 0.7456522
64  {hadoop}     => {java}       0.1428571  0.3043478 0.7456522
65  {java}       => {sql}        0.2653061  0.6500000 1.1375000
66  {sql}        => {java}       0.2653061  0.4642857 1.1375000
67  {java}       => {python}     0.3469388  0.8500000 1.3435484
68  {python}     => {java}       0.3469388  0.5483871 1.3435484
69  {java}       => {R}          0.3265306  0.8000000 1.1200000
70  {R}          => {java}       0.3265306  0.4571429 1.1200000
71  {hadoop}     => {sql}        0.2448980  0.5217391 0.9130435
72  {sql}        => {hadoop}     0.2448980  0.4285714 0.9130435
73  {hadoop}     => {python}     0.2448980  0.5217391 0.8246844
74  {python}     => {hadoop}     0.2448980  0.3870968 0.8246844
75  {hadoop}     => {R}          0.3265306  0.6956522 0.9739130
76  {R}          => {hadoop}     0.3265306  0.4571429 0.9739130
77  {sql}        => {python}     0.4081633  0.7142857 1.1290323
78  {python}     => {sql}        0.4081633  0.6451613 1.1290323
79  {sql}        => {R}          0.4081633  0.7142857 1.0000000
80  {R}          => {sql}        0.4081633  0.5714286 1.0000000
81  {python}     => {R}          0.5306122  0.8387097 1.1741935
82  {R}          => {python}     0.5306122  0.7428571 1.1741935
83  {java,                                                     
     javascript} => {sql}        0.1020408  0.8333333 1.4583333
84  {javascript,                                               
     sql}        => {java}       0.1020408  1.0000000 2.4500000
85  {java,                                                     
     sql}        => {javascript} 0.1020408  0.3846154 3.1410256
86  {java,                                                     
     javascript} => {python}     0.1020408  0.8333333 1.3172043
87  {javascript,                                               
     python}     => {java}       0.1020408  1.0000000 2.4500000
88  {hive,                                                     
     spark}      => {hadoop}     0.1020408  0.8333333 1.7753623
89  {hadoop,                                                   
     spark}      => {hive}       0.1020408  0.8333333 3.1410256
90  {hadoop,                                                   
     hive}       => {spark}      0.1020408  0.5000000 3.5000000
91  {perl,                                                     
     sql}        => {python}     0.1224490  1.0000000 1.5806452
92  {perl,                                                     
     python}     => {sql}        0.1224490  1.0000000 1.7500000
93  {python,                                                   
     sql}        => {perl}       0.1224490  0.3000000 2.4500000
94  {perl,                                                     
     sql}        => {R}          0.1020408  0.8333333 1.1666667
95  {perl,                                                     
     R}          => {sql}        0.1020408  1.0000000 1.7500000
96  {perl,                                                     
     python}     => {R}          0.1020408  0.8333333 1.1666667
97  {perl,                                                     
     R}          => {python}     0.1020408  1.0000000 1.5806452
98  {hive,                                                     
     mapreduce}  => {hadoop}     0.1020408  1.0000000 2.1304348
99  {hadoop,                                                   
     mapreduce}  => {hive}       0.1020408  0.6250000 2.3557692
100 {hadoop,                                                   
     hive}       => {mapreduce}  0.1020408  0.5000000 2.7222222
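The support, confidence and lift columns above are related in the standard way. Support is the share of postings that contain both the lhs and the rhs skills. Confidence is support(lhs and rhs) / support(lhs). Lift is confidence / support(rhs), so a lift above 1 means the skills co-occur more often than independence would predict. This is why rules 1 to 6, with an empty lhs, have confidence equal to support and a lift of exactly 1. The Python sketch below computes these metrics from scratch and enumerates rules by brute force. It is a stand-in for the Apriori algorithm that arules provides, not the author's code. The thresholds `min_support=0.1` and `min_confidence=0.3` are assumptions, chosen because the smallest support and confidence in the table sit just above those values. The toy postings are hypothetical, and empty-lhs rules are not generated.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (postings) containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Support, confidence and lift of the rule lhs => rhs (both item collections)."""
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    if s_lhs == 0 or s_rhs == 0:
        raise ValueError("lhs and rhs must each occur in at least one transaction")
    s_both = support(set(lhs) | set(rhs), transactions)
    confidence = s_both / s_lhs  # estimate of P(rhs | lhs)
    lift = confidence / s_rhs  # > 1: co-occur more often than if independent
    return s_both, confidence, lift


def mine_rules(transactions, min_support=0.1, min_confidence=0.3, max_lhs=2):
    """Brute-force rules with a one-item rhs and 1..max_lhs lhs items, by lift."""
    items = sorted(set().union(*transactions))
    rules = []
    for rhs in items:
        others = [item for item in items if item != rhs]
        for k in range(1, max_lhs + 1):
            for lhs in combinations(others, k):
                if support(lhs, transactions) == 0:
                    continue  # lhs never occurs, so the rule is undefined
                s, conf, lift = rule_metrics(lhs, [rhs], transactions)
                if s >= min_support and conf >= min_confidence:
                    rules.append((frozenset(lhs), rhs, s, conf, lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)  # strongest first
    return rules


# Hypothetical toy postings, each reduced to its set of skills.
transactions = [
    frozenset({"r", "python", "sql"}),
    frozenset({"r", "python"}),
    frozenset({"python", "java"}),
    frozenset({"r", "sql"}),
]
for lhs, rhs, s, conf, lift in mine_rules(transactions):
    print(f"{set(lhs)} => {{{rhs}}}  support={s:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

As a hand check against the table: rule 8 has confidence 1.0, so the support of javascript equals that rule's support, 0.1224490. Rule 9 then has confidence 0.1224490 / 0.4081633 = 0.30 and lift 0.30 / 0.1224490 = 2.45, which matches the reported values. Sorting by lift, as `mine_rules` does, puts the strongest association in the table, {hadoop, hive} => {spark} with a lift of 3.5, at the top.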





59 comments:

  1. Hi Kumaran,

    How did you conduct the study? How many sites did you visit, and was there a time range for selecting them? I'm also interested in knowing what keywords you used for the search.

    Thanks,

    Brad

  2. Are R and Python enough for analysis?

  3. Sorting the ARM output would make it more meaningful. Thanks.

  12. How was the data for this analysis obtained? Can you throw some light on that?
