Friday, November 21, 2014

Popular software skills in Data Science job postings

This exercise was done to understand the most popular skills required in job postings on popular job websites. The skill requirements text was extracted, cleansed and mined. R and its packages tm and arules were used to cleanse and analyze the data. The findings were as follows.

The following is the word cloud created out of the job software skill requirements

The following chart gives the relative importance of skills. A frequency of 0.5 means the skill is found in 50% of the postings.
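
The frequency described here is simply the share of postings that mention a skill. As a minimal sketch of that calculation (written in Python rather than the R/tm workflow used for this post, and using a made-up skill vocabulary and three made-up postings), it might look like this:

```python
import re
from collections import Counter

# Hypothetical skill vocabulary and postings, for illustration only;
# they are not the data behind the chart.
SKILLS = ["r", "python", "sql", "hadoop", "java"]

postings = [
    "Strong R and Python skills; SQL a plus.",
    "Experience with Hadoop, Java and SQL required.",
    "Python, R",
]

def extract_skills(text, vocabulary):
    """Lowercase the text, split it into tokens, keep tokens in the vocabulary."""
    tokens = set(re.findall(r"[a-z+#]+", text.lower()))
    return {skill for skill in vocabulary if skill in tokens}

def skill_frequency(texts, vocabulary):
    """Return the fraction of postings that mention each skill."""
    counts = Counter()
    for text in texts:
        # Each posting contributes at most once per skill.
        counts.update(extract_skills(text, vocabulary))
    return {skill: counts[skill] / len(texts) for skill in vocabulary}

frequency = skill_frequency(postings, SKILLS)
for skill, share in sorted(frequency.items(), key=lambda kv: -kv[1]):
    print(f"{skill:<8}{share:.2f}")
```

Matching whole tokens rather than substrings keeps "javascript" from being counted as "java"; a real vocabulary would also need to handle multi-word skills such as "big data".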

As seen, R, Python and SQL are the top 3 skills found. Java continues to be a favorite programming language. Interestingly, SQL trumps Hadoop in the skill list.

Association rule mining (ARM) was done to find which skills occur together. The following are the resulting rules for this skill set:

    lhs             rhs          support    confidence lift
1   {}           => {sas}        0.3469388  0.3469388 1.0000000
2   {}           => {java}       0.4081633  0.4081633 1.0000000
3   {}           => {hadoop}     0.4693878  0.4693878 1.0000000
4   {}           => {sql}        0.5714286  0.5714286 1.0000000
5   {}           => {python}     0.6326531  0.6326531 1.0000000
6   {}           => {R}          0.7142857  0.7142857 1.0000000
7   {tableau}    => {R}          0.1020408  1.0000000 1.4000000
8   {javascript} => {java}       0.1224490  1.0000000 2.4500000
9   {java}       => {javascript} 0.1224490  0.3000000 2.4500000
10  {javascript} => {sql}        0.1020408  0.8333333 1.4583333
11  {javascript} => {python}     0.1020408  0.8333333 1.3172043
12  {big data}   => {hadoop}     0.1020408  0.7142857 1.5217391
13  {spark}      => {hive}       0.1224490  0.8571429 3.2307692
14  {hive}       => {spark}      0.1224490  0.4615385 3.2307692
15  {spark}      => {hadoop}     0.1224490  0.8571429 1.8260870
16  {spark}      => {R}          0.1020408  0.7142857 1.0000000
17  {perl}       => {sql}        0.1224490  1.0000000 1.7500000
18  {perl}       => {python}     0.1224490  1.0000000 1.5806452
19  {perl}       => {R}          0.1020408  0.8333333 1.1666667
20  {mapreduce}  => {hive}       0.1020408  0.5555556 2.0940171
21  {hive}       => {mapreduce}  0.1020408  0.3846154 2.0940171
22  {mapreduce}  => {hadoop}     0.1632653  0.8888889 1.8937198
23  {hadoop}     => {mapreduce}  0.1632653  0.3478261 1.8937198
24  {mapreduce}  => {R}          0.1224490  0.6666667 0.9333333
25  {ruby}       => {java}       0.1020408  0.6250000 1.5312500
26  {ruby}       => {sql}        0.1632653  1.0000000 1.7500000
27  {ruby}       => {python}     0.1428571  0.8750000 1.3830645
28  {ruby}       => {R}          0.1020408  0.6250000 0.8750000
29  {pig}        => {hive}       0.1428571  0.7777778 2.9316239
30  {hive}       => {pig}        0.1428571  0.5384615 2.9316239
31  {pig}        => {java}       0.1020408  0.5555556 1.3611111
32  {pig}        => {hadoop}     0.1428571  0.7777778 1.6570048
33  {hadoop}     => {pig}        0.1428571  0.3043478 1.6570048
34  {pig}        => {sql}        0.1224490  0.6666667 1.1666667
35  {pig}        => {python}     0.1224490  0.6666667 1.0537634
36  {pig}        => {R}          0.1632653  0.8888889 1.2444444
37  {matlab}     => {hive}       0.1224490  0.4615385 1.7396450
38  {hive}       => {matlab}     0.1224490  0.4615385 1.7396450
39  {matlab}     => {java}       0.1020408  0.3846154 0.9423077
40  {matlab}     => {hadoop}     0.1224490  0.4615385 0.9832776
41  {matlab}     => {sql}        0.1428571  0.5384615 0.9423077
42  {matlab}     => {python}     0.2040816  0.7692308 1.2158809
43  {python}     => {matlab}     0.2040816  0.3225806 1.2158809
44  {matlab}     => {R}          0.2448980  0.9230769 1.2923077
45  {R}          => {matlab}     0.2448980  0.3428571 1.2923077
46  {hive}       => {java}       0.1020408  0.3846154 0.9423077
47  {hive}       => {hadoop}     0.2040816  0.7692308 1.6387960
48  {hadoop}     => {hive}       0.2040816  0.4347826 1.6387960
49  {hive}       => {sql}        0.2040816  0.7692308 1.3461538
50  {sql}        => {hive}       0.2040816  0.3571429 1.3461538
51  {hive}       => {python}     0.1632653  0.6153846 0.9727047
52  {hive}       => {R}          0.2040816  0.7692308 1.0769231
53  {sas}        => {java}       0.1224490  0.3529412 0.8647059
54  {java}       => {sas}        0.1224490  0.3000000 0.8647059
55  {sas}        => {hadoop}     0.1428571  0.4117647 0.8772379
56  {hadoop}     => {sas}        0.1428571  0.3043478 0.8772379
57  {sas}        => {sql}        0.2040816  0.5882353 1.0294118
58  {sql}        => {sas}        0.2040816  0.3571429 1.0294118
59  {sas}        => {python}     0.2040816  0.5882353 0.9297913
60  {python}     => {sas}        0.2040816  0.3225806 0.9297913
61  {sas}        => {R}          0.3061224  0.8823529 1.2352941
62  {R}          => {sas}        0.3061224  0.4285714 1.2352941
63  {java}       => {hadoop}     0.1428571  0.3500000 0.7456522
64  {hadoop}     => {java}       0.1428571  0.3043478 0.7456522
65  {java}       => {sql}        0.2653061  0.6500000 1.1375000
66  {sql}        => {java}       0.2653061  0.4642857 1.1375000
67  {java}       => {python}     0.3469388  0.8500000 1.3435484
68  {python}     => {java}       0.3469388  0.5483871 1.3435484
69  {java}       => {R}          0.3265306  0.8000000 1.1200000
70  {R}          => {java}       0.3265306  0.4571429 1.1200000
71  {hadoop}     => {sql}        0.2448980  0.5217391 0.9130435
72  {sql}        => {hadoop}     0.2448980  0.4285714 0.9130435
73  {hadoop}     => {python}     0.2448980  0.5217391 0.8246844
74  {python}     => {hadoop}     0.2448980  0.3870968 0.8246844
75  {hadoop}     => {R}          0.3265306  0.6956522 0.9739130
76  {R}          => {hadoop}     0.3265306  0.4571429 0.9739130
77  {sql}        => {python}     0.4081633  0.7142857 1.1290323
78  {python}     => {sql}        0.4081633  0.6451613 1.1290323
79  {sql}        => {R}          0.4081633  0.7142857 1.0000000
80  {R}          => {sql}        0.4081633  0.5714286 1.0000000
81  {python}     => {R}          0.5306122  0.8387097 1.1741935
82  {R}          => {python}     0.5306122  0.7428571 1.1741935
83  {java,                                                     
     javascript} => {sql}        0.1020408  0.8333333 1.4583333
84  {javascript,                                               
     sql}        => {java}       0.1020408  1.0000000 2.4500000
85  {java,                                                     
     sql}        => {javascript} 0.1020408  0.3846154 3.1410256
86  {java,                                                     
     javascript} => {python}     0.1020408  0.8333333 1.3172043
87  {javascript,                                               
     python}     => {java}       0.1020408  1.0000000 2.4500000
88  {hive,                                                     
     spark}      => {hadoop}     0.1020408  0.8333333 1.7753623
89  {hadoop,                                                   
     spark}      => {hive}       0.1020408  0.8333333 3.1410256
90  {hadoop,                                                   
     hive}       => {spark}      0.1020408  0.5000000 3.5000000
91  {perl,                                                     
     sql}        => {python}     0.1224490  1.0000000 1.5806452
92  {perl,                                                     
     python}     => {sql}        0.1224490  1.0000000 1.7500000
93  {python,                                                   
     sql}        => {perl}       0.1224490  0.3000000 2.4500000
94  {perl,                                                     
     sql}        => {R}          0.1020408  0.8333333 1.1666667
95  {perl,                                                     
     R}          => {sql}        0.1020408  1.0000000 1.7500000
96  {perl,                                                     
     python}     => {R}          0.1020408  0.8333333 1.1666667
97  {perl,                                                     
     R}          => {python}     0.1020408  1.0000000 1.5806452
98  {hive,                                                     
     mapreduce}  => {hadoop}     0.1020408  1.0000000 2.1304348
99  {hadoop,                                                   
     mapreduce}  => {hive}       0.1020408  0.6250000 2.3557692
100 {hadoop,                                                   
     hive}       => {mapreduce}  0.1020408  0.5000000 2.7222222
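
For readers unfamiliar with the three columns: support is the share of postings that contain every skill in the rule, confidence is that support divided by the support of the left-hand side, and lift is confidence divided by the support of the right-hand side. For example, rule 8 has confidence 1.0 and java appears in 0.4081633 of postings (rule 2), so its lift is 1.0 / 0.4081633 ≈ 2.45. The sketch below computes the same three measures in plain Python; it is not the arules code used for the table, and the five skill sets are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    supp_rule = support(transactions, set(lhs) | set(rhs))
    confidence = supp_rule / support(transactions, lhs)
    lift = confidence / support(transactions, rhs)
    return supp_rule, confidence, lift

# Hypothetical skill sets from five postings, for illustration only.
transactions = [
    {"R", "python", "sql"},
    {"java", "javascript", "sql"},
    {"hadoop", "hive", "java"},
    {"R", "python"},
    {"python", "sql", "java"},
]

supp, conf, lift = rule_metrics(transactions, {"javascript"}, {"java"})
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the two sides appear together more often than they would if they were independent; a rule with an empty left-hand side always has lift 1, as in rules 1 to 6.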


  1. Hi Kumaran,

    How did you conduct the study? How many sites did you visit, and was there a time range for selecting them? I'm also interested in knowing what keywords you used for the search.



  2. Are R and Python enough for analysis?

  3. Sorting the ARM output would make it more meaningful. Thanks.

  12. How was the data for this analysis obtained? Can you throw some light on that?

  13. To be a data scientist, any one programming language is enough.
    But it is better to know R or Python for data science.
