GSoC 2009 RTEMS Proposal

Title: Coverage Analysis
Student: Santosh Vattam
Abstract:

I wish to take up Coverage Analysis of RTEMS as a project for the Google Summer of Code initiative, 2009. The aims of the project are as follows:

1. To perform automated coverage testing and analyse the object-level coverage provided by the RTEMS test suite.
2. To identify and report the parts of the code that are not exercised by the test suite.
3. To analyse each case separately and classify it into one of the previously identified categories.
4. To address each of these cases and eliminate it.

Content:

About me:

I am pursuing an undergraduate degree in Information Science and Engineering at B.M.S. College of Engineering, Bangalore, India. I am currently in my final semester and will graduate in July. I have been a free and open source software enthusiast for the past 4 years. I am one of the main coordinators of the B.M.S. Libre Software Users' Group, known as BMSLUG for short. Through BMSLUG we have conducted various sessions and workshops to spread awareness of Free and Open Source Software, not only across the campus but also at other colleges in and around Bangalore.

* We organized a session by Richard M. Stallman on October 18th, 2006.
* We organized "Swatantra Tech Fest", an all-day Free Software fest, on April 21st, 2007.
* We organized a Hack Fest as part of Genesis, the technical fest of the Dept. of Information Science and Engineering, BMSCE, held in September 2007.
* We conducted the Gnusim8085 Hackathon on November 5th, 2008. It was an introduction to FOSS hacking, covering everything from checkouts and build tools to generating a patch and committing. We managed to fix a bug and send a patch to the devel list on that day.
* We conducted a hands-on Python workshop at NIT Calicut as part of FOSS Meet at NITC, held in February 2008.

I can dedicate a minimum of 30 hours a week to this project, since I have classes on only 3 days a week and my only other commitment to the college, my final semester project, is complete. This project involves testing the majority of the RTEMS code base, which will provide great insight into the internals of RTEMS. I hope to gain this knowledge from the project and, through it, take my contribution to the next level. I am deeply committed to the RTEMS project. I understand that through this programme RTEMS is looking for full-time contributors, and I intend to be one at the end of the programme.

I have a fairly good understanding of C and C++ concepts, with close to 4 years of programming experience in C and about 3 years in C++. I understand that C is a major requirement for this project, and I am very comfortable programming in C. Besides C and C++, I am also fairly comfortable with Python and with x86 and x86_64 assembly, which will also help with this project.

I have been working with RTEMS for close to a month now. I started off by tackling bug #1383 and generated a patch for it. The patch had shortcomings and was not accepted; Joel Sherrill pointed out these shortcomings, and that is how I was introduced to the RTEMS community. I then tackled bug #1378, and this time I managed to get my name into the Changelog. After this I met Joel Sherrill on #rtems, started discussing ideas for SoC, and finally settled on Coverage Analysis.

Email ID: vattam.santosh@gmail.com

IRC Handle: dr__house

IRC Server: freenode

Blog: http://vattamsantosh.info/blog

Application

I wish to take up Coverage Analysis of RTEMS as a project for the Google Summer of Code initiative for the year 2009. I have had detailed discussions with my prospective mentor to establish the requirements and challenges of this project. For now, RTEMS coverage analysis is to be done on the files contained in four directories under cpukit: posix, rtems, sapi and score. Coverage analysis will be performed using simulators that have coverage analysis support. At the moment, SkyEye (http://www.skyeye.org) provides coverage support for the ARM architecture and can be used for RTEMS built for the ARM/EDB7312 or ARM/RTL22xx BSPs. TSIM (http://www.gaisler.com) provides coverage support for the SPARC ERC32, LEON2 and LEON3 BSPs, but it is a proprietary product. Hence, at this point in time the coverage analysis will be done on these architectures, for the BSPs supported by these simulators. If I do not have access to a TSIM license, I will use the ARM simulator and let Joel Sherrill provide updated coverage reports on the SPARC.

Through coverage analysis, the parts of the code that are not executed by the current tests are identified and must be manually classified into various categories. These are some of the categories identified:

* Needs a new test case
* Unreachable with the current configuration
* Debug or sanity checking code
* Unreachable paths generated by gcc for switches
* Critical sections that synchronize actions with ISRs

The code identified as uncovered during coverage analysis is classified into one of these categories, and an entry is added to the file Explanations.txt to explain it. This file contains the line numbers in the files that had uncovered code, along with comments on why the code is uncovered. As part of the coverage analysis, the tools read that file and generate a report with uncovered code ranges, sizes, source files, and an explanation if there is one in Explanations.txt. The tools also generate a file named annotated.dmp that contains an object dump with source code and highlights the assembly instructions that were not executed. The reporting software also generates various statistics on the uncovered code, such as the percentage of code covered and not covered and the sizes of the largest uncovered areas. Taken together, these outputs give a fair idea of why a certain piece of code is not being covered.

After the uncovered code has been identified and analysed, each of these cases is addressed and corrected. Hence, the project will be an iterative process of identifying sets of uncovered code, analysing them and proposing methods to cover them. These methods will be reviewed by the mentor and then implemented. Once all the parts of a given set are covered, a patch is generated and added to the code base after review.
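The reporting step described above can be illustrated with a small Python sketch. It is not the actual RTEMS coverage tooling: the real format of Explanations.txt is not described here, so the sketch assumes a hypothetical in-memory mapping from a (file, first line, last line) key to a (category, comment) pair, and the names UncoveredRange, build_report and format_report are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Categories named in the proposal; the short keys are made up for this sketch.
CATEGORIES = {
    "new_test": "Needs a new test case",
    "unreachable_config": "Unreachable with the current configuration",
    "debug": "Debug or sanity checking code",
    "gcc_switch": "Unreachable paths generated by gcc for switches",
    "isr_sync": "Critical sections that synchronize actions with ISRs",
}


@dataclass(frozen=True)
class UncoveredRange:
    """One uncovered code range (hypothetical representation)."""
    source_file: str
    first_line: int
    last_line: int
    size_bytes: int


# Hypothetical in-memory form of Explanations.txt:
# (file, first line, last line) -> (category key, free-text comment)
Explanations = Dict[Tuple[str, int, int], Tuple[str, str]]
Row = Tuple[UncoveredRange, Optional[Tuple[str, str]]]


def build_report(ranges: List[UncoveredRange],
                 explanations: Explanations) -> List[Row]:
    """Pair each uncovered range with its explanation, largest range first."""
    ordered = sorted(ranges, key=lambda r: r.size_bytes, reverse=True)
    return [
        (r, explanations.get((r.source_file, r.first_line, r.last_line)))
        for r in ordered
    ]


def format_report(rows: List[Row]) -> str:
    """Render one line per range; unexplained ranges are flagged."""
    lines = []
    for r, why in rows:
        where = f"{r.source_file}:{r.first_line}-{r.last_line} ({r.size_bytes} bytes)"
        if why is None:
            lines.append(f"{where}: NO EXPLANATION")
        else:
            category, comment = why
            lines.append(f"{where}: {CATEGORIES[category]}: {comment}")
    return "\n".join(lines)
```

Sorting by size puts the largest uncovered areas at the top, and flagging ranges without an explanation makes it easy to see which cases still need to be analysed.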

This project demands an iterative development process, since the code coverage cannot exactly be divided into distinct phases of development. The following tasks are performed iteratively on the sets that are identified as mentioned above:
Phase I:
Perform coverage analysis on the sets, and generate Explanations.txt and annotated.dmp for each of them.
Phase II:
Study the current Explanations.txt file and tackle all the TEST SIMPLE cases first. These cases are the easiest to handle and eliminate, so tackling them will considerably reduce the number of cases and also build confidence. Patches will be generated and submitted.
Phase III:
Once the TEST SIMPLE cases have been eliminated, the largest uncovered area of the code set is identified. The test cases in these areas will be analysed and Explanations.txt updated, and then the set will be tackled. There are currently 194 uncovered areas in the RTEMS code base. These areas will be identified and tackled during the course of the project, and also after the GSoC timeline ends. Patches will be generated and submitted.
Phase IV:
Iterate through the above-mentioned steps for the subsequent sets identified.

To evaluate progress for the Mid Term and Final evaluations, milestones need to be set up. But since this project consists of many small iterative steps, it is difficult to set specific milestones. The only criteria by which the coverage task can be evaluated are:

1. The reduction in the number of uncovered binary code ranges from the number identified initially.
2. The amount of untested binary object code as a percentage of the total code size under analysis.
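As a rough illustration, the two criteria above could be computed as in the following Python sketch. The function names and the input representation (a total size in bytes plus a list of uncovered range sizes) are assumptions made for this example, not part of the existing coverage tools.

```python
from typing import Dict, List, Union


def coverage_metrics(total_size_bytes: int,
                     uncovered_sizes: List[int]) -> Dict[str, Union[int, float]]:
    """Summarise uncovered code for one coverage run.

    total_size_bytes: size of all object code under analysis (assumed input).
    uncovered_sizes:  size in bytes of each uncovered range (assumed input).
    """
    if total_size_bytes <= 0:
        raise ValueError("total size must be positive")
    uncovered = sum(uncovered_sizes)
    if uncovered > total_size_bytes:
        raise ValueError("uncovered size exceeds total size")
    percent_uncovered = 100.0 * uncovered / total_size_bytes
    return {
        "uncovered_ranges": len(uncovered_sizes),
        "uncovered_bytes": uncovered,
        "percent_uncovered": percent_uncovered,
        "percent_covered": 100.0 - percent_uncovered,
    }


def range_reduction(baseline_ranges: int, current_ranges: int) -> int:
    """Criterion 1: how many uncovered ranges were eliminated since the baseline."""
    return baseline_ranges - current_ranges
```

Comparing the result of a later run against the baseline run gives both evaluation figures: the drop in the number of uncovered ranges and the change in the uncovered percentage.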

During the course of this project, I will be attacking uncovered code and generating patches for all the code that is fixed. For all the cases that were not tackled, I will add entries to the Explanations.txt file with a brief write-up about why they were not tackled and a link to the project wiki, where detailed explanations will be presented. The whole process of coverage analysis and generating patches will be documented on the project wiki.

After discussing the Mid Term and Final evaluations with my prospective mentor, the following criteria have been set up as milestones:

* Mid Term Milestone:

1. Exhaustive documentation of the procedure of coverage analysis. This is in order to make the procedure available to others, so that after the completion of GSoC, people from the RTEMS community may perform coverage analysis on their own.
2. A set of patches for all the uncovered code that was fixed, along with reports about these patches. These reports will contain the exact uncovered ranges that the patches fixed and an explanation of what the patches do. A more detailed explanation of these patches will also be provided on the project wiki.
3. Wiki documentation of all the cases that were explained but were hard to test. These cases will have entries in Explanations.txt along with a link to the wiki documentation.

* Final Evaluation Milestone:

1. The set of patches will increase in number as the analysis progresses and more and more uncovered code is fixed. The reports as well as the project wiki will be updated to reflect the changes.
2. As more code is analysed, the number of uncovered cases that are hard to test may also increase. For all these cases, as said earlier, Explanations.txt and the wiki pages will be updated accordingly.

The project wiki mentioned several times above will have four pages. The first will contain the procedure of coverage analysis. The second will list the cases that have been successfully tackled, with an entry for each patch generated and a brief report on what the patch does. The third will list the cases that have not been tackled, with a detailed explanation of why they were not tackled. The fourth will be a spreadsheet containing baseline information about the coverage, such as the number of uncovered ranges, the total size, the uncovered size and the covered and uncovered percentages of code, along with updates from each committed patch. It will also have a graph indicating the progress. The wiki will be visible to the entire community, keeping the community updated about my work. Feedback, comments and brickbats from the community will be welcome and will help improve my work. The procedure of coverage analysis will also be made available as a formal Texinfo manual along with the source tree.

I understand that during the course of my work, more code will be added to the RTEMS code base, resulting in coverage changes. I discussed this with my prospective mentor, and we decided that the official coverage numbers will be provided by my mentor using TSIM, since that is the simulator that most RTEMS developers who have worked on coverage analysis are used to, and since TSIM runs all the tests. My mentor will provide all the data from the base run and the updated runs.

Auxiliary Goals:

At the moment the project will focus on analysing code from the posix, rtems, sapi and score directories under cpukit. As an additional goal, the project will include code from libcsupport once the analysis of the above-mentioned directories is done.