Detecting and Analyzing Redundant Code

Redundant Code has a number of different impacts.

Introduction

DeepEnds was developed as a tool that analysed dependencies within source code. It created the Abstract Syntax Tree for C# source by using Roslyn and C++ source by using libclang. To validate that it was working as intended I then implemented functionality to identify unused methods. This showed that the parsing of my C# code was much more accurate than that of my C++ code, hence I chose to focus on the further development of the C# parser and other people’s more sophisticated C# code.

Initially the tool would flag the lines where the redundant methods existed, as the scale of the problem became apparent an option to automatically delete those lines was implemented. A typical analysis would involve running the tool repeatedly to prune back the source tree as brutally as possible. This was then followed by several cycles of reverting changes so as to get successful builds and then passing tests. The reasons for failure being that the tool had behaved incorrectly or there was a known limitation, examples of the latter being reflection or the existence of a code contract.

The tool was trained on various GitHub repositories for C# projects that were chosen on the basis that I had used them and thus wanted to contribute back. Ultimately a pull request was submitted to the community asking for discussion of the changes in my branch. As the tool is brutal and I was engaging online with people for the first time this was where diplomacy was required and hopefully I didn’t offend too many people. By contributing and being involved in the subsequent discussions my understanding of the issues was greatly increased and this article is an attempt to feed that through into the wider community.

The GitHub projects analysed

Mono.Cecil decompiles .NET code to C# and written by JB Evain. It was suggested that a mere 36 lines be deleted and JB chose to independently add some of the changes after review rather than merge in the branch.

Automatic Graph Layout is an official Microsoft project developed by Lev Nachmanson, Sergey Pupyrev, Tim Dwyer, Ted Hart, and Roman Prutkin for drawing graphs and directed graphs and is used by Visual Studio for displaying various interactive diagrams. The pull request was for deletion of 4674 lines of code, a number of which were to do with SilverLight (the deprecation of which was announced in 2015). The branch was merged in without modification or discussion.

Roslyn is the modern C# compiler maintained by a team from the .NET foundation. In this instance the pull request was for 18364 lines to be deleted, which resulted in a good discussion and led to most of the categorisation which is discussed below. Obviously it was too large to be merged and instead led to a number of individual issues being raised.

MSBuild is an official Microsoft project that any user of Visual Studio should be familiar with. Analysis led to a pull request for 3722 lines to be deleted, unfortunately the team didn’t have the capacity at the time to review the suggested changes.

The last to be analysed was the System.XML assembly within the .NET Core foundational libraries. These libraries are maintained by the .NET foundation and the team had a tracking issue raised to eliminate dead code. The issue was addressed by a per assembly process that involved trimming the assembly (more commonly known as dead code elimination) and then creating a difference between the untrimmed and trimmed assemblies to identify what compiled code had been removed. This difference then informs the source code to be deleted, the work typically undertaken by the volunteer community.

Given a sample of only five projects it is not wise to draw any conclusions, especially given a lack of reliable statistics:

  Mono.Cecil MSAGL Roslyn MSBuild CoreFX (System.XML)
Deleted 36 4674 18364 3722 427
Initial commit 2010/04/12 2015/02/22 2014/03/18 2015/03/12 2014/11/08
Authors (main) 39 (1) 25 (4) 285 (31) 90 (6) 526 (29)

The initial commit date is notable for MSBuild given its long history. Just counting the number of authors without assessing their contribution is almost certainly meaningless so a rough estimate of the significant contributors is given. Having said that I would speculate that:

Categorisation of test results

The tool tests for redundant methods and like any test it can be analysed for success and failure.

The true negative:

The false negatives:

The true positives:

The false positives:

It is to be noted that where the tool fails is not necessarily due to a lack of inference. Some of the cases should be covered by other tools. e.g. compiler warnings should flag some of the issues, duplicate code detectors are also useful.

The effects of redundant code

The false negatives and true positives can be collectively viewed as instances of YAGNI and raise the following considerations:

Given the wide variety of problems that may occur due to redundant code it is best to treat it as a code smell.

Management of existing redundant code

There are four ways of dealing with redundant code:

If the source code is going to be changed then any changes should be reviewed by someone familiar with that section of the code to ensure that the code is really redundant.

Management of new development

Assuming the aim is not to introduce new redundant code during development then what stratagems are to be deployed? If partial commits are allowed then it is likely that redundant code will be temporarily added to the code base. This can be mitigated by using feature branches and only merging into the master branch once the tests provide coverage and the source passes static analysis. Another advantage of feature branches is that when development is abandoned the branch can remain unmerged.