Software Research and the Industry

Dirk Riehle’s blog about everything computer science, applied and more

The Commit Size Distribution of Open Source Software

September 23rd, 2008

Title: The Commit Size Distribution of Open Source Software

Authors: Oliver Arafat, Dirk Riehle

Institution: SAP Research, SAP Labs LLC

Abstract: With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses.

Reference: In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). Forthcoming.

Available as a PDF file.

Tags: Open Source · Publication · Research
