David A. Wheeler
The NTP problem is really bugging me

On Jan 28, 2016, at 8:51 AM, Emily Ratliff <firstname.lastname@example.org> wrote:
This is likely a quirk in the data. CII does fund the NTP project for ongoing maintenance work (part-time - the project certainly could use additional funding). They don't use the github repository as their main development repo, so that may be throwing the numbers off. There are more than 0 committers.

Actually, ntp was identified as a risky program. The more detailed paper D-5459 has more info that you (Kit) may have missed. See https://github.com/linuxfoundation/cii-census/blob/master/OSS-2015-06-19.pdf (click on “Raw” to use your local PDF reader). If you look at the list “Riskiest OSS Programs (human-identified subset informed by risk measures)” on page 6-5, we *specifically* identify ntp as one of the riskiest programs. Once the list is combined with human expertise, ntp jumps out as important. As Emily noted, the Linux Foundation is specifically funding the NTP project.
An *ideal* for the census project would be to have no need for human judgement. Ideally we could create quantitative measures, combine them in a clear and simple way, and demonstrably have a perfect list of exactly what’s riskiest (and in what order). We don’t currently have that ideal… but that doesn’t make the work useless. What we have instead are quantitative measures that can *help* humans make a determination of risk. In my experience there are many tough problems where the computer can't really make the decision... it can only be an *aid* to a human who makes the decision. Since the goal was to help humans make investment decisions, we met the goal.
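To make "combine them in a clear and simple way" concrete, here is a minimal sketch of an additive risk score that aids, rather than replaces, a human reviewer. The metric names, thresholds, and weights below are hypothetical illustrations, not the census project's actual formula:

```python
# Hypothetical sketch: combine quantitative risk measures into one score
# that *helps* a human judge risk. Metric names and weights are assumed
# for illustration only; they are not the census's real point system.

def risk_score(metrics: dict) -> int:
    """Sum simple per-metric points; a higher score suggests more risk."""
    score = 0
    # Few recent contributors suggests under-resourced maintenance.
    if metrics.get("contributors_12mo", 0) <= 3:
        score += 2
    # Network exposure raises the impact of any vulnerability.
    if metrics.get("network_exposed", False):
        score += 2
    # A memory-unsafe implementation language adds risk.
    if metrics.get("language") in ("c", "c++"):
        score += 1
    # Known CVEs are a weak proxy: a count of zero may just mean
    # no one has looked carefully, so this is weighted lightly.
    if metrics.get("cve_count", 0) > 0:
        score += 1
    return score

# A network-facing C program with one active maintainer and known CVEs
print(risk_score({"contributors_12mo": 1, "network_exposed": True,
                  "language": "c", "cve_count": 3}))  # -> 6
```

A score like this only ranks candidates for human attention; as the thread notes, the final judgment of what is riskiest still requires human expertise.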
If people have ideas about how to improve the census, we're all ears. We posted not just how we created the census numbers, but the alternatives we looked at & the code to calculate it. We *want* people to suggest improvements. Metrics are a notoriously hard problem in security.
One *big* problem is the lack of known truth - there are a lot of great learning algorithms, but they require truth data we don't have. Vulnerability counts (for example) are terrible proxies; a low number may mean the software is secure, or it may simply mean that no one has seriously reviewed it AND publicly reported the results. Not all vulnerabilities are equal, either.
--- David A. Wheeler