Topics

Revoking badges if there are "too many" vulnerabilities


David A. Wheeler
 

First, a recap. Enos proposed:
VULNERABILITIES
Measure not only efforts but also results: if too many
vulnerabilities are discovered (e.g. total CVSS points per number of
lines of code in a certain period), temporarily revoke badges.
Emily Ratliff expressed concern:
I would be concerned about this one, especially the way it is
described in the parenthetical. We don't want a rule that is subject
to gaming (i.e. if I withhold this fix for 72 hours, then I won't lose
the badge for my project). We also don't want to penalize the behavior
that we are trying to motivate [...]
Trevor also expressed concern:
... if this happened, I would be hard-pressed to constantly fend off the false positives (there are thousands), and I detest simply changing my software to make someone else's poorly written scanner be quiet.
Enos replied:
You have a point, but as always I will play the devil's advocate...
The best projects go after badges to *increase* their security.
Ordinary projects go after badges to *prove* they incorporate security.
Bad projects go after badges to *pretend* they incorporate security.
If bad projects are allowed to maintain their badge in spite of an outrageous number of vulnerabilities, end users will realize it and will stop associating the badges with the concept of security. The badges will ultimately lose their meaning for all but the best projects.
While the concept of "maximum CVSS total points per lines of code per unit of time" is not appropriate for the basic badge, I am of the opinion that it (or something similar) is necessary for higher ones.
I appreciate your goal, and clearly we want to counter pretenders.

However, I don't think that counting the number of vulnerabilities is the best way to get there; I think there are better ways.

Raw vulnerability counts can be very misleading. A project that is large, written in C/C++, security-relevant, and widely reviewed (like the Linux kernel) is likely to have more publicly identified vulnerabilities, simply because there are more opportunities for mistakes, a higher likelihood that mistakes become vulnerabilities, and far more people examining it for those vulnerabilities. A different, rarely used program might have more vulnerabilities overall, but if fewer people care, fewer people will look for them. What number is "too many"? Any threshold would be pretty arbitrary.
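
To make this concrete, here is a rough sketch of what a "total CVSS points per KLOC per period" score might compute. Every input below is invented purely for illustration and describes no real project:

    # Hypothetical illustration only: the projects, CVSS scores, and line
    # counts are made up to show how the proposed metric behaves.

    def cvss_per_kloc(cvss_scores, lines_of_code):
        """Total CVSS points per 1000 lines of code in one reporting period."""
        return sum(cvss_scores) / (lines_of_code / 1000.0)

    # A large, heavily reviewed project: many eyes, so many *reported* CVEs.
    heavily_reviewed = cvss_per_kloc(cvss_scores=[9.8, 7.5, 7.5, 6.1, 5.3],
                                     lines_of_code=15_000_000)

    # A small, rarely used project: nobody is looking, so nothing is reported.
    rarely_reviewed = cvss_per_kloc(cvss_scores=[], lines_of_code=20_000)

    print("heavily reviewed: %.4f CVSS points per KLOC" % heavily_reviewed)
    print("rarely reviewed:  %.4f CVSS points per KLOC" % rarely_reviewed)
    # The rarely reviewed project scores a "perfect" 0.0 simply because nobody
    # examined it, not because it is more secure; any cutoff for "too many"
    # would be arbitrary.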

Also, when new tools or analysis techniques become available, a whole bunch of vulnerabilities may be discovered at once. American Fuzzy Lop is an excellent example of this. Yes, the vulnerabilities were already there, but they may have been difficult to find without the new tool or technique. Well-run projects can still have a sudden increase in known vulnerability counts when a new tool is created.

Projects are already required to *fix* confirmed vulnerabilities of a certain severity within 60 days of the vulnerability information becoming public. If a project has a *huge* number of vulnerabilities, it is likely to struggle to resolve them all in that time. So I think *that* criterion already helps catch the projects that are more vulnerability than functionality.

A more general problem is the noisiness of many static analysis tools. I tried to make it clear that a *finding* from a tool isn't the same thing as a vulnerability, precisely because of the issue Trevor raised above. Perhaps the text should more carefully discuss and distinguish between tool findings and exploitable vulnerabilities; apparently it isn't clear enough yet.


Just prohibit projects from arbitrarily withholding patches... If you cannot trust a project to respect that single rule, then you cannot trust it to self-certify. Out of respect for the good projects, and for the survival of the badging program, you ought to find rogue projects and, after a warning or two, revoke their badges.
Performing audits, I learned the importance of strict policies: if you have them, you can always choose not to enforce them, but if you don't have them, you cannot suddenly create them when needed.
I don't think this text should be a criterion as-is. Often there's a dispute about whether something is actually a security fix, or whether it fixes anything at all. In short, just "withholding patches" is too broad for a criterion.

HOWEVER: This might indeed be the start of a good criterion: an anti-cheating rule. We could require that projects agree not to cheat, and penalize them later if they do. Volkswagen's emissions-test cheating went on from 2008 to 2015. You could say that's an argument against an anti-cheating rule (since they cheated anyway). However, I think it's better to view it as an argument FOR one: cheating was expressly forbidden, and because they cheated they now face a penalty (in our case it might be withdrawal of the badge for a time). I'm not sure what the text of an anti-cheating rule would look like, though.

--- David A. Wheeler


Enos <temp4282138782@...>
 

Wheeler, David A wrote:

[...]
Your arguments are indeed logical, but I still prefer some kind of metric (not necessarily enforced) on discovered vulnerabilities to deter projects acting in bad faith, and lighter dynamic testing for the basic badge (e.g. only considering high-severity vulnerabilities) so as not to overwhelm the smallest projects.

Thanks for the discussion. As I already suggested, please feel absolutely free to ignore these and other items in my list without needing to justify it.

Speaking of the list, I would move the OpenBSD code review policy from the background document into the criteria: once the cause of a vulnerability is discovered, all of the code must be parsed for similar occurrences (in our case, possibly with the aid of automated tools, e.g. "grep"), not just the part that caused the specific vulnerability.
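
As a rough sketch of how mechanical that pass can be (the pattern, paths, and file extensions below are only placeholders, not a proposed criterion), something like the following would do:

    #!/usr/bin/env python3
    # Search the whole source tree for constructs similar to the one that
    # caused a just-fixed vulnerability; the regex describes that construct
    # (an unchecked strcpy(), a dangerous format string, ...).
    import re
    import sys
    from pathlib import Path

    def find_similar(root, pattern, extensions=(".c", ".h")):
        """Yield (path, line number, line) for every match under root."""
        regex = re.compile(pattern)
        for path in Path(root).rglob("*"):
            if path.suffix not in extensions:
                continue
            text = path.read_text(errors="replace")
            for lineno, line in enumerate(text.splitlines(), 1):
                if regex.search(line):
                    yield path, lineno, line.strip()

    if __name__ == "__main__":
        # Example: python3 find_similar.py src 'strcpy\('
        root, pattern = sys.argv[1], sys.argv[2]
        for path, lineno, line in find_similar(root, pattern):
            print("%s:%d: %s" % (path, lineno, line))

Of course, a plain "grep -rn" does the same job; the point is only that the check is quick and mechanical.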


Kind Regards
--
Enos


Alton Blom
 

Hi All,

On Enos' last point: HackerOne recently released their 'Maturity Model for Vulnerability Coordination' [1], which references a number of ISO standards.

Part of their maturity model for the engineering side of things has the following criteria:

"Basic - Clear way to receive vulnerability reports, and an internal bug database to track them to resolution. See ISO 29147.
Advanced - Dedicated security bug tracking and documentation of security decisions, deferrals, and trade-offs.
Expert - Use vulnerability trends and root cause analysis to eliminate entire classes of vulnerabilities. See ISOs 29147, 30111, 27034."[2]

At our basic badge level I think we have a criterion that matches their Basic level, and the badge application process covers some of the macro-level decisions around why certain security functions are or aren't included. I think that at our basic level it would be too much to mandate that all code be parsed for similar occurrences. Maybe this could be a suggestion at the basic level and then a MUST at higher badge levels.

On Thu, 8 Oct 2015 at 05:00 Enos <temp4282138782@...> wrote:
[...]


Enos <temp4282138782@...>
 

Alton Blom wrote:

[...] HackerOne [...]

Basic [...]
Advanced - Dedicated security bug tracking and documentation of
security decisions, deferrals, and trade-offs.
Expert - Use vulnerability trends and root cause analysis to eliminate
entire classes of vulnerabilities. See ISOs 29147, 30111, 27034."
Interesting; however, they mention "trends" (for which you need both the issue history and the skills to read through it), while I meant just a naive grep for yesterday's critical vulnerability, so that another identical one isn't found tomorrow.


[...]
I think that at our basic level it would be too much to mandate that all
code be parsed for similar occurrences. Maybe this could be a suggestion
at the basic level and then a MUST at higher badge levels.
I agree where manual code reviews meant to categorically exclude whole classes of issues are concerned, but I stand by my idea if by "parsing" we just mean a quick search with no pretensions (for example, after an SQL injection, grepping for queries that do not use parameterized statements).

In my opinion, in that case, the time and skill required are compatible with the basic badge, the concept can be expressed clearly, and (from my own limited experience) the results are generally worth the effort.
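
For instance, a minimal sketch of that kind of no-pretense check (assuming, purely as an example, that the project's own code is Python building SQL through a DB-API cursor; the patterns and sample lines below are invented):

    import re

    # Deliberately naive patterns: flag execute() calls whose query string is
    # built with % formatting, f-strings, or concatenation instead of bound
    # parameters. False positives are expected; this is a quick follow-up
    # check after a fix, not a real detector.
    SUSPICIOUS = [
        re.compile(r"""execute\(\s*["'].*["']\s*%"""),   # "... '%s'" % value
        re.compile(r"""execute\(\s*f["']"""),            # f"... {value}"
        re.compile(r"""execute\(\s*["'].*["']\s*\+"""),  # "... " + value
    ]

    flagged = """cur.execute("SELECT * FROM users WHERE name = '%s'" % name)"""
    clean = """cur.execute("SELECT * FROM users WHERE name = %s", (name,))"""

    assert any(p.search(flagged) for p in SUSPICIOUS)
    assert not any(p.search(clean) for p in SUSPICIOUS)

Running the equivalent greps over a tree takes minutes and no special skill, which is why I think it fits the basic badge.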


Kind Regards
--
Enos