- Case Number: a20140422.2
- Status: running
- Claimant: Benny B, Michael T
- Respondent: Critical Admins Team (CAT)
- initial Case Manager: EvaStöwe
- Case Manager: BernhardFröhlich
- Date of arbitration start: 2014-04-23
- Date of ruling: 201Y-MM-DD
- Case closed: 201Y-MM-DD
- Complaint: Unapproved modification on Critical System
- Relief: TBD
Before: Arbitrator EvaStöwe (A), Respondent: Critical Admins Team (R), Claimant: Benny B(C1), Michael T(C2), Case: a20140422.2
- 2014-04-22 (issue.c.o): case [s20140422.178]
- 2014-04-22 (iCM): added to wiki, request for CM / A, informed R, C1, C2 about case
- 2014-04-22 (R): as this is handled by arbitration, the critical team will hold off on reverting any changes in this context for the time being
- 2014-04-23 (C1): explains reason for dispute to R
- 2014-04-23 (A): I'll take care of this case as A, BernhardFröhlich will CM
- 2014-04-23 (A): init mail sent to C1, C2, R
- 2014-04-23 (A): some questions to critical team
- 2014-04-23 (R): answers questions
- [the software team has some other important issues to attend to; as no urgency is seen, A decides to wait with further questions until they are less busy]
- 2015-01-17 (A): forwards some of the answers from critical team to software team; asks for comments before 2015-02-14
- 2015-02-21 (A): two questions to C1, C2 - why were the questions from the dispute not put to R directly, and why was bug 1127 not addressed within one year
- 2015-02-21 (C2): two comments to answers from R
- 2015-02-21 (C2): answers the two questions from A: C2 had contacted R before the dispute to solve the issue, with R responding; the dispute was filed by C1; afterwards R wanted to wait for an arbitration decision before proceeding; nothing was done about the bug as it is not needed anymore and R declared it would wait for an arbitration decision
- 2015-02-21 (A): informs R about the last two mails from C2; tells the parties that solving the issue was not part of the dispute and is definitely within the authority of the parties, so A saw no need to cover this so far; the parties should be able to solve this on their own - if they cannot do so without a decision from Arbitration, they should state this clearly
- 2015-02-21 (R): had proposed to revert the changes when addressed by C1 on 2014-04-22, but instead of answering, the software team filed a dispute; had predicted back then that this would leave the issue unsolved for months; proposes a solution to close the issue
- 2015-02-22 (A): leaves this open to teams, but appreciates suggested solution
- 2015-09-29 (C): accepts changes from R, even though they are obsolete now
Link to Arbitration case a20140422.2 (Private Part), Access for (CM) + (A) only
EOT Private Part
> Dear support,
>
> I want to file a dispute against the Critical Admins Team (CAT) for the
> way that bug 1127 has been handled so far.
>
> Only by chance was discovered that someone of the CAT wrote AND applied
> a patch without prior agreement by the SoftWare Assessment Team (SWAT).
>
> Please answer and decide on the following issues:
> 1. Was CAT allowed to apply this change without prior consent by SWAT?
> 2. Who of the CAT authorized this change and for what reason?
> 3. What necessary documentation is required yet is missing in this case?
> 4. Which modifications have been done on the critical system that are
> not officially documented (e.g. credentials for the database)?
> Especially which changes to version controlled files have been done?
> 5. How did CAT try to ensure that SWAT becomes aware of the issue and
> give consent to this change?
> 6. Why wasn't SWAT informed that the change has become obsolete in the
> mean time due to other changes to the infrastructure?
Answer to questions asked in the dispute
The first question is of a different nature than the others. It will be addressed last, as the answers to the other questions help to answer it in part.
The questions and answers are slightly reordered, so that they can be read in a more natural order.
2. Who of the CAT authorised this change and for what reason?
The CAT team lead authorised the change after having reviewed it himself, "because of complaints from translators about references in the strings to be translated to non-existing source code."
The patch was tested on the production system, "because that is the only place where we can upload the strings to be translated to the translation server."
The changes were not removed afterwards, "since they were useful. Just fixing the uploaded translation strings once, only to have them replaced by a confused version some days/weeks later after a new upload with the old unpatched script did not make sense."
3. What necessary documentation is required yet is missing in this case?
The change was noted in the bugtracker entry for the corresponding bug, https://bugs.cacert.org/view.php?id=1127 , " -- assuming that the Mantis system is monitored by the software team for new bug reports."
The change "was deliberately *not* committed to the CVS / tarball yet, in order to wait for formal processing by the software team. However, to stop confusing translators, the change was left in place on the production server, so meaningful translation string uploads would be performed."
Additional question about some further changes
Question: "In the first answer the critical team lead, mentioned, that there were further changes done on the system, that made the changes addressed in the dispute unneeded. What changes are this, are they related to a patch?"
"Those changes were performed on February 5/6, 2014, and have been documented in https://lists.cacert.org/wws/arc/cacert-systemlog/2014-02/msg00003.html These changes are not related to a specific patch. They are somewhat related to the split-up of the CRL and OCSP servers into two separate virtual servers."
They were authorised by the CAT team lead for the above reasons. They were created and tested by himself and reviewed by the critical team.
The link above "also contains a list of affected files on the various critical servers. Please note that all configuration files on critical servers are version controlled through RCS; getting them into the CAcert SVN is a work-in-progress -- already complete for crl.cacert.org and ocsp.cacert.org." [this was stated on 2015-04-23, probably done by now]
A diff for these changes was also provided (and forwarded by Arbitration to the claimants).
6. Why wasn't SWAT informed that the change has become obsolete in the mean time due to other changes to the infrastructure?
"It was not done, because we did not realize that this change would make the #1127 patch unnecessary. This was only realized (and reported) after Michael's e-mail of April 22, 2014." [the date on which the dispute was filed]
4. Which modifications have been done on the critical system that are not officially documented (e.g. credentials for the database)? Especially which changes to version controlled files have been done?
"I have no idea what you mean with "credentials for the database" here. But to answer your question:
$ cd /home/cacert/www
$ cvs diff
[... diff of the system, omitted by the Arbitrator ...]
shows no other uncommitted changes in the CAcert application."
5. How did CAT try to ensure that SWAT becomes aware of the issue and give consent to this change?
The question answered by the critical team was: "If you think that there was a big issue related to bug 1127 or bug 1127 not attended for a while, how did you try to get the software team aware of it?"
The answer to this was: "There was no big issue here, in two ways: I reported the issue as bug #1127 including full reasoning and a fix ready to deploy, assuming it would be picked up by the software team at some suitable moment, with very little work involved to complete the adoption of the patch. I didn't expect that more than 16 months would go by without any activity. So clearly the issue was forgotten on both sides (critical team and software team)."
Additional question for the critical team, asked by the Arbitrator
Question: "Have you any idea how the communication between software team and critical team can be improved"
Answer: "I think that the communication between the two teams is working pretty well generally. But if there is something extra that needs to be done to bring bugs reported in Mantis under the attention of the software team, please let us know and we'll do it."
The questions asked and answers given by the critical team, including the last one, were forwarded to the claimants with the request to provide comments. There was only one minor late comment from the claimants, who are the former and the proposed team-leads for the software team.
As there was no further comment from their side about the communication question, it has to be assumed that entries in the bugtracker (Mantis) are regarded as sufficient communication by both sides (critical and software team).
The Arbitrator has some doubts in this regard, especially as the mails in this case have shown that the issue could have been solved easily with just a few mails within a short time - which C2 and R were trying to do when the dispute was filed by C1, which stopped that other communication.
If the current form of communication leads to such long delays in handling patches, both teams should consider improving something in this regard.
However, the assumption of the critical team, that the software team should become aware of filed bugs, has to be confirmed:
SP 7.5 states: Software assessment team maintains a bug system. Primary communications should go through this system.
Such bugs can be critical, and critical bugs have to lead to quick action, so it is the job of the software team to monitor entries in the bugtracker closely. It is also the job of the software team to organise themselves and to keep track of the bugs.
In any case, this is not the job of the critical team. So if a bug is entered in the bugtracker, it is not the job of the critical team, or of anybody else who entered the bug, to keep the software team aware of the entry. (But it may be sensible to ask about the progress after a reasonable time, at least in some situations.)
1. Was CAT allowed to apply this change without prior consent by SWAT?
Nature of the change
There seems to be agreement that the changes were done to the controlled software parts of the critical system. There is also agreement, that the change was not critical - C2 has confirmed this.
The change consisted of the renaming of files and, based on that renaming, adjustments to the makefile that creates the files for the translation system. The real change was that one special file was no longer processed by the makefile afterwards. This was done intentionally because of issues with this file in the context of translations.
The critical team has a lot of responsibilities, and these include handling software, and issues with software, that runs on the critical system but is not part of the controlled software. The translation system seems to be one of those. Here, the changes were done as an administrative act to solve an issue. They were of a complexity that an administrator should be able to handle.
But as the changes were done to the part of the software that is under control of the software assessment team, their authorisation may have been required for those changes.
[considerations of SP, SM, procedures to be added]
Possible special authority based on special situation
But the SP also allows for some deviations from those processes in special situations, especially when there is an emergency. There are additional requirements to be met for those deviations. The requirements depend on the urgency of the issue and the availability of team members to perform the needed tasks. The issue probably was not urgent and nobody tried to invoke an emergency procedure, so these provisions do not apply.
But there was an issue that was hindering other team members (translators) from performing their tasks. It was not urgent or critical, but as time has shown, this issue would not have been fixed for well over a year if the critical team had not made the changes - as the bug was not addressed by the software team at all during that time - even when there was more pressure because the patch was already installed on the critical server.
So even if there was no direct urgency, the delay until the issue would have been fixed without the changes of the critical team has to be called major. So the assessment behind the quick fix has to be considered at least reasonable.
While this does not authorise the fix as it was done, this also has to be considered.
It may be sensible to consider installing an additional patch process for patches that the critical team finds to be required for the performance of the critical system, where the issue is non-critical, the patch itself is of an administrative nature, and it has to be assumed that the normal process would take too long.
Discrepancies between the systems
The major concern of the claimants was (in the words of C2 which are a lot more appropriate than those of C1): " the fact that the tarballs differed from the system was the problem here, and the fear that this change might only be the tip of the iceberg of unpublished changes, which fortunately turned out to be unnecessary."
This discrepancy was definitely caused and introduced by the actions of the critical team. As clarified by this case, these were the only discrepancies. The discrepancy was also documented in the bugtracker - one of the sources one would consult if interested in the software.
The fact that the differences continue even over 2 years later is the responsibility of the software team. The discrepancy was documented in the bugtracker, which the SP defines as the primary communication system for the software team. When approached about the discrepancies (over one year later), the critical team proposed within hours to revert the changes. The critical team proposed this again during the course of this case. Both times, this was not answered by the software team. It has to be assumed that if the software team had either processed the bug or asked the critical team to revert the changes, the discrepancies would have been ended more or less directly.
Also, it is the responsibility of the software team to monitor the bugtracker. There are three options at this point:
- Software Team saw the bug, classified the changes as unproblematic - and left it like that for over a year.
- Software Team saw the bug, classified the changes as a problematic issue - and left it like that for over a year.
- Software Team did not see the bug.
The latter two options would be serious violations of the responsibilities under the SP.
Everything in the case points to either the first or the last option. We probably will not be able to clarify this in detail. In favor of the Software Team we assume the first alternative.
If we do this, this has to be regarded as a non-documented review (by probably more than one Software Assessor). This would only be a minor deviation.
Another consequence of this view is that at this point (which hopefully was shortly after the bug was entered), the changes were reviewed. They also were tested, as described by the Critical Team in the bug entry. So at this point the patch had the status it would need in order to be allowed on the critical system, even though the required processes were not followed, but with incomplete documentation in the bugtracker and the tarball.
So if we again follow the assumed classification of the Software Team, we have to classify the discrepancies as minor - in favor of both teams.