 * Case Number: a20120622.1
 * Status: running
 * Claimants: Bas D
 * Respondents: CAcert
 * Initial Case Manager: AlexRobertson
 * Case Manager: BernhardFröhlich
 * Arbitrator: UlrichSchroeter
 * Date of arbitration start: 2012-06-22
 * Date of ruling: 201Y-MM-DD
 * Case closed: 201Y-MM-DD
 * Complaint: Authorize emergency visit
 * Relief: TBD

Before: Arbitrator UlrichSchroeter (A), Respondent: CAcert (R), Claimant: Bas D (C), Case: a20120622.1

== History Log ==

 . 2012-06-21 (Support): Blog post [[http://blog.cacert.org/2012/06/567.html|Problems with signing certificates (resolved)]]
 . 2012-06-22 (issue.c.o) case [[https://issue.cacert.org/otrs/index.pl?Action=AgentZoom&TicketID=148310|s20120622.180]]
 . 2012-06-22 (iCM): added to wiki
 . 2012-06-22 (A): I'll take this and appoint CM
 . 2012-06-22 (A): Intermediate Ruling #1
 . 2012-06-22 (C): called (A) by mobile
 . 2012-06-23 (Critical team): Visit BIT 22.06.2012 Report
 . 2012-06-23 (C): Visit BIT 22.06.2012 Report Confirmation
 . 2012-06-25 (Critical team): scheduled visit #2 - Tuesday 2012-06-26
 . 2012-06-25 (Support): Blog post [[http://blog.cacert.org/2012/06/568.html|maintenance announcement: Server Downtime 2012-06-26 about 12:00 UTC to 14:00 UTC]]
 . 2012-06-25 (iCM): should this be appended to a20120622.1?
 . 2012-06-25 (A): Intermediate Ruling #2
 . 2012-06-25 (Critical Admin): note to intermediate ruling #2
 . 2012-06-25 (A): response to note given
 . 2012-06-26 (iCM): response to (Critical Admin) note to intermediate ruling #2
 . 2012-06-26 (Critical Admin): response to (iCM) note
 . 2012-07-03 (C): call to (A) by mobile (16:08): questions regarding disk shredding

== Original Dispute, Discovery (Private Part) ==

 * '''Link to Arbitration case [[Arbitrations/priv/a20120622.1|a20120622.1 (Private Part)]], Access for (CM) + (A) only'''

==== EOT Private Part ====

== Intermediate Ruling #1 ==

I'll hereby follow the case [[Arbitrations/a20120528.1|a20120528.1]] and I give the following Intermediate Ruling:

Intermediate ruling #1

I order that one access engineer and one or two critical administrator(s) are allowed to access the BIT facilities to analyse and probably fix the current signer problem.

If further authorisation is required according to SP you can call me by mobile +##-####-###.####

The critical team shall prepare a report for later review according to SP procedures.

Frankfurt/Main, 2012-06-22

== Discovery ==

 * Public Support mailing list
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00022.html|Invalid Certificates (Eric)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00023.html|Re: Invalid Certificates (Werner)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00024.html|(website form email): Not-Yet-Valid certificates? (Brian)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00025.html|Re: (website form email): Not-Yet-Valid certificates? (Marek)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00026.html|Re: (website form email): Not-Yet-Valid certificates? (David)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00027.html|Re: (s20120625.460) (website form email): Not-Yet-Valid certificates? (Werner)]]
  * [[https://lists.cacert.org/wws/arc/cacert-support/2012-06/msg00028.html|Re: (s20120625.460) (website form email): Not-Yet-Valid certificates? (Werner)]]
 * (Support) Blog post [[http://blog.cacert.org/2012/06/567.html|Problems with signing certificates (resolved)]]
  . "The signer was down from June 21 02:00 UTC to June 22 23:00 UTC."
 * 2012-06-22 (C): called (A) by mobile (all before the 1st visit)
  * Discussion whether the team of 1 AE + 1 Critical Admin is allowed to access the critical system (Signer)
  * A second Critical Admin is not available
  * (A): SP 1.2 Principles: dual control, four eyes, redundancy, escrow, logging, separation of concerns, Audit, Authority
   . confirmation to keep the 4 eyes principle intact: 1 Access Engineer, 1 Critical Admin, Access Engineer goes into role of oversight
  * Round #2 via (Support)
   * (A): What about the previous (Critical Admin) who moved to (AE), Stefan Kooman?
   * (C): Stefan isn't available either
 * 2012-06-23 Visit BIT 22.06.2012 Report by (Critical Admin)
  * [[https://lists.cacert.org/wws/arc/cacert-systemlog/2012-06/msg00008.html|2012-06-22 Emergency Visit BIT report]]
  * Persons:
   * Mendel Mobach (CAcert)(Critical Admin)
   * Bas van den Dikkenberg (Oophaga)(Access Engineer)
 * 2012-06-23 Visit BIT 22.06.2012 Report Confirmation by (C)
  * [[https://lists.cacert.org/wws/arc/cacert-systemlog/2012-06/msg00009.html|2012-06-22 Emergency Visit Report confirmation]]
  * NB Recommendation to CritSys team to replace failing disk
 * 2012-06-25 (Critical team): scheduled visit #2 - Tuesday 2012-06-26 {{{
A visit to BIT by Bas van den Dikkenberg (Oophaga), Mendel Mobach (CAcert)
and Wytze van der Raay (CAcert) has been scheduled for 26 June 2012 at 14:00 CEST.
The purpose of the visit is to replace a broken disk in the signing server,
and correct the time on the signing server.
Due to this work the signing server will be unavailable for approximately two hours.
}}}
 * 2012-06-25 (Support): [[http://blog.cacert.org/2012/06/568.html|maintenance announcement]]
  . "Server Downtime 2012-06-26 about 12:00 UTC to 14:00 UTC"

== Intermediate ruling #2 ==

A team from Access Engineers and Critical team is allowed to visit BIT again as a follow-up of the Friday, 2012-06-22 visit to replace the broken disk as proposed by Bas van den Dikkenberg in the confirmation email dated 2012-06-23 of the last visit report by Mendel dated 2012-06-23.

The critical team shall prepare a report for later review according to SP procedures also of this visit.

Frankfurt/Main, 2012-06-25

== Discovery ==

 * 2012-06-25 (Critical Admin): note to ruling #2 {{{
we probably don't need arbitration as this is a normal visit with only
normal fixes and normal maintenance (with impact for users, but oke).
}}}
 * 2012-06-25 (A): response to note given {{{
By default, you're probably correct

As this visit is a follow-up visit of last week's visit
that has been moved into the disputes queue
one of the tasks is a replacement of a defective disk

As we have a running case open, all unforeseen actions can
be handled under this current case until the initial problem
is solved and the running arbitration case is closed.

So the upcoming visit can also be seen as an evidence gathering
process for the previous visit.

The requested report follows SP 2.3.3. Access Logging
that includes "reporting to all"

probably related SP areas that are covered by current case:
SP 2.2.3.3 Retirement
SP 2.3.2. Access Profiles
SP 5.4. Investigation
SP 5.6. Report
}}}
 * 2012-06-26 (iCM): response to (Critical Admin) note to intermediate ruling #2 {{{
IMHO it's probably only relevant in that it follows up Bas' request to replace the disk from his confirmation of the earlier visit - as that was logged in the running case, this provides "completion" to that case.
}}}
 * 2012-06-26 (Critical Admin): response to (iCM) note {{{
My motivation for copying the visit announcement to support@cacert.org
was to make them aware of the actions taken in response to their reports
of failing user certs and failing OCSP responses, and of course the expected
outage of the server during the maintenance activity.
}}}
 * Potentially affected Policies and Manuals
  * [[https://svn.cacert.org/CAcert/Policies/SecurityPolicy.html|Security Policy]]
   * SP 1.2 Principles: dual control, four eyes, redundancy, escrow, logging, separation of concerns, Audit, Authority
   * SP 2.3.3. Access Logging (this includes "reporting to all")
   * SP 2.2.3.3 Retirement (Storage media) {{{
Storage media that is exposed to critical data and is to be retired from service
shall be destroyed or otherwise secured. The following steps are to be taken:
The media is securely destroyed, or
the media is securely erased, and stored securely.
Records of secure erasure and method of final disposal shall be tracked in the asset inventory.
Where critical data is involved, two Systems Administrators must sign-off on each step.
}}}
    * Question that arises: How to handle broken disks?
     * One answer is probably given under [[Arbitrations/a20090301.1|a20090301.1]]
     * The answer is given under [[SystemAdministration/Procedures/DriveRetirement]]
   * SP 2.3.2. Access Profiles {{{
According to the Security Manual 2.3.2 updates to the signer may require
the presence of two critical system administrators.
}}}
    * see also Arbitration case [[Arbitrations/a20090810.4|a20090810.4]]
    * more details under [[SystemAdministration/Procedures/DriveRetirement]]
   * SP 5.4. Investigation
   * SP 5.6. Report
   * SP 6. DISASTER RECOVERY
  * [[http://www.cacert.org/policy/CertificationPracticeStatement.php#p5.7|Certification Practice Statement p 5.7]] {{{
5.7. Compromise and disaster recovery
Refer to Security Policy 5, 6 (COD8). (Refer to §1.4 for limitations to service.)
}}}
   * SP 5. INCIDENT RESPONSE
   * SP 6. DISASTER RECOVERY
  * [[SecurityManual]]
   * 6.2. Recovery Times {{{
DisasterRecovery sets the recovery time for revocation services at 27 hours.
}}}
   * Interruption by Signer affects OCSP responder service
  * [[DisasterRecovery]]
   * [[DisasterRecovery#2._Standard_Process_Times]]
   * [[DisasterRecovery#3._Recovery_Time_Objectives]]
 * System Procedures
  * List of System procedures: [[SystemAdministration/Procedures]] and [[SystemAdministration]]
  * [[SystemAdministration/Procedures/DriveRetirement]]
  * [[SystemAdministration/Procedures/OcspResponder]]
 * Systems
  * [[SystemAdministration/Systems/Signer|System: Signer]]
 * 2012-07-03 (C): call to (A) by mobile (16:08): questions regarding disk shredding {{{
Dear CC-party,

Regarding arbitration case a20120622.1
https://wiki.cacert.org/Arbitrations/a20120622.1

Today I've received a call from Bas by mobile at 2012-07-03 16:08
that relates to the running case.
For documentation purposes I'll document this under this case:

Bas, Wytze and Mendel are staying at BIT Ede datacenter.
(visit re-scheduled for 2012-07-03 14:00 CEST)

The purpose of the visit is to execute the follow-up actions defined after the
previous site visit of 26 June 2012:
* Retire the old (somewhat broken) system drive of the signing server,
  following the "Suggested simplified procedure" as described in:
  https://wiki.cacert.org/SystemAdministration/Procedures/DriveRetirement

Part 0 -- Zero -- CAcert Systems Administrator
has been executed.
The question now is whether it is allowed that Wytze takes the disk home
to execute
Part 1 -- Shred -- CAcert Systems Administrator
as the process is a long, time-consuming phase

(A): Is it possible to do the process in the datacenter?
(C): Yes it is.
(A): references to https://wiki.cacert.org/Arbitrations/a20090301.1
where the procedure has been changed caused by lack of hardware to proceed
so the current answer is to proceed with it in the datacenter

--End of call--

=== Discussion [on] ===

While now reading the procedure defined under
https://wiki.cacert.org/SystemAdministration/Procedures/DriveRetirement
there is a slight difference in the procedures and the procedures themselves
open some questions:

In the visit announcement the "Suggested simplified procedure" has been referenced (more on this later)
but there also exists a main procedure that references the 2-step procedure: Part 0 + Part 1

One of the possibly conflicting points is the following line under Part 0:
"After completion of this, remove the old drive and take it off-site for Phase 1."

What does "off-site" mean here?
Take it away from the production system, but keep it in the secure environment (the CAcert rack)?

The Notes under this section open the question:

Notes:
 * Two CAcert administrators need to be present at the start and the finish,
   and sign-off on the completed process.
 * The Machine plus drive need to be in a location with reasonable security.
   E.g., a secured office location or a populated home location.
 * if the drive to be shredded contains hard media defects which block writing
   of certain sectors, the above procedure may not run to completion, and
   another (physical) method will be required to render the remaining data
   on the drive inaccessible. This will mean that CAcert Systems Administrators
   will also need to be present in Oophaga phase 2 below.

"Two CAcert administrators need to be present"
ok, so the next section
  The Machine plus drive need to be in a location with reasonable security.
  E.g., a secured office location or a populated home location.
makes no sense, if the disk is taken off-site (out of BIT Ede), delivered
to one administrator who takes the disk home, to start the shred process ...
or both admins have to travel to one admin's house

so the requirement:
"Two CAcert administrators need to be present at the start and the finish,
and sign-off on the completed process."
makes only sense, if the disk will be kept in the rack, the rack opened by one Access Engineer.
The two CAcert administrators start the shred process, the Access Engineer closes the rack
and all parties schedule a revisit

There is one more topic, this is Notes part 2:
"if the drive to be shredded contains hard media defects which block writing
of certain sectors, the above procedure may not run to completion, and
another (physical) method will be required to render the remaining data
on the drive inaccessible. This will mean that CAcert Systems Administrators
will also need to be present in Oophaga phase 2 below."

With the analysis and knowledge from the previous visits, we know that we have a defective disk
with potentially defective blocks that can cause the problems described under Notes part 2 ......

Then we have the scheduled visit announcement for the "simplified procedure"

Suggested simplified procedure (not agreed as yet)
step 1: requirement: two CAcert System Administrators present
"After completion of this, remove the old drive and take it off-site."

here probably off-site means out of BIT Ede datacenter? Is this correct?

So one main question that arises in comparing both procedures is:
why is there such a big difference in step 1 of procedure 1 in relation to step 2 in procedure 2?

proc 1, step 1 requires 2 admins in a secured location, probably access controlled by an Access Engineer
in contrast proc 2, step 2 doesn't have these requirements

So what is the reason that allows a reduced security control in procedure 2 step 2
in relation to proc 1 step 1?

Is the "zero the data" procedure on the disk enough that the bits and bytes can no longer be recovered
so the outcome is to lower the security requirements?
that allows one admin to take the disk away from the secured location, to take it home,
to pass step 2 of the suggested procedure?

Why then are there 2 more steps - a shredding and a physical destruction step - required for such a disk?
}}}

== Ruling ==

== Execution ==

== Similar Cases ==

|| [[Arbitrations/a20090301.1|a20090301.1]] || [[Arbitrations/a20090301.1|CAcert disk destruction procedure has changed compared to the CAcert Board decision]] ||
|| [[Arbitrations/a20090627.1|a20090627.1]] || [[Arbitrations/a20090627.1|Emergency access to CAcert critical systems]] ||
|| [[Arbitrations/a20090810.4|a20090810.4]] || [[Arbitrations/a20090810.4|Emergency access to CAcert critical systems]] ||
|| [[Arbitrations/a20120528.1|a20120528.1]] || [[Arbitrations/a20120528.1|Emergency Dispute: Access to server due to signer problem]] ||

----
 . CategoryArbitration
 . CategoryArbCaseSystemTasks