Location data from the production database is needed for developing Ad hoc SQL queries that are geo-specific to CAcert members, for internal business operations.  Specifically, the request covers the country, region, and location tables of the database.  Extracting this data exposes no sensitive member information.  Please consider this request as an Arbitration, according to Security Policy 3.3 [5].

The SQL queries would be:

To determine the number of records in each table:
----
SELECT count(*) FROM countries;
SELECT count(*) FROM regions;
SELECT count(*) FROM locations;
----
The size of the locations table might be a concern; the export could be broken down into manageable chunks using "BETWEEN" (see below).

To export the data to simple tab delimited files:
----
SELECT * INTO OUTFILE '/tmp/countries.txt'
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM countries;

SELECT * INTO OUTFILE '/tmp/regions.txt'
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM regions;

SELECT * INTO OUTFILE '/tmp/locations.txt'
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM locations;

----

The chunked version:
----
SELECT * INTO OUTFILE '/tmp/locations.txt'
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM locations
WHERE id BETWEEN 0 AND 50000;
----

Before: Arbitrator UlrichSchroeter (A), Respondent: CAcert (R), Claimant: Greg Stark (C), Case: a20090427.2

History Log

Discovery

I have a few questions.
First of all, some background information.

I got a testserver image from PG back in 2009
with table data filled in for countries, regions and locations.
I did a test export with the query proposed by Greg:

------
SELECT * INTO OUTFILE '/tmp/countries.txt'
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM countries;
------

The exported table sizes are:
table       reccount  size [bytes]  size
countries   248       4583          4 KB
regions     4579      103101        103 KB
locations   2200456   101137195     101 MB


QUESTION 1
==========

@(C):
All tables include an account field with the count of
users for each area (city, region, country).
Is this needed in the export?
Or can the export reset these values to 0?


QUESTION 2
==========

@Sysadmin Teamleader:
About the OUTFILE option:
the testserver image I got from PG includes a MySQL 5 db.
As far as I know, the live system has a MySQL 4 database. Is this correct? I did a test export on a MySQL 4 database, and the OUTFILE parameter works there as well.
------------------------------------------------------------------------
drive-path>mysql --user=root --host=localhost --password=xxxxxx
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 375004 to server version: 4.0.20a-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * into outfile '/forument.txt'
    ->  fields terminated by '\t'
    ->  optionally enclosed by '"'
    ->  lines terminated by '\n'
    -> from bbs.forum_entries;
Query OK, 11 rows affected (0.02 sec)
------------------------------------------------------------------------
This is a similar test query against a MySQL 4 database, and I got no error executing the above query. So where is the problem with using the OUTFILE option?


QUESTION 3
==========

@Sysadmin Teamleader:
If the export is made, where can the export be placed for download? Is it possible to transfer the export to the svn server? I propose to create a directory
https://svn.cacert.org/CAcert/testsystem
with a subdirectory locationsdb
and to place the exports into this directory,
as SP states in 3. LOGICAL SECURITY, 3.1.1.1. External connectivity
and 3.1.1.2. Internal connectivity restricted access.



This is not an execution request and not a ruling.
This is only to investigate the environment
and possible solutions for this arbitration request.

So far I have found no reason to disallow the export
of the locations data (except the user count column).
All other data can be compiled from
several databases around the world.
The geo-location information for cities doesn't fall
under personal data. There is no link to
a user account in the requested tables.
So further investigation to find a way of doing
the export is still open.

What I haven't yet considered is the license model
under which the locations information is handled.
This needs some further investigation ...

You agree to a non-exclusive non-restrictive non-revokable transfer of Licence to CAcert for your contributions. That is, if you post an idea or comment on a CAcert forum, or email it to other Members, your work can be used freely by the Community for CAcert purposes, including placing under CAcert's licences for wider publication.

You retain authorship rights, and the rights to also transfer non-exclusive rights to other parties. That is, you can still use your ideas and contributions outside the Community.

Note that the following exceptions override this clause:

   1. Contributions to controlled documents are subject to Policy on Policy ("PoP" => COD1)
   2. Source code is subject to an open source licence regime. 

> @Sysadmin:
> About the OUTFILE option:
> the testserver image i've got from PG includes a MySQL 5 db
> As I know, the live system has a MySQL 4 database. Is this correct?

No, the live system also runs with MySQL 5, more specifically:
    mysql-server-5 5.0.32-7etch11  (debian package name/version)

> I've did a test export on a MySQL 4 database. And the
> OUTFILE parameter works also.

As I reported on May 5, 2009 to (C):

> I've had a quick look at the queries. Is the formatting "SELECT * INTO
> OUTFILE" etc. required for you, or would a formatted table dump do
> just as well? Issue with the SELECT INTO OUTFILE is that you need the
> mysql FILE privilege to do so, and user cacert does not have that
> (nor should it).

So this is not an issue with the MySQL version, it is just that user cacert
(the one we use to access the database) does not have the required privilege.
Here is the error message from the live system:

  mysql> SELECT * INTO OUTFILE '/tmp/countries.txt'
    ->   FIELDS TERMINATED BY '\t'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   LINES TERMINATED BY '\n'
    -> FROM countries;
  ERROR 1045 (28000): Access denied for user 'cacert'@'localhost' (using password: YES)

My MySQL knowledge is very limited; I've only noticed that dropping the
"INTO OUTFILE '/tmp/something'" part from the query is not valid syntax
anymore according to the mysql utility:

  mysql> SELECT *
    ->  FIELDS TERMINATED BY '\t'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   LINES TERMINATED BY '\n'
    -> FROM countries;
  ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n' at line 2

> ...
> So where is the problem using the OUTFILE option ?

I think I've just explained that adequately above here.


> QUESTION 3
> ==========
> 
> @Sysadmin:
> If the export is made, where can the export be placed for
> download?
> Is it possible to transfer the export to the svn server?

Are you talking about a regular export (say daily or weekly)?
That would have to be automatic, and it's not so easy to do; the webdb server
has (on purpose) no possibility to routinely copy files to the outside.
If it is just for a one-time export, it's no problem, that can be done
manually of course.

> I'll propose to create a subdirectory 
> https://svn.cacert.org/CAcert/testsystem
> with a subdirectory locationsdb
> to place the exports into this directory.
> as SP states in 3. LOGICAL SECURITY, 3.1.1.1. External connectivity 
> and 3.1.1.2. Internal connectivity restricted access

Ah, I see what you mean, you are looking for an "internal" channel from
webdb server to svn server. There really aren't such channels in my opinion,
especially as the CAcert infrastructure services (including svn) are likely
to be housed elsewhere soon, at a completely different physical location
than the critical services.

However, given the essential public nature of the data in case, I can well
imagine two other solutions:
(a) a reverse solution: maintain the master version of these tables on some
    CAcert infrastructure system (eg svn), and have the webdb server pull in
    this data on a regular basis to update its mysql copy. Some scripts would
    need to be written for that, but it would fit in with the current access
    restrictions setup;
(b) a simple download solution: the webdb server could make the data available
    via its web interface (over http or https), and thus allow some script on
    the svn server to pull in new versions on a regular basis.
If the data is not considered to be fully public, it probably shouldn't go
into svn either, should it? But for data protection (b) above could be
enhanced to allow the retrieval only by a specific IP or certificate.

[...]

The remarks from PG and Iang which I read on the wiki page
for this arbitration all seem to make a lot of sense. The only thing that
raised an eyebrow with me was PG's remark that some users may have
contributed data to this database with an express intent that it would only
be used for CAcert and for nothing else. That would be hard to reconcile with
publishing the data, but I guess you'll have to ask for explicit evidence of
this situation, it could be obsolete hearsay, I have no idea ...

Question #1
The count of users for each area (city, region, country) is not necessary.

Question #2 & #3
OUTFILE on the production system is not an option due to the nature of the configuration.  There is no way to get the file off the system once it is created.  A formatted screen dump would suffice.  The locations file was going to be the problem.
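
A minimal sketch of what such a dump could look like, under stated assumptions: the column names are taken from the comments in the export script recorded in this case (id, name, account, and so on), the rows are captured from a client session (for example the mysql client in batch mode, which prints tab-separated rows) instead of being written with INTO OUTFILE, and the 50000-row range is simply the chunk size from the original proposal. None of this is the query that was actually run.

```sql
-- Sketch only: column names are assumed from the comments in the export script.
-- No INTO OUTFILE is used, so the FILE privilege is not required.

-- Small tables in one pass; the user count is replaced by a literal 0.
SELECT id, name, 0 AS account FROM countries ORDER BY id;
SELECT id, ccid, name, 0 AS account FROM regions ORDER BY id;

-- Find the id range first, to plan the chunks for the large table.
SELECT MIN(id), MAX(id), COUNT(*) FROM locations;

-- Large table in id ranges. BETWEEN is inclusive on both ends, so the
-- second range starts at 50001 to avoid exporting id 50000 twice.
-- `long` is quoted because LONG is a reserved word in MySQL.
SELECT id, regid, ccid, name, lat, `long`, 0 AS account
  FROM locations
 WHERE id BETWEEN 1 AND 50000
 ORDER BY id;

SELECT id, regid, ccid, name, lat, `long`, 0 AS account
  FROM locations
 WHERE id BETWEEN 50001 AND 100000
 ORDER BY id;
```

With roughly 2.2 million rows in locations, this chunk size would mean several dozen ranges, so a larger range per query may be more practical.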

<? /*
    Copyright (C) 2010  CAcert Inc.
*/

error_reporting(E_ALL);

require_once("../includes/mysql.php");

$output1 = "/tmp/a2009042702.countries.txt";
$output2 = "/tmp/a2009042702.regions.txt";
$output3 = "/tmp/a2009042702.locations.txt";

if(!($fp1 = fopen($output1,'w'))) return;
if(!($fp2 = fopen($output2,'w'))) return;
if(!($fp3 = fopen($output3,'w'))) return;

//                id,name,account (-> 0)
$query1 = "SELECT * FROM countries";
//                id,ccid,name,account (-> 0)
$query2 = "SELECT * FROM regions";
//                id,regid,ccid,name,lat,long,account (-> 0)
$query3 = "SELECT * FROM locations";

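// Writes every row returned by $query to $fp as one tab-delimited line.
// $fclmn is the zero-based index of the name column, which is wrapped in
// double quotes. The last column (account) is always written as 0.
function dump_table($query, $fp, $fclmn)
{
  $res = mysql_query($query);
  if (!$res) return;  // query failed, nothing to write
  $nbFields = mysql_num_fields($res);
  while ($row = mysql_fetch_array($res, MYSQL_NUM))
  {
    for($i=0; $i < $nbFields; $i++)
    {
      if($i != $nbFields-1) {
        if ($i == $fclmn) {
          // name column: enclose in double quotes
          fwrite($fp,'"'.$row[$i].'"');
        } else {
          fwrite($fp,$row[$i]);
        }
        fwrite($fp,"\t");
      } else {
        // last column (user count): reset to 0 and end the line
        fwrite($fp,"0");
        fwrite($fp,"\n");
      }
    }
  }
  mysql_free_result($res);
}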

dump_table($query1, $fp1, 1);
dump_table($query2, $fp2, 2);
dump_table($query3, $fp3, 3);

fclose($fp1);
fclose($fp2);
fclose($fp3);

?>

A runtime of around 14 min. (!) on the testserver results in this output:
/tmp/a2009042702.countries.txt
1       "Afghanistan"   1

/tmp/a2009042702.regions.txt
1       7       "Barbuda"       0

/tmp/a2009042702.locations.txt
1       2527    169     "A"     62.983  9.750   0

> attached version c of the script.
> The output goes to directory /tmp
> 'cause this is writable.

Actually any directory that is writable for the id executing
the script will do fine. But /tmp is guaranteed to be fine :-)

> I've modified the script this way, that
> it simulates the output of the "OUTFILE"
> parameter from the proposal.
> 
> This version is tested on my local testserver
> system.
> The runtime of the script was around 14 min. (!)
> on my testserver and results in output rows:
> ...
> @Sysadmins teamleader:
> 
> This is not a ruling, only a proposal and
> a question to you:
> 
> what is your suggestion about the script runtime
> of about 14-15 min  ?

It's long but quite feasible, certainly for a one-off.
I expect it to run a little faster than 15 minutes on the
production system.

> Is it OK to start this script on the live system?

Yes, no problem.

> or would you suggest, that this runtime
> is a burden on the production system ?

No, only if we would be running this very frequently,
but I understand that that is not the case.

> My intention for the ruling is to let the
> script run once, not on a recurring schedule.

Fine.

> So your thoughts on how a regular transfer can be
> implemented are oversized for this solution.
> A one-time transfer to a storage place that
> can be secured for requestors will fit,
> like the src directory on SVN

Please make sure to point out in your ruling whether the results
may be published to a generally visible place somewhere in the
CAcert infrastructure or that they should be kept private to
arbitrator and any others you see fit to receive the data.

> This means that if a user requests access to this
> image and has no permissions to
> this directory, the SVN admin has
> to add the new user's account to the
> SVN and has to forward a notification
> to PG.
> 
> 1. Is this a procedure that can be
>    established ?

Yes it could be established. I think a special issue category in the bug
tracker for such requests could be a good idea though.

> 2. the file size of about 735 MB
>    can this be handled through the SVN?
>    or are there any resource limitations,
>    e.g. bandwidth, disk space?

SVN is not ideal for storing binary files but could be used for it. We have 30G
of available disk space on the svn machine. So one revision with the bz2-file
should be no problem. If it changes from time to time you should be aware that
SVN does not handle binaries (especially files with lots of changes) very
efficiently.

If there are no plans to change the file very often I think it would be better
to create a download area served by Apache and protected by certificate or
password authentication.

Ulrich,
I am very pleased that you are making this effort and I appreciate it.

To answer your questions:
  Yes, it meets the need to have access to such data.

  As a request to perform an Ad hoc SQL query; No.

Say that I develop a query to extract monthly numbers on usage and demographics for COMA, the Board, for the marketing and events group, or an AGM report.  Whose queries are trusted?  How does the query get validated so the sysadmins feel comfortable and willing to run it?  How is the data going to get off the production system?  Here is the fuzzy area of this Arbitration.

Around the time I requested this arbitration, we were trying to get an email sent to assurers whose ability to assure had arbitrarily been turned off by Philipp Guehring when he installed the CATS patch without notice to anyone. Hence later, you got your "Event Announcement Email script".

So be careful about the decision in this arbitration.  I feel the ruling needs to be, "Ok, run these queries".  This supports the security policy and our CAcert management's ability to extract useful metrics about the project, and to send email to users and assurers. To say no is to set the precedent of avoiding running any kind of query.  That you identify the steps of how this happens is also of great importance, so that future Ad hoc SQL query requests can be run and do not have to wait a year.

Everyone has looked these queries over, and agrees they do not extract personal data that puts CAcert in jeopardy by the Belgium DPA. That this data comes via a console session saved to file, is fine.  It may take time, it won't crash the system.  It can also be run at a low use time.  You need to dispel the fear of doing this kind of thing.

Regards,
Greg

Discussion

The questions that arise from this Ad hoc query request
are manifold.

1. Questions about the locations data:
   a) Is the query result data critical or non-critical data?
   b) Are the results of the query under a CAcert license model?
   c) Are there any reasons why the data cannot be published?
   d) Does this data relate to SP 9.4. Outsourcing?

From further investigation, I received answers from
the critical sysadmin team, statements from Board members,
from developers and from the Software Assessment team.
These raise additional questions:

2. Ad hoc vs. recurring queries.
   a) What does this mean?
   b) Is there a difference?
   c) How do recurring queries affect the critical sysadmin team?
   d) How do recurring queries affect the Software Assessment team?
   e) How do recurring queries affect the CAcert board?

3. Statistical data:
   a) What does this mean?
   b) How is this affected by SP / SM and other policies?
   c) How can the results be transferred to the recipients?

4. Publishing of Ad hoc query results
   a) Is a simple answer to a complicated question possible?
   b) Can some data be published freely?
   c) Or do all Ad hoc query results need to be private?

5. Usage of Locations Data


=== Question 1.a) ===
The content of the requested tables has no
privacy implications, and no part of it is personally
identifiable. So it can be declared non-critical data,
even though for historic reasons that data is currently
housed in the core systems under the auspices of the Critical
Systems Team. My core statement would be that:
Just because data is currently part of the critical systems,
that does not automatically mean it is critical data.
There is data in there that has no need to be kept private.
The locations data isn't really under the regime of the
Security Policy, and such data can be released.

=== Question 1.b) ===
- Existing CAcert license models don't cover the
  locations database data.
- Exported data is not a contribution to a controlled document.
- Exported data doesn't fall under the source code subject
  to an open source licence regime.

I see the CCA 1.3 "Your Contributions" clause
as the license model that applies as a fallback if no other
license model relates to the case in question.
If no contributor can be identified, the contributor
has to be set to CAcert.

One additional question arises from a historical view:
What about contributions made before the CCA came into effect?

Here I come to the conclusion that they are also covered
by the CCA 1.3 "Your Contributions" license,
with two exceptions.

Exception 1:
The first is handled by another
arbitration (a20090913.1), named "privacy info".
The contributor's "privacy info"
has to be anonymized on request as long as it's not
covered by the Privacy Policy.

Exception 2:
The second exception is the case of a named contributor
with a restricted form of license that isn't covered
by the CCA 1.3 license scheme.
If someone made contributions before the CCA came into
effect, the license may be limited if a named
contributor limited it.

=== Question 1.c) ===
PG makes the claim that one contributor provided some of the
data we have for CAcert only, and that we are not allowed
to hand it out to anyone else.
As that data cannot be singled out, this claim relates to
all or nothing of the data set.
There is no way to extract or split it into
a set of non-transferable and a set of transferable data.

On review of the license model used for this
contribution, it contradicts the license model
that is published in the CCA 1.3 "Your Contributions"
clause, but it falls under exception 2: the contribution
was made before the CCA came into effect.
The name of the contributor is known to the arbitrator.

See also https://wiki.cacert.org/Software/Database/LocationDatabase

=== Question 1.d) ===
The answer found in 1.a), that the locations
data isn't really under the regime of the Security Policy,
means that SP 9.4 doesn't cover it either, and
the release of the data needs to be handled by an
arbitrator's ruling.

=== Question 2) ===
An "ad hoc" query releases limited data for limited purposes.
This isn't the same thing as permanent publication of a font
of data. One would think that the "ad hoc" query releases the
data to a member who will then be expected to look after it;
including not further publishing it:

2.1
a20090810.3   User requests a list of people who have more
              than 150 points 
a20090902.1   request list of OA 


Other examples include the publishing of data for,
e.g., work on a new policy:

2.2
a20091221.1   Adhoc SQL-query about U18 cases

An "Ad hoc" script that falls under the recurring scheme
is the Events team leader request:

2.3
a20090525.1   Event officer request recurrent notification
              to assurers near the location of the
              following ATEs
Here there is no publishing of data. It's an automated
mailing to recipients who are unknown to the Events team
leader and also to the sysadmins. The only result of the
script is the count of recipients, which can be seen as
statistical data.

The difference between the "Ad hoc" queries listed under
2.1 and the scripted mailing under 2.3 is that the
script under 2.3 is a subset of a query that has
a limited purpose and a defined input of
a few variable but known parameters:
- central location
- max distance to the location
- subject text for the mailing
- text that is used in the mailing
and a defined output. This has been reviewed by an
arbitrator and has been ruled on.
Each of the "Ad hoc" queries listed under 2.1
has also been reviewed by an arbitrator, and
the execution has been ruled on by the arbitrator.

The exception with the 2.2 query was that the result
needed to be published. This was also handled in
an individual arbitration case and was ruled on
by an arbitrator.

Conclusion:
Each "Ad hoc" query, and each recurring query,
needs a definition of the query, the expected
result, the recipient(s) of the result set,
and whether this result set can be published or not.
If there is a need for recurring execution,
this can be defined by a later request (for an already arbitrated
Ad hoc query) or in the initial request.
This needs only one arbitration
(see the Events mailing case), so further executions
can refer to the one handled case.

As a result:
This arbitration can rule on the request
for the location data. But this arbitration
cannot rule on the results of queries developed
in the future that are not yet known.

If, e.g., the marketing team needs recurring
statistical data about user growth in
locations worldwide over a timeframe,
they have to define their query in a request
to run this script on a recurring schedule.
Whether this data can be published or not depends
on the query that has been requested and needs
to be ruled on in an additional arbitration case.


=== Question 3) ===
The nature of statistical data is
that much of it can be published freely -
see www.cacert.org
      + About CAcert.org
        + CAcert Statistics
http://www.cacert.org/stats.php

This covers the count of users and assurers,
and more detailed data like the
count of assurer candidates, the
count of assurers with test,
and so on.
The growth over the past 12 months
and the past years can also be displayed
freely.
Other statistical data cannot be accessed
so easily, like the "hidden" stats page:
https://secure.cacert.org/wot.php?id=1
that lists the count of assurers by country,
by region, or by city.
This "hidden stats" page relates to the
"find an assurer" database, and access is
restricted to
logged-in users, that is, community members.

The difference between the two sets of statistical
data relates to the Privacy Policy:
8. Privacy of user data
CAcert Assurers can see the name, birthday and
the number of points by looking up the correct
email address. No other person related data is
published by CAcert.

As the "find an assurer" results page
displays names of assurers in
half-anonymized form to all community
members, this is an additional set of data not yet
covered by the Privacy Policy.
The "find an assurer" results page lists:
given name, first character of the last name, the count of
points an assurer can give, and a
free text field the assurer can fill in.

Therefore the "hidden stats" page cannot
be published freely.

Statistical data on how many
assurers are listed in each country, region and
city, without a further link to the assurers list,
contains no privacy related data and therefore
can be published in such a way that the data can
be used internally by CAcert to develop
seeding programs for CAcert deserts.
But this isn't covered by this arbitration
and needs a review in another Ad hoc query
request decision.

=== Question 4) ===
The publishing of Ad hoc query results
 a) Is a simple answer to a complicated question possible?
    Yes.
    Each query result needs to be checked against
    privacy issues and SP through an arbitration.

    Sample 1:
    The current query request to export locations data
    yields a result set with non-private data. Therefore this
    result can be published (by default), but an Arbitrator
    needs to review whether the result set really
    contains no private data (e.g. misused fields in the data set).
    Someone could add their email address and postal address
    in the description field of the "Find an Assurer" dataset.

    Sample 2:
    A query request for potential assurers for an ATE, to
    transfer the email addresses to the Event Officer, will
    probably contain personal data (the names and email addresses).
    Therefore this needs to be arbitrated (see a20090525.1,
    "Event officer request recurrent notification to assurers near the location of the following ATEs").
    The workaround to avoid transferring the result set was that the
    scripted mailing doesn't transfer the complete result set to
    the Events Officer; instead the mailing was triggered by
    the critical sysadmin team, and the Events Officer only receives
    the count of mails sent to potential Assurers.
    So there is no publishing of sensitive data.

 b) Can some data be published freely?
    Statistical data can be published within CAcert by default
    if the statistical data doesn't contain private data.
    E.g. in the preparation phase of the Policy on Junior Assurers / Members
    the question arose how many members started their membership
    underage. The count of users was unexpected,
    so the real need for this subpolicy was emphasized by the result.
    By default, results without private data can be published
    within CAcert without problems.
    Someone could argue that a query about Delete Account requests
    may harm CAcert, so publishing freely shouldn't be allowed.
    That's why a precedent ruling cannot be made.
    Each query needs an individual arbitration.

 c) Or do all Ad hoc query results need to be private?
    Privacy is one limitation. Other questions may also be better
    kept for CAcert internal usage only. But a general
    limitation that query results need to be private
    cannot be set.
    Again, each query needs a review by an Arbitrator,
    who has to decide whether the query can be executed
    as expected and how the result set can be used.

=== Question 5) ===
The usage of the locations data was limited by the contributor
to CAcert usage only. This means it is unclear
whether the locations database can be used on another CAcert
service (e.g. new software, another infrastructure server).
It can be used by CAcert developers in testserver
images. But the usage is limited to CAcert internal
usage only, to write and test patches or new features for
the current webdb system.

From all the information received, parts of the
locations database may be open source. But as long as there
is no defined state here, and the data is mixed
with data that is probably limited, the overall state
is "limited".

How to transfer the locations database result set?
The locations database is about 100-120 MB in size.
That's half the size of the complete webdb system.
It has come to my knowledge that testserver images
are circulating via hidden links
(PG has a testserver image, the new Software-Assessment
project has a testserver image).
Further investigation shows that the
testserver image by PG has the locations database included.
The new testserver image built by Wytze within the
Software-Assessment project doesn't contain the locations
database by default.
So one question that arises: is there a need to
export the locations database from the live system?
Or is it sufficient to use the existing testserver
image from PG with the locations database included?

With these limitations set, it's difficult to handle this in
practice. How can it be achieved that only CAcert developers
get a testserver image with the locations database included?
How can we prevent developers from using the locations data
in another non-CAcert project?

These questions can only be answered with:
limit the access to the testserver image
or the locations database export result set.
Find a place where the image / export can be stored.
Give access permissions to users only on request.
Next question: Who can give access?
 * Support ?
 * Critical Sysadmins ?
 * Infrastructure-admins ?
 * or Software-Assessors ?
Also an option: access with client cert only.
Inform the requestors about the limitations that
are set on the locations database.

Restrictions / Limitations found so far:
 * there is a restriction / limitation set on the locations database dataset
 * recurring exports aren't preferable, as each execution needs a lot of work from sysadmins to transfer the result set onto another system where it can be downloaded
 * svn isn't the preferred target for a testserver image with the locations database included or for the locations database export
 * the download area for the locations database dataset should be under CAcert control
 * the method of control is left open; it depends on system requirements and is to be decided by one of the CAcert group members
  * this can be: ACL permissions, Group permissions, Client Certs to the download area or a hidden link
  * depending on the download area system, access permission can be given by Critical Sysadmins, Infrastructure Admins and/or the Software-Assessment team

Discovery

Ruling

I have gathered and reviewed evidence concerning the origins of the locations data to the extent reasonable and plausible. The evidence does not reveal the original source of the data, and no obvious nor easy stones remain to be unturned. Therefore I conclude this information has been lost to us.

The alleged external claims over the locations data fall outside the CCA contributions clause, which came later, and outside the IPR transfer by Duane dated 2004. They cannot therefore be verified by this Arbitration.

A claim was made that one contributor provided some of the data on the basis that it is for CAcert only, and there is no licence to share it with anyone else.

The licences of typical geodata companies examined in the course of this Arbitration did match the general sense of an individual licence, and not an open licence. However, as the evidence pointed at no particular company, none of those licences is more than illustrative of what might have been.

I conclude however that the central claim has merit, and CAcert must respect that claim. Further, the claim cannot be converted into something that is not contrary to CAcert's principles and conventional position in the open source world.

But it further implies that CAcert should prevent further usage of this locations-database-set.

i. Therefore, to protect CAcert Inc. and the CAcert community from any future harm that may be caused by the undefined source of the data, I hereby rule to stop the usage of the locations-database-set asap.

ii. The BirdShack project team made the decision to not use the locations database within the new project. I confirm that decision.

iii. The Community shall start a project ASAP to migrate the location function that is currently implemented in the critical system, with the objective of replacing the locations database completely, by means of either

or

iv. For the intermediate time, until a migration has been completed, the Migration team is allowed to get a copy of the locations database set for analyzing and tests of a migration plan. The Migration team should be under control of the Software-Assessment team.

The Software-Assessment team that is under control of CAcert Inc. is allowed to use a copy of the locations-database-set for the time of the migration phase on their staging system to get software updates passed to the critical system. The staging system has to be under full control of the Software-Assessment team.

v. The original request of a general export of the locations database set is rejected because of the undefined state of the locations database set and the restrictions found. But a limited export that is under control of the Critical Sysadmin team and the Software-Assessment team is allowed.

vi. To prevent the same mistake in the future, namely an undocumented, undefined state of a database set, the source of a new locations database set must be named and documented.

vii. I further rule that, once the locations database set has been replaced in accordance with the other elements of the ruling, the remaining locations data is no longer to be treated as critical nor privacy related data. This includes the tables country, regions and locations, but not the users record link to the location tables.

From the requested tables, the tables content has no privacy implications and no part of it is personally identifiable. So it can be declared as non-critical data, even though for historic reasons that locations data is currently housed in the core systems under auspices of the Critical Systems Team.

My core statement would be that:

My recommendation to the Community is:

Frankfurt/Main, 2010-08-29

Execution

Similar Cases

a20090525.1

Event officer request recurrent notification to assurers near the location of the following ATEs

a20091127.1

answers of the cats exam found by google search


Arbitrations/a20090427.2 (last edited 2010-08-29 13:54:37 by UlrichSchroeter)