 * Case Number: a20090427.2
 * Status: closed
 * Claimants: Gregory Stark
 * Respondents: CAcert
 * Case Manager: Alexander Prinsier
 * former Case Manager: Alejandro Mery
 * Arbitrator: UlrichSchroeter
 * former Arbitrator: Sebastian Kueppers
 * Date of arbitration start: 2009-05-17
 * Date of ruling: 2010-08-29
 * Case closed: 2010-08-29
 * Complaint: Adhoc SQL query requested
{{{
The need for location data from the production database for the purpose of developing Ad hoc SQL queries that are geo-specific to CAcert members for internal business operations. Specifically the country, region, and location tables of the database. The extraction of this data contains no member sensitive information.

Please consider this request as an Arbitration, according to Security Policy 3.3 [5].

The SQL queries would be:

To determine the size/# records in each table:
----
SELECT count(*) FROM countries;
SELECT count(*) FROM regions;
SELECT count(*) FROM locations;
----
Of concern might be the size of location table, which could be broken down into manageable chunks using "BETWEEN" (see below.)

To export the data to simple tab delimited files:
----
SELECT * INTO OUTFILE '/tmp/countries.txt'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM countries;

SELECT * INTO OUTFILE '/tmp/regions.txt'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM regions;

SELECT * INTO OUTFILE '/tmp/locations.txt'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM locations;

The chunk version:

SELECT * INTO OUTFILE '/tmp/locations.txt'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM locations WHERE id BETWEEN 0 AND 50000;
----
}}}
 * Relief: execute sql queries

Before: Arbitrator UlrichSchroeter (A), Respondent: CAcert (R), Claimant: Greg Stark (C), Case: a20090427.2

== History Log ==
 . 2009-05-17 (Support): Sebastian Kueppers granted Arbitrator for the case
 . 2009-12-25 (UlrichSchroeter): arbitrator released this case as per [[Arbitrations/a20091118.4|a20091118.4]]
 . 2009-12-25 (A): I'll take care of this case
 . 2009-12-25 (A): requesting information regarding this case from (C) and former (A)
 . 2009-12-29 (C): I am still interested in having all my cases processed
 . 2009-12-29 (AlexanderPrinsier): I'll take care of this case as (CM)
 . 2010-01-03 (A): sent some queries to (C) and Critical Sysadmin Teamleader
 . 2010-01-03 (A): sent interview questions to probable maintainers of the locations database
 . 2010-01-03 (A): rcvd answers from PD's interview
 . 2010-01-03 (A): rcvd answers from Iang's interview
 . 2010-01-03 (A): rcvd answers from PG's interview
 . 2010-01-03 (A): rcvd answer from Critical Sysadmin Teamleader
 . 2010-01-05 (A): rcvd answer (to req 2010-01-03) from (C)
 . 2010-01-07 (CM): wrote php script that executes the queries and puts the output in a file, sent to (A)
 . 2010-01-09 (A): deployment and testing of alternate php script (similar to the events mailing script) that simulates an OUTFILE output
 . 2010-01-09 (A): request to (CM) + Sysadmin teamleader for hints regarding the results of the deployed script and the test runs
 . 2010-01-09 (A): rcvd answer from Sysadmin teamleader to req 20100109
 . 2010-01-10 (A): req. to PG about his 2nd answer in the 2010-01-03 interview, to clarify his statement "I was told that some of the data we have is for CAcert only, and we are not allowed to hand it out to anyone else."
 . 2010-01-11 (A): rcvd answer by PG. Answer to be declared private under seal by (A)
 . 2010-01-13 (A): following PG's offer of a qemu image as a workaround, sent req. to (C) for a statement
 . 2010-01-13 (A): contacting SVN admin with questions about the optional qemu image solution (file size: about 735 MB): a) can a procedure be established for adding permissions for a user, with notification to PG? b) are there any resource limitations (bandwidth? disk space?)
 . 2010-01-13 (A): rcvd answer from SVN-admin
 . 2010-01-15 (C): sending answer to queries from 2010-01-13
 . 2010-01-25 (A): req. for info about a locations database contributor
 . 2010-01-25 (A): rcvd answer with info about contributor
 . 2010-01-25 (A): sent req. to one of the contributors of the locations database
 . 2010-01-27 (A): rcvd first answer from one contributor of the locations database
 . 2010-01-28 (A): sent further questions to one contributor of the locations database
 . 2010-02-28 (A): progress report to (C), (CM): the ruling will be further delayed due to the complexity of this case
 . 2010-06-18 (CM): requesting progress report from (A)
 . 2010-07-01 (A): current state: writing down rationale
 . 2010-07-01 (A): questions to [[Software#Software-Assessment|Software-Assessment team]] and [[Software/Assessment|Software-Assessment project team]] about testserver images
 . 2010-07-01 (PG): answered (see below)
 . 2010-07-01 (A): Dirk, you've downloaded the testserver image from PG. Does this image include the locations database?
 . 2010-07-03 (A): forwarded req. to Dirk's private mail address
 . 2010-07-03 (Dirk): i just verified, that in the image, which we got from PG the tables contain the data for the location database. i made some quickchecks with german coordinates i know (Velbert, Frankfurt, Leverkusen) and detected them as valid.
 . 2010-07-06 (Andreas): the current state of the testserver image is located here <...>, see also [[SystemAdministration/Systems/Development|Testserver Images]]
 . 2010-07-06 (A): interviewed Dirk during the Software-Assessment project's telco: does the current testserver image from Andreas, as described above, contain the locations database? Dirk's answer: No
 . 2010-07-08 (A): published the discussion for comments to the arbitration participants (CM), (C), (R), Software-Assessment team, Infrastructure teamleader, Critical sysadmin teamleader, and to the participants in this case who answered some of the questions
 . 2010-07-29 (A): Discussion - Recommendation "Solution I" sent for comments to (C), Critical Sysadmin team, Software-Assessors, Software-Assessment project team, developers mailing list, board mailing list, (CM)
 . 2010-07-29 (C): response to "Solution I": Yes, this works for me.
 . 2010-07-29 (IanG): response to "Solution I": some doubts
 . 2010-07-29 (PG): response to "Solution I": one doubt
 . 2010-07-29 (Wytze): response to "Solution I": problem with the "limitation", ok with the rest
 . 2010-07-29 (Mario): response to "Solution I": proposal for a locations database migration
 . 2010-08-03 (A): info about IPR transfer to CAcert Inc requested. Email sent to Board, (C), (CM) under seal
 . 2010-08-24 (Wytze): info about locations database
 . 2010-08-25 (A): sending req. for info to one of the contributors of the locations database and (PG)
 . 2010-08-25 (PG): answered req. from (A)
 . 2010-08-27 (A): discussion with Ian about IPR and CAcert on IRC, notification of a possible CoI
 . 2010-08-28 (A): req. to (Software-Assessment team), (Critical Team): is a log of community contributions to the locations database set available? Req. for recommendations on a migration plan for the locations database set

== Discovery ==
 * Has the export been executed in the meantime?
   * If not, does (C) maintain his claim? Answer: Yes.
 * [[Software/Database/StructureDefined|Database structure]]
 * Countries, List of Countries
||Field||Type||Comment||
||id||int(3)||Primary Key||
||name||varchar(50)|| country name ||
||acount||integer|| how many assurers in this country? ||
 * Locations, List of Cities
||Field||Type||Comment||
||id||int(7)||Primary Key||
||regid||int(4)|| region this city belongs to ||
||ccid||int(3)|| country this city belongs to ||
||name||varchar(50)|| city name ||
||lat||double(6,3)|| latitude of the city ||
||long||double(6,3)|| longitude of the city ||
||acount||integer|| how many assurers in this city? ||
 * Regions, List of Regions
||Field||Type||Comment||
||id||int(5)||Primary Key||
||ccid||int(3)|| country this region belongs to ||
||name||varchar(50)|| region name ||
||acount||integer|| how many assurers in this region? ||
 * Each table includes an ''acount'' field with live data from the system: how many assurers are in each area. Is this needed for the export?
   * alternative: {{{select id,name,0 from countries}}} writes 0 in place of every acount value in the export, without changing the table itself
 * (A): In 2009 I received a testserver image from PG with the countries, locations and regions tables filled. Recorded values from a test export:
||table||record count||size [bytes]||
||countries||248||4583 (4 KB)||
||regions||4579||103101 (103 KB)||
||locations||2200456||101137195 (101 MB)||
 * [[https://svn.cacert.org/CAcert/Policies/SecurityPolicy.html|Security Policy]]
   * DRAFT p20090327:
     * 3.3. Application [3] - Requests to systems administration for ad hoc queries over the database for business or similar purposes must be approved by the Arbitrator.
   * WIP 20091213:
     * 7.6. [2] Handover Production - Requests to Application Engineers for ad hoc queries over the database for business or similar purposes must be approved by the Arbitrator.
 * (A) mail request dated 2010-01-03 to (C) / Critical Sysadmin Teamleader
{{{
I have a few questions. First of all some background infos.

I've got a testserver image from PG back in 2009 with tables data
filled for countries, regions, locations.
I've did a test export with the query proposed by Greg:
------
SELECT * INTO OUTFILE '/tmp/countries.txt'
FIELDS TERMINATED BY '\t'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM countries;
------
The proposed table sizes are:

db          reccount   size [bytes]   size
countries        248           4583   4 Kb
regions         4579         103101   103 Kb
locations    2200456      101137195   101 Mb

QUESTION 1
==========
@(C):
All tables includes an account field with the count of users
for each area (city, region, country)
Is this needed in the export ?
or can the export reset these values to 0 ?

QUESTION 2
==========
@Sysadmin Teamleader:
About the OUTFILE option:
the testserver image i've got from PG includes a MySQL 5 db
As I know, the live system has a MySQL 4 database. Is this correct?
I've did a test export on a MySQL 4 database. And the
OUTFILE parameter works also.
------------------------------------------------------------------------
drive-path>mysql --user=root --host=localhost --password=xxxxxx
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 375004 to server version: 4.0.20a-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * into outfile '/forument.txt'
    -> fields terminated by '\t'
    -> optionally enclosed by '"'
    -> lines terminated by '\n'
    -> from bbs.forum_entries;
Query OK, 11 rows affected (0.02 sec)
------------------------------------------------------------------------
This is a similiar test query against a MySQL 4 database and i've got
no error executing above query.
So where is the problem using the OUTFILE option ?

QUESTION 3
==========
@Sysadmin Teamleader:
If the export is made, where can the export be placed for
download?
Is it possible to transfer the export to the svn server?
I'll propose to create a subdirectory
https://svn.cacert.org/CAcert/testsystem
with a subdirectory locationsdb
to place the exports into this directory.
as SP states in 3. LOGICAL SECURITY, 3.1.1.1. External connectivity
and 3.1.1.2. Internal connectivity restricted access

This is no execution request and no ruling. This is only to investigate
the environment and possible solutions for this arbitration request.

Actualy I found no exceptions to not allow the export of the locations
data (except the count users column). All other data informations can
be compiled from several databases around the world.
The cities geo locations informations doesn't fall under personal
related data. There is no link to a user account in these requested tables.
So further investigations to find a solution for doing the export is
still open.
What I didn't yet considered is the license model under which the
locations informations are handled. This needs some further
investigations ...
}}}
 * License model under which the exported data falls:
   * CCA 1.3 Your Contributions
{{{
You agree to a non-exclusive non-restrictive non-revokable transfer of Licence to CAcert for your contributions. That is, if you post an idea or comment on a CAcert forum, or email it to other Members, your work can be used freely by the Community for CAcert purposes, including placing under CAcert's licences for wider publication. You retain authorship rights, and the rights to also transfer non-exclusive rights to other parties. That is, you can still use your ideas and contributions outside the Community.

Note that the following exceptions override this clause:

 1. Contributions to controlled documents are subject to Policy on Policy ("PoP" => COD1)
 2. Source code is subject to an open source licence regime.
}}}
   * The exported data is not a contribution to a controlled document (exception 1 does not apply).
   * The exported data may fall under exception 2 (source code is subject to an open source licence regime).
 * Answers from PD in an email interview:
   * 1. Where does the data in the locations, regions and countries tables come from?
     * The community
   * 2. Are they subject to a special license model?
     * I believe not
   * As far as I know, additional entries to the locations database are contributions from community members (e.g. a city is missing from the locations database -> a member with the locadmin flag set in their user account inserts a new record), i.e. CCA 1.3 Your Contributions. But the question here: 3. Is this an exception to the default license model, as stated under exception 2: "Source code is subject to an open source licence regime."?
     * I think that this is not part of the source-code, and therefore an open-source license is not mandated. However I believe we should provide this information freely. But this is part of the decision you have to make.
   * 4. As the locations table is part of the critical system: SP 9.4. Outsourcing?
     * I think this has nothing to do with outsourcing, but again, that is what you have to decide.
   * Your hints? Thoughts?
     * My thought is that the data at issue here has no privacy implications as no part of it is personally identifiable. So it would be non-critical in my eyes. Therefore it should be shared freely with the community. So my strong hint would be to do the export and publish freely for download even though for historic reasons that data is currently housed in the core systems under auspices of the Critical Systems Team. My core statement would be that: Just because data is currently part of the critical systems, that does not automatically mean it is critical data. There is data in there that has no need to be kept private. And such data can be released via the arbitration system.
 * Answers from Iang in an email interview:
   * 1. Where does the data in the locations, regions and countries tables come from?
     * The data can be added by members who had the special Location admin bit set. This is set by Support Engineer. There is currently no control over this other than being asked. As I understand it, the data was added by (and the bit was given to) members who grumbled that their town was not on there.
   * 2. Are they subject to a special license model?
     * It was this sort of thing that CCA 1.3 was meant to cover/include. Whether it succeeds or not is a decision to be made... The Locations data isn't / shouldn't be really under the regime of Security Policy.
   * ... but the question here: 3. Is this an exception to the default license model, as stated under exception 2: "Source code is subject to an open source licence regime."?
     * No, this isn't source code, this is data. IMHO.
   * 4. As the locations table is part of the critical system: SP 9.4. Outsourcing?
     * Right, in that the board is generally the executive for deciding whether a component is inside critical or not. As it happens, two cases are being discussed by the Board now: DNS+OCSP and also the domain cacert.org. The former will likely become critical-team (vote to be confirmed tonight), the latter is still uncertain.
     * If we were to interpret Location data as being a question under SP9.4 then one thing might be to simply request the board to examine it and determine if it can be outsourced.
     * As it happens, this was discussed at Innsbruck, and Birdshack team already decided to outsource it (well, maybe that's a bit too aggressive, there were some spirited defences of keeping it internal).
     * If the current position is that the data is inside critical sphere, then the data is under SP, so it needs to be dealt with as such. This is somewhat confirmed by the need to run an ad hoc SQL query through Arbitration.
     * Right. A question for the future; it is clear that we are now making text changes without proper controls ... one could argue that this is ok, as it is the historical path. I'm not sure what a future auditor would make of the fact that we are installing text on the system in a language we can't read :) For my view, it's something to worry about in the future, we need to get the basics right.
   * Your hints? Thoughts?
     * Yes, this seems more complicated.
     * An "ad hoc" query releases limited data for limited purposes. This isn't the same thing as permanent publication of a font of data. One would think that the "ad hoc" query releases the data to a member who will then be expected to look after it; including not further publishing it. Although not written down, that is the implication I draw today.
     * A decision to take a portion of the data base in order to do further ad hoc queries is another step further; the way I read the Arbitration filing, it isn't asking for an ad hoc request, it's asking for a slice of the database in order to do further ad hoc requests. This is more or less the thing that we didn't want to do: hand over extracts of our database to researchers, marketeers, etc, so they could construct the ad hoc requests without further reference to ourselves.
     * Then, a decision to publish on a regular basis, however it is done, is more of an over-arching architectural decision. Which should be before the architect, the software assessment team, and at the end of the day, before the board as ultimate responsible agent.
     * So according to the reading of the request, it may be something that Arbitration should reject, and instead refer to Board?
     * Perhaps the thing is to use this good work in analysis done so far, in order to write the motion for the board to vote on?
 * Answers from PG in an email interview:
   * 1. Where does the data in the locations, regions and countries tables come from?
     * We used various data sources that were freely or non-freely available. Additionally, CAcert users that are location-admins can add/modify/delete entries.
   * 2. Are they subject to a special license model?
     * I was told that some of the data we have is for CAcert only, and we are not allowed to hand it out to anyone else.
   * As far as I know, additional entries to the locations database are contributions from community members (e.g. a city is missing from the locations database -> a member with the locadmin flag set in their user account inserts a new record), i.e. CCA 1.3 Your Contributions. But the question here: 3. Is this an exception to the default license model, as stated under exception 2: "Source code is subject to an open source licence regime."?
     * This is data, not sourcecode. It is not part of the sourcecode we ship.
   * 4. As the locations table is part of the critical system: SP 9.4. Outsourcing?
     * ./.
   * Your hints? Thoughts?
     * I would suggest a different option:
       * We do have full copies of those tables on some of our test-systems. I would suggest to give the people that need access either location-admin access through the web-interface, or access to test1.cacert.at which contains the tables.
       * Handing out the tables as downloads results in a lot of traffic, and should be avoided, if easily possible.
 * answer from Critical Sysadmin Teamleader
{{{
> @Sysadmin:
> About the OUTFILE option:
> the testserver image i've got from PG includes a MySQL 5 db
> As I know, the live system has a MySQL 4 database. Is this correct?

No, the live system also runs with MySQL 5, more specifically:
mysql-server-5 5.0.32-7etch11 (debian package name/version)

> I've did a test export on a MySQL 4 database. And the
> OUTFILE parameter works also.

As I reported on May 5, 2009 to (C):

> I've had a quick look at the queries. Is the formatting "SELECT * INTO
> OUTFILE" etc. required for you, or would a formatted table dump do
> just as well? Issue with the SELECT INTO OUTFILE is that you need the
> mysql FILE privilege to do so, and user cacert does not have that
> (nor should it).

So this is not an issue with the MySQL version, it is just that user
cacert (the one we use to access the database) does not have the
required privilege.
Here is the error message from the live system:

mysql> SELECT * INTO OUTFILE '/tmp/countries.txt'
    -> FIELDS TERMINATED BY '\t'
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> FROM countries;
ERROR 1045 (28000): Access denied for user 'cacert'@'localhost' (using password: YES)

My MySQL knowledge is very limited; I've only noticed that dropping the
"INTO OUTFILE '/tmp/something'" part from the query is not valid syntax
anymore according to the mysql utility:

mysql> SELECT *
    -> FIELDS TERMINATED BY '\t'
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> FROM countries;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' at line 2

> ...
> So where is the problem using the OUTFILE option ?

I think I've just explained that adequately above here.

> QUESTION 3
> ==========
>
> @Sysadmin:
> If the export is made, where can the export be placed for
> download?
> Is it possible to transfer the export to the svn server?

Are you talking about a regular export (say daily or weekly)? That would
have to be automatic, and it's not so easy to do; the webdb server has
(on purpose) no possibility to routinely copy files to the outside.
If it is just for a one-time export, it's no problem, that can be done
manually of course.

> I'll propose to create a subdirectory
> https://svn.cacert.org/CAcert/testsystem
> with a subdirectory locationsdb
> to place the exports into this directory.
> as SP states in 3. LOGICAL SECURITY, 3.1.1.1. External connectivity
> and 3.1.1.2. Internal connectivity restricted access

Ah, I see what you mean, you are looking for an "internal" channel from
webdb server to svn server.
There really aren't such channels in my opinion, expecially as the CAcert
infrastructure services (including svn) are likely to be housed elsewhere
soon, at a completely different physical location than the critical
services.

However, given the essential public nature of the data in case, I can
well imagine two other solutions:

(a) a reverse solution: maintain the master version of these tables on
some CAcert infrastructure system (eg svn), and have the webdb server
pull in this data on a regular basis to update its mysql copy. Some
scripts would need to be written for that, but it would fit in with the
current access restrictions setup;

(b) a simple download solution: the webdb server could make the data
available via its web interface (over http or https), and thus allow
some script on the svn server to pull in new versions on a regular
basis.

If the data is not considered to be fully public, it probably shouldn't
go into svn either, should it? But for data protection (b) above could
be enhanced to allow the retrieval only by a specific IP or certificate.

[...]

The remarks from PG and Iang which I read on the wiki page for this
arbitration all seem to make a lot of sense. The only thing that raised
an eyebrow with me was PG's remark that some users may have contributed
data to this database with an express intent that it would only be used
for CAcert and for nothing else. That would be hard to reconcile with
publishing the data, but I guess you'll have to ask for explicit
evidence of this situation, it could be obsolete hearsay, I have no
idea ...
}}}
 * answer from (C)
{{{
Question #1
The count of users for each area (city, region, country) is not necessary.

Question #2 & #3
OUTFILE on the production system is not an option due to the nature of
the configuration. No way to get the file once it is created.
A formatted screen dump would suffice.
The locations file was going to be the problem.
}}}
 * CM / A alternate php script (similar to the events mailing script) that simulates an OUTFILE output
{{{
#!/usr/bin/php -q
<?php
  // [...] The beginning of the script was lost from this page. It presumably
  //       set up the database connection and opened the output files
  //       $fp1, $fp2, $fp3 (/tmp/a2009042702.{countries,regions,locations}.txt);
  //       the column comment for $query1 ended in "(-> 0)".
  $query1 = "SELECT * FROM countries";
  // id,ccid,name,account (-> 0)
  $query2 = "SELECT * FROM regions";
  // id,regid,ccid,name,lat,long,account (-> 0)
  $query3 = "SELECT * FROM locations";

  // Writes every row of $query to $fp as one tab separated line.
  // $fclmn is the index of the name column, which is enclosed in double
  // quotes; the last column (the assurer count) is always written as 0.
  function dump_table($query, $fp, $fclmn) {
    $res = mysql_query($query);
    $nbFields = mysql_num_fields($res);
    while ($row = mysql_fetch_array($res, MYSQL_NUM)) {
      for ($i = 0; $i < $nbFields; $i++) {
        if ($i != $nbFields - 1) {
          if ($i == $fclmn) {
            fwrite($fp, '"'.$row[$i].'"');
          } else {
            fwrite($fp, $row[$i]);
          }
          fwrite($fp, "\t");
        } else {
          // last column: acount is exported as 0
          fwrite($fp, "0");
          fwrite($fp, "\n");
        }
      }
    }
    mysql_free_result($res);
  }

  dump_table($query1, $fp1, 1);   // countries: name is column 1
  dump_table($query2, $fp2, 2);   // regions:   name is column 2
  dump_table($query3, $fp3, 3);   // locations: name is column 3

  fclose($fp1);
  fclose($fp2);
  fclose($fp3);
?>

around 14 min. (!) runtime on testserver results in output:

/tmp/a2009042702.countries.txt
1 "Afghanistan" 1

/tmp/a2009042702.regions.txt
1 7 "Barbuda" 0

/tmp/a2009042702.locations.txt
1 2527 169 "A" 62.983 9.750 0
}}}
 * Sysadmin teamleader's answer 20100109
{{{
> attached version c of the script.
> The output goes to directory /tmp
> 'cause this is writable.

Actually any directory that is writable for the id executing the script
will do fine. But /tmp is guaranteed to be fine :-)

> I've modified the script this way, that
> it simulates the output of the "OUTFILE"
> parameter from the proposal.
>
> This version is tested on my local testserver
> system.
> The runtime of the script was around 14 min. (!)
> on my testserver and results in output rows:
> ...
> @Sysadmins teamleader:
>
> This is not a ruling, only a proposal and
> a question to you:
>
> what is your suggestion about the script runtime
> of about 14-15 min ?

It's long but quite feasible, certainly for a one-off. I expect it to
run a little faster than 15 minutes on the production system.
> Is it ok, to start this script on the life system ?

Yes, no problem.

> or would you suggest, that this runtime
> is a burden on the production system ?

No, only if we would be running this very frequently, but I understand
that that is not the case.

> My intention for the ruling is, to let the
> script running once, not on a recuring schedule.

Fine.

> So your thoughts how a regular transfer can be
> implemented is oversized for this solution.
> A onetime transfer to a storageplace that
> can be secured for requestors will fit
> like the src directory on SVN

Please make sure to point out in your ruling whether the results may be
published to a generally visible place somewhere in the CAcert
infrastructure or that they should be kept private to arbitrator and
any others you see fit to receive the data.
}}}
 * We actually have one SVN admin, and requests have to go to svn-admin@c.o
 * answer from SVN-admin 2010-01-13
{{{
> Means, if a user requests access to this
> image and has no permissions to
> this directory, the SVN admin has
> to add the new users account to the
> SVN and have to forward a notification
> to PG.
>
> 1. Is this a procedure that can be
> established ?

Yes it could be established. I think a special issue category in the
bug tracker for such requests could be a good idea though.

> 2. the file size of about 735 Mb
> can this handled thru the SVN ?
> or exists any resource limitations
> i.e. bandwith, disk space ?

SVN is no ideal for storing binary files but could be used for it. We
have 30G of available disk space on the svn machine. So one revision
with the bz2-file should be no problem. If it changes from time to time
you should be aware that SVN does not handle binaries (especially files
with lots of changes) very efficiently.

If there are no plans to change the file very often I think it would be
better to create a download area served by Apache and protected by
certificate or password authentication.
}}}
 * (C)'s statement, dated 2010-01-15, on the request from 2010-01-13
{{{
Ulrich,

I am very pleased that you are making this effort and I appreciate it.

To answer your questions:
Yes, it meets the need to have access to such data.
As a request to perform an Ad hoc SQL query; No.

Say that I develop a query to extract monthly numbers on usage and demographics for COMA, the Board, for the marketing and events group, or an AGM report. Whose queries are trusted? How does the query get validated so the sysadmins feel comfortable and willing to run it? How is the data going to get off the production system?

Here is the fuzzy area of this Arbitration. Around the time I requested this arbitration, we were trying to get an email sent to assurers whose ability to assure had arbitrarily been turned off by Philipp Guehring when he installed the CATS patch without notice to anyone. Hence later, you got your "Event Announcement Email script".

So be careful about the decision in this arbitration. I feel the ruling needs to be, "Ok, run the these queries". This supports the security policy and our CAcert management's ability to extract useful metrics about the project, and to send email to users and assurers. To say no is to set the precedence to avoid running any kind of query.

**That you identify the steps of how this happens is also of great importance so that future Ad hoc SQL query requests can be run, and do not have to wait a year.

Everyone has looked these queries over, and agrees they do not extract personal data that puts CAcert in jeopardy by the Belgium DPA. That this data comes via a console session saved to file, is fine. It may take time, it won't crash the system. It can also be run at a low use time.

You need to dispel the fear of doing this kind of thing.

Regards,
Greg
}}}
 * Info about the [[Software/Database/LocationDatabase|Locations Database]] used within CAcert's website
 * 2010-07-01 (PG): answer
{{{
> 1. Does the testserver image, that is under your
> control include a copy of the locations database ?
>
I think so, but I don't fully remember at the moment.

> 2. Is there a Download link available, that someone
> can download the testserver image?
> please add also the link
>
Direct download:
BitTorrent download (I am not sure whether that one still works):

> 3. Is there a download restriction set and if yes,
> what access restrictions exists for this link?
> account? password? client cert? or only
> knowledge of the link ?
>
Only the knowledge of the link.
}}}

== Discussion ==
{{{
The questions that arise in this Ad hoc query request are manifold.

1. Questions about the locations data:
   a) Is the query result data critical or non-critical data?
   b) Are the query results under a CAcert license model?
   c) Are there any reasons why the data cannot be published?
   d) Does this data relate to SP 9.4. Outsourcing?

In the course of further investigation, I received answers from the
critical sysadmin team, and statements from Board members, from
developers and from the Software Assessment team.
These raise additional questions:

2. Ad hoc vs. recurring queries:
   a) What does this mean?
   b) Is there a difference?
   c) How do recurring queries affect the critical sysadmin team?
   d) How do recurring queries affect the Software Assessment team?
   e) How do recurring queries affect the CAcert board?

3. Statistical data:
   a) What does this mean?
   b) How is this affected by SP / SM and other policies?
   c) How can the results be transferred to the recipients?

4. Publishing of Ad hoc query results:
   a) Is a simple answer to a complicated question possible?
   b) Can some data be published freely?
   c) Or must all Ad hoc query results be kept private?

5. Usage of Locations Data

=== Question 1.a) ===
The content of the requested tables has no privacy implications, and no
part of it is personally identifiable. So it can be declared
non-critical data.
Even though, for historic reasons, that data is currently housed in the core systems under the auspices of the Critical Systems Team, my core statement is: just because data is currently part of the critical systems, that does not automatically make it critical data. There is data in there that has no need to be kept private. The Locations data isn't really under the regime of the Security Policy, and such data can be released.

=== Question 1.b) ===
 - Existing CAcert license models don't cover the locations database data.
 - Exported data is not a contribution to a controlled document.
 - Exported data doesn't fall under source code subject to an open source licence regime.

I see the CCA 1.3 "Your Contributions" clause as the license model that applies as a fallback if no other license model relates to the case in question. If no contributor can be identified, the contributor has to be set to CAcert.

One additional question arises from a historical view: what about contributions made before the CCA came into effect? Here I come to the conclusion that they are also covered under the CCA 1.3 "Your Contributions" license, with two exceptions.

Exception 1:
The first is handled by another arbitration (a20090913.1), named "privacy info": the contributor's "privacy info" has to be anonymized on request as long as it's not covered by the Privacy Policy.

Exception 2:
The second exception is the case of a named contributor with a restricted license form that isn't covered by the CCA 1.3 license schema. If someone made contributions before the CCA came into effect, the license may be limited if that named contributor did so.

=== Question 1.c) ===
PG makes a claim that one contributor provided some of the data we have for CAcert only, and that we are not allowed to hand it out to anyone else. As that contributor's part of the data cannot be extracted, this claim relates to all or nothing of the data set. There is no way to extract or split it into a set of non-transferable and a set of transferable data.
On review, the license model used for this contribution contradicts the license model published in the CCA 1.3 "Your Contributions" clause, but it falls under exception 2: the contribution was made before the CCA came into effect. The name of the contributor is known to the arbitrator.
See also https://wiki.cacert.org/Software/Database/LocationDatabase

=== Question 1.d) ===
The answer found in 1.a), that the locations data isn't really under the regime of the Security Policy, means that SP 9.4 doesn't cover it either, and the release of the data needs to be handled by an arbitrator's ruling.

=== Question 2) ===
An "ad hoc" query releases limited data for limited purposes. This isn't the same thing as permanent publication of a font of data. One would think that the "ad hoc" query releases the data to a member who will then be expected to look after it, including not publishing it further:

2.1 a20090810.3 User requests a list of people who have more than 150 points
    a20090902.1 request list of OA

Other samples include the publishing of data, e.g. for work on a new policy:

2.2 a20091221.1 Adhoc SQL-query about U18 cases

An "Ad hoc" script that falls under the recurring schema is the Events team leader's request:

2.3 a20090525.1 Event officer request recurrent notification to assurers near the location of the following ATEs

Here there is no publishing of data. It's an automated mailing to recipients who are unknown to the Events team leader and also to the sysadmins. The only result of the script is the count of recipients, which can be seen as statistical data.

The difference between the "Ad hoc" queries listed under 2.1 and the scripted mailing under 2.3 is that the script under 2.3 is a subset of a query: it has a limited purpose, a defined input of a few variable but known parameters:
 - central location
 - max distance to the location
 - subject text for the mailing
 - text that is used in the mailing
and a defined output.
This has been reviewed by an arbitrator and has been ruled on. Each of the "Ad hoc" queries listed under 2.1 has also been reviewed by an arbitrator, and its execution has been ruled on by the arbitrator. The special case of the 2.2 query was that the result needed to be published. This was also handled by an individual arbitration case and ruled on by an arbitrator.

Conclusion:
Each "Ad hoc" query, and also each recurring query, needs a definition of the query, the expected result, the recipient(s) of the result set, and whether this result set can be published or not. If there is a need for recurring execution, this can be defined by a request (for an already arbitrated Ad hoc query) or in the initial request. This only needs one arbitration (see the Events mailing case), so further executions can refer to the one handled case.

As a result, this arbitration can rule on the request for the location data. But this arbitration cannot rule on the results of future queries that are not known yet. If, e.g., the marketing team needs recurring statistical data about user growth in locations worldwide over a timeframe, they have to define their query in a request to start this script on a recurring schedule. Whether this data can be published or not depends on the query that has been requested, and needs to be ruled on in an additional arbitration case.

=== Question 3) ===
The nature of statistical data is that much of it can be published freely; see www.cacert.org -> About CAcert.org -> CAcert Statistics (http://www.cacert.org/stats.php).
This covers the count of users and assurers, and more detailed data like the count of assurer candidates, the count of assurers with test, and so on. The growth over the past 12 months and the past years can also be displayed freely.
Other statistical data cannot be accessed so easily, like the "hidden" stats page https://secure.cacert.org/wot.php?id=1 that lists the count of assurers by country, by region, or by city.
This "hidden stats" page relates to the "find an assurer" database, and its access is restricted to logged-in users, i.e. community members.

The difference between the two kinds of statistical data relates to the Privacy Policy:

  8. Privacy of user data
  CAcert Assurers can see the name, birthday and the number of points by looking up the correct email address. No other person related data is published by CAcert.

As the "find an assurer" results page displays the names of assurers in half-anonymized form to all community members, this is an additional set of data not yet covered by the Privacy Policy. The "find an assurer" results page lists: given name, first character of the last name, the number of points an assurer can give, and a free text field that the assurer can fill in. Therefore the "hidden stats" page cannot be published freely.

Collected statistical data on how many assurers are listed in each country, region and city, without a further link to the assurers list, contains no privacy related data and can therefore be published in a way that the data can be used CAcert-internally to develop seeding programs for CAcert deserts. But this isn't covered by this arbitration and needs review in another Ad hoc query request decision.

=== Question 4) ===
The publishing of Ad hoc query results

a) Is a simple answer to a complicated question possible?
Yes. Each query result needs to be checked against privacy issues and the SP through an arbitration.

Sample 1: The current query request to export locations data yields a result set with non-privacy data. Therefore this result can be published (by default), but it needs to be reviewed by an Arbitrator to confirm that the result set really doesn't contain any privacy data (e.g. misused fields in the data set): someone can add his email address and postal address in the description field of the "Find an Assurer" dataset.

Sample 2: A query request for potential assurers for an ATE, transferring the email addresses to the Event Officer, will probably contain personal data (the names and email addresses). Therefore this needs to be arbitrated (see [[Arbitrations/a20090525.1|a20090525.1]] [[Arbitrations/a20090525.1|Event officer request recurrent notification to assurers near the location of the following ATEs]]).
The workaround to prevent transferring the result set was that the scripted mailing doesn't transfer the complete result set to the Events Officer; instead, the mailing was triggered by the critical sysadmin team, and the Events Officer only receives the count of mails sent to potential Assurers. So there is no publishing of sensitive data.

b) Can some data be published freely?
Statistical data can be published within CAcert by default if it doesn't contain privacy data. E.g. in the preparation phase of the Policy on Junior Assurers / Members, the question arose how many members started their membership underage. The count of users was unexpected, so the result emphasized the real need for this subpolicy.
By default, results without privacy data can be published within CAcert without problems. Someone can argue that a query about Delete Account requests may harm CAcert, so free publishing shouldn't be allowed. That is why a precedent ruling cannot be made. Each query needs an individual arbitration.

c) Or do all Ad hoc query results need to be private?
Privacy is one limitation. Other questions may also be better kept for CAcert-internal usage only. But a general rule that query results need to be private cannot be set. Again, each query needs review by an Arbitrator, who has to decide if the query can be executed as expected and how the result set can be used.
=== Question 5) ===
The usage of the Locations Data was limited by the contributor to CAcert usage only. This means it is unclear whether the locations database can be used on another CAcert service (e.g. new software, another infrastructure server).
It can be used by CAcert developers in testserver images. But the usage is limited to CAcert-internal usage only, to write and test patches or new features for the current webdb system.
From all the information received, parts of the locations database may be Open Source. But as long as there is no defined state here, and the data is mixed with possibly limited data, the overall state is "limited".

How to transfer the locations database result set?
The locations database is about 100-120 Mbytes in size. That's half the size of the complete webdb system. It has come to my knowledge that testserver images are floating around with hidden links (PG has a testserver image, and the new Software-Assessment project has a testserver image). On further investigation, PG's testserver image includes the locations database. The new testserver image built by Wytze within the Software-Assessment project doesn't contain the locations database by default.

So one question arises: is there a need to export the locations database from the live system, or is it sufficient to use the existing testserver image from PG with the locations database included?

With these limitations set, it's difficult to handle this in practice. How can it be achieved that only CAcert developers get a testserver image with the locations database included? How can we prevent developers from using the locations data in another non-CAcert project?
These questions can only be answered with: limit the access to the testserver image or to the locations database export result set. Find a place where the image / export can be stored. Give access permissions only to users on request.
Next question: Who can give access?
 * Support ?
 * Critical Sysadmins ?
 * Infrastructure-admins ?
 * or Software-Assessors ?
Also an option: access with client cert only.
Inform the requestors about the limitations that are set on the locations database.

Restrictions / Limitations found so far:
 * there is a restriction / limitation set on the locations database dataset
 * recurring exports aren't preferable, as each execution needs a lot of work from the sysadmins to transfer the result set onto another system where it can be downloaded
 * svn isn't the preferable target for a testserver image with the locations database included, or for the locations database export
 * the download area for the locations database dataset should be under CAcert control
 * the method of control is left open to the system requirements and to the responsible CAcert group members
 * this can be: ACL permissions, group permissions, client certs for the download area, or a hidden link
 * depending on the download area system, access permission can be given by the Critical Sysadmins, the Infrastructure Admins and/or the Software-Assessment team
}}}
 * 2010-07-29 Recommendation Solution I by (A)
{{{
Recommendation:

To move the authority over the locations database set to the Software-Assessment team, and to reverse the direction of distribution: instead of from the live system to the outside, from the Software-Assessors to the live system.
 a) Updates to the Locations database can be tested by the Software-Assessment team
 b) Export requests can be handled under the Software-Assessors' authority

Advantages:
 * Requests from CAcert developers can be easily handled without first building a framework to handle such requests
 * Updates to the locations database are under the Software-Assessment team's authority and can follow the Software-Assessment procedures for updates
 * Updates can easily be sent to connected CAcert developers' update receivers
 * recurring updates no longer need to be transferred from the critical system, as the main repository is under the authority of the Software-Assessment team. Updates will be sent from the Software-Assessment team to the critical system
 * the Software-Assessment team is also under the control of CAcert, so the locations database set stays under control

CAcert developers can send a simple request to the Software-Assessment team to get the locations database for development purposes. Other CAcert teams (Board, Arbitration, Infrastructure) can also request a copy of the locations database set for CAcert usage. The Software-Assessment team has to notify the requestor that the database set can only be used for CAcert purposes. No transfer to other projects is allowed.

The transfer format of the locations database set is not limited to a specific format. It can be transferred within a CAcert developer image, as an SQL dump, or in any other format.
The only limitation is that the download links aren't publicly available and access is secured, so that a requester needs download information from the Software-Assessment team: an account/password combination and/or a hashkey URL and/or an ACL-secured access point and/or a client-cert-limited access point, through one or more possible and applicable services (e.g. ftp, http, or other services).

An initial complete export from the critical system to the Software-Assessment team is allowed, to obtain the current state of the locations database set. The Critical team and the Software-Assessment team have to set up a transfer concept for the initial transfer.
}}}
 * 2010-07-29 (IanG): respond to "Solution I"
{{{
> To your information:
> There is a limitation set over the locations database set
> "for CAcert usage only"

*Property*

As far as I know, this is a claim that has been verbally stated by someone, so I would question whether we have a reliable understanding of that. I'm also uncertain as to whether we should accept on face value any such claim, as it has quite important ramifications.

Some questions...
to establish any facts:

Has the Arbitrator managed to clarify the nature of the intellectual property claim and the nature of the licence that we have (verbal or implied or otherwise) ?

If NO to the above, is the Ruling likely to include a statement that clarifies the licence for us? By dictat, as it were?

Those are questions to establish facts, here are my thoughts:

This Community has always preferred open source. It's not always possible to arrange this, but there is a sense that it should be the case for all the critical domain software. Indeed, audit criteria establish strong control provisions over our critical assets, which tends towards us needing either full ownership or control established under a strong licence (aka, agreed open source licence).

The current location database doesn't seem to fit that ... even though we can recognise that when it was first given to us, the nature of its usage and provision might have been a good thing, and we're grateful for any help. Things have moved on from then, I suspect. In practice, we have google earth and other open alternatives.

Also, BirdShack team advanced the theory that the location database did not need to be in the critical domain at all. Sure, it was there for convenience of coding when Duane had one website and that was it. But we are a bit bigger now; we have a lot of separated platforms, and we want a tight CA. We have a track record in Client Certs and we'll push that more.... So, BirdShack decided that the best thing to do would be to kick the location database out of the critical domain, and suggest some sort of infra or community tool. Leave that up to the others.
> == Discussion ==
>
> Recomendation:
>
> To move the authority over the locations database set
> to the Software-Assessment team and turn the distribution
> order from live system to the outside to
> Software-Assessors to live system

In this sense, giving the Software Assessors some sort of control over the location database strikes me as perhaps going in the other direction. In the past, SEs just added anyone as location administrator after doing some informal due diligence. (Correct me if I'm wrong here, I haven't looked up how it is done.) Now it is to be controlled as if software?

 - I would speculate that the location database is really a database, not a source code file, so it complicates the distribution goes. (Same commment has been made about policies! :)
 - Are the database fields part of the main database? Does this mean that a software distro from the Software Team is now going to include data components?
 - There is the practical issue that software changes are generally a long cycle, compared to SE changes which are short. OK, both could be quicker, but at least simple SE changes like this can be done within a week.

> a) Updates onto the Locations database can be tested
> thru the Software-Assessment team

What is there to test? Either some town is in the right place, or not? Is this really a critical issue?

> b) Export requests can be handled by Software-Assessors
> authority

(I'm happy with that authority being handled by any of the critical teams, the Board/officers, or the Arbitrator. But I also think it could be handled by anyone who's an Assurer and has a manual on how to do it... e.g., a year ago, we stopped the Board from approving any maillists or email addresses, it's just too much bureaucracy.)
> Advantages:
> * Requests from CAcert developers can be easily handled
> without first building a framework to handle this request
> * Updates to the locations database are under Software-Assessments Team
> authority and can follow the Software-Assessment procedures for updates
> * Updates can be send easily to connected CAcert developers update
> receivers

I'm unclear to me why this would be an advantage. As a programmer, the last thing I'd want to do is deal with ... data updates. But perhaps I'm missing something?

> * recuring updates are no longer needed to be transfered from the
> critical system, as the main repository is under authority of the
> Software-Assessment team. Updates will be sent from Software-Assessment
> team to the critical system
> * Software-Assessment team is also under control of CAcert so the
> locations
> database set is under control
>
> CAcert developers can send a simple request to the Software-Assessment
> team
> to get the locations database for development purposes. Also other teams
> of CAcert (Board, Arbitration, Infrastructure) can request a copy of the
> locations database set for CAcert usage. The Software-Assessment team have
> to
> give notification to the requestor, that the database set can only be used
> for CAcert purposes. No transfer to other projects, allowed.

OK.

> The transfer format of the locations database set is not limited to
> a special format. It can be transfered within an CAcert developers image,
> as sql-dump or whatever else format.
> The only limitation is, that the download links aren't publicy available
> and access is secured, so that a requester needs download infos from
> the Software-Assessment team by either Account/Password combination and/or
> a hashkey URL and/or an ACL secured access point and/or a client cert
> limited
> access point thru one or more possible and applicable services (i.e. ftp,
> http,
> or other services)
> An initial complete export from the critical system to the
> Software-Assessment team is allowed to receive the current state of
> the locations database set. Critical team and Software-Assessment team
> have to deploy a transfer concept for the initial transfer.
>
> ----
>
> please comment on the recomendation, if this is a usable
> and acceptable solution
>
> I want to get answers from:
> 1. Claimant
> and the following teams:
> 2. Software-Assessors
> 3. Software-Assessment Project
> 4. Critical Sysadmins
>
> and probably others to comment on
> 5. board

Speaking from pov of Board, but not as board: I am keen to establish just exactly what this licence is (implied or otherwise). The natural party to the licence is the Board as representative of CAcert Inc. Although we might vary that, convention pushes us in that direction.

Also, I and others have worked hard to get out of these particular difficulties created by "special contracts". E.g., CCA 1.3, PoP 6.2, past decisions from Duane to transfer the IPR to CAcert Inc, audit argument over old CPS as owned by another party. So I would be hoping for any decision that takes us further along that path.

As a principle, we want control of our own location database if it is part of the critical system, *OR* we move it out of the critical domain.
}}}
 * 2010-07-29 (PG): respond to "Solution I"
{{{
I am not sure, whether this is a good idea, since the location database is primarily maintained by the CAcert users through the web-interface.
}}}
 * 2010-07-29 (Wytze): respond to "Solution I"
{{{
> There is a limitation set over the locations database set
> "for CAcert usage only"

Not speaking as critical admin but as CAcert user and open source advocate here: this statement about "limitation" seems at odds with CAcert's general policies, so it would really be nice if this claim is substantiated with clear facts - a license, sources of non-public information, amount of it, etc.

> ...
> Recomendation:
>
> To move the authority over the locations database set
> to the Software-Assessment team and turn the distribution
> order from live system to the outside to
> Software-Assessors to live system

Strong support from critical-admin@cacert.org for this -- this data is not critical and should not be managed by critical sysadmins.

> ...
> An initial complete export from the critical system to the
> Software-Assessment team is allowed to receive the current state of
> the locations database set. Critical team and Software-Assessment team
> have to deploy a transfer concept for the initial transfer.

The initial complete export can simply be dealt with by a mysqldump of the countries, regions and location tables, which will be scp'ed to a Software-Assessment designated location. In return we'd expect updates to the live data to be supplied to us by signed e-mail from the Software-Assesment team with mysql update scripts for the above mentioned tables.
}}}
 * 2010-07-29 (Mario): respond to "Solution I"
{{{
> This Community has always preferred open source. It's not always
> possible to arrange this, but there is a sense that it should be the
> case for all the critical domain software. Indeed, audit criteria
> establish strong control provisions over our critical assets, which
> tends towards us needing either full ownership or control established
> under a strong licence (aka, agreed open source licence).

looking at CAcert principles it would be appreciated to migrate the database to an open database. http://sourceforge.net/projects/opengeodb/ might be one possibility. But migration would not be that easy I guess.

>> * Updates can be send easily to connected CAcert developers update
>> receivers
>
> I'm unclear to me why this would be an advantage. As a programmer, the
> last thing I'd want to do is deal with ... data updates. But perhaps
> I'm missing something?

For development purposes imho a snapshot should suffice.
Maybe even some open database can be imported for testing purposes (depends on the data format).
}}}
 * 2010-08-24 (Wytze): info about locations database
  * With an empty Locations database in a recovered system, it is impossible for a user with the !LocAdmin flag set to enter locations data into the locations database
  * at least the countries and regions tables have to be filled before a user with the !LocAdmin flag enabled can add locations data to the database
  * "As far as I am aware, there are no licensing restrictions on the data in the countries and regions tables."
 * 2010-08-25 (PG): I think the countries table is definitely public, and I am not fully sure about the regions table, but I would say that it can be thought of being public, as well.

== Discovery ==
 * In the Software-Assessment project, the recovery of a system was tested without the locations database set.
 * the missing locations database, which consists of 3 tables (Countries, Regions, Locations), means that no user with the !LocAdmin flag enabled can enter new locations data into the database
 * there is a need to import at least the countries and regions table data to get the system running as identically as possible to the production system
 * a possible solution for better handling can be to split the locations database set into 2 different sets of data:
  a. countries and regions tables (with no restrictions set)
  a. locations table (with restrictions set)
 * Research on country / regions databases
  * started in [[http://drupal.org/node/19983]] from [[Software/Database/LocationDatabase|CAcert wiki Location Database]]
  * Source: '''[[http://earth-info.nga.mil/|earth-info.nga.mil]]'''
   * Country files with cities under [[http://earth-info.nga.mil/gns/html/cntry_files.html|Country files]]
   * [[http://earth-info.nga.mil/gns/html/gazetteers2.htm|Geopolitical Codes (Formerly FIPS PUB 10-4)]]
    * [[http://earth-info.nga.mil/gns/html/FIPS10-4_match.pdf|FIPS 10-4 Countries, Dependencies, Areas of Special Sovereignty, and Their Principal Administrative Divisions]] official listing (April 1995)
    * Geopolitical Entities and Codes (formerly FIPS PUB 10-4)
     * [[http://earth-info.nga.mil/gns/html/Geopolitical_Independent_States_1OCT09.pdf|Independent States]] (September 2009)
     * [[http://earth-info.nga.mil/gns/html/Geopolitical_Dependencies_1OCT09.pdf|Dependencies and Areas of Special Sovereignty]] (September 2009)
    * Full lists (April 2010)
     * [[http://earth-info.nga.mil/gns/html/GEOPOLITICAL_CODES.pdf|PDF]]
     * [[http://earth-info.nga.mil/gns/html/GEOPOLITICAL_CODES.xls|XLS]]
   * Q: What is the [[http://earth-info.nga.mil/gns/html/FAQ.htm#C3|License model]] ?
   * A: There is no problem in making the geographic names data freely available. A suitable citation note is: Toponymic information is based on the Geographic Names Data Base, containing official standard names approved by the United States Board on Geographic Names and maintained by the National Geospatial-Intelligence Agency. More information is available at the Maps and Geodata link at www.nga.mil. The National Geospatial-Intelligence Agency name, initials, and seal are protected by 10 United States Code Section §445.
  * Source: '''[[http://www.geonames.org/|www.geonames.org]]'''
   * [[http://www.geonames.org/about.html|License]]: "The !GeoNames geographical database is available for download free of charge under a creative commons attribution license."
  * Source: '''[[http://www.maxmind.com/app/products|www.maxmind.com]]'''
   i. variant 1
    * Products: GeoIP® Geolocation Products
    * offered with site license and with Updates
    * Subcountries code is available through [[http://www.maxmind.com/app/fips10_4|MaxMind fips10_4 Subcountries]]
    * Q: What is the license model on this ?
    * A: The subcountries list refers to the GeoIP database, which has restrictions:
     * [[http://www.maxmind.com/app/license_sign|MaxMind GeoIP database license]]
     * Comment: this license sounds like CAcert's usage of their locations database
   i. variant 2
    * [[http://www.maxmind.com/app/ip-location]]
    * defined: Free / Open Source
    * [[http://www.maxmind.com/app/geolitecountry|GeoLite Country]]
     * !GeoLite Country is similar to the GeoIP Country database, but is slightly less accurate. Should you require greater accuracy, GeoIP Country is a drop-in replacement for !GeoLite Country.
    * [[http://www.maxmind.com/app/geolitecity|GeoLite City]]
    * [[http://www.maxmind.com/app/locate_demo_ip|Free GeoIP City lookup]]
    * [[http://www.maxmind.com/app/api|GeoIP APIs]]
    * Q: What is the license model on this ?
    * A: Under the license agreement, all advertising materials and documentation mentioning features or use of this database must display the following acknowledgment: "This product includes !GeoLite data created by !MaxMind, available from http://www.maxmind.com/."
    * Redistribution: Free, subject to GPL/LGPL for APIs and database license. Commercial redistribution licenses are available.
  * Source: '''[[http://www.geodatasource.com/cities-free.html|www.geodatasource.com]] (Free Edition)'''
   * registration is needed to download the full list
   * the list includes only countries and cities, with no longitude, latitude or regions (these start with the Gold Edition, a paid edition)
   * Q: What is the license model on this ?
   * ???
  * Source: '''[[http://developer.yahoo.com/geo/geoplanet/data/|Yahoo! GeoPlanet™ Data]]'''
   * [[http://developer.yahoo.com/geo/geoplanet/data/getLatest.php|Yahoo! GeoPlanet™ Data download]]
   * includes no longitude, latitude or regions, only cities and WOEIDs
   * Q: What is the license model on this ?
   * A: Creative Commons Attribution license
  * Source: '''[[http://ipinfodb.com/ip_database.php|IPInfoDB]]'''
   * Available information in the database : ISO country code, country name, FIPS region code, region name, city, zipcode, latitude, longitude and timezone
   * The data is compiled from the free Maxmind CSV database (Geolite City) and rearranged with many scripts. We use other sources such as Geonames, TZ data, FIPS for regions name and IANA for IP assignement.
   * Q: What is the license model on this ?
   * A: The SQL database behind IPInfoDB is offered for free
  * Source: '''''- to be continued -'''''
 * The CPS story on Open Audit
  * [[http://iang.org/papers/open_audit_lisa.html#cps|CPS - Things that can go wrong -- The Trap of Being Too Good]]
  * [[https://svn.cacert.org/CAcert/CAcert_Inc/Board/board_review_actions_20040820_20070525.html|Board Review Actions between 2004-08-20 and 2007-05-25]]
{{{
Transfer of Source Code
Proposed Duane Groth 2004.09.01 to transfer my copyright to CAcert with the exception that I'm allowed to retain any/all copies of my own work and have no restrictions on what I do with the code etc. To make it legal under Australian law there will need to be money given to me, even if this is only $1.
Aye: NR, WH, ML, CM (DG).
Minuted 2004.09.15 as $1 transferred.
}}}
 * The [[http://rossde.com/CA_review/|David E. Ross Criteria (DRC)]]
||A.3.j ||23, 35 ||The CPS describes which aspects of the CA's operations involve protected intellectual property and what protections and licenses are involved. The property status of the following shall be addressed:<<BR>> * CP, CPS, privacy policy, configuration-control specification, and declarations of risks and liability<<BR>> * Root and intermediate certificates<<BR>> * CA-generated subscriber certificates<<BR>> * Lists of current and revoked certificates<<BR>> * Software tools used in the CA's operations<<BR>> * CRLs, OCSP data, or the equivalent (see §B.2.k)<<BR>> * The CA's Web site ||
||A.3.k ||6, 42 ||The CPS describes how the CA handles its subscribers' intellectual property. ||
 * [[http://www.cacert.org/policy/CertificationPracticeStatement.php#p9.5|CPS 9.5. Intellectual property rights]]
  * 9.5.1. Ownership and Licence
  * 9.5.2. Brand
  * 9.5.3. Documents
  * 9.5.4. Code
 * [[http://www.opendatacommons.org/licenses/odbl/|An Open Database License (ODbL)]] and [[http://www.opendatacommons.org/licenses/dbcl/|Database Contents License (DbCL)]]
  * [[http://www.opendatacommons.org/faq/licenses/#db-versus-contents|Why a separate license for the contents?]]
   * "Where neither “DB” rights or a contracts exists no license will be enforceable. If this is of concern to you your only real alternative is to not make the database available."
 * [[http://svn.cacert.org/CAcert/principles.html|Principles of the CAcert Community]]
{{{
x. Openness and Transparency
We strive to open up as many of our processes as possible. We strive to present our decisions, products and services as transparent. We do not do secret deals.
}}}

== Ruling ==
I have gathered and reviewed evidence concerning the origins of the locations data to the extent reasonable and plausible. The evidence does not reveal the original source of the data, and no obvious nor easy stones remain to be unturned. Therefore I conclude this information has been lost to us.

The alleged external claims over the locations data fall outside the CCA contributions clause, which came later, and outside the IPR transfer by Duane dated 2004. They cannot therefore be verified by this Arbitration.

A claim was made that one contributor provided some of the data on the basis that it is for CAcert only, and there is no licence to share it with anyone else. The licences of typical geodata companies examined in the course of this Arbitration did match the general sense of an individual licence, and not an open licence.
However, as the evidence pointed at no particular company, none of those licences is more than illustrative of what might have been. I conclude, however, that the central claim has merit, and CAcert must respect that claim. Further, the claim cannot be converted into something that is not contrary to CAcert's principles and conventional position in the open source world. This further implies that CAcert should prevent any further use of this locations database set.

i. Therefore, to protect CAcert Inc. and the CAcert community from any future harm that may be caused by the undefined source of the data, I hereby rule that use of the locations database set be stopped as soon as possible.

ii. The BirdShack project team made the decision not to use the locations database within the new project. I confirm that decision.

iii. The Community shall start a project as soon as possible to migrate the location function that is currently implemented in the critical system, with the objective of replacing the locations database completely, by means of either a) a completely new, freely available open source of locations data, or b) any alternative solution to the location function that does not need a local store of locations data within the critical system.

iv. In the interim, until the migration has been completed, the Migration team is allowed to obtain a copy of the locations database set for analysing and testing a migration plan. The Migration team should be under the control of the Software-Assessment team. The Software-Assessment team, which is under the control of CAcert Inc., is allowed to use a copy of the locations database set on its staging system for the duration of the migration phase, in order to get software updates passed to the critical system. The staging system has to be under the full control of the Software-Assessment team.

v. The original request for a general export of the locations database set is rejected because of the undefined state of the locations database set and the restrictions found.
However, a limited export under the control of the Critical Sysadmin team and the Software-Assessment team is allowed.

vi. To prevent the same mistake (an undocumented, undefined state of a database set) from recurring in the future, the source of any new locations database set must be named and documented.

vii. I further rule that, once the locations database set has been replaced in accordance with the other elements of this ruling, the remaining locations data is no longer to be treated as either critical or privacy-related data. This includes the tables countries, regions and locations, but not the link from the users' records to the locations tables. The content of the requested tables has no privacy implications, and no part of it is personally identifiable. It can therefore be declared non-critical data, even though, for historic reasons, the locations data is currently housed in the core systems under the auspices of the Critical Systems Team.

My core statement is:
 . Just because data is currently part of the critical systems does not automatically mean that it is critical data. There is data in there that has no need to be kept private. The locations data has no proper purpose under the regime of Security Policy, and may be considered outside its control as soon as is convenient.

My recommendation to the Community is:
 . To find a solution that allows the use of a locations service that is moved out of the critical box, as in the BirdShack concept: a location service run by the Community on a non-critical infrastructure system or elsewhere.

Frankfurt/Main, 2010-08-29
== Execution ==
 * 2010-08-29 (A): Ruling sent to: (C), (Board), (Critical Team), (Software-Assessment Team), Community, (CM)
 * 2010-08-29 (A): no further actions needed, case closed.
== Similar Cases ==
|| [[Arbitrations/a20090525.1|a20090525.1]] || [[Arbitrations/a20090525.1|Event officer request recurrent notification to assurers near the location of the following ATEs]] ||
|| [[Arbitrations/a20091127.1|a20091127.1]] || [[Arbitrations/a20091127.1|answers of the cats exam found by google search ]] ||
----
 . CategoryArbitration
 . CategoryArbCaseSystemTasks