Microsoft Research has announced that it will adopt the Open Use of Data Agreement (O-UDA) for several datasets that the company offers.
The Open Use of Data Agreement (O-UDA) is intended to make it easier for individuals and organizations that want to share data to do so, with minimal requirements for users and no restrictions on use. The O-UDA is complemented by the Computational Use of Data Agreement (C-UDA), an agreement intended for situations where a specific data use scenario is desirable or required.
The O-UDA is not appropriate for datasets that might include materials subject to privacy laws (such as the GDPR or HIPAA) or other unlicensed third-party materials.
The O-UDA meets the open definition: it does not impose any restriction with respect to the use or modification of data other than ensuring that attribution and limitation of liability information is passed downstream. In the research context, this implies that users of the data need to cite the corresponding publication with which the data is associated. This aids in findability and reusability of data, an important tenet in the FAIR guiding principles for scientific data management and stewardship.
Microsoft recognizes that in certain cases, datasets useful for AI and research analysis may not be able to be fully “open” under the O-UDA. For example, they may contain third-party copyrighted materials, such as text snippets or images, from publicly available sources.
The law permits their use for research, so following the principle that research data should be “as open as possible, as closed as necessary,” Microsoft developed the Computational Use of Data Agreement (C-UDA) to make data available for research while respecting other interests. The software giant will prefer the O-UDA where possible, but perceives the C-UDA as a useful tool for ensuring that researchers continue to have access to important and relevant datasets.
Microsoft researcher John Krumm and collaborators collected GPS data from 21 people who carried a GPS receiver in the Seattle area. Users who provided their data agreed to it being shared as long as certain geographic regions were deleted.
This work covers key research on privacy preservation of GPS data as evidenced in the corresponding paper, “Exploring End User Preferences for Location Obfuscation, Location-Based Services, and the Value of Location,” which was accepted at the Twelfth ACM International Conference on Ubiquitous Computing (UbiComp 2010). The paper has been cited 147 times, including for research that builds upon this work to further the field of preservation of geo-privacy for location-based services providers.
Another example dataset is that of labeled hand images and video clips collected by researchers Eyal Krupka, Kfir Karmon, and others. The research addresses an important computer vision and machine learning problem that deals with developing a hand-gesture-based interface language.
The data was recorded using depth cameras and has labels that cover joints and fingertips. The two datasets included are FingersData, which contains 3,500 labeled depth frames of various hand poses, and GestureClips, which contains 140 gesture clips (100 of these contain labeled hand gestures and 40 contain non-gesture activity).
The research associated with this dataset is available in the paper “Toward Realistic Hands Gesture Interface: Keeping it Simple for Developers and Machines,” which was published in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems.
Finally, the FigureQA dataset, generated by researchers Samira Ebrahimi Kahou, Adam Atkinson, Adam Trischler, Yoshua Bengio, and collaborators, introduces a visual reasoning task for research that is specific to graphical plots and figures.
The Microsoft Research Open Data project was conceived from the start to reflect Microsoft Research’s commitment to fostering open science and research, and to do so without compromising the ethics of collecting and sharing data. The company aims to make it easier for researchers to maintain provenance of data while having the ability to reference and build upon it.