Datasheets for Datasets - https://arxiv.org/abs/1803.09010
``Document [the dataset] motivation, composition, collection process, recommended uses, and so on. [They] have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks.''
The proposal takes its inspiration from the electronics industry, where every component has a datasheet that describes its operating characteristics and recommended uses. In machine learning, data is the input for model training. Using the wrong dataset, using a dataset outside its original intent, or misunderstanding a dataset's limitations can have serious consequences for the model. As the authors note, ``[d]espite the importance of data to machine learning, there is no standardized process for documenting machine learning datasets. To address this gap, we propose datasheets for datasets.''
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Memorandum Template for Army Memoranda, updated in accordance with AR 25-50
This LaTeX template is available for authors preparing a manuscript for the Revista de Arqueologia da SAB (Sociedade de Arqueologia Brasileira).