Wednesday, June 21, 2023

Data papers as documentation of research processes and practices; Contexts of Data Discovery and Selection Criteria for Clinical Trials Data @asist_ec #ist23

Isto Huvila

My penultimate liveblog from the Information Science Trends conference taking place in Uppsala, Sweden and online (I'll do one or two non-live blog posts later). First in this session: Data papers as documentation of research processes and practices, presented by Isto Huvila (pictured), and coauthored with Dydimus Zengenene, Olle Sköld and Lisa Andersson. The abstract is at https://zenodo.org/record/8059285
A data paper was defined "as peer-reviewed text describing a data set and published in a peer reviewed journal". In such a paper there tends to be a context/summary; methods; data files; notes on the validity of the data; notes about its potential use and reuse; notes on its reproducibility and whether code (used with the data) is available: however, there is not a standard format. There are some journals which are specifically focused on this type of paper. It can be a way of encouraging people to publish their data, to encourage reuse and also to improve the status of this kind of paper (as well as the usual thing of getting a publication and citations). What hasn't been examined so much is the extent to which these papers document the research process.
Huvila went on to talk about how research processes and practices were described in 77 archaeology articles, identifying variation. He highlighted some huge differences in the amount of detail given about data collection - from a sentence to dense paragraphs. What was relevant for the document would also vary. Another issue is that some matters might be documented in the article and some in the data set (e.g. survey questions as part of the data set). In terms of authorship, it is not always made clear who did what in the research. There is evidence of disciplinary differences in terms of what is described and in what detail. There are further differences depending on whether primary or secondary data is involved. There is the issue of the kind of research behind the data. There may be differences between thesis data, project data and ongoing datasets. The original purpose of the research - whether the data was central to the research or a by-product - could lead to differences.
Overall, it seemed that these papers were perhaps not functioning as paradata (i.e. providing data about processes).

Investigation of Contexts of Data Discovery and Selection Criteria for Clinical Trials Data, presented by Ying-Hsang Liu, and coauthored by Mingfang Wu and Megan Power. The abstract is here https://zenodo.org/record/7919508 
The aim of the project is to understand data discovery by clinical trial researchers, in order to improve their experience. It involved interviewing 17 researchers and data specialists who had reused data (there was also a pre-interview survey). This is sensitive data with strict processes for getting access. Research questions were: firstly, what criteria do researchers apply in assessing relevance and usability, and secondly, what is the relationship between the context of data discovery and the criteria. Thematic analysis was used. A few findings follow. Clinical trial designers looked at clinical trial registry data and conducted a meta-analysis; clinical/health guideline developers focused on definition of topic scope and topic mapping and aimed to identify gaps in existing guidelines; secondary study researchers undertook meta-analysis through searching and consulting with experts and secondary data analysis (the latter with access to source data).

In terms of data attributes - there are specific data needs related to the purpose and outcome of the study; to data quality and integrity; to metadata and documentation; and to access (e.g. contact information of data custodians). Selection criteria included scientific accuracy; completeness; currency - these were mapped to different contexts. Three standout observations were: providing consistent guides about data documentation and data dictionaries; enhancing provenance and common license information; making metadata available together with datasets.
