create_dataset and add_file error out #82

Open
Danny-dK opened this issue Feb 11, 2021 · 16 comments

@Danny-dK
Contributor

I'm new to the dataverse package. I can't seem to get things working and I don't understand any of the errors.

The following functions are working:

get_dataverse('mydataversename')
contents <- dataverse_contents('mydataversename')
get_dataset(contents[[1]])

But adding data files to a newly created dataset fails (I'm following the examples given here https://www.rdocumentation.org/packages/dataverse/versions/0.3.0):

library(dataverse)
Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = "myapitokenfromdataverse")

# retrieve your service document
d <- service_document()

# create a list of metadata
metadat <-
  list(
    title       = "My Study3",
    creator     = "Doe, John",
    description = "An example study"
  )

# create the dataset
ds <- initiate_sword_dataset("mydataversename", body = metadat)

# add files to dataset
tmp <- tempfile()
write.csv(iris , file = tmp)
f <- add_file(ds, file = tmp)

The add_file() fails with the error:

Error in handle_url(handle, url, ...) : 
  Must specify at least one of url or handle

The following also does not work:

f <- add_dataset_file(file = tmp, dataset = ds)
Error in parse_url(url) : length(url) == 1 is not TRUE

The other example at the previously mentioned link using the native API throws an error at the very first line and thus can't even create the dataset:

metadat <-
  list(
    title       = "My Study4",
    creator     = "Doe, John",
    description = "An example study"
  )

ds <- create_dataset('mydataversename', body = metadat)

Error in create_dataset("mydataversename", body = metadat) : 
  Internal Server Error (HTTP 500).

Can anyone tell me why it is not working to add files? I'm not particularly adept at error handling, so if you need more info please instruct me on how to receive that info.

R version 4.03
Rstudio 1.4.1103
Windows 1909 build 18363.1256
demo.dataverse.nl v. 4.18.1
dataverse package v 0.3.0

@Danny-dK Danny-dK changed the title from "adding files to dataset errrors" to "adding files to dataset errors" Feb 11, 2021
@Danny-dK
Contributor Author

Danny-dK commented Feb 11, 2021

I am able to get it working by converting the curl command (https://guides.dataverse.org/en/5.3/api/native-api.html#create-dataset-command) to R code using this online tool to convert curl to R https://curl.trillworks.com/#r

Curl as displayed in https://guides.dataverse.org/en/5.3/api/native-api.html#create-dataset-command at the "Add a File to a Dataset" section:

curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST -F file=@data.tsv -F 'jsonData={"description":"My description.", "restrict":"false"}' "https://demo.dataverse.nl/api/datasets/:persistentId/add?persistentId=doi:[mydoicreated]&version=DRAFT"

becomes R:


require(httr)

headers = c(
  `X-Dataverse-key` = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
)

params = list(
  `persistentId` = 'doi:[mydoicreated]',
  `version` = 'DRAFT'
)

files = list(
  `file` = upload_file('data.tsv'),
  `jsonData` = '{"description":"My description.", "restrict":"false"}'
)

res <- httr::POST(url = 'https://demo.dataverse.nl/api/datasets/:persistentId/add', httr::add_headers(.headers=headers), query = params, body = files)

The R code now works using httr library. Why would it not work with the dataverse package?

EDIT::
F me. The HTTR code got me thinking on the reference to the dataset name. Originally I had:

# create the dataset
ds <- initiate_sword_dataset("mydataversename", body = metadat)

# add files to dataset
tmp <- tempfile()
write.csv(iris , file = tmp)
f <- add_file(ds, file = tmp)

Neither add_file nor add_dataset_file worked. However, changing the call to include the full doi does work:

f <- add_dataset_file(file = tmp, dataset = 'doi:[mydoicreated]&version=DRAFT')

but this only works for add_dataset_file, not for add_file. The create_dataset() function still produces the same error. So for now I can create a dataset with the initiate_sword_dataset() function and add files to it with add_dataset_file() as long as I give the full doi of that dataset (referencing it with just the name of the created dataset doesn't seem to work).

@Danny-dK
Contributor Author

Danny-dK commented Mar 2, 2021

As per request.
To summarize, my code that now works is:

# Load required libraries.

library(dataverse)
library(rstudioapi)
library(dplyr)


# Set variables
 
Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = askForPassword())


# Set working directory to where you opened this saved R file. (It is essentially somewhat similar to what the 'here' package does)

setwd(dirname(getActiveDocumentContext()$path))


# List local files to upload (from a folder called dataversetest on my hard-drive)
 
files <-  list.files(path = 'dataversetest/', full.names = TRUE)


# Retrieve service document

d <- service_document()


# Specify the metadatafields as objects.

metatitle <- paste0("thisistheRtestmult_", format(Sys.time(), '%Y%m%d_%H%M'))
metacreator <- "Doe, John"
metadescription <- "AnRtestupload"


# Create the actual metadata list.

metadat <- list(
              title       = metatitle,
              creator     = metacreator,
              description = metadescription)


# Create a dataset. The dataset created will get the name as specified in the 'title' entry of the metadata listed before. 

ds <- initiate_sword_dataset(dataverse = "UU", body = metadat)


# Search for the doi of the just created dataset. This is required to upload files using add_dataset_file in the next line after. 
# Specifying the ds object created in the line above as the dataset in the add_dataset_file function does not work.

mydatadoi <- paste0(dataverse_search(title = metatitle, type = "dataset")$global_id, '&version=DRAFT')


# Add the files to your dataset created above.

f <- lapply(files, function(x) add_dataset_file(file = x, dataset = mydatadoi))

What doesn't seem to work is creating a dataset using create_dataset instead of initiate_sword_dataset() before or after retrieving the service document:

ds <- create_dataset('UU', body = metadat)

Which produces:

Error in create_dataset("UU", body = metadat) : 
  Internal Server Error (HTTP 500).

And also the add_file produces an error when used instead of add_dataset_file():

f <- lapply(files, function(x) add_file(file = x, dataset = mydatadoi))

Producing the error:

(file = x, dataset = mydatadoi) : 
  Bad Request (HTTP 400). 

I resolved the previous error in the original post by first searching for the complete draft doi of the created dataset, but it is still producing a 400 error.

Note in the second post that the curl commands work as intended.

@kuriwaki kuriwaki changed the title from "adding files to dataset errors" to "create_dataset and add_file error, while initiate_sword_dataset and add_dataset_file still do" Mar 2, 2021
@kuriwaki kuriwaki changed the title from "create_dataset and add_file error, while initiate_sword_dataset and add_dataset_file still do" to "create_dataset and add_file error out" Mar 2, 2021
@kuriwaki
Member

kuriwaki commented Mar 2, 2021

Thanks. Managed to reproduce this on my end.

@Danny-dK
Contributor Author

Danny-dK commented Mar 3, 2021

@kuriwaki Need a bit of further help.

I tried to get the native API curl example mentioned in https://guides.dataverse.org/en/5.3/api/native-api.html at the topic "Create a Dataset in a Dataverse" working in R. They mention that you "must" supply a json file containing the metadata fields to create a dataset in a dataverse. An example of a json file required is provided by them at https://guides.dataverse.org/en/5.3/_downloads/dataset-create-new-all-default-fields.json. The curl code is:

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/dataverses/root/datasets" --upload-file "dataset-create-new-all-default-fields.json"

Using https://curl.trillworks.com/#r and some fiddling about (not really understanding what I'm doing), I eventually got to a working one in R:

require(httr)

headers = c(
  `X-Dataverse-key` = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
)

metadat <- upload_file('D:/dataset-create-new-all-default-fields.json')

res <-httr::POST(url = "https://demo.dataverse.nl/api/dataverses/ddktestdeleteme/datasets", httr::add_headers(.headers=headers), body = metadat)

My question is 2-fold:

  1. How would one create a json metadata body file from within R

  2. What is the main difference between my R code that works with httr and the code within the dataverse package:

create_dataset <- function(dataverse, body, key = Sys.getenv("DATAVERSE_KEY"), server = Sys.getenv("DATAVERSE_SERVER"), ...) {
    dataverse <- dataverse_id(dataverse, key = key, server = server, ...)
    u <- paste0(api_url(server), "dataverses/", dataverse, "/datasets/")
    r <- httr::POST(u, httr::add_headers("X-Dataverse-key" = key), body = body, encode = "json", ...)
    httr::stop_for_status(r, task = httr::content(r)$message)
    httr::content(r)
}

@kuriwaki
Member

kuriwaki commented Mar 4, 2021

Unfortunately, I'm not the best person to ask httr/curl questions (don't know enough). But I can help point to others.

Regarding 1, pyDataverse may have thought about this issue more. I asked at gdcc/pyDataverse#116

Regarding 2, well,

library(dataverse)

server <- "demo.dataverse.nl"
dataverse <- "ddktestdeleteme"
identical(paste0(dataverse:::api_url(server), "dataverses/", dataverse, "/datasets/"), # taken from create_dataset()
          "https://demo.dataverse.nl/api/dataverses/ddktestdeleteme/datasets") # given in working code
#> [1] FALSE

# but
identical(paste0(dataverse:::api_url(server), "dataverses/", dataverse, "/datasets"), # remove last slash
          "https://demo.dataverse.nl/api/dataverses/ddktestdeleteme/datasets")
#> [1] TRUE

Created on 2021-03-03 by the reprex package (v0.3.0)

So in create_dataset() there's only that (a) trailing slash and (b) encode = "json" that's different, right? Do you mind trying editing (a), (b), and both out of the function and seeing if that fixes it?
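Difference (b) can also be inspected offline. The following is a minimal sketch, assuming httr serialises a list body with encode = "json" roughly the way jsonlite::toJSON(body, auto_unbox = TRUE) does. Under that assumption, the flat metadata list from the first post becomes a flat JSON object with no datasetVersion element, which is the top-level key in the example JSON files from the Dataverse guides.

```r
library(jsonlite)

# The flat metadata list from the original post.
metadat <- list(
  title       = "My Study4",
  creator     = "Doe, John",
  description = "An example study"
)

# Assumption: this approximates what encode = "json" sends as the request body.
flat_json <- toJSON(metadat, auto_unbox = TRUE)
flat_json
#> {"title":"My Study4","creator":"Doe, John","description":"An example study"}

# The native API examples nest everything under "datasetVersion"; the flat list does not.
parsed <- fromJSON(flat_json)
"datasetVersion" %in% names(parsed)
#> [1] FALSE
```

If that reading is right, the request is well-formed JSON but not the structure the native API expects, regardless of the trailing slash.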

Having no easy reprex is inherent to these upload examples, so thanks for your patience.

@Danny-dK
Contributor Author

Danny-dK commented Mar 4, 2021

@kuriwaki No worries, I appreciate all the help I'm getting!

I'll start with the positive. The create_dataset() function of the dataverse package actually works as long as you use upload_file() of the httr package along with the json file. Don't know why I missed this. So this works:


library(dataverse)
library(rstudioapi)
library(httr)

Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')

metadat <- upload_file('D:/dataset-create-new-all-default-fields.json')

create_dataset('ddktestdeleteme', body = metadat)

I tried changing the functions to add or remove the slash and encode = "json" both in the httr code and in a new function resembling the create_dataset() function. For simplicity's sake, I'll just show the create_dataset(), but the exact same messages occur with adjusting the httr code.

# Dataverse's create_dataset function. Note the first line within the function is removed as that seemed to give errors 
# and didn't seem required. Removed: dataverse <- dataverse_id(dataverse, key = key, server = server, ...)


create_datasettest <- function(dataverse, body, key = Sys.getenv("DATAVERSE_KEY"), server = Sys.getenv("DATAVERSE_SERVER"), ...) {
   
    u <- paste0(dataverse:::api_url(server), "dataverses/", dataverse, "/datasets/")
    r <- httr::POST(u, httr::add_headers("X-Dataverse-key" = key), body = body, encode = "json", ...)
    httr::stop_for_status(r, task = httr::content(r)$message)
    httr::content(r)
}

server <- "demo.dataverse.nl"
dataverse <- "ddktestdeleteme"
key <- "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
metatitle <- paste0("thisistheRtestmult2_", format(Sys.time(), '%Y%m%d_%H%M'))
metacreator <- "Doe, John"
metadescription <- "AnRtestupload"

metadat <- list(
              title       = "sometitle",
              creator     = "Doe, John",
              description = "AnRtestupload")

create_datasettest(dataverse, body = metadat, key = key, server = server)

The trailing slash doesn't seem to matter. Adding or removing it does not change the outcome.
Keeping the encode = "json" gives the 500 error code.
Removing the encode = "json" gives the 400 code with the added info:

Bad Request (HTTP 400). Failed to Error parsing Json: Unexpected char 45 at (line no=1, column no=2, offset=1). 
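A possible reading of that message, offered as an assumption rather than something confirmed in this thread: character code 45 is the hyphen, and without encode = "json" httr may send the list as a multipart form whose body begins with a "--" boundary line. A JSON parser could accept the first "-" as the start of a negative number and then reject the second one at offset 1. The boundary string in the sketch below is made up for illustration.

```r
# Character code reported in the 400 error message.
bad_char <- intToUtf8(45)
bad_char
#> [1] "-"

# Hypothetical first line of a multipart body; real boundary strings are generated per request.
body_start <- "--------------------------0123456789abcdef"

# "column no=2, offset=1" points at the second character of the body.
substr(body_start, 2, 2) == bad_char
#> [1] TRUE
```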

Thus, it all works when using httr::upload_file(somejson.json) in either the raw httr code or in the dataverse function create_dataset().

The pyDataverse documentation also seems to say that a json must be specified. In the create dataset section of https://pydataverse.readthedocs.io/en/latest/ they specify a json as ds.json(). So I'm assuming that for create_dataset() to work as intended, there needs to be a way to give the "body" option content in the form of a json object or file. My current way of listing the 'metadat' just doesn't seem to be recognized because of the format and/or the items specified, and I don't know how to specify a json file. The examples in the Dataverse help documentation do show what a json file looks like (https://guides.dataverse.org/en/5.3/_downloads/dataset-finch1.json), but I don't know how to do this efficiently in R. Creating an empty json as per https://stackoverflow.com/a/20109959, writing it to local disk as per https://rfaqs.com/reading-and-writing-json-files-in-r/, and then using metadat <- upload_file(createdempty.json) as the body also fails. So I'm assuming certain fields are required by Dataverse and need to be in a certain order/format.

::EDIT::
A helper function as used in the initiate_sword_dataset() would be nice to build the metadata structure:

build_metadata: https://rdrr.io/cran/dataverse/src/R/build_metadata.R
referred in initiate_sword_dataset: https://github.com/IQSS/dataverse-client-r/blob/master/R/SWORD_dataset.R

@Danny-dK
Contributor Author

Danny-dK commented Mar 4, 2021

Well...it's dumb as hell and ain't pretty, but it works. Using the minimal fields required for the json:

jsonmetaverse <- function(title, author, affiliation, description){
  headerjson <- paste0('{"datasetVersion": {"metadataBlocks": {"citation": {"fields": [{')
  titlejson <- paste0('"value":"', title, '", "typeClass": "primitive", "multiple": false,"typeName": "title"},')
  authornamejson <- paste0('{"value": [{"authorName": {"value":"', author, '", "typeClass": "primitive", "multiple": false,"typeName": "authorName"},')
  authoraffiliationjson <- paste0('"authorAffiliation": {"value":"', affiliation, '", "typeClass": "primitive", "multiple": false, "typeName": "authorAffiliation"}}],')
  authorjson <- paste0(authornamejson, authoraffiliationjson, '"typeClass": "compound", "multiple": true,"typeName": "author"},')
  descriptionvaluejson <- paste0('{"value": [ {"dsDescriptionValue":{"value":"', description, '", "multiple":false, "typeClass": "primitive", "typeName": "dsDescriptionValue"}}],')
  descriptionfulljson <- paste0(descriptionvaluejson, '"typeClass": "compound","multiple": true,"typeName": "dsDescription"}')
  closingjson <- paste0(']}}}}}')
  fulljson <-paste0(headerjson, titlejson, authorjson, descriptionfulljson, closingjson)
}

testmeta <- jsonmetaverse(title = 'testing stuff', author= 'Doe John', affiliation = 'testverse', 
                          description = 'testing working json upload meta')



library(dataverse)

Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')

create_dataset('ddktestdeleteme', body = testmeta)

Note that you can actually create a dataset with an empty json file as mentioned at IQSS/dataverse#6752, which I managed to do, but then the dataset cannot be opened or viewed within dataverse.
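The same minimal JSON could probably also be assembled from nested R lists and serialised with jsonlite, which would avoid broken output when a title or description itself contains a double quote. This is only a sketch: the helper names primitive_field(), compound_field() and minimal_dataset_json() are made up for illustration and are not part of the dataverse package, and the field layout simply mirrors the paste code above.

```r
library(jsonlite)

# Hypothetical helper: one single-valued primitive field.
primitive_field <- function(type_name, value) {
  list(typeName = type_name, typeClass = "primitive", multiple = FALSE, value = value)
}

# Hypothetical helper: one compound field holding a single entry made of sub-fields.
compound_field <- function(type_name, ...) {
  list(typeName = type_name, typeClass = "compound", multiple = TRUE,
       value = list(list(...)))
}

# Hypothetical helper: same four inputs as jsonmetaverse(), returns a JSON string.
minimal_dataset_json <- function(title, author, affiliation, description) {
  fields <- list(
    primitive_field("title", title),
    compound_field("author",
      authorName        = primitive_field("authorName", author),
      authorAffiliation = primitive_field("authorAffiliation", affiliation)),
    compound_field("dsDescription",
      dsDescriptionValue = primitive_field("dsDescriptionValue", description))
  )
  body <- list(datasetVersion = list(metadataBlocks = list(citation = list(fields = fields))))
  # auto_unbox keeps scalars as scalars; unnamed lists still become JSON arrays.
  as.character(toJSON(body, auto_unbox = TRUE, pretty = TRUE))
}

testmeta2 <- minimal_dataset_json(title = 'testing "quoted" stuff', author = 'Doe John',
                                  affiliation = 'testverse',
                                  description = 'testing working json upload meta')
```

The resulting string is meant to be passed as the body of create_dataset() in the same way as testmeta; that call is left out here because it needs a server and an API token.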

@kuriwaki
Member

kuriwaki commented Mar 6, 2021

Thank you @Danny-dK for the investigation!

I am still learning the create* parts of the package, but it seems like your post highlights 2 concrete and relatively straightforward things:

  1. Either document that the body argument of create_dataset must be a httr::upload_file object, OR (perhaps better) write in httr::upload_file into the function
  2. Implement dataverse::build_metadata in create_dataset too, just like initiate_sword_dataset, so that users can input an R list.
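As a rough illustration of the first item, the body could be normalised before it reaches httr::POST(). This is a sketch only: as_dataset_body() is a hypothetical helper, not an existing function in the dataverse package, and it assumes that a file path should be wrapped in httr::upload_file() while a JSON string can be sent as is.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: turn a JSON file path or a JSON string into something
# that can be handed to httr::POST(body = ...).
as_dataset_body <- function(body) {
  is_string <- is.character(body) && length(body) == 1
  if (is_string && file.exists(body)) {
    # A path on disk: let httr stream the file.
    return(httr::upload_file(body, type = "application/json"))
  }
  if (is_string && jsonlite::validate(body)) {
    # Already a JSON string: pass it through unchanged.
    return(body)
  }
  stop("`body` must be a path to a JSON file or a JSON string", call. = FALSE)
}

# Usage with a throwaway file.
tmp_json <- tempfile(fileext = ".json")
writeLines('{"datasetVersion": {}}', tmp_json)
from_file   <- as_dataset_body(tmp_json)
from_string <- as_dataset_body('{"datasetVersion": {}}')
```

Inside create_dataset() the call would then use body = as_dataset_body(body). Whether encode = "json" should stay for these two input types is something that would need testing against a server.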

Would you be comfortable or interested in starting a fork/branch to make and test these fixes?

In the mid/long-term, I am trying to figure out if we really need to maintain a native vs. sword API, two parallel things, to do the same thing.

@Danny-dK
Contributor Author

Danny-dK commented Mar 6, 2021

@kuriwaki Well I don't think I'm R versed enough to try these out. Someone with more R experience should probably look into it.

For now I can work around it and by using my very very ugly paste code anyone can create the required metadata and doesn't require httr::upload_file(). The function jsonmetaverse in my post just pastes the different json metadata parts behind each other while keeping some keywords to be filled in by the user and is accepted as is by create_dataset(). It's the very minimal required info for creating a dataset. If others want more or other metadata present, just look at all the dataverse metadata so far https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and add them to the very very ugly paste code. One could of course just adjust the json file and then use that file as the body within the create dataset.

You do raise an interesting point about whether to keep both native and sword api functions. If there really isn't any advantage of one over the other, I would think to just keep using the sword api to create the dataset as it seems it already has helper functions to create the metadata. Or perhaps simpler would be to explain in the help file what the contents of the body should be in create_dataset() (e.g. The body should consist of allowed dataverse metadata in json format, either as a json file (see https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json) or as an R object that adheres to the json structure of the aforementioned link. Currently, no helper functions exist to create metadata in a json format (see initiate_sword_dataset() for another way of creating a dataset that has metadata helper functions built in to create the associated xml metadata used in the sword api.))

For now I'll close this issue, as the cause of the issue is identified and 2 workarounds are available (1 create dataset using sword api, 2 create dataset using native api but supplying the body into a json file or json structured R object). If you want to keep it open, re-open it at will.

@Danny-dK Danny-dK closed this as completed Mar 6, 2021
@kuriwaki kuriwaki reopened this Mar 6, 2021
@kuriwaki
Member

kuriwaki commented Mar 6, 2021

I'll reopen since it still identifies a bug in master and it seems fixable. I might not be able to fix it this month, but I'll see if I can make a branch - and PRs by others are welcome.

One question about what you wrote: why can't you use or tweak dataverse::build_metadata instead of your custom function jsonmetaverse? I didn't realize the former existed.

@Danny-dK
Contributor Author

Danny-dK commented Mar 7, 2021

Several reasons, of which the main one is that I don't know how. Another reason is that build_metadata only seems to accept dc terms and produces an xml format. If one could find out how to refer to the json metadata format https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and extract only the relevant blocks that occur in a list of keywords provided by a user, then I guess one could do it.
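The extraction step might look roughly like the sketch below. It assumes the template has the datasetVersion > metadataBlocks > citation > fields layout and that each field carries a typeName; the three-field template string is a cut-down stand-in written for this example, not the real all-default-fields file.

```r
library(jsonlite)

# Stand-in template with three citation fields (the real file has many more).
template <- '{"datasetVersion": {"metadataBlocks": {"citation": {"fields": [
  {"typeName": "title",          "typeClass": "primitive", "multiple": false, "value": ""},
  {"typeName": "subtitle",       "typeClass": "primitive", "multiple": false, "value": ""},
  {"typeName": "productionDate", "typeClass": "primitive", "multiple": false, "value": ""}
]}}}}'

# simplifyVector = FALSE keeps every field as a plain nested list.
parsed <- fromJSON(template, simplifyVector = FALSE)
fields <- parsed$datasetVersion$metadataBlocks$citation$fields

# Keywords supplied by the user.
wanted <- c("title", "subtitle")

# Keep only the fields whose typeName was asked for.
kept <- Filter(function(f) f$typeName %in% wanted, fields)
parsed$datasetVersion$metadataBlocks$citation$fields <- kept

vapply(kept, function(f) f$typeName, character(1))
#> [1] "title"    "subtitle"
```

Filling in the user's values and handling compound fields such as author would still have to be added on top of this.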

The build_metadata function called in initiate_sword_dataset() https://rdrr.io/cran/dataverse/src/R/build_metadata.R:

dcterms_fields <- c("abstract","accessRights","accrualMethod","accrualPeriodicity",
                    "accrualPolicy","alternative","audience","available",
                    "bibliographicCitation","conformsTo","contributor","coverage",
                    "created","creator","date","dateAccepted","dateCopyrighted",
                    "dateSubmitted","description","educationLevel","extent","format",
                    "hasFormat","hasPart","hasVersion","identifier","instructionalMethod",
                    "isFormatOf","isPartOf","isReferencedBy","isReplacedBy","isRequiredBy",
                    "issued","isVersionOf","language","license","mediator","medium",
                    "modified","provenance","publisher","references","relation","replaces",
                    "requires","rights","rightsHolder","source","spatial","subject",
                    "tableOfContents","temporal","title","type","valid")

build_metadata <- function(..., metadata_format = "dcterms") {
    if (metadata_format == 'dcterms') {
        pairls <- list(...)
        if (any(!names(pairls) %in% dcterms_fields)) {
            stop('All names of parameters must be in Dublin Core')
        }
        if (!'title' %in% names(pairls)) {
            stop('"title" is a required metadata field')
        }
        entry <- xml2::read_xml('<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="http://purl.org/dc/terms/"></entry>')
        dcchild <- function(nodevalue, nodename) {
            add <- paste0("dcterms:", nodename)
            child <- xml2::xml_add_child(entry, .value = add)
            xml2::xml_text(child) <- nodevalue
            TRUE
        }
        mapply(dcchild, pairls, names(pairls))
        structure(as.character(entry), class = c("character", "dataverse_metadata"), format = "metadata_format")
    } else {
        stop("Unrecognized metadata format requested")
    }
}

@skasberger

Regarding 1, pyDataverse may have thought about this issue more. I asked at gdcc/pyDataverse#116

@Danny-dK @kuriwaki

Yes, the JSON can be created and validated with pyDataverse, but only for the default metadatablock (not for customized ones, but they are rarely in use as far as i know).

The workflow:

  • The import can be done with help of the CSV templates or directly with Dataset.set().
  • The export is done with Dataset.json().

You can find this in more detail in our user guides:

@Danny-dK
Contributor Author

Danny-dK commented Mar 10, 2021

So that still means a csv template is required. There is no helper function like the one in initiate_sword_dataset() https://rdrr.io/cran/dataverse/src/R/build_metadata.R for which it suffices to just specify approved keywords corresponding to metadata terms and the function builds the required metadata object for you:

# Specify the metadatafields as objects.

metatitle <- paste0("thisistheRtestmult_", format(Sys.time(), '%Y%m%d_%H%M'))
metacreator <- "Doe, John"
metadescription <- "AnRtestupload"


# Create the actual metadata list.

metadat <- list(
              title       = metatitle,
              creator     = metacreator,
              description = metadescription)

initiate_sword_dataset(dataverse = "ddktestdeleteme", body = metadat)

That's a bit of a shame.

::EDIT::

Maybe this might give people more ideas on how to possibly go forward. This is as far as I could get. The horrible paste code posted before could also be accomplished with:

library(jsonlite)
library(dplyr)
library(dataverse)

fulljson <- fromJSON('D:/dataset-finch2 - Copy.json')
partjson <- fulljson$datasetVersion$metadataBlocks$citation$fields


datasettitle = 'my_new_title'
datasetauthorname = 'Doe new John'
datasetaffiliation ='testing uni new'
description = 'mydescription new'


for(i in seq_along(partjson)){
  
  if(!is.na(partjson$typeName[i]) & partjson$typeName[i] == "title"){
    
    partjson$value[i] <- datasettitle
    
    }else{
      
      if(!is.na(partjson$typeName[i]) & partjson$typeName[i] == "author"){
    
          for(x in seq_along(partjson$value)){
      
            if(grepl('authorName', partjson$value[x])){
    
              partjson$value[[x]]$authorName$value <- datasetauthorname
              
              }
            
            if(grepl('authorAffiliation', partjson$value[x])){
              
              partjson$value[[x]]$authorAffiliation$value <- datasetaffiliation
            
              }
            
            }
        
        }else{
          
          if(!is.na(partjson$typeName[i]) & partjson$typeName[i] == "dsDescription"){
            
            for(x in seq_along(partjson$value)){
              
              if(grepl('dsDescription', partjson$value[x])){
                
                partjson$value[[x]]$dsDescriptionValue$value <- description
                
                }
              }
            }
        }
    }
}


newjson <- list()

newjson$datasetVersion$metadataBlocks$citation$fields <- partjson
 
prettyjson <- toJSON(newjson, pretty = TRUE, auto_unbox = TRUE)


Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')

create_dataset('ddktestdeleteme', body = prettyjson)  

It uses dataset-finch2 - Copy.json.txt (just remove the .txt), which is the minimal metadata required imo. Hope it helps in getting ideas (I can imagine it to be a hell of a job doing this for all allowed dataverse json metadata entries).

@Danny-dK
Contributor Author

Last post from my side. Another option could be to create separate json files of all the entries in https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and call on them if the user specifies them in a list.
For example, placing these 4 files in a directory called dataverse:
author.json.txt
description.json.txt
subtitle.json.txt
title.json.txt

library(jsonlite)
library(dplyr)
library(dataverse)
library(stringr)

#  create metadata as list. The list names function as general names to call on the json files.

metalisttemp <- list(description = list(dsDescriptionValue = 'mydescription new 3'), 
                     title = list(title = 'my_new_title_3'), 
                     author = list(authorName = 'Doe new3 John', authorAffiliation ='testing uni new3' ))


# create list of json files present

jsonfiles <- as.data.frame(list.files(path = 'D:/dataverse/', full.names = TRUE)) %>% 
    
            rename(., v1 = 1) %>% 
  
            mutate(., subject = str_extract(v1, '(?<=dataverse/)[^\\.]+'))


# import the json files

for(i in 1:length(jsonfiles$v1)){
  
  if(jsonfiles$subject[i] %in% names(metalisttemp)){
    
    assign(jsonfiles$subject[i], fromJSON(jsonfiles$v1[i]))}}


# create list object of the imported json files

jsonobj <- mget(ls(pattern = paste(names(metalisttemp), collapse = '|')))


# Unlist the metalisttemp object and remove the text 'list(' and ')'. Afterwards, recreate list from the character string.

metalisttemp_unnest <- toString(metalisttemp) %>% 
  
                       gsub('list\\(', '', .) %>% 
  
                       gsub('\\)', '', .) 
  
metalist <-  eval(parse(text=paste('list(', metalisttemp_unnest, ')'))) 


# For all entries in jsonobj and metalist: if names in both objects are equal and there are no multiple json entries (no sublists),
# then within the 'value' portion of the json object, paste the corresponding value given by the user in the metalist in the
# value field of the JSON. Else, if there are multiple entries in a json object and the names in those sublist are equal 
# to names in the metalist object, then paste the corresponding value given by the user in the metalist in the value field of 
# the JSON (within that sublist). 

for(i in seq_along(jsonobj)){
  
  for(j in seq_along(metalist)){
    
    if(names(jsonobj[i]) == names(metalist[j]) & jsonobj[[i]]$multiple != TRUE){
      
      jsonobj[[i]]$value <- unname(metalist[[j]])
      
      }else{
        
        if(jsonobj[[i]]$multiple == TRUE){
          
          for(k in seq_along(jsonobj[[i]]$value)){
            
            if(names(jsonobj[[i]]$value[k]) == names(metalist[j])){
              
              jsonobj[[i]]$value[[k]]$value <- unname(metalist[[j]])
              
            }
          }
        }
      }
  }
}

    
# create new json list 
 
newjson <- list()


# For each object in the jsonobj, assign it to the header section of the dataverse json metadata form.

for(i in seq_along(jsonobj)){
  
newjson$datasetVersion$metadataBlocks$citation$fields[i] <- jsonobj[i]}


# prettify the json so that it is readable by the create_dataset function.

prettyjson <- toJSON(newjson, pretty = TRUE, auto_unbox = TRUE)


Sys.setenv("DATAVERSE_SERVER" = "demo.dataverse.nl")
Sys.setenv("DATAVERSE_KEY" = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')

create_dataset('ddktestdeleteme', body = prettyjson)  

@kuriwaki kuriwaki added this to the CRAN 0.4.0 milestone Dec 21, 2021
@kuriwaki kuriwaki mentioned this issue Apr 9, 2022
4 tasks
@kuriwaki
Member

kuriwaki commented Apr 9, 2022

@Danny-dK do you think this is resolved given #116 is closed?

@Danny-dK
Contributor Author

@kuriwaki No. The add_file() function still doesn't work (sword api) and neither does create_dataset() (native api). Right now the working way to create a dataset and upload files to it is to call initiate_sword_dataset() (sword api) followed by add_dataset_file() (native api). So it is still a bit of a mix and match. The reason create_dataset() was not working is that it requires the metadata to be supplied in a functional json format (see this answer #82 (comment) and this one #82 (comment) in the current thread).
