Implementation of Dense Retrievals #49

DelaramRajaei · 2023-12-27T19:07:42Z

In this issue, I will keep a record of my findings as I work on refining all aspects of the retrieval system on different datasets using dense retrieval.

DelaramRajaei · 2023-12-27T19:34:24Z

Hey @hosseinfani,
As mentioned here, I've downloaded the dbpedia and antique datasets. Could you please share the robust04 files with me so that I can start the dense indexing? There appears to be a problem extracting the tar files stored in Teams when using Windows.

Looking ahead, our next steps are to obtain the clueweb12, clueweb09, and gov2 datasets. As with robust04, gov2 requires us to sign a contract, after which they will send us a copy of the drive, as explained here.
In the meantime, I can begin by indexing the antique and dbpedia datasets.

hosseinfani · 2023-12-27T23:06:20Z

Hi @DelaramRajaei
I'm uploading the extracted files to our RePair > Datasets .. > Corpora >> Robust04
Can you upload the rest there as well?
I submitted the request for gov2.

DelaramRajaei · 2023-12-28T03:53:29Z

@hosseinfani
Yes, I will upload the raw datasets to Teams.

DelaramRajaei · 2024-01-12T23:39:23Z

Hi @hosseinfani,

I wanted to give you an update on the indexing process. I downloaded the antique and dbpedia corpora and converted them to the jsonl format required by the documentation. I uploaded the jsonl files to Teams > RePair channel > Files > Datasets & indexes > Corpora. Currently, I'm facing an issue when using pyserini for indexing. There seemed to be a conflict with pygaggle, so I removed pygaggle and used other libraries instead. However, I'm still encountering some issues with the library.
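For reference, the conversion step can be sketched roughly as below. This is only a minimal sketch, assuming the target is the `id` / `contents` record layout that pyserini's JSON collections read, one object per line. The function name `write_pyserini_jsonl`, the output path, and the sample documents are hypothetical, and it is not the actual script used for antique or dbpedia.

```python
import json
from pathlib import Path
from typing import Iterable, Tuple


def write_pyserini_jsonl(docs: Iterable[Tuple[str, str]], out_path: str) -> int:
    """Write (doc_id, text) pairs as one JSON object per line.

    Each record has an "id" and a "contents" field. Returns the number
    of records written.
    """
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for doc_id, text in docs:
            # Collapse runs of whitespace (including newlines) so that
            # every record stays on a single line.
            record = {"id": str(doc_id), "contents": " ".join(text.split())}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    sample = [("doc1", "first passage\nwith a line break"), ("doc2", "second passage")]
    print(write_pyserini_jsonl(sample, "corpus_jsonl/sample.jsonl"))
```

Writing into a dedicated folder keeps the corpus separate from other files, since the indexer is pointed at a directory of jsonl files rather than at a single file.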

Hi @yogeswarl,

I noticed that you created the dense indexes for the aol dataset. I followed the steps you described in the Readme and in pyserini's documentation, but I'm running into some problems. One is that torch does not detect CUDA: I installed torch with CUDA support, but it still isn't recognized. Have you ever encountered this problem? I also have another question: given the size of the datasets and the risk of running out of memory space, did you create the indexes on your local system or somewhere else?
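A quick way to narrow down the CUDA problem is to print what the installed torch build reports about itself. The snippet below is a minimal diagnostic sketch, not a fix. The helper name `cuda_report` is hypothetical, and the snippet only assumes that torch may or may not be importable in the current environment.

```python
def cuda_report() -> dict:
    """Collect basic facts about the installed torch build and CUDA visibility."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter's environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the installed wheel is a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the environment has a CPU-only torch build, which would explain CUDA not being recognized even though a CUDA toolkit is installed. If it shows a version but `cuda_available` is `False`, the GPU driver or its visibility to the process is the more likely place to look.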

DelaramRajaei added the documentation, experiment, and Dataset labels Dec 27, 2023

DelaramRajaei self-assigned this Dec 27, 2023

DelaramRajaei removed the documentation label Feb 14, 2024
