This page provides companion data for the paper submitted to SANER'2018 - RENE Track. This data is intended to make our results reproducible.
Authors
Rodrigo Fernandes Gomes da Silva
Klérisson Vinícius Ribeiro Paixão
Marcelo de Almeida Maia
Source Code
We make publicly available the source code of our reproductions of DupPredictor and Dupe, called DupPredictorRep and DupeRep, respectively. Follow the links above for detailed reproduction steps.
Dataset
We provide two dumps, both containing the main tables. They differ only in the "posts" table. In Dump 1, the text in this table has already been stemmed and stripped of stop words, and tag synonyms and code blocks have already been extracted. In Dump 2, the table contains the original raw content. The fastest way to reproduce DupPredictor or Dupe is to use Dump 1. If you want to run the entire process, including stemming and stop-word removal, follow the instructions in the preprocessing step.
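The kind of preprocessing that separates Dump 1 from Dump 2 can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code used in DupPredictorRep or DupeRep: it assumes post bodies are HTML as in the Stack Overflow data dump (code blocks wrapped in <pre><code>), uses a tiny made-up stop-word list, and swaps a real stemmer (such as Porter's) for a toy suffix stripper. Tag-synonym handling is omitted.

```python
import re
from html import unescape

# Hypothetical, tiny stop-word list for illustration only; a real
# pipeline would use a much larger list.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "is", "are", "was", "to", "of",
    "in", "on", "for", "with", "how", "do", "i", "it", "this", "that",
}

# Matches <pre><code>...</code></pre> blocks in an HTML post body.
CODE_BLOCK_RE = re.compile(r"<pre[^>]*>\s*<code[^>]*>(.*?)</code>\s*</pre>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")      # any remaining HTML tag
TOKEN_RE = re.compile(r"[a-z0-9]+")  # lowercase alphanumeric tokens


def split_code_blocks(body_html):
    """Return (plain_text, code_blocks) for an HTML post body."""
    code_blocks = [unescape(c) for c in CODE_BLOCK_RE.findall(body_html)]
    # Remove the code blocks, then strip the remaining tags
    # (inline <code> contents stay in the text).
    text_html = CODE_BLOCK_RE.sub(" ", body_html)
    text = unescape(TAG_RE.sub(" ", text_html))
    return text, code_blocks


def naive_stem(token):
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(body_html):
    """Extract code blocks, drop stop words, and stem the remaining tokens."""
    text, code_blocks = split_code_blocks(body_html)
    tokens = TOKEN_RE.findall(text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return stems, code_blocks


if __name__ == "__main__":
    body = "<p>How do I sort lists in Python?</p><pre><code>sorted(items)\n</code></pre>"
    stems, code = preprocess(body)
    print(stems)  # ['sort', 'list', 'python']
    print(code)   # ['sorted(items)\n']
```

Stop words are removed before stemming so that the stop-word list can be matched against unmodified tokens; the extracted code blocks are kept raw, since stemming source code would make little sense.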