Intelligent Software Engineering Lab

Duplicate Question Detection in Stack Overflow: A Reproducibility Study

This page provides companion data for the paper submitted to the RENE Track of SANER'2018. The data is intended to make our results reproducible.


Rodrigo Fernandes Gomes da Silva
Klérisson Vinícius Ribeiro Paixão
Marcelo de Almeida Maia

Source Code

We make publicly available the source code of our reproductions of DupPredictor and Dupe, called DupPredictorRep and DupeRep, respectively. Follow the links above for the detailed reproduction steps.


We provide two dumps, both containing the main tables. They differ only in the table "posts". In Dump 1, the content of this table has been stemmed and had its stop words removed; it also already includes the tag synonyms and the extracted code blocks. In Dump 2, the table contains the original raw content. The fastest way to reproduce DupPredictor or Dupe is to use Dump 1. If you want to run the entire process, including stemming and stop-word removal, follow the instructions in the preprocessing step.
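Part of the preprocessing that distinguishes Dump 1 from Dump 2 can be illustrated with a minimal sketch. It assumes that post bodies are stored as Stack Overflow HTML, with code inside `<pre><code>` elements. The functions `split_post` and `remove_stop_words` and the tiny stop-word list are hypothetical, and stemming and tag-synonym handling are left out. This is not the preprocessing code of DupPredictorRep or DupeRep.

```python
import re
from html import unescape

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "and", "how", "do", "i", "it", "with"}

# Assumption: code blocks appear as <pre ...><code>...</code></pre> in the post body.
CODE_RE = re.compile(r"<pre[^>]*>\s*<code>(.*?)</code>\s*</pre>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def split_post(body: str) -> tuple[str, list[str]]:
    """Separate <pre><code> blocks from the prose of an HTML post body."""
    # Extract code blocks first and decode HTML entities such as &lt;.
    code_blocks = [unescape(c) for c in CODE_RE.findall(body)]
    # Drop the code blocks from the body, then strip the remaining HTML tags.
    prose = CODE_RE.sub(" ", body)
    prose = unescape(TAG_RE.sub(" ", prose))
    return prose, code_blocks


def remove_stop_words(text: str) -> list[str]:
    """Lowercase, tokenize on alphanumeric runs, and drop stop words (no stemming)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]
```

For example, for the body `<p>How do I sort a list in Python?</p><pre><code>xs.sort()</code></pre>`, `split_post` returns the prose and the single code block `xs.sort()`, and `remove_stop_words` applied to the prose yields `['sort', 'list', 'python']`. Separating code before tokenizing keeps identifiers from the code out of the text terms, which matters when text and code are treated as separate features.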




