The automatic extraction of bibliographic data remains a difficult task, because scientific publications do not follow a standard format and each publication has its own template. Many “regular expression” and “supervised machine learning” techniques exist for extracting the complete details of the references listed in the bibliographic section, but there is not much difference in their success rates. This paper presents a strategy for segregating and automatically extracting the individual components of references, such as authors, title and publication details, using an “unsupervised technique”, and for linking these references to their corresponding full-text articles with the help of Google.
Keywords: Regular Expression, Supervised machine learning, Bibliography, References, Unsupervised technique

Researchers typically download and collect numerous research papers in PDF form onto their desktops for reading and further reference. These downloaded PDF files are usually stored alongside other files in the local file system. Reference manager software such as RefWorks, Zotero, EndNote, SodhanaRef and Mendeley uses extracted basic metadata such as Title, Author and Abstract to search for an article. But the essential feature of “reference linking” receives little attention: the stored files are individual items with no association or links between them. Even if a scholarly publication cites one that already exists on the researcher’s personal computer, the two are in no way associated, and the researcher may not know the type of association between them. Most researchers have focused on document clustering based on context, but this does not help a researcher who is actually performing a literature survey. Classification helps in grouping all the research articles that work on the same area and suggesting them to the reader.

Most researchers prefer the snowball sampling technique. In this technique, the researcher checks for the most relevant references in a primary article and tries to obtain the corresponding full texts of those references. The first search is usually on their personal computer, but finding a full-text article in the file system is a time-consuming and tedious task. Instead, they prefer to download the article from the web using the reference citation, which causes redundancy. To make the search easier, reference management software such as Mendeley and Zotero is good at extracting the metadata from each journal article and lets the researcher find an article by its title, author name, etc. Still, taking the title of each and every reference and searching for it manually in the reference management software is also a difficult task.

To eliminate this problem and reduce the manual strain on the researcher, there is “SodhanaRef”, which automates reference article linkage and provides a semantic search, with a reported performance of 67%, and “A Strategy for Automatically Extracting References from PDF Documents”, which uses a supervised technique with annotations and regular expressions. The latter raises performance to 74%, which is only a small gain. To improve the performance further, we demonstrate an unsupervised approach for automatically extracting the components of the references and then linking each reference either to the file system (if the file is already present on the personal computer) or to Google Scholar, which helps reduce the search time and manual strain for the researcher.
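
The linking workflow described above can be illustrated with a minimal Python sketch. It is not the unsupervised technique of the paper, which this excerpt does not detail. Instead it uses a simple rule-based split that assumes a reference of the form "Authors. Title. Publication details, Year." and will mis-split author lists written with dotted initials such as "J. Smith". The function names `split_reference`, `normalize` and `link_reference`, the filename-matching rule and the sample reference are all hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path
from urllib.parse import quote_plus


def split_reference(ref):
    """Split one reference string into rough components.

    Assumption: the fields are separated by a period followed by a space,
    in the order authors, title, publication details.
    """
    year_match = re.search(r"\b(?:19|20)\d{2}\b", ref)
    text = ref.strip().rstrip(".")
    parts = [p.strip() for p in re.split(r"\.\s+", text) if p.strip()]
    return {
        "authors": parts[0] if len(parts) > 0 else "",
        "title": parts[1] if len(parts) > 1 else "",
        "publication": ". ".join(parts[2:]) if len(parts) > 2 else "",
        "year": year_match.group(0) if year_match else "",
    }


def normalize(text):
    """Lowercase and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link_reference(title, pdf_dir):
    """Return a local PDF path if a filename contains the title,
    otherwise a Google Scholar search URL for that title."""
    wanted = normalize(title)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        if wanted and wanted in normalize(pdf.stem):
            return str(pdf)
    return "https://scholar.google.com/scholar?q=" + quote_plus(title)


if __name__ == "__main__":
    # Hypothetical reference used only to exercise the functions.
    sample = "Smith J, Doe A. A sample title for testing. Journal of Examples, 2015."
    fields = split_reference(sample)
    print(fields)
    print(link_reference(fields["title"], "."))
```

Matching on the normalized filename is only one possible design choice. A fuller system would more plausibly compare the extracted title against metadata stored for each PDF, since downloaded files are rarely named after their titles.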
