\documentclass[12pt]{article}
\usepackage[english]{babel}
\usepackage{natbib}
\usepackage{url}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{graphicx}
\graphicspath{{images/}}
\usepackage{parskip}
\usepackage{fancyhdr}
\usepackage{vmargin}
\setmarginsrb{3 cm}{2.5 cm}{3 cm}{2.5 cm}{1 cm}{1.5 cm}{1 cm}{1.5 cm}
\title{Research Papers Easy Access}
\author{Learners}
\makeatletter
\let\thetitle\@title
\let\theauthor\@author
\let\thedate\@date
\makeatother
\pagestyle{fancy}
\fancyhf{}
\rhead{\theauthor}
\lhead{\thetitle}
\cfoot{\thepage}
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{titlepage}
\centering
\vspace*{0.5 cm}
\includegraphics[scale = 0.35]{IIT_Bombay_color_logo.png}\\[1.0 cm] % University Logo
\textsc{\Large INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY}\\[1.0 cm]
\textsc{\large COMPUTER SCIENCE AND ENGINEERING}\\[0.2 cm] % Branch
\textsc{\Large SOFTWARE LAB}\\[0.2 cm] % Subject
\textsc{\Large CS699}\\[0.5 cm] % Course Code
\textsc{\large \textbf{RESEARCH PAPERS EASY ACCESS }}\\[0.2 cm]
% Project Name
\rule{\linewidth}{0.2 mm} \\[1.4 cm]
\begin{minipage}{1\textwidth}
\begin{flushright}
\emph{\textbf{TEAM NAME: LEARNERS}} \\
DEEPTI MITTAL 193050025\linebreak
MONIKA SROHA 193050087\linebreak
SWARIL SINGHAL 193050039\linebreak
% Your Student Number
\end{flushright}
\end{minipage}\\[2 cm]
\vfill
\end{titlepage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tableofcontents
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ABSTRACT}
We generate a ``word cloud'' for the research papers of a professor. The papers are extracted from the professor's homepage. The cloud gives greater prominence to words that appear more frequently in the papers. The word cloud is clickable: clicking any word displays the list of research papers in which that word appears.
\section{INTRODUCTION}
We represent the research papers published by a professor in the form of a word cloud. A word cloud is an image composed of the words used in a particular text. Our word cloud is based on the terms appearing in the research papers published by a professor; these terms are extracted from the links to research papers provided on the professor's homepage. Each word in the cloud is a clickable link that displays the list of research papers containing that word. The cloud is built from the most frequently occurring words in the papers, and each word is weighted according to its frequency.
\section{MOTIVATION}
Word clouds are an effective tool to represent what is emphasized in a text. They can be used to display the fields of interest of different professors. For example, if you construct a word cloud for an algorithms professor, you can instantly see which words appear most frequently in their research papers, and thus spot the topics they focus on the most. Just like an info-graphic and other compelling pictorial representations, they:
\begin{itemize}
\setlength\itemsep{0.5em}
\item Make an impact
\item Are easy to understand
\item Can easily be shared
\item Ease the handling of papers published by a professor
\end{itemize}
These word clouds are clickable. On clicking any word, the application displays the list of research papers that contain it, so that users can find the topics of their interest that intersect with the professor's research interests and navigate directly to the related papers.
\section{PRIOR WORK}
To our knowledge, there is no prior work on representing the research papers of a professor in the form of a word cloud, on classifying a professor's papers word-wise, or on easing the handling of a professor's papers in this way.
\section{FEATURES}
\begin{itemize}
\setlength\itemsep{0.5em}
\item Clickable Word-Cloud classifying Research papers word-wise.
\item Each word will display the list of Research papers containing that word.
\item Each Research Paper in the list will be a link pointing directly to the Research Paper.
\item Ease in handling papers published by a professor.
\item A clean GUI for browsing research papers.
\end{itemize}
\pagebreak
\section{TECHNOLOGIES USED}
\section*{Django}
Django is an open-source web framework that encourages rapid development and clean, pragmatic design. Django takes security seriously and helps developers avoid many common security mistakes.
Some of the busiest sites on the Web leverage Django’s ability to quickly and flexibly scale. It takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel.
\section*{Python Tools and Libraries}
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It frequently uses English keywords where other languages use punctuation, and it has fewer syntactic constructions than other languages. It supports functional and structured programming methods as well as OOP. It can be used as a scripting language or can be compiled to byte-code for building large applications. It provides very high-level dynamic data types and supports dynamic type checking. It can be easily integrated with C, C++, COM, ActiveX, CORBA, and JavaScript. The libraries used in the project are:
\\
\begin{description}
\setlength\itemsep{2 em}
\item[\textbf{Beautiful Soup}] \hfill \\ Beautiful Soup is used for web scraping: the process of downloading data from websites and extracting valuable information from it. We use it to extract the research papers from a professor's homepage, given its URL. (A short combined sketch of these libraries follows this list.)
\item[\textbf{PDF-Miner}] \hfill \\ PDF-Miner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDF-Miner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats.
\item[\textbf{Pattern.en}] \hfill \\ The pattern.en module contains a fast part-of-speech tagger for English (it identifies nouns, adjectives, verbs, etc.\ in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization \& pluralization, and a WordNet interface. This library helps in selecting valid words from the papers and merging the frequencies of similar words, for example merging the plural form of a word into its singular form.
\end{description}
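As a brief illustration of how these libraries fit together, here is a minimal sketch (not the project code; the URL is a placeholder):

\begin{verbatim}
import requests
from bs4 import BeautifulSoup
from pattern.text.en import singularize

# Fetch a (hypothetical) publications page and collect its PDF links.
page = requests.get("https://example.org/publications")
soup = BeautifulSoup(page.content, "html5lib")
pdf_links = [a.get("href") for a in soup.find_all("a")
             if a.get("href", "").endswith(".pdf")]

# Normalize plural forms so that "graphs" and "graph" share one count.
print(singularize("graphs"))  # graph
\end{verbatim}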
\section*{JavaScript, HTML and CSS}
JavaScript is used in the frontend. JavaScript is a high-level, dynamic, untyped, and interpreted programming language. JavaScript is used for displaying the words, making them clickable and finally showing the links to the research papers.
\pagebreak
\vfill
\section{WORKING PROCESS}
This is a web application with an easy-to-use interface. The application asks for the URL of a professor's homepage. Given the URL, it extracts the link to the actual publications page, then extracts all links to downloadable PDFs by reading the structure of the professor's homepage, and finally downloads the PDFs to the machine using the extracted links. A frequency counter is applied to each research paper to count word occurrences and find the 50 most frequent words, while removing uninformative words (articles, conjunctions, common adjectives, etc.) known as stop words; we use a list of over 1000 stop words to keep only meaningful terms. The application then creates a CSV of the top 50 words by frequency. This CSV is used to generate a word cloud, with each word weighted by its frequency, which is displayed in the web app. The word cloud is clickable: each word is a link, and clicking it displays the list of the professor's research papers that contain that word.
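The core counting step can be sketched as follows (a simplified version of the logic in views.py; the file names are illustrative):

\begin{verbatim}
from collections import Counter

stopwords = set(line.strip() for line in open("stopwords.txt"))
words = open("paper.txt").read().lower().split()
counts = Counter(w.strip('.,:()"!*-') for w in words
                 if w not in stopwords)
top50 = counts.most_common(50)  # [(word, frequency), ...]
\end{verbatim}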
\pagebreak
\section{IMPLEMENTATION}
To run Research Papers Easy Access, follow these steps:\\
\begin{enumerate}
\setlength\itemsep{2 em}
\item To host the server, open a terminal in the project directory and run the command\\
\begin{center}
\emph{python3 manage.py runserver}\\
\vspace{0.5in}
\hspace*{-0.5in}
\centerline{\includegraphics[scale = 0.35]{1.png}}
\\[1.0 cm]
\end{center}
\pagebreak
\item Now open any browser and go to \emph{localhost:8000}. This will display the homepage of Research Paper Easy Access.
\vspace{0.5 in}
\\
\hspace*{-0.5in}
\centerline{\includegraphics[scale = 0.35]{2.png}}\\[1.0 cm]
\pagebreak
\item Enter the URL of a professor's homepage. Currently we have implemented it for seven professors:
\begin{description}
\setlength\itemsep{1 em}
\item[Pushpak Bhattacharyya:] \url{https://www.cse.iitb.ac.in/~pb/}
\item[Rohit Gurjar:] \url{https://www.cse.iitk.ac.in/users/rgurjar/}
\item[Varsha Apte:] \url{https://www.cse.iitb.ac.in/~varsha/}
\item[Om Damani:] \url{https://www.cse.iitb.ac.in/~damani/}
\item[Mythili Vutukuru:] \url{https://www.cse.iitb.ac.in/~mythili/}
\item[Ajit Diwan:] \url{https://www.cse.iitb.ac.in/~aad/}
\item[Ganesh Ramakrishnan:] \url{https://www.cse.iitb.ac.in/~ganesh/}
\end{description}
On entering the URL you will get a word cloud of that professor's research papers.
\vspace{0.5 in}
\\
\hspace*{-0.5in}
\centerline{\includegraphics[scale = 0.35]{4.png}}\\
\textbf{Word-Cloud of Prof. Rohit Gurjar}\\[1.0 cm]
\vfill
\pagebreak
\item Now, on clicking any word, the application will display the links to the research papers containing that word.
\vspace{0.5 in}
\\
\hspace*{-0.5in}
\centerline{\includegraphics[scale = 0.35]{3.png}}\\[1.0 cm]
\end{enumerate}
\pagebreak
\section{FUTURE SCOPE}
We have implemented Research Paper Easy Access for seven professors so far, so implementing it for all professors is the next step. Some professors use DBLP to list their research papers, so we will also extract research papers from DBLP links.
\section{CONCLUSION}
The word cloud for research papers is going to be really beneficial from the student's point of view. Earlier, it was a very time-consuming task to find a professor and the research papers that intersect with one's research interests or are needed for academic work. Our GUI-based web application makes this easier while being adaptive and reliable. Along with reducing the effort of finding a research paper, it provides many more advantages, such as speedy analysis of a professor's research interests, easier handling of a professor's papers, and word-wise classification of research papers.
\newpage
\section{REFERENCES}
\begin{itemize}
\item Prof. Kavi Arya
\item Senior Teacher Assistant Diptesh Kanojia
\item www.w3schools.com
\item www.cse.iitb.ac.in
\item www.codingforum.com
\item geeksforgeeks.org
\item stackoverflow.com
\end{itemize}
\end{document}
Git Link: https://git.cse.iitb.ac.in/monikasroha/CS699_Research_Paper_Easy_Access
Project Name: Research Paper Easy Access
Team Name: Learners
Members:
Name: Deepti Mittal    Roll No: 193050025
Name: Swaril Singhal   Roll No: 193050039
Name: Monika Sroha     Roll No: 193050087
Motivation:
Whenever a student wants to read a paper on a particular topic published by a faculty member, he/she has to go through a long list of papers to figure out which one to read. That can be hectic and somewhat irritating! Research Paper Easy Access takes the URL of the faculty member as input and displays the top 50 words on which the faculty member has worked the most. On clicking a particular word, a list of the PDFs that contain that word appears. Now you can easily access the papers!
Hosting from Developer Documentation:
1. Inside the source folder, go to the papers folder.
2. Open the terminal.
3. Type the command: python3 manage.py runserver
4. Now open a browser and go to localhost:8000.
5. Type the URL and the word cloud will be generated.
All of these commands are also specified on the main page of the developer documentation provided in developer.pdf.
#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys
## Documentation of the main function
# @brief Sets papers.settings as the default settings module and runs the
#        management command given on the command line; raises an error if
#        Django cannot be imported.
# @brief No parameters passed.
def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'papers.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
"""
Django settings for papers project.
Generated by 'django-admin startproject' using Django 2.2.6.
For more information on this file, see
https://docs.djangoproject.com/en/2.2/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/2.2/ref/settings/
"""
import os
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
## Documentation for the base directory path
# @brief It specifies the base path of all files and folders of the project.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/2.2/howto/deployment/checklist/
# SECURITY WARNING: keep the secret key used in production secret!
## Documentation for security key
# @brief The secret key must be kept private when the code is used in production!
SECRET_KEY = '-w-mk58zgjk*y=t61muzzrd+*e)7%+20j__!4_=lh(6^*@)42n'
# SECURITY WARNING: don't run with debug turned on in production!
## Documentation for the DEBUG security setting
# @brief Shows detailed error pages during development. Never set DEBUG to True in production.
DEBUG = True
ALLOWED_HOSTS = []
# Application definition
## To define installed apps
# @brief Lists all applications used; the developer needs to add their own app here. In our case we added 'research_papers'.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'research_papers',
    'crispy_forms',
]
CRISPY_TEMPLATE_PACK = 'bootstrap4'
#'django.contrib.staticfiles'
## To add the required middleware.
# @brief For our development no changes were needed; the defaults are used.
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
## Documentation for ROOT_URLconfig
# @brief to specify the path for urls file where the urlpatterns are being added.
ROOT_URLCONF = 'papers.urls'
## Documentation for templates
# @brief Specifies the template settings required to run the project.
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'papers.wsgi.application'
# Database
# https://docs.djangoproject.com/en/2.2/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}
## Password validation
# @brief Used for password validation. Not required in our case.
# https://docs.djangoproject.com/en/2.2/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/2.2/topics/i18n/
## Language code for this installation
LANGUAGE_CODE = 'en-us'
## Time zone for the server
TIME_ZONE = 'UTC'
## Enables Django's translation system
USE_I18N = True
## Enables localized formatting of data
USE_L10N = True
## Enables timezone-aware datetimes
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/2.2/howto/static-files/
#STATICFILES_DIRS = (os.path.join( os.path.dirname( __file__ ), 'static' ),)
## Documentation for static file directory path
# @brief It is set to create a folder in base directory to store all the static files to be used such as images etc.
STATICFILES_DIRS = (os.path.join(BASE_DIR, 'static'),)
## Documentation for static file finders
# @brief The finders used to locate static files in the project folders.
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
)
STATIC_URL = '/static/'
#STATIC_ROOT = '/path/to/copy/files/to'
#STATIC_ROOT = os.path.join( os.path.dirname( BASE_DIR ), 'static_cdn' )
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')
MEDIA_URL = '/media/'
"""papers URL Configuration
The `urlpatterns` list routes URLs to views. For more information please see:
https://docs.djangoproject.com/en/2.2/topics/http/urls/
Examples:
Function views
    1. Add an import:  from my_app import views
    2. Add a URL to urlpatterns:  path('', views.home, name='home')
Class-based views
    1. Add an import:  from other_app.views import Home
    2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
Including another URLconf
    1. Import the include() function: from django.urls import include, path
    2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path, include
from django.conf import settings
from django.conf.urls.static import static
urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('research_papers.urls')),
]
if settings.DEBUG:
    urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)
"""
WSGI config for papers project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/2.2/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'papers.settings')
application = get_wsgi_application()
from django.contrib import admin
# Register your models here.
from django.apps import AppConfig
class ResearchPapersConfig(AppConfig):
    name = 'research_papers'

from django import forms

class MyForm(forms.Form):
    URL = forms.CharField(label='URL ', widget=forms.TextInput(attrs={'placeholder': 'Enter URL'}))
from django.db import models
# Create your models here.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Research Papers Easy Access</title>
</head>
<body>
URL entered: <strong>{{ URL }}</strong><br><br>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Research Papers Easy Access</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
<style>
div{
background: transparent;
}
body
{
background-image: url("https://extraconfidencial.com/wp-content/uploads/2019/09/asistentes-virtuales.jpg");
background-position: top;
background-repeat: no-repeat;
background-size: cover !important;
background-color: #cccccc;
}
.content
{
position: absolute;
background: rgb(0,0,0);
background: rgba(0,0,0,0.5);
color: #f1f1f1;
width: 100%;
padding: 20px;
bottom: 30%;
}
span { background-color: #cccccc !important; }
</style>
</head>
<body>
<center>
<div class="content">
<span>
<h1 class="display-3" font="">Research Paper Easy Access</h1>
<form action="/thankyou/" method="post">
{% csrf_token %}
<table>
{{form.as_table}}
</table>
<br>
<input type="submit" class="btn btn-outline-light" value="Submit" />
</form>
</span>
</div>
</center>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Thank You</title>
</head>
<body bgcolor = "#d6cbd3">
<h2>Response Entered by you:</h2>
<form method="post">
URL entered: <strong>{{ URL }}</strong><br><br>
<script type="text/javascript">
var word_list_display = {{ word_list_display|safe }}
var word_count_display = {{ word_count_display|safe }}
var most_occur_file_map = {{ most_occur_file_map|safe }}
var pdf_name_list = {{ pdf_name_list|safe }}
var pdf_url_list = {{ pdf_url_list|safe }}
var most_occur_file_map_len = {{ most_occur_file_map_len|safe }}
function links(word, size, file, len, pdfname, urls, i) {
    var a = document.createElement('a');
    var b = document.createElement('br');
    // Create the text node for the anchor element.
    var link = document.createTextNode(word);
    // Append the text node to the anchor element.
    a.appendChild(link);
    // Set the title.
    a.title = word;
    // Set the href property.
    a.href = "javascript:func('" + file + "','" + len + "', '" + pdfname + "','" + urls + "');";
    // Scale the font size with the word's frequency.
    a.style.fontSize = "1.0em";
    var size1 = (0.002 * size);
    a.style.marginLeft = "100px";
    a.style.fontSize = parseFloat(a.style.fontSize) + size1 + "em";
    // Append the anchor element to the body, with a line break every 5 words.
    document.body.appendChild(a);
    if (i % 5 == 0)
        document.body.appendChild(b);
}
function func(file, len, pdfname, urls) {
    document.write("<center><b><u>");
    document.write("List of PDFs <br />");
    document.write("</u></b></center>");
    document.body.style.backgroundColor = "#daebe8";
    var file1 = file.split(',');
    var pdfname1 = pdfname.split(',');
    var urls1 = urls.split(',');
    for (i = 0; i < len; i++) {
        var a = document.createElement('a');
        var b = document.createElement('br');
        // Look up the pdf index, then create the text node for the anchor element.
        var x = parseInt(file1[i]);
        var link = document.createTextNode(pdfname1[x]);
        // Append the text node to the anchor element.
        a.appendChild(link);
        a.target = "_blank";
        // Set the title.
        a.title = pdfname1[x];
        // Set the href property.
        a.href = urls1[x];
        // Append the anchor element to the body.
        document.body.appendChild(a);
        document.body.appendChild(b);
    }
}
for (i = 0; i < {{ x }}; i++) {
    links(word_list_display[i], word_count_display[i], most_occur_file_map[word_list_display[i]], most_occur_file_map_len[word_list_display[i]], pdf_name_list, pdf_url_list, i);
}
</script>
</form>
</body>
</html>
from django.test import TestCase
# Create your tests here.
from django.urls import path
from research_papers import views
#from django.contrib.staticfiles.urls import staticfiles_urlpatterns
#from django.contrib.staticfiles.urls import staticfiles_urlpatterns
## Documentation for urlpatterns, which specify the URLs that can be visited.
# @brief Sets the paths for the pages of the project: both the root page and the /thankyou/ page (which displays the output) map to the research_papers view.
urlpatterns = [
    path('', views.research_papers),
    path('thankyou/', views.research_papers),
]
## Documentation on how to run the code.
#
# To run the code on your system, Python and Django must be installed.
#
# To install Django:
#
# $ sudo apt install python3-django
#
# OR
#
# $ sudo apt install python3-pip
#
# AND
#
# $ pip3 install django
#
## To install Python:
#
# $ sudo apt-get install python3
#
# Create a directory named 'papers'.
#
# Start a project in the terminal by using the following:
#
# $ django-admin startproject research_project
#
# A manage.py file will be created.
#
# From the papers folder, to host the server:
#
# $ python3 manage.py runserver
#
# The code is in views.py and the templates are created in the 'templates' folder inside research_papers.
#
# Settings for the settings.py and manage.py files:
#
# They have to be set as specified in the settings.py file and as per requirements.
#
# The server is set up. Now you can easily access it!!
##@mainpage
# @author Learners
import requests
from bs4 import BeautifulSoup
import os
import glob
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from collections import Counter
import PyPDF2
import csv
from pattern.text.en import singularize, pluralize
import numpy as np
from io import StringIO
from wordcloud import WordCloud, STOPWORDS
from PIL import Image
import urllib
import matplotlib.pyplot as plt
from django.shortcuts import render
from research_papers.forms import MyForm
from django.template import loader
from django.http import HttpResponse
## Documentation for the wc() function.
# @brief This function opens a csv file containing a list of words along with their frequencies.
# @details It then saves an image of the generated word cloud at the specified location, with word sizes proportional to frequency. This function is not used in the current development, but if an image word cloud is needed, the function body can be uncommented.
def wc():
    '''reader = csv.reader(open('counts.csv', 'r'))
    d = {}
    for k, v in reader:
        # insert the frequency values in the d dictionary
        d[k] = float(v)
    # create a dummy mask (shape) for the wordcloud
    # mask = np.array(Image.open("images.jpeg"))
    # a word cloud for the words whose frequencies are stored in d is generated
    wordcloud = WordCloud(width=5000, height=6000, max_words=400, stopwords=STOPWORDS,
                          random_state=42).generate_from_frequencies(d)
    wordcloud.to_file('papers/static/image/images.png')  # saves the wordcloud to images.png'''
## Documentation for the convert_pdf function.
# @param path The path of the file (a PDF) which has to be converted to a string.
# @brief This function opens the pdf file and converts its contents into a string.
# @return text A string storing the whole file content.
def convert_pdf(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = open(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password,
                                  caching=caching, check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    text = retstr.getvalue()
    retstr.close()
    return text
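# Example usage (illustrative file name, assuming a PDF saved by savepdf below):
#   text = convert_pdf("pdfstore/pdf0.pdf")
#   print(text[:200])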
## Documentation for the extractpdf_pb function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Pushpak Bhattacharyya).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_pb(soup, add_url):
    # find the "Yearwise Papers" link in the sub-list menu
    table = soup.find('ul', attrs={'class': 'sub-list'})
    for row in table.findAll('li'):
        if (row.get_text()) == "Yearwise Papers":
            url_significant = row.a['href']
            url_significant = add_url + url_significant
            print(url_significant)
    req_pdf = requests.get(url_significant)
    soup = BeautifulSoup(req_pdf.content, 'html5lib')
    table = soup.find('ol')
    url_pdf = []
    count = 0
    c = 0
    for row in table.findAll('li'):
        pdf = {}
        if row.a is None:
            c = c + 1
            continue
        else:
            pdf['url'] = add_url + row.a['href']
            pdf['name'] = row.a.get_text()
            url_pdf.append(pdf)
            count = count + 1
    return url_pdf
## Documentation for the extractpdf_varsha function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Varsha Apte).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_varsha(soup, add_url):
    table = soup.find('div', attrs={'id': 'menu'})  # to go to the research page
    for row in table.findAll('li'):
        if (row.get_text()) == "Research":
            url_significant = row.a['href']
            url_significant = add_url + url_significant
    # to go to the publications page
    req_pdf = requests.get(url_significant)
    soup = BeautifulSoup(req_pdf.content, 'html5lib')
    for row in soup.findAll('a'):
        if (row.get_text()) == "Here":
            url_significant = row.get('href')
            url_significant = add_url + url_significant
    # to get the list of pdfs
    req_pdf = requests.get(url_significant)
    soup = BeautifulSoup(req_pdf.content, 'html5lib')
    table = soup.find('ol', attrs={'type': '1'})
    url_pdf = []
    count = 0
    c = 0
    for row in table.findAll('li'):
        pdf = {}  # stores the url corresponding to the name of the pdf
        if row.a is None or row.a.get_text() == " ":
            c = c + 1
            continue
        else:
            pdf['url'] = add_url + row.a['href']
            pdf['name'] = row.a.get_text()
            url_pdf.append(pdf)
            count = count + 1
    return url_pdf
## Documentation for the extractpdf_ganesh function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Ganesh Ramakrishnan).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_ganesh(soup, add_url):
    table = soup.find('center')
    for row in table.findAll('a'):
        if row.b is not None and (row.b.get_text()) == "Publications":
            url_significant = row['href']
            url_significant = add_url + url_significant
            print(url_significant)
    req_pdf = requests.get(url_significant)
    soup = BeautifulSoup(req_pdf.content, 'html5lib')
    url_pdf = []
    count = 0
    c = 0
    for row in soup.findAll('ol'):
        for x in row.findAll('li'):
            pdf = {}
            if x.a is None or ("Handbook" in x.a.get_text()):
                c = c + 1
                continue
            else:
                if "http" in x.a['href']:
                    pdf['url'] = x.a['href']
                else:
                    pdf['url'] = add_url + x.a['href']
                pdf['name'] = x.a.get_text()
                url_pdf.append(pdf)
                count = count + 1
    return url_pdf
## Documentation for the extractpdf_supratik function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Supratik Chakraborty).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_supratik(soup, add_url):
    for row in soup.findAll('a'):
        if "Publications" in (row.get_text()):
            url_significant = row.get('href')
            url_significant = add_url + url_significant
    req_pdf = requests.get(url_significant)
    soup = BeautifulSoup(req_pdf.content, 'html5lib')
    table = soup.find('ol')
    url_pdf = []
    count = 0
    c = 0
    for row in table.findAll('li'):
        pdf = {}
        if row.a is None:
            c = c + 1
            continue
        else:
            pdf['url'] = add_url + row.a['href']
            pdf['name'] = row.a.get_text()
            url_pdf.append(pdf)
            count = count + 1
    return url_pdf
## Documentation for the extractpdf_mythili function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Mythili Vutukuru).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_mythili(soup, add_url):
    table = soup.find('table', attrs={'cellpadding': '10', 'cellspacing': '0', 'width': '100%'})
    url_pdf = []
    count = 0
    c = 0
    for row in soup.findAll('ul'):
        for x in row.findAll('li'):
            pdf = {}
            if x.a is None or x.em is None:
                c = c + 1
                continue
            else:
                pdf['url'] = add_url + x.a['href']
                pdf['name'] = x.em.get_text()
                url_pdf.append(pdf)
                count = count + 1
    return url_pdf
## Documentation for the extractpdf_rohit function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Rohit Gurjar).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_rohit(soup, add_url):
    url_pdf = []
    count = 0
    c = 0
    for row in soup.findAll('ul'):
        for x in row.findAll('li'):
            pdf = {}
            if x.a is None or x.i is None:
                c = c + 1
                continue
            else:
                pdf['url'] = add_url + x.a['href']
                pdf['name'] = x.a.get_text()
                url_pdf.append(pdf)
                count = count + 1
    return url_pdf
## Documentation for the extractpdf_om function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Om Damani).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_om(soup, add_url):
    table = soup.find('ul')
    url_pdf = []
    count = 0
    c = 0
    for row in table.findAll('li'):
        pdf = {}
        if row.a is None:
            c = c + 1
            continue
        else:
            if "http" in row.a['href']:
                continue
            else:
                pdf['url'] = add_url + row.a['href']
                pdf['name'] = row.a.get_text()
                url_pdf.append(pdf)
                count = count + 1
    return url_pdf
## Documentation for the extractpdf_ajit function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param add_url The base url of the professor's page.
# @brief This function processes the html page to extract the names and links of the pdfs of the research papers published by the faculty member (Prof. Ajit Diwan).
# @details It visits the actual page where the paper list is displayed and, with the help of Beautiful Soup, fetches each paper name along with the url to download the paper. The details of a single paper are stored in a dictionary with keys 'name' and 'url', forming a list of such dictionaries.
# @return url_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_ajit(soup, add_url):
    table = soup.find('ol')
    url_pdf = []
    count = 0
    c = 0
    for row in table.findAll('li'):
        pdf = {}
        if row.a is None:
            c = c + 1
            continue
        else:
            if "http" in row.a['href']:
                continue
            else:
                pdf['url'] = add_url + row.a['href']
                pdf['name'] = row.a.get_text()
                url_pdf.append(pdf)
                count = count + 1
    return url_pdf
## Documentation for the extractpdf_prof function.
# @param soup The html page fetched by requests, parsed with Beautiful Soup using the html5lib parser.
# @param URL The base url of the professor's page.
# @brief This function figures out which extractor to call on the basis of the given input URL.
# @details It calls the extractpdf function matching the professor's URL, which returns the list of pdfs mapped to URLs. If a URL for which no function mapping exists yet is passed, it prints an error message.
# @return list_pdf The list of dictionaries containing the paper names along with their URLs.
def extractpdf_prof(soup, URL):
    list_pdf = []
    if URL == "https://www.cse.iitb.ac.in/~pb/":
        list_pdf = extractpdf_pb(soup, URL)
    elif URL == "https://www.cse.iitb.ac.in/~ganesh/":
        list_pdf = extractpdf_ganesh(soup, URL)
    elif URL == "https://www.cse.iitb.ac.in/~aad/":
        list_pdf = extractpdf_ajit(soup, URL)
    elif URL == "https://www.cse.iitb.ac.in/~damani/":
        list_pdf = extractpdf_om(soup, URL)
    elif URL == "https://www.cse.iitb.ac.in/~mythili/":
        list_pdf = extractpdf_mythili(soup, URL)
    elif URL == "https://www.cse.iitk.ac.in/users/rgurjar/":
        list_pdf = extractpdf_rohit(soup, URL)
    elif URL == "https://www.cse.iitb.ac.in/~varsha/":
        list_pdf = extractpdf_varsha(soup, URL)
    else:
        print("No extractor has been defined for this URL yet :)")
        exit()
    return list_pdf
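# Design note (a sketch, not part of the original source): since each supported
# homepage maps one-to-one to an extractor, the if/elif chain above can also be
# expressed as a dictionary dispatch, which makes adding a new professor a
# one-line change. The name extractpdf_prof_v2 is hypothetical.
EXTRACTORS = {
    "https://www.cse.iitb.ac.in/~pb/": extractpdf_pb,
    "https://www.cse.iitb.ac.in/~ganesh/": extractpdf_ganesh,
    "https://www.cse.iitb.ac.in/~aad/": extractpdf_ajit,
    "https://www.cse.iitb.ac.in/~damani/": extractpdf_om,
    "https://www.cse.iitb.ac.in/~mythili/": extractpdf_mythili,
    "https://www.cse.iitk.ac.in/users/rgurjar/": extractpdf_rohit,
    "https://www.cse.iitb.ac.in/~varsha/": extractpdf_varsha,
}

def extractpdf_prof_v2(soup, URL):
    # look up the extractor for this homepage; fail loudly for unknown URLs
    extractor = EXTRACTORS.get(URL)
    if extractor is None:
        raise ValueError("No extractor has been defined for " + URL)
    return extractor(soup, URL)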
## Documentation for the savepdf function.
# @param list_pdf The list of dictionaries storing each pdf name mapped to its URL.
# @brief This function stores all the pdfs on the system for further processing.
# @details It creates a directory named pdfstore where all the pdfs are stored as pdf0.pdf, pdf1.pdf and so on.
# @return It returns nothing; it displays the message "saved" when all pdfs have been downloaded.
def savepdf(list_pdf):
    x = len(list_pdf)
    os.mkdir("pdfstore")
    for i in range(0, x):
        file_url = list_pdf[i]['url']
        r = requests.get(file_url, stream=True)
        filename = "pdf" + str(i) + ".pdf"
        print(filename)
        path = "pdfstore/" + filename
        print(path)
        with open(path, "wb") as pdf:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    pdf.write(chunk)
    print("saved")
## Documentation for the research_papers function.
# @param request The HTTP request object, used to read the data posted on the page.
# @brief This function processes the URL and returns the processed words to be displayed in the form of a word cloud.
# @details On receiving the URL from the user, it calls the functions to extract the list of pdf links and downloads the pdfs using the savepdf function. Afterwards, it calls convert_pdf to convert each pdf to a string, stores it as a text file and counts the words; the whole procedure is repeated for all files. Then the top 50 occurring words are chosen to be displayed.
# @return response The response to the page after processing the request, in the form of a word cloud.
def research_papers(request):
    # if the form is submitted
    if request.method == 'POST':
        myForm = MyForm(request.POST)
        if myForm.is_valid():
            URL = myForm.cleaned_data['URL']
            req_home = requests.get(URL)
            soup = BeautifulSoup(req_home.content, 'html5lib')
            list_pdf = extractpdf_prof(soup, URL)
            pdf_name_list1 = []
            pdf_name_list = []
            pdf_url_list = []
            for i in list_pdf:
                pdf_name_list1.append(i['name'])
                pdf_url_list.append(i['url'])
            for i in pdf_name_list1:
                z = i.replace(",", " ")
                z2 = z.rstrip()
                pdf_name_list.append(z2)
            savepdf(list_pdf)  # to store the pdfs
            pdflist = glob.glob("pdfstore/*.pdf")
            print(pdflist)
            wordcount = {}
            filemap = {}
            for pdf in pdflist:
                print("Working on: " + pdf + '\n')
                # recover the numeric index from a path such as "pdfstore/pdf12.pdf"
                xlist = list(pdf)
                ind = xlist[::-1].index("/")
                ind = len(xlist) - ind
                pdfname = xlist[ind + 3:-4]
                pdfname = "".join(pdfname)
                pdfname = int(pdfname)
                fout = open('pdfs.txt', 'w+', encoding="utf-8")
                # skip unreadable pdfs; otherwise dump the extracted text to pdfs.txt
                try:
                    PyPDF2.PdfFileReader(open(pdf, 'rb'))
                except PyPDF2.utils.PdfReadError:
                    continue
                else:
                    fout.write(convert_pdf(pdf))
                fout.close()
                fin = open('pdfs.txt', 'r', encoding="utf-8")
                words = fin.read().lower()
                # split() returns a list of all the words in the string
                split_it = words.split()
                stopwords = set(line.strip() for line in open('stopwords.txt', encoding="utf-8"))
                # For every word in the file, add it to the dictionary if it doesn't
                # exist; if it does, increase the count. Punctuation is stripped first,
                # and singular/plural variants of an already-seen word share one count.
                for word in split_it:
                    for ch in '.,:"!“‘*()\'-':
                        word = word.replace(ch, "")
                    if word not in stopwords:
                        if word in wordcount:
                            wordcount[word] += 1
                            if pdfname not in filemap[word]:
                                filemap[word].append(pdfname)
                        elif singularize(word) in wordcount:
                            wordcount[singularize(word)] += 1
                            if pdfname not in filemap[singularize(word)]:
                                filemap[singularize(word)].append(pdfname)
                        elif pluralize(word) in wordcount:
                            wordcount[pluralize(word)] += 1
                            if pdfname not in filemap[pluralize(word)]:
                                filemap[pluralize(word)].append(pdfname)
                        else:
                            wordcount[word] = 1
                            filemap[word] = [pdfname]
                print("done")
            n_print = 50
            print("\nOK. The {} most common words are as follows\n".format(n_print))
            cnt = Counter(wordcount)
            # most_common() produces the n_print most frequently encountered
            # words and their respective counts.
            most_occur = cnt.most_common(n_print)
            lst = list()
            for i in enumerate(most_occur):
                tmp2 = list()
                tmp2.append(i[1][0])
                tmp2.append(i[1][1])
                lst.append(tmp2)
            np.savetxt("counts.csv", lst, delimiter=',', fmt='%s', encoding="utf-8")
            most_occur_file_map = {}
            most_occur_file_map_len = {}
            most_occur_dict = {}
            for i in enumerate(most_occur):
                most_occur_dict[i[1][0]] = i[1][1]
                most_occur_file_map[i[1][0]] = filemap[i[1][0]]
                most_occur_file_map_len[i[1][0]] = len(filemap[i[1][0]])
            word_list_display = []
            word_count_display = []
            for i in most_occur_dict:
                word_list_display.append(i)
                word_count_display.append(most_occur_dict[i])
            # wc()  # uncomment (together with the wc() body) to also save an image word cloud
            x = len(word_list_display)
            y = len(pdf_name_list)
            context = {'URL': URL, 'word_list_display': word_list_display,
                       'word_count_display': word_count_display, 'x': x,
                       'most_occur_file_map': most_occur_file_map,
                       'pdf_name_list': pdf_name_list,
                       'pdf_url_list': pdf_url_list,
                       'most_occur_file_map_len': most_occur_file_map_len,
                       'y': y}
            template = loader.get_template('thankyou.html')
            return HttpResponse(template.render(context, request))
    else:
        form = MyForm()
        return render(request, 'research_papers.html', {'form': form})
a
able
about
above
abst
accordance
according
accordingly
across
act
actually
added
adj
affected
affecting
affects
after
afterwards
again
against
ah
all
almost
alone
along
already
also
although
always
am
among
amongst
an
and
announce
another
any
anybody
anyhow
anymore
anyone
anything
anyway
anyways
anywhere
apparently
approximately
are
aren
arent
arise
around
as
aside
ask
asking
at
auth
available
away
awfully
b
back
be
became
because
become
becomes
becoming
been
before
beforehand
begin
beginning
beginnings
begins
behind
being
believe
below
beside
besides
between
beyond
biol
both
brief
briefly
but
by
c
ca
came
can
cannot
can't
cause
causes
certain
certainly
co
com
come
comes
contain
containing
contains
could
couldnt
d
date
did
didn't
different
do
does
doesn't
doing
done
don't
down
downwards
due
during
e
each
ed
edu
effect
eg
eight
eighty
either
else
elsewhere
end
ending
enough
especially
et
et-al
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
except
f
far
few
ff
fifth
first
five
fix
followed
following
follows
for
former
formerly
forth
found
four
from
further
furthermore
g
gave
get
gets
getting
give
given
gives
giving
go
goes
gone
got
gotten
h
had
happens
hardly
has
hasn't
have
haven't
having
he
hed
hence
her
here
hereafter
hereby
herein
heres
hereupon
hers
herself
hes
hi
hid
him
himself
his
hither
home
how
howbeit
however
hundred
i
id
ie
if
i'll
im
immediate
immediately
importance
important
in
inc
indeed
index
information
instead
into
invention
inward
is
isn't
it
itd
it'll
its
itself
i've
j
just
k
keep
keeps
kept
kg
km
know
known
knows
l
largely
last
lately
later
latter
latterly
least
less
lest
let
lets
like
liked
likely
line
little
'll
look
looking
looks
ltd
m
made
mainly
make
makes
many
may
maybe
me
mean
means
meantime
meanwhile
merely
mg
might
million
miss
ml
more
moreover
most
mostly
mr
mrs
much
mug
must
my
myself
n
na
name
namely
nay
nd
near
nearly
necessarily
necessary
need
needs
neither
never
nevertheless
new
next
nine
ninety
no
nobody
non
none
nonetheless
noone
nor
normally
nos
not
noted
nothing
now
nowhere
o
obtain
obtained
obviously
of
off
often
oh
ok
okay
old
omitted
on
once
one
ones
only
onto
or
ord
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
owing
own
p
page
pages
part
particular
particularly
past
per
perhaps
placed
please
plus
poorly
possible
possibly
potentially
pp
predominantly
present
previously
primarily
probably
promptly
proud
provides
put
pushpak
bhattacharya
q
que
quickly
quite
qv
r
ran
rather
rd
re
readily
really
recent
recently
ref
refs
regarding
regardless
regards
related
relatively
research
respectively
resulted
resulting
results
right
run
s
said
same
saw
say
saying
says
sec
section
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sent
sentences
sentence
in-
features
tion
seven
several
shall
she
shed
she'll
shes
should
shouldn't
show
showed
shown
showns
shows
significant
significantly
similar
similarly
since
six
slightly
so
some
somebody
somehow
someone
somethan
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specifically
specified
specify
specifying
still
stop
strongly
sub
substantially
successfully
such
sufficiently
suggest
sup
sure
t
take
taken
taking
based
tell
tends
total
high
low
th
than
thank
thanks
thanx
that
that'll
thats
that've
the
their
theirs
them
themselves
then
thence
there
thereafter
thereby
thered
therefore
therein
there'll
thereof
therere
theres
thereto
thereupon
there've
these
they
theyd
they'll
theyre
they've
think
this
those
thou
though
thoughh
thousand
throug
through
throughout
thru
thus
til
tip
to
together
too
took
toward
towards
tried
tries
truly
try
trying
ts
twice
two
u
un
under
unfortunately
unless
unlike
unlikely
until
unto
up
upon
ups
us
use
used
useful
usefully
usefulness
uses
using
usually
v
value
various
've
very
via
viz
vol
vols
vs
w
want
wants
was
wasnt
way
we
wed
welcome
we'll
went
were
werent
we've
what
whatever
what'll
whats
when
whence
whenever
where
whereafter
whereas
whereby
wherein
wheres
whereupon
wherever
whether
which
while
whim
whither
who
whod
whoever
whole
who'll
whom
whomever
whos
whose
why
widely
willing
wish
with
within
without
wont
words
world
would
wouldnt
www
x
y
yes
yet
you
youd
you'll
your
youre
yours
yourself
yourselves
you've
z
zero
&
1
2
3
4
5
6
7
8
9
0
proposed
/
*
-
+
.
!
@
#
$
%
^
(
)
{
}
[
]
\
'
"
;
:
<
>
?
_
=
'cos
'd
'm
're
ago
ahead
aren't
backward
backwards
beneath
couldn't
despite
forward
hadn't
however
I
inside
inspite
mayn't
mightn't
mine
mustn't
needn't
oughtn't
seldom
shan't
till
usedn't
usen't
wasn't
well
weren't
will
won't
wouldn't
's
't
n't
corporation
corp
corp.
llc
inc.
ltd.
llp
llp.
plc
plc.
!!
?!
??
!?
`
``
''
-lrb-
-rrb-
-lsb-
-rsb-
,
..
...
address
advance
advanced
al
application
approach
area
aspect
book
cant
century
circa
collected
collection
compared
concept
condition
conference
consider
considered
consist
consisted
contemporary
contribution
course
criticism
critique
currently
das
de
demonstrate
demonstrated
der
detail
development
die
difficult
discuss
discussed
early
easily
easy
edited
edition
editor
een
el
elementary
en
essay
essential
evaluation
example
exercise
finally
foundation
global
good
guide
handbook
happen
happened
held
het
illustrated
impact
implication
including
influence
intermediate
international
introduce
introduction
introductory
investigating
investigation
involving
issue
iv
ix
journal
kind
kingdom
la
le
lecture
lecture notes
library
lo
main
making
manual
method
modern
monograph
num
number
organisation
original
outline
outlined
overview
pamphlet
paper
paperback
party
perspective
possibility
practical
practice
presented
previous
principle
problem
proceeding
process
project
publication
published
publisher
reader
reading
reference
relating
report
respect
review
role
selected
seminar
series
service
short
simple
source
sourcebook
special
state of the art
studies
study
subject
suggested
suitable
supplement
survey
system
technique
text
theme
theories
theory
today
tomorrow
topic
und
understand
united
up to date
view
volume
workbook
year
documents
logs
2003
fig
called
state
based
failure
processing
2008
proof
corpus
work
sentence
dependent
performance
co-occurence
2010
character
relation
derications
entry
rules
node
table
features
theorem
function3size
cases
proof
>
ax
clearly
bound
claim
fact
define
coefficient
v1
programs
definition
gcid48
cid48
defined
cases
m2
0
22
basis
assume
exists
find
smaller
individual
version
or
idea
recall
project
nonzero
c1
x2
u2
v3
construction
base
blue
science
concentration
questions
categories
m2
To run the source:
1. Inside the source folder, go to the papers folder.
2. Open the terminal.
3. Type the command: python3 manage.py runserver
4. Now open a browser and go to localhost:8000.
5. Type the URL and the word cloud will be generated.
Example:
Open the terminal inside the papers folder:
python3 manage.py runserver
Open a new window in a browser and visit localhost on port 8000.
On the page, type a URL in the given input box.
Ex: https://www.cse.iitb.ac.in/~varsha/
A word cloud will be shown on the page. Click on the desired word to display the list of PDFs. To read a PDF, click it.