It's easy when you're doing things you love
I'm a Researcher | a Photographer | a Coder

Welcome, I am Debjyoti Paul (Deb) | দেবজ্যোতি পাল

A PhD student at the School of Computing, University of Utah, USA.
An alumnus of the Computer Science Department, IIT Kanpur, India.

Research Interests

Spatio-Temporal Data Analysis, Social Media Analysis, Healthcare Analytics, Representation Learning, Deep Learning, Machine Learning, Data Visualization

Office Address

Room 2780 WEB Building

72 S. Central Campus Drive

School of Computing, University of Utah

Salt Lake City, UT 84112



  •   University of Utah

    Doctor of Philosophy

    Computer Science

    GPA: 3.94/4.0

    Advisor: Prof. Feifei Li

  •   Teaching and Paper Reviews

    1. Advanced Database Systems (Fall 2017, 3 Lectures)
    2. Data Mining (Spring 2017, 1 Lecture)
    3. Natural Language Processing (Fall 2016, 1 Lecture)
    Reviewer: IEEE Transactions on Knowledge and Data Engineering (TKDE)

  •   Indian Institute of Technology Kanpur

    Master of Technology

    Computer Science & Engineering

    GPA: 8.67/10.0 (Rank: 3)

    Advisor: Late Prof. Sanjeev Kumar Aggarwal

  •   Institute of Engineering & Management

    Bachelor of Technology (Rank < 10)

    Computer Science & Engineering

    GPA: 8.93/10.0


  •   Alibaba Group

    Research Intern

    Mentors: Feifei Li, Tieying Zhang, Hong Wu

    Location: Sunnyvale, USA

  •   Facebook

    Summer Research Intern

    Pages search improvement with AI

    Mentor: Shawn Poindexter

    Location: Seattle, USA

  •   Amazon AI Lab

    Summer Research Intern

    Hyperparameter optimization in MXNet

    Mentors: Baris Coskun, Ramesh Nallapati

    Location: New York, USA

  •   University of Utah

    Research Assistant

    InitialDLab, Database and Data Analysis Group

    School of Computing

  •   Flipkart

    Software Developer (Data Engineer)

    Data Platform Team

    Built a scalable environment for big data processing.

    Contributed to building software for ingestion, transformation, and distribution of data.

    Location: Bangalore, India.


AI Pro: Data Processing Framework for AI Models
Richie Frost, Debjyoti Paul, Feifei Li.
35th IEEE International Conference on Data Engineering (ICDE 2019), 8-12 April, 2019. Macau, China. doi: 10.1109/ICDE.2019.00219

We present AI Pro, an open-source framework for data processing with Artificial Intelligence (AI) models. Our framework empowers its users with immense capability to transform raw data into meaningful information with a simple configuration file. AI Pro’s configuration file generates a data pipeline from start to finish with as many data transformations as desired. AI Pro supports major deep learning frameworks and Open Neural Network Exchange (ONNX), which allows users to choose models from any AI frameworks supported by ONNX. Its wide range of features and user friendly web interface grants everyone the opportunity to broaden their AI application horizons, irrespective of the user’s technical expertise. AI Pro has all the quintessential features to perform end-to-end data processing, which we demonstrate using two real world scenarios.

Bursty Event Detection Throughout Histories
Debjyoti Paul, Yanqing Peng, Feifei Li.
35th IEEE International Conference on Data Engineering (ICDE 2019), 8-12 April, 2019. Macau, China. doi: 10.1109/ICDE.2019.00124.

The widespread use of social media and the active trend of moving towards more web- and mobile-based reporting for traditional media outlets have created an avalanche of information streams. These information streams bring first-hand reporting on live events to massive crowds in real time as they are happening. It is important to study the phenomenon of burst in this context so that end-users can quickly identify important events that are emerging and developing in their early stages. In this paper, we investigate the problem of bursty event detection, where we define burst as the acceleration of the incoming rate of an event's mentions. Existing works focus on the detection of current trending events, but it is important to be able to go back in time and explore bursty events throughout history without having to store and traverse the entire past information stream. We present a succinct probabilistic data structure and its associated query strategy to find bursty events at any time instance for the entire history. Extensive empirical results on real event streams have demonstrated the effectiveness of our approach.

Geotagged US Tweets as Predictors of County-Level Health Outcomes, 2015–2016
Quynh C. Nguyen, Matt McCullough, Hsien-wen Meng, Debjyoti Paul, Dapeng Li, Suraj Kath, Geoffrey Loomis, Elaine O. Nsoesie, Ming Wen, Ken R. Smith, Feifei Li.
American Journal of Public Health, September, 2017, doi: 10.2105/AJPH.2017.303993

Scarcity of consistently constructed environmental characteristics limits understanding of the impact of contextual factors on health. Our aim was to leverage geotagged Twitter data to create national indicators of the social environment, with small-area indicators of prevalent sentiment and social modeling of health behaviors. We then test associations with county-level health outcomes, controlling for demographic characteristics.
We utilized Twitter's Streaming Application Programming Interface (API) to continuously collect a random 1% subset of publicly available geo-located tweets. Approximately 80 million geotagged tweets from 603,363 unique Twitter users were collected in a 12-month period (April 2015 to March 2016).
Across 3135 US counties, Twitter indicators of happiness, food, and physical activity were associated with lower premature mortality, obesity, and physical inactivity. Alcohol use tweets predicted higher alcohol-use related mortality.
Social media represents a new type of real-time data that may enable public health officials to examine movement of norms, sentiment, and behaviors that may portend emerging issues or outbreaks—thus providing a way to intervene to prevent adverse health events and measure the impact of health interventions.

Compass: Spatio Temporal Sentiment Analysis of US Election, What Twitter Says!
Debjyoti Paul, Feifei Li, Murali Krishna Teja, Yu Xin, Richie Frost
SIGKDD 2017, 23rd SIGKDD Conference on Knowledge Discovery and Data Mining, Aug 13-17, Halifax, Canada. doi: 10.1145/3097983.3098053

With the widespread growth of various social network tools and platforms, analyzing and understanding societal response and crowd reaction to important and emerging social issues and events through social media data is an increasingly important problem. However, there are numerous challenges towards realizing this goal effectively and efficiently, due to the unstructured and noisy nature of social media data. The large volume of the underlying data also presents a fundamental challenge. Furthermore, in many application scenarios, it is often interesting, and in some cases critical, to discover patterns and trends based on geographical and/or temporal partitions, and keep track of how they will change over time. This brings up the interesting problem of spatio-temporal sentiment analysis from large-scale social media data. This paper investigates this problem through a data science project called "US Election 2016, What Twitter Says". The objective is to discover sentiment on Twitter towards either the Democratic or the Republican party at US county and state levels over any arbitrary temporal intervals, using a large collection of geotagged tweets from a period of 6 months leading up to the US presidential election in 2016. Our results demonstrate that by integrating and developing a combination of machine learning and data management techniques, it is possible to do this at scale with effective outcomes. The results of our project have the potential to be adapted towards solving and influencing other interesting social issues such as building neighborhood happiness and health indicators.

Social media indicators of the food environment and state health outcomes
Quynh Nguyen, H. Meng, D. Li, Suraj Kath, Matt McCullough, Debjyoti Paul, P. Kanokvimankul, T. Nguyen, Feifei Li.
Public Health, American Public Health Association, 148, 120-128. doi: 10.1016/j.puhe.2017.03.013

Contextual factors can influence health through exposures to health-promoting and risk-inducing factors. The aim of this study was to (1) build, from geotagged Twitter and Yelp data, a national food environment database and (2) to test associations between state food environment indicators and health outcomes.
This is a cross-sectional study based upon secondary analyses of publicly available data.
Using Twitter's Streaming Application Programming Interface (API), we collected and processed 4,041,521 food-related, geotagged tweets between April 2015 and March 2016. Using Yelp's Search API, we collected data on 505,554 unique food-related businesses. In linear regression models, we examined associations between food environment characteristics and state-level health outcomes, controlling for state-level differences in age, percent non-Hispanic white, and median household income.
A one standard deviation increase in caloric density of food tweets was related to higher all-cause mortality (+46.50 per 100,000), diabetes (+0.75%), obesity (+1.78%), high cholesterol (+1.40%), and fair/poor self-rated health (2.01%). More burger Yelp listings were related to higher prevalence of diabetes (+0.55%), obesity (1.35%), and fair/poor self-rated health (1.12%). More alcohol tweets and Yelp bars and pub listings were related to higher state-level binge drinking and heavy drinking, but lower mortality and lower percent reporting fair/poor self-rated health. Supplemental analyses with county-level social media indicators and county health outcomes resulted in finding similar but slightly attenuated associations compared to those found at the state level.
Social media can be utilized to create indicators of the food environment that are associated with area-level mortality, health behaviors, and chronic conditions.

Multi-objective Evolution based Dynamic Job Scheduler in Grid
Debjyoti Paul, Sanjeev K. Aggarwal.
The 8th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2014), July 2nd – 4th, 2014, Birmingham, UK. doi:10.1109/CISIS.2014.50

Grid computing is a high performance computing environment to fulfill large-scale computational demands. It can integrate computational as well as storage resources from different networks and geographically dispersed organizations into a high performance computational and storage platform. It is used to solve complex computation-intensive problems, and also provides solutions to storage-intensive applications with connected storage resources. Scheduling user jobs properly on the heterogeneous resources is an important task in a grid computing environment. The main goal of scheduling is to maximize resource utilization, minimize waiting time of jobs, reduce energy consumption, and minimize cost to the user after satisfying the constraints of jobs and resources. We can trade off between the required level of quality of service, the deadline, and the budget of the user. In this paper, we propose a Multi-objective Evolution-based Dynamic Scheduler in Grid. Our scheduler uses a multi-objective optimization technique based on a genetic algorithm with a Pareto front approach to find efficient schedules. It explores the search space vividly to avoid stagnation and generate near-optimal solutions. We propose that our scheduler provides a better grip on most features of the grid from the perspective of the grid owner as well as the user. The dynamic grid environment has forced us to make it a real-time dynamic scheduler. A job grouping technique is proposed for grouping fine-grained jobs and for ease of computation. Experimentation on different data sets and on various parameters revealed the effectiveness of multi-objective scheduling criteria and extraction of performance from grid resources.

RCached-tree: An Index Structure for Efficiently Answering Popular Queries
Manash Pal, Arnab Bhattacharya, Debjyoti Paul
ACM International Conference on Information and Knowledge Management (CIKM 2013), Oct. 27–Nov. 1, 2013, San Francisco, CA, USA. doi:10.1145/2505515.2507817

In many applications of similarity searching in databases, a set of similar queries appear more frequently. Since it is rare that a query point with its associated parameters (range or number of nearest neighbors) will repeat exactly, intelligent caching mechanisms are required to efficiently answer such queries. In addition, the performance of non-repeating and non-cached queries should not suffer too much either. In this paper, we propose RCached-tree, belonging to the family of R-trees, that aims to solve this problem. In every internal node of the tree up to a certain level, a portion of the space is reserved for storing popular queries and their solutions. For a new query that is encompassed by a cached query, this enables bypassing the traversal of lower levels of the subtree corresponding to the node as the answers can be obtained directly from the result set of the cached query. The structure adapts itself to varying query patterns; new popular queries replace the old cached ones that are not popular any more. Queries that are not popular as well as insertions, deletions and updates are handled in the same manner as in a general R-tree. Experiments show that the RCached-tree can outperform R-tree and other such structures by a significant margin when the proportion of popular queries is 20% or more by reserving 30-40% of the internal nodes as cache.

Lightweight Security Enhancement Protocol for Radio Frequency Identification (RFID)
Debjyoti Paul, Sumana Basu, Sukanya Ghosh
Proceedings of International Conference on Scientific Paradigm Shift In Information Technology & Management (SPSITM 2011), January 2011, Kolkata, India.

Although RFID provides automatic object identification, it is vulnerable to various security threats that put consumer and organization privacy at stake. In this work, we consider some existing security protocols of the RFID system and analyze the possible security threats at each level. We modify those parts of the protocol that have security loopholes and propose a modified four-level security model that has the potential to provide fortification against security threats.

Multilevel Security Protocol using Radio Frequency Identification
Debjyoti Paul, Sumana Basu, Punit Beriwal
IEEE paper, International Conference on Emerging Trends in Mathematics and Computer Applications 2010, pp. 544-547, Sivakasi, Tamil Nadu.


Projects and Skills

Twitter Election 2016 Sentiment Analysis, What Twitter says!


MusicAtlas - Music Worldwide!

Online Topic Discovery via Online Clustering


Question Answering System

Metonym: Learn vocabulary with Wordweb interactively

Dart News: A street news browsing application with an interactive GIS interface


Choropleth : Indian States and Districts


IntelliAd: A Social Media driven Intelligent Ad-Targeting framework using Geo-profiling

Multi-objective Evolution based Dynamic Job Scheduler in Grid

RCached-tree: An Index Structure for Efficiently Answering Popular Queries


AirQuality @ Utah


STaCHIT : Smart TimeLine and Chit-Chat (Yahoo HackU Winner 2012)

Real time discrimination of Speech and Music

Cryptography-Diffie-Hellman Key Exchange Through Elliptic Curve Method


Languages: Python, Java, C++, C
Databases: MySQL, HP Vertica, PostgreSQL
Scripts: JavaScript, Shell Script, D3.js, Three.js, LaTeX
Software and tools: Vim, Oh My Zsh, IntelliJ IDEA, Eclipse, WebStorm
Tidbits: Apache Spark, Data Warehousing, Hadoop, Azkaban2 & Oozie (execution engines), Maven


The spatio-temporal sentiment analysis project estorm.org analyzed public sentiment on the US election and gained a lot of media coverage. (2016)
Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity. JMIR 2016. Received a lot of media attention.
Ranked 3rd out of 39 M.Tech students of the CSE department at Indian Institute of Technology Kanpur. (2011-2013)
Secured All India Rank 228 in GATE 2012 among 1.56 lakh participants of Computer Science & Information Technology department. (2011-2012)
Achieved All India Rank of 7 in Indian Space Research Organization (ISRO) recruitment exam. (2011-2012)
Secured All India Rank 223 in GATE 2011 among 1.36 lakh participants of Computer Science & Information Technology department. (2010-2011)
Among the top 10 students of the CSE department at Institute of Engineering & Management, Kolkata, and awarded academic excellence for performance in B.Tech. (2007-2011)
Acknowledged by the course professor as the Best Project for “Boosting performance of popular queries”, done as part of the CS618 (Indexing and Searching of Databases) course. (2011-2012)
Placed 2nd in the Project Fair at Bits to Bytes 2008 for Most Economic Autonomous Line Follower Robot, and 3rd at Dzyan IEM Techfest 2008 in the Roborally event.
Awarded 2nd prize for academic excellence in school in Class XII
Awarded 1st prize for academic excellence in school in Class XI
Awarded an academic excellence certificate at district level for performance in the Board Exam

Things that interest me

Timelapse Videos
Images: received 500+ likes

Reach Me