Composite Behavioral Modeling for Identity Theft Detection in Online Social Networks

Posted by Admin: System Admin

Beginner
Abstract

In this work, we aim to build a bridge from coarse behavioral data to an effective, quick-response, and robust behavioral model for online identity theft detection. We concentrate on this issue in online social networks (OSNs), where users usually have composite behavioral records consisting of multidimensional low-quality data, e.g., offline check-ins and online user-generated content (UGC). As an insightful result, we validate that there is a complementary effect among different dimensions of records for modeling users’ behavioral patterns. To exploit this complementary effect in depth, we propose a joint (instead of fused) model to capture both online and offline features of a user’s composite behavior. We evaluate the proposed joint model by comparing it with typical models and their fused model on two real-world datasets: Foursquare and Yelp. The experimental results show that our model outperforms the existing ones, with area under the receiver operating characteristic curve (AUC) values of 0.956 on Foursquare and 0.947 on Yelp. In particular, the recall (true-positive rate) reaches 65.3% on Foursquare and 72.2% on Yelp with the corresponding disturbance rate (false-positive rate) below 1%. It is worth mentioning that this performance can be achieved by examining only one composite behavior, which guarantees the low response latency of our method. This study gives the cybersecurity community new insights into whether and how real-time online identity authentication can be improved by modeling users’ composite behavioral patterns.

Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights from the data used in this project. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are used in a wide variety of applications where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
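The two metrics quoted in the abstract, AUC and recall at a false-positive (disturbance) rate below 1%, can be computed from per-behavior anomaly scores roughly as follows. This is a minimal sketch using NumPy, not the evaluation code behind the reported numbers; the function names and the toy scores and labels are illustrative assumptions, and a higher score is assumed to mean "more anomalous".

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays swept over every distinct score threshold.

    labels: 1 = behavior of an identity thief (positive), 0 = normal behavior.
    Assumption: a higher score means a more anomalous behavior.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # most anomalous first
    scores, labels = scores[order], labels[order]
    tp = np.cumsum(labels)        # true positives flagged so far
    fp = np.cumsum(1 - labels)    # false positives flagged so far
    # Keep only the last index of each run of tied scores.
    last_of_tie = np.r_[np.diff(scores) != 0, True]
    tp, fp = tp[last_of_tie], fp[last_of_tie]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    tpr = np.r_[0.0, tp / n_pos]
    fpr = np.r_[0.0, fp / n_neg]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def recall_at_fpr(fpr, tpr, max_fpr=0.01):
    """Best recall reachable while keeping the false-positive rate <= max_fpr."""
    return float(tpr[fpr <= max_fpr].max())


if __name__ == "__main__":
    # Toy data, purely illustrative (not taken from Foursquare or Yelp).
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
    fpr, tpr = roc_points(toy_scores, toy_labels)
    print("AUC:", auc(fpr, tpr))
    print("Recall at FPR <= 1%:", recall_at_fpr(fpr, tpr))
```

Reporting recall at a fixed, very low false-positive rate matters here because each false alarm disturbs a legitimate user, so the threshold is chosen from the disturbance budget rather than from overall accuracy.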

Existing System & Flaws

Sitova et al. [53] introduced hand movement, orientation, and grasp (HMOG), a set of behavioral features to continuously authenticate smartphone users. Rajoub and Zwiggelaar [15] used thermal imaging to monitor thermal variations in the periorbital region and tested whether they offer a discriminative signature for detecting deception. However, these biometric technologies usually require expensive hardware, which makes them inconvenient and difficult to popularize. Abouelenien et al. [30] explored a multimodal deception detection approach that relied on a novel dataset of 149 multimodal recordings and integrated multiple physiological, linguistic, and thermal features. These works indicated that users’ behavior patterns can represent their identities.

Many studies have therefore turned to users’ behavior patterns for identification. Behavior-based methods emerged at the right moment and play important roles in a wide range of tasks, including preventing and detecting identity theft. Typically, behavior-based user identification includes two phases: user profiling and user identifying. User profiling is the process of characterizing a user with his/her historical behavioral data. Some works focus on statistical characteristics, such as the mean, variance, median, or frequency of a variable, to establish the user profile. Naini et al. [55] studied the task of identifying users by matching the histograms of their data in an anonymous dataset with the histograms from the original dataset. However, this approach relied mainly on expert experience, since different cases usually have different characteristics. Egele et al. [7] proposed a behavior-based method to identify compromises of individual high-profile accounts. However, it required high-profile accounts, which are difficult to obtain.

Other researchers discovered other features, such as tracing patterns and topic and spatial distributions, to describe user identity. Ruan et al. [32] conducted a study on online user behavior by collecting and analyzing user clickstreams of a well-known OSN. Lesaege et al. [31] developed a topic model extending LDA to identify active users. Viswanath et al. [56] presented a technique based on principal component analysis (PCA) that accurately modeled the “like” behavior of normal users on Facebook and identified significant deviations from it as anomalous behaviors. Zaeem et al. [33] proposed an approach that involved the novel collection of online news stories and reports on the topic of identity theft. Lichman and Smyth [48] proposed the MKDE model to accurately characterize and predict the spatial pattern of an individual’s events. Tsikerdekis and Zeadally [57] presented a detection method based on nonverbal behavior for identity deception, which can be applied to many types of social media. The methods above mainly concentrated on a single dimension of the composite behavior and seldom considered multidimensional behavior data.

Sekara et al. [58] explored the complex interaction between social and geospatial behavior and demonstrated that social behavior can be predicted with high precision, which indicated that composite behavior features can reveal one’s identity. Yin et al. [42] proposed a probabilistic generative model that combines spatiotemporal data and semantic information to predict users’ behavior. Nilizadeh et al. [49] presented POISED, a system that leverages the differences in propagation between benign and malicious messages on social networks to identify spam and other unwanted content. These studies implied that composite behavior features are possibly helpful for user identification.

Disadvantages
1) The LDA model performs poorly on both datasets, which may indicate that its performance is strongly sensitive to data quality.
2) The CF-KDE and LDA models perform worse on the Yelp dataset than on the Foursquare dataset, whereas the fused model [17] shows a surprising reversal.
3) The joint model based on the relative anomalous score Sr outperforms the model based on the logarithmic anomalous score Sl.
4) The joint model (i.e., JOINT-SR; in the rest of this description, “the joint model” refers to the joint model based on Sr) is indeed superior to the fused model.

Proposed System & Advantages

In this article, we propose an approach to detect identity theft by using multidimensional behavioral records that may be insufficient in each dimension. Given these characteristics, we choose the online social network (OSN) as a typical scenario where most users’ behaviors are coarsely recorded [39]. In the Internet era, users’ behaviors are composed of offline behaviors, online behaviors, social behaviors, and perceptual/cognitive behaviors. The behavioral data can be collected in many applications, such as offline check-ins in location-based services (LBSs), online tips-posting in instant messaging services, and social relationship-making in online social services. Accordingly, we design our method based on users’ composite behaviors across these categories.

In OSNs, user behavioral data that can be used for online identity theft detection are often too low-quality or too restricted to build qualified behavioral models, due to the difficulty of data collection, the requirement of user privacy, and the fact that some users have only a few behavioral records. We aim to show that a high-quality (effective, quick-response, and robust) behavioral model can be obtained by integrally using multidimensional behavioral data, even though the data are extremely insufficient in each dimension.

Advantages
1) We propose a joint model, CBM, to capture both online and offline features of a user’s composite behavior and fully exploit coarse behavioral data.
2) We devise a relative anomalous score Sr to measure the occurrence rate of each composite behavior, enabling real-time identity theft detection.
3) We perform experiments on two real-world datasets to demonstrate the effectiveness of CBM. The results show that our model outperforms the existing models and has low response latency.
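This description does not define the two anomalous scores, so the sketch below only illustrates one plausible reading and should not be taken as the CBM formulas. It assumes that some per-user behavioral model supplies the likelihood of a single composite behavior under each user, that the logarithmic score Sl is the negative log-likelihood under the account owner's model, and that the relative score Sr is one minus the owner's share of the prior-weighted likelihood across all users. The function names, the uniform prior, and the toy likelihood values are all hypothetical.

```python
import math


def logarithmic_score(likelihood_under_owner):
    """Assumed form of Sl: negative log-likelihood of the behavior
    under the account owner's model (higher = more anomalous)."""
    return -math.log(likelihood_under_owner)


def relative_score(owner, likelihoods, priors=None):
    """Assumed form of Sr: 1 minus the owner's share of the prior-weighted
    likelihood of the behavior across all users (higher = more anomalous).

    likelihoods: dict mapping user id -> likelihood of the observed
                 composite behavior under that user's model.
    priors:      optional dict mapping user id -> prior weight;
                 defaults to a uniform prior (an assumption).
    """
    if priors is None:
        priors = {user: 1.0 / len(likelihoods) for user in likelihoods}
    total = sum(likelihoods[user] * priors[user] for user in likelihoods)
    if total <= 0.0:
        raise ValueError("behavior has zero likelihood under every user model")
    return 1.0 - likelihoods[owner] * priors[owner] / total


if __name__ == "__main__":
    # Hypothetical likelihoods of one check-in-plus-tip behavior
    # under three users' models.
    toy_likelihoods = {"alice": 0.02, "bob": 0.06, "carol": 0.02}
    for account in toy_likelihoods:
        print(account,
              logarithmic_score(toy_likelihoods[account]),
              relative_score(account, toy_likelihoods))
```

Under these assumptions, a relative score normalizes away how rare a behavior is in general: a behavior that is unlikely for everyone yields a large logarithmic score for any account, while the relative score is large only when the behavior fits other users better than it fits the account owner. Either score needs just one observed behavior, which is in line with the low response latency claimed above.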

Software Requirements
  • Operating System : Windows 7 Ultimate.
  • Coding Language : Python.
  • Front-End : Python.
  • Back-End : Django-ORM.
  • Designing : HTML, CSS, JavaScript.
  • Database : MySQL (WAMP Server).
Hardware Requirements
  • H/W System Configuration:
  • Processor - Pentium IV
  • RAM - 4 GB (min)
  • Hard Disk - 20 GB
  • Keyboard - Standard Windows Keyboard
  • Mouse - Two or Three Button Mouse
  • Monitor - SVGA
