<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Cyber Center Publications</title>
<copyright>Copyright (c) 2013 Purdue University All rights reserved.</copyright>
<link>http://docs.lib.purdue.edu/ccpubs</link>
<description>Recent documents in Cyber Center Publications</description>
<language>en-us</language>
<lastBuildDate>Sun, 12 May 2013 01:34:22 PDT</lastBuildDate>
<ttl>3600</ttl>
<item>
<title>OpenMPC: extended OpenMP for efficient programming and tuning on GPUs</title>
<link>http://docs.lib.purdue.edu/ccpubs/524</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/524</guid>
<pubDate>Fri, 10 May 2013 11:38:41 PDT</pubDate>
<description>
	<![CDATA[
	<p>General-purpose graphics processing units (GPGPUs) provide inexpensive, high performance platforms for compute-intensive applications. However, their programming complexity poses a significant challenge to developers. Even though the compute unified device architecture (CUDA) programming model offers better abstraction, developing efficient GPGPU code is still complex and error-prone. This paper proposes a directive-based, high-level programming model, called OpenMPC, which addresses both programmability and tunability issues on GPGPUs. We have developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC. In addition to a range of compiler transformations and optimisations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. Evaluation using 14 applications shows that our system achieves 75% of the performance of the hand-coded CUDA programmes (92% if excluding one exceptional case).</p>

	]]>
</description>

<author>Seyong Lee et al.</author>


</item>




<item>
<title>Mining contrastive opinions on political texts using cross-perspective topic model</title>
<link>http://docs.lib.purdue.edu/ccpubs/523</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/523</guid>
<pubDate>Thu, 09 May 2013 12:36:29 PDT</pubDate>
<description>
	<![CDATA[
	<p>This paper presents a novel opinion mining research problem, which is called Contrastive Opinion Modeling (COM). Given any query topic and a set of text collections from multiple perspectives, the task of COM is to present the opinions of the individual perspectives on the topic, and furthermore to quantify their difference. This general problem subsumes many interesting applications, including opinion summarization and forecasting, government intelligence and cross-cultural studies. We propose a novel unsupervised topic model for contrastive opinion modeling. It simulates the generative process of how opinion words occur in the documents of different collections. The ad hoc opinion search process can be efficiently accomplished based on the learned parameters in the model. The difference of perspectives can be quantified in a principled way by the Jensen-Shannon divergence among the individual topic-opinion distributions. An extensive set of experiments has been conducted to evaluate the proposed model on two datasets in the political domain: 1) statement records of U.S. senators; 2) world news reports from three representative media in the U.S., China and India, respectively. The experimental results with both qualitative and quantitative analysis have shown the effectiveness of the proposed model.</p>

	]]>
</description>

<author>Yi Fang et al.</author>


</item>




<item>
<title>Emotion tagging for comments of online news by meta classification with heterogeneous information sources</title>
<link>http://docs.lib.purdue.edu/ccpubs/522</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/522</guid>
<pubDate>Thu, 09 May 2013 12:36:28 PDT</pubDate>
<description>
	<![CDATA[
	<p>With the rapid growth of online news services, users can actively respond to online news by making comments. Users often express subjective emotions in comments such as sadness, surprise and anger. Such emotions can help understand the preferences and perspectives of individual users, and therefore may facilitate online publishers to provide users with more relevant services. This paper tackles the task of predicting emotions for the comments of online news. To the best of our knowledge, this is the first research work for addressing the task. In particular, this paper proposes a novel Meta classification approach that exploits heterogeneous information sources such as the content of the comments and the emotion tags of news articles generated by users. The experiments on two datasets from online news services demonstrate the effectiveness of the proposed approach.</p>

	]]>
</description>

<author>Ying Zhang et al.</author>


</item>




<item>
<title>Mixture model with multiple centralized retrieval algorithms for result merging in federated search</title>
<link>http://docs.lib.purdue.edu/ccpubs/521</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/521</guid>
<pubDate>Thu, 09 May 2013 12:36:27 PDT</pubDate>
<description>
	<![CDATA[
	<p>Result merging is an important research problem in federated search for merging documents retrieved from multiple ranked lists of selected information sources into a single list. The state-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) try to map document scores retrieved from different sources to comparable scores according to a single centralized retrieval algorithm for ranking those documents. Both SSL and SAFE arbitrarily select a single centralized retrieval algorithm for generating comparable document scores, which is problematic in a heterogeneous federated search environment, since a single centralized algorithm is often suboptimal for different information sources. Based on this observation, this paper proposes a novel approach for result merging by utilizing multiple centralized retrieval algorithms. One simple approach is to learn a set of combination weights for multiple centralized retrieval algorithms (e.g., logistic regression) to compute comparable document scores. The paper shows that this simple approach generates suboptimal results as it is not flexible enough to deal with heterogeneous information sources. A mixture probabilistic model is thus proposed to learn more appropriate combination weights with respect to different types of information sources with some training data. An extensive set of experiments on three datasets demonstrates the effectiveness of the proposed new approach.</p>

	]]>
</description>

<author>Dzung Hong et al.</author>


</item>




<item>
<title>Effective query generation and postprocessing strategies for prior art patent search</title>
<link>http://docs.lib.purdue.edu/ccpubs/520</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/520</guid>
<pubDate>Thu, 09 May 2013 12:36:25 PDT</pubDate>
<description>
	<![CDATA[
	<p>Rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)idf values helps form a representative search query out of the query patent and is found to be more effective than is using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial to improve the effectiveness of the prior art search.</p>

	]]>
</description>

<author>Suleyman Cetintas et al.</author>


</item>




<item>
<title>Sentiment detection with auxiliary data</title>
<link>http://docs.lib.purdue.edu/ccpubs/519</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/519</guid>
<pubDate>Thu, 09 May 2013 12:30:48 PDT</pubDate>
<description>
	<![CDATA[
	<p>As an important application in text mining and social media, sentiment detection has attracted growing research interest, due to the expanding volume of available online information such as microblogging messages and review comments. Many machine learning methods have been proposed for sentiment detection. As a branch of machine learning, transfer learning is an important technique that tries to transfer knowledge from one domain to another. When applied to sentiment detection, existing transfer learning methods employ articles with human-labeled sentiments from other domains to help the sentiment detection on a target domain. Although most existing transfer learning methods are devoted to handling the data distribution difference between different domains, they only resort to some approximation methods, which may introduce some unnecessary biases. Furthermore, the popular assumption of existing transfer learning techniques on conditional probability is often too strong for practical applications. In this paper, we propose a novel method to model the distribution difference between different domains in sentiment detection by directly modeling the underlying joint distributions for different domains. Some of the important properties of the proposed method, such as the convergence rate and time complexity, are analyzed. The experimental results on the product review dataset and the Twitter dataset demonstrate the advantages of the proposed method over the state-of-the-art methods.</p>

	]]>
</description>

<author>Dan Zhang et al.</author>


</item>




<item>
<title>Initial results of using an intelligent tutoring system with Alice</title>
<link>http://docs.lib.purdue.edu/ccpubs/518</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/518</guid>
<pubDate>Thu, 09 May 2013 12:30:47 PDT</pubDate>
<description>
	<![CDATA[
	<p>This paper describes the initial steps taken towards incorporating an intelligent tutoring system (ITS) into Alice. After initially describing an ITS, the paper focuses on the development of several tutorials for teaching specific introductory programming concepts that have been created using stencils. Initial results concerning usability and effectiveness of these stencil-based tutorials are provided.</p>

	]]>
</description>

<author>Stephen Cooper et al.</author>


</item>




<item>
<title>Expertise Retrieval</title>
<link>http://docs.lib.purdue.edu/ccpubs/517</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/517</guid>
<pubDate>Thu, 09 May 2013 12:21:01 PDT</pubDate>
<description>
	<![CDATA[
	<p>People have looked for experts since before the advent of computers. With advances in information retrieval technology, coupled with the large-scale availability of traces of knowledge-related activities, computer systems that can fully automate the process of locating expertise have become a reality. The past decade has witnessed tremendous interest and a wealth of results in expertise retrieval as an emerging subdiscipline in information retrieval. This survey highlights advances in models and algorithms relevant to this field. We draw connections among methods proposed in the literature and summarize them in five groups of basic approaches. These serve as the building blocks for more advanced models that arise when we consider a range of content-based factors that may impact the strength of association between a topic and a person. We also discuss practical aspects of building an expert search system and present applications of the technology in other domains such as blog distillation and entity retrieval. The limitations of current approaches are also pointed out. We end our survey with a set of conjectures on what the future may hold for expertise retrieval research.</p>

	]]>
</description>

<author>Krisztian Balog et al.</author>


</item>




<item>
<title>A Discriminative Data-Dependent Mixture-Model Approach for Multiple Instance Learning in Image Classification</title>
<link>http://docs.lib.purdue.edu/ccpubs/516</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/516</guid>
<pubDate>Thu, 09 May 2013 12:21:00 PDT</pubDate>
<description>
	<![CDATA[
	<p>Multiple Instance Learning (MIL) has been widely used in various applications including image classification. However, existing MIL methods do not explicitly address the multi-target problem where the distributions of positive instances are likely to be multi-modal. This strongly limits the performance of multiple instance learning in many real world applications. To address this problem, this paper proposes a novel discriminative data-dependent mixture-model approach for multiple instance learning (MM-MIL) in image classification. The new method explicitly handles the multi-target problem by introducing a data-dependent mixture model, which allows positive instances to come from different clusters in a flexible manner. Furthermore, the kernelized representation of the proposed model allows effective and efficient learning in high dimensional feature space. An extensive set of experimental results demonstrates that the proposed new MM-MIL approach substantially outperforms several state-of-the-art MIL algorithms on benchmark datasets.</p>

	]]>
</description>

<author>Qifan Wang et al.</author>


</item>




<item>
<title>Robust Nonnegative Matrix Factorization via $L_1$ Norm Regularization</title>
<link>http://docs.lib.purdue.edu/ccpubs/515</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/515</guid>
<pubDate>Thu, 09 May 2013 12:20:59 PDT</pubDate>
<description>
	<![CDATA[
	<p>Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as face recognition, motion segmentation, etc. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications data are often partially corrupted with large additive noise. When the positions of noise are known, some existing variants of NMF can be applied by treating these corrupted entries as missing values. However, the positions are often unknown in many real world applications, which prevents the usage of traditional NMF or other existing variants of NMF. This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF) algorithm that explicitly models the partial corruption as large additive noise without requiring the information of positions of noise. In practice, large additive noise can be used to model outliers. In particular, the proposed method jointly approximates the clean data matrix with the product of two nonnegative matrices and estimates the positions and values of outliers/noise. An efficient iterative optimization algorithm with a solid theoretical justification has been proposed to learn the desired matrix factorization. Experimental results demonstrate the advantages of the proposed algorithm.</p>

	]]>
</description>

<author>Bin Shen et al.</author>


</item>




<item>
<title>A Bayesian Approach toward Active Learning for Collaborative Filtering</title>
<link>http://docs.lib.purdue.edu/ccpubs/514</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/514</guid>
<pubDate>Thu, 09 May 2013 12:20:58 PDT</pubDate>
<description>
	<![CDATA[
	<p>Collaborative filtering is a useful technique for exploiting the preference patterns of a group of users to predict the utility of items for the active user. In general, the performance of collaborative filtering depends on the number of rated examples given by the active user. The more rated examples the active user gives, the more accurate the predicted ratings will be. Active learning provides an effective way to acquire the most informative rated examples from active users. Previous work on active learning for collaborative filtering only considers the expected loss function based on the estimated model, which can be misleading when the estimated model is inaccurate. This paper goes one step further by taking into account the posterior distribution of the estimated model, which results in a more robust active learning algorithm. Empirical studies with datasets of movie ratings show that when the number of ratings from the active user is restricted to be small, active learning methods based only on the estimated model do not perform well, while the active learning method using the model distribution achieves substantially better performance.</p>

	]]>
</description>

<author>Rong Jin et al.</author>


</item>




<item>
<title>A latent pairwise preference learning approach for recommendation from implicit feedback</title>
<link>http://docs.lib.purdue.edu/ccpubs/513</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/513</guid>
<pubDate>Thu, 09 May 2013 12:20:56 PDT</pubDate>
<description>
	<![CDATA[
	<p>Most of the current recommender systems heavily rely on explicit user feedback such as ratings on items to model users' interests. However, in many applications, it is very hard to collect the explicit feedback, while implicit feedback such as user clicks may be more available. Furthermore, it is often more suitable for many recommender systems to address a ranking problem than a rating predicting problem. This paper proposes a latent pairwise preference learning (LPPL) approach for recommendation with implicit feedback. LPPL directly models user preferences with respect to a set of items rather than the rating scores on individual items, which are modeled with a set of features by analyzing clickthrough data available in many real-world recommender systems. The LPPL approach models both the latent variables of group structure of users and the pairwise preferences simultaneously. We conduct experiments on the testbed from a real-world recommender system and demonstrate that the proposed approach can effectively improve the recommendation performance against several baseline algorithms.</p>

	]]>
</description>

<author>Yi Fang et al.</author>


</item>




<item>
<title>Efficient and Practical Approach for Private Record Linkage</title>
<link>http://docs.lib.purdue.edu/ccpubs/512</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/512</guid>
<pubDate>Thu, 09 May 2013 12:13:55 PDT</pubDate>
<description>
	<![CDATA[
	<p>Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how common their customer bases are so that they may better assess the benefits of the merger. Another example is a database of people who are forbidden from a certain activity by regulators, which may need to be compared to a list of people engaged in that activity. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data; they fear losing control over its subsequent dissemination and usage, or they want to ensure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of their execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching, by carrying out a very fast (but not accurate) matching between such pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair (without any expensive cryptographic operations such as modular exponentiations). Our experimental evaluation of our approach validates these claims.</p>

	]]>
</description>

<author>Mohamed Yakout et al.</author>


</item>




<item>
<title>Reliable, Flexible Cloud Computing, Storage, Backup, Disaster Recovery</title>
<link>http://docs.lib.purdue.edu/ccpubs/511</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/511</guid>
<pubDate>Thu, 09 May 2013 12:13:54 PDT</pubDate>
<description>
	<![CDATA[
	<p>The large-scale, dynamic, and heterogeneous nature of cloud computing poses numerous security challenges. But the cloud's main challenge is to provide a robust authorization mechanism that incorporates multitenancy and virtualization aspects of resources. The authors present a distributed architecture that incorporates principles from security management and software engineering and propose key requirements and a design model for the architecture.</p>

	]]>
</description>

<author>Abdulrahman Almutairi et al.</author>


</item>




<item>
<title>M3: Stream Processing on Main-Memory MapReduce</title>
<link>http://docs.lib.purdue.edu/ccpubs/510</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/510</guid>
<pubDate>Thu, 09 May 2013 12:13:53 PDT</pubDate>
<description>
	<![CDATA[
	<p>The continuous growth of social web applications along with the development of sensor capabilities in electronic devices is creating countless opportunities to analyze the enormous amounts of data that are continuously streaming from these applications and devices. To process large scale data on large scale computing clusters, MapReduce has been introduced as a framework for parallel computing. However, most of the current implementations of the MapReduce framework support only the execution of fixed-input jobs. Such a restriction makes these implementations inapplicable for most streaming applications, in which queries are continuous in nature, and input data streams are continuously received at high arrival rates. In this demonstration, we showcase M$^3$, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered. M$^3$ extends Hadoop, the open source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing. Moreover, M$^3$ supports continuous execution of the Map and Reduce phases where individual Mappers and Reducers never terminate.</p>

	]]>
</description>

<author>Ahmed Aly et al.</author>


</item>




<item>
<title>Spatial Queries with Two kNN Predicates</title>
<link>http://docs.lib.purdue.edu/ccpubs/509</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/509</guid>
<pubDate>Thu, 09 May 2013 12:13:52 PDT</pubDate>
<description>
	<![CDATA[
	<p>The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Amongst these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used predicates. Unlike related research, this paper goes beyond the optimization of queries with single kNN predicates, and shows how queries with two kNN predicates can be optimized. In particular, the paper addresses the optimization of queries with: (i) two kNN-select predicates, (ii) two kNN-join predicates, and (iii) one kNN-join predicate and one kNN-select predicate. For each type of query, conceptually correct query evaluation plans (QEPs) and new algorithms that optimize the query execution time are presented. Experimental results demonstrate that the proposed algorithms outperform the conceptually correct QEPs by orders of magnitude.</p>

	]]>
</description>

<author>Ahmed Aly et al.</author>


</item>




<item>
<title>Deep Web Query Interface Understanding and Integration</title>
<link>http://docs.lib.purdue.edu/ccpubs/508</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/508</guid>
<pubDate>Thu, 09 May 2013 12:13:51 PDT</pubDate>
<description>
	<![CDATA[
	<p>There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches.</p>
<p>This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high quality integrated query interfaces automatically. The following technical issues are discussed in detail in this book: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration.</p>

	]]>
</description>

<author>Eduard Dragut et al.</author>


</item>




<item>
<title>Ionomics Atlas: a tool to explore interconnected ionomic, genomic and environmental data</title>
<link>http://docs.lib.purdue.edu/ccpubs/507</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/507</guid>
<pubDate>Thu, 09 May 2013 12:13:50 PDT</pubDate>
<description>
	<![CDATA[
	<p>Ionomics Atlas facilitates access, analysis and interpretation of an existing large-scale heterogeneous dataset consisting of ionomic (elemental composition of an organism), genetic (heritable changes in the DNA of an organism) and geographic information (geographic location, altitude, climate, soil properties, etc). Ionomics Atlas allows connections to be made between the genetic regulation of the ionome of plant populations and their landscape distribution, allowing scientists to investigate the role of natural ionomic variation in adaptation of populations to varied environmental conditions in the landscape. The goal of the Ionomics Atlas is twofold: (1) to allow both novice and expert users to easily access and explore layers of interconnected ionomic, genomic and environmental data; and (2) to facilitate hypothesis generation and testing by providing direct querying and browsing of the data as well as different display modes of the results.</p>

	]]>
</description>

<author>Eduard Dragut et al.</author>


</item>




<item>
<title>Polarity Consistency Checking for Sentiment Dictionaries</title>
<link>http://docs.lib.purdue.edu/ccpubs/506</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/506</guid>
<pubDate>Thu, 09 May 2013 12:13:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. The dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.</p>

	]]>
</description>

<author>Eduard Dragut et al.</author>


</item>




<item>
<title>A hybrid approach of OpenMP for clusters</title>
<link>http://docs.lib.purdue.edu/ccpubs/505</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/505</guid>
<pubDate>Thu, 09 May 2013 12:13:48 PDT</pubDate>
<description>
	<![CDATA[
	<p>We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare them with the performance of the MPI, HPF and UPC versions of the benchmarks. The results show that our translated programs achieve 75% of the performance of the hand-coded MPI programs, on average.</p>

	]]>
</description>

<author>Okwan Kwon et al.</author>


</item>




<item>
<title>Topic 11: Multicore and Manycore Programming</title>
<link>http://docs.lib.purdue.edu/ccpubs/504</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/504</guid>
<pubDate>Thu, 09 May 2013 12:13:12 PDT</pubDate>
<description>
	<![CDATA[
	<p>Modern multicore and manycore systems enjoy the benefits of technology scaling and promise impressive performance. However, harvesting this potential is not straightforward. While multicore and manycore processors alleviate several problems that are related to single-core processors – known as memory-, power-, or instruction-level parallelism-wall – they raise the issue of programmability and programming effort. This topic focuses on novel solutions for multicore and manycore programmability and efficient programming in the context of general-purpose systems.</p>

	]]>
</description>

<author>Eduard Ayguade et al.</author>


</item>




<item>
<title>PostgreSQL anomalous query detector</title>
<link>http://docs.lib.purdue.edu/ccpubs/503</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/503</guid>
<pubDate>Thu, 09 May 2013 12:13:11 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose to demonstrate the design, implementation, and capabilities of an anomaly detection (AD) system integrated with a relational database management system (DBMS). Our AD system is trained by extracting relevant features from the parse-tree representation of the SQL commands, and then uses the DBMS roles as the classes for the Bayesian classifier. In the detection phase, the maximum apriori probability role is chosen by the classifier which, if not matching the role associated with the SQL command, raises an alarm. We have implemented such a system in the PostgreSQL DBMS, integrated with the statistics collection and the query processing mechanism of the DBMS. During the demonstration, our audience will be given the choice of training our system either by using synthetic role-based SQL query traces based on probability sampling, or by entering their own set of training queries. In the subsequent detection mode, the audience can test the detection capabilities of the system by submitting arbitrary SQL commands. We will also allow the audience to generate arbitrary workloads to measure the overhead of the training phase and the detection phase of our AD mechanism on the performance of the DBMS.</p>

	]]>
</description>

<author>Bilal Shebaro et al.</author>


</item>




<item>
<title>Efficient and accurate strategies for differentially-private sliding window queries</title>
<link>http://docs.lib.purdue.edu/ccpubs/502</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/502</guid>
<pubDate>Thu, 09 May 2013 12:13:10 PDT</pubDate>
<description>
	<![CDATA[
	<p>Regularly releasing the aggregate statistics about data streams in a privacy-preserving way not only serves valuable commercial and social purposes, but also protects the privacy of individuals. This problem has already been studied under differential privacy, but only for the case of a single continuous query that covers the entire time span, e.g., counting the number of tuples seen so far in the stream. However, most real-world applications are window-based, that is, they are interested in the statistical information about streaming data within a window, instead of the whole unbounded stream. Furthermore, a Data Stream Management System (DSMS) may need to answer numerous correlated aggregate queries simultaneously, rather than a single one. To cope with these requirements, we study how to release differentially private answers for a set of sliding window aggregate queries. We propose two solutions, each consisting of query sampling and composition. We first selectively sample a subset of representative sliding window queries from the set of all the submitted ones. The representative queries are answered by adding Laplace noise in a way that satisfies differential privacy. For each non-representative query, we compose its answer from the query results of those representatives. The experimental evaluation shows that our solutions are efficient and effective.</p>

	]]>
</description>

<author>Jianneng Cao et al.</author>


</item>




<item>
<title>Efficient privacy-aware record integration</title>
<link>http://docs.lib.purdue.edu/ccpubs/501</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/501</guid>
<pubDate>Thu, 09 May 2013 12:13:08 PDT</pubDate>
<description>
	<![CDATA[
	<p>The integration of information dispersed among multiple repositories is a crucial step for accurate data analysis in various domains. In support of this goal, it is critical to devise procedures for identifying similar records across distinct data sources. At the same time, to adhere to privacy regulations and policies, such procedures should protect the confidentiality of the individuals to whom the information corresponds. Various private record linkage (PRL) protocols have been proposed to achieve this goal, involving secure multi-party computation (SMC) and similarity-preserving data transformation techniques. SMC methods provide secure and accurate solutions to the PRL problem, but are prohibitively expensive in practice, mainly due to excessive computational requirements. Data transformation techniques offer more practical solutions, but incur the cost of information leakage and false matches.</p>
<p>In this paper, we introduce a novel model for practical PRL, which 1) affords controlled and limited information leakage, and 2) avoids false matches resulting from data transformation. Initially, we partition the data sources into blocks to eliminate comparisons for records that are unlikely to match. Then, to identify matches, we apply an efficient SMC technique between the candidate record pairs. To enable efficiency and privacy, our model leaks a controlled amount of obfuscated data prior to the secure computations. The applied obfuscation relies on differential privacy, which provides strong privacy guarantees against adversaries with arbitrary background knowledge. In addition, we illustrate the practical nature of our approach through an empirical analysis with data derived from public voter records.</p>

	]]>
</description>

<author>Mehmet Kuzu et al.</author>


</item>




<item>
<title>Efficient tree pattern queries on encrypted XML documents</title>
<link>http://docs.lib.purdue.edu/ccpubs/500</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/500</guid>
<pubDate>Thu, 09 May 2013 12:13:07 PDT</pubDate>
<description>
	<![CDATA[
	<p>Outsourcing XML documents is a challenging task, because the documents must be encrypted while still supporting efficient query processing. Past approaches to this topic either leak structural information or fail to support searching that has constraints on XML node content. In addition, they adopt a filtering-and-refining framework, which requires the users to prune false positives from the query results. To address these problems, we present a solution for efficient evaluation of tree pattern queries (TPQs) on encrypted XML documents. We create a domain hierarchy, such that each XML document can be embedded in it. By assigning each node in the hierarchy a position, we create for each document a vector, which encodes both the structural and textual information about the document. Similarly, a vector is also created for a TPQ. Then, the matching between a TPQ and a document is reduced to calculating the distance between their vectors. For the sake of privacy, such vectors are encrypted before being outsourced. To improve the matching efficiency, we use a k-d tree to partition the vectors into non-overlapping subsets, such that non-matchable documents are pruned as early as possible. The extensive evaluation shows that our solution is efficient and scalable to large datasets.</p>

	]]>
</description>

<author>Jianneng Cao et al.</author>


</item>




<item>
<title>Adaptive data protection in distributed systems</title>
<link>http://docs.lib.purdue.edu/ccpubs/499</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/499</guid>
<pubDate>Thu, 09 May 2013 12:13:06 PDT</pubDate>
<description>
	<![CDATA[
	<p><br />Security is an important barrier to wide adoption of distributed systems for sensitive data storage and management. In particular, one unsolved problem is to ensure that customers' data protection policies are honored, regardless of where the data is physically stored and how often it is accessed, modified, and duplicated. This issue calls for two requirements to be satisfied. First, data should be managed in accordance with both owners' preferences and the local regulations that may apply. Second, although multiple copies may exist, a consistent view across copies should be maintained. Toward addressing these issues, in this work we propose innovative policy enforcement techniques for adaptive sharing of users' outsourced data. We introduce the notion of autonomous self-controlling objects (SCO) that, by means of object-oriented programming techniques, encapsulate sensitive resources and assure their protection by means of adaptive security policies of various granularity and synchronization protocols. Through extensive evaluation, we show that our approach is effective and efficiently manages multiple data copies.</p>

	]]>
</description>

<author>Anna Squicciarini et al.</author>


</item>




<item>
<title>FENCE: continuous access control enforcement in dynamic data stream environments</title>
<link>http://docs.lib.purdue.edu/ccpubs/498</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/498</guid>
<pubDate>Thu, 09 May 2013 12:13:04 PDT</pubDate>
<description>
	<![CDATA[
	<p>In this paper, we address the problem of continuous access control enforcement in dynamic data stream environments, where both data and query security restrictions may potentially change in real-time. We present the FENCE framework, which effectively addresses this problem. The distinguishing characteristics of FENCE include: (1) the stream-centric approach to security, (2) the symmetric model for security settings of both continuous queries and streaming data, and (3) two alternative security-aware query processing approaches that can optimize query execution based on regular and security-related selectivities. In FENCE, both data and query security restrictions are modeled symmetrically in the form of security metadata, called "security punctuations", embedded inside data streams. We distinguish between two types of security punctuations, namely, the data security punctuations (or, for short, dsps), which represent the access control policies of the streaming data, and the query security punctuations (or, for short, qsps), which describe the access authorizations of the continuous queries. We also present our encoding method to support the XACML (eXtensible Access Control Markup Language) standard. We have implemented FENCE in a prototype DSMS and present our performance evaluation. The results of our experimental study show that FENCE's approach has low overhead and can give great performance benefits compared to the alternative security solutions for streaming environments.</p>

	]]>
</description>

<author>Rimma Nehme et al.</author>


</item>




<item>
<title>An efficient certificateless cryptography scheme without pairing</title>
<link>http://docs.lib.purdue.edu/ccpubs/497</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/497</guid>
<pubDate>Thu, 09 May 2013 12:13:03 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose a mediated certificateless encryption scheme without pairing operations. Mediated certificateless public key encryption (mCL-PKE) solves the key escrow problem in identity-based encryption and the certificate revocation problem in public key cryptography. However, existing mCL-PKE schemes are either inefficient because of the use of expensive pairing operations or vulnerable to partial decryption attacks. In order to address the performance and security issues, in this poster, we propose a novel mCL-PKE scheme. We implement our mCL-PKE scheme and a recent scheme, and evaluate the security and performance. Our results show that our algorithms are efficient and practical.</p>

	]]>
</description>

<author>Seung-Hyun Seo et al.</author>


</item>




<item>
<title>A file provenance system</title>
<link>http://docs.lib.purdue.edu/ccpubs/496</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/496</guid>
<pubDate>Thu, 09 May 2013 12:13:02 PDT</pubDate>
<description>
	<![CDATA[
	<p><br />A file provenance system supports the automatic collection and management of provenance, i.e., the complete processing history of a data object. File system level provenance provides functionality unavailable in the existing provenance systems. In this paper, we discuss the design objectives for a flexible and efficient file provenance system and then propose the design of such a system, called FiPS. We design FiPS as a thin stackable file system for capturing provenance in a portable manner. FiPS can capture provenance at various degrees of granularity, can transform provenance records into secure information, and can direct the resulting provenance data to various persistent storage systems.</p>

	]]>
</description>

<author>Salmin Sultana et al.</author>


</item>




<item>
<title>Collusion Detection in Online Rating Systems</title>
<link>http://docs.lib.purdue.edu/ccpubs/495</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/495</guid>
<pubDate>Thu, 09 May 2013 12:13:01 PDT</pubDate>
<description>
	<![CDATA[
	<p>Online rating systems are subject to unfair evaluations. Users may try to individually or collaboratively promote or demote a product. Collaborative unfair rating, i.e., collusion, is more damaging than individual unfair rating. Detecting massive collusive attacks as well as honest-looking intelligent attacks is still a real challenge for collusion detection systems. In this paper, we study the impact of collusion in online rating systems and assess their susceptibility to collusion attacks. The proposed model uses a frequent itemset mining technique to detect candidate collusion groups and sub-groups. Then, several indicators are used to identify collusion groups and to estimate how damaging such colluding groups might be. The model has been implemented and we present results of experimental evaluation of our methodology.</p>

	]]>
</description>

<author>Mohammad Allahbakhsh et al.</author>


</item>




<item>
<title>Single-Database Private Information Retrieval from Fully Homomorphic Encryption</title>
<link>http://docs.lib.purdue.edu/ccpubs/494</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/494</guid>
<pubDate>Thu, 09 May 2013 12:12:59 PDT</pubDate>
<description>
	<![CDATA[
	<p>Private Information Retrieval (PIR) allows a user to retrieve the $(i)$th bit of an $(n)$-bit database without revealing to the database server the value of $(i)$. In this paper, we present a PIR protocol with the communication complexity of $(O(\gamma \log n))$ bits, where $(\gamma)$ is the ciphertext size. Furthermore, we extend the PIR protocol to a private block retrieval (PBR) protocol, a natural and more practical extension of PIR in which the user retrieves a block of bits, instead of retrieving a single bit. Our protocols are built on the state-of-the-art fully homomorphic encryption (FHE) techniques and provide privacy for the user if the underlying FHE scheme is semantically secure. The total communication complexity of our PBR is $(O(\gamma \log m+\gamma n/m))$ bits, where $(m)$ is the number of blocks. The total computation complexity of our PBR is $(O(m\log m))$ modular multiplications plus $(O(n/2))$ modular additions. In terms of total protocol execution time, our PBR protocol is more efficient than existing PBR protocols, which usually require computing $(O(n/2))$ modular multiplications, when the size of a block in the database is large and a high-speed network is available.</p>

	]]>
</description>

<author>Xun Yi et al.</author>


</item>




<item>
<title>Multi-route query processing and optimization</title>
<link>http://docs.lib.purdue.edu/ccpubs/493</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/493</guid>
<pubDate>Thu, 09 May 2013 12:12:58 PDT</pubDate>
<description>
	<![CDATA[
	<p>A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions. Selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition, most stream query processing systems, given the volume of data, cannot precisely model the system state, much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present “Query Mesh” (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties. We use the terms “plans” and “routes” interchangeably in our work. A classifier model is induced and used to assign the best route to process incoming tuples based upon their data characteristics. We formulate the QM search space and analyze its complexity. Due to the substantial search space, we propose several cost-based query optimization heuristics designed to effectively find nearly optimal QMs. We propose the Self-Routing Fabric (SRF) infrastructure that supports query execution with multiple plans without physically constructing their topologies or using a central router like Eddy. We also consider how to support uncertain route specification and execution in QM, which can occur when imprecise statistics lead to more than one optimal route for a subset of data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.</p>

	]]>
</description>

<author>Rimma Nehme et al.</author>


</item>




<item>
<title>Quality Control in Crowdsourcing Systems: Issues and Directions</title>
<link>http://docs.lib.purdue.edu/ccpubs/492</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/492</guid>
<pubDate>Thu, 09 May 2013 12:12:57 PDT</pubDate>
<description>
	<![CDATA[
	<p>As a new distributed computing model, crowdsourcing lets people leverage the crowd's intelligence and wisdom toward solving problems. This article proposes a framework for characterizing various dimensions of quality control in crowdsourcing systems, a critical issue. The authors briefly review existing quality-control approaches, identify open issues, and look to future research directions. In the Web extra, the authors discuss both design-time and runtime approaches in more detail.</p>

	]]>
</description>

<author>Mohammad Allahbakhsh et al.</author>


</item>




<item>
<title>Collaboration in multicloud computing environments: Framework and security issues</title>
<link>http://docs.lib.purdue.edu/ccpubs/491</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/491</guid>
<pubDate>Thu, 09 May 2013 12:12:55 PDT</pubDate>
<description>
	<![CDATA[
	<p>A proposed proxy-based multicloud computing framework allows dynamic, on-the-fly collaborations and resource sharing among cloud-based services, addressing trust, policy, and privacy issues without preestablished collaboration agreements or standardized interfaces.</p>

	]]>
</description>

<author>M Singhal et al.</author>


</item>




<item>
<title>Continuous aggregate nearest neighbor queries</title>
<link>http://docs.lib.purdue.edu/ccpubs/490</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ccpubs/490</guid>
<pubDate>Thu, 09 May 2013 12:12:54 PDT</pubDate>
<description>
	<![CDATA[
	<p>This paper addresses the problem of continuous aggregate nearest-neighbor (CANN) queries for moving objects in spatio-temporal data stream management systems. A CANN query specifies a set of landmarks, an integer k, and an aggregate distance function f (e.g., min, max, or sum), where f computes the aggregate distance between a moving object and each of the landmarks. The answer to this continuous query is the set of k moving objects that have the smallest aggregate distance f. A CANN query may also be viewed as a combined set of nearest neighbor queries. We introduce several algorithms to continuously and incrementally answer CANN queries. Extensive experimentation shows that the proposed operators outperform the state-of-the-art algorithms by up to a factor of 3 and incur low memory overhead.</p>

	]]>
</description>

<author>Hicham G. Elmongui et al.</author>


</item>





</channel>
</rss>
