<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>ECE Technical Reports</title>
<copyright>Copyright (c) 2013 Purdue University All rights reserved.</copyright>
<link>http://docs.lib.purdue.edu/ecetr</link>
<description>Recent documents in ECE Technical Reports</description>
<language>en-us</language>
<lastBuildDate>Fri, 26 Apr 2013 01:41:48 PDT</lastBuildDate>
<ttl>3600</ttl>


	
		
	







<item>
<title>moreBugs: A New Dataset for Benchmarking Algorithms for Information Retrieval from Software Repositories</title>
<link>http://docs.lib.purdue.edu/ecetr/447</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/447</guid>
<pubDate>Wed, 24 Apr 2013 06:49:55 PDT</pubDate>
<description>
	<![CDATA[
	<p>This report presents moreBugs, a new publicly available dataset derived from the AspectJ and JodaTime repositories for the benchmarking of algorithms for retrieval from software repositories. As a case in point, moreBugs contains all the information required to evaluate a search-based bug localization framework: it includes a set of closed/resolved bugs mined from the bug-tracking system and, for each bug, its patch-file list and the corresponding snapshot of the repository extracted from version history. moreBugs tracks commit-level changes made to a software repository along with its release information. Beyond bug localization, moreBugs should also prove useful for benchmarking algorithms for change detection, impact analysis, software evolution, vocabulary evolution, incremental learning, and so on.</p>

	]]>
</description>

<author>Shivani Rao et al.</author>


</item>






<item>
<title>An Effective Routability-driven Placer for Mixed-size Circuit Designs</title>
<link>http://docs.lib.purdue.edu/ecetr/446</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/446</guid>
<pubDate>Thu, 18 Apr 2013 05:19:28 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose a routability-driven analytical placer that aims at distributing pins evenly. This is accomplished by including a group of pin density constraints in its mathematical formulation. Moreover, for mixed-size circuits, we adopt a scaled smoothing method to cope with fixed macro blocks. As a result, we have fewer cells overlapping with fixed blocks after global placement, implying that the optimization of the global placement solution is more accurate and that the global placement solution more closely resembles a legal solution. Routing solutions obtained by a commercial router show that, for most benchmark circuits, better routing results can be achieved on the placement results generated by our pin-density-oriented placer.</p>

	]]>
</description>

<author>Shuai Li et al.</author>


</item>






<item>
<title>Summed Component Analysis for Dimensionality Reduction and Classification</title>
<link>http://docs.lib.purdue.edu/ecetr/445</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/445</guid>
<pubDate>Tue, 09 Apr 2013 07:34:56 PDT</pubDate>
<description>
	<![CDATA[
	<p>In the area of dimensionality reduction, principal component analysis (PCA) has been used with much success. Other dimensionality reduction techniques have been proposed, such as principal feature analysis (PFA), which was developed by Ira Cohen, Qi Tian et al. PFA uses k-means clustering with the principal components to determine principal features. We present a new approach to dimensionality reduction of features called Summed Component Analysis (SCA). SCA uses criteria similar to those of PFA and PCA to create a lower dimensional feature space. However, it is unique in the way the features in the new space are formed by summing selected features from the original space. The simplicity of the approach lends some advantages to analysis since the new features are simply sums of a selected number of the original features in the lower dimensional space. Furthermore, with SCA we are able to show improved classification performance over PCA, which is known to give impressive lower-dimensional representation of a dataset, but which doesn’t always translate to improvement in classification. SCA can prove useful when applied to high dimensional data sets to be classified, such as physical measurements that describe different scenarios, or in the area of financial data analysis in which different stocks are to be combined in a way that provides optimal information needed to classify stock market trends.</p>

	]]>
</description>

<author>Mopelola Sofolahan et al.</author>


</item>






<item>
<title>Congruence closure with ACI function symbols</title>
<link>http://docs.lib.purdue.edu/ecetr/444</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/444</guid>
<pubDate>Fri, 29 Mar 2013 06:00:21 PDT</pubDate>
<description>
	<![CDATA[
	<p>Congruence closure is the following well-known reasoning problem: given a premise set of equations between ground terms over uninterpreted function symbols, does a given query equation follow using the axioms of equality? Several methods have been provided for polynomial-time answers to this question. Here we consider this same setting, but where some of the function symbols are known to be associative, commutative, and idempotent (ACI). Given these additional axioms, does the query equation follow from the premise equations? We provide a sound and complete cubic-time procedure correctly answering such questions. The problem requires exponential space when adding only AC function symbols [14], but requiring idempotence restores tractability. Our procedure is defined by providing a sound and complete "local" rule set for the problem [9]. A "local formula" is a formula mentioning only terms appearing in the premises or query. A local rule set is one for which any derivable local formula has a derivation using only local intermediate formulas. Closures under local rule sets can immediately be constructed in polynomial time by refusing to infer non-local formulas. Finally, we present results on the integration of ACI function symbols and equality inference rules into more general local rule sets.</p>

	]]>
</description>

<author>Tanji Hu et al.</author>


</item>






<item>
<title>Balancing latency and availability in geo-distributed cloud data stores</title>
<link>http://docs.lib.purdue.edu/ecetr/443</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/443</guid>
<pubDate>Mon, 04 Mar 2013 13:15:35 PST</pubDate>
<description>
	<![CDATA[
	<p>Modern web applications face stringent requirements along many dimensions including latency, scalability, and availability. In response, several geo-distributed cloud storage systems have emerged in recent years. Customizing cloud data stores to meet application SLA requirements is a challenge given the scale of applications, and their diverse and dynamic workloads. In this paper, we tackle these challenges in the context of quorum-based systems (e.g., Amazon Dynamo, Cassandra), an important and widely used class of distributed cloud storage systems. We present models that seek to optimize percentiles of response time under normal operation and under a data-center (DC) failure. Our models consider a variety of factors such as the geographic spread of users, DC locations, relative priorities of read and write requests, application consistency requirements and inter-DC communication costs. We evaluate our models using real-world traces of three popular applications: Twitter, Wikipedia and Gowalla, and through experiments with a Cassandra cluster. Our results confirm the importance and effectiveness of our models, and offer important insights on the performance achievable with geo-distributed data stores.</p>

	]]>
</description>

<author>Shankaranarayanan P N et al.</author>


</item>






<item>
<title>Formal Verification and Planning:  An Evaluation</title>
<link>http://docs.lib.purdue.edu/ecetr/442</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/442</guid>
<pubDate>Wed, 13 Feb 2013 05:34:54 PST</pubDate>
<description>
	<![CDATA[
	<p>We explore the implications of improvements in hardware and knowledge representation for the application of automated reasoning systems to planning. We apply a novel automated-reasoning system with no planning-specific features to questions about the PDDL planning domains Blocksworld and (no airports) Logistics. Our system, with no human interaction and considering no specific problem instances, is able to verify all the key state invariants for both domains. We propose organizing domain reasoning around (currently hand-written) recursive state-predicate-achievement macro-actions, such as a macro to achieve clear(b) in the Blocksworld. Leveraging (somewhat) limited human interaction, our system can completely characterize the effects of executing such recursive macros for each predicate in each domain. In addition, with substantial human interaction, our system can formally verify the solvability of arbitrary Blocksworld and Logistics problems, verifying a human-written generalized plan based on the macros. In each case, no specific problem instances are considered. We loosely meter and qualitatively characterize the human interaction required for the above verifications in order to stimulate research to reduce this benchmark until it is zero. We propose (and where possible estimate the benchmark improvement for) plausible future approaches to reducing interaction, including eliminating the need for hand-definitions of the recursive predicate achievement macros and generalized plans in these domains. Finally, we contrast our reasoning system favorably (for this task) with a widely used verification system, Coq.</p>

	]]>
</description>

<author>Rajesh Kalyanam et al.</author>


</item>






<item>
<title>Improving the Delay Performance of CSMA Algorithms:  A Virtual Multi-Channel Approach</title>
<link>http://docs.lib.purdue.edu/ecetr/441</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/441</guid>
<pubDate>Fri, 11 Jan 2013 11:00:22 PST</pubDate>
<description>
	<![CDATA[
	<p>CSMA algorithms have recently received a significant amount of interest in the literature for designing efficient wireless control algorithms. CSMA algorithms are attractive because they incur low computation complexity and communication overhead, and can be shown to achieve the optimal capacity under certain assumptions. However, it has also been observed that CSMA algorithms suffer from the starvation problem and incur large delay that may grow exponentially with the network size. In this paper, we propose a new algorithm, called Virtual-Multi-Channel (VMC-) CSMA, that can dramatically reduce delay without sacrificing the high capacity and low complexity of CSMA. To avoid the starvation problem, the key idea of VMC-CSMA is to use multiple virtual channels to emulate a multi-channel system and compute a good set of feasible schedules simultaneously (without constantly switching/re-computing schedules). Under the protocol interference model and a single-hop utility-maximization setting, our proposed VMC-CSMA algorithm can approach arbitrarily close to the optimal total system utility, with both the number of virtual channels and the computation complexity increasing logarithmically with the network size. The VMC-CSMA algorithm inherits the distributed nature of CSMA algorithms. Further, once our algorithm converges to the steady state, the expected packet delay for each link equals the inverse of its long-term average rate, and the distribution of its head-of-line (HOL) waiting time can also be asymptotically bounded. Our simulation results confirm that the proposed VMC-CSMA algorithm indeed achieves both high throughput and low delay. Further, it can quickly adapt to network traffic changes.</p>

	]]>
</description>

<author>Po-Kai Huang et al.</author>


</item>






<item>
<title>Gibbs-Sampling-based Optimization for the Deployment of Small Cells in 3G Heterogeneous Networks</title>
<link>http://docs.lib.purdue.edu/ecetr/440</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/440</guid>
<pubDate>Thu, 10 Jan 2013 06:14:45 PST</pubDate>
<description>
	<![CDATA[
	<p>The growing popularity of mobile data services has placed great demands on wireless cellular networks to support higher throughput. One way to meet the rapidly growing traffic demand is through heterogeneous network deployment, which uses a mixture of macro cells and small cells (i.e., micro-/pico-cells) to further enhance the spatial reuse and thus improve network throughput. In this paper, we propose a Gibbs-Sampling-based optimization method for finding the optimal deployment of a given number of small cells in 3G networks. The Gibbs Sampling method intelligently balances two potentially conflicting considerations: placing small-cell BSs close to hotspots and avoiding interference with the macro-cell BSs and other small-cell BSs. We show that it converges to the deployment decision with the maximum total system throughput with high probability. We also describe two low-complexity algorithms, the greedy EcNo and the greedy hotspot algorithms. Both algorithms are widely used in industry and can be used as the benchmark for comparing our Gibbs sampling-based (GSB) design. We have conducted extensive simulations based on real traffic traces from the 3G data network. Our numerical results show that the GSB placement outperforms the greedy solutions. The GSB approach produces 10% higher throughput and 30% higher off-loading factor than the greedy solutions. Since deploying small nodes could be expensive and each city may need a large number of small nodes, the proposed results represent significant cost savings when compared to the existing greedy solutions.</p>

	]]>
</description>

<author>Xiaohang Li et al.</author>


</item>






<item>
<title>Nonlinear Dynamic Field Embedding: On Hyperspectral Scene Visualization</title>
<link>http://docs.lib.purdue.edu/ecetr/439</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/439</guid>
<pubDate>Mon, 19 Nov 2012 07:15:31 PST</pubDate>
<description>
	<![CDATA[
	<p>In many areas of research, complex signals are commonly represented by high dimensional feature vectors. However, high dimensional vectors are difficult to analyze and interpret due to the curse of dimensionality. Ongoing research efforts simplify this problem by seeking data representations on a low dimensional space. Traditional dimensionality reduction methods spend little effort toward formulating unifying platforms and very few approaches provide intuitive insights that enable the design of local topology preserving algorithms. This study introduces a new localized bilateral kernel function for computing the high dimensional neighborhood graph of image data. The kernel function injects spatial sensitivity details that allow maintaining disjoint neighborhoods in the embedding space. Furthermore, the study exploits the force field interpretation from mechanics and devises a unifying nonlinear graph embedding framework. The generalized framework leads to novel unsupervised multidimensional artificial field embedding techniques that rely on the simple additive assumption of pair-dependent attraction and repulsion functions. The formulations capture long range and short range distance related effects often associated with living organisms and help to establish algorithmic properties that mimic mutual behavior for the purpose of dimensionality reduction. The main benefits of the proposed models include the ability to preserve the local topology of data and produce quality visualizations, i.e., maintaining disjoint meaningful neighborhoods. As part of the evaluation, visualization, gradient field trajectory, and semi-supervised classification experiments are conducted for image scenes acquired by multiple sensors at various spatial resolutions over different types of objects. The results demonstrate the superiority of the proposed embedding framework over various widely used methods.</p>

	]]>
</description>

<author>Dalton Lunga et al.</author>


</item>






<item>
<title>Modeling Complexity of Enterprise Routing Design</title>
<link>http://docs.lib.purdue.edu/ecetr/438</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/438</guid>
<pubDate>Wed, 31 Oct 2012 10:37:38 PDT</pubDate>
<description>
	<![CDATA[
	<p>Enterprise networks often have complex routing designs given the need to meet a wide set of resiliency, security and routing policies. In this paper, we take the position that minimizing design complexity must be an explicit objective of routing design. We take a first step to this end by presenting a systematic approach for modeling and reasoning about complexity in enterprise routing design. We make three contributions. First, we present a framework for precisely defining objectives of routing design, and for reasoning about how a combination of routing design primitives (e.g., routing instances, static routes, and route filters) will meet the objectives. Second, we show that it is feasible to quantitatively measure the complexity of a routing design by modeling individual routing design primitives, and leveraging configuration complexity metrics [5]. Our approach helps understand how individual design choices made by operators impact configuration complexity, and can enable quantifying design complexity in the absence of configuration files. Third, we validate our model and demonstrate its utility through a longitudinal analysis of the evolution of the routing design of a large campus network over the last three years. We show how our models can enable comparison of the complexity of multiple routing designs that meet the same objective, guide operators in making design choices that can lower complexity, and enable what-if analysis to assess the potential impact of a configuration change on routing design complexity.</p>

	]]>
</description>

<author>Xin Sun et al.</author>


</item>






<item>
<title>PUMA: Purdue MapReduce Benchmarks Suite</title>
<link>http://docs.lib.purdue.edu/ecetr/437</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/437</guid>
<pubDate>Tue, 30 Oct 2012 09:07:11 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Faraz Ahmad et al.</author>


</item>






<item>
<title>A Comprehensive Theoretical Study on the Rank of the Integral Operators for Large-Scale Electrodynamic Analysis</title>
<link>http://docs.lib.purdue.edu/ecetr/436</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/436</guid>
<pubDate>Mon, 15 Oct 2012 06:52:55 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose an analytical approach to study the rank of an integral operator, which is valid for an arbitrarily shaped object with an arbitrary electric size. With this analytical approach, we theoretically prove that for a prescribed error bound, the minimal rank of the interaction between two separated geometry blocks in an integral-equation operator, asymptotically, is a constant for 1-D distributions of source and observation points; grows very slowly with electric size as square root of the logarithm for 2-D distributions; and scales linearly with the electric size of the block diameter for 3-D problems. We thus prove the existence of an error-bounded low-rank representation of both surface- and volume-based integral operators for electromagnetic analysis, irrespective of electric size and object shape. Numerical experiments have validated the proposed analytical approach and its resultant findings on the rank of integral operators. This work provides a theoretical basis for employing and further developing low-rank matrix algebra for accelerating the computation of electrically large problems.</p>

	]]>
</description>

<author>Wenwen Chai et al.</author>


</item>






<item>
<title>Sparsifying Defaults:  Optimal Bailout Policies for Financial Networks in Distress</title>
<link>http://docs.lib.purdue.edu/ecetr/435</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/435</guid>
<pubDate>Tue, 18 Sep 2012 06:35:07 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Zhang Li et al.</author>


</item>






<item>
<title>Dark Silicon is Sub-Optimal and Avoidable</title>
<link>http://docs.lib.purdue.edu/ecetr/434</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/434</guid>
<pubDate>Tue, 28 Aug 2012 05:31:33 PDT</pubDate>
<description>
	<![CDATA[
	<p>Several recent papers argue that, due to the slowing down of Dennard’s scaling of the supply voltage, future multicore performance will be limited by dark silicon, where an increasing number of cores are kept powered down due to lack of power. Customizing the cores to improve power efficiency may incur increased effort for hardware design, verification and test, and degraded programmability. In this paper, we show that dark silicon is sub-optimal in performance and avoidable, and that a gentler, evolutionary path for multicores exists. We make the key observations that (1) previous papers examine voltage-frequency-scaled designs on the power-performance Pareto frontier whereas the frontier extends to a new region derived by frequency scaling alone where voltage-scaled designs are infeasible, and (2) because memory latency improves only slowly over generations, performance of future multicores’ workloads will be dominated by memory latency. Guided by these observations and a simple analytical model, we exploit (1) the sub-linear impact of clock speed on performance in the presence of memory latency, and (2) the super-linear impact of throughput on queuing delays. Accordingly, we propose an evolutionary path for multicores, called successive frequency unscaling (SFU). Compared to dark silicon, SFU keeps powered significantly more cores running at clock frequencies on the extended Pareto frontier that are successively lowered every generation to stay within the power budget. The higher active core count enables more memory-level parallelism, non-linearly offsetting the slower clock and resulting in more performance than that of dark silicon. For memory-intensive workloads, full SFU, where all the cores are powered up, performs 81% better than dark silicon at the 11 nm technology node.
For enterprise workloads where both throughput and response times are important, we employ controlled SFU (C-SFU), which moderately slows down the clock and powers many, but not all, cores to achieve 29% better throughput than dark silicon at the 11 nm technology node. The higher throughput non-linearly reduces queuing delays and thereby compensates for the slower clock, keeping C-SFU’s total response latency within +/- 10% of that of dark silicon.</p>

	]]>
</description>

<author>Hamza Bin Sohail et al.</author>


</item>






<item>
<title>A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay</title>
<link>http://docs.lib.purdue.edu/ecetr/433</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/433</guid>
<pubDate>Tue, 07 Aug 2012 04:49:37 PDT</pubDate>
<description>
	<![CDATA[
	<p>Quantifying the end-to-end delay performance in multihop wireless networks is a well-known challenging problem. In this paper, we propose a new joint congestion control and scheduling algorithm for multihop wireless networks with fixed-route flows operated under a general interference model with interference degree K. Our proposed algorithm not only achieves a provable throughput guarantee (which is close to at least 1/K of the system capacity region), but also leads to explicit upper bounds on the end-to-end delay of every flow. Our end-to-end delay- and throughput-bounds are in simple and closed forms, and they explicitly quantify the tradeoff between throughput and delay of every flow. Further, the per-flow end-to-end delay bound increases linearly with the number of hops that the flow passes through, which is order-optimal with respect to the number of hops. Unlike traditional solutions based on the backpressure algorithm, our proposed algorithm combines window-based flow control with a new rate-based distributed scheduling algorithm. A key contribution of our work is to use a novel stochastic dominance approach to bound the corresponding per-flow throughput and delay, which otherwise are often intractable in these types of systems. Our proposed algorithm is fully distributed and requires a low per-node complexity that does not increase with the network size. Hence, it can be easily implemented in practice.</p>

	]]>
</description>

<author>Po-Kai Huang et al.</author>


</item>






<item>
<title>Inter-Session Network Coding Schemes for Two Unicast Sessions with Sequential Hard Deadline Constraints</title>
<link>http://docs.lib.purdue.edu/ecetr/432</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/432</guid>
<pubDate>Mon, 23 Jul 2012 05:37:52 PDT</pubDate>
<description>
	<![CDATA[
	<p>The emerging wireless media delivery services have placed greater demands on wireless networks to support high-throughput applications while minimizing the delay of individual packets. In this paper, we investigate using inter-session network coding to send packets wirelessly for two deadline-constrained unicast sessions. Specifically, each unicast session aims to transmit a stored video file, whose packets have hard sequential deadline constraints. We first characterize the corresponding deadline-constrained capacity region under heterogeneous channel conditions and heterogeneous deadline constraints. We show that this deadline-constrained capacity region can be achieved asymptotically by modifying the existing generation-based schemes. Despite its asymptotic optimality, the generation-based scheme has poor performance and high complexity in the practical regime of small and medium file sizes. To address these problems, we further develop new immediately-decodable network coding (IDNC) schemes that admit superior performance in the practical regime while being provably optimal in the asymptotic regime. In contrast to the existing delay/deadline-based IDNC results, which focus on a single multicast session (intra-session network coding) with homogeneous channel conditions, our new IDNC design takes full account of channel heterogeneity and provides the first rigorous asymptotic optimality analysis for two unicasts with (potentially heterogeneous) hard deadline constraints.</p>

	]]>
</description>

<author>Xiaohang Li et al.</author>


</item>






<item>
<title>Solution to the Electric Field Integral Equation at Arbitrarily Low Frequencies</title>
<link>http://docs.lib.purdue.edu/ecetr/431</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/431</guid>
<pubDate>Thu, 03 May 2012 11:29:14 PDT</pubDate>
<description>
	<![CDATA[
	<p>The low-frequency breakdown problem in the electric field integral equation (EFIE) has been well recognized and extensively studied. State-of-the-art methods for solving this problem either reformulate the integral equations or introduce a different set of basis functions. The solution to the original full-wave EFIE with the Rao-Wilton-Glisson (RWG) basis remains unknown at breakdown frequencies. The contribution of this work is the solution to the original RWG-basis-based EFIE at an arbitrarily low frequency including DC. This solution is obtained by deriving a closed-form expression of the inverse of the EFIE system matrix, which is rigorous from high down to any low frequency. We also overcome the low-frequency breakdown caused by the loss of the frequency dependence of the right hand side vector in scattering analysis and the same loss in Green’s function in RCS computation. In addition, we develop a fast solution that eliminates the low-frequency breakdown of the EFIE in a reduced system of O(1). Instead of introducing additional computational cost to fix the low-frequency breakdown problem, the proposed fast O(1) solution speeds up low-frequency computation. Numerical experiments in inductance, capacitance, and RCS extraction at very low frequencies including DC have demonstrated both accuracy and efficiency of the proposed method.</p>

	]]>
</description>

<author>Jianfang Zhu et al.</author>


</item>






<item>
<title>Automatically Enhancing Locality for Tree Traversals with Traversal Splicing</title>
<link>http://docs.lib.purdue.edu/ecetr/429</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/429</guid>
<pubDate>Wed, 15 Feb 2012 05:46:15 PST</pubDate>
<description>
	<![CDATA[
	<p>Generally applicable techniques for improving locality in irregular programs, which operate over pointer-based data structures such as trees and graphs, are scarce. Focusing on a subset of irregular programs, namely, tree traversal algorithms like Barnes-Hut and nearest neighbor, recent work has proposed point blocking, a technique analogous to loop tiling in regular algorithms, to improve locality. However, point blocking requires that programs be “pre-optimized” using application-specific techniques to be effective. In this work, we identify the root cause of point blocking’s poor performance on baseline irregular algorithms, and propose traversal splicing, a new, general, automatic locality optimization for irregular tree traversal codes, that addresses these drawbacks. For four benchmark algorithms, we show that traversal splicing can deliver substantial single-thread performance improvements of up to 338% (geometric mean: 138%) over baseline implementations, and up to 112% (geometric mean: 77%) over point-blocked implementations. Further, we show that in many cases, applying traversal splicing to a baseline implementation yields performance that is competitive with carefully hand-optimized implementations.</p>

	]]>
</description>

<author>Youngjoon Jo et al.</author>


</item>






<item>
<title>MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture</title>
<link>http://docs.lib.purdue.edu/ecetr/428</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/428</guid>
<pubDate>Wed, 01 Feb 2012 13:06:45 PST</pubDate>
<description>
	<![CDATA[
	<p>With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slower than DRAM (especially for writes) and can endure far fewer writes before wearing out. Previous work has proposed to use a large, DRAM-based hardware cache to absorb writes and provide faster access. However, due to ineffectual caching where blocks are evicted before a sufficient number of accesses, hardware caches incur significant overheads in energy and bandwidth, two key but scarce resources in modern multicores. Because using hardware for detecting and removing such ineffectual caching would incur additional hardware cost and complexity, we leverage the OS virtual memory support for this purpose. We propose a DRAM-PCM hybrid memory architecture where the OS migrates pages on demand from the PCM to DRAM. We call the DRAM part of our memory MigrantStore, which includes two ideas. First, to reduce the energy, bandwidth, and wear overhead of ineffectual migrations, we propose migration hysteresis. Second, to reduce the software overhead of good replacement policies, we propose the recently-accessed-page-id (RAPid) buffer, a hardware buffer to track the addresses of recently-accessed MigrantStore pages.</p>

	]]>
</description>

<author>Hamza Bin Sohail et al.</author>


</item>






<item>
<title>Distributed Online Channel Assignment Toward Optimal Monitoring in Multi-Channel Wireless Networks</title>
<link>http://docs.lib.purdue.edu/ecetr/427</link>
<guid isPermaLink="true">http://docs.lib.purdue.edu/ecetr/427</guid>
<pubDate>Thu, 12 Jan 2012 04:54:22 PST</pubDate>
<description>
	<![CDATA[
	<p>This paper studies an optimal channel assignment problem for passive monitoring in multi-channel wireless networks, where a set of sniffers capture and analyze the network traffic to monitor the network. The objective of this problem is to maximize the total amount of traffic captured by sniffers by judiciously assigning the radios of sniffers to a set of channels. This problem is NP-hard, with the computational complexity growing exponentially with the number of sniffers. We develop distributed online solutions to this problem for large-scale and dynamic networks. Prior works have attained a constant factor of 1 − 1/e of the maximum monitoring coverage in a centralized setting. Our algorithm preserves the same ratio while providing a distributed solution that is amenable to online implementation. Also, our algorithm is cost-effective, in terms of communication and computational overheads, due to the use of only local communication and the adaptation to incremental network changes. We present two operational modes of our algorithm for two types of networks that have different rates of network changes. One is a proactive mode for fast varying networks, while the other is a reactive mode for slowly varying networks. Simulation results demonstrate the effectiveness of the two modes of our algorithm.</p>

	]]>
</description>

<author>Dong-Hoon Shin et al.</author>


</item>





</channel>
</rss>
