CSE Technical Reports Sorted by Technical Report Number

TR Number

Title

Authors

Date

Pages

CSE-TR-01-88

A Holographic File System for a Multiple with many Disk Nodes

Barak Galler and Farber

May, 88

CSE-TR-02-88

Range Estimation from Intensity Gradient Analysis

Skifstad and Jain

Mar, 88

CSE-TR-03-88

Polynomial Methods for Structure from Motion

Jerian and Jain

Mar, 88

CSE-TR-04-88

Structure from Motion - A Critical Analysis of Methods

Jerian and Jain

Mar, 88

CSE-TR-05-88

Using Dynamic Programming for Solving Variational Problems in Vision

Amini Weymouth and Jain

Mar, 88

CSE-TR-06-88

High performance Communications for Hypercube Multiprocessors

Buzzard

Mar, 88

175

CSE-TR-07-88

Building an Environment Model Using Depth Information

Roth Tabek and Jain

Sep, 88

CSE-TR-08-88

An Attribute-Based Service-Request Mechanism for Heterogeneous Distributed Systems

Ravishankar and Chang

Mar, 88

CSE-TR-09-88

Discontinuity-Perserving Surface Reconstruction in Vision Processing

Sinha and Schunck

Mar, 88

CSE-TR-10-88

A Formal Metalanguage for Nuprl

Knoblock

Mar, 88

CSE-TR-11-89

A Linear Approach to Updating in Logic-Based Models of Manufacturing Systems and Other Discrete-Event Systems

Naylor and Shadmehr

Jan, 89

CSE-TR-12-89

A Microprogrammable VLSI Routing Controller for HARTS

Dolter Ramanathan and Shin

Mar, 89

CSE-TR-13-89

Robust Surface Approximation Using Least Median Squares Regression

Tirumalai and Schunck

Mar, 89

CSE-TR-14-89

Linguistic Support for Dataflow

Ravishankar and Finkel

Mar, 89

CSE-TR-15-89

The Engineering of XCON

Bachant and Soloway

Mar, 89

CSE-TR-16-89

Architecture and Application of a Segmented Array Processor with Dynamically Reconfigurable Links

Chang and Shin

Mar, 89

CSE-TR-17-89

Optimal Rollback Recovery with Checkpointing and Damage Assessment for Concurrent Processes

Lin and Shin

Mar, 89

CSE-TR-18-89

Multi-Project Scheduling Using an Enumerative Method

Peng and Shin

Mar, 89

CSE-TR-19-89

Bus and Cache Memory Organizations for Multiprocessors

Winsor

Jun, 89

CSE-TR-20-89

Analysis of Minimal Path Routing Schemes in the Presence of Faults

Gordon

Mar, 89

CSE-TR-21-89

Graphs Embeddings and Their Applications

Sherlekar and JaJa

Mar, 89

CSE-TR-22-89

Balanced Graph Dissections and Layouts for Herarchical VLSI Layout Design

Sherlekar and JaJa

Mar, 89

CSE-TR-23-89

Automated Knowledge Acquisition for a Computer Hardware Synthesis System

Birmingham Siewiorek

Mar, 89

CSE-TR-24-89

The Complex Behavior of Simple Machines

Machlin and Stout

Jul, 89

CSE-TR-25-89

Design Implementation and Application of Synthetic Workload Generators for Real-Time Systems

Kiskis Woodbury and Shin

Mar, 89

CSE-TR-26-89

Distributed Genetic Algorithms for Function Optimization

Tanese

Mar, 89

168

CSE-TR-27-89

Computing Oriented Texture Fields

Rao and Schunck

Mar, 89

CSE-TR-28-89

Multi-Project Scheduling Using an Enumerative Method

Peng and Shin

Mar, 89

CSE-TR-29-89

A Parallel Technique for Signal-Level Perceptual Organization

Liou A.Chiu and Jain

Mar, 89

CSE-TR-30-89

A Hexagonal Array Machine for Multi-Layer Wire Routing

Venkateswaran and Mazumder

Mar, 89

CSE-TR-31-89

Analysis and Design of Latch-Controlled Synchronous Digital Circuits

Sakallah Mudge and Olukotun

Mar, 89

CSE-TR-32-89

Generating Aspect Graphs for Curved Objects

Sripradisvarakul and Jain

Mar, 89

CSE-TR-33-89

A New Performance Measure for Scheduling Independent Real-Time Tasks

Peng and Shin

Mar, 89

CSE-TR-34-89

Reasoning in Early VIsion

Lu and Jain

Mar, 89

CSE-TR-35-89

Failure Recovery in the MICON System

Daga and Birmingham

Mar, 89

CSE-TR-36-89

Failure Detection in the MICON System

Daga and Birmingham

Mar, 89

CSE-TR-37-89

Ignorance and Myopia in Computer Vision Systems

Jain and Binford

Mar, 89

CSE-TR-38-89

Single Query Optimization in Multicomputer Database Systems

Padmanabhan and Baru

Mar, 89

CSE-TR-39-89

Failure Handling in the MICON System

Daga

Mar, 89

CSE-TR-40-90

Gate Matrix Layout Techniques

Chakravarthy and Mazumder

Mar, 89

CSE-TR-41-90

Semantics-First Natural Language Processing

Lytinen

Mar, 89

CSE-TR-42-90

A GaAs Micro-Supercomputer: Rationale and Design

Brown Dykstra Mudge and Milano

Mar, 89

CSE-TR-43-90

Scanline Rendering for 3-Axis NC-Toolpath Generation Simulation and Verification

Thomas

Mar, 89

CSE-TR-44-90

Building Knowledge-AcquisitionTools

Birmingham and Klinker

Mar, 89

CSE-TR-45-90

Distributed Editor: A Toolkit for Collaborative Editing of Files in Distributed Environments

Prakash and Knister

Mar, 89

CSE-TR-46-90

Language Support for an Abstract View of Network Services

Chang and Ravishankar

Mar, 89

CSE-TR-47-90

A Timing Model of Synchronous Digital Circuits

Sakallah Mudge and Olukotun

Mar, 89

CSE-TR-48-90

CSE-TR-49-90

Null Nonterminal Insertion in LR(k) Grammars

Anderson

Mar, 89

CSE-TR-50-90

A Coordinated Location Policy for Load Sharing in Hypercube Multicomputers

Shin and Y.Chang

Mar, 89

CSE-TR-51-90

Evaluation and Application of Neural Network Models

Fisher Jain Maloney and Wiitanen

Mar, 89

CSE-TR-52-90

An Efficient Reconfiguration Algorithm for Restructuring Hexagonal Arrays

Venkateswaran and Mazumder

Mar, 89

CSE-TR-53-90

Environment Model for Mobile Robots Indoor Navigation

Tabek and Weymouth

Mar, 89

CSE-TR-54-90

CSE-TR-55-90

Integration of Detailed Timing Constraints in Control Flow Graphs

Abramson Birmingham and Guthy

Mar, 89

CSE-TR-56-90

CSE-TR-57-90

A Spatially Variant Image Smoothing Method Using Regularization

Wang and Schunck

Mar, 89

CSE-TR-58-90

On Robust Edge Detection

Liu Schunck and Meyer

Mar, 89

CSE-TR-59-90

Robust Statistical Methods for Building Classification Procedures

Chen and Schunck

Apr, 90

CSE-TR-60-90

Robust Computational Vision

Schunck

May, 90

CSE-TR-61-90

Optimum Boundary Interpolation Using Piecewise Cubic Polynomials with Tanget Slopes

Tehrani Weymouth and Schunck

May, 90

CSE-TR-62-90

Fitting Cubic Spline Contours to Edge Points with Unreliable Edge Orientations

Tehrani Weymouth and Schunck

May, 90

CSE-TR-63-90

Knowledge-Guided Left Ventricular Boundary Detection

Tehrani Weymouth and Mancini

May, 90

CSE-TR-64-90

MICE Users Guide

Montgomery and Durfee

Jun, 90

CSE-TR-65-90

Optimal Clocking of Synchronous Systems

Sakallah Mudge and Olukotun

Jun, 90

CSE-TR-66-90

Parallel Language Constructs for Efficient Parallel Processing

Clapp and Mudge

Jun, 90

CSE-TR-67-90

Evidential Reasoning for Building Environment Maps

Tirumalai Jain and Schunck

Jun, 90

CSE-TR-68-90

Dynamic Stereo with Self-Calibration

Tirumalai Schunck and Jain

Jun, 90

CSE-TR-69-90

Reliable Broadcast Algorithms for HARTS

Kandlur and Shin

Aug, 90

CSE-TR-70-90

An On-Chip Double-Bit Error-Correcting Code for a Three Dimensional Dynamic Random-Access Memory

Mazumder

Aug, 90

CSE-TR-71-90

Algorithms for Timing Verification and Optimal Clocking of Synchronous Digital Circuits

Sakallah Mudge and Olukotun

Sep, 90

CSE-TR-72-90

Soars User's Manual Version 5.2

Laird Congdon Altmann and Swedlow

Oct, 90

224

CSE-TR-73-90

Designing and Reconfiguring Fault-Tolerant Multiprocessor Systems

Dutt

Mar, 90

213

CSE-TR-74-90

Fast High-Level Simulation of Shared-Memory Multiprocessor Systems

Lin and Abraham

Mar, 90

CSE-TR-75-90

Exploiting Topological Information to Speedup Concurrent Simulation of Parallel Programs

Lin and Abraham

Mar, 90

CSE-TR-76-90

A Knowledge-Level Analysis of Several Design Tools

Balkany Birmingham and Tommelein

Mar, 90

CSE-TR-77-90

Enhancing Efficiency and Modularity in Join Query Optimization

Yoo and Lafortune

Dec, 90

CSE-TR-78-91

Attribute-Based Naming of Files

Sechrest

Jan, 91

CSE-TR-79-91

The Hyperfuile Model and a Hyperfile Service

Sechrest Adamson and Park

Jan, 91

CSE-TR-80-91

Deterministic Autonomous Systems

Covrigaru and Lindsay

Jan, 91

CSE-TR-81-91

Design of Fault-Tolerant Hypercube Computers

Lee

Dec, 90

165

CSE-TR-82-91

An Efficient Method for Representation and Integration of External Timing Constraints

Abramson and Birmingham

Nov, 90

CSE-TR-83-91

Binding and Scheduling under External Timing Constraints: The ARIEL Synthesis System

Abramson and Birmingham

Jan, 91

CSE-TR-84-91

A Service Acquisition Mechanism for the Client/Service Model in Cygnus

Chang and Ravishankar

Jan, 91

CSE-TR-85-91

Multi-Dimensional Extensions to Adaptive Data Partitioning

Hudak and Abraham

Feb, 91

CSE-TR-86-91

Conditional Knowledge Approach to Optimistic Distributed Simulations

Prakash and Subramanian

Feb, 91

CSE-TR-87-91

Memory Architectures for Vector Processing

Raghavan

Mar, 91

145

CSE-TR-88-91

WAM Algebras - A Mathematical Study of Implementation Part II

Boerger and Rosenzweig

Apr, 91

CSE-TR-89-91

An Analysis of Prolog Database Views and their Uniform Implementation

Boerger and Rosenzweig

Apr, 91

CSE-TR-90-91

Routing Algorithms for VLSI Design

Venkateswaran and Mazumder

Mar, 91

108

CSE-TR-91-91

Context-Based Caching for Interactive Visual Simulations

Roth and Jain

Apr, 91

CSE-TR-92-91

Algorithms for Real-Time Scheduling Problems

Walden and C.Ravishankar

Apr, 91

CSE-TR-93-91

Exact Layout Area Minimization of Static CMOS Cells

Maziasz

Apr, 91

206

CSE-TR-94-91

Integration of Texternal Timing Constraints into the High Level Synthesis Process: The ARIEL Synthesis System

Abramson

Apr, 91

CSE-TR-95-91

Software Change Analysis via Attributed Graph-Based Representations

Al-Zoubi and Prakash

May, 91

CSE-TR-96-91

A Concurrency Control Protocol for Nested Transactions

Lo and C.Ravishankar

May, 91

CSE-TR-100-91

Correcting and Extending Domain Knowledge Using Outside Guidance

Laird Hucka Yager and Tuck

Aug, 91

CSE-TR-101-91

Distributed Constraint Optimization as a Formal Model of Partially Adversarial Cooperation

Yokoo and Durfee

Aug, 91

CSE-TR-102-91

Distributed Constraint Satisfaction for Formalizing Distributed Problem Solving

Yokoo Durfee Ishida and Kuwabara

Aug, 91

CSE-TR-103-91

Exact Timing Analysis of Combinational Circuits

Silva Sakallah and Vidigal

Aug, 91

CSE-TR-104-91

Automating the Design of Computer Systems

Gupta Birmingham and Siewiorek

Aug, 91

CSE-TR-105-91

Hypothesize-and-Test Method for Depth Map Construction using Monocular Vision and Infrared Range Sensor

Fujii Wehe and Weymouth

Sep, 91

CSE-TR-106-91

Derivation and Application of Hard Deadlines for Real-Time Control Systems

Shin and Kim

Sep, 91

CSE-TR-107-91

Model-Driven Pose Correction - Techniques and Experiments

Roth Wu Arpaci Weymouth and Jain

Sep, 91

CSE-TR-108-91

Attentional Modeling of Object Identification and Search

Wiesmeyer and Laird

Oct, 91

CSE-TR-109-91

Design of a Communication Subsystem for HARTS

Kandlur and Shin

Oct, 91

CSE-TR-110-91

Verification versus Discovery in Vision-Based Systems

Roth and Jain

Nov, 91

CSE-TR-111-91

Efficient Simulation of Multiple Cache Configurations Using Binomial Trees

Sugumar and Abraham

Nov, 91

CSE-TR-112-91

User-Level Physical Memory Management for Mach

Sechrest and Park

Nov, 91

CSE-TR-113-91

Blending Hierarchical and Attribute-Based File Naming

Sechrest and McClennen

Oct, 91

CSE-TR-114-91

Correction Imperfect Domain Theories: A Knowledge-Level Analysis

Huffman Pearson and Laird

Nov, 91

CSE-TR-115-91

Ultra-Fast Expected Time Parallel Algorithms

MacKenzie and Stout

Nov, 91

CSE-TR-116-91

Accommodating RPC Heterogeneities Using Automatic Agent Synthesis

Huang and C.Ravishankar

Nov, 91

CSE-TR-117-91

Two-Level Adaptive Training Branch Prediction

Yeh and Patt

Jul, 91

CSE-TR-118-91

Perception Projection and Reaction in External Environments in Soar: Initial Approaches

Huffman and Laird

Nov, 91

CSE-TR-119-92

A Scene-Based Multi-Level Representation for Mobile Robot Spatial Mapping and Navigation

Kortenkamp Weymouth Chown and Kaplan

Jan, 92

CSE-TR-120-92

Vector Field Analysis for Oriented Patterns

Shu and Jain

Feb, 92

CSE-TR-121-92

Query Optimization by Intelligent Search

Yoo

Mar, 92

144

CSE-TR-122-92

Automatically Generating Bus-Interface Models

Leong and Birmingham

Feb, 92

CSE-TR-123-92

An Operator-Based Model of Human Covert Visual Attention

Wiesmeyer

Feb, 92

130

CSE-TR-124-92

Situation Caching

Roth G.Zhang and Jain

Mar, 92

CSE-TR-125-92

Undoing Actions in Collaborative Work

Prakash and Knister

Mar, 92

CSE-TR-126-92

Almost Tag-Free Garbage Collection for Strongly-Typed Object-Oriented Languages

Cheong

Apr, 92

CSE-TR-127-92

Optimal Part Selection

Haworth Birmingham and Haworth

May, 92

CSE-TR-128-92

An Investigation of the Performance of Alternative Multipressor Node Architectures

Marsa and Patt

May, 92

CSE-TR-129-92

Performance Bounds and Buffer Space Requirements for Concurrent Processors

Mangione-Smith

May, 92

114

CSE-TR-130-92

Constructing Symmetric Fault-Tolerant and Near-Optiman Quorums

Ng and C.Ravishankar

Jun, 92

CSE-TR-131-92

Accommodating RPC Heterogeneities Using Automatic Agent Synthesis

Huang and Ravishankar

Jun, 92

CSE-TR-132-92

Content-Based Modeling in Multimedia Information Systems

Swanberg Weymouth and Jain

Jun, 92

CSE-TR-133-92

An Interactive Image Management System for Face Information Retrieval

Bach Paul and Jain

Jun, 92

CSE-TR-134-92

Selection on the Reconfigurable Mesh

Hao MacKenzie and Stout

Jun, 92

CSE-TR-135-92

Data Placement in Shared-Nothing Parallel Database Systems

Padmanadhan

Apr, 92

175

CSE-TR-136-92

Generation of Synthetic Workloads for Distributed Real-Time Computing Systems

Kiskis

Apr, 92

115

CSE-TR-137-92

Imaging Models and Surface Recovery Methods for Scanning Probe Microscopy

Pingali and Jain

Apr, 92

CSE-TR-138-92

Emergence of Meta-Level Control in Multi-Tasking Autonomous Agents

Covrigaru

Apr, 92

CSE-TR-139-92

Knowledge Acquisition in the Small

Runkel and Birmingham

Apr, 92

CSE-TR-140-92

Balanced Boolean Functions: Theory and Applications

Chakrabarty

Sep, 92

107

CSE-TR-141-92

The Evolving Algebra Semantics of C Preliminary Version

Gurevich and Huggins

Oct, 92

CSE-TR-142-92

Specification of Interface Behaviour for the Automatic Generation of Bus-Interface Models

Daga and Birmingham

Oct, 92

CSE-TR-143-92

Efficient Simulation of Caches under Optimal Replacement with Applications to Miss Characterization

Sugumar and Abraham

Oct, 92

CSE-TR-144-92

Towards Optimal System-Level Design

Haworth and Birmingham

Oct, 92

CSE-TR-145-92

Symbolic-Timing-Equation Generation from a High-Level Specification of Interface Behavior

Daga and Birmingham

Oct, 92

CSE-TR-146-92

Shortest-Path Routing in Homogeneous Point-to-Point Networks with Virtual Cut-Through Switching

Rexford and Shin

Nov, 92

CSE-TR-147-92

Monster: A Tool for Analyzing the Interaction Between Operating Systems and Computer Architectures

Nagle Uhlig and Mudge

Nov, 92

CSE-TR-148-92

Design Views for Synthesis: Providing Both Uniform Data Integration and Diverse Data Customization

Rundensteiner

Nov, 92

CSE-TR-149-92

Concurrent Engineering: An Automated Design-Space Exploration Approach

Darr and Birmingham

Nov, 92

CSE-TR-150-92

Expected Deadlock Time in a Multiprocessing Systems

Compton and Ravishankar

Nov, 92

CSE-TR-151-92

Any-Dimension Algorithms and Real-Time AI

Musliner Durfee and Shin

Dec, 92

CSE-TR-152-93

Hardware Support for Hiding Cache Latency

Golden and Mudge

Feb, 93

CSE-TR-153-93

A Single Symbolic Algorithm for Incremental Concept Acquisition

Miller and Laird

Jan, 93

CSE-TR-154-93

Real-Time Fault-Tolerant Communication in Computer Networks

Zheng

Feb, 93

102

CSE-TR-155-93

MDARTS: A Multiprocessor Database Architecture for Real-Time Systems

Lortz and Shin

Mar, 93

CSE-TR-156-93

Software TLB Management in OSF/1 and Mach 3.0

Uhlig Nagle Mudge and Sechrest

Dec, 92

CSE-TR-157-93

Modeling Concept Acquisition in the Context of a Unified THeory of Cognition

Miller

Jun, 93

CSE-TR-158-93

Design Tradeoffs for Software-Managed TLBs

Nagle Uhlig Stanley and Sechrest

Apr, 12

CSE-TR-159-93

Analysis of Memory Latency Factors and their Impact on KSR1 MPP Performance

Kahhaleh

Apr, 93

CSE-TR-160-93

Identification of Critical paths in Circuit with Level-Sensitive Latches

Burks Sakallah and Mudge

Apr, 93

CSE-TR-161-93

Surface Fitting for Industrial Applications

Schunck Rogers and Sinha

Apr, 93

CSE-TR-162-93

The Evolving Algebra Semantics of COBOL

Vale

Apr, 93

CSE-TR-163-93

An Information Model for Genome Map Representation and Assembly

Lee Rundensteiner Thomas and Lafortune

May, 93

CSE-TR-164-93

Modeling and Approaching the Deliverable Performance Capability of the KSR1 Processor

Azeem

Jun, 93

CSE-TR-165-93

First-Order Logic Models for Real-Time Discrete-Event Systems

Naylor

May, 93

CSE-TR-166-93

Min-Max Linear Programming

Burks and Sakallah

Aug, 93

CSE-TR-167-93

A Scalable Board-Level-Timing Verification Methodology

Daga and Birmingham

Aug, 93

CSE-TR-168-93

VITCh: A Methodology for the Timing Verification of Board-Level Circuits

Daga and Birmingham

Aug, 93

CSE-TR-169-93

Cyclic Job Shop Scheduling Using Collision Vectors

Chaar and Davidson

Aug, 93

CSE-TR-170-93

A Universal RPC Toolkit

Huang and Ravishankar

Aug, 93

CSE-TR-171-93

Cicero: A Protocol Construction Language

Huang and Ravishankar

Aug, 93

CSE-TR-172-93

Task Allocation and Redistribution in Distributed Real-Time Systems

Hou

Aug, 93

222

CSE-TR-173-93

Multi-Configuration Simulation Algorithms for the Evaluation of Computer Architecture Designs

Sugumar and Abraham

Aug, 93

143

CSE-TR-174-93

Automated Design for Concurrent Engineering

Darr and Birmingham

Sep, 93

CSE-TR-175-93

CIRCA: The Coopeartive Intelligent Real-Time Control Architecture

Musliner

Sep, 93

170

CSE-TR-176-93

Semaphore Queue Priority Assignment for Real-Time Multiprocessor Synchronization

Lortz and Shin

Sep, 93

CSE-TR-177-93

Tailoring Recursion for Complexity

Gradel and Gurevich

Sep, 93

CSE-TR-178-93

Search-Space Pruning Heuristics for Path Sensitization in Test Pattern Generation

Silva and Sakallah

Oct, 93

A powerful combinational path sensitization engine is required for the efficient implementation of tools for test pattern generation, timing analysis, and delay-fault testing. Path sensitization can be posed as a search, in the n-dimensional Boolean space, for a consistent assignment of logic values to the circuit nodes which also satisfies a given condition. While the conditions for path sensitization are different for different applications, the search mechanism need not be. In this paper we propose and demonstrate the effectiveness of several new deterministic techniques for search-space pruning for test pattern generation. These techniques are based on a dynamic analysis of the search process and can be viewed as extensions of methods that were introduced in FAN and SOCRATES. In particular, we present linear-time algorithms for dynamically identifying unique sensitization points and for dynamically maintaining reduced head line sets. In addition, we present two powerful mechanisms that drastically reduce the number of backtracks: failure-driven assertions and dependency-directed backtracking. Both mechanisms can be viewed as a form of learning while searching and have analogs in other application domains. These search pruning methods have been implemented in a generic path sensitization engine called LEAP. A test pattern generator, TG-LEAP, that uses this engine was also developed. We present experimental results that compare the effectiveness of our proposed search pruning strategies to those of PODEM, FAN, and SOCRATES. In particular, we show that LEAP is very efficient in identifying redundant faults and in generating tests for difficult faults.

CSE-TR-179-93

The Minimization and Decomposition of Interface State Machines

Daga and Birmingham

Oct, 93

CSE-TR-180-93

SPIDER: Flexible and Efficient Communication Support for Point-to-Point Distributed Systems

Dolter Daniel Mehra Rexford Feng and Shin

Oct, 93

CSE-TR-181-93

Striping in a RAID Level 5 Disk Array

Chen and Lee

Oct, 93

CSE-TR-182-93

Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors

Yeh

Oct, 93

101

CSE-TR-183-93

Aggressive Execution Engines for Surpassing Single Basic Block Execution

Butler

Oct, 93

CSE-TR-184-93

Optimal Allocation of On-Chip Memory for Multiple-API Operating Systems

Nagle Uhlig Mudge and Sechrest

Nov, 93

CSE-TR-185-93

Kernel-Based Memory Simulation

Nagle Uhlig Mudge and Sechrest

Nov, 93

CSE-TR-186-93

Is Deduction Enough

Farag

Feb, 91

CSE-TR-187-93

The Right Information at the Right Time: A Scientific Information Tracking System

Farag

Sep, 92

CSE-TR-188-93

Knowledge Flexibility in a Learning System

Farag

Dec, 90

CSE-TR-189-93

SEIF: A System for Learning Adaptable Actions

Farag

Jun, 89

CSE-TR-190-93

The Semantics of the C++ Programming Language

Wallace

Nov, 93

CSE-TR-191-93

Implementation Experience with Building an Object-Oriented View Management System

Kuno and Rundensteiner

Aug, 93

CSE-TR-192-93

DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment

Han Rosenberg and Shin

Dec, 93

CSE-TR-193-94

Instructable Autonomous Agents

Huffman

Jan, 94

142

CSE-TR-194-94

Scheduling for Modern Disk Drives and Non-Random Workloads

Worthington Ganger and Patt

Mar, 94

CSE-TR-195-94

Effect of Fan-Out on the Performance of a Single-Message Cancellation Scheme

Prakash Wu and Jetli

Feb, 94

CSE-TR-196-94

Undoing Actions in Collaborative Work: Framework and Experience

Prakash and Knister

Mar, 94

CSE-TR-197-94

Architectural Support for Managing Communication in Point-to-Point Distributed Systems

Feng Rexford Mehra Daniel Dolter and Shin

Mar, 94

CSE-TR-198-94

Optimal Local Register Allocation for a Multiple-Issue Machine

Meleis and Davidson

Mar, 94

CSE-TR-199-94

Schema Evolution for Real-Time Object-Oriented Databases

Zhou Rundensteiner and Shin

Mar, 94

CSE-TR-200-94

Unix I/O Performance in Workstations and Mainframes

Chen and Patterson

Mar, 94

Rapid advances in processor performance have shifted the performance bottleneck to I/O systems. The relatively slow rate of improvement in I/O is due in part to a lack of quantitative performance analysis of software and hardware alternatives. Using a new self-scaling I/O benchmark, we provide such an evaluation for 11 hardware configurations using 9 variations of the Unix operating system. In contrast to processor performance comparisons, where factors of 2 are considered large, we find differences of factors of 10 to 100 in I/O systems. The principal performance culprits are the policies of different Unix operating systems; some policies on writes to the file cache will cause processors to run at magnetic disk speeds instead of at main memory speeds. These results suggest a greater emphasis be placed on I/O performance when making operating system policy decisions.

CSE-TR-201-94

Evaluation of Fault-Tolerance Latency from Real-Time Application's Perspectives

Kim and Shin

Mar, 94

CSE-TR-202-94

Ravel-XL: A Hardware Accelerator for Assigned-Delay Compiled-Code Logic Gate Simulation

Riepe Silva Sakallah and Brown

Mar, 94

Ravel-XL is a single-board hardware accelerator for gate-level digital logic simulation. It uses a standard levelized-code approach to statically schedule gate evaluations. However, unlike previous approaches based on levelized-code scheduling, it is not limited to zero- or unit-delay gate models and can provide timing accuracy comparable to that obtained from event-driven methods. We review the synchronous waveform algebra that forms the basis of the Ravel-XL simulation algorithm, present an architecture for its hardware realization, and describe an implementation of this architecture as a single VLSI chip. The chip has about 900,000 transistors on a die that is approximately 1.4cm^2, requires a 256-pin package and is designed to run at 33MHz. A Ravel-XL board consisting of the processor chip and local instruction and data memory can simulate up to one billion gates at a rate of approximately 6.6 million gate evaluations per second. To better appreciate the tradeoffs made in designing Ravel-XL, we compare its capabilities to those of other commercial and research software simulators and hardware accelerators.

CSE-TR-203-94

IDtrace - A Tracing Tool for i486 Simulation

Pierce and Mudge

Mar, 94

CSE-TR-204-94

Data and Program Restructuring of Irregular Applications for Cache-Coherent Multiprocessors

Tomko and Abraham

Mar, 94

CSE-TR-205-94

Reduction of Cache Interference Misses Through Selective Bit-Permutation Mapping

Abraham and Agusleo

Mar, 94

CSE-TR-206-94

Partitioning Regular Applications for Cache-Coherent Multiprocessors

Tomko and Abraham

Mar, 94

CSE-TR-207-94

Collected Papers of the Soar/IFOR Project

Laird et.al.

Apr, 94

100

CSE-TR-208-94

Coordinating Decision Making in Large Organizations

Birmingham D'Ambrosio Darr and Durfee

Apr, 94

CSE-TR-209-94

Supporting Queries on Source Code: A Formal Framework

Paul and Prakash

Apr, 94

CSE-TR-210-94

An Object-Oriented Real-Time Database System for Multiprocessors

Lortz

Apr, 94

132

CSE-TR-211-94

A Transparent Object-Oriented Schema Change Approach Using View Evolution

Ra and Rundesteiner

Apr, 94

CSE-TR-212-94

Runtime Monitoring of Timing Constraints in Distributed Real-Time Systems

Jahanian Rajkumar and Raju

Apr, 94

CSE-TR-213-94

An Active Visual Estimator for Dexterous Manipulation

Rizzi and Koditschek

May, 94

CSE-TR-214-94

An Active OODB System For Genome Physical Map Assembly

Lee Rundensteiner and Thomas

May, 94

CSE-TR-215-94

A Flexible Object-Oriented Database Model and Implementation for Capacity-Augmenting Views

Ra Kuno and Rundensteiner

Apr, 94

CSE-TR-216-94

Automatic Acquisition of Word Meaning From Context

Hastings

May, 94

128

CSE-TR-217-94

Probing and Fault Injection of Protocol Implementations

Dawson and Jahanian

Oct, 94

CSE-TR-218-94

PLINK: An Intelligent Natural Language Parser

Huyck

Aug, 94

102

CSE-TR-219-94

The Evolution of the Soar Cognitive Architecture

Laird and Rosenbloom

Sep, 94

CSE-TR-220-94

Conceptual Modeling of Manufacturing Automation

Birla

Sep, 94

CSE-TR-221-94

An Attribute-Space Representation and Algorithm for Concurrent Engineering

Darr and Birmingham

Oct, 94

CSE-TR-222-94

Wrong-Path Instruction Prefetching

Pierce and Mudge

Nov, 94

CSE-TR-223-94

Broy-Lamport Specification Problem: An Evolving Algebras Solution (update see 320-96)

Huggins

Aug, 94

CSE-TR-224-94

Design ing Databases with Fuzzy Data and Rules for Application to Discrete Control

Chaudhry Moyne and Rundensteiner

Nov, 94

CSE-TR-225-94

A Graphical Query Language for Identifying Temporal Trends in Video Data

Hibino and Rundensteiner

Dec, 94

CSE-TR-226-94

Fault-Tolerant Interconnection Networks for Multiprocessors

Dec, 94

124

CSE-TR-227-95

Design and Evaulation of a Window-Consistent Replication Service

Mehra Rexford Ang and Jahanian

Jan, 95

CSE-TR-228-95

Optimization of Mass Storage Hierarchies

Jacob

Feb, 95

CSE-TR-229-95

An Offline Partial Evaluator for Evolving Algebras

Huggins

Mar, 95

CSE-TR-230-95

The Generalized Railroad Crossing Problem: An Evolving Algebra Based Solution

Gurevich Huggins and Mani

Mar, 95

CSE-TR-231-95

Notes on Calculating Computer Performance

Jacob and Mudge

Mar, 95

CSE-TR-232-95

Hierarchical performance Modeling with Cache Effects: A Case Study of the DEC Alpha

Wang

Mar, 95

CSE-TR-233-95

Reliable File Caches

Chen Aycock and Rajamani

Apr, 95

CSE-TR-234-95

A Microeconomic Approach to Intellitent Resource Sharing in Multiagent Systems

Lee and Durfee

Apr, 95

CSE-TR-235-95

Toward a Dynamical Pick and Place

Burridge Rizzi and Koditschek

May, 95

CSE-TR-236-95

Modeling Computation and Communication Performance of Parallel Scientific Applications: A Case Study of the IBM SP2

Boyd Abandah Lee and Davidson

May, 95

CSE-TR-237-95

A Constraint-based Object Model for Structured Document Management

Nica and Rundensteiner

May, 95

CSE-TR-238-95

Design Tradeoffs in Implementing Real-Time Channels on Bus-Based Multiprocessor Hosts

Indiresan Mehra and Shin

May, 95

CSE-TR-239-95

Tailoring Routing and Switching Schemes to Application Workloads in Multicomputer Networks

Feng Rexford Daniel Mehra and Shin

May, 95

CSE-TR-240-95

Equivalence Is In The Eye Of The Beholder

Gurevich and Huggins

May, 95

CSE-TR-241-95

The Object-Slicing Technique: A Flexible Object-Representation and Its Evaluation

Kuno Ra and Rundensteiner

May, 95

CSE-TR-242-95

Collected Papers of the Soar/IFOR Project Spring 1995

Johnson Jones Koss Laird et. al.

Jun, 95

CSE-TR-243-95

System-Oriented Evaluation of I/O Subsystem Performance

Ganger

Jun, 95

133

CSE-TR-244-95

Aggressive Centralized and Distributed Scheduling of Disk Requests

Worthington

Jun, 95

208

CSE-TR-245-95

Hierarchical Testing Using Precomputed Tests for Modules

Murray

Jun, 95

210

CSE-TR-246-95

The MultiView OODB ViewSystem: Design and Implementation

Kuno and Rundensteiner

Jun, 95

CSE-TR-247-95

Loop Optimization Techniques on Multi-Issue Architectures

Kaiser

Jun, 95

183

CSE-TR-248-95

Multiagent Negotiation Framework (A preliminary report)

Park and Birmingham

Jun, 95

CSE-TR-249-95

Specification and Verification of the Undo/Redo Algorithm for Database Recovery

Gurevich and Wallace

Jul, 95

CSE-TR-250-95

Rio: Storing Files Reliably in Memory

Chen Aycock Ng Rajamani and Sivaramakrishnan

Jul, 95

CSE-TR-251-95

An Object Model and Algebra for the Implicit Unfolding of Hierarchical Structures

Jones and Rundensteiner

Jul, 95

CSE-TR-252-95

Using Object-Oriented Principles to Optimize Update Propagation to Materialized Views

Kuno and Rundensteiner

Jul, 95

CSE-TR-253-95

Dynamic Modeling of Logic Gate Circuits

Sakallah

Jul, 95

CSE-TR-254-95

Soft Updates: A Solution to the Metadata Update Problem in File Systems

Ganger and Patt

Aug, 95

CSE-TR-255-95

Functional Abstractions and Partial Specification of Boolean Functions

Sakallah

Aug, 95

We define functional abstraction as the process of deliberately ignoring the dependence of a Boolean function on a subset of its variables. Functional abstraction causes a completely specified function to become partially specified. We propose functions sets as a theoretical model for partially specified functions and function intervals as a practical approximation to them. We develop an interval Boolean algebra suitable for the symbolic manipulation of function intervals and highlight the relationship between functional abstraction and universal and existential quantification.

CSE-TR-256-95

Automatic Generation of Performance Bounds on the KSR1

Lee and Davidson

Aug, 95

CSE-TR-257-95

Timing Analysis of Digital Systems with Gated Clocks

VanCampenhout and Mudge

Aug, 95

CSE-TR-258-95

Modeling the Communication and Computation Performance of the IBM SP2

Abandah

May, 95

CSE-TR-259-95

Hierarchical Concurrent Engineering: Supporting Hierarchical Decomposition and Peer-to-Peer Problem Solving

D'Ambrosio and Birmingham

Sep, 95

CSE-TR-260-95

An Optimal Bandwidth Allocation Strategy for the Delivery of Compressed Prerecorded Video

Feng Jahanian and Sechrest

Sep, 95

CSE-TR-261-95

Optimal Dual-Issue Instruction Scheduling with Spills For Binary Expression Trees

Meleis and Davidson

Oct, 95

CSE-TR-262-95

Modeling the Effects of Temporal Proximity of Input Transitions on Gate Propagation Delay and Transition Time

Chandramouli and Sakallah

Oct, 95

While delay modeling of gates with a single switching input has received a lot of attention, the case of multiple inputs switching in close temporal proximity is just beginning to be addressed in the literature. The effect of proximity of input transitions can be significant on the delay and output transition time. The few attempts that have addressed this issue are based on a series-parallel transistor collapsing method that reduces the multi-input gate to an inverter. This limits the technique to CMOS technology. Moreover, none of them discuss the appropriate choice of voltage thresholds to measure delay for a multi-input gate. In this paper, we first present a method for the choice of voltage thresholds for a multi-input gate that ensures a positive value of delay for any combination of input transition times and the temporal separations among them. We next introduce a dual-input proximity model for the case when only two inputs of the gate are switching. We then prose a simple algorithm for calculating the delay and output transition time that makes repeated use of the dual-input proximity model and that does not collapse the gate into a equivalent inverter. Comparison with simulation results shows that our method performs quite well in practice. Before concluding the paper we also show the close relationship between the inertial delay of a gate and the proximity of input transitions.

CSE-TR-263-95

Automatic Parallel Program Conversion from Shared-Memory to Message-Passing

Lee and Davidson

Oct, 95

CSE-TR-264-95

A Comparison of Genetic Algorithms and Other Machine Learning Systems of a Complex Classification Task from Common Disease Research

Congdon

Oct, 95

168

CSE-TR-265-95

Real-Time Concurrency Control in Groupware

Jensen and Soparkar

Oct, 95

CSE-TR-266-95

A Reduced Multipipeline Machine Description that Preserves Scheduling Constraints

Eichenberger and Davidson

Oct, 95

CSE-TR-267-95

Event Propagation Conditions in Timing Analysis

Yalcin and Hayes

Nov, 95

CSE-TR-268-95

A Simplified Model of a Supercritical Power Plant

Shinohara and Koditschek

Sep, 95

CSE-TR-269-95

Monitoring and Assertion-Checking of Real Time Specifications in Modechart

Brockmeyer and Jahanian

Oct, 95

CSE-TR-270-95

The Publish/Subscribe Paradigm for Scalable Group Collaboration Systems

Mathur Hall Jahanian Prakash and Rasmussen

Nov, 95

CSE-TR-271-95

Providing VCR Functionality in a Constant Quality Video-On-Demand Transportation Service

Feng Jahanian and Sechrest

Dec, 95

CSE-TR-272-95

Interactive Visualizations for Temporal Analysis: Application to CSCW Multimedia Data

Hibino and Rundensteiner

Dec, 95

CSE-TR-273-95

Measuring and Improving Memory's Resistance to Operating System Crashes

Ng Rajamani Aycock and Chen

Dec, 95

CSE-TR-274-95

A Protocol Composition-Based Approach to QoS Control in Collaboration Systems

Mathur and Prakash

Dec, 95

CSE-TR-275-95

Protocols for Authenticated Download to Mobile Information Appliances

Jaeger and Rubin

Dec, 95

CSE-TR-276-96

Distributed Object Replication Support for Collaborative Systems

Wu and Prakash

Jan, 96

CSE-TR-277-96

Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP-10000

Abandah and Davidson

Jan, 96

CSE-TR-278-96

Using Simulation and Performance Improvement Knowledge for Redesigning Business Processes

Jaeger and Prakash

Jan, 96

CSE-TR-279-96

Getting More Information into File Names

McClennen and Sechrest

Jan, 96

CSE-TR-280-96

Design and Evaluation of a QoS-Sensitive Communication Subsystem Architecture

Mehra Indiresan and Shin

Jan, 96

CSE-TR-281-96

Resource Management for Real-Time Communication: Making Theory Meet Practice

Mehra Indiresan and Shin

Jan, 96

CSE-TR-282-96

Limits to Branch Prediction

Mudge Chen and Coffey

Jan, 96

CSE-TR-283-96

Correlation and Aliasing in Dynamic Branch Predictors - Full Simulation Results

Sechrest CC Lee and Mudge

Feb, 96

107

CSE-TR-284-96

Tool Coordination and Media Integration on Asynchronously-Shared Computer-Supported Workspaces

Manohar and Prakash

Feb, 96

CSE-TR-285-96

Faster Static Timing Analysis via Bus Compression

VanCampenhout and Mudge

Feb, 96

CSE-TR-286-96

The Rio File Cache: Surviving Operating System Crashes

Chen Ng Rajamani and Aycock

Mar, 96

CSE-TR-287-96

Impact of Selection Functions on Routing Algorithms Performance in Multicomputer Networks

Feng and Shin

Mar, 96

CSE-TR-288-96

Induction: Inference and Process

Rothleder

Mar, 96

CSE-TR-289-96

ISMAUT Tools: A Software Tool Kit for Rational Tradeoffs Among Conflicting Objectives

D'Ambrosio

Mar, 96

CSE-TR-290-96

Corona: A Communication Service for Scalable Reliable Group Collaboration Systems (Preliminary Design)

Hall Mathur Jahanian Prakash and Ramussen

Mar, 96

CSE-TR-291-96

RTCAST: Lightweight Multicast for Real-Time Process Groups

Abdelzaher Shaikh Jahanian and Shin

Apr, 96

CSE-TR-292-96

GRASP - - A New Search Algorithm for Satisfiability

Silva and Sakallah

Apr, 96

This report introduces GRASP (Generic seaRch Algorithm for the Satisfiability Problem), an integrated algorithmic frame-work for SAT that unifies several previously propose search-pruning techniques and facilitates identification of additional ones. GRASP is premised on the inevitability of conflicts during search and it s most distinguishing feature is the augmentation of basic backtracking search with a powerful conflict analysis procedure. Analyzing conflicts to determine their causes enables GRASP to backtrack non-chronologically to earlier levels in the search tree, potentially pruning large portions of the search space. In addition, by �recording� the causes of conflicts, GRASP can recognize and preempt the occurrence of similar conflicts later on in the search. Finally, straightforward bookkeeping of the causality chains leading up toe conflicts allows GRASP to identify assignments that are necessary for a solution to be found. Experimental results obtained from a large number of benchmarks, including many from the field of test pattern generation, indicate that application of the proposed conflict analysis techniques to SAT algorithms can be extremely effective for a large number of representative classes of SAT instances.

CSE-TR-293-96

Optiming Delay in Delayed-Write File Systems

Chen

May, 96

CSE-TR-294-96

Goal-Directed Performance Tuning for Scientific Applications

Shih

Jun, 96

135

CSE-TR-295-96

Modeling Domino Logic for Static Timing Analysis

VanCampenhout Mudge and Sakallah

Jun, 96

CSE-TR-296-96

Timing Analysis of Domino Logic

VanCampenhout Mudge and Sakallah

Jun, 96

CSE-TR-297-96

Performance of a Distributed Object-Based Internet Collaboratory

Malan Jahanian Rasmussen and Knoop

Jul, 96

CSE-TR-298-96

Experiments on Six Commercial TCP Implementations Using a Software Fault Injection Tool

Dawson Jahanian and Mitton

Jul, 96

CSE-TR-299-96

Optimal Zero-Aliasing Space Compaction of Test Response

Chakrabarty Murray and Hayes

Aug, 96

CSE-TR-300-96

A Cellular Wireless Local Area Network with QoS Guarantees for Heterogeneous Traffic

Choi and Shin

Aug, 96

CSE-TR-301-96

Using Stall Cycles to Improve Microprocessor Performance

Dundas and Mudge

Sep, 96

CSE-TR-302-96

The Satisfiability-Indicating Multi-Index Organization for Maintaining Materialized Path Query OODB Views

Kuno and Rundensteiner

Sep, 96

CSE-TR-303-96

Sectored Cache Performance Evaluation: A case study on the KSR-1 data subcache

Rivers and Davidson

Sep, 96

CSE-TR-304-96

Design Issues on the Support of Tools and Media in Replayable Workspaces

Manohar and Prakash

Sep, 96

CSE-TR-305-96

Preliminary Analytical Approach to a Brachiation Robot Controller

Nakanishi Fukude and Koditschek

Sep, 96

CSE-TR-306-96

Tagless Two-level Branch Prediction Schemes

Chen Lee Postiff and Mudge

Sep, 96

CSE-TR-307-96

An Agent Model for Distributed Part-Selection

Darr and Birmingham

Sep, 96

CSE-TR-308-96

Query Processing in the MultiMedia Visual Information Seeking Environment: A Comparative Evaluation

Hibino and Rundensteiner

Oct, 96

CSE-TR-309-96

Learning Procedural Planning Knowledge in Complex Environments

Pearson

Oct, 96

158

CSE-TR-310-96

Signal Delay in Coupled Distributed RC Lines in the Presence of Temporal Proximity

Chandramouli Kayssi and Sakallah

Oct, 96

With improvements in technology, accurate delay modeling of interconnects is becoming increasingly important. Due to decreasing feature sizes, the spacing between the signal lines is also decreasing. Consequently, the switching activities on the neighboring lines can have a significant impact on the delay of the line of interest, and can no longer be ignored. Accurate modeling of this phenomenon, which we call the proximity effect, is the subject of this paper. This is similar to the state-dependency of logic gate delays, where signal delay can be affected by the switching activities on the side inputs of a gate. We describe an efficient and accurate delay computation method using precomputed interconnect moments that treats the coupled lines as uniform, distributed RC lines and does not make any lumped approximations. This allows the proposed delay model to be used in a timing analysis tool operating over both gate and interconnect domains while accounting for state-dependency.

CSE-TR-311-96

The Dynamic Information Integration Model

Nica and Rundensteiner

Oct, 96

CSE-TR-312-96

TCP Enhancement for an Integrated Services Internet

Feng Kandlur Saha and Shin

Oct, 96

CSE-TR-313-96

Workshop on Databases: Active and Real-Time 1996 (Concepts meet Practice)

Soparkar and Ramamritham

Nov, 96

107

CSE-TR-314-96

Specification of the PUMA memory management design

Jacob and Mudge

Nov, 96

CSE-TR-315-96

Extended Aggregation Relationships for Process Specification and Enactment in Active Databases

Chaudhry Moyne and Rundensteiner

Nov, 96

CSE-TR-316-96

A Name-Based Mapping Scheme for Rendezvous

Thaler and Ravishankar

Nov, 96

CSE-TR-317-96

Early Design Cycle timing Simulation of Caches (Preliminary Exam Report)

Tam and Davidson

Nov, 96

CSE-TR-318-96

ORCHESTRA: A Fault Injection Environment for Distributed Systems

Dawson Jahanian and Mitton

Nov, 96

CSE-TR-319-96

An Instruction Stream Compression Technique (U of M Confidential)

Bird and Mudge

Nov, 96

CSE-TR-320-96

Broy-Lamport Specification Problem: A Gurevich Abstract State Machine Solution (Update to 223-94)

Huggins

Dec, 96

CSE-TR-321-96

Specification and Verification of Pipelining in the ARM2 RISC Microprocessor

Huggins and VanCampenhout

Dec, 96

Gurevich Abstract State Machines (ASMs) provide a sound mathematical basis for the specification and verification of systems. An application of the ASM methodology to the verification of a pipelined microprocessor (an ARM2 implementation) is described. Both the sequential execution model and final pipelined model are formalized using ASMs. A series of intermediate models are introduced that gradually expose the complications of pipelining. The first intermediate model is proven equivalent to the sequential model in the absence of structural, control, and data hazards. In the following steps, these simplifying assumptions are lifted one by one, and the original proof is refined to establish the equivalence of each intermediate model with the sequential model, leading ultimately to a full proof of equivalence of the sequential and pipelined models.

CSE-TR-322-96

Recursive Abstract State Machines

Gurevich and Spielmann

Dec, 96

CSE-TR-323-96

On-Line Extraction of SCSI Disk Drive Parameters

Worthington Ganger Patt and Wilkes

Dec, 96

CSE-TR-324-97

Evaluating View Materialization Strategies for Complex Hierarchical Objects

Jones and Rundensteiner

Jan, 97

CSE-TR-325-97

The Linear Time Hierarchy Theorems for Abstract State Machines and RAMs

Blass and Gurevich

Jan, 97

CSE-TR-326-97

Gurevich Abstract State Machines and Schonhage Storage Modification Machines

Dexter Doyle and Gurevich

Jan, 97

CSE-TR-327-97

Formalizing Database Recovery

Gurevich Soparkar and Wallace

Jan, 97

CSE-TR-328-97

Specifying and Constructing Schedulers for Workflows with Autonomous Executions

Jensen Wallace and Soparkar

Feb, 97

CSE-TR-329-97

Characterizing Multicase Orderings using Concurrency Control Theory

Jensen Soparkar and Mathur

Feb, 97

CSE-TR-330-97

The Impact of Instruction Compression on I-cache Performance

Chen Bird and Mudge

Feb, 97

CSE-TR-331-97

Communication Characterization of a Cray T3D

Vlaovic

Feb, 97

CSE-TR-332-97

Internet Routing Instability

Labovitz Malan and Jahanian

Feb, 97

CSE-TR-333-97

Measurement-based Admission Control Algorithms for Controlled-Load Service: A Structural Examination

Jamin and Shenker

Apr, 97

CSE-TR-334-97

Symbolic Performance & Learning In Continuous-valued Environments

Rogers

May, 97

133

Real-world and simulated real-world domains, such as flying anddriving, commonly have the characteristics of continuous-valued (CV) environments. These environments are frequently complex and difficult to control, requiring a great deal of specific, detailed knowledge. Symbolic computation can use many different types of knowledge to constrain the control function. This thesis investigates learning the control function using observed data in a symbolic architecture. SPLICE (Symbolic Performance & Learning In Continuous-Valued Environments) is a Soar agent that performs symbolic control function approximation. SPLICE uses a three-level framework to first classify its sensory information into symbolic regions, then map the set of regions to a local model, then use the local model to determine an action. Learning causes the models to gradually become more specific and more accurate. SPLICE agent is evaluated against other adaptive control algorithms in simple domains using four experimental methodologies. We also instantiate SPLICE in a complex environment, simulated flight, and evaluate its performance. The final goal is not only to create an effective controller for complex continuous environments, but also to understand more clearly the ramifications of symbolic learning in continuous domains.

CSE-TR-335-97

Critical Issues Regarding the Trace Cache Fetch Mechanism

Patel Friendly and Patt

May, 97

In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logically contiguous instructions in physically contiguous storage. When a particular basic block is requested, the trace cache can potentially respond with the requested block along with several blocks that followed it when the block was last encountered. In this technical report, we examine some critical features of a trace cache mechanism designed for a 16-wide issue processor and evaluate their effects on performance. We examine features such as cache associativity, storage partitioning, branch predictor design, instruction cache design, and fill unit design. We compare the performance of our trace cache mechanism with that of the design presented by Rotenberg et al and show a 23% improvement in performance. In our final analysis, we compare our trace cache mechanism with an aggressive single basic block fetch mechanism and show that the trace cache mechanism attains a 24% improvement in performance.

CSE-TR-336-97

May 1997 Draft of the ASM Guide

Gurevich

May, 97

This is only a draft of a portion of the ASM guide, but there are no plans to change this part in any substantial way, only to augment it.

CSE-TR-337-97

Classification-Directed Branch Predictor Design

Chang

May, 97

109

Pipeline stalls due to branches represent one of the most significant impediments to realizing the performance potential of deeply pipelined superscalar processors. Two well-known mechanisms have been proposed to reduce the branch penalty, speculative execution in conjunction with branch prediction and predicated execution. This dissertation proposes branch classification, coupled with improvements in conditional branch prediction, indirect branch prediction, and predicted execution, to reduce the branch execution penalty. Branch classification allows an individual branch instruction to be associated with the branch predictor best suited to predict its direction. Using this approach, a hybrid branch predictor is constructed which achieves a higher prediction accuracy than any branch predictor previously reported in the literature. This dissertation also proposes a new prediction mechanism for predicting indirect jump targets. For the perl and gcc benchmarks, this mechanism reduces the indirect jump misprediction rate by 93.4% and 63.3% and the overall execution time on an 8-wide issue out-of-order execution machine by 14% and 5%. Finally, this dissertation proposes a new method for combining the performance benefits of predicated execution and speculative execution. This approach significantly reduces the branch execution penalty suffered by wide-issue processors.

CSE-TR-338-97

Choiceless Polynomial Time

Blass Gurevich and Shelah

May, 97

Turing machines define polynomial time (PTime) on strings but cannot deal with structures like graphs directly, and there is no known, easily computable string encoding of isomorphism classes of structures. Is there a computation model whose machines do not distinguish between isomorphic structures and compute exactly PTime properties? This question can be recast as follows: Does there exists a logic that captures polynomial time (without presuming the presence of a linear order)? Earlier, one of us conjectured the negative answer.The problem motivated a quest for stronger and stronger PTime logics. All these logics avoid arbitrary choice. Here we attempt to capture the choiceless fragment of PTime. Our computation model is a version of abstract state machines (formerly called evolving algebras). The idea is to replace arbitrary choice with parallel execution. The resulting logic is more expressive than other PTime logics in the literature. A more difficult theorem shows that the logic does not capture all PTime.

CSE-TR-339-97

Using Chromaticity Distributions and Eigenspace Analysis for Pose- Illumination- and Specularity-Invariant Color Indexing of 3D Objects

Lin and Lee

May, 97

The distribution of object colors can be effectively utilized for recognition and indexing. Difficulties arise in the recognition of object color distributions when there are variations in illumination color, changes in object pose with respect to illumination direction, and specular reflections. However, most of the recent approaches to color-based recognition focus mainly on illumination color invariance. We propose an approach that identifies object color distributions influenced by: (1) illumination pose, (2) illumination color and (3) specularity. We suggest the use of chromaticity distributions to achieve illumination pose invariance. To characterize changes in chromaticity distribution due to illumination color, an eigenspace analysis is employed. A set of chromaticity histograms of each object is generated for a range of lighting colors based on linear models of illumination and reflectance, and the histograms are represented using a small number of eigen basis vectors constructed from principal components analysis. Represented in an eigenspace, chromaticity signatures for each object in the model database form a manifold encompassing a range of lighting colors. Recognition and illumination color estimation are performed by projecting the chromaticity signature of a test object into the eigenspace and examining its proximity to database manifolds. Since specular reflections may displace a projection away from its corresponding manifold, a model-based specularity detection/rejection algorithm, called chromaticity differencing, is developed to reduce this discrepancy.

CSE-TR-340-97

Schema Version Removal: Optimizing Transparent Schema Evolution Systems

Crestana Lee and Rundensteiner

May, 97

Powerful interoperability-enabling solutions for software application integration must allow applications to evolve and data requirements to change, while minimizing such changes on other integrated applications. The transparent schema evolution system (TSE), that we developed at the University of Michigan, accomplishes evolution through schema versioning, where a schema version is a dynamic object-oriented view. When the TSE system receives a view schema change request, the system computes a new view schema that reflects the desired change instead of modifying the old view schema in place, therefore preserving existing view schemas for old applications. This generation of a potentially large number of schema versions over time results in an excessive build-up of classes and underlying object instances, not all being necessarily still in use. Since our view system provides us with materialized views, a degradation of system performance due to update progagation is expected in a situation where we have no-longer utilized view schemas in the system. Also, a larger number of view schemas will result in storage overhead costs. In this paper, we address this problem using consistent schema removal techniques. First, we characterize four potential problems of schema consistency that could be caused by removal of a single virtual class; and then outline our solution approach for each of these problems. Second, we demonstrate that view schema removal is sensitive to the order in which individual classes are processed. Our solution to this problem is based on a formal model, called the dependency model, of capturing all dependencies between classes as logic clauses and of manipulating them to make decisions on class deletions and nondeletions while guaranteeing the consistency of the schema. Third, based on this formal model, we have developed and proven consistent a dependency graph (DG) representation and an associated set of rules for DG generation, reduction, and transformation. A first preliminary version of the latter has been successfully implemented in our Schema Version Removal (SVR) tool. Fourth, we present a cost model for evaluating alternative removal patterns on DG. Lastly, we report our preliminary experimental studies that demonstrate the impact of our schema version removal on the performance of the TSE system.

CSE-TR-341-97

A Case Study of a Hardware-Managed TLB in a Multi-Tasking Environment

Lee Uhlig and Mudge

Jun, 97

There have been very few performance studies of hardware-managed translation look-aside buffers (TLBs). The major reason is the lack of efficient and accurate analysis tools. Newer operating systems, applications, and the popularity of the client-server model of computation place a greater burden than their predecessors on memory system components such as TLBs. Thus it is becoming more important to measure the performance of memory systems under such workloads. In this work, we implemented a trap-driven simulator in an operating system to emulate a variety of TLBs. Using this tool, we were able to evaluate the performance of a range of TLBs under these newer workloads. The results show that in order to improve the TLB performance, we should carefully map pages into the TLB, append process identifiers to avoid flushing the TLB contents frequently, or reserve part of the TLB for a particular server process.

CSE-TR-342-97

Improving Code Density Using Compression Techniques

Lefurgy Bird Chen and Mudge

Jul, 97

We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces com mon sequences of instructions with a single instruction codeword. A microprocessor executes the compressed instruction sequences by fetching codewords from the instruction memory, expanding them back to the original sequence of instructions in the decode stage, and issuing them to the execution stages. We apply our technique to the PowerPC instruction set and achieve 30% to 50% reduction in size for SPEC CINT95 programs.

CSE-TR-343-97

A Chromaticity Space for Specularity Illumination Color- and Illumination Pose-Invariant 3-D Object Recognition

Berwick and Lee

Jul, 97

Most of the recent color recognition/indexing approaches concentrate on establishing invariance to illumination color to improve the utility of color recognition. However, other effects caused by illumination pose and specularity on three-dimensional object surfaces have not received notable attention. We present a chromaticity recognition method that discounts the effects of illumination pose, illumination color and specularity. It utilizes a chromaticity space based on log-ratio of sensor responses for illumination pose and color invariance. A model-based specularity detection/rejection algorithm can be used to improve the chromaticity recognition and illumination estimation for objects including specular reflections.

CSE-TR-344-97

An Architecture for Inter-Domain Troubleshooting

Thaler and Ravishankar

Aug, 97

In this paper, we explore the constraints of a new problem: that of coordinating network troubleshooting among peer administrative domains or Internet Service Providers, and untrusted observers. Allowing untrusted observers permits any entity to report problems, whether it is a Network Operations Center (NOC), end-user, or application. Our goals here are to define the inter-domain coordination problem clearly, and to develop an architecture which allows observers to report problems and receive timely feedback, regardless of their own locations and identities. By automating this process, we also relieve human bottlenecks at help desks and NOCs whenever possible. We begin by presenting a troubleshooting methodology for coordinating problem diagnosis. We then describe GDT, a distributed protocol which realizes this methodology. We show through simulation that GDT performs well as the number of observers and problems grows, and continues to function robustly amidst heavy packet loss.

CSE-TR-345-97

Interactive Delayed-Sharing of Computer-Supported Workspaces via the Streaming of Re-Executable Content

Manohar

Aug, 97

179

This dissertation introduces the notion of interactive delayed-sharing of a session on a computer-supported workspace to allow reuse of such session at a later time. A data artifact, referred to as a session object, encapsulates the session. A session object is composed of heterogeneous multimedia streams that represent temporally-ordered input sequences to applets in the workspace. Playback of a session object recreates the underlying workspace spanned by these applets through the streaming and re-execution of these input sequences in their respective applets. The contributions of this dissertation are as follows. First, this dissertation pioneers the re-execution approach to the record and replay of sessions for supporting computer-supported collaborative work. In doing so, it introduces a novel paradigm for flexible support of asynchronous collaboration that allow users to collaborate by annotating, editing, and refining these delayed-shared workspaces. This dissertation explores these collaborative features particularly focusing on the record, representation, and playback of these workspaces. Second, this dissertation introduces the notion of workspaces as authored, transportable, and temporally-aware artifacts and investigates flexible mechanisms for the delivery of temporal-awareness to workspaces composed of heterogeneous applications through the decoupling of tool and media services. This dissertation develops a formal specification through temporal relationships in these workspaces. Finally, this dissertation describes mechanisms for the integration of re-executable content streams together with continuous multimedia streams. It describes a framework for media-independent integration of these heterogeneous streams and their temporal constraints. The dissertation introduces the use of statistical process controls to guide the relaxation and/or constraint of scheduling and synchronization mechanisms so as to flexibly support a range of heterogeneous media streams and their temporal relationships.

CSE-TR-346-97

Limited Dual Path Execution

Tyson Lick and Farrens

Aug, 97

This work presents a hybrid branch predictor scheme that uses a limited form of dual path execution along with dynamic branch prediction to improve execution time across a broad set of applications. The ability to execute down both paths of a conditional branch enables branch penalty to be minimized; however, relying exclusively on dual path execution is infeasible due to the a cascading effect because instruction fetch rates far exceed the capability of the pipeline to retire a single branch before others must be processed. By using confidence information, available in the dynamic branch prediction state tables, a limited form of dual path execution becomes feasible. The reduces the burden on the branch predictor by allowing predictions of low confidence to be avoided. In this study we present a new approach to gather branch prediction confidence with little or no overhead, and use this confidence mechanism to determine whether dual path execution or branch prediction should be used.Comparing this hybrid predictor model to the dynamic branch predictor shows a dramatic decrease in misprediction rate (from 30 to 60 %). This translates to an reduction in runtime of 10 - 15%. These results imply that dual path execution, which often is thought to be a resource consuming method, may be a worthy approach if restricted with a predicting set.

CSE-TR-347-97

Adaptive Packet Marking for Providing Differentiated Services in the Internet

Feng Kandlur Saha and Shin

Oct, 97

This paper examines the use of adaptable priority marking for providing soft bandwidth guarantees to individual connections or connection groups over the Internet. The proposed scheme does not require resource reservation for individual connections and can be supported with minimal changes to the network infrastructure. The scheme uses modest support from the network in the form of priority handling for appropriately marked packets and relies on intelligent transmission control mechanisms at the edges of the network to achieve the desired throughput levels. The paper describes the control mechanisms and evaluates their behavior in various network environments. The mechanisms described have the flexibility to adapt to different network environments, thereby making them suitable for deployment in an evolving Internet.

CSE-TR-348-97

Flexible Timing Simulation of Multiple-Cache Configurations

Tam Rivers and Davidson

Oct, 97

As the gap between processor and memory speeds increases, cache performa nce becomes more critical to overall system performance. Behavioral cache simul ation is typically used early in the design cycle of new processor/cache configu rations to determine the performance of proposed cache configurations on target workloads. However, behavioral cache simulation does not account for the latenc y seen by each memory access. The Latency-Effects (LE) cache model presented in this paper accounts this nominal latency as well as the additional latencies du e to trailing-edge effects, bus width considerations, port conflicts, and the nu mber of outstanding accesses that a cache allows before it blocks. We also exte nd the LE cache model to handle the latencies seen for moving data among multipl e caches. mlcache, a new, easily configurable and extensible tool, has b een built based on the extended LE model. We show the use of mlcache in estimating the performance of traditional and novel cache configurations, includ ing odd/even, 2-level, Assist, Victim, and NTS caches. We also show how the LE cache timing model provides more useful, realistic performance estimates than ot her possible behavioral-level cache timing models.

CSE-TR-349-97

Techniques for Eliminating Packet Loss in Congested TCP/IP Networks

Feng Kandlur Saha and Shin

Nov, 97

The congestion control mechanisms used in TCP have been the focus of numerous studies and have undergone a number of enhancements. However, even with these enhancements, TCP connections still experience alarmingly high loss rates, especially during times of congestion. To alleviate this problem, the IETF is considering active queue management mechanisms, such as RED, for deployment in the network. In this paper, we first show that even with RED, loss rates can only be reduced marginally in times of congestion. We then show that the solution to this problem lies in decoupling congestion notification from packet loss through the use of explicit congestion notification (ECN) and in understanding the traffic generated by an aggregation of a large number of sources. Given this, we then propose and experiment with more adaptive active queue management algorithms (Adaptive RED) and more conservative end-host mechanisms (SubTCP) which can significantly reduce loss rates across congested links. When used together, these mechanisms can effectively eliminate packet loss while maintaining high link utilizations under the most difficult scenarios.

CSE-TR-350-97

Dynamics of Quality-of-Service Routing with Inaccurate Link-State Information

Shaikh Rexford and Shin

Nov, 97

Quality-of-service (QoS) routing can satisfy application performance requirements and optimize network resource usage by selecting paths based on connection traffic parameters and link load information. However, effective path-selection schemes require the distribution of link-state information, which can cause a significant burden on the bandwidth and processing resources in the network. We investigate the fundamental tension between network overheads and the quality of routing decisions in the context of source-directed QoS routing algorithms. In contrast to previous performance studies that compare different routing algorithms under specific network configurations, we characterize how the performance and overheads of QoS routing relate to the link-state update policies, as a function of the underlying traffic load, network topology, and link-cost metrics. We explore the interplay between stale link-state information and random fluctuations in traffic load through a broad set of simulation experiments on a parameterized model of QoS routing. These results suggest ways to tune the frequency of link-state update messages and the link-cost functions to strike a careful balance between high accuracy and low complexity.

CSE-TR-351-97

Stateful Multicast Services for Supporting Collaborative Applications

Shim Hall Litiu and Prakash

Nov, 97

Collaborative, multi-user applications require group multicast services with ordering guarantees for maintaining consistency of replicated shared state among collaborating processes. In traditional group multicast services (such as Isis), the groups shared state is maintained only by the clients and the multicast service facilitates atomic state transfer from existing clients to new members. In this paper, we argue that, in order to support collaborative applications in Internet-type environments, a copy of the groups state should also be maintained by the multicast service. We show that by maintaining a copy of groups state, the multicast service can provide consistently fast group-join and state transfer times when both slow and fast clients are present or when clients are unreliable --- a crucial requirement in collaborative, multi-user applications where users may dynamically join and leave a collaborative session and expect predictable join times and interactive response time even in the presence of slow or unreliable clients. We show that the overheads incurred by a multicast service in managing each groups shared state can be made minimal and that the multicast service does not have to be aware of the semantics of the groups state. We present the design of such a multicast service, present performance results, and discuss how it meets the various needs of collaborative applications.

CSE-TR-352-97

Individual and Group QoS Issues in Communication Services for Groupware Systems

Litiu and Prakash

Nov, 97

The proliferation of computer networks in the last decade and the ubiquity of the World Wide Web have led to increased interest in the development of computer-supported cooperative work (CSCW) systems. Collaborative, multi-user applications require group multicast services that provide ordering guarantees for maintaining consistency of replicated shared context as well as provide a high degree of interactivity, even under varying load on the multicast servers. While the most common view of the quality of service (QoS) in a distributed system is in terms of the guarantee of the network connection parameters (bandwidth, end-to-end delay), in this paper we investigate QoS from the perspective of the various requirements placed on group communication servers with limited resources by multiple and diverse groups of collaborative users. We show that in the absence of QoS considerations in the design of a group communication service, some groups or individual users can be severely affected by bursty traffic or increase in the size of other groups. We present the design of a best-effort QoS-based adaptive group communication service for supporting reliable data communication in CSCW systems. Our QoS considerations address both groups requirements and individual users requirements. We present performance results showing the effectiveness of the approach and discuss some of the open issues for future work.

CSE-TR-353-97

Scalable Flow Control for Multicast ABR Services in ATM Networks

Zhang Shin Saha Kandlur

Nov, 97

We propose an efficient flow-control scheme for ATM ABR multicast services. We develop a second-order rate-control algorithm to deal with the variation in RM (Resource Management) cell feedback delay resulting from dynamic ``drift' of the bottleneck location within a multicast tree. The proposed scheme makes the rate process converge to the available bandwidth of the multicast-connections most congested link. It also confines the buffer occupancy to a target regime bounded by a given (finite) buffer capacity at the bottleneck node. Using fluid approximation, we model the proposed scheme and study the system dynamics under the most stressful traffic conditions. We derive expressions for queue buildups and average throughputs in both transient and equilibrium states. We identify the system control factors that govern the system dynamics and develop an optimal control condition which guarantees monotonic convergence of the system state to the target regime from an arbitraryinitial value.

CSE-TR-354-97

Integrated Rate and Credit Based Flow Control for Unicast ABR Service in ATM Networks

Zhang Shin Zheng

Nov, 97

We propose a flow-control scheme that combines the merits of credit- and rate-based flow-control schemes by applying {it direct/} control over {it both} bandwidth and buffer resources. The goal of the proposed scheme is to design an optimal rate-control policy for a given finite buffer capacity by %that maximizing average throughput and bounding end-to-end delay. By applying the second-order rate control, the proposed scheme not only makes the rate process converge to the neighborhood of link bandwidth, but also confines the queue-length fluctuation to a regime bounded by buffer capacity. Using hop-by-hop credit flow control, the proposed scheme ensures lossless transmission with a finite buffer capacity while maintaining high network utilization. The source dynamically adjusts its transmission rate and rate-control parameters according to both the network congestion status and whether or not the system states are in the target operating regime. Using the fluid approximation method, we model the proposed flow-control scheme and study the system dynamic behavior for ABR (Available Bit Rate) service under the most stressful traffic condition. We derive the expressions for queue build-ups and average throughput in the equilibrium states as functions of rate-control parameters, feedback delay, link bandwidth, and buffer capacity. Based on the analytical results, we identify the optimal equilibrium state and propose the second-order rate control algorithm for transient-state flow control. We derive a sufficient condition which guarantees the second-order rate control to drive the system from any transient state to an optimal equilibrium state. The worst-case performance bounds, derived as the closed-form functions of flow-control parameters, provide an analytical means of evaluating the performance of the second-order rate control in terms of convergence speed and buffer utilization. The analytical results %in both transient and equilibrium states for both single and multiple connection scenarios have shown the proposed scheme to be stable and efficient in that the source rate and bottleneck queue length rapidly converge to a small neighborhood of the designated operating point. Also, presented are examples showing that the proposed scheme outperforms the other existing schemes.

CSE-TR-355-97

The Semantics of the Java Programming Language: Preliminary Version

Wallace

Dec, 97

A mathematical model of the Java programming language is constructed.

CSE-TR-356-98

Real-Time Primary-Backup (RTPB) Replication with Temporal Consistency Guarantees

Zou and Jahanian

Feb, 98

A common approach to building fault-tolerant distributed systems is to replicate servers that fail independently. The objective is to give the clients the illusion of service that is provided by a single server. The main approaches for structuring fault-tolerant servers are emph{passive} and emph{active} replication. This paper presents a primary-backup (passive) replication scheme for supporting fault-tolerant real-time applications. The proposed scheme, called Real-Time PB (RTPB) replication service, is an elegant and simple approach that provides the benefits of fault-tolerance, real-time access, and temporal consistency guarantees that are not otherwise easily attainable. This paper formally defines two types of temporal consistency, namely emph{external temporal consistency} and emph{inter-object temporal consistency}. By introducing a key concept called emph{phase variance}, we are able to build our temporal consistency models and derive necessary and sufficient conditions that can be used as the basis for update and transmission scheduling that achieve temporal consistency guarantees. Furthermore, we prove that the term emph{phase variance} used in the models can be bounded under various scheduling algorithms, namely EDF, Rate Monotonic~cite{Liu:73}, and Distance-Constrained Scheduling~cite{Han:92}. The paper also presents an implementation of the real-time primary-backup replication scheme with the aforementioned temporal consistency models. This implementation was developed within the $x$-kernel architecture on the MK 7.2 microkernel from the Open Group. The results of a detailed performance evaluation of this implementation is also discussed.

CSE-TR-357-98

Configuration Independent Analysis for Characterizing Shared-Memory Applications

Abandah and Davidson

Jan, 98

Characterizing shared-memory applications provides insight to design efficient systems, and provides awareness to identify and correct application performance bottlenecks. Configuration dependent analysis is often used to simulate detailed application traces on a particular hardware model. The communication traffic and computation workload generated by the application trace is used as a characterization of this application. This paper demonstrates that configuration independent analysis is a useful tool to characterize shared-memory applications. Configuration independent analysis characterizes inherent application characteristics that do not change from one configuration to another. While configuration dependent analysis is repeated for each target configuration, configuration independent analysis is only performed once. Moreover, configuration independent analysis does not require developing models for the target configurations and is faster than detailed simulation. However, configuration dependent analysis directly provides more information about specific configurations. A combination of the two analysis types constitutes a comprehensive and efficient methodology for characterizing shared-memory applications. In this paper, we show how configuration independent analysis is used to characterize eight aspects of application behavior: general characteristics, working sets, concurrency, communication patterns, communication variation over time, communication slack, communication locality, and sharing behavior. We illustrate the advantages and limitations of this approach by analyzing eight case-study benchmarks from the scientific and commercial domains and interpreting the results.

CSE-TR-358-98

The DiskSim Simulation Environment Version 1.0 Reference Manual

Ganger Worthington and Patt

Feb, 98

DiskSim is an efficient, accurate and highly-configurable disk system simulator developed at the University of Michigan to support research into various aspects of storage subsystem architecture. It includes modules that simulate disks, intermediate controllers, and buses, device drivers, request schedulers, disk block caches, and disk array data organizations. In particular, the disk drive module simulates modern disk drives in great detail and has been carefully validated against several productions disks (with accuracy that exceeds any previously reported simulator). This manual describes how to configure and use DiskSim, which has been made publicly available with the hope of advancing the state-of-the-art in disk system performance evaluation in the research community. The manual also briefly describes DiskSim�s internal structure and various validation results.

CSE-TR-359-98

Constructive Multilevel Logic Synthesis Under Properties of Boolean Algebra

Kravets and Sakallah

Mar, 98

We describe a new constructive multilevel logic synthesis system that integrates the traditionally sepa rate technology-independent and technology-dependent stages of modern synthesis tools. Dubbed M32, this system is capable of generating circuits incrementally based on both functional as well as structural considerations. This is achieved by maintaining a dynamic structural representation of the evolving implementation and by refining it through progressive introduction of gates from a target technology library. Circuit construction proceeds from the primary inputs towards the primary out puts. Preliminary experimental results show that circuits generated using this approach are generally superior to those produced by multi-stage synthesis.

CSE-TR-360-98

Block Enlargement Optimizations for Increasing the Instruction Fetch Rate in Block-Structured Instruction Set Architectures

Hao

Mar, 98

129

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs are a new class of instruction set architectures that we designed to address the performance obstacles faced by processors attempting to exploit high levels of instruction level parallelism. The major distinguishing feature of a block-structured ISA is that it defines the architectural atomic unit (i.e. the instruction)( to be a group of operations which is called an atomic block. This dissertation defines an optimization, block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. A compiler that generates block-structured ISA processor were constructed to evaluate the performance benefit of block-structured ISAs. This dissertation shows that for the SPECint95 benchmarks, the block-structured ISA processor executing enlarged atomic blocks and using simpler microarchitectural mechanisms to support wide-issue and dynamic scheduling out performs a conventional ISA processor that also supports wide-issue and dynamic scheduling by 28% when assuming perfect branch prediction and by 15% when using real branch prediction.

CSE-TR-361-98

Improving Performance of an L1 Cache With an Associated Buffer

Srinivasan

Mar, 98

Memory access latencies are much larger than processor cycle times, and the trend has been for this gap to increase over time. Cache performance becomes critical in bridging this gap. However, since it is difficult to make a cache both large and fast, cache misses are expected to continue to have a significant performance impact. Victim caching, proposed by Jouppi, is an approach to decrease the miss ratio of direct-mapped caches without affecting their access time. NTS caching, proposed by Rivers is a multilateral cache design scheme that improves performance of first-level(L1) caches based on the temporal locality of the reference pattern. We propose an improvement of these schemes, which we call NT-victim caching. Taking the lead from NTS design we have a bilateral L1 cache, having a main cache (cache A) and a small fully associative buffer (cache B). Cache B is similar to a victim buffer, and holds the victim block replaced by a miss. A cache block is temporal if after it is brought into cache, some word in that block is accessed more than once before the block is replaced. Unlike Victim caches a block that is hit in cache B is swapped with a block in cache A only if it is temporal, and in most of our replacement strategies temporal blocks are less likely to be selected for replacement than non-temporal blocks. Every cache block loaded into L1 cache is monitored for temporal behavior by a hardware detection unit. Our results show that this design reduces the number of swaps between cache A and cache B, relative to the Victim cache, yet gives a comparable miss ratio.

CSE-TR-362-98

Reducing Communication Cost in Scalable Shared Memory Systems

Abandah

Apr, 98

162

Distributed shared-memory systems provide scalable performance and a convenient model for parallel programming. However, their non-uniform memory latency often makes it difficult to develop efficient parallel applications. Future systems should reduce communication cost to achieve better programmability and performance. We have developed a methodology, and implemented a suite of tools, to guide the search for improved codes and systems. As the result of one such search, we recommend a remote data caching technique that significantly reduces communication cost. We analyze applications by instrumenting their assembly-code sources. During execution, an instrumented application pipes a detailed trace to configuration independent (CIAT) and configuration dependent (CDAT) analysis tools. CIAT characterizes inherent application characteristics that do not change from one configuration to another, including working sets, concurrency, sharing behavior, and communication patterns, variation over time, slack, and locality. CDAT simulates the trace on a particular hardware model, and is easily retargeted to new systems. CIAT is faster than detailed simulation; however, CDAT directly provides more information about a specific configuration. The combination of the two tools constitutes a comprehensive and efficient methodology. We calibrate existing systems using carefully designed microbenchmarks that characterize local and remote memory, producer-consumer communication involving two or more processors, and contention when multiple processors utilize memory and interconnect. This dissertation describes these tools and illustrates their use by characterizing a wide range of applications and assessing the effects of architectural and technological advances on the performance of HP/Convex Exemplar systems, evaluates strengths and weaknesses of current system approaches, and recommends solutions. CDAT analysis of three CC-NUMA system approaches shows that current systems reduce communication cost by minimizing either remote latency or remote communication frequency. We describe four architecturally varied systems that are technologically similar to the low remote latency SGI Origin 2000, but incorporate additional techniques for reducing the number of remote communications. Using CCAT, a CDAT-like simulator that models communication contention, we evaluate the worthiness of these techniques. Superior performance is reached when processors supply cached clean data, or when a remote data cache is introduced that participates in the local bus protocol.

CSE-TR-363-98

mlcache: A Flexible Multi-Lateral Cache Simulator

Tam Rivers Tyson and Davidson

May, 98

As the gap between processor and memory speeds increases, cache performance becomes more critical to overall system performance. To address this, processor designers typically design for the largest possible caches that can still remain on the ever growing processor die. However, alternate, multi-lateral cache designs such as the Assist Cache, Victim Cache, and NTS Cache have been shown to perform as well as or better than larger, single structure caches while requiring less die area. For a given die size, reducing the requirements to attain a given rate of data supply can allow more space dedicated for branch prediction, data forwarding, increasing the size of the reorder buffer, etc. Current cache simulators are not able to study a wide variety of multi-lateral cache configurations. Thus, the mlcache multi-lateral cache simulator was developed to help designers in the middle of the design cycle decide which cache configuration would best aid in attaining the desired performance goals of the target processor. mlcache is an event-driven, timing-sensitive simulator based on the Latency Effects cache timing model. It can be easily configured to model various multi-lateral cache configurations using its library of cache state and data movement routines. The simulator can be easily joined to a wide range of event-driven processor simulators such as RCM_brisc, Talisman, SimICS, and SimpleScalar. We use the SPEC95 benchmarks to illustrate how mlcache can be used to compare the performance of several different data cache configurations

CSE-TR-364-98

Transistor Level Micro Placement and Routing for Two-Dimensional Digital VLSI Cell Synthesis

Riepe and Sakallah

Jun, 98

Previous research into the problem of cell library synthesis for digital VLSI design has concentrated mostly on relatively simple 1-dimensional cell topologies for static CMOS designs. Recent interest has emerged in less constrained 2-dimensional topologies to support more complex non-dual circuits such as latches and flip flops, as well as high performance circuit families such as CVSL, PTL, and domino CMOS. We discuss a CAD methodology which supports a generalized placement and routing approach to the realization of mask geometry for such complex circuits. We explore the options available within this methodology, show how the transistor level placement and routing problems at the transistor level differ from those at the block level, and present some results for a prototype tool, TEMPO, which adopts this methodology.

CSE-TR-365-98

A Scalable Flow-Control Algorithm for Point-to-Multipoint Communications in High-Speed Integrated Networks

Zhang and Shin

Jun, 98

We propose and study a scalable flow-control algorithm for multicast ABR (Available Bit Rate) service. A key idea behind the algorithm is the Soft-Synchronization Protocol (SSP), which derives a single ``consolidated' RM (Resource Management) cell at each multicast branch-point from feedback RM-cells of different downstream branches that are not necessarily responses to the same forward RM-cell in each synchronization cycle. Using balanced and unbalanced binary-tree models, we analyze the scalability of SSP in terms of height and structure of the multicast tree. In contrast with the other existing schemes, SSP is shown to be able to effectively support synchronization of feedback RM-cells and make the effective RM-cell roundtrip delay virtually independent of the multicast-trees height and structure, thus making it scalable. Another key problem in multicast flow-control is how to deal with the variation of RM-cell roundtrip delay due to the randomly drifting bottleneck in a multicast tree. This problem is handled by adapting the second-order rate-control parameter to roundtrip-delay variations. A fluid model is used to analyze the second-order rate control and study the system dynamics for multicast ABR service. We derive an analytical relationship between the second-order rate parameter and RM-cell roundtrip delay subject to both lossless transmission and finite buffer capacity constraints. This relationship ensures the feasibility of the second-order rate control in dealing with RM-cell roundtrip-delay variations and provides an insight on how the required buffer space can be controlled by adjusting the rate parameter. We develop an optimal control condition, under which the second-order rate control guarantees monotonic convergence of the system state to the optimal regime from an arbitrary initial value. The proposed second-order rate-control algorithm is also shown to be feasible and optimal in buffer-allocation efficiency and fairness at the bottleneck.

CSE-TR-366-98

A Scalable Key Distribution Hierarchy

McDaniel and Jamin

Jul, 98

As the use of the Internet for electronic commerce, audio and video conferencing, and other applications with sensitive content grows, the need for secure services becomes critical. Central to the success of these services is the support for secure public key distribution. Although there are several existing services available for this purpose, they are not very scalable, either because they depend on a centralized server or rely on ad hoc trust relationships. In this paper, we present and examine a flexible approach to certificate distribution scalable to arbitrarily large networks. We propose a two level hierarchy where certificates can be independently authenticated by one or more peer authorities, called keyservers. Certificates for end-user and host entities are managed within local domains, called enterprises. By administering certificates close to the source, we reduce the load on the key servers and the effects of network topology changes. We describe the design of our system and present a preliminary performance analysis based on traces of present-day DNS requests.

CSE-TR-367-98

Optimization of a Real-Time Primary-Backup Replication Service

Zou and Jahanian

Jul, 98

The primary-backup replication model is one of the commonly adopted approaches to providing fault tolerant data services. The extension of the primary-backup protocol to the real-time environment, however, imposes the additional constraint of timing predictability, which requires a bounded overhead for managing redundancy. There is a trade-off between reducing system overhead and increasing (temporal) consistency between the primary and backup, and the way to achieve balance between them is not so obvious. In this paper, we try to explore ways to optimize scheduling update messages from primary to backup while maintaining the temporal consistency guarantees of the system. Depending on the purpose of the individual replication service and the environment in which it is running, very different objectives can be sought in the process of optimization. This paper considers the optimization problem from two perspectives with one aimed to minimize the average temporal distance between the primary and backup, and the other aimed to minimize the resource being used in maintaining a given temporal constraint on the system. Corresponding optimization techniques have been developed for these two diverse objectives and an implementation of such techniques is also presented. The implementation is built on top of the existing RTPB model developed in citation Zou2:98 which in turn was developed within the x-kernel architecture on the Mach OSF platform running MK (mach kernel) 7.2. Results of an experimental evaluation of the proposed optimization techniques are also discussed.

CSE-TR-368-98

Origins of Internet Routing Instability

Labovitz Malan and Jahanian

Jul, 98

This paper examines the network routing messages exchanged between core Internet backbone routers. Internet routing instability, or the rapid fluctuation of network reachability information, is an important problem currently facing the Internet engineering community. High levels of network instability can lead to packet loss, increased network latency and time to convergence. At the extreme, high levels of routing instability have led to the loss of internal connectivity in wide-area, national networks. In an earlier study of inter-domain routing, we described widespread, significant pathological behaviors in the routing information exchanged between backbone service providers at the major U.S. public Internet exchange points. These pathologies included several orders of magnitude more routing updates in the Internet core than anticipated, large numbers of duplicate routing messages, and unexpected frequency components between routing instability events. The work described in this paper extends our earlier analysis by identifying the origins of several of these observed pathological Internet routing behaviors. We show that as a result of specific router vendor software changes suggested by our earlier analysis, the volume of Internet routing updates has decreased by an order of magnitude. We also describe additional router software changes that can decrease the volume of routing updates exchanged in the Internet core by an additional 30 percent or more. We conclude with a discussion of trends in the evolution of Internet architecture and policy that may lead to a rise in Internet routing instability.

CSE-TR-369-98

Fixed-Choice and Independent-Choice Logics

Blass and Gurevich

Aug, 98

We study extensions of first-order logic with the choice construct (choose x : phi(x)). We prove some results about Hilberts epsilon operator, but in the main part of the paper we consider the case when all choices are independent.

CSE-TR-370-98

Malleable Shared Workspaces to Support Multiple Usage Paradigms

Lee and Prakash

Aug, 98

Several recent systems provide a room-based metaphor to represent shared workspaces that require use of multiple collaborative tools. These systems provide users with a fairly static usage paradigm of room-centered collaboration, requiring users to mold their collaborative activities to the paradigm rather than molding the paradigm to fit the requirements of their collaborative activities. In this paper, we propose a powerful and yet simple event-action based model, augmented with access control and multi-user features, for room-based systems for providing a high degree of malleability so that these systems can be adapted to provide support for a variety of collaborative facilites, such as call centers, mailboxes, shared repositories, and role-based collaboration, including facilities that are not necessarily anticipated by system designers. The model can be used by both system developers as well as by end-users to customize a system to meet the requirements of their group tasks.

CSE-TR-371-98

Specification and Verification of Pipelining in the ARM2 RISC Microprocessor

Huggins and Van Campenhout

Aug, 98

CSE-TR-372-98

Assembly as a Noncooperative Game of its Pieces: Analysis of 1D Sphere Assemblies

Bozma and Koditschek

Aug, 98

We propose an event-driven algorithm for simple robot assembly problems based on the noncooperative game theory. We examine rigorously the simplest setting-three bodies with one degree of fredom ands offer extensive simulation for the 2 DOF extension. The intial analysis and the accompanying simulations suggest that this approach may indeed offer an attractive means of building robust event driven assembly systems.

CSE-TR-373-98

Performance Limits of Trace Caches

Postiff Tyson and Mudge

Sep, 98

A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically noncontiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-ported cache structure. This paper compares trace cache performance to the theoretical limit of a three-block fetch mechanism equivalent to an idealized 3-ported instruction cache with a perfect alignment network. Several new metrics are defined to formalize analysis of the trace cache. These include fragmentation, duplication, indexability, and efficiency metrics. We show that performance is more limited by branch mispredictions than ability to fetch multiple blocks per cycle. As branch prediction improves, high duplication and the resulting low efficiency are shown to be among the reasons that the trace cache does not reach its upper bound. Based on the shortcomings of the trace cache discovered in this paper, we identify some potential future research areas.

CSE-TR-374-98

Optimal Few-Stage Allocation Procedures

Hardwick and Stout

Sep, 98

This paper gives very general algorithms for the design of optimal experiments involving two Bernoulli populations in which sampling is carried out in stages. It is assumed that the outcomes of the previous stage are available before the allocations for the next stage are decided. At each stage, one must decide how many observations to take and how many to sample from each of the alternative populations. Of particular interest are 2- and 3-stage experiments. To illustrate that the algorithms can be used for experiments of useful sample sizes, they are applied to estimation and optimization problems. Results indicate that, for problems of moderate size, published asymptotic analyses do not always represent the true behavior of the optimal stage sizes, and efficiency may be lost if the analytical results are used instead of the true optimal allocation. Our results also suggest that one might approach large problems by extrapolating optimal solutions for moderate sample sizes; and, that approaches of this sort could give design guidelines that are far more explicit (and hopefully closer to optimal) than those obtained through asymptotic analyses alone. The examples also show the ease with which the computational approach can solve problems that present analytical difficulties. This allows one to use models that more accurately represent important characteristics of the problem. It is also shown that in various cases the base algorithms can be modified to incorporate simplifications. In such settings, significant speedups and space reductions can be obtained, permitting the exact solution of even larger problems.

CSE-TR-375-98

Realizing Services for Guaranteed-QoS Communication on a Microkernel Operating System

Mehra Shaikh Abdelzaher Wang and Shin

Sep, 98

Provision of end-to-end QoS guarantees on communication necessitates appropriate support in the end systems (i.e., hosts) and network routers that form the communication fabric. Typically, the support is in the form of suitable extensions to the communication subsystem and the underlying operating system for specification and maintenance of QoS guarantees. This paper focuses on the architectural and implementation challenges involved in realizing QoS-sensitive host communication subsystems on contemporary microkernel operating systems with limited real-time support. We motivate and describe the components constituting our integrated service architecture that together ensure QoS-sensitive handling of network traffic at both sending and receiving hosts. We separate the policies from mechanisms in each component, demonstrating a communication framework that can implement alternative QoS models by applying appropriate policies. We also report the results of a detailed execution profile of the system to characterize communication costs for the purposes of admission control. An experimental evaluation in a controlled configuration demonstrates the efficacy with which QoS guarantees are maintained, despite limitations imposed by the underlying operating system.

CSE-TR-376-98

Windowed Key Revocation in Public Key Infrastructures

McDaniel and Jamin

Oct, 98

A fundamental problem inhibiting the wide acceptance of a Public Key Infrastructure (PKI) in the Internet is the lack of a mechanism that provides scalable certificate revocation. In this paper, we propose a novel mechanism called Windowed Revocation. In windowed revocation, certificate revocation is announced for short periods in periodic Certificate Revocation Lists (CRLs). Due to the assurances provided by the protocol over which certificates are retrieved, we bound the amount of time that any certificate is cached by users. Thus, we can limit the announcement of revocation only to the time in which the certificate may be cached; not until its expiration. Because the time in which certificate are announced is short, CRLs are similarly small. By limiting the size of CRLs, we are able to integrate other mechanisms that increase the scalability of the PKI. One such mechanism is the use of ``pushed' CRLs using multicast. We include a proof of the correctness of our approach.

CSE-TR-377-98

A Program for Sequential Allocation of Three Bernoulli Populations

Hardwick Oehmke and Stout

Oct, 98

We describe a program for optimizing and analyzing sequential allocation problems involving three Bernoulli populations. Previous researchers had considered this problem computationally intractable, and we know of no prior exact optimizations for such problems, even for very small sample sizes. Despite this, our program is currently able to solve problems of size 200 or more by using a parallel computer, and problems of size 100 on a workstation. We describe the program and the techniques used to enable it to scale to large sample sizes. As an illustrative example, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. The bandit procedure is then compared to two other allocation procedures along various frequentist and Bayesian metrics. We also indicate extensions of the program that enable it to solve a variety of related problems. Keywords: multi-arm bandit, parallel computing, dynamic programming, adaptive allocation, sequential sampling, clinical trial, load balancing, high-performance computing, recursive equations, design of experiments, statistics

CSE-TR-378-98

Modeling Dual-Task Performance Improvement: Casting Executive Process Knowledge Acquisition as Strategy Refinement

Chong

Oct, 98

152

People demonstrate a remarkable ability to perform complex, multiple-task activities in spite of the limitations of our sensory, perceptual, cognitive, and motor systems. A prominent theory that addresses how multiple-tasks activities are performed is that of the executive process. Some of the functions of the executive process include enforcing task priorities and arbitrating access to limited resources. It has been shown that a time-sharing skill (or executive-process knowledge) is acquired during training on dual-task combinations. This dissertation presents the development of a computational, task-independent framework for modeling the acquisition of the knowledge acquired during training on dual-task combinations-� executive process knowledge. On a selected dual-task combination�-a continuous tracking task and a discrete two-choice reaction time task-�this framework, when given the declarative and procedural representation of the novice task, has produced an expert model whose performance is a good match to empirical reaction time and tracking error data for the task combination. There are three main contributions of this work. First is the development of EPIC-Soar, a symbolic hybrid architecture that possesses a psychologically-motivated learning mechanism and psychologically-plausible perception and motor systems. Second is the identification and classification of executive process knowledge and the taxonomies that result from this analysis. Third, is an acquisition framework which consists of: a novel data structure for representing task strategies; a task-independent procedure for resolving simultaneous access for motor resources and learning new knowledge that avoids such collisions in the future; a second task-independent learning procedure which refines the strategy data structure and creates new procedural knowledge for performing the task; and a collection of guidelines that regulate how and when promotions are applied.

CSE-TR-379-98

Ensuring Reasoning Consistency in Hierarchical Architectures

Wray

Nov, 98

149

Agents often dynamically decompose a task into a hierarchy of subtasks. Hierachical task decomposition reduces the cost of knowledge design in comparison to non-hierarchical knowledge because knowledge is simplified and can be shared across multiple tasks. However, hierarchical decomposition does have limitations. In particular, hierarchical decomposition can lead to inconsistency in agent processing, resulting potentially in irrational behavior. Further, an agent must generate the decomposition hierarchy before reacting to an external stimulus. Thus, decomposition can also reduce agent responsiveness. This thesis describes ways in which the limitations of hierarchical decomposition can be circumvented while maintaining inexpensiveknowledge design and efficient agent processing. We introduce Goal-oriented Heuristic Hierarchical Consistency (GOHHC), which eliminates across-level inconsistency. GOHHC computes logical dependencies in asserted knowledge between goals rather than individual assertions, thus avoiding the computational expense of maintaining dependencies for all assertions. Although this goal-oriented heuristic ensures consistency, it does sometime lead to unnecessary repetition in reasoning and delay in task execution. We show empirically that these drawbacks are inconsequential in execution domains. Thus, GOHHC provides an efficient guarantee of processing consistency in hierarchical architectures. Ensuring consistency also provides an architectural framework for unproblematic knowledge compilation in dynamic domains. Knowledge compilation can be used to cache hierarchical reasoning and thus avoid the delay in reaction necessitated by decomposition. An empirical investigation of compilation in hierarchical agents shows that compilation can lead to improvement in both overall performance and responsiveness, while maintaining the low design cost of hierarchical knowledge.

CSE-TR-380-98

Code Compression for DSP

Lefurgy and Mudge

Nov, 98

Previous works have proposed adding compression techniques to a variety of architectural styles to reduce instruction memory requirements. It is not immediately clear how these results apply to DSP architectures. DSP instructions are longer and have potentially greater variation which can decrease compression ratio. Our results demonstrate that DSP programs do provide sufficient repetition for compression algorithms. We propose a compression method and apply it to SHARC, a popular DSP architecture. Even using a very simple compression algorithm, it is possible to halve the size of the instruction memory requirements. Keywords: Compression, Code Density, Code Space Optimization, DSP, Embedded Systems

CSE-TR-381-98

Evaluating the Overheads of Source-Directed Quality-of-Service Routing

Shaikh Rexford and Shin

Dec, 98

Quality-of-service (QoS) routing satisfies application performance requirements and optimizes network resource usage by selecting paths based on connection traffic parameters and link load information. However, effective path-selection schemes require the distribution of link-state information, which can impose a significant burden on the bandwidth and processing resources in the network. We investigate the fundamental trade-off between network overheads and the quality of routing decisions in the context of the source-directed link-state routing protocols proposed for future IP and ATM networks. In contrast to previous work that compares different routing algorithms under specific network configurations, we construct a detailed model of QoS routing that parameterizes the path-selection algorithm, link-cost function, and link-state update policy. Through extensive simulation experiments with several representative network topologies and traffic patterns, we uncover the effects of stale link-state information and random fluctuations in traffic load on the routing and signalling overheads. We then investigate how the inaccuracy of link-state information interacts with the size and connectivity of the underlying topology. Finally, we show that by tuning the coarseness of the link-cost metric to the inaccuracy of underlying link-state information we can reduce the computational complexity of the path-selection algorithm without significantly degrading performance. The paper concludes by summarizing our key results as a list of guidelines for designing efficient quality-of-service routing policies in large backbone networks.

CSE-TR-382-98

Experimental Study of Internet Stability and Wide-Area Backbone Failures

Labovitz Ahuja and Jahanian

Dec, 98

In this paper, we describe an experimental study of Internet stability and the origins of failure in Internet protocol backbones. The stability of end-to-end Internet paths is dependent both on the underlying telecommunication switching system, as well as the higher level software and hardware components specific to the Internets packet-switched forwarding and routing architecture. Although a number of earlier studies have examined failures in the public telecommunication system, little attention has been given to the characterization of Internet stability. Our paper analyzes Internet failures from three different perspectives. We first examine several recent major Internet failures and their probable origins. These empirical observations illustrate the complexity of the Internet and show that unlike commercial transaction systems, the interactions of the underlying components of the Internet are poorly understood. Next, our examination focuses on the stability of paths between Internet Service Providers. Our analysis is based on the experimental instrumentation of key portions of the Internet infrastructure. Specifically, we logged all of the routing control traffic at five of the largest U.S. Internet exchange points over a three year period. This study of network reachability information found unexpectedly high levels of path fluctuation and an aggregate low mean time between failures for individual Internet paths. These results point to a high level of instability in the global Internet backbone. While our study of the Internet backbone identifies major trends in the level of path instability between different service providers, these results do not characterize failures inside the network of service provider. The final portion of our paper focuses on a case study of the network failures observed in a large regional Internet backbone. This examination of the internal stability of a network includes twelve months of operational failure logs and a review of the internal routing communication data collected between regional backbone routers. We characterize the type and frequency of failures in twenty categories, and describe the failure properties of the regional backbone as a whole.

CSE-TR-383-99

Agent Communication with Differentiated Ontologies: eight new measures of description compatibility

Weinstein and Birmingham

Jan, 99

We propose an approach to achieve appropriate exchange of services and data in distributed systems subject to semantic heterogeneity. We assume _differentiated ontologies_: that terms have formal definitions as concepts related to other concepts, that local concepts inherit from concepts that are shared, and that most or all primitives are shared. We then develop measures of _description compatibility_ using the structure of the source and target definitions. We evaluate these measures by generating description-logic ontologies in artificial worlds. In our simulations, the "meaning" of a concept is its denotation in a finite universe of instances. The accuracy of the description compatibility measures can thus be judged by their success in predicting the overlap of concept denotations. Description compatibility can be used to guide agent search for services across communities that subscribe to differentiated ontologies.

CSE-TR-384-99

The Performance of Tuple Difference Coding for Compressing Databases and Data Warehouses

Wu and Ravishankar

Feb, 99

Databases and data warehouses have grown dramatically in size in recent years, and we are already seeing data warehouses that are tens of terabytes in size. Compression is important in such systems because it reduces space requirements as well as response times for queries, which are typically I/O-bound. Conventional compression methods tend to compress and decompress objects in their entirety, and are generally unsuitable for databases which require selective access to points in the space of tuples they represent. The database compression method called Tuple-Difference Coding (TDC) has been shown to meet the requirements for database compression, and to achieve high compression ratios in practice. The factors affecting the performance of TDC are known, but their effects are not well understood. This paper presents a theoretical analysis of the performance of TDC, and verifies these results through simulations. It presents analytical expressions for estimating the compression ratios when database tuples are composed of one or more attributes drawn from various distributions. It also studies the effects of attribute domain reordering on the compression ratio. Our simulations show excellent agreement with theory.

CSE-TR-385-99

Mining Decentralized Data Repositories

Crestana and Soparkar

Feb, 99

Current technology for mining data typically applies to data stored centrally (i.e., in one single repository, homogeneous, with central administration, and a single schema). For instance, association rules algorithms assume that the data of interest is available at one location in a single table, or that prior to mining, the data will be warehoused centrally. However, it is important to study the mining of decentralized data (i.e., consisting of several tables, perhaps obtained via normalization or partitioning and allocation, stored in several repositories with possibly separate administration and schema). Real-life database design, management, and performance aspects suggest these considerations. While there have been a few extensions to mining algorithms for such data, the algorithms developed have largely been aimed at parallel processing (as opposed to the specific issues relating to decentralized data). In this paper, we examine the issues associated with mining decentralized data. We motivate the need for developing decentralized techniques, and identify the problems that must be addressed. We describe the mining of association rules for the case of a star schema wherein several smaller dimension tables are associated by foreign keys to a central fact table, and present an algorithm that adapts standard algorithms to work efficiently with such data. In contrast to approaches where the data would have been joined first to form a single table, we exploit the foreign key relationships to develop decentralized algorithms that execute concurrently on the separate tables, and thereafter merge the results. We provide analyses to assess our techniques, and we present empirical studies for particular cases to validate the performance.

CSE-TR-386-99

The Use of the MPI Communication Library in the NAS Parallel Benchmarks

Tabe and Stout

Feb, 99

The statistical analysis of traces taken from the NAS Parallel Benchmarks can tell one much about the type of network traffic that can be expected from scientific applications run on distributed-memory parallel computers. For instance, such applications utilize a relatively few number of communication library functions, the length of their messages is widely varying, they use many more short messages than long ones, and within a single application the messages tend to follow relatively simple patterns. Information such as this can be used by hardware and software designers to optimize their systems for the highest possible performance.

CSE-TR-387-99

BLUE: A New Class of Active Queue Management Algorithms

Feng Kandlur Saha and Shin

Mar, 99

In order to stem the increasing packet loss rates caused by an exponential increase in network traffic, the IETF is considering the deploymentof active queue management techniques such as RED. While active queue management can potentially reduce packet loss rates in the Internet, this paper shows that current techniques are ineffective in preventing high loss rates. The inherent problem with these queue management algorithms is that they all use queue lengths as the indicator of the severity of congestion. In light of this observation, a fundamentally different active queue management algorithm called Blue is proposed. Blue uses packet loss and link idle events to manage congestion. Using simulation and controlled experiments, Blue is shown to perform significantly better than RED, both in terms of packet loss rates and buffer size requirements in the network. As an extension to Blue, a novel technique for enforcing fairness among a large number of flows is described. In particular, this paper proposes and evaluates Stochastic Fair Blue (SFB), a queue management algorithm which can identify and rate-limit non-responsive flows using a very small amount of state information.

CSE-TR-388-99

Distributed QoS Routing with Bounded Flooding for Real-Time Applications

Kweon and Shin

Mar, 99

Quality-of-Service (QoS) routing is indispensable to real-time applications on integrated services packet-switching networks. However, existing QoS-routing schemes based on either request flooding or link-state dissemination are not efficient in terms of their message overhead and/or operational cost. This paper proposes a QoS-routing algorithm with bounded flooding which not only dramatically reduces message overhead and operational cost, but also provides good performance. This significant overhead reduction is achieved by limiting the search area within which connection-request messages are flooded. Since our scheme is based on limited flooding, it does not require on-demand path calculation nor link-state dissemination, which are often known to be very expensive. In order to limit the search area, we employ a distance table at each node storing hop counts of the shortest paths to all the nodes reachable via its outgoing links. Unlike link-state routing, the proposed scheme only needs network-topology information that changes much less frequently than link-load states. Extensive simulation results have demonstrated the effectiveness of the proposed scheme in terms of performance, operational cost, and message overhead. Especially, in an ordinary operating condition in which the networks physical topology does not change frequently, it is shown to be much better than all the other existing QoS-routing schemes in terms of message overhead and operational cost, while providing reasonably good performance.

CSE-TR-389-99

Improving Processor Performance by Dynamically Pre-Processing the Instruction Stream

Dundas

Apr, 99

266

The exponentially increasing gap between processors and off-chip memory, as measured in processor cycles, is rapidly turning memory latency into a major processor performance bottleneck. Traditional solutions, such as employing multiple levels of caches, are expensive and do not work well with some applications. We evaluate a technique, called runahead pre-processing, that can significantly improve processor performance. The basic idea behind runahead is to use the processor pipeline to pre-process instructions during cache miss cycles, instead of stalling. The pre-processed instructions are used to generate highly accurate instruction and data stream prefetches, while all of the pre-processed instruction results are discarded after the cache miss has been serviced: this allows us to achieve a form of very aggressive speculation with a simple in-order pipeline. The principal hardware cost is a means of checkpointing the sequential state of the register file and memory hierarchy while instructions are pre-processed. As we discard all pre-processed instruction results, the checkpointing can be accomplished with a modest amount of hardware. The instruction and data stream prefetches generated during runahead episodes led to a significant performance improvement for all of the benchmarks we examined. We found that runahead typically led to about a 30% reduction in CPI for the four Spec95 integer benchmarks that we simulated, while runahead was able to reduce CPI by 77% for the STREAM benchmark. This is for a five stage pipeline with two levels of split instruction and data caches: 8KB each of L1, and 1MB each of L2. A significant result is that when the latency to off-chip memory increases, or if the caching performance for a particular benchmark is poor, runahead is especially effective as the processor has more opportunities in which to pre-process instructions. Finally, runahead appears particularly well suited for use with high clock-rate in-order processors that employ relatively inexpensive memory hierarchies.

CSE-TR-390-99

Transistor Level Micro Placement and Routing For Two-Dimensional Digital VLSI Cell Synthesis

Riepe

Apr, 99

300

The automated synthesis of mask geometry for VLSI leaf cells, referred to as the cell synthesis problem, is an important component of any structured custom integrated circuit design environment. Traditional approaches based on the classic functional cell style of Uehara & VanCleemput pose this problem as a straightforward one-dimensional graph optimization problem for which optimal solution methods are known. However, these approaches are only directly applicable to static CMOS circuits and they break down when faced with more exotic logic styles. There is an increasing need in modern VLSI designs for circuits implemented in high-performance logic families such as Cascode Voltage Switch Logic (CVSL), Pass Transistor Logic (PTL), and domino CMOS. Circuits implemented in these non-dual ratioed logic families can be highly irregular with complex geometry sharing and non-trivial routing. Such cells require a relatively unconstrained two-dimensional full-custom layout style which current methods are unable to synthesize. In this work we define the synthesis of complex two-dimensional digital cells as a new problem which we call transistor-level micro-placement and routing. To address this problem we develop a complete end-to-end methodology which is implemented in a prototype tool named TEMPO. A series of experiments on a new set of benchmark circuits verifies the effectiveness of our approach. Our methodology is centered around techniques for the efficient modeling and optimization of geometry sharing, and is supported at two different levels. Chains of diffusion-merged transistors are formed explicitly and their ordering optimized for area and global routing. In addition, more arbitrary merged structures are supported by allowing electrically compatible adjacent transistors to overlap during placement. The synthesis flow in TEMPO begins with a static transistor chain formation step. These chains are broken at the diffusion breaks and the resulting sub-chains passed to the placement step. During placement, an ordering is found for each chain and a location and orientation is assigned to each sub-chain. Different chain orderings affect the placement by changing the relative sizes of the sub-chains and their routing contribution. We conclude with a detailed routing step and an optional compaction step.

CSE-TR-391-99

Report on the Study of University Policies on Intellectual Property

Galler

Apr, 99

CSE-TR-392-99

Load-Sensitive Routing of Long-Lived IP Flows

Shaikh Rexford and Shin

Apr, 99

Internet service providers face a daunting challenge in provisioning network resources, due to the rapid growth of the Internet and wide fluctuations in the underlying traffic patterns. The ability of dynamic routing to circumvent congested links and improve application performance makes it a valuable traffic engineering tool. However, deployment of load-sensitive routing is hampered by the overheads imposed by link-state update propagation, path selection, and signalling. Under reasonable protocol and computational overheads, traditional approaches to load-sensitive routing of IP traffic are ineffective, and can introduce significant route flapping, since paths are selected based on out-of-date link-state information. Although stability is improved by performing load-sensitive routing at the flow level, flapping still occurs, because most IP flows have a short duration relative to the desired frequency of link-state updates. To address the efficiency and stability challenges of load-sensitive routing, we introduce a new hybrid approach that performs dynamic routing of long-lived flows, while forwarding short-lived flows on static preprovisioned paths. By relating the detection of long-lived flows to the timescale of link-state update messages in the routing protocol, route stability is considerably improved. Through simulation experiments using a one-week ISP packet trace, we show that our hybrid approach significantly outperforms traditional static and dynamic routing schemes, by reacting to fluctuations in network load without introducing route flapping.

CSE-TR-393-99

Analytical Macro-Modeling for High-Level Power Estimation

Bernacchia and Papaefthymiou

Apr, 99

In this paper, we present a new analytical macro-modeling technique for high-level power estimation. Our technique is based on a parameterizable analytical model that relies exclusively on statistical information of the circuits primary inputs. During estimation, the statistics of the required metrics are extracted from the input stream and a power estimate is obtained by evaluating a model function that has been characterized in advance. Thus, our model yields power estimates within seconds, since it does not rely on the statistics of the circuits primary outputs and, consequently, does not perform any simulation during estimation. Moreover, our model achieves significantly better accuracy than previous macro-modeling approaches, because it takes into account both spatial and temporal correlations in the input stream. In experiments with ISCAS-85 combinational circuits, the average absolute relative error of our power macro-modeling technique was at most 3%. For all but one circuit in our test suite, the worst-case error was at most 19.5%. In comparison with power estimates for a ripple-carry adder family that were obtained using Spice, the average absolute and worst-case errors of our models estimates were 5.1% and 19.8%, respectively. In addition to power dissipation, our macro-modeling technique can be used to estimate the statistics of a circuits primary outputs. Our experiments with ISCAS-85 circuits yield very low average errors. Thus, our technique is suitable for fast and accurate power estimation in core-based system with pre-characterized blocks. Once the metrics of the primary inputs are known, the power dissipation of the entire system can be estimated by simply propagating this information through the blocks using their corresponding model functions.

CSE-TR-394-99

Trace Cache Design for Wide-Issue Superscalar Processors

Sanjay J. Patel

Apr, 99

166

To maximize the performance of a wide-issue superscalar processor, the fetch mechanism must be capable of delivering at least the same instruction bandwidth as the execution mechanism is capable of consuming. Fetch mechanisms consisting of a simple instruction cache are limited by difficulty in fetching a branch and its taken target in a single cycle. Such fetch mechanisms will not suffice for processors capable of executing multiple basic blocks worth of instructions. The Trace Cache is proposed to deal with lost fetch bandwidth due to branches. The trace cache is a structure which overcomes this partial fetch problem by storing logically contiguous instructions---instructions which are adjacent in the instruction stream---in physically contiguous storage. In this manner, the trace cache is able to deliver multiple non-contiguous blocks each cycle. This dissertation contains a description of the trace cache mechanism for a 16-wide issue processor, along with an evaluation of basic parameters of this mechanism, such as relative size and associativity. The main contributions of this dissertation are a series of trace cache enhancements which boost instruction fetch bandwidth by 34% and overall performance by 14% over an aggressive instruction cache. Also included is an analysis of two important performance limitations of the trace cache: branch resolution time and instruction duplication.

CSE-TR-395-99

Stylistic Structures: An Initial Investigation of the Stochastic Generation of Tonal Music

Alamkan Birmingham and Simoni

May, 99

In this paper, we explore the use of a Markov Decision Network to generate music. We show how the network is trained and used to generate compositions of approximately 10 minutes in duration that are representative of the style of the composer whose music was used to train the network.

CSE-TR-396-99

Automated Partitioning of Tonal Music

Pardo and Birmingham

May, 99

The majority of research related to automated analysis of music presupposes human partitioning of the input into segments corresponding to significant harmonic or melodic chunks. In this paper, we analyze the difficulty of the partitioning problem in a formal manner. We then describe HarmAn, a system that partitions tonal music into harmonically significant segments corresponding to single chords and labels these segments with the proper chord labels. Input to the system consists of standard MIDI files and the output is both an annotated piano-roll style display and a text file with partitioning and chord-name information. Chord labels for segments are determined through template matching in the space of pitch-class with conflict resolution between equal scoring templates resolved through simple default preference rules. Our system�s results are compared with the results described in papers by Winograd (Winograd 1968), Maxwell (Maxwell 1992), and Temperley and Sleator (Temperley and Sleator 1999)

CSE-TR-397-99

A Constraint Satisfaction Approach to Tonal Harmonic Analysis

Hoffman and Birmingham

May, 99

This paper gives an algorithm for harmonic analysis for for a tonal composition using a constraint-satisfaction problem (CSP) model. The algorithm combines rule-based and preference-based approaches to perform harmonic analysis, taking the position that the problem can be modeled as a computational process and less of a psychological phenomenon. Using cadential patterns within the piece, it identifies tonal centers and assigns the harmonic analysis, applying preferences when necessary. A software implementation of the algorithm is presented, along with a discussion of its results.

CSE-TR-398-99

Design Verification Using Reverse Engineering

Hauke

May, 99

167

Modern processors are very difficult to design and require advanced design verification methods to ensure that they function correctly. While simulation-based verification is the primary method used by industry, formal methods are gaining acceptance although they are limited to relatively small designs. Equivalence checking, for example, is useful for verifying the equivalence between two levels of a design hierarchy, but is not applicable above the widely used register-transfer level (RTL). For this reason, new design tools are necessary to verify that an RTL description of an implementation matches its original instruction set architecture (ISA), which specifies the processor components and instructions visible to a programmer. This Masters thesis proposes a design verification method, Reverse Engineering Verification (REVE), based on analysis of an implementations data flow. REVE takes a high-level RTL implementation of a new processor design and extracts (reverse engineers) specification information from the implementations internal data paths. The reverse-engineered information is then matched with the original ISA specification to verify functional correctness of the implementation.

CSE-TR-399-99

Eager Writeback - a Technique for Improving Bandwidth Utilization

Lee Tyson and Farrens

Jun, 99

Modern high-performance processors utilize multi-level cache structures to help tolerate the increasing latency (measured in processor cycles) of main memory. These caches employ either a writeback or a write-through strategy to deal with store operations. Write-through caches propagate data to more distant memory levels at the time each store occurs, which requires a very large bandwidth between the memory hierarchy levels. Writeback caches can significantly reduce the bandwidth requirements between caches and memory by marking cache lines as dirty when stores are processed and writing those lines to the memory system only when that dirty line is evicted. This approach works well for many applications (e.g. SPEC95), but for graphics applications that experience significant numbers of cache misses due to streaming data, writeback cache designs can degrade overall system performance by clustering bus activity when dirty lines contend with data being fetched into the cache. In this paper we present a new technique called Eager Writeback, which re-distributes and balances traffic by writing dirty cache lines to memory prior to their eviction. This reduces the likelihood of writing dirty lines that impede the loading of cache miss data. Eager Writeback can be viewed as a compromise between write-through and writeback policies, in which dirty lines are written later than write-through, but prior to writeback. We will show that this approach can reduce the large number of writes seen in a write-through design, while avoiding the performance degradation caused by clustering bus traffic of a writeback approach.

CSE-TR-400-99

A Static Filter for Reducing Prefetch Traffic

Srinivasan Tyson and Davidson

Jun, 99

The growing difference between processor and main memory cycle time necessitates the use of more aggressive techniques to reduce or hide main memory access latency. Prefetching data into higher speed memories is one such technique. However, speculative prefetching can significantly increase memory traffic. We present a new technique, called Static Filtering (SF), to reduce the traffic generated by a given hardware prefetching scheme while preserving its reduced miss rate. SF uses profiling to select which load instructions should be marked "enabled" to do data prefetching. This is done by identifying which load instructions generate data references that are useful prefetch triggers. SF enables the hardware prefetch mechanism only for the set of references made by "enabled" loads. Our results from applying SF to two well-known hardware prefetching techniques, Next Sequential Prefetching (NSP) and Shadow Directory Prefetching (SDP), shows that SF preserves the decrease in misses that they achieve and reduces the prefetch traffic by 50 to 60% for NSP and by 64 to 74% for SDP. In addition, timing analysis reveals that when finite memory bandwidth is a limiting factor, applying SF does in fact increase the speedup obtained by a baseline hardware prefetching technique. The other major contribution of this paper is a complete taxonomy which classifies individual prefetches in terms of the additional traffic they generate and the resulting reduction (or increase) in misses. This taxonomy provides a formal method for classifying prefetches by their usefulness. A histogram of the prefetches by category provides a new basis for comparing prefetch techniques.

CSE-TR-401-99

Allocation By Conflict: A Simple Effective Cache Management Scheme

Tam Tyson and Davidson

Jun, 99

Many schemes have been proposed that incorporate an auxiliary buffer to improve theperformance of a given size cache. One of the most thoroughly evaluated of these schemes, Victim caching, aims to reduce the impact of conflict misses in direct-mappedcaches. While Victim has shown large performance benefits, its competitive advantage is limited to direct-mapped caches, whereas todays caches are increasingly associative. Fur-thermore, it requires a costly data path for swaps and saves between the cache and the buffer.Several other schemes attempt to obtain the performance improvements of Victim, but across a wide range of associativities and without the costly data path for swaps and saves.While these schemes have been shown to perform well overall, their performance still lags that of Victim when the main cache is direct-mapped. Furthermore, they also requirecostly hardware support, but in the form of history tables for maintaining allocation decision information.This paper introduces a new cache management scheme, Allocation By Conflict (ABC), which generally outperforms both Victim and the history-based schemes. Further-more, ABC has the lowest hardware requirements of any proposed scheme -- only a single additional bit per block in the main cache is required to maintain the informationrequired by the allocation decision process, and no swap-save data path is needed.

CSE-TR-402-99

Cost-Effective Adaptive Error Control Using a Hybrid of FEC and ARQ in Wireless Networks

Choi and Shin

Jun, 99

Wireless links are known to suffer location-dependent, time-varying, and bursty errors. This paper considers adaptive error-control schemes in the data link layer for reliable communication over wireless links, in which the error-control code and frame length are chosen adaptively based on the estimated channel state/condition. Three error-control schemes are considered according to (1) the number of code segments a packet is divided into, and (2) the way a lost packet is retransmitted. Through throughput performance and computation complexity analyses, these three schemes are compared, and then one of them is claimed to be the most attractive in terms of computation complexity and practicality even though its throughput performance is not the best. Simulation results also verify that this scheme works well over a time-varying fading channel. Error control for the MAC header and its effect on the performance of each error-control scheme are also considered since, without proper error protection for the header, it would be futile to exercise error control on the user data.

CSE-TR-403-99

Smart Register Files for High-Performance Microprocessors

Postiff and Mudge

Jun, 99

This report examines how the compiler can more efficiently use a large number of processor registers. The placement of data items into registers, called register allocation, is known to be one of the most important compiler optimizations for high-speed computers because registers are the fastest storage devices in the computer system. However, register allocation has been limited in scope because of aliasing in the memory system. To break this limitation and allow more data to be placed into registers, new compiler and microarchitecture support is needed. We propose the modification of register access semantics to include an indirect access mode. We call this optimization the Smart Register File. The smart register file allows the relaxation of overly-conservative assumptions in the compiler by having the hardware provide support for aliased data items in processor registers. As a result, the compiler can allocate data from a larger pool of candidates than in a conventional system. An attendant advantage is that the smart register file reduces the number of load and store operations executed by the processor. The simple addition of an indirect register access mode not only simplifies alias handling, but also provides opportunities for other optimizations. This research examines several such optimizations.

CSE-TR-404-99

Relaxed Multicast Protocols using Concurrency Control Theory

Jensen and Soparkar

Jun, 99

Techniques for coordinating distributed executions includeprocess groups and transactions, each with different properties in a trade-off in terms of the degrees of consistency and performance. Transaction concurrency control theory may be used to characterize process group multicast semantics; our framework which uses conflict relations among events, and the acyclicity properties of a particular precedence graph, may be used to describe the correctness of multicast orderings. In this paper, we describe protocols to realize new relaxed multicast orderings as obtained by the use of concurrency control techniques. In particular, we demonstrate efficient, time-stamp based protocols for relaxed orderings that depend on data, operation, or any similar attributes that bring about such weaker orderings. In fact, we provide a protocol for any arbitrary reflexive and symmetric relation that may be defined among the message types, and this makes our approach amenable to incorporating application semantics into multicast protocols. Using the precedence graph acyclicity framework, we can show that our protocols produce correct multicast delivery orderings. Furthermore, we discuss the efficiency of our protocols in terms of some qualitative performance metrics.

CSE-TR-405-99

A Brachiating Robot Controller

Nakanishi Fukuda and Koditschek

Aug, 99

We report on our empirical studies of a new controller for a two-link brachiating robot. Motivated by the pendulum-like motion of an apes brachiation, we encode this task as the output of a ``target dynamical system.' Numerical simulations indicate that the resulting controller solves a number of brachiation problems that we term the ``ladder', ``swing up' and ``rope' problems. Preliminary analysis provides some explanation for this success. The proposed controller is implemented on a physical system in our laboratory. The robot achieves behaviors including ``swing locomotion' and ``swing up' and is capable of continuous locomotion over several rungs of a ladder. We discuss a number of formal questions whose answers will be required to gain a full understanding of the strengths and weaknesses of this approach.

CSE-TR-406-99

Second-Order Rate-Based Flow Control with Decoupled Error Control for High-Throughput Transport Protocols

Zhang and Shin

Aug, 99

We propose an efficient flow and error control scheme for high-throughput transport protocols, which (i) decouples flow control from error control, and (ii) uses a second-order rate control, called $alpha$-control, for flow control and a new sliding-window scheme for error control. The $alpha$-control minimizes the need for packet retransmissions by adjusting the rate-gain parameter in response to the variations of cross-traffic flows and their round-trip delays (RTDs). Using selective retransmission, the sliding-window scheme guarantees lossless transmission. Separation of flow and error control simplifies both components and enhances the throughput since the source rate control is independent of the dynamics of the error-control window. Applying the $alpha$-control, the proposed scheme can drive the flow-controlled system to a retransmission-free equilibrium state. Using the fluid analysis, we model the packet-loss behavior and derive the closed-form expressions for packet losses, loss rate, and the link-transmission efficiency. We prove that the $alpha$-control is feasible and optimal linear control in terms of efficiency and fairness. Also presented are the vector-space analysis and simulation results that verify the analytical results, and demonstrate the superiority of the proposed scheme to others in dealing with cross-traffic and RTDs variations, controlling packet losses/retransmissions, and achieving buffer-use fairness and a high throughput.

CSE-TR-407-99

Self-Organizing Network Services

Jamjoom Jamin and Shin

Aug, 99

There is a proliferation of application-level servers that offer enhanced network services beyond the simple packet forwarding provided by the underlying network infrastructure. Examples of such servers range from Web server mirrors, Web caches, Web page distilling proxies, video transcoders, and application-level multicast routers, to application-level load-adaptive multipath routers. A fundamental question arising from the deployment of such servers is where to place them within a network. This paper explores technical issues related to the creation of an infrastructure to allow self-organization of network service placement based on observed demand for each service. In so doing, we propose a framework, called Sortie, whereby services are allocated on network nodes based on a set of very simple rules independently executed by each node. The distributed nature of allocation decisions ensures the scalability of the framework. We also present simulation results confirming the stability and efficiency of the proposed framework.

CSE-TR-408-99

A Temporal Consensus Model

Zou and Jahanian

Sep, 99

The active (state-machine) replication protocol has been one of the commonly adopted approaches to provide fault tolerant data services. In such a scheme, service state is replicated at all participating servers and client requests are presented to emph{all} the servers. A strict consensus protocol executed by the servers ensures that all servers receive all the requests in the same order. The output of each request from all the servers are then compared to mask the result of faulty server(s). However, the overhead associated with running the strict consensus protocol tends to slow down the response to client requests, which makes it unsuitable for real-time applications where timing predictability is a necessity. This paper presents a weak consensus model called ``temporal consensus' that reduces such overhead and enable the active scheme to meet the timing constraint of real-time applications.

CSE-TR-409-99

The Theory and Practice of Failure Transparency

Lowell and Chen

Oct, 99

System and application failures are all too common. In this paper,we argue that operating systems should provide the abstraction of failure transparency--the illusion that systems and applications do not fail. We construct a theory of consistent recovery that is useful for systems that want to provide failure transparency. The theory defines precisely what constitutes consistent recovery and provides a simple invariant that all applications must uphold to guarantee they get it. The theory unifies the various recovery protocols for achieving consistent recovery; existing protocols can be viewed as different ways of upholding our theorys invariant. We use the theory to design new recovery protocols and analyze existing protocols. We conclude by evaluating performance for a suite of recovery protocols. We focus our study on interactive programs, and application domain for which it is challenging to provide failure transparency. Our results indicate that it is possible to provide failure transparency for general applications with overhead of 1-2% for systems with reliable main memory, and 10-40% for disk-based systems.

CSE-TR-410-99

Discount Checking: Transparent Low-Overhead Recovery for General Applications

Lowell and Chen

Oct, 99

Checkpointing is a general technique for recovering applications.Unfortunately, current checkpointing systems add many seconds of overhead per checkpoint. Their high overhead prevents them from making failures transparent to users and other external entities, so failures lose visible state. This paper presents a checkpointing system called Discount Checking that is built on reliable main memory and high-speed transactions. Discount Checking can be used to make general-purpose applications recoverable easily and with low overhead. The checkpoints taken by Discount Checking are extremely fast, ranging for our target applications from 50 microseconds to a few milliseconds. Discount Checkings low overhead makes it possible to provide ideal failure transparency by checkpointing each externally visible event. Yet even with this high rate of checkpointing, Discount Checking slows real applications down by less than 0.6%.

CSE-TR-411-99

Low-Cost Embedded Program Loop Caching - Revisited

Lee Moyer and Arends

Oct, 99

Many portable and embedded applications are characterized by spending a large fraction of their execution time on small program loops. In these applications, instruction fetch energy can be reduced by using a small instruction cache when executing these tight loops. Recent work has shown that it is possible to use a small instruction cache without incurring any performance penality [4, 6]. In this paper, we will extend the work done in [6]. In the modified loop caching scheme proposed in this paper, when a program loop is larger than the loop cache size, the loop cache is capable of capturing only part of the program loop without having any cache conflict problem. For a given loop cache size, our loop caching scheme can reduce instruction fetch energy more than other loop cache schemes previously proposed. We will present some quantitative results on how much power can be saved on an integrated embedded design using this scheme.

CSE-TR-412-99

A Formalism for the Composition of Loosely Coupled Robot Behaviors

Klavins and Koditschek

Nov, 99

We address the problem of controlling large distributed robotic systems such as factories. We introduce tools which help us to compose local, hybrid control programs for a class of distributed robotic systems, assuming a palette of controllers for individual tasks is already constructed. These tools, which combine backchaining behaviors with Petri Nets, expand on successful work in sequential composition of robot behaviors. We apply these ideas to the design of a robotic bucket brigade and to simple, distributed assembly tasks.

CSE-TR-413-99

Windowed Certificate Revocation

McDaniel and Jamin

Nov, 99

The advent of electronic commerce and personal communications on the Internet heightens concerns over the lack of privacy and security. Network services providing a wide range of security related guarantees are increasingly based on public key certificates. A fundamental problem inhibiting the wide acceptance of existing certificate distribution services is the lack of a scalable certificate revocation mechanism. We argue in this paper that the resource requirements of extant revocation mechanisms place significant burden on certificate servers and network resources. We propose a novel mechanism called Windowed Revocation that satisfies the security policies and requirements of existing mechanisms and, at the same time, reduces the burden on certificate servers and network resources. We include a proof of correctness of windowed revocation and a trace-based performance study illustrating the scalability and general applicability of windowed revocation.

CSE-TR-414-99

A Fully Associative Software-Managed Cache Design

Hallnor and Reinhardt

Nov, 99

As DRAM access latencies approach a thousand instruction-execution times and on-chip caches grow to multiple megabytes, it is not clear that conventional cache structures continue to be appropriate. Two key features�full associativity and software manage-ment�have been used successfully in the virtual-memory domain to cope with disk access latencies. Future systems will need to employ similar techniques to deal with DRAM latencies. This paper presents a practical, fully associative, software-managed sec-ondary cache system that provides performance competitive with or superior to traditional caches without OS or application involvement. We see this structure as the first step toward OS- and application-aware management of large on-chip caches. This paper has two primary contributions: a practical design for a fully associative memory structure, the indirect index cache (IIC), and a novel replacement algorithm, gen-erational replacement, that is specifically designed to work with the IIC. We analyze the behavior of an IIC with generational replacement as a drop-in, transparent substitute for a conventional secondary cache. We achieve miss rate reductions from 8% to 85% relative to a 4-way associative LRU organization, matching or beating a (practically infeasible) fully associative true LRU cache. Incorporating these miss rates into a rudimentary timing model indicates that the IIC/generational replacement cache could be competitive with a conventional cache at today�s DRAM latencies, and will outperform a conventional cache as these CPU-relative latencies grow.

CSE-TR-415-99

Out-of-Order Fetch Decode and Issue

Stark

Nov, 99

296

To exploit larger amounts of parallelism, processors are being built with ever wider issue widths. Unfortunately, as issue widths grow, instruction cache misses increasingly bottleneck performance. Out-of-order fetch, decode, and issue reduces the bottleneck due to these misses. The term *issue* denotes the act of writing instructions into reservation stations, *fetch block* denotes the instructions brought into the processor by an instruction cache fetch, and *branch predictor* denotes the entire next fetch address generation logic. Upon encountering an instruction cache miss, a conventional processor waits until the miss has been serviced before continuing to fetch, decode, and issue any new fetch blocks. A processor with out-of-order fetch, decode, and issue temporarily ignores the block associated with the miss, and attempts to fetch, decode, and issue the blocks that follow. The addresses of these blocks are generated by the branch predictor. Because the predictor does not depend on the instructions in the current block to generate the address of the next fetch block, it can continue making predictions even if the current block misses in the cache. Thus, the processor can skip the block that missed and continue fetching, decoding, and issuing the following blocks using the addresses generated by the predictor. After servicing the miss, the processor can fetch, decode, and issue the skipped block. This dissertation evaluates the *potential* performances of processors with various issue widths and window sizes, shows that enough potential performance exists to justify building a 16-wide processor, examines bottlenecks that would prevent this processor from achieving its potential performance, and demonstrates that the bottleneck due to instruction cache misses would be severe. It introduces a remedy for this bottleneck--out- of-order fetch, decode, and issue--describes its implementation, and demonstrates its usefulness. For a 16-wide processor with a 16k byte direct-mapped instruction cache, instruction cache misses reduce its performance on a set of sixteen integer benchmarks by an average of 24%. When it is equipped with out-of-order fetch, decode, and issue, its performance is only reduced by an average of 5%.

CSE-TR-416-99

DACIA: A Mobile Component Framework for Building Adaptive Distributed Applications

Litiu and Prakash

Dec, 99

Current computing systems exhibit an impressive heterogeneity of hardware resources, software capabilities, and network connectivity. They have to accommodate various application and client demands and to adjust to the continuously changing load on network segments and intermediate processing nodes. We present DACIA, a framework for building adaptive distributed applications through the flexible composition of software modules implementing individual functions. An application is seen as a collection of components connected in a directed graph. Through runtime reconfiguration of the application graph, the application adapts to changes in the execution environment, resource availability, and application requirements. Thus, a more efficient execution is achieved. The location where various components are executed can be changed dynamically and the execution can be occasionally outsourced to nodes having specialized hardware or to less loaded sites on the network. Some components or groups of components can be replaced by other components or groups of components. DACIA supports application and user mobility, allowing parts of an application to move to different hosts at runtime, while maintaining seamless connectivity.

CSE-TR-417-99

Improving Branch Prediction by Understanding Branch Behavior

Evers

Dec, 99

163

Accurate branch prediction can be seen as a mechanism for enabling design decisions. When short pipelines were the norm, accurate branch prediction was not as important. However, having accurate branch prediction enables technologies like wide-issue deeply pipelined superscalar processors. If branch predictors can be improved further, we can more successfully use more aggressive speculation techniques. Accurate branch prediction enables larger scheduling windows, out-of-order fetch, deeper pipelines etc. It is therefore likely that there will be a growing demand for more accurate predictors beyond todays prediction technology. Previous studies have shown which branch predictors and configurations best predict the branches in a given set of benchmarks. Some studies have also investigated effects, such as pattern history table interference, that can be detrimental to the performance of these branch predictors. However, little research has been done on which characteristics of branch behavior make branches predictable. This dissertation approaches the branch problem in a different way from previous studies. The focus is on understanding how branches behave and why they are predictable. Branches are classified based on the type of behavior, and the extent of each type of behavior is quantified. One important result is that two thirds of all branches are very predictable using a simple predictor because they follow repeating patterns. We also show how correlation between branches works, and what part of this correlation is important for branch prediction. Based on this information about branch behavior, some shortcomings of current branch predictors are identified, new branch predictors are introduced, and potential areas for future improvement are identified. One of the new predictors, Dual History Length Gshare with Selective Update is more accurate than a Gshare predictor using Branch Filtering while having a simpler implementation. Another new predictor, the Multi Hybrid, suffers 10% fewer mispredictions than a state-of-the-art PAs/Gshare hybrid predictor at an implementation cost of 100 KB.

CSE-TR-418-99

Algebra-based Optimization Strategies for Decentralized Mining

Jensen and Soparkar

Dec, 99

CSE-TR-419-00

ASH-3: A Distributed File System Over Distributed Shared Memory

Janjoom Raasch Shih

Jan, 00

The use of distributed file systems (DFS) has been popularized by systems such as AFS and NFS. They allow users to share system resources from any location in the network. They also increase the reliability and availability of these resources in that underlying changes are transparent to the user. Concurrently, much research has been undergoing in the area of distributed-shared memory (DSM). The concept of DSM is similar to that of DFS, in that it can allow many users to share a common base of resources. In DSM, however, the resource is not disk storage, but rather memory. Like DFS, DSM allows all users to have the same view of the resource that is transparent when moving between machines. Because these two systems are so similar, we combine the power of a DFS with the ease of development available from a DSM system. We quantify the performance lost by using the DSM layer and will also demonstrate the gains in ease of development.

CSE-TR-420-00

Generalized Symmetries in Boolean Functions

Kravets and Sakallah

Feb, 00

In this report we take a fresh look at the notion of symmetries in Boolean functions. In particular, we show that the classical characterization of symmetries based on invariance under variable swaps is a special case of a more general invariance based on unrestricted variable permutations. We propose a generalization of classical symmetry that allows for the simultaneous swap of groups of variables and show that it captures more of a function�s invariant permutations without undue computational requirements. We apply the new symmetry definition to analyze a large set of benchmark circuits and provide extensive data showing the existence of substantial symmetries in those circuits. Specific case studies of several of these benchmarks reveal additional insights about their functional structure and how it might be related to their circuit structure.

CSE-TR-421-00

The MIRV SimpleScalar/PISA Compiler

Postiff Greene Lefurgy Helder and Mudge

Apr, 00

We introduce a new experimental C compiler in this report. The compiler, called MIRV, is designed to enable research that explores the interaction between the compiler and microarchitecture. This introductory paper makes comparisons between MIRV and GCC. We notice trends between the compilers and optimization levels across SPECint1995 and SPEC2000. Finally, we provide a set of SimpleScalar/PISA binaries to the research com-munity. As we improve the compiler, we encourage architecture researchers to use these optimized binaries as reference programs for architecture research.

CSE-TR-422-00

The Construction of Attention Functions for Phase Regulation

Klavins and Koditschek

Apr, 00

In this report we describe a method for building an attention function on the two dimensional torus. Such a function is used as an analytic switch by an underactuated robotic system to change its attention from one controlled cyclic process to another. A point in the torus represents two phase variables which each describe the progress of a cyclic system around a circle, such as that of a bouncing ball or a hopping leg. The attention function is designed so that under a certain flow on the torus, attention is paid to the phase which will next become zero.

CSE-TR-423-00

An Automatic Verification Tool for UML

Compton Gurevich Huggins and Shen

May, 00

The Unified Modeling language is becoming more and more popular in the software development. However because of its ambiguisity in its semantic model, few verification tool has been built. Abstract State Machines have been successfully applied in giving semantics for programming language like C. In this report, we try to use the Abstract State Macines to give a semantics model for UML and then use ASM Model Checker to design a verification tool for UML. Last we give a toy example to show how the verification works.

CSE-TR-424-00

A Prefetch Taxonomy

Srinivasan Davidson and Tyson

Apr, 00

The difference in processor and main memory cycle time has necessitated the use of aggressive prefetching techniques to reduce or hide main memory access latency. However, prefetching can significantly increase memory bandwidth and unsuccessful prefetches may even pollute the primary cache. Although the current metrics, coverage and accuracy, do provide an impression of the overall quality of a prefetching algorithm, they are simplistic and do not unambiguously and precisely explain the performance of a prefetching algorithm. In this paper, we introduce a new and complete taxonomy for classifying prefetches, accounting for the difference in traffic and misses generated by each prefetch. Our classification of individual prefetches is very useful in identifying the specific strengths and weaknesses of an existing prefetch implementation and we demonstrate it by applying our taxonomy to two implementable prefetching techniques. Based on the histogram of prefetches by category we developed a static filter to reduce/eliminate polluting prefetches of any given prefetching technique.

CSE-TR-425-00

SpliCS-Split Latency Cache System

Srinivasan Charney Davidson and Tyson

Apr, 00

Memory access latencies are much larger than processor cycle times, and this gap has been increasing over time. Cache performance is critical to bridging this gap. Since caches cannot be both large and fast, cache design involves hierarchies with trade-offs between access times and miss ratios. This paper introduces a novel primary cache design, called the Split Latency Cache System (SpliCS). SpliCS employs two data stores: a small, fast cache (A) and a larger, but slower cache (B), inclusion is maintained and both caches are accessed in parallel. SpliCS improves the effectiveness of the cache hierarchy by allowing very fast access to some lines while still allowing a large primary cache to be used. Our results using an out-of-order processor model show that relative to a similarly configured traditional cache, SpliCS achieves a 12 to 13% improvement in CPI on commercial applications and 14 to 18% improvement in CPI on some benchmarks from the SPEC suite.

CSE-TR-426-00

Antigone: Implementing Policy in Secure Group Communication

McDaniel and Prakash

May, 00

Significant strides have been made in achieving strong semantics and security guarantees within group communication and multicast systems. However, the scope of available security policies in these systems is often limited. In contrast, the applications that require the services provided by these systems can differ significantly in their security policy needs. Often application designers have to either make significant compromises in using a given group communication system or build their own customized solutions, an error-prone task. This paper presents Antigone, a framework that provides a suite of mechanisms from which flexible application security policies may be implemented. With Antigone, developers may choose a policy that best addresses their security and performance requirements of an application requiring group communication. We describe the Antigones mechanisms, consisting of a set of micro-protocols, and show how different security policies can be implemented using those mechanisms. We also present a performance study illustrating the security/performance tradeoffs that can be made using Antigone. Through an example conferencing application, we demonstrate the use of the Antigone applications programming interface and consider the use of policy in several distinct session environments.

CSE-TR-427-00

QGuard: Protecting Internet Servers from Overload

Jamjoon Reumann and Shin

May, 00

Current operating systems are not well-equipped to handle sudden load surges that arecommonly experienced by Internet servers. This means that service providers and customers may not be able to count on servers being available once their content becomes very popular. Recent Denial-of-Service attacks on major e-commerce sites have capitalized on this weakness. Remedies that were proposed to improve server behavior under overload require substantial changes to the operating system or applications, which is unacceptable to businesses that only want to use the tried and true. This paper presents QGuard, a much less radical solution to providing differential QoS, protection from overload and some DoS attacks. QGuard is an adaptive mechanism that exploits rate controls for inbound traffic in order to fend off overload and provide QoS differentiation between competing traffic classes. Our Linux-2.2.14-based QGuard prototype provides freely configurable QoS differentiation (preferred customer treatment and service differentiation) and effectively counteracts SYN and ICMP-flood attacks. Since QGuard is a purely network-centric mechanism, it does not require any changes to server applications and can be implemented as a simple add-on module for any OS. Our measurements indicate no performance degradation on lightly loaded servers and only a small reduction of aggregated server throughput (less than 2%) under overload. Well-behaved "preferred customers" remain virtually unaffected by server overload.

CSE-TR-428-00

Lightweight Failure Detection in Secure Group Communication

McDaniel and Prakash

Jun, 00

The secure and efficient detection of process failures is an essential requirement of many distributed systems. In this paper, we present the design and analysis of a mechanism used for the detection of member failures in secure groups. Based on one-time passwords, our solution does not obviate the need for periodic statements from group members, but significantly reduces the cost of their generation and validation. A study comparing the costs of traditional mechanisms with our proposed approach is presented. Results of the study indicate the average case performance of the proposed scheme is 1/10th of traditional failure detection in trusted groups, and negligible in the untrusted groups. A discussion of security and performance tradeoffs made through mechanism policy is provided.

CSE-TR-429-00

Banana Tree Protocol an End-host Multicast Protocol

Helder and Jamin

Jul, 00

Multicast is a network technology that allows a host to efficiently send data to a group of hosts. IP multicast is one example of multicast that relies on Internet routers to forward multicast packets to multicast group members. End-host multicast is another approach where the hosts in the multicast group form a virtual network and route multicast data through the graph themselves without router cooperation. This paper describes Banana Tree Protocol (BTP), an end-host multicast protocol we designed and implemented. We have simulated BTP along with other multicast protocols and theoretically optimal virtual networks. We find BTP performs well in optimal conditions, but performs poorly in more realistic conditions. We analyze this behavior and find BTP to be too limited in its allowed graph transformations. We conclude that an end-host multicast protocol must allow nodes to make a wide range of graph transformations in order to effectively optimize the graph.

CSE-TR-430-00

Library Design for Symmetric Decomposition

Kravets and Sakallah

Jul, 00

Using a very general symbolic decomposition template for logic synthesis we show how to pre-compute libraries for the decomposition patterns implied by the function structure. When coupled with decomposition the outfitted libraries are intended to produce improved synthesis quality. We illustrate the pre-computation process for functions that are symmetric in some inputs. For these functions we derive a set of fan-in-bounded cell libraries that guarantee a reduction in the width of the circuit being synthesized with each successive decomposition step.

CSE-TR-431-00

Fugue: Time Scales of Adaptation in Mobile Video

Corner Noble Wasserman

Aug, 00

Providing interactive video on hand-held, mobile devices is extremely difficult. These devices are subject to processor, memory, and power constraints, and communicate over wireless links of rapidly varying quality. Furthermore, the size of encoded video is difficult to predict, complicating the encoding task. We present Fugue, a system that copes with these challenges through a division along time scales of adaptation. Fugue is structured as three separate controllers: transmission, video, and preference. This decomposition provides adaptation along different time scales: per-packet, per-frame, and per-video. The controllers are provided at modest time and space costs compared to the cost of video encoding. We present simulations confirming the efficacy of our transmission controller, and compare our video controller to several alternatives. We find that, in situations amenable to adaptive compression, our scheme provides video quality equal to or better than the alternatives at a comparable or substantially lower computational cost. We also find that distortion, the metric commonly used to compare mobile video, under-values the contribution smooth motion makes to perceived video quality.

CSE-TR-432-00

SANE: Stable Agile Network Estimation

Kim and Noble

Aug, 00

Distributed systems are becoming increasingly dependent on network estimation, the ability to determine performance along one or more network paths. Producing quality estimates is challenging because network observations are noisy; this is particularly true in wide-area or mobile settings. Current systems depend on simple exponentially weighted moving average filters. These filters are either able to detect true changes quickly or to mask transient changes, but cannot do both. In this paper, we present four filters designed to react quickly to persistent changes while tolerating transient ones. These filters are evaluated in a variety of networking scenarios through three metrics: agility, stability and accuracy. While no single filter dominates, one based on techniques from statistical process control shows promise relative to the others.

CSE-TR-433-00

Inet: Internet Topology Generator

Jin Chen and Jamin

Sep, 00

Network research often involves the evaluation of new application designs, system architectures, and protocol implementations. Due to the immense scale of the Internet, deploying an Internet-wide system for the purpose of experimental study is nearly impossible. Instead, researchers evaluate their designs using generated random network topologies. In this report, we present a topology generator that is based on Autonomous System (AS) connectivity in the Internet. We compare the networks generated by our generator with other types of random networks and show that it generates topologies that best approximate the actual Internet AS topology.

CSE-TR-434-00

The Need for Large Register Files in Integer Codes

Postiff Greene and Mudge

Oct, 00

Register allocation is an important optimization for high performance microprocessors but there is no consensus in the architecture or compiler communities as to the best num-ber of registers to provide in an instruction set architecture. This paper discusses reasons why this situation has occurred and shows from a compiler perspective that, compared to the conventional 32-register file, 64 or more registers enables performance improvements from 5% to 20%. This is demonstrated with existing advanced compiler optimizations on the SPECint95 and SPEC2000 benchmarks. This work also documents that the optimiza-tions eliminate cache hit operations, converting common-case cache hits to faster register accesses. Finally, this work provides additional measurements for the proper number of registers in a high-performance instruction set and shows that most programs can easily use 100 to 200 registers when multiple active functions are considered for simultaneous allocation to the register file.

CSE-TR-435-00

ICP Registration using Invariant Features

Sharp Lee and Wehe

Oct, 00

This paper investigates the use of Euclidean invariant features in a generalization of iterative closest point registration of range images. Pointwise correspondences are chosen as the closest point with respect to a weighted linear combination of positional and feature distances. It is shown that under ideal noise-free conditions, correspondences formed using this distance function are correct more often than correspondences formed using the positional distance alone. In addition, monotonic convergence to at least a local minimum is shown to hold for this method. When noise is present, a method that automatically sets the optimal relative contribution of features and positions is described. This method trades off error in feature values due to noise against error in positions due to misalignment. Experimental results show that using invariant features decreases the probability of being trapped in a local minimum, and is most effective for difficult registration problems where the scene is very small compared to the model.

CSE-TR-436-00

SimSect Hybrid Dynamical Simulation Environment

Saranli

Oct, 00

This report is the users manual of SimSect, a flexible environment for the simulation of hybrid dynamical systems. Hybrid dynamical systems are systems where the continuous dynamics undergo discrete changes upon detection of predefined state based events. The simulation of such systems involve accurate detection of events and handling of the transition between different continuous systems. This report also includes the details of two example hybrid dynamical systems: A spring loaded inverted pendulum (SLIP) runner and a compliant hexapod. The former example illustrates the basic elements of programming models with SimSect, while the latter implements a much more complicated model, addressing many common issues arising when defining complex dynamical models with SimSect.

CSE-TR-437-00

Algebra Based Optimization Strategies for Decentralized Mining

V. Jenson and N. Soparkar

Oct, 00

In previous work, we demonstrated the importance of mining of decentralized data (i.e., normalized tables, possibly residing in several repositories, with several schemas, separate administration etc) in contrast to current technology which applies to centrally stored data. Our decentralized approach to mining data computes partial results at the separate tables, and merges them to obtain the final one -- thereby avoiding the expensive materialization of joins. We provided approaches to mining multiple tables located in a data warehouse, and indicated how these may be further extended. Our decentralized algorithms for finding frequent itemsets (used in association rules and classification algorithms) across multiple tables were analyzed for their performance characteristics, and these were validated empirically as well. In this paper, we detail our approach as extended to general database schema designs, including physically distributed tables, and we describe our techniques which are similar to standard query optimization and processing. We present an algebra (more as an explanatory tool than for rigor) over expressions that denote the mined information (frequent itemsets, in our case) to describe how our basic decentralized mining approaches apply to generic database schema designs. To complement our algebra, we provide useful equivalences among the expressions, and this allows expression re-writing to represent different alternatives possible during the mining. Each equivalent expression indicates several processing strategies for decentralized and/or distributed designs. Using cost formulae available from our previous work, we present the choice among the processing strategies as an optimization problem as regards the efficiency in execution. We discuss the parameters needed for estimating the cost of different executions and finally, we describe heuristics to reduce the search within the space of possible execution plans.

CSE-TR-438-00

Ismene: Provisioning and Policy Reconciliation in Secure Group Communication

McDaniel and Prakash

Dec, 00

Group communication systems increasingly provide security services. However, in practice, the use of such systems is complicated by the divergent requirements and abilities of group members. In this paper, we define a policy language called Ismene that directs the provisioning of security-related resources at member sites. The communication service is defined through a reconciliation of a group policy and members local policies into a security configuration. Group authorization and access control appropriate for the operating conditions and session configuration are also defined within the policy. The use of Ismene policies to define security is illustrated through an extended example of a group application built on the prototype Ismene framework.

CSE-TR-439-01

The Chordal Analysis of Tonal Music

Pardo and Birmingham

Mar, 01

This paper provides a theoretical framework upon which to build a system for chordal analysis of music. We establish that, in the worst case, the partitioning and labeling of harmonies in tonal music is O(2^N), where N is number of notes in a piece of music. We show that, when segments of the music can be analyzed locally, and a segment scoring metric can be found that is both additive and transitive, the problem becomes O(N^2). Under these constraints, the problem is cast as finding the best path through a single-source directed acyclic graph, which can be solved in O(E + V) time. This translates to a worst-case time complexity of O(N^2) for our problem. We then show that the results of the O(N^2) search can be closely approximated through the use of a heuristic that allows O(N) time search. The results of the heuristic search are then compared to exhaustive graph search and the results of analyses by existing systems by Winograd (Winograd 1968), Maxwell (Maxwell 1992), and Temperley and Sleator (Temperley and Sleator 1999).

CSE-TR-440-01

The Effects of the x86 ISA on the Front End: Where have all the cycles gone?

Vlaovic and Davidson

Apr, 01

Although x86 processors have been around for a long time and are the most ubiquitous processors in the world, the amount of academic research regarding details of their performance has been minimal. Here, we will introduce x86 simulation environment, which we call Trace Analysis for X86 Interpretation, or TAXI, and use it to discuss the differences between current x86 processors and other processors, and present some performance results of eight Win32 applications. By utilizing TAXI, we can attribute performance bottlenecks to those components of the microarchitecture that cause the most performance degradation. We look at 8 aspects of front-end that can contribute to performance loss; then based on this information, we introduce an improvement that yields 17% speedup in overall execution time.

CSE-TR-441-01

Optimization and Evaluation Strategies for Decentralized Database Mining

Jensen and Soparkar

Jun, 01

CSE-TR-442-01

MASE: A Novel Infrastructure for Detailed Microarchitectural Modeling

Larson Chatterjee and Austin

Jul, 01

MASE (Micro Architectural Simulation Environment) is a novel infrastructure that provides a flexible and capable environment to model modern microarchitectures. Many popular simulators, such as SimpleScalar, are predominately trace-based where the performance simulator is driven by a trace of instructions read from a file or generated on-the-fly by a functional simulator. Trace-driven simulators are well-suited for oracle studies and provide a clean division between performance modeling and functional emulation. A major problem with this approach, however, is that it does not accurately model timing dependent computations, an increasing trend in microarchitecture designs such as those found in multiprocessor systems. MASE implements a micro-functional performance model that combines timing and functional components into a single core. In addition, MASE incorporates a trace-driven functional component used to implement oracle studies and check the results of instructions as they commit. The check feature reduces the burden of correctness on the micro-functional core and also serves as a powerful debugging aid. MASE also implements a callback scheduling interface to support resources with non-deterministic latencies such as those found in highly concurrent memory systems. MASE was built on top of the current version of SimpleScalar. Analyses show that the performance statistics are comparable without a significant increase in simulation time.

CSE-TR-443-01

Improving Energy and Performance of Data Cache Architectures by Exploiting Memory Reference Characteristics

Lee

Aug, 01

110

Minimizing power, increasing performance, and delivering effective memory bandwidth are todays primary microprocessor design goals for the embedded, high-end and multimedia workstation markets. In this dissertation, I will discuss three major data cache architecture design optimization techniques, each of which exploits the data memory reference characteristics of the applications written in high-level languages. Through a better understanding of the memory reference behavior, we can design a system that executes at higher performance, while consuming less energy, and delivering more effective memory bandwidth. The first part of this dissertation presents an in-depth characterization of data memory references, including analysis of semantic region accesses and behavior of data stores. This analysis leads to a new organization of the data cache hierarchy called Region-based Cachelets. Region-based Cachelets are capable of improving memory performance of embedded applications while significantly reducing dynamic energy consumption, resulting in a 50% to 70% improvement in energy-delay product efficiency using this approach. Following this, I will discuss a new cache-like structure, the Stack Value File (or SVF), which boosts performance of general purpose applications by routing stack data references to a separate storage structure optimized for the unique characteristics of the stack reference substream. By utilizing a custom structure for stack references, we are able to increase memory level parallelism, reduce memory latency, and reduce off-chip memory activity. The performance can be improved by 24% by implementing an 8KB SVF for a processor with a dual-ported L1 cache. Finally, I will address memory bandwidth issues by proposing a new write policy called Eager Writeback which can effectively improve overall system performance by shifting the writings of dirty cache lines from on-demand to times when the memory bus is less congested. It lessens the criticality of on-demand misses and improves performance by 6% to 16% for the 3D graphics geometry pipeline.

CSE-TR-444-01

Statistical Analysis in Music Information Retrieval

Birmingham and Rand

Oct, 01

We introduce a statistical model of music that allows for the retrieval of music based upon an audio input. This model uses frequency counts of state transitions to index pieces of music. Several methods of comparing these index values to choose an appropriate match are examined. We explained how this model can serve as the basis for a query-by-humming system. The model is also shown to be robust to errors of several kinds.

CSE-TR-445-01

Faster SAT and Smaller BDDs via Common Function Structure

Aloul Markov and Sakallah

Dec, 01

The increasing popularity of SAT and BDD techniques in verification and synthesis encourages the search for additional speed-ups. Since typical SAT and BDD algorithms are exponential in the worst-case, the structure of real-world instances is a natural source of improvements. While SAT and BDD techniques are often presented as mutually exclusive alternatives, our work points out that both can be improved via the use of the same structural properties of instances. Our proposed methods are based on efficient problem partitioning and can be easily applied as pre-processing with arbitrary SAT solvers and BDD packages without any source code modifications. Finding a better variable ordering is a well recognized problem for both SAT solvers and BDD packages. Currently, all leading edge variable ordering algorithms are dynamic, in the sense that they are invoked many times in the course of the "host" algorithm that solves SAT or manipulates BDDs. Examples include the DLCS ordering for SAT solvers and variable-sifting during BDD manipulations. In this work we propose a universal variable ordering MINCE (MIN Cut Etc.) that pre-processes a given Boolean formula in CNF. MINCE is completely independent from target algorithms and outperforms both DLCS for SAT and variable sifting for BDDs. We argue that MINCE tends to capture structural properties of Boolean functions arising from real-world applications. Our contribution is validated on the ISCAS circuits and the DIMACS benchmarks. Empirically, our technique often outperforms existing techniques by a factor of two or more in most cases. Our results motivate search for stronger dynamic ordering heuristics and combined static/dynamic techniques.

CSE-TR-446-02

HYPE: A Hybrid Power Estimation Method for Programmable Systems

Liu and Papaefthymiou

Jan, 02

In this paper, we present a novel power estimation scheme for programmable systems consisting of predesigned datapath and memory components. The proposed hybrid methodology yields highly accurate estimates within short runtimes by combining high-level behavioral simulation with analytical macromodeling of circuit characteristics. Given a system of predesigned components along with its initial state and a program to execute, behavioral level simulation is used to derive and analyze all control signals and all data signals at the datapath/memory interface. Within the datapath itself, an iterative procedure is used in conjunction with analytical output macromodeling to compute signal statistics, as opposed to actual signals, at the inputs of its constituent logic components. These statistics are then applied to analytical power macromodels to obtain the dissipation of the datapath and the global interconnects. Our approach is theoretically sound and yields fast and accurate estimates in practice. Using Banachs fixed point theorem, we prove that, under certain sufficient conditions on the output macromodels, the iterative procedure always converges to a unique point. We have implemented our hybrid technique into a power estimation tool called HyPE and used it to explore various architectural alternatives in the design of a 256-state Viterbi decoder and a Rijndael encryptor. For designs with close to 1 million transistors, our estimator terminates within 10 seconds. Compared with state-of-the-art industrial gate-level power estimators, the proposed methodology is about 1,000 times faster with 5.4% error on average.

CSE-TR-447-02

Predicting Last-Touch References under Optimal Replacement

Lin and Reinhardt

Jan, 02

Effective cache replacement is becoming an increasingly important issue in cache hierarchy design as large set-associative caches are widely used in high-performance systems. This paper proposes a novel approach to approximate the decisions made by an optimal replacement algorithm (OPT) using last-touch prediction. The central idea is to identify, via prediction, the final reference to a cache block before the block would be evicted under OPT�the �OPT last touch�. Given perfect prediction, replacing the referenced block immediately after each OPT last touch would give optimal replacement behavior. This paper evaluates the feasibility of this approach by studying, at a fundamental level, the predictability of OPT last-touch references, and the applicability of these predictions to improving replacement decisions. We show that trace-based predictors, previously proposed for LRU last-touch prediction, can predict OPT last touches as well. We enhance these predictors using future information, but find that their performance degrades significantly as cache size and associativity increase. We introduce a new class of predictors based on last-touch history, which significantly outperform trace-based predictors on large set-associative secondary caches. Across eight SPEC CPU2000 benchmarks on a 1MB 16-way associative secondary cache, an idealized history-based OPT last-touch predictor can potentially eliminate 39% of LRU misses�eliminating 63% of the gap between LRU and OPT.

CSE-TR-448-02

The FMSAT Satisfiability Solver: Hypergraph Partitioning meets Boolean Satisfiability

Ramani and Markov

Jan, 02

This report is intended to present the Fiduccia Mattheyses (FM) heuristic for hypergraph bipartitioning as a method for solving large Boolean satisfiability (SAT) instances. We hope to extend the success of the FM heuristic in the partitioning domain to satisfiability. This report outlines how a SAT problem can be viewed as a partitioning problem, which argues for the applicability of FM. Further, we present the software architecture and implementation of our SAT solver, inspired by a previous, highly successful implementation of FM for hypergraph bipartitioning. We briefly discuss experimental results that justify our claim that the FM heuristic shows promise in the satisfiability domain.

CSE-TR-449-02

Visual Servoing via Navigation Functions

Cowan Weingarten and Koditschek

Feb, 02

This technical report presents a framework for visual servoing that guarantees convergence to a visible goal from most initially visible configurations while maintaining full view of all the feature points along the way. The method applies to first and second order fully actuated plant models. The solution entails three components: a model for the "occlusion-free" workspace; a change of coordinates from image to model coordinates; and a navigation function for the model space. We present three example applications of the framework, along with experimental validation of its practical efficacy.

CSE-TR-450-02

Optimization-Based Multicast Flow Control Using Virtual M-Ary Feedback

Zhang and Shin

Feb, 02

CSE-TR-451-02

Majority-Based Decomposition of Carry Logic in Binary Adders

Nazhandali and Sakallah

Mar, 02

We introduce a new carry lookahead adder architecture based on a decomposition of the carry logic into a network consisting exclusively of majority gates. We call adders with this architecture �M&M� adders to distinguish them from adders based on the conventional Generate/Propagate, or �G&P,� architecture. We show how M&M and G&P adders are related, and characterize optimal realizations of M&M adders.

CSE-TR-452-02

Design and Analysis of a Flipping Controller for RHex

Saranli and Koditschek

Mar, 02

We report on the design and analysis of a controller that can achieve dynamical self-righting of our hexapedal robot, RHex. We present an empirically tuned controller that works reasonably well on indoor surfaces, using a hybrid energy pumping strategy to overcome torque limitations of its actuators. Subsequent modeling and analysis yields a new controller with a much wider domain of success as well as a preliminary understanding of the hybrid control strategy. Simulation results demonstrate the superiority of the improved control strategy relative to the first generation empirically designed controller.

CSE-TR-453-02

Registration of Range Images in the Presence of Occlusions and Missing Data

Sharp Lee and Wehe

Mar, 02

We describe a method for the registration of range images in the presence of occlusions and missing data. Our approach uses the missing data as an additional clue for matching. Each point in both images, valid or missing, is classified and scored within a maximum likelihood framework. Occlusions, shadows, noise, partially overlapping views, and outliers are all considered in within this framework. We found that our registration method can be used to register real scenes with complex geometry, discontinuities, missing data, and overlap as small as ten percent.

CSE-TR-454-02

Towards Capturing Representative AS-Level Internet Topologies

Chang Govindan Jamin Shenker and Willinger

Mar, 02

Recent studies concerning the Internet connectivity at the AS level have attracted considerable attention. These studies have exclusively relied on the BGP data from Oregon route-views~cite{Oregon} to derive some unexpected and intriguing results. The Oregon route-views data sets reflect AS peering relationships, as reported by BGP, seen from a handful of vantage points in the global Internet. The possibility that these data sets from Oregon route-views may provide only a very sketchy picture of the complete inter-AS connections that exist in the actual Internet has received surprisingly little scrutiny. In this paper, we will use the term ``{em AS peering relationship}' to mean that there is ``at least one direct router-level connection' between two existing ASs, and that these two ASs agree to exchange traffic by enabling BGP between them. By augmenting the Oregon route-views data sets with BGP summary information from a large number of Internet {em Looking Glass/} sites and with routing policy information from Internet Routing Registry (IRR) databases, we find that (1) a significant number of existing AS connections remain hidden from most BGP routing tables, (2) the AS connections to tier-1 ASs are in general more easily observed than those to non tier-1 ASs, and (3) there are at least about 25--50% more AS connections in the Internet than commonly-used BGP-derived AS maps reveal (but only about 2% more ASs). These findings point out the need for an increased awareness of and a more critical attitude toward the applicability and completeness of given data sets at hand when establishing the generality of any particular observations about the Internet.

CSE-TR-455-02

A Markov Chain Sequence Generator for Power Macromodeling

Liu and Papaefthymiou

Apr, 02

In macromodeling-based power estimation, circuit macromodels are created from simulations of synthetic input vector sequences. Fast generation of these sequences with all possible statistics is crucial for ensuring the accuracy and speed of macromodeling. In this paper, we present a novel sequence generator based on a Markov chain model. Specifically, we formulate the problem of generating a sequence of vectors with given statistics as a transition matrix computation problem, in which the matrix elements are subject to constraints derived from the specified statistics. We also present a practical heuristic that computes such a matrix and generates a sequence of l n-bit vectors in O(nl+n*n) time. Our generator is guaranteed to yield vector sequences with a given average input probability p, average transition density d, and spatial correlation s, or reports that the specified sequence type does not exist. Derived from a strongly mixing Markov chain, it generates binary vector sequences with accurate statistics, high uniformity, and high randomness. Experimental results show that our sequence generator can cover more than 99% of the parameter space. Sequences of 2,000 48-bit vectors are generated in less than 0.05 seconds, with average deviations of the signal statistics p, d, and s equal to 1.0%, 0.6%, and 0.6%, respectively. Our generator enables the detailed study of power macromodeling. Using our tool and the ISCAS-85 benchmark circuits, we have assessed the power sensitivities of three input statistics. Our investigation has revealed that power is most sensitive to transition density, while only occasionally exhibiting high sensitivity to signal probability and spatial correlation. We have also investigated the power estimation error resulting from input signal imbalance. Our experiments show that signal imbalance can cause estimation error as high as 100% in extreme cases, although errors are usually within 25%.

CSE-TR-456-02

Inet-3.0: Internet Topology Generator

Winick and Jamin

May, 02

In this report we present version 3.0 of Inet, an Autonomous System (AS) level Internet topology generator. Our understanding of the Internet topology is quickly evolving, and thus, our understanding of how synthetic topologies should be generated is changing too. We document our analysis of Inet-2.2, which highlighted two shortcommings in its topologies. Inet-3.0 improves upon Inet-2.2s two main weaknesses by creating topologies with more accurate degree distributions and minimum vertex covers as compared to Internet topologies. We also examine numerous other metrics to show that Inet-3.0 better approximates the actual Internet AS topology than does Inet-2.2. Inet-3.0s topologies still do not well represent the Internet in terms of maximum clique size and clustering coefficient. These related problems stress a need for a better understanding of Internet connectivity and will be addressed in future work.

CSE-TR-457-02

Satometer: How Much Have We Searched?

Aloul Sierawski and Sakallah

Jun, 02

We introduce Satometer, a tool that can be used to estimate the percentage of the search space actually explored by a backtrack SAT solver. Satometer calculates a normalized minterm count for those portions of the search space identified by conflicts. The computation is carried out using a zero-suppressed binary decision diagram (ZBDD) data structure and can have adjustable accuracy. The data provided by Satometer can help diagnose the performance of SAT solvers and can shed light on the nature of a SAT instance.

CSE-TR-458-02

The Phi-Calculus - A Hybrid Extension of the Pi-Calculus to Embedded Systems

Rounds and Song

Jun, 02

In this paper we present a new language which allows concurrent programs to interact with arbitrary continuous environments. It is an extension of the powerful pi-calculus of Milner, which already provides for concurrency and mobility. Our contribution adds the notion of *active environments* which can specify flows over continuous time using ordinary differential equations. We provide examples of robotic assembly systems; we show how Petri nets can be expressed in the calculus, and we prove ``context theorems' showing that bisimilar phi-calculus processes flow the same way in the same active environment.

CSE-TR-459-02

Dynamic Leakage-Energy Management of Secondary Caches

Hallnor and Reinhardt

Jun, 02

Static leakage currents are the prime contributer to energy consumption for large on-chip secondary/tertiary caches. This energy cost can be reduced by dynamically disabling unneeded portions of the cache. To provide overall energy savings, however, the leakage-energy reduction must exceed the extra energy incurred due to additional off-chip accesses and increased runtime. We derive a practical formula for on-line estimation of net energy reduction based on the number of additional misses incurred and the ratio of off-chip access energy to per-cycle cache leakage energy. We estimate reasonable values for this ratio by measuring the actual off-chip access energy cost on a current system. We incorporate our estimation formula into a previously proposed dyamic resizing scheme, preventing it from increasing net energy consumption and improving its overall effectiveness. We also present a new resizing framework that tracks misses and controls power at the granularity of asssociative ways. Per-way miss counters enable simultaneous estimation of net energy consumption at all possible cache sizes, allowing direct identification of the minimum-energy configuration. This structure also greatly reduces hardware overhead compared to block-level resizing schemes. We call the combination of this framework with our estimation formula Energy Aware Bank-Level Resizing (EnABLR). Simulations show that, across the entire SPEC CPU2000 suite, EnABLR reduces secondary-cache leakage by up to 80%, without increasing overall memory-system energy consumption by more than 1% in any situation. Average energy reductions range from 14% to 29% depending on off-chip access energy costs.

CSE-TR-460-02

Remora: A Dynamic Self-Tuning Processor

Weaver Gebara Austin and Brown

Jul, 02

CSE-TR-461-02

Generic ILP Versus Specialized 0-1 ILP: An Update

Aloul Sakallah and Markov

Aug, 02

CSE-TR-462-02

Improved Score Following for Acoustic Music

Pardo and Birmingham

Aug, 02

CSE-TR-463-02

Solving Difficult Instances of Boolean Satisfiability in the Presence of Symmetry

Aloul Ramani Markov and Sakallah

Aug, 02

Research in algorithms for Boolean satisfiability (SAT) and their implementations has recently outpaced benchmarking efforts. Most of the classic DIMACS benchmarks can now be solved in seconds on commodity PCs. More recent benchmarks take longer to solve due of their large size, but are still solved in minutes. Yet, small and difficult SAT instances must ex-ist if P != NP. To this end, our work articulates SAT instances that are unusually difficult for their size, including satisfiable instances derived from Very Large Scale Integration (VLSI) routing problems. With an efficient implementation to solve the graph automorphism problem , we show that in structured SAT instances difficulty may be associated with large numbers of symmetries. We point out that a previously published symmetry-detection mechanism based on a reduction to the graph automorphism problem often produces many spurious symmetries. Our work contributes two new reductions to graph automorphism, which detect all correct symmetries detected previously as well as phase-shift symmetries not detected earlier. The correctness of our reductions is rigorously proven, and they are evaluated empirically. We also formulate an improved construction of symmetry-breaking clauses in terms of permutation cycles and propose to use only generators of symmetries in this process. These ideas are implemented in a fully automated flow that first detects symmetries in a given SAT instance, pre-processes it by adding symmetry-breaking clauses and then calls a state-of-the-art backtrack SAT solver. Significant speed-ups are shown on many benchmarks versus di-rect application of the solver. In an attempt to further improve the practicality of our approach, we pro-pose a scheme for fast opportunistic symmetry detection and also show that considerations of symmetry may lead to more efficient reductions to SAT in the VLSI routing domain.

CSE-TR-464-02

A Local Convergence Proof for the minvar Algorithm for Computing Continuous Piecewise Linear Approximations*

Groff Khargonekar and Koditschek

Sep, 02

The class of continuous piecewise linear (PL) functions represents a useful family of approximants because invertibility can be readily imposed, and if a PL function is invertible, then it can be inverted in closed form. Many applications, arising for example in control systems and robotics, involve the simultaneous construction of a forward and inverse system model from data. Most approximation techniques require that separate forward and inverse models be trained, whereas an invertible continuous PL affords, simultaneously, the forward and inverse system model in a single representation. The minvar algorithm computes a continuous PL approximation to data. Local convergence of minvar is proven for the case when the data generating function is itself a PL function and available directly rather than through data.

CSE-TR-465-02

Operating System Extensions to Support Host Based Virtual Machines

King and Chen

Sep, 02

This paper presents two new operating system extensions that facilitate the fast execution of host based virtual machines: KTrace and MMA. KTrace provides a convenient and efficient mechanism for writing kernel modules that control the execution of user-level processes. MMA exposes more detail of the underlying memory management hardware, providing applications with access to high performance intra-address space protection and address space overloading functionality. These extensions are applied to UMLinux, an x86 virtual machine that implements Linux as a user-mode process. As a result, overhead for UMLinux is reduced from 33% for a CPU bound workload and 819% - 1240% for system call and I/O intensive workloads, to 6% and 24% - 49% respectively

CSE-TR-466-02

Image-Based Illumination for Electronic Display of Artistic Paintings

Sharp Lee Ju Yoo

Sep, 02

Visual impressions from two-dimensional artistic paintings greatly vary under different illumination conditions, but this effect has been largely overlooked in most poster productions and electronic display. The light-dependent impressions are more pronounced in oil paintings and they arise mainly from the non-diffuse specular reflectances. We present an efficient method of representing the variability of lighting conditions on artistic paintings utilizing both simple empirical reflectance models and an image-based lighting method. The Lambertian and Phong models account for a significant portion of image variations depending on illumination directions, and residual intensity and color variations that cannot be explained by the reflection models are processed in a manner that is similar to the image-based lighting methods. Our technique allows brush strokes and paint materials to be clearly visible with relatively low data dimensionality.

CSE-TR-467-02

Towards a factored analysis of legged locomotion models

Altendorfer Koditschek and Holmes

Sep, 02

In this paper, we report on a new stability analysis for hybrid legged locomotion systems based on factorization of return maps. We apply this analysis to a family of models of the Spring Loaded Inverted Pendulum (SLIP) with different leg recirculation strategies. We obtain a necessary condition for the asymptotic stability of those models, which is formulated as an exact algebraic expression despite the non-integrability of the SLIP dynamics. We outline the application of this analysis to other models of legged locomotion and its importance for the stability of legged robots and animals.

CSE-TR-468-02

Effect of Node Size on the Performance of Cache-Conscious Indices

Hankins and Patel

Oct, 02

In main-memory environments, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve the performance of main-memory indices by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the indexs node size should be equal to the cache line size in order to minimize the number of cache misses and improve performance. As we show in this paper, this design choice ignores additional effects, such as instruction count, which play a significant role in determining the overall performance of the index. Using analytical models and a detailed experimental evaluation, we investigate the effect of the indexs node size on two common cache-conscious indices: a cache-conscious B+-tree (CSB+-tree), and a cache-conscious extendible hash index. We show that using node sizes much larger than the cache line size can result in better search performance for the CSB+-tree. For the hash index, reducing the number of overflow chains is the key to improving search performance, even if it requires using a node size that is much larger than the cache line size. Extensive experimental evaluation demonstrates that these node size choices are valid for a variety of data distributions, for range searches on the CSB+-tree, and can also be used to speed up the execution of traditional hash-based query operators that use memory-resident indices internally.

CSE-TR-469-02

Enhanced Wired Equivalent Privacy for IEEE 802.11 Wireless LANs

Park Wang Cho and Shin

Nov, 02

The Wired Equivalent Privacy (WEP) is defined as part of the IEEE 802.11 standard to provide secure communication over a wireless channel. However, it suffers serious security flaws, such as the vulnerability of RC4 to keystream reuse and the misuse of CRC checksum in ensuring data integrity. In this paper, we design, implement, and evaluate a software (middleware) approach, which runs on top of WEP, to fill the security holes of WEP. The core of this middleware is a novel key-management protocol in which (1) to minimize the possibility of keystream reuse, the message-keys for data encryption are frequently refreshed; (2) to achieve secure exchange of message-keys, we append the Hash Message Authentication Code (HMAC) to each message-key, and then encrypt it with a base-key; (3) to provide reliable key-management service, we set up a hidden TCP connection and develop the corresponding daemons at the access point (AP) and a mobile node; and (4) to provide a mobile node with a new base-key and a message-key when the node moves from one microcell to another, we realize ``secure roaming' based on Inter-Access Point Protocol (IAPP). Furthermore, to ensure data integrity at the data-link layer, each data frame is augmented with HMAC. By setting the key-refreshment interval judiciously, we can reduce the rate of keystream reuse to an acceptably low level. More importantly, any compromised message-key becomes unusable after a single refreshment interval, and it does not reveal any information about the future message-keys. Our performance evaluation shows that the computation overhead of the proposed scheme is insignificant even when data is transferred at the maximum rate, and it is feasible for an AP to maintain hidden TCP connections for many mobile nodes.

CSE-TR-470-02

SYN-dog: Sniffing SYN Flooding Attacks

Wang Zhang and Shin

Nov, 02

This paper presents a simple and robust mechanism, called SYN-dog, to sniff SYN flooding attacks. The core of SYN-dog is based on the distinct protocol behavior of TCP connection establishment and teardown, and is an instance of the Sequential Change Point Detection. To make the detection mechanism insensitive to site and access pattern, a non-parametric Cumulative Sum (CUSUM) method is applied, thus making the detection mechanism robust, more generally applicable and its deployment much easier. The statelessness and low computation overhead of SYN-dog make itself immune to flooding attacks. The efficacy of SYN-dog is evaluated by extensive trace-driven simulations. The evaluation results show that SYN-dog has short detection latency and high detection accuracy. Moreover, due to its proximity to the flooding sources, SYN-dog not only sets alarm upon detection of ongoing SYN flooding attacks, but also reveals the location of the flooding sources without resorting to the IP traceback.

CSE-TR-471-02

Johnny Can't Sing: a Comprehensive Trainable Error Model for Sung Music Queries

Meek and Birmingham

Nov, 02

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of ``query-by-humming' (QBH) applications which search audio and multimedia databases for strong matches to aural queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.

CSE-TR-472-03

Verifying pi -calculus Processes by Promela Translation

Song and Compton

Feb, 03

In this paper, the possibility of verifying pi-calculus processes via Promela translation is investigated. A general translation method from pi-calculus processes to Promela models is presented and its usefulness is shown by performing verification tasks with translated pi-calculus examples and SPIN. Model checking translated pi-calculus processes in SPIN is shown to overcome shortcomings of the Mobility Workbench, which implements a theorem-proving style mu-calculus model checking algorithm for the pi-calculus.

CSE-TR-473-03

Hop-Count Filtering: An Effective Defense Against Spoofed Traffic

Jin Wang and Shin

Feb, 03

IP spoofing has been exploited by Distributed Denial of Service (DDoS) attackers to (1) conceal flooding sources and localities of flooding traffic, and (2) coax uncompromised hosts into becoming reflectors, redirecting and amplifying flooding traffic. Thus, the ability to filter spoofed IP packets near victims is essential to their own protection as well as to their avoidance of becoming involuntary DoS reflectors. Although an attacker can forge any field in the IP header, he cannot falsify the number of hops an IP packet takes to reach its destination. This hop-count information can be inferred from the Time-to-Live (TTL) value in the IP header. Based on this observation, we propose a novel filtering technique for Internet servers to winnow away spoofed IP packets. By clustering address prefixes based on hop-counts, Hop-Count Filtering (HCF) builds an accurate IP to hop-count (IP2HC) mapping table to detect and discard spoofed IP packets. Through analysis using network measurement data, we show that HCF can identify and then discard close to 90% of spoofed IP packets with little collateral damage. We implement and evaluate the HCF in the Linux kernel, demonstrating its benefits with experimental measurements.

CSE-TR-474-03

Preventing Traffic Analysis in Secure Group Communication

Verma and Prakash

Feb, 03

Group security protocols have primarily focused on the three major security problems -- authentication, confidentiality and integrity -- and overall these problems have been well addressed. However, the problem of traffic analysis where an attacker monitors messages and is able to obtain useful information from the protocol header fields as well as the size, number or time of the packets has not been given much attention. In this paper, we discuss these problems from a point of view of anonymity -- privacy of the groups identity and membership. We present modifications to secure group communication protocols that provide anonymity to groups. Our scheme lets only members of group g know whether or not a packet is for group g. We also present an authenticated and private key-recovery scheme that allows only group members to detect missed rekey messages from subsequent group messages. We discuss the security guarantees our solutions provide, analyze the performance costs and discuss several performance optimizations. We show that adequate performance can be achieved for many applications, including protecting traffic on broadcast-based networks (such as Ethernets) from traffic analysis.

CSE-TR-475-03

What Causal Forces Shape Internet Connectivity at the AS-level?

Chang Jamin and Willinger

Apr, 03

Two ASs are connected in the Internet AS graph only if they have a business ``peering relationship.' By focusing on the AS subgraph AS_PC whose links represent provider-customer relationships, we present an empirical study that identifies three crucial causal forces at work in the design of AS connectivity: (i) AS-geography, i.e., locality and number of PoPs within individual ASs; (ii) AS-specific business models, abstract toy models that describe how individual ASs choose their ``best' provider; and (iii) AS evolution, a historic account of the ``lives' of individual ASs in a dynamic ISP market. Based on these findings that directly relate to how provider-customer relationships may be determined in the actual Internet, we develop a new optimization-driven model for Internet growth at the AS_PC level. Its defining feature is an explicit construction of a novel class of intuitive, multi-objective, local optimizations by which the different ASs determine in a fully distributed and decentralized fashion their ``best' upstream provider. We show that our model is broadly robust, perforce yields graphs that match inferred AS connectivity with respect to many different metrics, and is ideal for exploring the impact of new peering incentives or policies on AS-level connectivity.

CSE-TR-476-03

Constructing Optimal Policies for Agents with Constrained Architectures

Dolgov and Durfee

May, 03

Optimal behavior is a very desirable property of autonomous agents and, as such, has received much attention over the years. However, making optimal decisions and executing optimal actions typically requires a substantial effort on the part of an agent, and in some situations the agent might lack the necessary sensory, computational, or actuating resources to carry out the optimal policy. In such cases, the agent will have to do the best it can, given its architectural constraints. We distinguish between three ways in which an agents architecture can affect policy optimality. An agent might have limitations that impact its ability to formulate, operationalize (convert to internal representation), or execute an optimal policy. In this paper, we focus on agents facing the latter two types of limitations. We adopt the Markov decision problem framework in our search for optimal policies and show how gradations of increasingly constrained agent architectures create correspondingly more complex optimization problems ranging from polynomial to NP-complete problems. We also present algorithms based on linear and integer programming that work across a range of such constrained optimization problems.

CSE-TR-477-03

Experiences with Monitoring OSPF on a Regional Service Provider Network

Watson Labovitz and Jahanian

May, 03

This paper presents the results from a detailed, experimental study of OSPF, an intra-domain routing protocol, running on a mid-size regional Internet service provider. Using multiple, distributed probes running custom monitoring tools, we collected continuous protocol information for a full year. We use this data to analyze the health of the network including the amount, source, duration and periodicity of routing instability. We found that information from external routing protocols produces significant levels of instability within OSPF. We also examine the evolution of the routing topology over time, showing that short term changes are incremental and that the long term trend shows constant change. Finally, we present a set of detailed investigations into several large scale anomalies. These anomalies demonstrate the significant impact external routing protocols have on OSPF. In addition, they highlight the need for new network management tools that can incorporate information from routing protocols.

CSE-TR-478-03

Eve: A Scalable Network Client Emulator

Jamjoom Shin

Jun, 03

Client emulation tools play a central role in the performance evaluation, capacity planning, and workload characterization of servers. However, traditional emulation tools, because of their limited scalability and extensibility, fail to keep up with servers as they increase in complexity and performance. In this paper, we propose Eve, a scalable client emulation tool that is capable of stress-testing powerful servers. Eve relies on an open and modular architecture that provides a simple and extensible programming environment. By incorporating I/O call handing into its user-thread library, Eve is capable of simultaneously emulating thousands of clients. Furthermore, Eve uses a distributed shared variable (DSV) core to facilitate communication between different clients, thus enhancing scalability to multiple machines.

CSE-TR-479-03

Energy-Aware Quality of Service Adaptation

Pillai Huang and Shin

Jun, 03

In a wide variety of embedded control applications, it is often the energy resources that form the fundamental limits on the system, not the systems computing capacity. Various techniques have been developed to improve energy efficiency in hardware, such as Dynamic Voltage Scaling (DVS), effectively extending the battery life of these systems. However, a comprehensive mechanism of task adaptation is needed in order to make the best use of the available energy resources, even in the presence of DVS or other power-reducing mechanisms. Further complicating this are the strict timeliness guarantees required by real-time applications commonly found in embedded systems. This paper develops a new framework called Energy-aware Quality of Service (EQoS) that can manage real-time tasks and adapt their execution to maximize the benefits of their computation for a limited energy budget. The concept of an adaptive real-time task and the notion of utility, a measure of the benefit or value gained from their execution, are introduced. Optimal algorithms and heuristics are developed to maximize the utility of the system for a desired system runtime and a given energy budget, and then extended to optimize utility without regard to runtime. We demonstrate the effects of DVS on this system and how EQoS in conjunction with DVS can provide significant gains in utility for fixed energy budgets. Finally, we evaluate this framework through both simulations and experimentation on a working implementation.

CSE-TR-480-03

The Impact of Concurrency Gains on the Analysis and Control of Multi-Threaded Internet Services

Jamjoom Chou and Shin

Jul, 03

CSE-TR-481-03

Columnar Timing Mechanisms in Neural Models of Problem Solving

Simen Freedman Lewis adn Polk

Aug, 03

By using a columnar subnetwork of continuous-time sigmoid activation units as a building block, hierarchical neural networks can be constructed that serve as cognitive models of algorithmic behavior, including goal-directed problem solving. Previous work by Polk et al.(Cognitive Brain Research, 2002) demonstrated the power of a hierarchical composition of local attractor networks for modeling problem solving in the Tower of London task, but critical timing issues were addressed by non-neural components. The essential function of the column structure proposed here for addressing timing issues is to produce a controllable propagation delay in the signal from a columns input unit to its output unit. Lateral inhibition between input units and between output units in different columns forms layered decision-making modules. These modules use winner-take-all attractor dynamics to compute the results of simple if-then rules applied to the feedforward outputs of other modules. Strong recurrent excitation in multiple layers of these columnar modules produces activation-based short-term memory that preserves the symbolic results of these computations during algorithmic processing. We show that propagation delay between layers allows the application of timing methods from digital sequential circuit design that solve the timing problems inherent in the existing Tower of London model without relying on discrete-time updating or binary values.

CSE-TR-482-03

Sprint-and-Halt Scheduling for Energy Reduction in Real-Time Systems with Software Power-Down

Pillai and Shin

Aug, 03

Mobile computing platforms are performing increasingly complex and computationally intensive tasks. To help lengthen useful battery life, these platforms often incorporate some form of hardware power-down that is controlled by the system software. Unfortunately, these often incur substantial transition latencies when switching between power-down and active states, making them difficult to use in time-critical embedded systems. This paper introduces a class of sprint-and-halt schedulers that attempt to maximize the energy savings of software-controlled power-down mechanisms, while simultaneously maintaining hard real-time deadline guarantees. Several different algorithms are proposed to reclaim unused processing time, defer processing, and extend power-down intervals while respecting task deadlines. Sprint-and-halt schedulers are shown to reduce energy consumption by 40-70% over typical operating parameters. For very large or small state transition latencies, simple approaches work very close to theoretical limits, but over a critical range of latencies, advanced schedulers show 10-20% energy reduction over simpler methods.

CSE-TR-483-03

Stability Analysis of Legged Locomotion Models by Symmetry-Factored Return Maps

Altendorfer Koditschek and Holmes

Sep, 03

We present a new stability analysis for hybrid legged locomotion systems based on the ``symmetric" factorization of return maps. We apply this analysis to 2 and 3 degree of freedom (DOF) models of the Spring Loaded Inverted Pendulum (SLIP) with different leg recirculation strategies. Despite the non-integrability of the SLIP dynamics, we obtain a necessary condition for asymptotic stability (and a sufficient condition for instability) at a fixed point, formulated as an exact algebraic expression in the physical parameters. We use this expression to study a variety of 2 DOF SLIP models that have previously been posited as low dimensional representations of running, focusing on the sensory ``cost" required to achieve ``fast" transients as measured by the degree of singularity of the linearized dynamics. We introduce a new 3 DOF SLIP model with pitching dynamics whose stability properties, revealed by this analysis, provide for the first time the beginnings of a formal explanation for the surprisingly stable gaits of the open loop controlled robot, RHex.

CSE-Tr-484-03

End-to-End Delay Bounds for Traffic Aggregates under Guaranteed-Rate Scheduling Algorithms

Sun and Shin

Oct, 03

This paper evaluates, via both analysis and simulation, the end-to-end (e2e) delay performance of aggregate scheduling with guaranteed-rate (GR) algorithms. Deterministic e2e delay bounds for a single aggregation are derived under the assumption that all incoming flows at an aggregator conform to the token bucket model. %and all flows in the same traffic aggregate share %the same e2e path inside the aggregation region. Each aggregator can use any of three types of GR scheduling algorithms: stand-alone GR, two-level hierarchical GR, and rate-controlled two-level hierarchical GR. We also derive an e2e delay bound for the case of multiple aggregations when the rate-controlled two-level hierarchical GR is used at aggregators. By using the GR scheduling algorithms to handle traffic aggregates, we show not only the existence of delay bounds, but also the fact that under certain conditions (e.g., when the aggregate traverses a long path after the aggregation point) the bounds are even tighter than that of per-flow scheduling. We then compare the analytic delay bounds numerically, and conduct in-depth simulation to (i) confirm the analytic results, and (ii) compare the e2e delays of aggregate and per-flow scheduling. The simulation results have shown that aggregate scheduling is very robust and can take advantage of statistical multiplexing gains. It performs better than per-flow scheduling in most simulation scenarios we considered. Overall, aggregate scheduling is shown theoretically to provide bounded e2e delays, and practically to provide excellent e2e delay performance. Moreover, it incurs lower scheduling and state-maintenance overheads at routers than per-flow scheduling. All of these salient features make aggregate scheduling very attractive to Internet core networks.

CSE-TR-485-04

Infeasible Group Inversion and Broadcast Encryption

Irrer Lokam Opyrchal and Prakash

Jan, 04

Broadcast encryption allows a sender to broadcast messages in such a way that only the sender-specified subset of users can decrypt them. Over the years a number of schemes have appeared that solve different variants of the problem. The number of applications that could take advantage from an efficient broadcast encryption scheme has also grown and includes such examples as content-based publish subscribe systems. Such systems are characterized by messages going to arbitrary and rapidly changing subsets of users. Each message can potentially go any of the $2^n-1$ possible subsets. Collusion resistance and small resource requirements are necessary. In this paper we present a simple and efficient broadcast encryption scheme based on one-way group operations. A group operation $gpop$ is one-way if $x gpop y$ is easy to perform for any two group elements $x$ and $y$, but computing the inverse $inv{x}$ (w.r.t. $gpop$) is infeasible. Groups with infeasible inversion have recently received attention as a cryptographic primitive and their existence is still uncertain. Under the assumption that such groups exist, we prove the security of our broadcast encryption scheme and explore a few areas where such constructions might come from. Finally, we discuss some functions used in existing cryptosystems and compare them with one-way group operations.

CSE-TR-486-04

Coordinated Navigation of Multiple Independent Disk-Shaped Robots

Karagoz Bozma and Koditschek

Feb, 04

This paper addresses the coordinated navigation of multiple independently actuated disk-shaped robots - all placed within the same disk-shaped workspace. We encode complete information about the goal, obstacles and workspace boundary using an artificial potential function over the cross product space of the robots� simultaneous configurations. The closed-loop dynamics governing the motion of each robot take the form of the approriate projection of the gradient of this function. We show, with some reasonable restrictions on the allowable goal positions, that this function is an essential navigation function - a special type of artificial potential function that is ensured of connecting the kinematic planning with the dynamic execution in a correct manner. Hence, each robot is guaranteed of collision-free navigation to its destination from almost all initial free placements.

CSE-TR-487-04

Practical Slicing and Non-slicing Block-Packing without Simulated Annealing

Chan and Markov

Feb, 04

We propose a new block-packer BloBB based on multi-level branch-and-bound. It is competitive with annealers in terms of runtime and solution quality. We empirically quantify the gap between optimal slicing and non-slicing oorplans by comparing optimal packings and best seen results. Most ongoing work deals with non-slicing packings, and implicitly assumes that best slicing packings are highly sub-optimal. Contrary to common belief, we show that the gap in optimal slicing and non-slicing packings is very small. Optimal slicing and non-slicing packings for apte, xerox and hp are reported. We extend BloBB to the block-packer CompaSS, that handles soft blocks. Optimal slicing packings for soft versions of apte, xerox and hp are reported. We discover that the soft versions of all MCNC benchmarks, except for apte, and all GSRC benchmarks can be packed with zero dead-space. Moreover, the aspect ratio bound [0.5,2] turns out to be not very restrictive, when area is concerned. Our heuristic slicing block-packer is able to pack with zero dead-space in most cases when we restrict the aspect ratio bound to [0.57,1.70]. CompaSS improves the best published results for the ami49 X benchmarks suite, outperforming the leading multilevel annealer in runtime, solution quality and scalability. Additionally, realistic floorplans often have blocks with similar dimensions, if design blocks, such as memories, are reused. We show that this greatly reduces the complexity of black-packing.

CSE-TR-488-04

A Compressed Memory Hierarchy Using an Indirect Index Cache

Hallnor and Reinhardt

Mar, 04

The large and growing impact of memory hierarchies on overall system performance compels designers to investigate innovative techniques to improve memory-system efficiency. We propose and analyze a memory hierarchy that increases both the effective capacity of memory structures and the effective bandwidth of interconnects by storing and transmitting data in a compressed form. Caches play a key role in hiding memory latencies. However, cache sizes are constrained by die area and cost. A caches effective size can be increased by storing compressed data, if the storage unused by a compressed block can be allocated to other blocks. We use a modified Indirect Index Cache to allocate variable amounts of storage to different blocks, depending on their compressibility. By coupling our compressed cache design with a similarly compressed main memory, we can easily transfer data between these structures in a compressed state, increasing the effective memory bus bandwidth. This optimization further improves performance when bus bandwidth is critical. Our simulation results, using the SPEC CPU2000 benchmarks, show that our design increases performance by up to 107% on some benchmarks while degrading performance by no more thatn 2% on others. Compressed bus transfers alone account for up to 59% of this improvement, with the remainder coming from increased effective cache capacity. As memory latencies increase, our design becomes even more beneficial.

CSE-TR-490-04

Data Propagation in a Distributed File System

Kim, Noble, Chen, Tilbury

July, 04

Distributed file systems benefit greatly from optimistic replication that allows clients to update any replica at any site. However, such systems face a new dilemma: Once data is updated at a replica site, when should it be shipped to others? In conventional file system workloads, most data is read by its writer, so it needs to be shipped only for administrative reasons. Unfortunately, shipping on demand substantially penalizes those who do share, while shipping aggressively places undue load on the system, limiting scalability. This paper explores a mechanism to predict and support sharing in a wide area file system. This mechanism uses the observation that updates that invalidate cached objects elsewhere are likely to be shared, and they should be propagated. The mechanism is evaluated on its precision, recall, and F-measure over traces capturing live file system uses. It reduces data shipped by orders of magnitude compared to aggressive schemes, while reducing the performance penalty of on-demand shipment by nearly a factor of two.

CSE-TR-491-04

The Internet Motion Sensor: A distributed global scoped Internet threat monitoring system

Cooke, Bailey, Watson, Jahanian and Nazario

July, 2004

Networks are increasingly subjected to a broad spectrum of threats that impact the reliability and availability of criticalinfrastructure. In response, researchers and network operators have increasingly relied on monitoring to characterize and track these threats. This paper introduces the Internet Motion Sensor (IMS), a globally scoped Internet threat monitoring system whose goal is to measure, characterize, and track threats. The dark address sensors in the IMS extend simplepassive capture using a novel transport layer service emulation technique to elicit payloads across all services, thereby addressing the issue depth of service coverage. To achieve breadth of coverage, the IMS employs a distributed infrastructure and utilizes sensors that are aware of their address diversity and their position in the actively routedtopology. Finally, the IMS uses an innovative signature encoding and data warehousing system combined with a hierarchical architecture to realize a system that is not only time and space efficient, but is also scalable to a global deployment. We explore the various architectural tradeoffs in the context of a 3 year deployment across multiple darkaddress blocks ranging in size from /24s to a /8. We show how the current architecture emulates services across a diverse set of routed and address topologies in a scalable manner. Results from three recent events are presented to illustrate the utility of such a system: the SCO Denial of Service attacks (December, 2003), the Blaster worm (August,2003), and the Bagle backdoor scanning efforts (March, 2004).

CSE-TR-492-04

Pueblo: A Modern Pseudo-Boolean SAT Solver

Sheini and Sakallah

July, 2004

In this report we introduce a new SAT solver that integrates logic-based reasoning and integer programming methods to systems of CNF and PB constraints. Its novel features include an efficient PB literal watching strategy that takes advantage of the preponderance of unit-coefficient literals in most PB constraints. Additionally, the solver incorporates several PB learning methods that take advantage of the pruning power of PB constraints while minimizing their overhead. Empirical evidence suggests that such judicious injection of IP techniques can be quite effective in practice.

CSE-TR-493-04

CIDS: Causality Based Intrusion Detection System

King, Mao and Chen

August, 2004

This paper presents a new style of intrusion detection system, CIDS, that links network and host based intrusion detection systems together using causal dependencies. This combination provides a number of benefits. First, combining alerts reduces false positives for both network and host based approaches. Second, host based intrusion detection systems have significantly less data to process since they only act on processes and files that are causally linked to a suspicious network packet. Third, alarms are associated with a specific network service, allowing system administrators to act accordingly after detecting a compromise. To show the utility of this new system, we describe how several existing intrusion detection systems benefit from using our platform, and we introduce a new host based intrusion detection system that leverages the full power of causal dependency tracking.

CSE-TR-494-04

What and How Much to Gain by Spectral Agility?

Chou, Kim and Shin

August, 2004

Static spectrum allocation has resulted in low spectrum efficiency in licensed bands and poor performance of radio devices in crowded unlicensed bands. To remedy these problems, we exploit the concept of ``spectral agility" such that radio devices can dynamically utilize idle spectral bands. We establish a mathematical model for the performance gain made by spectral agility, and use the model to evaluate important performance metrics such as spectrum efficiency, throughput of a spectral-agile network, and packet blocking/waiting time of spectral-agile devices. We propose three basic mechanisms to realize spectral-agile networks: spectrum opportunity discovery, spectrum opportunity management, and spectrum use coordination. These mechanisms are implemented in the ns-2, and the control overhead incurred by using spectral agility is evaluated. Our simulation results have shown that the throughput of a spectral-agile network is improved by up to 90%, and the improvement is very close to the performance bound predicted by our analytical (mathematical) model. These results demonstrate and confirm the spectral agility's capability of improving spectral utilization in an efficient, distributed, and automatic manner.

CSE-TR-495-04

Debugging operating systems with time-traveling virtual machines

King, Dunlap, and Chen

August, 2004

Operating systems are among the most difficult of software systems to debug with traditional cyclic debugging. They are non-deterministic; they run for long periods of time; their state and code is large and complex; and their state is easily perturbed by the act of debugging. This paper describes a time-traveling virtual machine that overcomes many of the difficulties associated with debugging operating systems. By time travel, we mean the ability to navigate backward and forward arbitrarily through the execution history of a particular run and to replay arbitrary segments of the past execution. We integrate time travel into a general-purpose debugger to enable a programmer to debug an OS in reverse, implementing commands such as reverse breakpoint, reverse watchpoint, and reverse single step. The space and time overheads needed to support time travel are reasonable for debugging, and movements in time are fast enough to support interactive debugging. We demonstrate the value of our time-traveling virtual machine by using it to understand and fix several OS bugs that are difficult to find with standard debugging tools.

CSE-TR-496-04

Enriching intrusion alerts through multi-host causality

King, Mao, Lucchetti, and Chen

August, 2004

Current intrusion detection systems point out suspicious states or events but do not show how the suspicious state or events relate to other states or events in the system. We show how to enrich an IDS alert with information about how those alerts causally lead to or result from other events in the system. By enriching IDS alerts with this type of causal information, we can leverage existing IDS alerts to learn more about the suspected attack. Backward causal graphs can be used to find which host allowed a multi-hop attack (such as a worm) to enter a local network; forward causal graphs can be used to find the other hosts that were affected by the multi-hop attack. We demonstrate this use of causality on a local network by tracking the Slapper worm, a manual attack that spreads via several attack vectors, and an e-mail virus. Causality can also be used to correlate distinct network and host IDS alerts. We demonstrate this use of causality by correlating Snort and host IDS alerts to reduce false positives on a testbed system connected to the Internet.

CSE-TR-497-04

Design and Applications of a Virtual Context Architecture

Oehmke, Binkert, Reinhardt, Mudge

September, 2004

This paper proposes a new register-file architecture that virtualizes logical register contexts. This architecture makes the number of active register contexts�representing different threads or activation records�independent of the number of physical registers. The physical register file is treated as a cache of a potentially much larger memory-mapped logical register space. The implementation modifies the rename stage of the pipeline to trigger the movement of register values between the physical register file and the data cache. We exploit the fact that the logical register mapping can be easily updated�simply by changing the base memory pointer�to construct an efficient implementation of register windows. This reduces the execution time by 8% while generating 20% fewer data cache accesses. We also use the large logical register space to avoid the cost of a large physical register file normally required for a multithreaded processor, allowing us to create an SMT core with fewer physical registers than logical registers that still performs within 2% of the baseline. Finally, the two are combined to create a simultaneous multithreaded processor that supports register windows. This architecture achieves a 10% increase in performance over the baseline architecture even with fewer physical than logical registers while also reducing data cache bandwidth. Thus we are able to combine the advantages of register windows with multithreading.

CSE-TR-498-04

Plato: A Platform for Virtual Machine Services

King, Dunlap, and Chen

October, 2004

Virtual machines are being used to add new services to system level software. One challenge these virtual machine services face is the semantic gap between VM services and the machine-level interface exposed by the virtual machine monitor. Using the virtual machine monitor interface, VM services have access to hardware-level events like Ethernet packets or disk I/O. However, virtual machine services also benefit from guest software (software running inside the virtual machine) semantic information, like sockets and files. These abstractions are specific to the guest software context and are not exposed directly by the machine-level virtual machine monitor interface. Existing ways to bridge this semantic gap are either ad-hoc or use debuggers. Ad-hoc methods often lead to cutting-and-pasting large sections of the guest operating system to reconstruct its interpretation of the hardware level events. Debuggers add too much overhead for production environments. Both ad-hoc methods and debuggers could cause unwanted perturbations to the virtual system. To address these shortcomings, we developed a new platform for implementing virtual machine services: Plato. The goal of Plato is to make it easy and fast to develop and run new virtual machine service. Plato allows VM services to call guest kernel functions and access guest local variables, eliminating the need to cut-and-paste sections of the virtual machine source code. Plato provides a checkpoint/rollback facility that allows VM services to correct for undesired perturbations to the virtual state. Plato adds less than 5% overhead for a variety of macrobenchmarks.

CSE-TR-499-04

A Hybrid Honeypot Architecture for Scalable Network Monitoring

Bailey, Cooke, Watson, Jahanian and Provos

October, 2004

To provide scalable, early warning and analysis of new Internet threats like worms or automated attacks, we propose a globally distributed, hybrid monitoring architecture that can capture and analyze new vulnerabilities and exploits as they occur. To achieve this, our architectures increases the exposure of high-interaction honeypots to these threats by employing low-interaction honeypots as frontend content filters. Host-based techniques capture relevant details such as packet payload of attacks while network monitoring provides wide coverage for quick detection and assessment. To reduce the load of the backends, we filter prevalent content at the network frontends and use a novel handoff mechanism to enable interactions between network and host components. We use measurements from live networks over five months to demonstrate the effectiveness of content prevalence as a filtering mechanism. Combining these observations with laboratory measurements, we demonstrate that our hybrid architecture is effective in preserving the detail of a specific threat while still achieving performance and scalability. We illustrate the benefits of this framework by showing how it enables earlier, higher-confidence detection, more detailed forensics, and robust signatures for mitigation of threats.

CSE-TR-500-04

Weakly Supervised Graph-Based Methods for Classification

Radev

October, 2004

We compare two weakly supervised graph-based classification algorithms: spectral partitioning and tripartite updating. We provide results from empirical tests on the problem of number classification. Our results indicate (a) that both methods require minimal labeled data, (b) that both methods scale well with the number of unlabeled examples, and (c) that tripartite updating outperforms spectral partitioning.

CSE-TR-501-04

Media-Aware Rate Control

Wang, Banerjee and Jamin

November, 2004

Streaming media transfers over the Internet are expected to behave in a TCP friendly manner and react appropriately to congestion similar to the way TCP does. However, our work shows that existing "TCP-friendly" protocols are not necessarily "media friendly". In this paper, we present our findings on the impact of the TFRC protocol on streaming media quality. We propose the MARC (Media Aware Rate Control) protocol that, unlike TFRC, exhibits significant tolerance towards transient changes in background workload, while still maintaining TCP-friendliness.

CSE-TR-502-04

Network Maps Beyond Connectivity

Jin, Wang and Jamin

November, 2004

Knowing network topology is becoming increasingly important for a number of applications such as server placement and traceback of DDoS attacks. Recent works in modeling the Internet topology and constructing network maps have focused on the connectivity aspect. This paper describes the first study in incorporating connectivity, latency, and routing information all into a network map based on a large set of traceroute data. We highlight the common challenges in constructing such network maps and discuss our solutions. We evaluate our network map based on various Internet routing models proposed in the literature. The evaluation shows that, for those traceroute data that we are able to evaluate, at least 85% of computed hop-counts and latencies are within a factor of two of the actual values. Furthermore, we show that a flat routing model based on hop-count performs as well as more complicated policy routing models.

CSE-TR-503-04

Worm Hotspots: Explaining Non-Uniformity in Worm Targeting Behavior

Cooke, Mao, and Jahanian

November, 2004

Long after the Blaster, Slammer/Sapphire, and CodeRedII worms caused significant worldwide disruptions, a huge number of infected hosts from these worms continue to probe the Internet today. This paper investigates hotspots (non-uniformities) in the targeting behavior of these important Internet worms. Recent data collected over the period of a month and a half using a distributed blackhole data collection infrastructure covering 18 networks including ISPs, enterprises, and academic networks show 75K Blaster infected hosts, 180K slammer infected hosts, and 55K CodeRedII hosts. We discover through detailed analysis how critical flaws and side effects in the targeting behavior lead to a significant bias for certain destination address blocks. In particular, we demonstrate three previously unexplored biases: a severely restricted initial random seed forcing infection attempts to certain blocks; flaws in the parameters of a random number generator making certain hosts cycle through limited target addresses; and the widespread use of private address space dramatically changing the targeting distribution of certain worms. A direct consequence of these biases is that certain blocks are subjected to far more infection attempts than others. We discuss the implication of these hotspots on worm simulation and modeling, placement of blackhole sensors, worm detection and quarantine.

CSE-TR-504-04

Cooperative ReVirt: Adapting Message Logging for Intrusion Analysis

Basrai and Chen

Nov, 2004

Virtual-machine logging and replay enables system administrators to analyze intrusions more completely and with greater integrity than traditional system loggers. One challenge in these types of systems is the need to log a potentially large volume of network traffic. Cooperative ReVirt adds message-logging techniques to ReVirt to reduce the amount of network traffic that needs to be logged. Cooperative ReVirt adapts message-logging techniques to address the challenges of intrusion analysis, such as the need to work in the presence of network attacks and unreliable networks, the need to support asymmetric trust relationships among computers, and the need to support dynamic trust and traffic patterns. CooperativeReVirt is able to reduce the log volume needed to replay a computer by an average of 70% for a variety of distributed computing benchmarks, while adding less than 7% overhead. Measurements of a live network indicate that Cooperative ReVirt would be able to avoid logging 85% of the received network data.

CSE-TR-505-04

Analyzing NIC Overheads in Network-Intensive Workloads

Binkert, Hsu, Saidi, Dreslinski, Schultz and Reinhardt

December, 2004

Modern high-bandwidth networks place a significant strain on host I/O subsystems. However,despite the practical ubiquity of TCP/IP over Ethernet for high-speed networking, the vast majority of end host networking research continues in the current paradigm of the network interface as a generic peripheral device. As a result, proposed optimizations focus on purely software changes, or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We look at an alternative approach: leave the kernel TCP/IP stack unchanged, but eliminate bottlenecks by closer attachment of the NIC to the CPU and memory system. To evaluate this approach, we have developed a simulation environment specifically targeted for networked systems. It simulates server and client systems along with a network in a single process. Full-system simulation captures the execution of both application and OS code. Our model includes a detailed out-of-order CPU, event-driven memory hierarchy, and Ethernet interface device. Using this simulator, we find that tighter integration of the network interface can provide benefits in TCP/IP throughput and latency. We also see that the interaction of the NIC with the on-chip memory hierarchy has a greater impact on performance than the raw improvements in bandwidth and latency that come from integration.

CSE-TR-506-05

From Max-SAT to Min-UNSAT: Insights and Applications

Liffiton, Andraus and Sakallah

February, 2005

This report describes a strong connection between maximum satisfiability and minimally-unsatisfiable subformulas of any constraint system, as well as techniques for exploiting it. Focusing on CNF formulas, we explore this relationship and present novel algorithms for extracting minimally-unsatisfiable subformulas, including one that finds all such subformulas. We present experimental results showing how these can be used in counterexample-guided abstraction refinement for hardware verification, decreasing the number of iterations the process requires to verify a design. A large set of benchmarks is used to investigate the relationship between maximum satisfiability and minimally-unsatisfiable subformulas in a product configuration application.

CSE-TR-507-05

Boost Logic: A High Speed Energy Recovery Circuit Family

Sathe, Ziesler and Papaefthymiou

February, 2005

In this paper, we propose Boost Logic, a logic family which relies on voltage scaling, gate overdrive and energy recovery techniques to achieve high energy efficiency at frequencies in the GHz range. The key feature of our design is the use of an energy recovering ``boost' stage to provide an efficient gate overdrive to a highly voltage scaled logic at near threshold supply voltage. We have evaluated our logic family using post-layout simulation of an 8-bit pipelined array multiplier in a 0.13um CMOS process with Vth=340mV. At 1.6GHz and a 1.3V supply voltage, the Boost multiplier dissipates 8.11pJ per computation. Comparing results from post-layout simulations of boost and voltage-scaled conventional multipliers, our proposed logic achieves 68% energy savings with respect to static CMOS. Using low Vth devices, Boost Logic has been verified to operate at 2GHz with a 1.25V voltage supply and 8.5pJ energy dissipation per cycle.

CSE-TR-508-05

Towards Declarative Querying for Biological Sequences

Tata, Patel, Friedman and Swaroop

March, 2005

The ongoing revolution in life sciences research is producing vast amounts of genetic and proteomic sequence data. Scientists want to pose increasingly complex queries on this data, but current methods for querying biological sequences are primitive and largely procedural. This limits the ease with which complex queries can be posed, and often results in very inefficient query plans. There is a growing and urgent need for declarative and efficient methods for querying biological sequence data. In this paper we introduce a system called Periscope/SQ which addresses this need. Queries in our system are based on a well-defined extension of relational algebra. We introduce new physical operators and support for novel indexes in the database. As part of the optimization framework, we describe a new technique for selectivity estimation of string pattern matching predicates that is more accurate than previous methods. We also describe a simple, yet highly effective algorithm to optimize sequence queries. Using a real-world application in eye genetics, we show how Periscope/SQ can be used to achieve a speedup of two orders of magnitude over existing procedural methods!

CSE-TR-489-04

Network Overlay Construction under Limited End-to-End Addressability

Wenjie Wang, Cheng Jim ad Sugih Jamin

May, 2004

Network overlay construction has found many applications in today's Internet, such as P2P networks [1][2], end-host multicast [3][4], and network security [5]. Many researchers have proposed solutions to building an overlay network among a given group of end-hosts. The usual overlay construction assumes two-way network communication� each host can initiate connections to and accept requests from other hosts. This is not always true on the Internet due to the use of Network Address Translation (NAT) and rewalls. Our experiments with one P2P le-sharing system revealed that over 34% of hosts were guarded hosts, i.e., hosts that cannot accept connections from other Internet hosts. The lack of two-way communication capability presents a challenge because only a subset of hosts can act as overlay routers. Using a cluster-base approach, we design a new overlay construction mechanism called e*, which organizes members based on reachability and intelligently select overlay routers to reduce end-to-end latencies on the overlay. Under realistic scenarios involving guarded hosts, e* can reduce average end-to-end latency on the overlay by 28-61% compared to existing protocols. Furthermore, e* signicantly improves the worst case end-to-end latency on the overlay.

CSE-TR-489-04

Network Overlay Construction under Limited End-to-End Addressability

Wenjie Wang, Cheng Jim ad Sugih Jamin

May, 2004

CSE-TR-509-05

On the Design and Implementation of a Proxy-Based Architecture for Web Access on Mobile Devices

Upatkoon, Wang and Jamin

April, 2005

In this paper, we introduce WebBee, a client-proxy architecture that combines a web scraping proxy with a Java client to make a platform-independent gateway between small mobile devices, such as mobile phones, and the vast information available on the World Wide Web. The key technology behind WebBee is a web scraping engine that executes "scraping scripts," a domain-specific scripting language we designed for the purpose of fetching web pages and selectively extracting information from them. By transmitting to and displaying only this extracted information on the client device, WebBee gives mobile devices with small displays and low network bandwidth clean and quick access to web content customarily designed for desktop browsers. With WebBee, providers do not need to tailor their contents specifically for mobile devices. Furthermore, the use of a Java client ensures that WebBee is a portable solution among modern mobile devices.

CSE-TR-510-05

Multi-scale Line Drawings from 3D Meshes

Alex Ni, Kyuman Jeong, Seungyong Lee, and Lee Markosian

July, 2005

We present an effective method for automatically selecting the appropriate scale of shape features that are depicted when rendering a 3D mesh in the style of a line drawing. The method exhibits good temporal coherence when introducing and removing lines as needed under changing viewing conditions, and it is fast because the calculations are carried out entirely on the graphics card. The strategy is to pre-compute a sequence of filtered versions of the mesh that eliminate (via a signal processing technique) shape features at progressively larger scales. Each mesh in the sequence retains the original number of vertices and connectivity, providing a direct correspondence between meshes. The mesh sequence is loaded onto the graphics card, and at run-time a vertex program interpolates positions and curvature values between two of the meshes, depending on distance to the camera. A pixel program then renders silhouettes and suggestive contours to produce the line drawing. In this way, fine shape features are depicted over nearby surfaces, while appropriately coarse-scaled features are depicted over more distant regions.

CSE-TR-511-05

Performance Validation of Network-Intensive Workloads on a Full-System Simulator

Ali G. Saidi, Nathan L. Binkert, Lisa R. Hsu, and Steven K. Reinhardt

July, 2005

Performance accuracy is a critical but often neglected aspect of architectural performance simulators. One approach to evaluating performance accuracy is to attempt to reproduce observed performance results from a real machine. In this paper, we attempt to model the performance of a Compaq Alpha XP1000 workstation using the M5 full-system simulator. There are two novel aspects to this work. First, we simulate complex TCP/IP networking workloads and use network bandwidth as our primary performance metric. Unlike conventional CPU-intensive applications, these workloads spend most of their time in the operating system kernel and include significant interactions with platform hardware such as the interrupt controller and network interface device. Second, we attempt to achieve performance accuracy without extremely precise modeling of the reference hardware. Instead, we use simple generic component models and tune them to achieve appropriate bandwidths and latencies. Overall, we were able to achieve reasonable accuracy even with our relatively imprecise model, matching the bandwidth of the real system within 15% in most cases. We also used profiling to break CPU time down into categories, and found that the simulation results correlated well with the real machine.

CSE-TR-512-05

Towards Protecting Sensitive Files in a Compromised System

Xin Zhao, Kevin Borders, and Atul Prakash

September, 2005

Protecting sensitive files from a compromised system helps administrator to thwart many attacks, discover intrusion trails, and fast restore the system to a safe state. However, most existing file protection mechanisms can be turned off after an attacker manages to exploit a vulnerability to gain privileged access. In this paper we propose SVFS, a Secure Virtual File System that uses virtual machine technology to store sensitive files in a virtual machine that is dedicated to providing secure data storage, and run applications in one or more guest virtual machines. Accesses to sensitive files must go through SVFS and are subject to access control policies. Because the access control policies are enforced independently in an isolated virtual machine, intruders cannot bypass file protection by compromising a guest VM. In addition, SVFS introduces a Virtual Remote Procedure Call mechanism as a substitute of standard RPC to deliver better performance in data exchanging across virtual machine boundaries. We implemented SVFS and tested it against attacks on a guest operating system using several available rootkits. SVFS was able to prevent most of the rootkits from being installed, and prevent all of them from persisting past reboot. We also compared the performance of SVFS to the native Ext3 file system and found that performance cost was reasonable considering the security benefits of SVFS. Our experimental results also show VRPC does improve the filesystem performance.

CSE-TR-513-05

Resonant System Design with Coarse Grained Pipelines

Visvesh S. Sathe and Marios C. Papaefthymiou

September, 2005

In this report, we present an efficient approach to resonant system design. Our approach involves the use of resonant clocks to drive level sensitive latches in pipelined datapaths. Through judicious design of these timing elements, the energy efficiency of resonant clocking can be obtained without performance penalties, while maintaining robust, race-free operation. Since our approach involves driving only the timing elements with resonant clocks and places no restrictions on the type of computational logic, the method can be used with existing static CMOS design flows. We describe our technique for two, three and four phase clock systems and present clock generation mechanisms. We also introduce the level-sensitive timing elements to be used with these clocks and discuss how they are introduced into a datapath.

CSE-TR-514-06

A Simple Integrated Network Interface for High-Bandwidth Servers

Nathan L. Binkert, Ali G. Saidi, and Steven K. Reinhardt

January, 2006

High-bandwidth TCP/IP networking places a significant burden on end hosts. We argue that this issue should be addressed by integrating simple network interface controllers (NICs) more closely with host CPUs, not by pushing additional computation out to the NICs. We present a simple integrated NIC design (SINIC) that is significantly less complex and more flexible than a conventional DMA-descriptor-based NIC but performs as well or better than the conventional NIC when both are integrated onto the processor die. V-SINIC, an extended version of SINIC, provides virtual per-packet registers, enabling packet-level parallel processing while maintaining a FIFO model. V-SINIC also enables deferring the copy of the packet payload on receive, which we exploit to implement a zero-copy receive optimization in the Linux 2.6 kernel. This optimization improves bandwidth by over 50% on a receive-oriented microbenchmark.

CSE-TR-515-06

Language and Analysis Techique for Efficient Software Model Checking of Data Structure Properties

Paul Darga and Chandrasekhar Boyapati

January, 2006

This paper presents novel language and analysis techniques that significantly speed up software model checking of data structure properties. Consider checking a red-black tree implementation. Traditional software model checkers systematically generate all red-black tree states (within some given bounds) and check every red-black tree operation (such as insert, delete, or lookup) on every red-black tree state. Our key idea is as follows. As our checker checks a red-black tree operation o on a red-black tree state s, it uses a combination of dynamic and static analyses to identify other red-black tree states s'_1, s'_2, ..., s'_k on which the operation o behaves similarly. Our analyses guarantee that if o executes correctly on s, then o will execute correctly on every s'_i. Our checker therefore does not need to check o on any s'_i once it checks o on s. It thus safely prunes those state transitions from its search space, while still achieving complete test coverage within the bounded domain. Our preliminary results show orders of magnitude improvement over previous approaches. We believe our techniques can make checking of data structure properties significantly faster, and thus enable checking of much larger programs than currently possible.

CSE-TR-516-06

Accurate Real-time Identification of IP Hijacking

Xin Hu and Z. Morley Mao

May, 2006

In this paper, we present novel and practical techniques to accurately detect IP prefix hijacking attacks in real time to facilitate timely mitigation responses. There are strong evidences that IP hijacking is common on today��s Internet. Attackers may hijack victim��s IP address space to perpetrate malicious activities such as spamming and launching DoS attacks without worrying about disclosing their identity through source IP addresses. More seriously, they can disrupt network services or regular communication by temporarily stealing actively used addresses. Unintentional network misconfigurations can also have similar effects, possibly leading to severe impact on reachability. We propose novel ways to much more accurately detect IP hijacking by combining analysis of passively collected BGP routing updates and data plane fingerprints of suspicious prefixes. The key insight is to use data plane information in the form of edge network fingerprinting to disambiguate potentially numerous suspect IP hijacking incidences based on routing anomaly detection. Previous work on identifying IP hijacking solely relies on control plane information in the form of anomalous routing updates or external data such as stale address registries. Such an approach is inaccurate, suffering from too many false positives to be useful in practice. In our proposed scheme, real-time fingerprinting provides confirming evidence for hijacking, while incurring little overhead. More importantly, we provide mechanisms to perform online mitigation rather than post-mortem analysis. Utilizing real-time BGP data from multiple feeds as well as RouteViews, we demonstrate the ability of our system to distinguish between legitimate routing changes and hijacking attacks.

CSE-TR-517-06

OpenFire: Opening Networks to Reduce Network Attacks on Legitimate Services

Kevin Borders, Laura Falk, and Atul Prakash

May, 2006

Remote network attacks are a serious problem facing network administrators in universities, corporations, and governments today. Hackers, worms, and spammers will exploit and control hosts inside of a network to send junk E-mail, create bot nets, launch DDoS attacks, and steal sensitive information. Current solutions exist that attempt to deal with these threats, but they are often unsuccessful. In addition, security researchers have deployed honeypots to lure attackers and learn more about their techniques, but honeypots have had only limited success at gaining information and protecting other machines. In response to need for better security against network attacks, we present OpenFire. OpenFire is protection system that takes the opposite approach of traditional firewalls: instead of blocking unwanted traffic, it instead accepts {it all} traffic, forwarding unwanted messages to a cluster of decoy machines. To the outside, all ports and all IP addresses appear open in an OpenFire network. To evaluate OpenFire, we deployed it in a live sub-network here at the University of Michigan. During a three-week evaluation period, we found that OpenFire reduced the number of attacks on legitimate computers by 57%, while honeypots only reduced attacks by 8%. OpenFire decoys also had considerably better exposure than honeypots, and were able to elicit over three times as many attacks during the evaluation period.

CSE-TR-518-06

Adaptive MAC-layer Sensing of Spectrum Availability in Cognitive Radio Networks

Hyoil Kim and Kang G. Shin

May, 2006

In the recently-suggested dynamic spectrum allocation policy of cognitive radio networks [1]-[3], sensing/monitoring of spectrum availability is identified as a key requirement. To meet this requirement we address an important MAC-layer sensing issue: which of proactive and reactive sensing is more energy-efficient? An algorithm is proposed to dynamically determine which sensing mode to use. For proactive sensing, sensing-period adaptation to maximize discovery of opportunities and optimal-ordering of channels to minimize the delay in finding an available channel are proposed. Channel-usage patterns are also estimated. Our simulation results demonstrate the efficacy of the proposed sensing scheme, as well as its performance improvements over the previously-proposed schemes. The sensing-period adaptation discovers up to 98% of the analytical maximum of spectral opportunities, regardless of choice of the initial sensing period. For the testing scenario and simulation parameters we used, the proposed scheme is shown to discover up to 20% more opportunities than the previous sensing schemes without sensing-period adaptation. This improvement may become greater as the initial sensing period grows. The delay in finding an idle channel with the proposed channel-ordering is around 0.02 second, which is a half of the delay without channel-ordering, and the proposed scheme yields steady performance over a wide range of the number of channels.

CSE-TR-519-06

ECO-system: Enbracing the Change in Placement

Jarrod A. Roy and Igor L. Markov

June, 2006

In a successful RTL-to-GDS flow, numerous circuit modifications and optimizations must interact with physical aspects of the design. For example, timing violations can be moderated by gate sizing and net buffering, while routing violations can often be resolved by local replacement. Such a community of tools requires a reliable system for performing Engineering Change Orders (ECOs), notably in placement. Existing work is often overly incremental and limited in the amount of design change it can support. Most of it describes stand-alone tools that offer poor interfaces to the design flow and cannot handle a full range of modern VLSI layouts. We develop a new, strong ECO-system that reliably handles fixed objects and movable macros in instances with widely varying amounts of whitespace. It detects layout regions and sections of the netlist that require modification and applies an adequate amount of change in each case: given a reasonable initial placement, it applies minimal changes, but is capable of re-placing large regions to handle pathological cases. ECO-system favorably compares to recent detail placers and fares well in high-level and physical synthesis.

CSE-TR-520-06

Malware Spreading in Mobile Enterprise Environments

Abhijit Bose, Scott Boehmer and Kang G. Shin

June, 2006

The proliferation of networked mobile devices, such as laptops, PDAs, and cellular phones, make today�s enterprises highly vulnerable to an entirely new generation of malicious bots, worms and viruses. These malicious �agents� can spread not only via traditional social engineering methods, such as email, peer-to-peer and instant messaging (IM), but also via new vectors, such as proximity scanning using Bluetooth and SMS/MMS messages. The standard epidemic models based on homogeneity, average-degree distributions, and perfect-mixing assumptions are inadequate to model malicious agents in enterprise networks with overlapping wired, wireless and cellular segments. In this paper, we study three parameters crucial to describing malware propagation in such environments: service interactions among hosts, local network structure, and user mobility. The majority of the parameters in our study are derived from real-life network traces collected from a large enterprise network, and therefore, represent realistic malware propagation and infection scenarios. We propose a general-purpose agent-based malware modeling framework targeted to enterprise environments. We examine two likely scenarios: (i) a malicious virus such as Cabir spreading among the subscribers of a cellular network using Bluetooth, and (ii) a hybrid worm that exploit email and P2P file-sharing to infect users of an enterprise network. In both cases, we identify the parameters crucial to the spread of the epidemic based upon our extensive simulation results.

CSE-TR-521-06

Node Mergers in the Presence of Don't Cares

Stephen Plaza, Kai-hui Chang, Igor Markov, and Valeria Bertacco

July, 2006

Merging equivalent nodes in a circuit is a synthesis technique useful for reducing area and improving equivalence checking. Recently proposed algorithms that determine node equivalence are unable to exploit observability don't cares (ODC) and therefore miss several merging opportunities. Current strategies for extracting don't care information for identifying additional node mergers restrict the types of don't cares computed because of computational expense. We develop an efcient framework that merges nodes by exploiting ODCs through simulation and SAT. Specically, the framework integrates (1) a novel strategy for representing ODC information during logic simulation, (2) a fast ODC-aware simulator, (3) an ODC-aware equivalence checker, and (4) progressive renement of simulation vectors. Our results indicate that ODCs enable many node mergers with up to 30% gate reduction on unoptimized circuits and 23% area reduction after synthesis.

CSE-TR-522-06

Keeping Physical Synthesis Safe and Sound

Kai-hui Chang, Igor L. Markov and Valeria Bertacco

July, 2006

In this work we define and explore the concepts of physical safeness and logical soundness, and empirically evaluate the effects of physical safeness on route length, via count and timing. In addition, we propose a new physically safe and logically sound optimization, SafeResynth, which provides immediately-measurable improvement without altering the functionality of a design. It can enhance circuit timing without detrimental effects on route length and congestion. We achieve these improvements by performing a series of netlist transformations and re-placements that are individually evaluated for logical soundness (using on-line verification) and physical safeness. When used alone, SafeResynth improves circuit delay of IWLS05 benchmarks by 11% on average after routing, while increasing route length by less than 0.2%. Our resynthesis can also be used in an unsafe mode, akin to more traditional physical synthesis algorithms popular in commercial tools. Applied together, our safe and unsafe transformations achieve 24% average delay improvement for seven large benchmarks from the OpenCores suite, which we show to be orthogonal to improvement from timing-driven placement.

CSE-TR-523-06

Improving Resiliency of Overlay Networks for Streaming Applications

Wang, Du and Jamin

July, 2006

Early deployment of peer-to-peer (P2P) streaming network demonstrates its potential to deliver quality media streams to a large audience. However, P2P streaming is vulnerable to node failures and malicious attacks because of its high bandwidth demand and stringent timing requirement. In this paper, we investigated several heuristics to improve an overlay's resilience to failures and attacks. We formalized the overlay connectivity resilience as a graph theoretic problem and proved that disjoint paths with variable lengths are effective in increasing overlays' connectivity resilience. We proposed a distributed heuristic called �MPath� that enables each peer to improve its connectivity without global knowledge or coordination. We designed the MPath heuristic to work in conjunction with existing overlay improvement mechanisms. Such integration not only speeds up overlay convergence, but also significantly cuts down MPath's maintenance overhead. Our experiments showed that MPath reduced the number of disconnected components in an overlay by an order of magnitude when 20% of overlay nodes failed.

CSE-TR-524-06

PAN-on-Demand: Building self-organizing WPANs for better power management

Manish Anand and Jason Flinn

August, 2006

In this paper, we present PAN-on-Demand, a self-organizing multi-radio system that balances performance and energy concerns by scaling the structure of the network to match the demands of applications. Since current mobile devices ship with multiple radios, PAN-on-Demand can improve performance and extend battery lifetime by switching between different wireless interfaces such as Bluetooth and WiFi and opportunistically exploiting the available power-saving strategies. When applications are actively using the network, PAN-on-Demand offers high-bandwidth, low-latency communication; when demand is light, PAN-on-Demand adapts the network structure to minimize energy usage. Our results show that PAN-on-Demand reduces the average response time of common PAN applications like MP3 playing, e-mail viewing and photo sharing by 92% and extends the battery lifetime of mobile devices by up to 47% compared to current PAN management strategies.

CSE-TR-525-06

A Type System for Preventing Data Races and Deadlocks in the Java Virtual Machine Language

Pratibha Permandla and Chandrasekhar Boyapati

September, 2006

In previous work on SafeJava we presented a type system extension to the Java source language that statically prevents data races and deadlocks in multithreaded programs. SafeJava is expressive enough to support common programming patterns, its type checking is fast and scalable, and it requires little programming overhead. SafeJava thus offers a promising approach for making multithreaded programs more reliable. This paper presents a corresponding type system extension for the Java virtual machine language (JVML). We call the resulting language SafeJVML. Well-typed SafeJVML programs are guaranteed to be free of data races and deadlocks. Designing a corresponding type system for JVML is important because most Java code is shipped in the JVML format. Designing a corresponding type system for JVML is nontrivial because of important differences between Java and JVML. In particular, the absence of block structure in JVML programs and the fact that they do not use named local variables the way Java programs do make the type systems for Java and JVML significantly different. For example, verifying absence of races and deadlocks in JVML programs requires performing an alias analysis, something that was not necessary for verifying absence of races and deadlocks in Java programs. This paper presents static and dynamic semantics for SafeJVML. It also includes a proof that the SafeJVML type system is sound and that it prevents data races and deadlocks. To the best of our knowledge, this is the first type system for JVML that statically ensures absence of synchronization errors.

CSE-TR-526-06

Improving Distributed File System Performance in Virtual Machine Environments

Xin Zhao, Atul Prakash, Brian Noble, and Kevin Borders

September, 2006

Virtual machine (VM) systems have traditionally used virtual disks for file storage. Recently, there has been interest in using distributed file systems as a way to provide data storage to guest virtual machines, with the file server running on the same physical machine. Potential advantages include fine-grained data sharing, data protection, versioning, and backup to multiple guests from one central place. Unfortunately, distributed file systems often perform substantially worse than local file systems because of high network overheads, even when the file server is on the same physical machine. This paper proposes two mechanisms to reduce communication overheads of inter-VM file system operations: Inter-VM metadata sharing and Virtual Remote Procedure Call (VRPC). Inter-VM metadata sharing enables clients to directly read file attributes from the server without an RPC exchange. VRPC follows standard RPC format but uses inter-VM shared memory to exchange data between a file server and its clients, which cuts out a lot of communication overhead. We implemented these two mechanisms on the Xen virtual machine platform and adapted NFS version 3 to take advantage of these two mechanisms. For an Apache build workload, NFS with Inter-VM metadata sharing and VPRCs was only 6.3% slower than the Ext3 file system, while standard NFS was 31.2% slower. These results suggest that Inter-VM metadata sharing and VRPCs significantly reduce communication overhead between virtual machines running on the same hardware.

CSE-TR-527-06

Design of Location Service for a Hybrid Network of Mobile Actors and Static Sensors

Zhigang Chen, Min-gyu Cho and Kang G. Shin

September, 2006

Location services are essential to many applications running on a hybrid of wirelessly-networked mobile actors and static sensors, such as surveillance systems and the Pursuer and Evader Game (PEG). To our best knowledge, there has been no previous location service protocol for wireless sensor networks. A number of location service protocols have been proposed for mobile ad hoc networks, but they are not applicable to sensor networks due to the usually large per-hop latency between sensors. In this report, we present a distributed location service protocol (DLSP) for wireless sensor networks. Using a rigorous analysis of DLSP, we derive the condition for achieving a high packet-delivery ratio, and show how to configure the protocol parameters to ensure the scalability of DLSP. We prove that DLSP is scalable if the mobile�s speed is below a certain fraction of the packet-transmission speed, which depends on a movement threshold. For example, if the movement threshold for the lowest-level location servers is the same as the radio range, the mobile�s speed limit is one-tenth of the packet-transmission speed. The mobile�s theoretical speed limit is one-fifth of the packet-transmission speed, beyond which DLSP cannot scale regardless of the movement threshold. Because DLSP suffers from a high location-update overhead, we propose an optimization, called DLSP with the Selected Neighbor (DLSP-SN), which can reduce the update overhead by more than 70%, while achieving a high packet-delivery ratio. Due to the griding effect, the average packet�s path length of DLSP-SN is longer than that of DLSP. This increases data-delivery cost for continuous data streams. In order to make a tradeoff between update and data-delivery costs, we present a greedy adaptation mechanism, called DLSP-ASN, which can make a significant improvement of overall energy-efficiency.

CSE-TR-528-07

Extracting Statistical Loop-Level Parallelism using Hardware-Assisted Recovery

Steven A. Lieberman, Hongtao Zhong, and Scott Mahlke

February, 2007

Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of power dissipation and hardware complexity. These designs provide real benefits for server-class applications that are explicitly multi-threaded. However, for desktop and other systems, there is a large code base of single-thread applications that have yet to see any benefit from multicore systems. While these applications were designed for execution on a single processor, many regions of computation have statistically suitable structure for extracting thread-level parallelism. In this work, we examine automatic extraction of statistical loop-level parallelism from single-thread applications. Our approach is to utilize simple hardware mechanisms combined with intelligent compiler code generation to perform low-cost targeted thread-level speculation. Our technique combines memory dependence profiling to identify suitable loops, stylized code generation to untangle register dependences between iterations, and a hybrid compiler/hardware recovery mechanism to handle infrequent memory dependences. We show that significant amounts of statistical loop-level parallelism indeed exist in non-numeric applications, and present the architectural extensions and detailed compiler algorithms required to exploit it.

CSE-TR-529-07

Research in virtual machines

Chen, Lucchetti, and Joshi

February, 2007

CSE-TR-530-07

Automated Classification and Analysis of Internet Malware

Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao, Farnam Jahanian, Jose Nazario

April, 2007

Numerous attacks, such as worms, phishing, and botnets, threaten the availability of the Internet, the integrity of its hosts, and the privacy of its users. A core element of defense against these attacks is anti-virus(AV)�a service that detects, removes, and characterizes these threats. The ability of these products to successfully characterize these threats has far-reaching effects�from facilitating sharing across organizations, to detecting the emergence of new threats, and assessing risk in quarantine and cleanup. In this paper, we examine the ability of existing host-based anti-virus products to provide semantically meaningful information about the malicious software and tools (or malware) used by attackers. Using a large, recent collection of malware that spans a variety of attack vectors (e.g., spyware, worms, spam), we show that different AV products characterize malware in ways that are inconsistent across AV products, incomplete across malware, and that fail to be concise in their semantics. To address these limitations, we propose a new classification technique that describes malware behavior in terms of system state changes (e.g., files written, processes created) rather than in sequences or patterns of system calls. To address the sheer volume of malware and diversity of its behavior, we provide a method for automatically categorizing these profiles of malware into groups that reflect similar classes of behaviors and demonstrate how behavior-based clustering provides a more direct and effective way of classifying and analyzing Internet malware.

CSE-TR-531-07

CEGAR-Based Formal Hardware Verification: A Case Study

Zaher S. Andraus, Mark H. Liffiton, and Karem A. Sakallah

May, 2007

We describe the application of the Reveal formal functional verification system to six representative hardware test cases. Reveal employs counterexample-guided abstraction refinement, or CEGAR, and is suitable for verifying the complex control logic of designs with wide datapaths. Reveal performs automatic datapath abstraction yielding an approximation of the original design with a much smaller state space. This approximation is subsequently used to verify the correctness of control logic interactions. If the approximation proves to be too coarse, it is automatically refined based on the spurious counterexample it generates. Such refinement can be viewed as a form of on-demand �learning� similar in spirit to conflict- based learning in modern Boolean satisfiability solvers. The abstraction/refinement process is iterated until the design is shown to be correct or an actual design error is reported. The Reveal system allows some user control over the abstraction and refinement steps. Using six representative benchmarks as case studies, we examine the effect on Reveal�s performance of the various available options for abstraction and refinement. We also show examples of the types of learned �facts� that Reveal discovers in the refinement step. Based on our initial experience with this system, we believe that automating the verification for a useful class of hardware designs is now quite feasible.

CSE-TR-532-07

Spector: Automatically Analyzing Shell Code

Kevin Borders, Atul Prakash, and Mark Zielinski

May, 2007

Detecting the presence of buffer overflow attacks in network messages has been a major focus of the security community. Only knowing whether a message contains an attack, however, is not always enough to mitigate the threat. Sometimes it is also critical to know what the attack does. Some attacks will open up backdoors on the victim hosts, while others may download a secondary malware payload from a rogue web server. Understanding the exploit behavior can be very helpful in determining the proper response. Unfortunately, shell code from buffer overflow attacks is written in low-level assembly language, and is often obfuscated with encoding routines. The current method of analyzing shell code, manual reverse engineering, can be time-consuming and requires significant expertise. Furthermore, new analysis must be done for each unique payload, which makes manual analysis nearly impossible for large wide-scale polymorphic attacks. In response to the need for automated attack payload analysis, we introduce Spector. Spector uses symbolic execution to extract meaningful high-level application programming interface (API) calls from shell code. This information exposes the code�s real functionality and can aid in attack mitigation. Spector�s high-level output also helps facilitate classification of different attack payloads that have the same behavior. To evaluate Spector, we tested it with over 23,000 unique payloads gathered from lightweight honeypot deployments. It identified eleven different classes of shell code, and was able to process all the payloads in just over three hours. Spector was also able to successfully classify polymorphic instances of the same shell code that were generated using two polymorphism tools. In this paper, we present the architecture and implementation of Spector, as well as our experience using it to analyze real attack payloads.

CSE-TR-533-07

Prism: Lightweight Filesystem Virtualization Via Selective Cloning

Xin Zhao, Atul Prakash, and Kevin Borders

May, 2007

Suppose filesystems supported cloning of any subset of directories, even those containing gigabytes of data, almost instantaneously, while guaranteeing isolation between the original copy and the cloned copy. We show that this simple abstraction can greatly simplify many routine tasks, both for a single operating system and for clusters of similar virtual machines in a virtual server farm. In this paper, we describe the design of Prism, a file server that supports lightweight cloning and centralized file management of multiple filesystems. Unlike copying in standard file systems or cloning operations on virtual disks, the Prism filesystem supports high-level operations to create fast clones of any part of a filesystem. After cloning, each cloned file system can evolve independently, providing similar read/write semantics as if the file systems were copied from each other. This abstraction is useful for supporting virtual server farms. It is possible to create filesystems for new virtual machines interactively, while protecting private home directories, log files, and password files. Furthermore, when cloned systems have a high degree of similarity, Prism makes centralized scanning operations (e.g., for malware checking) for multiple cloned file systems significantly faster than scanning each file system individually. Prism�s capabilities are also applicable to individual operating systems. Prism can help provide lightweight file system isolation between processes to prevent untrusted code from damaging the filesystem. Prism is currently implemented by extending the ext3 filesystem. On the Postmark benchmark and the Apache build workload, Prism�s performance is comparable to that of a standard ext3 filesystem. For applications that require canning or comparing multiple, cloned filesystems, Prism is significantly faster. This paper describes the design and evaluation of the Prism filesystem.

CSE-TR-534-07

Web Security Analysis of Financial Institutions

Atul Prakash, Laura Falk

July, 2007

Banks and other financial institutions are providing more and more information and services for their potential and existing customers via their web sites. By providing services such as online banking, they enable their customers to take care of business at any time and from any place. Banks are heavily regulated and have a substantial financial stake in providing security to their customers. Thus, one would expect that the latest security technology is deployed on their sites and the sites are analyzed regularly by security-o riented IT professionals. However, we found substantial gaps in online security at a large number of financial institutions, including banks and brokerages. To assess the effectiveness of online security procedures, we examined 706 f inancial institutions for seven types of vulnerabilities that could be exploited by sophisticated attackers. To our surprise, a large percentage of these institutions are susceptible to several of the vulnerabilities. For each vulnerability, we recommend solutions that we hope financial institutions will adapt to improve the security of their web sites. We also designed a modified proxy server that users can use to identify several of these vulnerabilities in the web sites they visit.

CSE-TR-535-07

Citation Analysis, Centrality, and the ACL Anthology

Mark Thomas Joseph and Dragomir R. Radev

October, 2007

We analyze the ACL Anthology citation network in an attempt to identify the most "central" papers and authors using graph-based methods. Citation data was obtained using text extraction from the library of PDF files with some post-processing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. The analysis compares metrics across publication years and venues, such as citations in and out. The most cited paper, central papers, and papers with the highest impact factor are also established.

CSE-TR-536-07

CLAIRLIB Documentation v1.03

Dragomir Radev, Mark Hodges, Anthony Fader, Mark Joseph, Joshua Gerrish, Mark Schaller, Jonathan dePeri, and Bryan Gibson

October, 2007

220

The University of Michigan CLAIR (Computational Linguistics and Information Retrieval) group has developed the Clair Library. A collection of utilities and code modules, it is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Network Analysis (NA). Its architecture is also designed for easy integration with external software.

CSE-TR-537-07

Tracking Factual Information in Evolving Text: An Empirical Study

Jahna Otterbacher, Siwei Shen, Dragomir Radev, and Yang Ye

November, 2007

We consider the problem of tracking information over time and across sources in clusters of news stories about emergency events. While previous approaches have focused on finding new information at the document and sentence levels (e.g. TDT FSD and the TREC Novelty track, respectively), we are interested in following information at the factual level. Specifically, we propose to develop a ``fact tracking" system, that when given a user's factual question of interest, returns a matrix displaying the extracted answers by time and source. As a first step towards this goal, we present an empirical analysis of a corpus of breaking news stories that have been manually annotated for factual questions and answers. Our current goal is to compare extracted answers to a given question and to examine how features such as lexical similarity and source information relate to the chronological and semantic relationships between them. Our study will show that while there appears to be no direct relationship between the lexical similarity and publication time difference of a given answer pair, lexical similarity is highly correlated to whether or not the answers come from the same sources, and whether or not they express the same or different answer to the given question.

CSE-TR-538-07

Single-document and multi-document summary evaluation using Relative Utility

Dragomir R. Radev, Daniel Tam, and Gunes Erkan

November, 2007

We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarization systems. Like some other evaluation metrics, it compares sentence selection between machine and reference summarizers. Additionally, RU is applicable in both single-document and multi-document summarization, is extendable to arbitrary compression rates with no extra annotation effort, and takes into account both random system performance and interjudge agreement. RU also provides an option for penalizing summaries that include sentences with redundant information. Our results are based on the JHU summary corpus and indicate that Relative Utility is a reasonable, and often superior alternative to several common summary evaluation metrics. We also give a comparison of RU with some other well-known metrics with respect to the correlation with the human judgements on the DUC corpus.

CSE-TR-539-07

Empirical Exploitation of Live Virtual Machine Migration

Jon Oberheide, Evan Cooke, and Farnam Jahanian

December, 2007

As virtualization continues to become increasingly popular in enterprise and organizational networks, operators and administrators are turning to live migration of virtual machines for the purpose of workload balancing and management. However, the security of live virtual machine migration has yet to be analyzed. This paper looks at this poorly explored area and attempts to empirically demonstrate the importance of securing the migration process. We begin by defining and investigating three classes of threats to virtual machine migration: control plane, data plane, and migration module threats. We then show how a malicious party using these attack strategies can exploit the latest versions of the popular Xen and VMware virtual machine monitors and present a tool to automate the manipulation of a guest operating system�s memory during a live virtual machine migration. Using this experience, we discuss strategies to address the deficiencies in virtualization software and secure the live migration process.

CSE-TR-540-07

< Select One >Efficient Software Model Checking of Soundness of Type Systems

Michael Roberson, Melanie Agnew, Paul T. Darga, and Chandrasekhar Boyapati

December, 2007

This paper presents novel techniques for checking the soundness of a type system automatically using a software model checker. Our idea is to systematically generate every type correct intermediate program state (within some finite bounds), execute the program one step forward if possible using its small step operational semantics, and then check that the resulting intermediate program state is also type correct---but do so efficiently by detecting similarities in this search space and pruning away large portions of the search space. Thus, given only a specification of type correctness and the small step operational semantics for a language, our system automatically checks type soundness by checking that the progress and preservation theorems hold for the language (albeit for program states of at most some finite size). Our preliminary experimental results on several languages---including a language of integer and boolean expressions, a simple imperative programming language, an object-oriented language which is a subset of Java, and a language with ownership types---indicate that our approach is feasible and that our search space pruning techniques do indeed significantly reduce what is otherwise an extremely large search space. Our paper thus makes contributions both in the area of checking soundness of type systems, and in the area of reducing the state space of a software model checker.

CSE-TR-541-08

BreadCrumbs: Forecasting Mobile Connectivity

Anthony J. Nicholson and Brian D. Noble

January, 2008

As mobile devices continue to shrink, users are no longer merely nomads, but truly mobile, employing devices on the move. At the same time, these users no longer rely on a single managed network, but exploit a wide variety of connectivity options as they spend their day. Together, these trends argue that systems must consider the derivative of connectivity--- the changes inherent in movement between separately managed networks, with widely varying capabilities. To manage the derivative of connectivity, we exploit the fact that people are creatures of habit; they take similar paths every day. Our system, BreadCrumbs, tracks the movement of the device�s owner, and customizes a predictive mobility model for that specific user. Rather than rely on a synthetic model or aggregate observations, this custom-tailored model can be used together with past observations of wireless network capabilities to generate connectivity forecasts. Applications can in turn use these forecasts to plan future network use with confidence. We have built a BreadCrumbs prototype, and evaluated it with several weeks of real-world usage. Our results show that these forecasts are sufficiently accurate, even with as little as one week of training, to provide improved performance with reduced power consumption for several applications.

CSE-TR-542-08

Juggler: Virtual Networks for Fun and Profit

Anthony J. Nicholson, Scott Wolchk and Brian Noble

April, 2008

There are many situations in which an additional network interface---or two---can provide benefits to a mobile user. Additional interfaces can support parallelism in network flows, improve handoff times, and provide sideband communication with nearby peers. Unfortunately, such benefits are outweighed by the added costs of an additional physical interface. Instead, virtual interfaces have been proposed as the solution, multiplexing a single physical interface across more than one communication endpoint. However, the switching time of existing implementations is too high for some potential applications, and the benefits of this approach to real applications are not yet clear. This paper directly addresses these two shortcomings. It describes an implementation of a virtual 802.11 networking layer, called Juggler, that achieves switching times of approximately 3 ms, and less than 400 us in certain conditions. We demonstrate the performance of this implementation on three application scenarios. By devoting 10% of the duty cycle to background tasks, Juggler can provide nearly instantaneous handoff between base stations or support a modest sideband channel with peer nodes, without adversely affecting foreground throughput. Furthermore, when the client issues concurrent network flows, Juggler is able to assign these flows across more than one AP, providing significant speedup when wired-side bandwidth from the AP constrains end-to-end performance.

CSE-TR-543-08

Secure Communication Framework for Mobile Devices

Byung S. Yang, Soren Dreijer, Sugih Jamin, Sarit Mukherjee, and Limin Wang

May, 2008

The WebBee system is a complete software framework that supports security-sensitive applications for mobile, handheld devices. Though the WebBee system is designed to support security for users, there is a critical window of time between when a device is compromised, and when security is once again established. Security is reestablished by one of two mechanisms. The first is enacted when the user is able to notify the system of the compromise. Secondly, security will also be reestablished when the security keys are renewed. In the paper, we introduce a challenge-response system which is tailored for the WebBee environment to support its unique characteristics. The challenge system has been designed and implemented on top of the WebBee system so that the system as a whole can maximize security even in an emergency. The design scheme has been carefully chosen to minimize the impact on the existing WebBee infrastructure and to integrate with other detection mechanisms easily. The overall performance of the WebBee system is not affected by the challenge system significantly because challenges are only assigned in limited numbers.

CSE-TR-544-08

Webbee: A Secure Coordination and Communication Application Infrastructure for Handheld Environments in Crisis Scenarios

The Webbee Team

May, 2008

The Webbee research project�s mission is to provide a secure coordination and communication infrastructure to a team of first responders. Our architecture consists of three basic elements: an instant infrastructure, which restores connectivity; the Webbee Coordination Server, comprised of application daemons that provide communication and coordination services on top a secure, mobile handheld-friendly environment; and a Database Server, which serves all data necessary for interactions, and which is enriched with triggers to automatically take action when certain new data is supplied to the database. We have successfully deployed several sample applications using this architecture - such as Gas Prices, Event Reports, and AC2 - so as to illustrate the architecture�s viability, flexibility, and security, especially in disaster scenario environments.

CSE-TR-545-08

Practical Strong Security for Mobile Handheld Devices

Matt England, Garcia-Pena, and Sugih Jamin

May, 2008

We present a method for constructing signature schemes for use with mobile handheld devices that mitigates the risk of an attacker forging signatures using key material garnered from a lost handheld. This scheme is forward-secure, meaning that signatures created before a breach are still valid, and server-assisted, meaning that a separate untrusted server must assist the device in signing, thereby protecting against offline dictionary attacks. We also present our experience with implementing these types of signature schemes for Java-enabled mobile devices, and the results of our experiments with this implementation.

CSE-TR-546-08

Novel Methods in Information Retrieval

Vahed Qazvinian and Dragomir R. Radev

September, 2008

Information retrieval (IR) is the science of information search within documents, relational databases, and the World Wide Web (WWW). In this work, we have tried to review some novel methods in IR theory. This report covers a number of the state of art methods in a wide range of topics with a focus on graph-based techniques in IR. This report is created based on the literature review done as a requirement of the Directed Study, EECS 599, course at the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. The ﬁrst author would like to thank Prof. Dragomir R. Radev for his discussions, reviews, and useful suggestions to support this work.

CSE-TR-547-08

Algorithms for Information Retrieval and Natural Language Processing Tasks

Pradeep Muthukrishnan and Dragomir R. Radev

September, 2008

This summarizes some of the common algorithms and approaches used for various tasks in Information Retrieval and Natural Language processing like, Text Summarization, Machine Translation, Sentiment Detection, Dependence Parsing, etc. This report is submitted towards satisfying the requirement of the Directed Study, EECS 599 course, Computer Science division, University of Michigan, Ann Arbor under the supervision of Professor Dragomir R. Radev

CSE-TR-548-08

Fine-Grained Tamper-Evident Data Pedigree

Jing Zhang, Adriane Chapman and Kristen LeFevre

October, 2008

Database provenance chronicles the history of updates and modifications to data, and has received much attention due to its central role in scientific data management. However, the use of provenance information still requires a leap of faith. Without additional protections, provenance records are vulnerable to accidental corruption, and even malicious forgery, a problem that is most pronounced in the loosely-coupled multi-user environments often found in scientific research. This paper investigates the problem of providing integrity and tamper-detection for database provenance. We propose a checksum-based approach, which is well-suited to the unique characteristics of database provenance, including non-linear provenance objects and provenance associated with multiple fine granularities of data. We demonstrate that the proposed solution satisfies a set of desirable security properties, and that the additional time and space overhead incurred by the checksum approach is manageable, making the solution feasible in practice.

CSE-TR-551-09

Analysis of the Green Dam Censorware System

Scott Wolchok, Randy Yao, and J. Alex Halderman

June, 2009

CSE-TR-552-09

Remote Fingerprinting and Exploitation of Mail Server Antivirus Engines

Jon Oberheide and Farnam Jahanian

June, 2009

Vulnerabilities within antivirus engines deployed at a mail server represent a serious risk to the security of organizations. If a sophisticated attacker is able to remotely probe a mail server and identify the particular antivirus engine used, he may craft a malformed message to exploit the engine with a low risk of detection. This paper explores how much information is exposed by these mail servers, how this information could be used effectively by an attacker, and what can be done to limit the exposure and reduce the impact of antivirus vulnerabilities. Towards this goal, we introduce and evaluate three techniques that can expose the vendor and version of antivirus software used by a mail server: message bouncing, detection probing, and iterative refinement. Through a series of experiments, we determine that over 28% of bounced messages expose information, 78% of response messages and 16% of delivered messages leak information through detection probing, and demonstrate the effectiveness of iterative refinement with a dataset of over 7200 malware samples and antivirus engines from 10 popular vendors. We also show that the most commonly deployed mail server antivirus engine is the most vulnerable engine and is one of the top offenders in exposing information. Finally, we conclude by suggesting methods of reducing the amount of exposure and discuss isolation techniques that may provide greater security.

CSE-TR-549-08

Dsearch: Distributed search for a personal area network

Garrett Brown, Daniel Fabbri, Brett Higgins, and Azarias Reda

October, 2008

Searching through a user's distributed data set effectively is crucial. User created content is increasingly stored on multiple devices away from home. Conventional desktop search and distributed file systems have relied on kernel modules and practically unlimited resources to organize and search user content. These designs do not consider the complex set of constraints and challenges in the distributed search domain. We propose a distributed architecture, DSearch, to manage the complexities of a mobile data set to improve query performance across all the devices in a user's personal area network. First, we provide a light-weight infrastructure that can effectively organize and search a set of devices. Second, we develop a membership system to manage the dynamics of multiple devices in a network that records the current set of active devices and distributes information to the group. Third, we examine three search index replication schemes - no replication, centralized replication, and device-based replication - to improve query performance. We developed the DSearch distributed search architecture and evaluated its performance.

CSE-TR-550-09

TrapperKeeper: Using Virtualization to Add Type-Awareness to File Systems

Daniel Peek and Jason Flinn

May, 2009

CSE-TR-553-09

Splash: Integrated Ad-Hoc Querying of Data and Statistical Models

Lujun Fang and Kristen LeFevre

July, 2009

This paper presents a system called Splash, which integrates statistical modeling and SQL for the purpose of ad-hoc querying and analysis. Splash supports a novel, simple, and practical abstraction of statistical modeling as an aggregate function, which in turn provides for natural integration with standard SQL queries and a relational DBMS. In addition, we introduce and implement a novel representatives operator to help explain statistical models using a limited number of representative examples. We present a proof-of-concept implementation of the system, which includes several performance optimizations. An experimental study indicates that our system scales well to large input datasets. Further, to demonstrate the simplicity and usability of the new abstractions, we conducted a case study using Splash to perform a series of exploratory analyses using network log data. Our study indicates that the query-based interface is simpler than a common data mining software package, and for ad-hoc analysis, it often requires less programming effort to use.

CSE-TR-554-09

An Online Framework for Publishing Dynamic Privacy-Sensitive Location Traces

Wen Jin, Kristen LeFevre and Jignesh M Patel

July, 2009

This paper considers the problem of protecting individual anonymity when continuously publishing a stream of location trace information collected from a population of users. Fundamentally, the key challenge that arises in this setting is the presence of evolving data, and in particular, data that evolves in semi-predictable ways. The main contribution of this paper is the first comprehensive formal framework for reasoning about privacy in this setting. Through careful analysis of the expected threat, we articulate a new privacy principle called temporal unlinkability. Then, by incorporating a model of user motion, we are able to quantify the risk of privacy violations probabilistically. Within this framework, we develop a simple initial set of algorithms for continuous publishing, and we demonstrate the feasibility of the approach using both real and synthetic location data.

CSE-TR-555-09

Offline Symbolic Analysis for Multi-Processor Execution Replay

Dongyoon Lee, Satish Narayanasamy, Mahmoud Said, and Zijiang (James) Yang

July, 2009

Ability to replay a program�s execution on a multi-processor system can significantly help parallel programming. To replay a shared-memory multi-threaded program, existing solutions record the program input (I/O, DMA, etc.) and the shared-memory dependencies between threads. Prior processor based record-and-replay solutions are efficient, but they require non-trivial modifications to the coherency protocol and the memory sub-system for recording the shared-memory dependencies. In this paper, we propose a processor-based record-and-replay solution that does not require detecting and logging shared-memory dependencies to enable multi-processor replay. It is based on our insight that, a load-based checkpointing scheme that records the program input has sufficient information for deterministically replaying each thread. We propose an offline symbolic analysis algorithm based on a SMT solver that determines the shared-memory dependencies using just the program input logs during replay. In addition to saving log space, the proposed approach significantly reduces the hardware support required for enabling replay.

CSE-TR-556-09

Anatomizing Application Performance Differences on Smartphones

Junxian Huang, Qiang Xu, Z. Morley Mao, Ming Zhang, Paramvir Bahl

September, 2009

The widespread deployment of 3G technologies and the rapid adoption of new smartphone devices like iPhone and Blackberry are making cellular data networks increasingly popular. In addition to email and Web browsing, a variety of different network applications are now available, making smartphones potentially reasonable substitute for their desktop counterparts. Unfortunately, the performance of smartphone applications, from the perspective of the users and application developers, is still not well understood. We believe our study, the ﬁrst of its kind, ﬁlls this void. We identify and study important factors that impact user perceived performance. We formalize the method for comparing the performance of smartphone applications along several unique dimensions such as carrier network, device capabilities, application types, and network protocols. To ensure a fair comparison across platforms and networks we develop a detailed measurement methodology. Our work is an important and necessary step towards understanding the performance of smartphone applications from users and application developers perspective. Our analysis culminates with a set of recommendations that can lead to better application design and infrastructure support for smartphone users.

CSE-TR-557-09

Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors

Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke

November , 2009

CSE-TR-558-09

Charlatans' Web: Analysis and Application of Global IP-Usage Patterns of Fast-Flux Botnets

Matthew Knysz, Xin Hu, and Kang G. Shin

November , 2009

Botnet-based hosting or redirection/proxy services provide botmasters with an ideal platform for hosting malicious and illegal content while affording them a high level of misdirection and protection. Because of the unreliable connectivity of the constituent bots (typically compromised home computers), domains built atop botnets require frequent updates to their DNS records, replacing the IPs of offline bots with online ones to prevent a disruption in (malicious) service. Consequently, their DNS records contain a large number of constantly-changing (i.e., ``fluxy") IPs, earning them the descriptive moniker of fast-flux domains---or, when both the content and name servers are fluxy, double fast-flux domains. In this paper, we study the global IP-usage patterns exhibited by different types of malicious and benign domains, including single and double fast-flux domains. We have deployed a lightweight DNS probing engine, called DIGGER, on 240 PlanetLab nodes spanning 4 continents. Collecting DNS data for over 3.5 months on a plethora of domains, our global vantage point enabled us to identify distinguishing behavioral features between them based on their DNS-query results. We have quantified these features and demonstrated their effectiveness for detection by building a proof-of-concept, multi-leveled SVM classifier capable of discriminating between five different types of domains with minimal false positives. We have also uncovered new, cautious IP-management strategies currently employed by criminals to evade detection. Our results provide insight into the current global state of fast-flux botnets, including the increased presence of double fast-flux domains and their range in implementation. In addition, we expose potential trends for botnet-based services, uncovering previously-unseen domains whose name servers alone demonstrate fast-flux behavior.

CSE-TR-559-09

Detection of Botnets Using Combined Host- and Network-Level Information

Yuanyuan Zeng, Xin Hu, and Kang G. Shin

December, 2009

Bots are coordinated by a command and control (C&C) infrastructure to launch such attacks as Distributed-Denial-of-Service (DDoS), spamming, identity theft and phishing, all of which seriously threaten the Internet services and users. Most contemporary botnet-detection approaches have been designed to function at the network level, requiring the analysis of packets� payloads. However, analyzing packets� payloads raises privacy concerns and incurs large computational overheads. Moreover, network traffic analysis alone can seldom provide a complete picture of botnets� behavior. By contrast, general in-host detection approaches are useful to identify each bot�s host-wide behavior, but are susceptible to the host-resident malware if used alone. To address these limitations, we account for both the coordination within a botnet and the malicious behavior each bot exhibits at the host level, and propose a C&C protocol-independent detection framework that combines both host- and network-level information for making detection decisions. This framework clusters similarly-behaving hosts into groups based on network-flow analysis without accessing packets� payloads, and then correlates the clusters with each individual�s in-host behavior for validation. The framework is shown to be effective and incurs low false-alarm rates in detecting various types of botnets.

CSE-TR-560-09

On Detection of Storm Botnets

Yuanyuan Zeng, Kang G. Shin

December, 2009

A botnet, which is a group of compromised and remotely controlled computers (also called bots), poses a serious threat to the Internet. The commonly-used command and control (C&C) channel for a botnet is used by a central server, such as IRC or HTTP. Recently, Storm botnet, a P2P-based botnet with a decentralized C&C channel has appeared in the wild. In this paper, we propose a distributed approach to detecting Storm botnets at the network level. Our approach is composed of two stages. First, we identify P2P and SMTP packets from each host�s traffic. Second, we use a machine learning technique to differentiate Storm from benign P2P traffic based on several distinguishing traffic attributes. Both of the two stages only require packet header information without analyzing payloads. Our evaluation has shown the detection strategy to be effective with low false alarm rates.

CSE-TR-561-10

A Case for Unsupervised-Learning-based Spam Filtering

Feng Qian, Abhinav Pathak, Y. Charlie Hu, Z. Morley Mao, and Yinglian Xie

February, 2010

Spam filtering has traditionally relied on extracting spam signatures via supervised learning, i.e., using emails explicitly manually labeled as spam or ham. Such supervised learning is labor-intensive and costly, more importantly cannot adapt to new spamming behavior quickly enough. The fundamental reason for needing labeled training corpus is that the learning, e.g., the process of extracting signatures, is carried out by examining individual emails. In this paper, we study the feasibility of unsupervised learning-based spam filtering that can more effectively identify new spamming behavior. Our study is motivated by three key observations of todays Internet spam: (1) the vast majority of emails are spam, (2) a spam email should always belong to some campaign, (3) spam from the same campaign are generated from some template that obfuscates some parts of the spam, e.g., sensitive terms, leaving other parts unchanged. We present the design of an online, unsupervised spam learning and detection scheme. The key component of our scheme is a novel text-mining-based campaign identification framework that clusters spam into campaigns and extracts the invariant textual fragments from spam as campaign signatures. While the individual terms in the invariant fragments can also appear in ham, the key insight behind our unsupervised scheme is that our learning algorithm is effective in extracting co-occurrences of terms that are generated by campaign templates and rarely appear in ham. Using large traces containing about 2 million emails from three sources, we show our unsupervised scheme alone achieves a false negative ratio of 3.5% and a false positive ratio of at most 0.4%. These detection accuracies are comparable to those of the de-facto supervised-learning-based filtering systems such as SpamAssassin (SA), suggesting that unsupervised spam filtering holds high promise in battling todays Internet spam.

CSE-TR-562-10

Design of SMS Commanded-and-Controlled and P2P-Structured Mobile Botnets

Yuanyuan Zeng, Kang G. Shin and Xin Hu

March, 2010

Botnets have become one of the most serious security threats to the Internet and personal computer (PC) users. Although botnets have not yet caused major outbreaks in mobile networks, with the rapidly-growing popularity of smartphones such as Apple's iPhone and Android-based phones that store more personal data and gain more capabilities than earlier generation handsets, botnets are expected to move towards this mobile domain. Since SMS is ubiquitous to every phone and can delay message delivery for offline phones, it is a suitable medium for command and control (C&C). In this paper, we describe how a mobile botnet can be built by utilizing SMS messages for C&C, and how different P2P structures can be exploited for mobile botnets. Our simulation results demonstrate that a modified Kademlia's structured architecture's a better choice for a mobile botnet's topology. In addition, we discuss potential countermeasures to defend against this mobile botnet threat.

CSE-TR-563-10

Good Guys vs. Bot Guise: Mimicry Attacks Against Fast-Flux Detection Systems

Matthew Knysz, Xin Hu, Kang G. Shin

June, 2010

Fast-Flux (FF) service networks are botnet-based hosting or redirection/ proxy services for hosting malicious and illegal content while affording botmasters a high level of misdirection and protection. With their use as service networks among criminals on the rise, researchers and security experts have designed fast and accurate detection systems based on their intrinsic behavior patterns. However, botmasters have responded, adopting a plethora of countermeasures to evade detection. In this paper, we explore the escalating �arms race� between FF botnet detectors and the botmasters� effort to subvert them, presenting several novel mimicry attack techniques that allow botmaster to avoid detection. We first analyze the state-of-art FF detectors and their effectiveness against the current botnet threat, demonstrating how botmasters can�with their current resources�thwart detection strategies. Based on the realistic assumptions inferred from empiricallyobserved trends, we create formal models for bot decay, online availability, DNSadvertisement strategies and performance, allowing us to compare how different mimicry attacks affect the overall online availability and capacity of botnets.

CSE-TR-569-11

A Static Analysis for Automatic Detection of Atomicity Violations in Java Programs

Michael Roberson and Chandrasekhar Boyapati

February, 2011

Multithreaded programs can have subtle errors that result from undesired interleavings of concurrent threads. A common technique programmers use to prevent these errors is to ensure that certain blocks of code are atomic. A block of code is atomic if every execution is equivalent to a serial execution in which no other thread's instructions are interleaved with the code. Atomic blocks of code are amenable to sequential reasoning and are therefore significantly simpler to analyze, verify, and maintain. This paper presents a system for automatically detecting atomicity violations in Java programs without requiring any specifications. Our system infers which blocks of code must be atomic and detects violations of atomicity of those blocks. The paper first describes a synchronization pattern in programs that is likely to indicate a violation of atomicity. The paper then presents a static analysis for detecting occurrences of this pattern. We tested our system on over half a million lines of popular open source programs, and categorized the resulting atomicity warnings. Our experience demonstrates that our system is effective. It successfully detects all the previously known atomicity errors in those programs as well as several previously unknown atomicity errors. Our system also detects badly written code whose atomicity depends on assumptions that might not hold in future versions of the code.

CSE-TR-565-10

A Hybrid Overlay Network for Bulk Data in Challenged Networks

Azarias Reda and Brian Noble

October, 2010

Developing countries face significant challenges in network access, making even simple network tasks unpleasant. Transferring bulk data in these networks is often prohibitively difficult, rendering useful information out of reach for many people. This paper introduces a hybrid delay tolerant networking framework, an approach for orchestrating bulk data transfer through a series of hosts with spare storage and diverse network connectivity. By combining natural individual mobility and available network connectivity, hybrid DTN networking forms a hybrid overlay network for delivering bulk data while preserving scalability in the state required to do so. Our implementation of a hybrid DTN network, Bati, outperforms earlier attempts in using mobility for data delivery and in-network storage for enhanced path selection. Our evaluation demonstrates substantial savings in using Bati for delivering bulk data, transferring an order of magnitude more data than the network alone, and improving the delivery rate by more than 40% compared to popular ad-hoc networks.

CSE-TR-566-10

The Very Small World of the Well-connected

Xiaolin Shi, Matthew Bonner, Lada A. Adamic and Anna C. Gilbert

December, 2010

Online networks occupy an increasingly larger position in how we acquire information, how we communicate with one another, and how we disseminate information. Frequently, small sets of vertices dominate various graph and statistical properties of these networks and, because of this, they are relevant for structural analysis and efficient algorithms and engineering. For the web overall, and specifically for social linking in blogs and instant messaging, we provide a principled, rigorous study of the properties, the construction, and the utilization of subsets of special vertices in large online networks. We show that graph synopses defined by the importance of vertices provide small, relatively accurate portraits, independent of the importance measure, of the important vertices in the underlying larger graphs. Furthermore, they can be computed relatively efficiently in real-world networks. In addition, we study the stability of these graph synopses over time and trace their development in several large dynamic data sets. We show that important vertices are more likely to have longer active life spans than unimportant ones and that the graph synopses consisting of important vertices remain stable over long periods of time after a short period of initial growth.

CSE-TR-567-10

Measuring the Effectiveness of Infrastructure-Level Detection of Large-Scale Botnets

Yuanyuan Zeng, Guanhua Yan, Stephan Eidenbenz and Kang G. Shin

December, 2010

Botnets are one of the most serious security threats to the Internet and its end users. In recent years, utilizing P2P as a Command and Control (C&C) protocol has gained popularity due to its decentralized nature that can help hide the botmaster�s identity. Most bot detection approaches targeting P2P botnets either rely on behavior monitoring or traffic flow and packet analysis, requiring fine-grained information collected locally. This requirement limits the scale of detection. In this paper, we consider detection of P2P botnets at a highlevel� the infrastructure level�by exploiting their structural properties from a graph analysis perspective. Using three different P2P overlay structures, we measure the effectiveness of detecting each structure at various locations (the Autonomous System (AS), the Point of Presence (PoP), and the router rendezvous) in the Internet infrastructure.

CSE-TR-568-11

HydraVM: Low-Cost, Transparent High Availability for Virtual Machines

Kai-Yuan Hou, Mustafa Uysal, Arif Merchant, Kang G. Shin, and Sharad Singhal

January, 2011

Existing approaches to providing high availability (HA) for virtualized environments require a backup VM for every primary running VM. These approaches are expensive in memory because the backup VM requires the same amount of memory as the primary, even though it is normally passive. In this paper, we propose a storage-based, memory-efficient HA solution for VMs, called HydraVM, that eliminates the passive memory reservations for backups. HydraVM maintains a complete, recent image of each protected VM in shared storage using an incremental checkpointing technique. Upon failure of a primary VM, a backup can be promptly restored on any server with available memory. Our evaluation results have shown that HydraVM provides protection for VMs at a low overhead, and can restore a failed VM within 1.6 seconds without excessive use of memory resource in a virtualized environment.

CSE-TR-564-10

Internet Background Radiation Revisited

Wustrow, Karir, Bailey, Jahanian, Houston

June, 2010

The monitoring of packets destined for reachable, yet unused, Internet addresses has proven to be a useful technique for measuring a variety of specific Internet phenomenon (e.g., worms, DDoS). In 2004, Pang et al. stepped beyond these targeted uses and provided one of the first generic characterizations of this non-productive traffic, demonstrating both its significant size and diversity. However, the six years that followed this study have seen tremendous changes in both the types of malicious activity on the Internet and the quantity and quality of unused address space. In this paper, we revisit the state of Internet "background radiation" through the lens of two unique data-sets: a five-year collection from a single unused /8 network block, and week-long collections from three recently allocated /8 network blocks. Through the longitudinal study of the long-lived block, comparisons between blocks, and extensive case studies of traffic in these blocks, we characterize the current state of background radiation specifically highlighting those features that remain invariant from previous measurements and those which exhibit significant differences. Of particular interest in this work is the exploration of address space pollution, in which significant non uniform behavior is observed. However, unlike previous observations of differences between unused blocks we show that increasingly these differences are the result of environmental factors (e.g., misconfiguration, location), rather than algorithmic factors. Where feasible, we offer suggestions for clean up of these polluted blocks and identify those blocks whose allocations should be withheld.

CSE-TR-570-11

The PViz Comprehension Tool for Social Network Privacy Settings

Alessandra Mazzia, Kristen LeFevre and Eytan Adar

April, 2011

Users� mental models of privacy and visibility in social networks often involve natural subgroups, or communities, within their local networks of friends. Such groupings are not always explicit, and existing policy comprehension tools, such as Facebook�s Audience View, which allows the user to view her profile as it appears to each of her friends, are not naturally aligned with this mental model. In this paper, we introduce PViz, an interface and system which corresponds more directly with the way users model groups and privacy policies applied to their networks. PViz allows the user to understand the visibility of her profile according to natural sub-groupings of friends, and at different levels of granularity. We conducted an extensive user study comparing PViz to current privacy comprehension tools (Facebook�s Audience View and Custom Settings page). Despite requiring users to adapt to new ways of exploring their social spaces, our study revealed that PViz was comparable to Audience View for simple tasks, and provided a significant improvement for more complex, group based tasks.

CSE-TR-571-11

Memory Access Aware Performance Maximization for Power-Constrained Chip Multiprocessors: A Model-Guided Approach

Xi Chen, Robert P. Dick, Chi Xu

June, 2011

CSE-TR-573-11

Accessing Trusted Web Sites From Low-Integrity Systems Without End-host Snooping

Lau, Prakash, Annamalai

August, 11

CSE-TR-572-11

Censorship and Co-option of the Internet Infrastructure

Michael Bailey, Craig Labovitz

July, 2011

Over a few short years, the Internet has grown to play an integral part of daily economic, social, and political life in most countries. From the Egyptian ``Velvet Revolution' to the last US presidential campaign, Internet communication shapes public opinion and fosters social change. But despite its immense social importance, the Internet has proven remarkably susceptible to disruption and manipulation, including government induced multi-week outages (e.g. Libya and Egypt) and multi-year campaigns by autocratic regimes to render web sites and large address blocks unreachable. While parents, enterprises, and governments have always placed restrictions on end-user communication to meet social or legal goals we argue recent years have seen the beginning of a new trend---the co-option of the Internet infrastructure itself to affect large-scale censorship. In this paper, we use Internet routing, allocation, and backbone traffic statistics to explore several recent and ongoing infrastructure-based efforts to disrupt Internet communication. We focus our discussion on the risks of this infrastructure corruption trend to long-term evolution of the Internet.

CSE-TR-574-11

Low Overhead Control Channels in Wireless Networks

Eugene Chai and Kang G. Shin

August, 2011

Low-latency, low-overhead and reliable control channels are essential to the efficient operation of wireless networks. However, control channels that utilize cur- rent in-band and out-of-band designs do not fully meet this requirement. In this paper, we design and implement Aileron, a novel control channel based on automatic modulation recognition that carries control frames over an OFDM(A) PHY by varying the modulation rate of the OFDM subcarriers. Under Aileron, the control information is embedded into the modulation type, not as the actual symbol value. Aileron has three important advantages: (a) control frame exchange without frame synchronization, (b) signaling with low bandwidth overhead, and (c) resilience to channel errors. We have evaluated Aileron using both extensive simulations and real-world measurements, and discovered that control frames can be transmitted with more than 80% accuracy using only 10 OFDM blocks on a channel with SNR of merely 10dB.

CSE-TR-575-11

Automatic Root-cause Diagnosis of Performance Anomalies in Production Software

Mona Attariyan, Michael Chow and Jason Flinn

October, 2011

Troubleshooting the performance of complex production software is challenging. Most existing tools, such as profiling, tracing, and logging systems, reveal "what" events occurred during performance anomalies. However, the users of such tools must then infer "why" these events occurred during a particular execution; e.g., that their execution was due to a specific input request or configuration setting. Because manual root cause determination is time-consuming and difficult, this paper introduces performance summarization, a technique for automatically inferring the root cause of performance problems. Performance summarization first attributes performance costs to fine-grained events such as individual instructions and system calls. It then uses dynamic information flow to determine the probable root causes for the execution of each event. The cost of each event is assigned to root causes according to the relative probability that the causes led to the execution of that event. Finally, the total cost for each root cause is calculated by summing the percause costs of all events. This paper also describes a differential form of performance summarization that compares two activities. We have implemented a tool called X-ray that performs performance summarization. Our experimental results show that X-ray accurately diagnoses 14 performance issues in the Apache HTTP server, Postfix mail server and PostgreSQL database, while adding only 1–7% overhead to production systems.

CSE-TR-576-11

Improved Nearest Neighbor Methods For Text Classification

Gunes Erkan, Ahmed Hassan, Qian Diao, and Dragomir Radev

October, 2011

We present new nearest neighbor methods for text classication and an evaluation of these methods against the existing nearest neighbor methods as well as other well-known text classication algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classication by replacing the classical cosine similarity with a KL divergence based similarity measure. We also present an extension of kNN to the semi-supervised case which turns out to be a formulation that is equivalent to semi-supervised learning with harmonic functions. In both supervised and semi-supervised experiments, our algorithms surpass traditional nearest neighbor methods and produce competitive results when compared to the state-of-the-art methods such as Support Vector Machines (SVM) and transductive SVM on the Reuters-21578 dataset, the 20 Newsgroups dataset, and the Reuters Corpus Volume I (RCV1) dataset. To our knowledge, this paper presents one of the most comprehensive evaluation of dierent machine learning algorithms on the entire RCV1 dataset.

CSE-TR-577-11

ModelDoc: Auto-generated, Auto-regenerated Wiki-Based Database Documentation

Jim Steinberger and Atul Prakash

November , 2011

Software developers often find databases difficult to work with because they are not documented properly. When documentation does exist, it is often inaccurate, out-of-date, and/or unclear. In this paper, we present ModelDoc – an extension on MediaWiki – as a new approach to documenting databases and, potentially, other aspects of an application. ModelDoc will auto-generate documentation from a live data source, and it then continually auto-regenerates that documentation as the data source changes. The documentation is consistent and kept up-to-date, but also exists along the standard wiki functionality, allowing collaboration throughout the evolution of the database.

CSE-TR-578-13

Uncovering Cellular Network Characteristics: Performance, Infrastructure, and Policies

Huang, Qian, Xu, Qian, Mao, and Rayes

March, 2013

Mobile Smart Devices (Smartphones and tablets) have become increasingly popular especially in IT based service management companies. According to IDC, more than 70% of executives and sale managers are replacing their PCs with tablets. This is in part due to the agility flexibility and the availability of diverse network-based and support applications. Thus network characteristics directly affect user-perceived performance and a deep understanding of the properties of contemporary cellular networks for commonly used platforms is important for smartphone application and platform optimization. In this work, we carry out the largest study to date of cellular networks in terms of users, time duration, location, and networks to understand the performance, infrastructure, and policy characteristics. With the data set collected from around 100K users across the world over 18 months, MobiPerf, a smartphone network measurement tool we developed and publicly deployed, enables us to analyze network performance along several new dimensions, previously not examined. Our results indicate that with better infrastructure support, large cities appear to have better performance than rural areas. Our case study on packet size's effect on RTT uncovers a surprising pattern for AT&T's uplink RTT. We also show that Internet-based CDN service provides very limited latency improvement in today's cellular networks. We further examine how local DNS servers are assigned to mobile users. In addition, we scrutinize the carriers' policy towards different types of traffic and successfully identify some middlebox behavior of today's cellular carriers. which may negatively impact user experiences.

CSE-TR-579-13

Automatic Spreadsheet Data Extraction

Shirley Zhe Chen and Michael J. Cafarella

March, 2013

Spreadsheets contain a huge amount of useful data, but do not observe the relational data model, and thus cannot exploit relational integration tools. Existing systems for extracting relational data from spreadsheets are too labor-intensive to support ad-hoc integration tasks, in which the correct extraction target is only learned during the course of user interaction. This paper introduces Senbazuru, a system that automatically extracts relational data from spreadsheets, thereby enabling relational spreadsheet integration. When compared to standard techniques for spreadsheet data extraction on a set of 100 Web spreadsheets, Senbazuru reduces the amount of human labor by 72% to 92%. In addition to the Senbazuru design, we present the results of a general survey of more than 400,000 spreadsheets we downloaded from the Web, giving a novel view of how users organize their data in spreadsheets.

CSE-TR-581-13

Towards Scalable, Flexible, and Deployable Inter-Domain Routing via Plural Routing

Qiang Xu, Feng Qian, Mehrdad Moradi, Darrell Bethea, Z. Morley Mao, and Michael Reiter

June, 2013

BGP has critical deficiencies in scalability and flexibility. Scalability is increasingly critical as routing tables grow, limiting BGP to single-path-per-destination routing, which constrains routing flexibility due to a lack of alternate paths in forwarding tables. To address these limitations, previous solutions either require a clean-slate re-design of the Internet or incur nontrivial overhead in modifying BGP. In contrast, we propose plural routing, a novel inter-domain routing scheme based on routing table re-construction, i.e., using bloom filters to record multiple forwarding entries per destination while keeping routing tables scalable. Aimed to be a general solution supporting diverse inter-domain routing scenarios, plural routing’s design is limited to routing tables without requiring additional infrastructure support, allowing it to retain incremental deployability. To demonstrate these properties, we run simulations on both the topologies of the current Internet and those projected topologies based on an Internet growth model. In particular, plural routing achieves the routing table size only linearly proportional to the number of neighbor domains instead of destination domains, with 99% of routing tables of size _100KB, while 99% of routes have zero path inflation. In supporting dynamic routing, plural routing’s performance is almost identical to the optimal in policy enforcement and load balancing.

CSE-TR-582-13

FLOWR: A Self-Learning System for Classifying Mobile Application Traffic

Xu, Andrews, Liao, Miskovic, Mao, Baldi, Nucci

June, 2013

The numerous mobile apps available on app market pose a unique challenge to mobile network operators in network management tasks. Unlike apps used by Internet hosts, mobile apps communicate predominantly via HTTP and are thus indistinguishable without untangling the knot of generic HTTP traffic. Discerning app identities in real time using network traffic enables network operators to perform app profiling at flow level and traffic engineering at app level, which further benefit entities such as users, developers, advertisers, and enterprise managers. We propose FLOWR, a system that identifies flows belonging to individual apps probabilistically in real time. Benefiting from the rich metadata in HTTP queries, FLOWR tokenizes the key-value pairs from queries into signatures that can best identify apps, enabling flow identification using signature matching against a knowledge base of app signatures. To minimize the need for supervised learning to construct the knowledge base, FLOWR adapts to the large traffic volume in mobile networks. Using the property that flow signatures from the same app should co-occur repeatedly, FLOWR infers the app identities of flow signatures by capturing the co-occurrences between identified flow signatures and undetermined flow signatures, incrementally updating the knowledge base. Using the apps with Doubleclick service as ground truth to evaluate FLOWR, we observe that our system can identify 86–99% of flows, i.e., uniquely identifying 26–30% of flows and narrowing down another 60–65% of flows to _5 candidate apps with <1% false positive.

CSE-TR-584-13

Extending Channel Comparison Based Sybil Detection to MIMO Systems

Yue Liu, David Bild, and Robert P. Dick

November, 2013

Although wireless channel comparison based techniques have been shown effective in defeating Sybil attacks in wireless environments, most assume single-input single-output (SISO) models for the radio system. We consider extending wireless Sybil defenses to multi-input multi-output (MIMO) systems.

CSE-TR-585-14

Integrating Spreadsheet Data via Accurate and Low-Effort Extraction

Zhe Chen and Michael Cafarella

February, 2014

Spreadsheets contain valuable data on many topics, but they are difficult to integrate with other sources. Convert- ing spreadsheet data to the relational model would allow relational integration tools to be used, but using manual methods to do this requires large amounts of work for each integration candidate. Automatic data extraction would be useful but it is very challenging: spreadsheet designs gener- ally requires human knowledge to understand the metadata being described. Even if it is possible to obtain this meta- data information automatically, a single mistake can yield an output relation with a huge number of incorrect tuples. We propose a two-phase semiautomatic system that ex- tracts accurate relational metadata while minimizing user effort. Based on conditional random fields (CRFs), our system enables downstream spreadsheet integration applica- tions. First, the automatic extractor uses hints from spread- sheets’ graphical style and recovered metadata to extract the spreadsheet data as accurately as possible. Second, the interactive repair component identifies similar regions in dis- tinct spreadsheets scattered across large spreadsheet cor- pora, allowing a user’s single manual repair to be amortized over many possible extraction errors. Through our method of integrating the repair workflow into the extraction system, a human can obtain the accurate extraction with just 31% of the manual operations required by a standard classification based technique. We demonstrate and evaluate our system using two corpora: more than 1,000 spreadsheets published by the US government and more than 400,000 spreadsheets downloaded from the Web.

CSE-TR-586-14

TIVOs: Trusted Visual I/O Paths for Android

Fernandes, Chen, Essl, Halderman, Mao, Prakash

May, 2014

Stealthy pixel-perfect attacks on smartphone apps are a class of phishing attacks that rely on visual deception to trick users into entering sensitive information into trojan apps. We introduce an operating system abstraction called Trusted Visual I/O Paths (TIVOs) that enables a user to securely verify the app she is interacting with, only assuming that the operating system provides a trusted computing base. As proof of concept, we built a TIVO for Android, one that is activated any time a soft keyboard is used by an application (e.g., for password entry) so that the user can reliably determine the app that receives the user’s keyboard input. We implemented TIVO by modifying Android’s user-interface stack and evaluated the abstraction using a controlled user study where users had to decide whether to trust the login screen of four different applications that were randomly subjected to two forms of pixel-perfect attacks. The TIVO mechanism was found to significantly reduce the effectiveness of pixel-perfect attacks, with acceptable impact on overall usability and only modest performance overhead.

CSE-TR-588-15

CellShift: A System to Efficiently Time-shift Data on the Cellular Network

Sanae Rosen, Jeffrey Erman, Vijay Gopalakrishnan, Z. Morley Mao, and Jeffrey Pang

November, 2015

CSE-TR-587-15

Characterizing the Prefetchability of Cellular Network Traffic Over Long Time Scales

Sanae Rosen, Jeffrey Erman, Z. Morley Mao, Jeffrey Pang, and K.K. Ramakrishnan

November, 2015

CSE-TR-600-16

InvisiMem: Smart Memory for Trusted Computing

Shaizeen Aga and Satish Narayanasamy

November, 2016

A practically feasible low-overhead secure hardware designthat provides strong defenses against memory bus side chan-nels remains elusive. This paper observes that smart memory,memory with compute capability and a packetized interface,can dramatically simplify this problem. InvisiMem expandsthe trust base to include the logic layer in the smart memory toimplement cryptographic primitives, which allows the securehost processor to send encrypted addresses over the untrustedmemory bus. This eliminates the need for expensive addressobfuscation techniques based on Oblivious RAM (ORAM).In addition, smart memory enables simple and efficient solu-tions for ensuring freshness using authenticated encryption,and for mitigating memory bus timing channel using constantheart-beat packets. We demonstrate that InvisiMem designsare one to two orders of magnitude lower in performance,space, energy, and memory bandwidth overhead, compared toORAM-based solutions.

CSE-TR-601-16

Spreadsheet Property Detection With Rule-assisted Active Learning

Zhe Chen, Xin Rong, Sasha Dadiomov, Richard Wesley, Gang Xiao, Daniel Cory, Michael Cafarella, Jock Mackinlay

11, 2016

Spreadsheets are a critical and widely-used data management tool. Converting spreadsheet data into relational tables would bring benefits to a number of fields, including public policy, public health, and economics. Research to date has focused on designing domain-specific languages to describe transformation processes or automatically converting a specific type of spreadsheets. To handle a larger variety of spreadsheets, we have to identify various spreadsheet properties, which correspond to a series of transformation programs that contribute towards a general framework that converts spreadsheets to relational tables. In this paper, we focus on the problem of spreadsheet property detection, identifying when a corresponding transformation program should be applied. We propose a hybrid approach of building a variety of spreadsheet property detectors to reduce the amount of required human labeling effort. Our approach integrates an active learning framework with crude, easy-to-write, user-provided rules. Our experiments show that when compared to a standard active learning approach, we reduced the training data needed to reach the performance plateau by 34--44% when a human provides relatively high-quality rules, and by a comparable amount with low-quality rules. A study on a large-scale web-crawled spreadsheet dataset demonstrates that it is crucial to detect a variety of spreadsheet properties in order to transform a large portion of the spreadsheets into a relational form.

CSE-TR-001-17

AP-atoms: A High-Accuracy Data-Driven Client Aggregatin for Global Load Balancing

Yibo Pi, Sugih Jamin, Peter Danzig, Jacob Shaha

May , 2017

In Internet mapping, IP address space is divided into a set of client aggregation units, which are the finest-grained units for global load balancing. Choosing the proper level of aggregation is a complex and crucial problem, which determines the total number of aggregation units that a mapping system has to maintain and the accuracy of client redirection. In this paper, using Internet-wide measurements provided by a commercial global load balancing service provider, weshow that even for the best existing client aggregation, almost 17% of clients have latency more than 50 ms apart from the average latency of clients in the same aggregation unit. To address this, we propose a data-driven client aggregation, AP-atoms, which can tradeoff scalability for accuracy and adapts to changing network conditions. Our experimentsshow that by using the same scale of client aggregations, AP-atoms can reduce the number of widely dispered clients by almost 2 and the 98-th percentile difference in clients latencies by almost 100 ms.

CSE-TR-002-17

Analysis of Microbump Overheads for 2.5D Disintegrated Design

Pete Ehrett, Vidushi Goyal, Opeoluwa Matthews, Reetuparna Das, Todd Austin, Valeria Bertacco

November, 2017

Systems-in-Package based on 2.5D integration have significant potential to reduce custom-ASIC costs and to improve chip performance over monolithic designs. A common concern about 2.5D integration is that microbumps (μbumps) introduce significant delay and power overhead relative to monolithically integrated systems. We demonstrate that a typical μbump has a delay contribution on the order of tens of attoseconds (10^-17) and power consumption overhead on the order of 1μW/Gbps - values so small that, in current systems with delays in the nanoseconds and power consumption in the hundreds of milliwatts or higher, they may safely be ignored.

CSE-TR-001-18

On Cuba, Diplomats, Ultrasound, and Intermodulation Distortion

Chen Yan, Kevin Fu, Wenyuan Xu

March, 2018

This technical report analyzes how ultrasound could have led to the AP news recordings of metallic sounds reportedly heard by diplomats in Cuba. Beginning with screen shots of the acoustic spectral plots from the AP news, we reverse engineered ultrasonic signals that could lead to those outcomes as a result of intermodulation distortion and non-linearities of the acoustic transmission medium. We created a proof of concept eavesdropping device to exfiltrate information by AM modulation over an inaudible ultrasonic carrier. When a second inaudible ultrasonic source interfered with the primary inaudible ultrasonic source, intermodulation distortion created audible byproducts that share spectral characteristics with audio from the AP news. Our conclusion is that if ultrasound played a role in harming diplomats in Cuba, then a plausible cause is intermodulation distortion between ultrasonic signals that unintentionally synthesize audible tones. In other words, acoustic interference without malicious intent to cause harm could have led to the audible sensations in Cuba.

CSE-TR-002-18

EUForia: Uninterpreted Functions Abstraction with Incremental Induction

Denis Bueno and Karem A. Sakallah

June , 2018

We investigate a novel algorithm for an IC3-style checker that operates entirely at the level of equality with uninterpreted functions (EUF). EUF abstraction is efficient to compute from a word-level transition system, whereas predicate abstraction typically requires a (possibly exponential) number of calls to a theorem prover. Data operations are treated as uninterpreted functions and relations as uninterpreted predicates. Our checker, called EUForia, checks a transition system for a given safety property and either (1) discovers an inductive strengthening EUF formula or (2) produces an abstract counterexample which corresponds to zero, one, or many concrete counterexamples. We also present a simple method for computing renement lemmas that checks the feasibility of the abstract counterexamples. We formalize the EUF transition system, prove our algorithm correct, and demonstrate our results on a subset of benchmarks from the software verication competition (SV-COMP) 2017.

CSE-TR-001-19

Aegean: Replication beyond the client-server model

Remzi Can Aksoy, Manos Kapritsos

October, 2019

This paper presents Aegean, a new approach that allows fault-tolerant replication to be implemented beyond the confines of the client-server model. In today?s computing, where services are rarely standalone, traditional replication protocols such as Primary-Backup, Paxos, and PBFT are not directly applicable, as they were designed for the client-server model. When services interact, these protocols run into a number of problems, affecting both correctness and performance. In this paper, we rethink the design of replication protocols in the presence of interactions between services and introduce new techniques that accommodate such interactions safely and efficiently. Our evaluation shows that a prototype implementation of Aegean not only ensures correctness in the presence of service interactions, but can further improve throughput by an order of magnitude.

CSE-TR-001-21

Proceedings of the 2020 Scheme and Functional Programming Workshop

Baptiste Saleil and Michael D. Adams

January, 2021

CSE-TR-002-21

A Program Logic to Verify Signal Temporal Logic Specifications of Hybrid Systems: Extended Technical Report

Hammad Ahmad and Jean-Baptiste Jeannin

March, 2021

Signal temporal logic (STL) was introduced for monitoring temporal properties of continuous-time signals for continuous and hybrid systems. Differential dynamic logic (dL) was introduced to reason about the end states of a hybrid program. Over the past decade, STL and its variants have significantly gained in popularity in the industry for monitoring purposes, while dL has gained in popularity for verification of hybrid systems. In this paper, we bridge the gap between the two different logics by introducing signal temporal dynamic logic (STdL) -- a dynamic logic that reasons about a subset of STL specifications over executions of hybrid systems. Our work demonstrates that STL can be used for deductive verification of hybrid systems. STdL significantly augments the expressiveness of dL by allowing reasoning about temporal properties in given time intervals. We provide a semantics and a proof calculus for STdL, along with a proof of soundness and relative completeness.

Technical Reports Page