Selected recent papers from the Secure Systems Lab.
Conference Papers
Artemis: Defanging Software Supply Chain Attacks in Multi-repository Update Systems
M. Moore, T. Kuppusamy, J. Cappos
2023 Annual Computer Security Applications Conference (ACSAC 2023) [Artifact Functional] [Artifact Reusable] [Results Reproduced]
Modern software installation tools often use packages from more than one repository, presenting a unique set of security challenges. Such a configuration increases the risk of repository compromise and introduces attacks like dependency confusion and repository fallback. In this paper, we offer the first exploration of attacks that specifically target multiple repository update systems, and propose a unique defensive strategy we call articulated trust. Articulated trust is a principle that allows software installation tools to specify trusted developers and repositories for each package. To implement articulated trust, we built Artemis, a framework that introduces several new security techniques, such as per-package prioritization of repositories, multi-role delegations, multiple-repository consensus, and key pinning. These techniques allow for a greater diversity of trust relationships while eliminating the security risk of single points of failure. To evaluate Artemis, we examine attacks on software update systems from the Cloud Native Computing Foundation’s Catalog of Supply Chain Compromises, and find that the most secure configuration of Artemis can prevent all of them, compared to 14-59% for the best existing system. We also cite real-world deployments of Artemis that highlight its practicality. These include the JDF/Linux Foundation Uptane Standard that secures over-the-air updates for millions of automobiles, and TUF, which is used by many companies for secure software distribution.
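To make the articulated-trust idea concrete, here is a minimal sketch of a per-package policy that pins which repositories (in priority order) and which signing keys may vouch for each package. The policy fields, function names, and verification flow are illustrative assumptions, not Artemis's actual API.

```python
# Hypothetical articulated-trust policy: per-package repository priority,
# pinned keys, and a signature threshold. All field names are illustrative.
POLICY = {
    "requests": {
        "repositories": ["https://pypi.example.org", "https://mirror.example.org"],
        "pinned_keys": {"keyid-ab12", "keyid-cd34"},
        "threshold": 2,  # how many pinned-key signatures are required
    },
}

def select_package(package, offers, verify_sig):
    """Pick metadata from the highest-priority repository that meets the
    package's pinned-key threshold; refuse everything else."""
    rule = POLICY[package]
    for repo in rule["repositories"]:              # per-package prioritization
        meta = offers.get(repo)
        if meta is None:
            continue                               # no blind repository fallback
        valid = {s["keyid"] for s in meta["signatures"]
                 if s["keyid"] in rule["pinned_keys"] and verify_sig(meta, s)}
        if len(valid) >= rule["threshold"]:        # key pinning + consensus
            return repo, meta
    raise RuntimeError(f"no repository satisfied the trust policy for {package}")
```

Because trust is scoped per package, compromising one repository or one key does not become a single point of failure for every package a client installs.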
Needles in a Haystack: Using PORT to Catch Bad Behaviors within Application Recordings
P. Moore, T. Wies, M. Waldman, P. Frankl, J. Cappos
2022 International Conference on Software Technologies (ICSOFT 2022)
Earlier work has proven that information extracted from recordings of an application’s activity can be tremendously valuable. However, given the many requests that pass between applications and external entities, it has been difficult to isolate the handful of patterns that indicate the potential for failure. In this paper, we propose a method that harnesses proven event processing techniques to find those problematic patterns. The key addition is PORT, a new domain-specific language which, when combined with its event stream recognition and transformation engine, enables users to extract patterns in system call recordings and other streams, and then rewrite input activity on the fly. The former task can spot activity that indicates a bug, while the latter produces a modified stream for use in more active testing. We evaluated PORT’s capabilities in several ways, starting with recreating the mutators and checkers utilized by an earlier work called SEA to modify and replay the results of system calls. Our re-implementations achieved the same efficacy using fewer lines of code. We also illustrated PORT’s extensibility by adding support for detecting malicious USB commands within recorded traffic.
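For intuition, here is a toy matcher over a recorded system call stream. It is far simpler than PORT's actual language and engine; the event fields and the (name, predicate) pattern format are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str      # system call name, e.g. "open"
    result: int    # return value from the recording

def find_pattern(events, pattern):
    """Collect non-overlapping matches of a sequence pattern, where each
    pattern element is a (syscall_name, predicate) pair."""
    matches, window, k = [], [], 0
    for ev in events:
        name, pred = pattern[k]
        if ev.name == name:
            if pred(ev):
                window.append(ev)
                k += 1
                if k == len(pattern):      # full pattern seen
                    matches.append(window)
                    window, k = [], 0
            else:
                window, k = [], 0          # restart on a failed predicate
    return matches

# Flag a successful open() later followed by a failing read().
trace = [Event("open", 3), Event("read", -1)]
suspicious = find_pattern(trace, [("open", lambda e: e.result >= 0),
                                  ("read", lambda e: e.result < 0)])
```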
Cybersecurity Shuffle: Using Card Magic to Teach Introductory Cybersecurity Topics
P. Moore, J. Cappos
2022 Consortium for Computer Science Education Northeast (CCSNE 2022)
One of the main challenges in designing lessons for an introductory information security class is how to present new technical concepts in a manner comprehensible to students with widely different backgrounds. A non-traditional approach can help students engage with the material and master these unfamiliar ideas. We have devised a series of lessons that teach important information security topics, such as social engineering, side-channel attacks, and attacks on randomness, using card magic. Each lesson centers around a card trick that allows the instructor to simulate the described attack in a way that makes sense, even for those who have no prior technical background. In this paper, we describe our experience using these lessons to teach cybersecurity topics to high school students with limited computer science knowledge. Students were assessed before and after the demonstration to gauge their mastery of the material, and, while we had a very limited set of responses, the results show an improvement in post-test scores. Furthermore, several indicators affirm the students enjoyed the lessons and remained engaged throughout the session.
Thinking Aloud About Confusing Code: A Qualitative Investigation of Program Comprehension and Atoms of Confusion
D. Gopstein, A. L. Fayard, S. Apel, J. Cappos
2020 Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 20)
Atoms of confusion are small patterns of code that have been empirically validated to be difficult to hand-evaluate by programmers. Previous research focused on defining and quantifying this phenomenon, but not on explaining or critiquing it. In this work, we address core omissions to the body of work on atoms of confusion, focusing on the ‘how’ and ‘why’ of programmer misunderstanding. We performed a think-aloud study in which we observed programmers, both professionals and students, as they hand-evaluated confusing code. We performed a qualitative analysis of the data and found several surprising results, which explain previous results, outline avenues of further research, and suggest improvements to the research methodology. A notable observation is that correct hand-evaluations do not imply understanding, and incorrect evaluations do not imply misunderstanding. We believe this and other observations may be used to improve future studies and models of program comprehension. We argue that thinking of confusion as an atomic construct may pose challenges to formulating new candidates for atoms of confusion. Ultimately, we question whether hand-evaluation correctness is, itself, a sufficient instrument to study program comprehension.
Microcash: Practical Concurrent Processing of Micropayments
G. Almashaqbeh, A. Bishop, J. Cappos
24th International Conference on Financial Cryptography and Data Security (FC '20)
Micropayments have a large number of potential applications. However, processing these small payments individually can be expensive, with transaction fees often exceeding the payment value itself. By aggregating the small transactions into a few larger ones, and using cryptocurrencies, today’s decentralized probabilistic micropayment schemes can reduce these fees. Unfortunately, existing solutions force micropayments to be issued sequentially; thus, to support fast issuance rates, a customer needs a large number of escrows, which bloats the blockchain. Moreover, these schemes incur a large computation and bandwidth overhead, limiting their applicability in large-scale systems. In this paper, we propose MicroCash, the first decentralized probabilistic framework that supports concurrent micropayments. MicroCash introduces a novel escrow setup that enables a customer to concurrently issue payment tickets at a fast rate using a single escrow. MicroCash is also cost effective because it allows for ticket exchange using only one round of communication, and it aggregates the micropayments using a non-interactive lottery protocol that requires only secure hashing and supports fixed winning rates. Our experiments show that MicroCash can process thousands of tickets per second, which is around 1.7-4.2x the rate of a state-of-the-art sequential micropayment system. Moreover, MicroCash supports any ticket issue rate over any period using only one escrow, while the sequential scheme would need more than 1000 escrows per second to permit high rates. This enables our system to further reduce transaction fees and data on the blockchain by ~50%.
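The non-interactive lottery can be sketched with nothing more than a hash comparison: a ticket wins exactly when its hash falls below a threshold derived from the fixed winning rate. This is a generic hash-lottery sketch rather than MicroCash's exact construction, and the winning rate and beacon source are assumptions.

```python
import hashlib

WIN_RATE = 1 / 1024                      # fixed winning probability (assumed)
THRESHOLD = int(WIN_RATE * 2**256)       # SHA-256 digests are 256-bit integers

def ticket_wins(ticket: bytes, beacon: bytes) -> bool:
    # The beacon is public randomness revealed only after tickets are issued,
    # so neither customer nor merchant can bias the outcome in advance.
    h = int.from_bytes(hashlib.sha256(ticket + beacon).digest(), "big")
    return h < THRESHOLD
```

Because winning is determined by hashing alone, tickets can be verified without any interactive protocol between customer and merchant.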
Charting a Course Through Uncertain Environments: SEA Uses Past Problems to Avoid Future Failures
P. Moore, J. Cappos, P. Frankl, and T. Wies
30th IEEE International Symposium on Software Reliability Engineering (ISSRE'19)
Best Paper Award
A common problem for developers is applications exhibiting new bugs after deployment. Many of these bugs can be traced to unexpected network, operating system, and file system differences that cause program executions that were successful in a development environment to fail once deployed. Preventing these bugs is difficult because it is impractical to test an application in every environment. Enter Simulating Environmental Anomalies (SEA), a technique that utilizes evidence of one application’s failure in a given environment to generate tests that can be applied to other applications, to see whether they suffer from analogous faults. In SEA, models of unusual properties extracted from interactions between an application, A, and its environment guide simulations of another application, B, running in the anomalous environment. This reveals faults B may experience in this environment without the expense of deployment. By accumulating these anomalies, applications can be tested against an increasing set of problematic conditions. We implemented a tool called CrashSimulator, which uses SEA, and evaluated it against Linux applications selected from coreutils and the Debian popularity contest. Our tests found a total of 63 bugs in 31 applications with effects including hangs, crashes, data loss, and remote denial of service conditions.
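A minimal flavor of a SEA-style mutator: take a recorded trace and inject an anomaly observed in some other environment, for instance short read()s, then replay the mutated trace against another application. The trace format below is an assumption; CrashSimulator's real mutators operate on strace-style recordings.

```python
def short_read_mutator(trace):
    """Halve the data returned by each successful read(), simulating an
    environment (e.g., a slow network filesystem) that returns short reads."""
    for call in trace:
        if call["name"] == "read" and call["result"] > 1:
            n = call["result"] // 2
            call = dict(call, result=n, data=call["data"][:n])
        yield call

# An application that assumes read() always fills its buffer will misbehave
# during replay, exposing the latent bug without deploying to that environment.
mutated = list(short_read_mutator([
    {"name": "read", "result": 4096, "data": b"x" * 4096},
]))
```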
in-toto: providing farm-to-table security properties for bits and bytes
S. Torres-Arias, H. Afzali, T. Kuppusamy, R. Curtmola, and J. Cappos
28th USENIX Security Symposium (USENIX Sec'19)
The software development process is quite complex and involves a number of independent actors. Developers check source code into a version control system, the code is compiled into software at a build farm, and CI/CD systems run multiple tests to ensure the software’s quality among a myriad of other operations. Finally, the software is packaged for distribution into a delivered product, to be consumed by end users. An attacker that is able to compromise any single step in the process can maliciously modify the software and harm any of the software’s users. To address these issues, we designed in-toto, a framework that cryptographically ensures the integrity of the software supply chain. in-toto grants the end user the ability to verify the software’s supply chain from the project’s inception to its deployment. We demonstrate in-toto’s effectiveness on 30 software supply chain compromises that affected hundreds of millions of users and showcase in-toto’s usage over cloud-native, hybrid-cloud, and cloud-agnostic applications. in-toto is integrated into products and open source projects that are used by millions of people daily.
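At its core, the verification walks the declared supply chain steps, checks each piece of signed "link" metadata against the key authorized for that step, and confirms that artifacts flow between steps unmodified. The sketch below mimics that flow with made-up field names; it is not the in-toto reference implementation's API.

```python
def verify_supply_chain(layout, links, verify_sig):
    """layout: {"steps": [{"name": ..., "authorized_key": ...}, ...]}
    links: per-step signed metadata with artifact hashes (illustrative)."""
    products = {}                                   # path -> hash seen so far
    for step in layout["steps"]:
        link = links[step["name"]]
        if not verify_sig(link, step["authorized_key"]):
            raise ValueError(f"step {step['name']}: unauthorized signature")
        for path, digest in link["materials"].items():
            # What a step consumes must match what an earlier step produced.
            if path in products and products[path] != digest:
                raise ValueError(f"{path} was modified between steps")
        products.update(link["products"])
    return products  # the end user checks the delivered product against this
```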
Commit Signatures for Centralized Version Control Systems
S. Vaidya, S. Torres-Arias, R. Curtmola, and J. Cappos
34th International Conference on ICT Systems Security and Privacy Protection (IFIP SEC '19)
Version Control Systems (VCS-es) play a major role in the software development life cycle, yet historically their security has been relatively underdeveloped compared to their importance. Recent history has shown that source code repositories represent appealing attack targets. Attacks that violate the integrity of repository data can negatively impact millions of users. Some VCS-es, such as Git, employ commit signatures as a mechanism to provide developers with cryptographic protections for the code they contribute to a repository. However, an entire class of other VCS-es, including the well-known Apache Subversion (SVN), lacks such protections. We design the first commit signing mechanism for centralized version control systems, which supports features such as working with a subset of the repository and allowing clients to work on disjoint sets of files without having to retrieve each other’s changes. We implement a prototype for the proposed commit signing mechanism on top of the SVN codebase and show experimentally that it only incurs a modest overhead. With our solution in place, the VCS security model is substantially improved.
CAPnet: A Defense Against Cache Accounting Attacks on Content Distribution Networks
G. Almashaqbeh, A. Bishop, K. Kelley, and J. Cappos
7th Annual IEEE Conference on Communications and Network Security (CNS '19)
Peer-assisted content distribution networks (CDNs) have emerged to improve performance and reduce deployment costs of traditional, infrastructure-based content delivery networks. This is done by employing peer-to-peer data transfers to supplement the resources of the network infrastructure. However, these hybrid systems are vulnerable to accounting attacks in which the peers, or caches, collude with clients in order to report that content was transferred when it was not. This is a particular issue in systems that incentivize cache participation, because malicious caches may collect rewards from the content publishers operating the CDN without doing any useful work. In this paper, we introduce CAPnet, the first technique that lets untrusted caches join a peer-assisted CDN while providing a bound on the effectiveness of accounting attacks. At its heart is a lightweight cache accountability puzzle that clients must solve before caches are given credit. This puzzle requires colocating the data a client has requested, so its solution confirms that the content has actually been retrieved. We analyze the security and overhead of our scheme in realistic scenarios. The results show that a modest client machine using a single core can solve puzzles at a rate sufficient to simultaneously watch dozens of 1080p videos. The technique is designed to be even more scalable on the server side. In our experiments, one core of a single low-end machine is able to generate puzzles for 4.26 Tbps of bandwidth, enabling 870,000 clients to concurrently view the same 1080p video. This demonstrates that our scheme can ensure cache accountability without degrading system productivity.
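The accountability puzzle can be approximated as a hash over randomly probed chunks of the requested content: only a client that actually holds the data can compute the answer. The chunking, probe count, and function names below are assumptions; real CAPnet binds the puzzle to retrieval more carefully.

```python
import hashlib, secrets

def make_puzzle(n_chunks, n_probes=8):
    """Publisher side: pick a fresh nonce and random chunk indices."""
    nonce = secrets.token_bytes(16)
    probes = [secrets.randbelow(n_chunks) for _ in range(n_probes)]
    return {"nonce": nonce, "probes": probes}

def puzzle_answer(puzzle, chunks):
    """Computable only with the actual content bytes in hand."""
    h = hashlib.sha256(puzzle["nonce"])
    for i in puzzle["probes"]:
        h.update(chunks[i])
    return h.hexdigest()

# The publisher, which also holds the content, computes puzzle_answer() on
# its own copy and credits the cache only if the client's answer matches.
```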
API Blindspots: Why Experienced Developers Write Vulnerable Code
D. Oliveira, T. Lin, M. Rahman, R. Akefirad, D. Ellis, E. Perez, R. Bobhate, L. DeLong, J. Cappos, Y. Brun, and N. Ebner
14th Symposium on Usable Privacy and Security (SOUPS '18)
Despite the best efforts of the security community, security vulnerabilities in software are still prevalent, with new vulnerabilities reported daily and older ones stubbornly repeating themselves. One potential source of these vulnerabilities is shortcomings in the language and library APIs that developers use. Developers tend to trust APIs, but can misunderstand or misuse them, introducing vulnerabilities. We call the causes of such misuse blindspots. In this paper, we study API blindspots from the developers’ perspective to: (1) determine the extent to which developers can detect API blindspots in code and (2) examine the extent to which developer characteristics (i.e., perception of code correctness, familiarity with code, confidence, professional experience, cognitive function, and personality) affect this capability. We conducted a study with 109 developers from four countries solving programming puzzles that involve Java APIs known to contain blindspots. We find that (1) the presence of blindspots correlated negatively with the developers’ accuracy in answering implicit security questions and the developers’ ability to identify potential security concerns in the code. This effect was more pronounced for I/O-related APIs and for puzzles with higher cyclomatic complexity. (2) Higher cognitive functioning and more programming experience did not predict better ability to detect API blindspots. (3) Developers exhibiting greater openness as a personality trait were more likely to detect API blindspots. This study has the potential to advance API security in (1) design, implementation, and testing of new APIs; (2) addressing blindspots in legacy APIs; (3) development of novel methods for developer recruitment and training based on cognitive and personality assessments; and (4) improvement of software development processes (e.g., establishment of security and functionality teams).
le-git-imate: Towards Verifiable Web-based Git Repositories
H. Afzali, S. Torres, R. Curtmola, and J. Cappos
13th ACM Asia Conference on Computer & Communications Security (AsiaCCS '18)
Web-based Git hosting services such as GitHub and GitLab are popular choices to manage and interact with Git repositories. However, they lack an important security feature: the ability to sign Git commits. Users instruct the server to perform repository operations on their behalf and have to trust that the server will execute their requests faithfully. Such trust may be unwarranted, though, because a malicious or compromised server may execute the requested actions incorrectly, leading to a different state of the repository than what the user intended.
In this paper, we show a range of high-impact attacks that can be executed stealthily when developers use the web UI of a Git hosting service to perform common actions such as editing files or merging branches. We then propose le-git-imate, a defense against these attacks that provides security guarantees comparable to and compatible with Git's standard commit signing mechanism. We implement le-git-imate as a Chrome browser extension. le-git-imate does not require changes on the server side and can thus be used immediately. It preserves current GitHub/GitLab workflows, does not require the user to leave the browser, and allows anyone to verify that the server's actions faithfully follow the user's requested actions. Moreover, experimental evaluation using the browser extension shows that le-git-imate has performance comparable to Git's standard commit signature mechanism. With our solution in place, users can take advantage of GitHub/GitLab's web-based features without sacrificing security, thus paving the way towards verifiable web-based Git repositories.
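The core trick is that a Git commit ID is simply a SHA-1 digest over a canonical byte string, so client-side code can recompute the commit the server should have created and compare. The helper below reconstructs a commit ID from its fields; it follows Git's object format, though the surrounding le-git-imate machinery (building the tree in the browser, fetching objects) is omitted, and the identity strings shown are illustrative.

```python
import hashlib

def git_commit_id(tree, parents, author, committer, message):
    """Recompute a Git commit hash from its raw fields.

    `tree` and `parents` are hex object IDs; `author`/`committer` are full
    identity lines, e.g. "Jane Doe <jane@example.com> 1500000000 +0000".
    `message` should carry its trailing newline, as Git stores it.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    # Git object hash = SHA-1 over "commit <len>\0" + body.
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()
```

If the recomputed ID differs from the commit the server published, the server did not faithfully execute the user's request.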
Prevalence of Confusing Code in Software Projects: Atoms of Confusion in the Wild
D. Gopstein, H. Zhou, P. Frankl, and J. Cappos
The 15th International Conference on Mining Software Repositories (MSR '18)
Distinguished Paper Award
Prior work has shown that extremely small code patterns, such as the conditional operator and implicit type conversion, can cause considerable misunderstanding in programmers. Until now, the real-world impact of these patterns, known as 'atoms of confusion', was only speculative. This work uses a corpus of 14 of the most popular and influential open source C and C++ projects to measure the prevalence and significance of these small confusing patterns. Our results show that the 15 known types of confusing micro patterns occur millions of times in programs like the Linux kernel and GCC, appearing on average once every 23 lines. We show there is a strong correlation between these confusing patterns and bug-fix commits, as well as a tendency for confusing patterns to be commented. We also explore patterns at the project level, showing that the rate of security vulnerabilities is higher in projects with more atoms. Finally, we examine real code examples containing these atoms, including ones that were used to find and fix bugs in our corpus. In total, this work demonstrates that beyond simple misunderstanding in the lab setting, atoms of confusion are both prevalent, occurring often in real projects, and meaningful, being removed by bug-fix commits at an elevated rate.
Four Years Experience: Making Sensibility Testbed Work for SAS
Y. Zhuang, A. Rafetseder, R. Weiss, and J. Cappos
13th Annual Sensors Applications Symposium (SAS '18)
Sensibility Testbed is a framework for developing sensor-based applications that can run on user-provided smartphones, and is easy to program. Over the past four years, we have been organizing hackathons at SAS in order to perform semi-controlled experiments with this platform. Any smartphone user can install Sensibility Testbed and develop a simple sensor application in less than a day. One of the problems with developing and testing such a framework is that there are many possible hardware platforms and system configurations. Hackathons provide an effective venue for observing the development of applications on a range of devices by users with no previous knowledge of the framework.
In this paper, we describe our experiences with hosting hackathons in a variety of venues, including the challenges of working in unfamiliar environments and with researchers who had no prior knowledge of the testbed. The feedback from participants has been very useful in identifying usability issues, hardware issues, and the types of sensor applications that users want to create.
Detecting and Comparing Brain Activity in Short Program Comprehension Using EEG
M.K.-C. Yeh, D. Gopstein, Y. Yan, and Y. Zhuang
IEEE Frontiers in Education Conference (FIE '17)
Program comprehension is a common task in software development. Programmers perform program comprehension at different stages of the software development life cycle. Detecting when a programmer experiences problems or confusion can be difficult. Self-reported data may be useful, but not reliable. More importantly, it is hard to use self-reported feedback in real time. In this study, we use an inexpensive, non-invasive EEG device to record 8 subjects’ brain activity during short program comprehension tasks. Subjects were presented with either confusing or non-confusing C/C++ code snippets. Paired-sample t-tests are used to compare the average magnitude in the alpha and theta frequency bands. The results show that the differences in average magnitude in both bands are significant when comparing confusing and non-confusing questions. We then use ANOVA to test whether such a difference is also present across questions of the same type. We found no significant difference across questions of the same difficulty level. Our outcome, however, shows that alpha and theta band powers both increased when subjects were under heavy cognitive workload. Other research studies have reported a negative correlation between (upper) alpha and theta band powers.
Practical Fog Computing with Seattle
A. Rafetseder, L. Pühringer, and J. Cappos
Fog World Congress 2017
In this paper we present Seattle, a practical and publicly accessible fog computing platform with a deployment history going back to 2009. Seattle’s cross-platform portable sandbox implementation tackles the widely-recognized issue of node heterogeneity. Its componentized architecture supports a number of approaches to operating a Seattle-based fog system, from isolated, standalone and peer-to-peer operations, to full-fledged provisioning by a dedicated operator, or federations of many operators. Seattle’s components and interfaces are designed for compatibility and reuse, and may be aligned with existing trust boundaries between different stakeholders. Seattle comprises implementations of all components discussed in this paper. Its free, open-source software stack has been used for teaching and research; outside groups have used existing Seattle components, and constructed new components with compatible interfaces in order to adapt the platform to their needs.
Understanding Misunderstandings in Source Code
D. Gopstein, J. Iannacone, Y. Yan, L. DeLong, Y. Zhuang, K.-C. Yeh, and J. Cappos
The 2017 ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2017)
Distinguished Paper Award
Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of 'atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback.
Mercury: Bandwidth-Effective Prevention of Rollback Attacks Against Community Repositories
T. Kuppusamy, V. Diaz, and J. Cappos
The 2017 USENIX Annual Technical Conference (USENIX ATC 2017)
A popular community repository such as Docker Hub, PyPI, or RubyGems distributes tens of thousands of software projects to millions of users. The large number of projects and users make these repositories attractive targets for exploitation. After a repository compromise, a malicious party can launch a number of attacks on unsuspecting users, including rollback attacks that revert projects to obsolete and vulnerable versions. Unfortunately, due to the rapid rate at which packages are updated, existing techniques that protect against rollback attacks would cause each user to download 2–3 times the size of an average package in metadata each month, making them impractical to deploy.
In this work, we develop a system called Mercury that uses a novel technique to compactly disseminate version information while still protecting against rollback attacks. Due to a different technique for dealing with key revocation, users are protected from rollback attacks, even if the software repository is compromised. This technique is bandwidth-efficient, especially when delta compression is used to transmit only the differences between previous and current lists of version information. An analysis we performed for the Python community shows that once Mercury is deployed on PyPI, each user will only download metadata each month that is about 3.5% the size of an average package. Our work has been incorporated into the latest versions of TUF, which is being integrated by Haskell, OCaml, RubyGems, Python, and CoreOS, and is being used in production by LEAP, Flynn, and Docker.
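On the client, the protection boils down to enforcing monotonicity over a signed snapshot of all project versions; delta compression only changes how that snapshot is transferred. Below is a minimal monotonicity check with assumed field names (verify_sig, "versions"), not Mercury's actual wire format.

```python
def accept_version_metadata(cached, fresh, verify_sig):
    """cached: {project: last_seen_version}; fresh: signed snapshot from
    the repository. Reject anything that moves a project backwards."""
    if not verify_sig(fresh):
        raise ValueError("bad signature on version metadata")
    for project, old in cached.items():
        new = fresh["versions"].get(project)
        if new is None or new < old:   # a vanished project is suspicious too
            raise ValueError(f"possible rollback attack on {project}")
    return fresh["versions"]           # becomes the new cached state
```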
Lock-in-Pop: Securing Privileged Operating System Kernels by Keeping on the Beaten Path
Y. Li, B. Dolan-Gavitt, S. Weber, and J. Cappos
The 2017 USENIX Annual Technical Conference (USENIX ATC 2017)
Virtual machines (VMs) that try to isolate untrusted code are widely used in practice. However, it is often possible to trigger zero-day flaws in the host Operating System (OS) from inside of such virtualized systems. In this paper, we propose a new security metric showing strong correlation between “popular paths” and kernel vulnerabilities. We verify that the OS kernel paths accessed by popular applications in everyday use contain significantly fewer security bugs than less-used paths. We then demonstrate that this observation is useful in practice by building a prototype system which locks an application into using only popular OS kernel paths. By doing so, we demonstrate that we can prevent the triggering of zero-day kernel bugs significantly better than three other competing approaches, and argue that this is a practical approach to secure system design.
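The "lock into popular paths" idea amounts to an allowlist: the sandbox exposes only the kernel entry points that everyday applications exercise. A toy interposition layer might look like the following; the popular-call list and the API are assumptions, and the real prototype reasons about argument patterns and kernel code paths, not just call names.

```python
# Hypothetical allowlist of "popular path" system calls, derived from
# profiling everyday applications (illustrative subset only).
POPULAR_CALLS = {"read", "write", "open", "close", "stat", "lseek", "mmap"}

def guarded_syscall(name, real_syscalls, *args):
    if name not in POPULAR_CALLS:
        # Rarely exercised kernel paths are where zero-days concentrate;
        # the sandbox refuses to enter them at all.
        raise PermissionError(f"syscall {name} blocked by popular-paths policy")
    return real_syscalls[name](*args)
```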
Measuring the Fitness of Fitness Trackers
C. Bender, J. Hoffstot, B. Combs, S. Hooshangi, and J. Cappos
IEEE Sensors Applications Symposium (SAS 2017)
Data collected by fitness trackers could play an important role in improving the health and well-being of the individuals who wear them. Many insurance companies even offer monetary rewards to participants who meet certain step or calorie goals. However, in order for it to be useful, the collected data must be accurate and also reflect real-world performance. While previous studies have compared step count data in controlled laboratory environments for limited periods of time, few studies have measured performance over longer periods of time while the subject performs real-world activities. There are also few direct comparisons of a range of health indicators on different fitness tracking devices. In this study, we compared step counts, calories burned, and miles travelled data collected by three pairs of fitness trackers over a 14-day time period in free-living conditions. Our work indicates that the number of steps reported by different devices worn simultaneously could vary by as much as 26%. At the same time, the variations seen in distance travelled, based on the step count, followed the same trends. Little correlation was found between the number of calories burned and the variations seen in the step count across multiple devices. Our results demonstrate that the reporting of health indicators, such as calories burned and miles travelled, is heavily dependent on the device itself, as well as the manufacturer’s proprietary algorithm to calculate or infer such data. As a result, it is difficult to use such measurements as an accurate predictor of health outcomes, or to develop consistent criteria to rate the performance of such devices in head-to-head comparisons.
Uptane: Securing Software Updates for Automobiles
T.K. Kuppusamy, A. Brown, S. Awwad, D. McCoy, R. Bielawski, C. Mott, S. Lauzon, A. Weimerskirch, and J. Cappos
14th Embedded Security in Cars Conference (escar 2016)
Software update systems for automobiles can deliver significant benefits, but, if not implemented carefully, they could potentially incur serious security vulnerabilities. Previous solutions for securing software updates consider standard attacks and deploy widely understood security mechanisms, such as digital signatures for the software updates, and hardware security modules (HSM) to sign software updates. However, no existing solution considers more advanced security objectives, such as resilience against a repository compromise, freeze attacks on the vehicle's update mechanism, or a compromise at a supplier's site. Solutions developed for the PC world do not generalize to automobiles for two reasons: first, they do not solve problems that are unique to the automotive industry (e.g., that there are many different types of computers to be updated on a vehicle), and second, they do not address security attacks that can cause a vehicle to fail (e.g., a man-in-the-middle attack without compromising any signing key) or that can cause a vehicle to become unsafe. In this paper, we present Uptane, the first software update framework for automobiles that counters a comprehensive array of security attacks, and is resilient to partial compromises. Uptane adds strategic features to the state-of-the-art software update framework, TUF, in order to address automotive-specific vulnerabilities and limitations. Uptane is flexible and easy to adopt, and its design details were developed together with the main automotive industry stakeholders in the USA.
On omitting commits and committing omissions: Preventing Git metadata tampering that (re)introduces software vulnerabilities
S. Torres-Arias, A. Ammula, R. Curtmola, and J. Cappos
25th USENIX Security Symposium (USENIX Sec 2016)
Metadata manipulation attacks represent a new threat class directed against Version Control Systems, such as the popular Git. This type of attack provides inconsistent views of a repository state to different developers, and deceives them into performing unintended operations with often negative consequences. These include omitting security patches, merging untested code into a production branch, and even inadvertently installing software containing known vulnerabilities. To make matters worse, the attacks are subtle by nature and leave no trace after being executed. We propose a defense scheme that mitigates these attacks by maintaining a cryptographically-signed log of relevant developer actions. By documenting the state of the repository at a particular time when an action is taken, developers are given a shared history, so irregularities are easily detected. Our prototype implementation of the scheme can be deployed immediately as it is backwards compatible and preserves current workflows and use cases for Git users. An evaluation shows that the defense adds a modest overhead while offering significantly stronger security. We performed responsible disclosure of the attacks and are working with the Git community to fix these issues in an upcoming version of Git.
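The defense can be approximated as an append-only, hash-chained log of branch states that each developer signs on push; a server that shows different developers different histories then forks the log in a detectable way. The entry fields and the sign callback below are illustrative assumptions, not the paper's exact format.

```python
import hashlib, json

def append_push_entry(log, branch_tips, sign):
    """Record the branch tips resulting from a push, chained to the
    previous entry and signed by the pusher."""
    prev = log[-1]["entry_hash"] if log else "0" * 64
    body = {"prev": prev, "tips": dict(branch_tips)}
    payload = json.dumps(body, sort_keys=True).encode()
    entry = {**body,
             "entry_hash": hashlib.sha256(payload).hexdigest(),
             "sig": sign(payload)}
    log.append(entry)
    return entry

# A client replaying the log verifies every signature and hash link, then
# checks that the branch state the server reports matches the latest entry;
# any inconsistency reveals metadata tampering.
```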
Diplomat: Using Delegations to Protect Community Repositories
T. Kuppusamy, S. Torres-Arias, V. Diaz, and J. Cappos
13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16)
Community repositories, such as Docker Hub, PyPI, and RubyGems, are bustling marketplaces that distribute software. Even though these repositories use common software signing techniques (e.g., GPG and TLS), attackers can still publish malicious packages after a server compromise. This is mainly because a community repository must have immediate access to signing keys in order to certify the large number of new projects that are registered each day.
This work demonstrates that community repositories can offer compromise-resilience and real-time project registration by employing mechanisms that disambiguate trust delegations. This is done through two delegation mechanisms that provide flexibility in the amount of trust assigned to different keys. Using this idea, we implement Diplomat, a software update framework that supports security models with different security/usability tradeoffs. By leveraging Diplomat, a community repository can achieve near-perfect compromise-resilience while allowing real-time project registration. For example, when Diplomat is deployed and configured to maximize security on Python's community repository, less than 1% of users will be at risk even if an attacker controls the repository and is undetected for a month. Diplomat is being integrated by Ruby, CoreOS, Haskell, OCaml, and Python, and has already been deployed by Flynn, LEAP, and Docker.
Finding Sensitive Accounts on Twitter: An Automated Approach Based on Follower Anonymity
S.T. Peddinti, K.W. Ross, and J. Cappos
Tenth International AAAI Conference on Web and Social Media (ICWSM 16)
We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter, by examining the percentage of anonymous and identifiable followers the accounts have. We first designed a machine learning classifier to automatically determine if a Twitter account is anonymous or identifiable. We then classified an account as potentially sensitive based on the percentages of anonymous and identifiable followers the account has. We applied our approach to approximately 100,000 accounts with 404 million active followers. The approach uncovered accounts that were sensitive for a diverse range of reasons.
Detecting Latent Cross-platform API Violations
J. Rasley, E. Gessiou, T. Ohmann, Y. Brun, S. Krishnamurthi and J. Cappos
2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE 2015)
Many APIs enable cross-platform system development by abstracting over the details of a platform, allowing application developers to write one implementation that will run on a wide variety of platforms. Unfortunately, subtle differences in the behavior of the underlying platforms make cross-platform behavior difficult to achieve. As a result, applications using these APIs can be plagued by bugs that are difficult to observe before deployment. These portability bugs can be particularly difficult to diagnose and fix because they arise from the API implementation, the operating system, or hardware, rather than application code. This paper describes CheckAPI, a technique for detecting violations of cross-platform portability. CheckAPI compares an application's interactions with the API implementation to its interactions with a partial specification-based API implementation, and does so efficiently enough to be used in real production systems and at runtime. CheckAPI finds latent errors that escape pre-release testing. This paper discusses the subtleties of different kinds of API calls and strategies for effectively producing the partial implementations. Validating CheckAPI on JavaScript, the Seattle project's Repy VM, and POSIX detects dozens of violations that are confirmed bugs in widely-used software.
Trust Evaluation in Mobile Devices: An Empirical Study
R. Weiss, L. Reznik, Y. Zhuang, A. Hoffman, D. Pollard, A. Rafetseder, T. Li, and J. Cappos
2015 IEEE Trustcom/BigDataSE/ISPA
Mobile devices today, such as smartphones and tablets, have become both more complex and diverse. This paper presents a framework to evaluate the trustworthiness of the individual components in a mobile system, as well as the entire system. The major components are applications, devices and networks of devices. Given this diversity and the multiple levels of a mobile system, we develop a hierarchical trust evaluation methodology, which enables the combination of trust metrics and allows us to verify the trust metric for each component based on the trust metrics for others. The paper first demonstrates this idea for individual applications and Android-based smartphones. The methodology involves two stages: initial trust evaluation and trust verification. In the first stage, an expert rule system is used to produce trust metrics at the lowest level of the hierarchy. In the second stage, the trust metrics are verified by comparing data from components and a trust evaluation is produced for the combined system. This paper presents the results of two empirical studies, in which this methodology is applied and tested. The first study involves monitoring resource utilization and evaluating trust based on resource consumption patterns. We measured battery voltage, CPU utilization and network communication for individual apps and detected anomalous behavior that could be indicative of malicious code. The second study involves verification of the trust evaluation by comparing the data from two different devices: the GPS location from an Android smartphone in an automobile and the data from an on-board diagnostics (OBD) sensor of the same vehicle.
Fence: Protecting Device Availability With Uniform Resource Control
T. Li, A. Rafetseder, R. Fonseca, and J. Cappos
2015 USENIX Annual Technical Conference (USENIX ATC 15)
Applications such as software updaters or run-away web apps, even if low priority, can cause performance degradation, loss of battery life, or other issues that reduce a computing device's availability. The core problem is that OS resource control mechanisms unevenly apply uncoordinated policies across different resources. This paper shows how handling resources (e.g., CPU, memory, sockets, and bandwidth) in coordination, through a unifying abstraction, can be both simpler and more effective. We abstract resources along two dimensions of fungibility and renewability, to enable resource-agnostic algorithms to provide resource limits for a diverse set of applications. We demonstrate the power of our resource abstraction with a prototype resource control subsystem, Fence, which we implement for two sandbox environments running on a wide variety of operating systems (Windows, Linux, the BSDs, Mac OS X, iOS, Android, OLPC, and Nokia) and device types (servers, desktops, tablets, laptops, and smartphones). We use Fence to provide systemwide protection against resource hogging processes that include limiting battery drain, preventing overheating, and isolating performance. Even when there is interference, Fence can double the battery life and improve the responsiveness of other applications by an order of magnitude. Fence is publicly available and has been deployed in practice for five years, protecting tens of thousands of users.
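The renewable half of the fungible/renewable abstraction reduces to a token bucket: capacity renews at a fixed rate, and a consumer blocks once it outruns that rate. The sketch below is a generic token-bucket limiter under assumed units, not Fence's actual subsystem; non-renewable resources (memory, sockets) would instead use a simple allocate/release counter.

```python
import time

class RenewableLimit:
    """Token-bucket limit for a renewable resource, e.g. network bytes/sec."""

    def __init__(self, rate_per_sec):
        self.rate = rate_per_sec
        self.available = rate_per_sec
        self.last = time.monotonic()

    def consume(self, amount):
        # Renew capacity for the time elapsed since the last call.
        now = time.monotonic()
        self.available = min(self.rate,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if amount > self.available:
            # Block until the deficit has renewed, throttling the consumer.
            time.sleep((amount - self.available) / self.rate)
            self.last = time.monotonic()
            self.available = 0.0
        else:
            self.available -= amount
```

Because the same consume() interface works for any renewable resource, one resource-agnostic policy can throttle CPU, bandwidth, and battery-draining operations alike.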
A Fast Multi-Server, Multi-Block Private Information Retrieval Protocol
L. Wang, T. Kuppusamy, Y. Liu, and J. Cappos
IEEE GLOBECOM 2015 Conference (GLOBECOM 2015)
Private Information Retrieval (PIR) allows users to retrieve information from a database without revealing which information in the database was queried. The traditional information-theoretic PIR schemes utilize multiple servers to download a single data block, thus incurring high communication overhead and high computation burdens. In this paper, we develop an information-theoretic multi-block PIR scheme that significantly reduces client communication and computation overheads by downloading multiple data blocks at a time. The design of k-safe binary matrices ensures the information will not be revealed even if up to k servers collude. Our scheme has much lower overhead than classic PIR schemes. The implementation of fast XOR operations benefits both servers and clients in reducing coding and decoding time. Our work demonstrates that a multi-block PIR scheme can be optimized to simultaneously achieve low communication and computation overhead, comparable to even non-PIR systems, while maintaining a high level of privacy.
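For intuition, here is the classic XOR-based multi-server PIR that such schemes build on: the client sends each server a random bit vector, the vectors XOR to the basis vector of the wanted block, and XORing the servers' answers recovers the block while each individual query looks uniformly random. This is the textbook single-block Chor scheme, not the paper's k-safe binary matrix, multi-block construction.

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_queries(n_blocks, index, n_servers):
    """All but one vector are uniformly random; the last is their XOR plus
    the basis vector e_index, so together the vectors select block `index`."""
    vecs = [[secrets.randbits(1) for _ in range(n_blocks)]
            for _ in range(n_servers - 1)]
    last = [reduce(lambda x, y: x ^ y, col) for col in zip(*vecs)]
    last[index] ^= 1
    return vecs + [last]

def answer(db, vec):
    """Server side: XOR of the blocks selected by the query vector."""
    acc = bytes(len(db[0]))
    for bit, block in zip(vec, db):
        if bit:
            acc = xor_bytes(acc, block)
    return acc

def retrieve(replicas, index):
    """Client side: query each replica and XOR the answers together."""
    vecs = make_queries(len(replicas[0]), index, len(replicas))
    return reduce(xor_bytes, (answer(db, v) for db, v in zip(replicas, vecs)))
```

No single server learns anything about `index`, since each vector it sees is uniformly random on its own.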
Selectively Taming Background Android Apps to Improve Battery Lifetime
M. Martins, J. Cappos, and R. Fonseca
2015 USENIX Annual Technical Conference (USENIX ATC 15)
Background activities on mobile devices can cause significant battery drain with little visibility or recourse to the user. They can range from useful but sometimes overly aggressive tasks, such as polling for messages or updates from sensors and online services, to outright bugs that cause resources to be held unnecessarily. In this paper we instrument the Android OS to characterize background activities that prevent the device from sleeping. We present TAMER, an OS mechanism that interposes on events and signals that cause task wakeups, and allows for their detailed monitoring, filtering, and rate-limiting. We demonstrate how TAMER can help reduce battery drain in scenarios involving popular Android apps with background tasks. We also show how TAMER can mitigate the effects of well-known energy bugs while maintaining most of the apps' functionality. Finally, we elaborate on how developers and users can devise their own application-control policies for TAMER to maximize battery lifetime.
A First Look at Vehicle Data Collection via Smartphone Sensors
M. Reininger, S. Miller, Y. Zhuang, and J. Cappos
2015 IEEE Sensors Applications Symposium (SAS 2015)
Smartphones serve as a technical interface to the outside world. These devices have embedded, on-board sensors (such as accelerometers, WiFi, and GPSes) that can provide valuable information for investigating users' needs and behavioral patterns. Similarly, computers that are embedded in vehicles are capable of collecting valuable sensor data that can be accessed by smartphones through the use of On-Board Diagnostics (OBD) sensors. This paper describes a prototype of a mobile computing platform that provides access to vehicles' sensors by using smartphones and tablets, without compromising these devices' security. Data such as speed, engine RPM, fuel consumption, GPS locations, etc. are collected from moving vehicles by using a WiFi On-Board Diagnostics (OBD) sensor, and then backhauled to a remote server for both real-time and offline analysis. We describe the design and implementation details of our platform, for which we developed a library for in-vehicle sensor access and created a non-relational database for scalable backend data storage. We propose that our data collection and visualization tools are useful for analyzing driving behaviors; we also discuss future applications, security, and privacy concerns specific to vehicular networks.
Can the Security Mindset Make Students Better Testers?
S. Hooshangi, R. Weiss, and J. Cappos
Proceedings of the 46th ACM Technical Symposium on Computer Science Education (SIGCSE '15)
Writing secure code requires a programmer to think both as a defender and an attacker. One can draw a parallel between this model of thinking and techniques used in test-driven development, where students learn by thinking about how to effectively test their code and anticipate possible bugs. In this study, we analyzed the quality of both attack and defense code that students wrote for an assignment given in a 75-student introductory security class (both graduate and senior undergraduate levels) at NYU. We made several observations regarding students' behaviors and the quality of both their defensive and offensive code. We saw that student defensive programs (i.e., assignments) are highly unique and that their attack programs (i.e., test cases) are also relatively unique. In addition, we examined how student behaviors in writing defense programs correlated with their attack program's effectiveness. We found evidence that students who learn to write good defensive programs can write effective attack programs, but the converse is not true. While further exploration of causality is needed, our results indicate that a greater pedagogical emphasis on defensive security may benefit students more than one that emphasizes offense.
It's the Psychology Stupid: How Heuristics Explain Software Vulnerabilities and How Priming Can Illuminate Developer's Blind Spots
D. Oliveira, M. Rosenthal, N. Morin, K-C Yeh, J. Cappos, and Y. Zhuang
Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC '14)
Despite the security community's emphasis on the importance of building secure software, the number of new vulnerabilities found in our systems is increasing. In addition, vulnerabilities that have been studied for years are still commonly reported in vulnerability databases. This paper investigates a new hypothesis that software vulnerabilities are blind spots in developer's heuristic-based decision-making processes. Heuristics are simple computational models to solve problems without considering all the information available. They are an adaptive response to our short working memory because they require less cognitive effort. Our hypothesis is that as software vulnerabilities represent corner cases that exercise unusual information flows, they tend to be left out from the repertoire of heuristics used by developers during their programming tasks.
To validate this hypothesis we conducted a study with 47 developers using psychological manipulation. In this study each developer worked for approximately one hour on six vulnerable programming scenarios. The sessions progressed from providing no information about the possibility of vulnerabilities, to priming developers about unexpected results, and explicitly mentioning the existence of vulnerabilities in the code. The results show that (i) security is not a priority in software development environments, (ii) security is not part of developer's mindset while coding, (iii) developers assume common cases for their code, (iv) security thinking requires cognitive effort, (v) security education helps, but developers can have difficulties correlating a particular learned vulnerability or security information with their current working task, and (vi) priming or explicitly cueing about vulnerabilities on-the-spot is a powerful mechanism to make developers aware about potential vulnerabilities.
On the Internet, Nobody Knows You're a Dog: A Twitter Case Study of Anonymity in Social Networks
S.T. Peddinti, K.W. Ross, and J. Cappos
Proceedings of the Second ACM Conference on Online Social Networks (COSN '14)
Twitter does not impose a Real-Name policy for usernames, giving users the freedom to choose how they want to be identified. This results in some users being Identifiable (disclosing their full name) and some being Anonymous (disclosing neither their first nor last name).
In this work we perform a large-scale analysis of Twitter to study the prevalence and behavior of Anonymous and Identifiable users. We employ Amazon Mechanical Turk (AMT) to classify Twitter users as Highly Identifiable, Identifiable, Partially Anonymous, and Anonymous. We find that a significant fraction of accounts are Anonymous or Partially Anonymous, demonstrating the importance of Anonymity in Twitter. We then select several broad topic categories that are widely considered sensitive--including pornography, escort services, sexual orientation, religious and racial hatred, online drugs, and guns--and find that there is a correlation between content sensitivity and a user's choice to be anonymous. Finally, we find that Anonymous users are generally less inhibited to be active participants, as they tweet more, lurk less, follow more accounts, and are more willing to expose their activity to the general public. To our knowledge, this is the first paper to conduct a large-scale data-driven analysis of user anonymity in online social networks.
NetCheck: Network Diagnoses from Blackbox Traces
Y. Zhuang, E. Gessiou, S. Portzer, F. Fund, M. Muhammad, I. Beschastnikh, and J. Cappos
11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14)
This paper introduces NetCheck, a tool designed to diagnose network problems in large and complex applications. NetCheck relies on blackbox tracing mechanisms, such as strace, to automatically collect sequences of network system call invocations generated by the application hosts. NetCheck performs its diagnosis by (1) totally ordering the distributed set of input traces, and by (2) utilizing a network model to identify points in the totally ordered execution where the traces deviated from expected network semantics.
Our evaluation demonstrates that NetCheck is able to diagnose failures in popular and complex applications without relying on any application- or network-specific information. For instance, NetCheck correctly identified the existence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, NetCheck correctly detects over 95% of the network problems we found from bug trackers of projects like Python, Apache, and Ruby. When applied to traces of faults reproduced in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. Additionally, NetCheck is efficient and can process a GB-long trace in about 2 minutes.
BlurSense: Dynamic fine-grained access control for smartphone privacy
J. Cappos, L. Wang, R. Weiss, Y. Yang, and Y. Zhuang
IEEE Sensors Applications Symposium (SAS 2014)
For many people, smartphones serve as a technical interface to the modern world. These smart devices have embedded on-board sensors, such as accelerometers, gyroscopes, GPS sensors, and cameras, which can be used to develop new mobile applications. However, the sensors also pose privacy risks to users. This work describes BlurSense, a tool that provides secure and customizable access to all of the sensors on smartphones, tablets, and similar end user devices. The current access control to the smartphone resources, such as sensor data, is static and coarse-grained. BlurSense is a dynamic, fine-grained, flexible access control mechanism, acting as a line of defense that allows users to define and add privacy filters. As a result, the user can expose filtered sensor data to untrusted apps, and researchers can collect data in a way that safeguards users' privacy.
Teaching the Security Mindset with Reference Monitors
J. Cappos and R. Weiss
Proceedings of the 45th ACM Technical Symposium on Computer Science Education (SIGCSE '14)
One of the central skills in computer security is reasoning about how programs fail. As a result, computer security necessarily involves thinking about the corner cases that arise when software executes. An unfortunate side effect of this is that computer security assignments typically necessitate deep understanding of a topic, such as how the stack is laid out in memory or how web applications interact with databases. This work presents a series of assignments that require very little background knowledge from students, yet provide them with the ability to reason about failures in programs. In this set of assignments, students implement two very simple programs in a high-level language (Python). Students first implement a reference monitor that tries to uphold a security property within a sandbox. For the second portion, the students are provided each others' reference monitors and then write attack code to try to bypass the reference monitors. By leveraging a Python-based sandbox, student code is isolated cleanly, which simplifies development and grading. These assignments have been used in about a dozen classes in a range of environments, including a research university, online classes, and a four-year liberal arts school. Student and instructor feedback has been overwhelmingly positive. Furthermore, survey results demonstrate that after a 2-3 week module, 76% of the students who did not understand reference monitors and access control learned these key security concepts.
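The flavor of the assignment can be conveyed with a toy reference monitor that upholds a single property, here "data written with write_protected() can never be overwritten." The ProtectedFile class and its methods are illustrative inventions, not the actual Repy sandbox API.

```python
class ProtectedFile:
    """Interpose on a file-like object to enforce one security property."""

    def __init__(self, backing):
        self.backing = backing          # underlying file-like object
        self.protected = []             # list of (start, end) frozen ranges

    def write_protected(self, offset, data):
        self.protected.append((offset, offset + len(data)))
        self.backing.seek(offset)
        self.backing.write(data)

    def write(self, offset, data):
        end = offset + len(data)
        for lo, hi in self.protected:   # the security check: no overlap
            if offset < hi and end > lo:
                raise PermissionError("write overlaps a protected region")
        self.backing.seek(offset)
        self.backing.write(data)
```

Attackers (classmates) then probe corner cases, such as writes that straddle a protected boundary, to see whether the monitor's check can be bypassed.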
Avoiding Theoretical Optimality to Efficiently and Privately Retrieve Security Updates
J. Cappos
Financial Cryptography and Data Security - 17th International Conference, FC 2013, Revised Selected Papers
This work demonstrates the feasibility of building a PIR system with performance similar to non-PIR systems in real situations. Prior Chor PIR systems have chosen block sizes that are theoretically optimized to minimize communication. This (ironically) reduces the throughput of the resulting system by roughly 50x. We constructed a Chor PIR system called upPIR that is efficient by choosing block sizes that are theoretically suboptimal (from a communications standpoint), but fast and efficient in practice. For example, an upPIR mirror running on a three-year-old desktop provides security updates from Ubuntu 10.04 (1.4 GB of data) fast enough to saturate a T3 link. Measurements run using mirrors distributed around the Internet demonstrate that a client can download software updates with upPIR about as quickly as with FTP.
Survivable Key Compromise in Software Update Systems
J. Samuel, N. Mathewson, J. Cappos, and R. Dingledine
17th ACM Conference on Computer and Communications Security (CCS '10)
Today’s software update systems have little or no defense against key compromise. As a result, key compromises have put millions of software update clients at risk. Here we identify three classes of information whose authenticity and integrity are critical for secure software updates. Analyzing existing software update systems with our framework, we find their ability to communicate this information securely in the event of a key compromise to be weak or nonexistent. We also find that the security problems in current software update systems are compounded by inadequate trust revocation mechanisms. We identify core security principles that allow software update systems to survive key compromise. Using these ideas, we design and implement TUF, a software update framework that increases resilience to key compromise.
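One core idea that makes key compromise survivable is separating metadata into roles, each accepted only when a threshold of that role's keys has signed it, so a single stolen key cannot forge valid metadata on its own. A minimal threshold check, with an assumed metadata layout (this is not the TUF reference implementation's API):

```python
def role_metadata_valid(metadata, signatures, role, verify_sig):
    """role: {"keyids": set_of_authorized_keyids, "threshold": int}.

    Count only signatures from keys authorized for this role; metadata is
    valid when the count reaches the role's threshold.
    """
    valid_keys = {s["keyid"] for s in signatures
                  if s["keyid"] in role["keyids"] and verify_sig(metadata, s)}
    return len(valid_keys) >= role["threshold"]
```

With thresholds above one and role keys kept offline where possible, an attacker must compromise several keys simultaneously before clients will accept malicious updates.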
Retaining Sandbox Containment Despite Bugs in Privileged Memory-Safe Code
J. Cappos, A. Dadgar, J. Rasley, J. Samuel, I. Beschastnikh, C. Barsan, A. Krishnamurthy, and T. Anderson
17th ACM Conference on Computer and Communications Security (CCS '10)
Flaws in the standard libraries of secure sandboxes represent a major security threat to billions of devices worldwide. The standard libraries are hard to secure because they frequently need to perform low-level operations that are forbidden in untrusted application code. Existing designs have a single, large trusted computing base that contains security checks at the boundaries between trusted and untrusted code. Unfortunately, flaws in the standard library often allow an attacker to escape the security protections of the sandbox. In this work, we construct a Python-based sandbox that has a small, security-isolated kernel. Using a mechanism called a security layer, we migrate privileged functionality into memory-safe code on top of the sandbox kernel while retaining isolation. For example, significant portions of module import, file I/O, serialization, and network communication routines can be provided in security layers. By moving these routines out of the kernel, we prevent attackers from leveraging bugs in these routines to evade sandbox containment. We demonstrate the effectiveness of our approach by studying past bugs in Java’s standard libraries and show that most of these bugs would likely be contained in our sandbox.
Seattle: A Platform for Educational Cloud Computing
J. Cappos, I. Beschastnikh, A. Krishnamurthy, and T. Anderson
Proceedings of the 40th ACM Technical Symposium on Computer Science Education (SIGCSE '09)
Cloud computing is rapidly increasing in popularity. Companies such as RedHat, Microsoft, Amazon, Google, and IBM are increasingly funding cloud computing infrastructure and research, making it important for students to gain the necessary skills to work with cloud-based resources. This paper presents a free, educational research platform called Seattle that is community-driven, a common denominator for diverse platform types, and broadly deployed. Seattle is community-driven: universities donate available compute resources on multi-user machines to the platform. These donations can come from systems with a wide variety of operating systems and architectures, removing the need for a dedicated infrastructure. Seattle is also surprisingly flexible and supports a variety of pedagogical uses, because as a platform it represents a common denominator for cloud computing, grid computing, peer-to-peer networking, distributed systems, and networking. Seattle programs are portable. Students' code can run across different operating systems and architectures without change, while the Seattle programming language is expressive enough for experimentation at a fine-grained level. Our current deployment of Seattle consists of about one thousand computers that are distributed around the world. We invite the computer science education community to employ Seattle in their courses.
Workshop Papers
ABC: A Cryptocurrency-Focused Threat Modeling Framework
G. Almashaqbeh, A. Bishop, and J. Cappos
2nd Workshop on Cryptocurrencies and Blockchains for Distributed Systems (CryBlock '19)
Cryptocurrencies are an emerging economic force, but there are concerns about their security. This is due, in part, to complex collusion cases and new threat vectors that could be missed by conventional security assessment strategies. To address these issues, we propose ABC, an Asset-Based Cryptocurrency-focused threat modeling framework capable of identifying such risks. ABC’s key innovation is the use of collusion matrices. A collusion matrix forces a threat model to cover a large space of threat cases while simultaneously managing this process to prevent it from becoming overly complex. We demonstrate that ABC is effective by presenting real-world use cases and by conducting a user study. The user study showed that around 71% of those who used ABC were able to identify financial security threats, as compared to only 13% of participants who used the popular framework STRIDE.
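A collusion matrix can be pictured as a simple enumeration. The sketch below is illustrative (the actor and asset names are invented) and shows how each (asset, colluding-group) cell becomes a threat case the analyst must explicitly address or rule out:

```python
# Illustrative collusion matrix: actor and asset names are invented.
# Each (asset, group) cell is a threat case the analyst must either
# analyze or explicitly dismiss, forcing broad but managed coverage.
from itertools import combinations

actors = ["miner", "exchange", "wallet_provider", "end_user"]
assets = ["ledger_integrity", "user_funds"]

def collusion_cases(actors, max_group=2):
    """Single actors plus every colluding group up to max_group members."""
    cases = [(a,) for a in actors]
    for size in range(2, max_group + 1):
        cases.extend(combinations(actors, size))
    return cases

for asset in assets:
    for group in collusion_cases(actors):
        print(f"{asset}: threat case for {' + '.join(group)}")
```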
Sensibility Testbed: Automated IRB Policy Enforcement in Mobile Research Apps
Y. Zhuang, A. Rafetseder, Y. Hu, Y. Tian, and J. Cappos
Proceedings of the 19th International Workshop on Mobile Computing Systems (HotMobile '18)
Due to their omnipresence, mobile devices such as smartphones could be tremendously valuable to researchers. However, since research projects can extract data about device owners that could be personal or sensitive, there are substantial privacy concerns. Currently, the only regulation to protect user privacy for research projects is through Institutional Review Boards (IRBs) at researchers’ institutions. However, there is no guarantee that researchers will follow the IRB protocol. Even worse, researchers without security expertise might build apps that are vulnerable to attacks. In this work, we present a platform, Sensibility Testbed, for automated enforcement of the privacy policies set by IRBs. Our platform enforces such policies when a researcher runs code on mobile devices. The enforcement mechanism is a set of obfuscation layers in a secure sandbox that can be customized for any level of IRB compliance, and can be augmented by policies set by the device owner.
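One way to picture such an obfuscation layer is a wrapper that coarsens sensor output to the precision an IRB protocol permits. This sketch is hypothetical (the function names are invented, and a fixed tuple stands in for the device GPS):

```python
# Hypothetical obfuscation layer: researcher code only ever sees the
# blurred reading, at the precision the IRB protocol allows.
def make_blurred_location(get_location, decimal_places):
    """Wrap a raw location source so it returns coarsened coordinates."""
    def blurred():
        lat, lon = get_location()
        return round(lat, decimal_places), round(lon, decimal_places)
    return blurred

raw_gps = lambda: (40.694582, -73.986729)   # stand-in for the device sensor
irb_location = make_blurred_location(raw_gps, decimal_places=2)
print(irb_location())  # (40.69, -73.99): roughly neighborhood granularity
```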
Vulnerabilities as Blind Spots in Developer's Heuristic-Based Decision-Making Processes
J. Cappos, Y. Zhuang, D. Oliveira, N. Rosenthal, and K-C Yeh
Proceedings of the 2014 New Security Paradigms Workshop (NSPW '14)
The security community spares no effort in emphasizing security awareness and the importance of building secure software. However, the number of new vulnerabilities found in today's systems is still increasing. Furthermore, old and well-studied vulnerability types, such as buffer overflows and SQL injections, are still repeatedly reported in vulnerability databases. Historically, the common response has been to blame developers for their lack of security education. This paper discusses a new hypothesis to explain this problem by introducing a new security paradigm in which software vulnerabilities are viewed as developers' blind spots in their decision making. We argue that such a flawed mental process is heuristic-based, where humans solve problems without considering all the information available, much like taking shortcuts. This paper's thesis is that security thinking tends to be left out by developers during programming, as vulnerabilities usually exist in corner cases with unusual information flows. Leveraging this paradigm, this paper introduces a novel methodology for capturing and understanding security-related blind spots in Application Programming Interfaces (APIs). Finally, it discusses how this methodology can be applied to the design and implementation of the next generation of automated diagnosis tools.
Experience with Seattle: A Community Platform for Research and Education
Y. Zhuang, A. Rafetseder and J. Cappos
Second GENI Research and Educational Experiment Workshop
2013
Hands-on experience is a critical part of research and education. Today's distributed testbeds fulfill that need for many students studying networking, distributed systems, cloud computing, security, operating systems, and similar topics. In this work, we discuss one such testbed, Seattle. Seattle is an open research and educational testbed that utilizes computational resources provided by end users on their existing devices. Unlike most other platforms, resources are not dedicated to the platform, which allows a greater degree of network diversity and realism at the cost of programmability. Seattle is designed to preserve user security and to minimally impact application performance. We describe the architectural design of Seattle, and summarize our experiences with Seattle over the past few years as both researchers and educators. We have found that Seattle is very easy to adopt due to cross-platform support, and is also surprisingly easy for students to use. While there are programmability limitations, it is possible to construct complex applications integrated with real devices, networks, and users with Seattle as a core component. From an educational standpoint, Seattle has been shown not only to be useful as a teaching tool, but also to be successful in a variety of different systems classes at a variety of different types of schools. In our experience, when low-level programmability is not the main requirement, Seattle can supersede many existing testbeds for diverse educational and research tasks.
Sensorium: A Generic Sensor Framework
A. Rafetseder, F. Metzger, L. Pühringer, K. Tutschku, Y. Zhuang, and J. Cappos
2013
This contribution describes Sensorium, our framework for accessing sensor values on computing devices and making them available to other applications. At the same time, it allows users to control the exposure of privacy-related data. Our goal is to bring the sensing capabilities of modern devices to a broader range of researchers and experimenters via an open source framework. We also present a real application making use of Sensorium's capabilities: for our web service Open3GMap, we crowd-source radio reception quality measurements in 3G networks and combine the data into an open geo-information system.
Towards a Representative Testbed: Harnessing Volunteers for Network Research
M. Muhammad and J. Cappos
First GENI Research and Educational Experiment Workshop (GREE)
2012
The number of networked home systems has risen steadily over the past few years. As more systems are designed and deployed, an appropriate testbed is required to test them. Several existing systems, such as PlanetLab, provide a networking testbed that allows researchers and developers to test and measure various applications. However, in the long run such testbeds will be unable to keep up with the demands of many large-scale, modern peer-to-peer systems. We outline the various challenges and essentials of a networking testbed, and we present an alternative networking testbed that is driven by voluntarily contributed resources. We discuss the various advantages and disadvantages of the Seattle system, an open source peer-to-peer computing testbed that has the potential to meet these demands. The testbed is composed of sandboxed resources donated by volunteers. Seattle has been deployed for about three years and supports many researchers who are interested in a networking testbed. The testbed consists of over 4100 nodes and is constantly growing, with the aim of meeting the demands of networking testbeds as they arise.
Lind: Challenges Turning Virtual Composition into Reality
C. Matthews, J. Cappos, R. McGeer, S. Neville, and Y. Coady
Workshop on Free Composition (FREECO '11)
Security is a constant sore spot in application development. Applications now need structural support for better isolation and security on a domain-specific basis to stave off the multitude of modern security vulnerabilities. Currently, application developers rely upon cumbersome workarounds to address these issues. We propose the design and initial implementation details for Lind, a highly flexible composition infrastructure that can be well-integrated with modern application development processes and extends traditional mechanisms like virtualization and software fault isolation in a way that can be tailored to an application's needs. Lind does this by providing the structures and services needed to build a virtual component model. Since compositions of virtual components differ from current software systems, building and using virtual component models presents a new set of software engineering challenges in composition and system construction. As a possible solution to many modern security problems, it is important to understand how virtual component models can be evaluated, to further both users' understanding of them and future research in this area. This paper proposes a design and implementation strategy for components that run in isolation. An evaluation of the efficacy of this approach in terms of performance, isolation, security, and composition provides insight into the possible advantages and disadvantages of a virtual component model.
ET (Smart) Phone Home!
L. Collares, C. Matthews, J. Cappos, Y. Coady, and R. McGeer
Workshop on NExt-generation Applications of smarTphones (NEAT'11)
Most home users are not able to troubleshoot advanced network issues themselves. Spending hours on the phone with an ISP's customer representative is a common way to solve this problem. With the advent of mobile devices with both Wi-Fi and cellular radios, troubleshooters at the ISP have a new back-door into a malfunctioning residential network. However, placing full trust in an ISP is a poor choice for a home user. In this paper we present Extra Technician (ET), a system designed to provide ISPs and others with an environment to troubleshoot home networking in a remote, safe, and flexible manner.
Model-based Testing Without a Model: Assessing Portability in the Seattle Testbed
J. Cappos and J. Jacky
5th Workshop on Systems Software Verification (SSV'10)
2010
Despite widespread OS, network, and hardware heterogeneity, there has been a lack of research into quantifying and improving the portability of a programming environment. We have constructed a distributed testbed called Seattle built on a platform-independent programming API that is implemented on different operating systems and architectures. Our goal is to show that applications written to our API will be portable. In this work, we use an instrumented version of the programming environment for testing purposes. The instrumentation allows us to gather traces of actual program behavior from a running implementation. These traces can be used across different versions of the implementation exactly as if they were test cases generated offline from a model program, so we can commence testing using model-based testing tools without constructing a model program. Such offline testing is only effective in scenarios where traces are expected to be reproducible (deterministic). Where reproducibility is not expected, for instance due to nondeterminism in the network environment, we must resort to on-the-fly testing, which does require a model program. To validate this model program, we can use the recorded traces of actual behavior. Validating with captured traces should provide greater coverage than we could achieve by validating only with traces constructed a priori.
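The record-then-replay workflow described above might look roughly like the following sketch (names are illustrative; the paper's actual tooling instruments the Seattle API):

```python
# Illustrative record/replay: calls against an instrumented implementation
# are logged, then replayed against another version exactly as if they
# were test cases generated offline from a model program.
def record(impl, trace):
    def instrumented(name, *args):
        result = impl(name, *args)
        trace.append((name, args, result))
        return result
    return instrumented

def replay(impl, trace):
    """Fail when the implementation under test deviates from the recording."""
    for name, args, expected in trace:
        actual = impl(name, *args)
        assert actual == expected, f"{name}{args}: {actual!r} != {expected!r}"

# Toy deterministic "API", so offline replay is valid.
impl_v1 = lambda name, *a: (name, sum(a))
impl_v2 = lambda name, *a: (name, sum(a))
trace = []
api = record(impl_v1, trace)
api("add", 1, 2)
replay(impl_v2, trace)  # passes: v2 reproduces v1's recorded behavior
```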
Journal Articles, Magazines, Tech Reports, etc.
Identifying Program Confusion using Electroencephalogram Measurements
M.K-C. Yeh, Y. Yan, Y. Zhuang, L.A. DeLong
Behaviour & Information Technology
In this paper, we present an experimental study in which an electroencephalogram (EEG) device was used to measure cognitive load in programmers as they attempted to predict the output of C code snippets. Our goal was to see if particular patterns within the snippet induced higher levels of cognitive load, and if the collected EEG data might provide more detailed insights than performance measures. Our results suggest that while cognitive load can be an influence on code comprehension performance, other human factors, such as a tendency to forget certain programming rules or to misread what the code is asking them to do, may also play a role, particularly for novice programmers. We conclude that: (1) different types of code patterns can affect programmers’ cognitive processes in disparate ways, (2) neither self-reported data nor brainwave activity alone is a reliable indicator of programmers’ level of comprehension for all types of code snippets, (3) EEG techniques could be useful to better understand the relationships between program comprehension, code patterns and cognitive processes, and (4) tests like ours could be useful to identify crucial learning gaps in novice programmers, which, in turn, can be leveraged to improve programming tools and teaching strategies.
Towards Adding Verifiability to Web-based Git Repositories
H. Afzali, S. Torres-Arias, R. Curtmola, J. Cappos
The Journal of Computer Security
Web-based Git hosting services such as GitHub and GitLab are popular choices to manage and interact with Git repositories. However, they lack an important security feature — the ability to sign Git commits. Users instruct the server to perform repository operations on their behalf and have to trust that the server will execute their requests faithfully. Such trust may be unwarranted, though, because a malicious or compromised server may execute the requested actions incorrectly, leading to a different repository state than the user intended. In this paper, we show a range of high-impact attacks that can be executed stealthily when developers use the web UI of a Git hosting service to perform common actions such as editing files or merging branches. We then propose le-git-imate, a defense against these attacks, which enables users to protect their commits using Git’s standard commit signing mechanism. We implement le-git-imate as a Chrome browser extension. le-git-imate does not require changes on the server side and can thus be used immediately. It preserves current GitHub/GitLab workflows, does not require the user to leave the browser, and allows anyone to verify that the server’s actions faithfully follow the user’s requested actions. Moreover, experimental evaluation using the browser extension shows that le-git-imate has performance comparable to Git’s standard commit signature mechanism. With our solution in place, users can take advantage of GitHub/GitLab’s web-based features without sacrificing security, thus paving the way towards verifiable web-based Git repositories.
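The core check this relies on, recomputing the Git commit ID from the fields the user intended and comparing it against what the server created, can be sketched in a few lines. This is a simplification of the paper's approach, which additionally produces a standard signed commit in the browser:

```python
# Simplified sketch: recompute the Git commit ID from the fields the
# user intended, then compare it to the commit the server created. A
# mismatch reveals server tampering.
import hashlib

def git_commit_id(tree, parents, author, committer, message):
    body = f"tree {tree}\n"
    for parent in parents:
        body += f"parent {parent}\n"
    body += f"author {author}\ncommitter {committer}\n\n{message}"
    data = body.encode()
    # Git object IDs are SHA-1 over a "commit <size>\0" header plus body.
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()

expected = git_commit_id(
    tree="4b825dc642cb6eb9a060e54bf8d69288fbee4904",  # the empty tree
    parents=[],
    author="A Dev <a@example.com> 1700000000 +0000",
    committer="A Dev <a@example.com> 1700000000 +0000",
    message="Edit README via web UI\n",
)
print(expected)  # compare against the commit ID the hosting service reports
```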
Using a Dual-Layer Specification to Offer Selective Interoperability for Uptane
M. Moore, I. McDonald, A. Weimerskirch, S. Awwad, L. A. DeLong, J. Cappos
ESCAR USA 2020 Special Issue
This work introduces the concept of a dual-layer specification structure for standards that separates interoperability functions, such as backwards compatibility, localization, and deployment, from those essential to reliability, security, and functionality. The latter group of features, which constitute the actual standard, make up the baseline layer for instructions, while all the elements required for interoperability are specified in a second layer, known as a Protocols, Operations, Usage, and Formats (POUF) document. We applied this technique in the development of a standard for Uptane, a security framework for over-the-air software updates used in many automobiles. This standard is a good candidate for a dual-layer specification because it requires communication between entities, but does not require a specific format for this communication. By deferring wire protocols and other implementation details to POUFs, the creators of the Uptane Standard were able to focus on the basic procedures and operations needed to secure automotive updates. We demonstrate the effectiveness of this format by specifying a POUF for the Uptane Reference Implementation.
IEEE-ISTO 6100.1.0.0 Uptane Standard for Design and Implementation
Uptane Standards Group
2019
Uptane is a secure software update framework for ground vehicles. This document describes procedures to enable programmers for OEMs and suppliers to securely design and implement this framework in a manner that better protects connected units on ground vehicles. Integrating Uptane as outlined in the sections that follow can reduce the ability of attackers to compromise critical systems. It also assures a faster and easier recovery process should a compromise occur.
Tsumiki: A Meta-Platform for Building Your Own Testbed
J. Cappos, Y. Zhuang, A. Rafetseder, and I. Beschastnikh
Transactions on Parallel and Distributed Systems
2018
Network testbeds are essential research tools that have been responsible for valuable network measurements and major advances in distributed systems research. However, no single testbed can satisfy the requirements of every research project, prompting continual efforts to develop new testbeds. The common practice is to re-implement functionality anew for each testbed. This work introduces a set of ready-to-use software components and interfaces called Tsumiki to help researchers to rapidly prototype custom networked testbeds without substantial effort. We derive Tsumiki’s design using a set of component and interface design principles, and demonstrate that Tsumiki can be used to implement new, diverse, and useful testbeds. We detail a few such testbeds: a testbed composed of Android devices, a testbed that uses Docker for sandboxing, and a testbed that shares computation and storage resources among Facebook friends. A user study demonstrated that students with no prior experience with networked testbeds were able to use Tsumiki to create a testbed with new functionality and run an experiment on this testbed in under an hour. Furthermore, Tsumiki has been used in production in multiple testbeds, resulting in installations on tens of thousands of devices and use by thousands of researchers.
Uptane: Security and Customizability of Software Updates for Vehicles
T. Kuppusamy, L. DeLong, and J. Cappos
Vehicular Technology Magazine
March 2018
A widely accepted premise is that complex software frequently contains bugs that can be remotely exploited by attackers. When this software is on an electronic control unit (ECU) in a vehicle, exploitation of these bugs can have life or death consequences. Since software for vehicles is likely to proliferate and grow more complex in time, the number of exploitable vulnerabilities will increase. As a result, manufacturers are keenly aware of the need to quickly and efficiently deploy updates so that software vulnerabilities can be remedied as soon as possible. However, existing software-update security systems are not compromise resilient; if an attacker breaks into any portion of an automobile’s infrastructure, they could compromise numerous vehicles. The industry also needs to dynamically choose updates for vehicles based on fresh information, which forces manufacturers to use existing systems that sign updates with a key stored on the server. Attackers who compromise the repository can abuse this online key and cause malicious software to be installed on vehicles. In this article we discuss Uptane, the first, to our knowledge, compromise-resilient software update security system designed specifically for vehicles. It is designed to make obtaining all the pieces required to control a vehicle extremely difficult for attackers.
Securing Software Updates for Automotives Using Uptane
T. Kuppusamy, L. DeLong, and J. Cappos
;login:
Summer 2017
Does secrecy improve security or impede securing software updates? The automotive industry has traditionally relied upon proprietary strategies developed behind closed doors. However, experience in the software security community suggests that open development processes can find flaws before they can be exploited. We introduce Uptane, a secure system for updating software on automobiles that follows the open door strategy. It was jointly developed with the University of Michigan Transportation Research Institute (UMTRI), and the Southwest Research Institute (SWRI), with input from the automotive industry as well as government regulators. We are now looking for academics and security researchers to break our system before black-hat hackers do it in the real world—with possibly fatal consequences.
PEP 480—Surviving a Compromise of PyPI: The Maximum Security Model
T. Kuppusamy, V. Diaz, D. Stufft, and J. Cappos
2016
Proposed is an extension to PEP 458 that adds support for end-to-end signing and the maximum security model. End-to-end signing allows both PyPI and developers to sign for the distributions that are downloaded by clients. The minimum security model proposed by PEP 458 supports continuous delivery of distributions (because they are signed by online keys), but that model does not protect distributions in the event that PyPI is compromised. In the minimum security model, attackers may sign for malicious distributions by compromising the signing keys stored on PyPI infrastructure. The maximum security model, described in this PEP, retains the benefits of PEP 458 (e.g., immediate availability of distributions that are uploaded to PyPI), but additionally ensures that end-users are not at risk of installing forged software if PyPI is compromised.
This PEP discusses the changes made to PEP 458 but excludes its informational elements to primarily focus on the maximum security model. For example, an overview of The Update Framework or the basic mechanisms in PEP 458 are not covered here. The changes to PEP 458 include modifications to the snapshot process, key compromise analysis, auditing snapshots, and the steps that should be taken in the event of a PyPI compromise. The signing and key management process that PyPI MAY RECOMMEND is discussed but not strictly defined. How the release process should be implemented to manage keys and metadata is left to the implementors of the signing tools. That is, this PEP delineates the expected cryptographic key type and signature format included in metadata that MUST be uploaded by developers in order to support end-to-end verification of distributions.
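A minimal sketch of the end-to-end property follows (using the third-party `cryptography` package; key distribution and TUF metadata are omitted, and this is not PyPI's actual tooling). It shows why a compromised PyPI cannot forge a distribution for clients who verify the developer's offline-key signature:

```python
# Sketch of end-to-end signing with an offline developer key. Names and
# data are illustrative; requires the `cryptography` package.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Developer side (offline): sign the distribution's digest.
dev_key = Ed25519PrivateKey.generate()
dist = b"contents of example-1.0.tar.gz"
signature = dev_key.sign(hashlib.sha256(dist).digest())

# Client side: the developer's public key is trusted via TUF delegations,
# independently of any online key held on PyPI infrastructure.
try:
    dev_key.public_key().verify(signature, hashlib.sha256(dist).digest())
    print("distribution verified end-to-end")
except InvalidSignature:
    print("reject: distribution or metadata was forged")
```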
Tsumiki: A Meta-Platform for Building Your Own Testbed
J. Cappos, Y. Zhuang, A. Rafetseder, and I. Beschastnikh
2015
Privacy-Preserving Experimentation with Sensibility Testbed
Y. Zhuang, A. Rafetseder, J. Cappos
;login:
2015
PolyPasswordHasher: Improving Password Storage Security
S. Torres and J. Cappos
;login:
2014
PolyPasswordHasher: Protecting Passwords in the Event of a Password File Disclosure
J. Cappos and S. Torres-Arias
2014
Over the years, we have witnessed various password-hash database breaches that have affected small and large companies, with a diversity of users and budgets. The industry standard, salted hashing (and even key stretching), has proven to be insufficient protection against attackers who now have access to clusters of GPU-powered password crackers. Although there are various proposals for better securing password storage, most do not offer the same adoption model (software-only, server-side) as salted hashing, which may impede adoption. In this paper, we present PolyPasswordHasher, a software-only, server-side password storage mechanism that requires minimal additional work for the server, but exponentially increases the attacker's effort. PolyPasswordHasher uses a threshold cryptosystem to interrelate stored password data so that passwords cannot be individually cracked. Our analysis shows that PolyPasswordHasher is memory and storage efficient, hard to crack, and easy to implement. In many realistic scenarios, cracking a PolyPasswordHasher-enabled database would be infeasible even for an adversary with millions of computers.
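A toy version of the masking idea appears below, with a degree-1 Shamir polynomial (threshold 2) over a small prime field. The parameters are illustrative and, unlike the real scheme, the secret is kept in memory purely so the sketch is self-contained:

```python
# Toy PolyPasswordHasher-style masking: each account's salted hash is
# XORed with a Shamir share, so entries cannot be cracked one at a time
# without first recovering the shared secret.
import hashlib, os, secrets

P = 2**127 - 1  # prime modulus for a toy Shamir scheme

secret = secrets.randbelow(P)
coeff = secrets.randbelow(P)

def share(x):
    """Evaluate secret + coeff*x mod P at point x (x > 0)."""
    return (secret + coeff * x) % P

def protect(password, x):
    salt = os.urandom(16)
    h = int.from_bytes(hashlib.sha256(salt + password.encode()).digest()[:15], "big")
    return salt, h ^ share(x)   # the share masks the salted hash

def verify(password, salt, masked, x):
    h = int.from_bytes(hashlib.sha256(salt + password.encode()).digest()[:15], "big")
    return masked ^ h == share(x)

salt, masked = protect("hunter2", x=1)
assert verify("hunter2", salt, masked, x=1)
assert not verify("wrong", salt, masked, x=1)
```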
ToMaTo: A Virtual Research Environment for Large Scale Distributed Systems Research
P. Mueller, D. Schwerdel, and J. Cappos
PIK
2014
Networks and distributed systems are an important field of research. To enable experimental research in this field, we propose a new tool, ToMaTo (Topology Management Tool), which was developed to support research projects within the BMBF-funded project G-Lab. It is designed to support researchers from various branches of science who investigate distributed systems by providing a virtual environment for their research. Using various virtualization technologies, ToMaTo is able to provide realistic components that can run real-world software, as well as lightweight components that can be used to analyze algorithms at large scale. This paper describes how an additional virtualization technology from the Seattle testbed has been added to ToMaTo to allow even larger experiments with distributed algorithms. Moreover, the paper describes some concrete experiments that are carried out with ToMaTo.
PEP 458—Surviving a Compromise of PyPI
T. Kuppusamy, V. Diaz, D. Stufft, and J. Cappos
2013
This PEP proposes how the Python Package Index (PyPI [1]) should be integrated with The Update Framework [2] (TUF). TUF was designed to be a flexible security add-on to a software updater or package manager. The framework integrates best security practices such as separating role responsibilities, adopting the many-man rule for signing packages, keeping signing keys offline, and revocation of expired or compromised signing keys. For example, attackers would have to steal multiple signing keys stored independently to compromise a role responsible for specifying a repository's available files. Another role responsible for indicating the latest snapshot of the repository may have to be similarly compromised, and independent of the first compromised role.
The proposed integration will allow modern package managers such as pip [3] to be more secure against various types of security attacks on PyPI and protect users from such attacks. Specifically, this PEP describes how PyPI processes should be adapted to generate and incorporate TUF metadata (i.e., the minimum security model). The minimum security model supports verification of PyPI distributions that are signed with keys stored on PyPI: distributions uploaded by developers are signed by PyPI, require no action from developers (other than uploading the distribution), and are immediately available for download. The minimum security model also minimizes PyPI administrative responsibilities by automating much of the signing process.
This PEP does not prescribe how package managers such as pip should be adapted to install or update projects from PyPI with TUF metadata. Package managers interested in adopting TUF on the client side may consult TUF's library documentation [27], which exists for this purpose. Support for project distributions that are signed by developers (maximum security model) is also not discussed in this PEP, but is outlined in the appendix as a possible future extension and covered in detail in PEP 480 [26]. The PEP 480 extension focuses on the maximum security model, which requires more PyPI administrative work (none by clients), but it also proposes an easy-to-use key management solution for developers, how to interface with a potential future build farm on PyPI infrastructure, and discusses the feasibility of end-to-end signing.
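One concrete piece of the minimum security model is the snapshot role, which pins the exact contents of the other metadata files so a compromised mirror cannot mix stale and fresh metadata. A simplified illustration follows (the structure is invented for brevity and is not PEP 458's actual metadata format):

```python
# Simplified illustration of a TUF-style snapshot check: the snapshot
# pins the hash of every other metadata file and carries an expiration
# date, defeating mix-and-match and freeze attacks.
import hashlib, time

targets_json = b'{"example-1.0.tar.gz": {"sha256": "..."}}'
snapshot = {
    "expires": time.time() + 86400,  # one day
    "meta": {"targets.json": hashlib.sha256(targets_json).hexdigest()},
}

def check(snapshot, name, content):
    if time.time() > snapshot["expires"]:
        raise ValueError("snapshot expired: possible freeze attack")
    if hashlib.sha256(content).hexdigest() != snapshot["meta"][name]:
        raise ValueError(f"{name} mismatch: possible mix-and-match attack")

check(snapshot, "targets.json", targets_json)  # passes when consistent
```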
Future Internet Bandwidth Trends: An Investigation on Current and Future Disruptive Technologies
Y. Zhuang, J. Cappos, T.S. Rappaport, and R. McGeer
2013
New technologies sometimes result in disruptive changes to the existing infrastructure. Without adequate foresight, industry, academia, and government can be caught flat-footed. In this work, we focus on the trends surrounding home Internet bandwidth - the bandwidth required by end user applications at home. As building and managing last mile network infrastructure incurs substantial cost, the foresight of such trends is necessary to plan upgrades. Using a bottom-up approach, we look at four potentially disruptive technologies, including millimeter wave wireless (mm-wave), the Internet of Things (IoT), Fog Computing, and Software Defined Networking (SDN). We examine use cases proposed by academia and industry, delve into the bandwidth requirements for proposed applications, and use this data to forecast future traffic demands for typical home users. Our projections show that bandwidth changes at end user devices will most likely be driven by two of the above technologies: millimeter wave wireless and Fog Computing. These technologies not only change the peak bandwidth, but also have noticeable secondary effects on bandwidth such as increasing upload bandwidth use, improving flash crowd tolerance, and increasing off-peak demand. While IoT and SDN are important, innovative technologies, they will not drastically alter the bandwidth usage patterns of ordinary users at home. We hope that the data and recommendations from this study can help business leaders and policy makers get an early jump on emerging technologies before they begin to shape the economy and society.
NetCheck Test Cases: Input Traces and NetCheck Output
J. Cappos, Y. Zhuang, and I. Beschastnikh
2013
Application failures due to network issues are some of the most difficult to diagnose and debug. This is because the failure may be due to in-network state or state maintained by a remote end-host, both of which are invisible to an application host. For instance, data may be dropped due to MTU issues [18], NAT devices and firewalls introduce problems due to address changes and connection blocking [11], default IPv6 options can cause IPv4 applications to fail [3], and default buffer size settings can cause UDP datagrams to be dropped or truncated [37]. This report presents the results of our work NetCheck [38]. In contrast with most prior approaches, NetCheck does not require application- or network-specific knowledge to perform its diagnoses, and no modification to the application or the infrastructure is necessary. NetCheck treats an application as a blackbox and requires just a set of system call (syscall) invocation traces from the relevant end-hosts. These traces can be easily collected at runtime with standard blackbox tracing tools, such as strace. To perform its diagnosis, NetCheck derives a global ordering of the input syscalls by simulating the syscalls against a network model. The model is also used to identify those syscalls that deviate from expected network semantics. These deviations are then mapped to a diagnosis by using a set of heuristics.
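A toy version of this simulate-and-flag approach is sketched below (the trace format and heuristics are invented for illustration, not NetCheck's actual model):

```python
# Toy trace-driven diagnosis: replay per-host syscall records against a
# tiny network model and flag calls that deviate from expected socket
# semantics.
def diagnose(trace):
    """trace: list of (host, syscall, fd, result). Returns flagged calls."""
    connected, flagged = set(), []
    for host, call, fd, result in trace:
        sock = (host, fd)
        if call == "connect" and result == 0:
            connected.add(sock)
        elif call == "send" and sock not in connected:
            flagged.append((host, call, fd, "send on unconnected socket"))
        elif call == "recv" and result == 0 and sock in connected:
            flagged.append((host, call, fd, "peer closed the connection"))
    return flagged

trace = [
    ("client", "connect", 3, 0),
    ("client", "send", 4, 5),   # fd 4 was never connected -> flagged
    ("client", "recv", 3, 0),   # zero-byte recv: peer closed -> flagged
]
for item in diagnose(trace):
    print("diagnosis:", item)
```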
Understanding Password Database Compromises
D. Mirante and J. Cappos
2013
Despite continuing advances in cyber security, website incursions, in which password databases are compromised, occur for high profile sites dozens of times each year. Dumps of recently stolen credentials appear on a regular basis at websites like pastebin.com and pastie.com, as do stories concerning significant breaches. As a result of these observations, we chose to examine this phenomenon. A study was undertaken to research information posted on the web concerning recent, high profile website intrusions, wherein user login credentials and other data were compromised. We searched for the party responsible for the incursion, the attack mechanism utilized, the format in which the login data was stored, and the location of any password dumps pilfered from the site. News stories from trade related journals, press releases from the victim company, hacker sites, and blogs from individuals and companies engaged in security analysis were, in particular, searched in order to find related information. A total of thirty-four breaches were researched. It should be noted that some dumps, previously published, no longer exist. This is due to either the affected parties taking action against the site posting them, expiration of the allowed posting period, or removal by the original poster. An effort was made to locate copies of these files, sometimes to no avail. In those cases, details concerning the contents of the dumps were collected from published reports about them.
Hands-on Internet with Seattle and Computers from Across the Globe
S.A. Wallace, M. Muhammad, J. Mache, and J. Cappos
Journal of Computing Sciences in Colleges
The Internet Connectivity module is a short assignment covering distributed computing and networking. The Internet Connectivity module is part of the curriculum created for the Northwest Distributed Computer Science Department and is built upon the Seattle distributed computing platform. In this paper, we describe the module and illustrate how Seattle facilitates networking projects and experiments that use computers/resources from across the globe. In addition, we describe how the Internet Connectivity module was used in two courses, provide some comments on students' reactions to the project, and conclude with suggestions for faculty considering how to use this module in their future courses.