Matches in SemOpenAlex for { <https://semopenalex.org/work/W1727336311> ?p ?o ?g. }
Showing items 1 to 56 of 56, with 100 items per page.
- W1727336311 abstract "Software that processes rich content suffers from endemic security vulnerabilities. Frequently, these bugs are due to data confusion: discrepancies in how content data is parsed, composed, and otherwise processed by different applications, frameworks, and language runtimes. Data confusion often enables code injection attacks, such as cross-site scripting or SQL injection, by leading to incorrect assumptions about the encodings and checks applied to rich content of uncertain provenance. However, even for well-structured, value-only content, data confusion can critically impact security, e.g., as shown by XML signature vulnerabilities [12]. This paper advocates the position that data confusion can be effectively prevented through the use of simple mechanisms—based on parsing—that eliminate ambiguities by fully resolving content data to normalized, clearly-understood forms. Using code injection on the Web as our motivation, we make the case that automatic defense mechanisms should be integrated with programming languages, application frameworks, and runtime libraries, and applied with little, or no, developer intervention. We outline a scalable, sustainable approach for developing and maintaining those mechanisms. The resulting tools can offer comprehensive protection against data confusion, even when multiple types of rich content data are processed and composed in complex ways. 1 Data Confusion and Why Parsing Helps A persistent source of security issues is data confusion: vulnerabilities caused by inconsistencies between different software in the parsing, composition, and overall processing of rich content. Data confusion has already led to large-scale exploits such as rapidly-spreading Web application worms [18], and its risk is increasing, with the growth of distributed and cloud computing. 
Examples of data confusion have arisen in the handling of nested HTML tags [8], apostrophes in SQL statements [19], signature scopes in XML protocol messages [12], and encoded length fields in binary data [9]. Data confusion cannot be eliminated simply by training software developers or by exhorting them to be more careful. For general-purpose software, data is usually of uncertain provenance and, locally, it is usually hard to tell what data can be trusted, what data properties have been checked, and what assumptions about data are made elsewhere. Even if all software for processing rich content was written with the utmost care—and developers had the right incentives, know-how, and resources—discrepancies between different developers’ decisions would still be sure to introduce vulnerabilities. On the other hand, to avoid data confusion, it is often sufficient to simply normalize the content data by parsing and re-serializing the data. Normalization has been previously used by security mechanisms, e.g., to eliminate TCP fragmentation ambiguities [22] and to build deterministic HTML parse trees [21]. It benefits security by resolving ambiguities, by simplifying the data encoding (e.g., via conversion), and by eliding deprecated aspects or unnecessary functionality from the content. For example, to display raster images, only a single (compressed) encoding and color space (e.g., sRGB) is strictly necessary. Thus, by normalizing to a single form of bitmap data, most of the attack surface due to the variety of image formats (and all of their myriad encodings and options) can be eliminated. Notably, such normalization can benefit even the security of legacy software: eliminating esoteric options and encodings will prevent most known JPEG and PNG exploits (e.g., [1, 9]). Clearly, automatic mechanisms based on trustworthy parsing can prevent many types of data confusion by reducing the attack surface due to the divergent assumptions of different software. 
Centralized, trustworthy parsing can be helpful in other ways, as well. For example, such parsing could support large-scale collection of statistics about content data that would help identify corner cases and rarely-used features—both a common source of vulnerabilities. Also, such processing could ensure that content data met the required constraints of certain, preferred software—such as that deemed to be standard, or most secure—and thereby eliminate further sources of data confusion, such as those underlying recently-discovered attacks on antivirus scanners [11]. Centralized normalization could even improve performance, and eliminate redundant work, by serializing content data to a new unambiguous, highly-efficient structured format (e.g., based on Google’s Protocol Buffers [7]), instead of back to the original data format. In the context of the Web, Michal Zalewski of Google has pointed out many ancillary benefits of similar new formats, such as reduced latency of loading Web pages. 2 Towards Comprehensive Defenses Unfortunately, to overcome endemic data confusion, simple centralized mechanisms are not sufficient. Rich content may be composed and processed on both clients and servers and typically embeds some form of executable code—and that code often encodes complex predicates and content introspection that prevents static reasoning about behavior. During such processing, data confusion can easily result in code injection vulnerabilities, where attacker-controlled characters are included as part of executed expressions, in unexpected contexts [14]. Therefore, it is not surprising that, for many years, the most commonly-reported security vulnerabilities have been SQL-injection and Cross-Site Scripting (XSS) in Web applications [2, 6]. 
The remainder of this position paper uses the context of Web applications to outline a sustainable approach for developing comprehensive protections against data confusion—even when multiple types of rich content data are processed and composed in complex ways. Those defenses are based on the close integration of automated mechanisms for content data normalization, sanitization, and templating, as well as execution sandboxing, into client and server Web programming languages. For scalability, we describe how those mechanisms can be based on annotated parse-tree grammars developed independently of any language, platform, or application. 2.1 The Case of HTML and the Web Web application developers are forced, by necessity, to restrict their attention to functionality that works reliably cross-platform (e.g., “JavaScript: The Good Parts” [4]). On the other hand, attackers can make full use of all the Web’s bad parts: its corner cases, esoteric platform-specific features, and poorly-thought-out functionality. (Figure 1: Techniques for securely handling Web content data, across different processing contexts and input data; its axes span untrusted data vs. untrusted code, and untrusted vs. trusted contexts, covering lowering, sanitization, safe templating, and sandboxing.) Also, the recent fast-paced experimentation with new features, languages, and application frameworks for the Web and for cloud computing forces defenders to consider an impossible menagerie of technologies: ASP.NET, CoffeeScript, Ruby on Rails, Django, jQuery, JSF, Dart, and Go—to name but a handful. Security-savvy Web developers must know how to (manually) employ a range of ad hoc tools for securely composing content strings from untrusted and trusted sources. In particular, consistent use of tools like SQL prepared statements or auto-escaped HTML templates in Web application frameworks can greatly reduce susceptibility to data confusion [5]. 
More principled, safe-by-construction mechanisms (such as those in [19, 20]) have seen little adoption, since they have required extensive modification of the Web application source code as well as substantial programmer retraining. These existing tools fall on two axes, as depicted in Figure 1. The first axis is determined by the initial runtime processing of attacker-controlled inputs: untrusted data will be encoded into strings, whereas untrusted code will be passed to a language interpreter. untrusted = x; // is javascript:... ? location = untrusted + ‘?foo=bar’; For example, the above code fragment composes untrusted data with a trusted literal, ‘?foo=bar’, to form a location URL. Here, the application developer may have failed to check that the untrusted data encodes a URL domain path, thereby enabling an attack. By contrast, untrusted code may exercise more authority than the Web application developer intends. Dear Sir, <script>IPwnYou()</script> For example, a Web mail client would be wise to remove the “<script>” from the above HTML email body. The second axis depends on whether the Web application itself is trustworthy. Untrusted contexts of processing allow attacker-controlled input to fully determine the rendering of content data. However, more often Web servers or client browsers may process attacker-controlled inputs in a trusted context—e.g., to insert untrusted data or code into holes in templates trusted by the Web application. For example, in PHP, a Web application might use a template “<b>$untrusted</b>” with trusted HTML tokens “<b>” and “</b>”." @default.
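The parse-and-re-serialize normalization the abstract advocates can be illustrated for HTML using only Python's standard library. This is a minimal, hypothetical sketch (the `Normalizer` class and its tag whitelist are not from the paper): content is fully parsed, unknown tags and all attributes are dropped, script bodies are discarded, and text is re-escaped on output, so the result is a normalized, clearly-understood form.

```python
from html.parser import HTMLParser
from html import escape

class Normalizer(HTMLParser):
    """Parse rich HTML content and re-serialize only well-understood parts."""
    ALLOWED = {"b", "i", "p", "a"}  # illustrative whitelist

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.ALLOWED:
            self.out.append(f"<{tag}>")  # re-emit tag, drop all attributes

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in self.ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))  # text is re-escaped on output

def normalize(html_text: str) -> str:
    n = Normalizer()
    n.feed(html_text)
    n.close()
    return "".join(n.out)
```

Because the output is produced by re-serialization rather than by filtering the input string, esoteric encodings and malformed markup in the input cannot survive into the normalized form.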
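The SQL prepared statements that the abstract credits with reducing data confusion can be demonstrated with Python's built-in `sqlite3` module (the table and payload below are illustrative, not from the paper). String composition lets an apostrophe in attacker data be parsed as SQL syntax; a bound parameter stays data because it is attached after the statement is parsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

untrusted = "x' OR '1'='1"  # classic injection payload

# Vulnerable: the payload's apostrophes become SQL syntax, so the
# WHERE clause is true for every row.
rows_bad = conn.execute(
    "SELECT secret FROM users WHERE name = '%s'" % untrusted).fetchall()

# Safe: the parameter is bound after parsing, so it remains a literal
# string and matches no user.
rows_good = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (untrusted,)).fetchall()
```

The same data produces leaked rows in the first query and an empty result in the second, which is exactly the discrepancy-elimination the paper argues for.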
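The abstract's location-URL fragment shows untrusted data being composed into a URL without checking what it encodes. A hedged sketch of the missing check, using Python's `urllib.parse` (the `compose_location` helper is hypothetical and deliberately simplified; real browsers also tolerate whitespace tricks that this does not cover): the untrusted input is parsed first, and any scheme that could execute code is rejected before composition.

```python
from urllib.parse import urlsplit

def compose_location(untrusted: str) -> str:
    """Parse before composing: reject code-bearing URL schemes."""
    scheme = urlsplit(untrusted).scheme.lower()
    if scheme not in ("", "http", "https"):
        raise ValueError(f"refusing unsafe scheme {scheme!r}")
    # Mirrors the abstract's fragment: append a trusted literal query.
    return untrusted + "?foo=bar"
```

A `javascript:` payload is refused at parse time instead of reaching the interpreter as part of a location assignment.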
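The auto-escaped HTML templates mentioned in the abstract can be sketched in a few lines of Python (the `render` helper is an illustrative stand-in for a framework's templating engine, not the paper's mechanism): every untrusted value is escaped before it is substituted into a hole, so trusted template tokens stay markup while attacker input stays text.

```python
from html import escape

def render(template: str, **holes) -> str:
    """Fill template holes, auto-escaping every untrusted value."""
    return template.format(**{k: escape(str(v)) for k, v in holes.items()})

page = render("<b>{untrusted}</b>", untrusted="<script>IPwnYou()</script>")
```

The developer never calls the escaping function by hand, which is what makes this kind of defense apply with little or no developer intervention.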
- W1727336311 created "2016-06-24" @default.
- W1727336311 creator A5024723282 @default.
- W1727336311 creator A5048020070 @default.
- W1727336311 date "2012-01-01" @default.
- W1727336311 modified "2023-09-27" @default.
- W1727336311 title "Let's Parse to Prevent Pwnage" @default.
- W1727336311 cites W125279808 @default.
- W1727336311 cites W1432803199 @default.
- W1727336311 cites W1758438229 @default.
- W1727336311 cites W2013578787 @default.
- W1727336311 cites W2131261404 @default.
- W1727336311 cites W2139339270 @default.
- W1727336311 cites W2144696387 @default.
- W1727336311 cites W2152725427 @default.
- W1727336311 cites W2166509025 @default.
- W1727336311 hasPublicationYear "2012" @default.
- W1727336311 type Work @default.
- W1727336311 sameAs 1727336311 @default.
- W1727336311 citedByCount "0" @default.
- W1727336311 crossrefType "journal-article" @default.
- W1727336311 hasAuthorship W1727336311A5024723282 @default.
- W1727336311 hasAuthorship W1727336311A5048020070 @default.
- W1727336311 hasConcept C154945302 @default.
- W1727336311 hasConcept C186644900 @default.
- W1727336311 hasConcept C41008148 @default.
- W1727336311 hasConceptScore W1727336311C154945302 @default.
- W1727336311 hasConceptScore W1727336311C186644900 @default.
- W1727336311 hasConceptScore W1727336311C41008148 @default.
- W1727336311 hasLocation W17273363111 @default.
- W1727336311 hasOpenAccess W1727336311 @default.
- W1727336311 hasPrimaryLocation W17273363111 @default.
- W1727336311 hasRelatedWork W169783076 @default.
- W1727336311 hasRelatedWork W1874214737 @default.
- W1727336311 hasRelatedWork W2057887498 @default.
- W1727336311 hasRelatedWork W2058238996 @default.
- W1727336311 hasRelatedWork W2141043775 @default.
- W1727336311 hasRelatedWork W2252142759 @default.
- W1727336311 hasRelatedWork W2284691406 @default.
- W1727336311 hasRelatedWork W2517321897 @default.
- W1727336311 hasRelatedWork W2585631935 @default.
- W1727336311 hasRelatedWork W2765113989 @default.
- W1727336311 hasRelatedWork W2805738415 @default.
- W1727336311 hasRelatedWork W2899609939 @default.
- W1727336311 hasRelatedWork W2900146925 @default.
- W1727336311 hasRelatedWork W2956125792 @default.
- W1727336311 hasRelatedWork W2978239850 @default.
- W1727336311 hasRelatedWork W2978688163 @default.
- W1727336311 hasRelatedWork W301701231 @default.
- W1727336311 hasRelatedWork W826444738 @default.
- W1727336311 hasRelatedWork W830338198 @default.
- W1727336311 hasRelatedWork W2181733359 @default.
- W1727336311 isParatext "false" @default.
- W1727336311 isRetracted "false" @default.
- W1727336311 magId "1727336311" @default.
- W1727336311 workType "article" @default.