Matches in SemOpenAlex for { <https://semopenalex.org/work/W1523641353> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W1523641353 abstract "The number of PDF files with embedded malicious code has risen significantly in the past few years. This is due to the portability of the file format, the ways Adobe Reader recovers from corrupt PDF files, the addition of many multimedia and scripting extensions to the file format, and many format properties the malware author may use to disguise the presence of malware. Current research focuses on executable, MS Office, and HTML formats. In this paper, several features and properties of PDF Files are identified. Features are extracted using an instrumented open source PDF viewer. The feature descriptions of benign and malicious PDFs can be used to construct a machine learning model for detecting possible malware in future PDF files. The detection rate of PDF malware by current antivirus software is very low. A PDF file is easy to edit and manipulate because it is a text format, providing a low barrier to malware authors. Analyzing PDF files for malware is nonetheless difficult because of (a) the complexity of the formatting language, (b) the parsing idiosyncrasies in Adobe Reader, and (c) undocumented correction techniques employed in Adobe Reader. In May 2011, Esparza demonstrated that PDF malware could be hidden from 42 of 43 antivirus packages by combining multiple obfuscation techniques [4]. One reason current antivirus software fails is the ease of varying byte sequences in PDF malware, thereby rendering conventional signature-based virus detection useless. The compression and encryption functions produce sequences of bytes that are each functions of multiple input bytes. As a result, padding the malware payload with some whitespace before compression/encryption can change many of the bytes in the final payload. In this study we analyzed a corpus of 2591 benign and 87 malicious PDF files. While this corpus is admittedly small, it allowed us to test a system for collecting indicators of embedded PDF malware. 
We will call these indicators features throughout the rest of this report. The features are extracted using an instrumented PDF viewer, and are the inputs to a prediction model that scores the likelihood of a PDF file containing malware. The prediction model is constructed from a sample of labeled data by a machine learning algorithm (specifically, decision tree ensemble learning). Preliminary experiments show that the model is able to detect half of the PDF malware in the corpus with zero false alarms. We conclude the report with suggestions for extending this work to detect a greater variety of PDF malware." @default.
- W1523641353 created "2016-06-24" @default.
- W1523641353 creator A5008090738 @default.
- W1523641353 date "2011-09-01" @default.
- W1523641353 modified "2023-10-17" @default.
- W1523641353 title "Deep PDF parsing to extract features for detecting embedded malware." @default.
- W1523641353 cites W2107542581 @default.
- W1523641353 cites W2167277498 @default.
- W1523641353 cites W2912934387 @default.
- W1523641353 doi "https://doi.org/10.2172/1030303" @default.
- W1523641353 hasPublicationYear "2011" @default.
- W1523641353 type Work @default.
- W1523641353 sameAs 1523641353 @default.
- W1523641353 citedByCount "7" @default.
- W1523641353 countsByYear W15236413532014 @default.
- W1523641353 countsByYear W15236413532016 @default.
- W1523641353 countsByYear W15236413532019 @default.
- W1523641353 countsByYear W15236413532020 @default.
- W1523641353 countsByYear W15236413532023 @default.
- W1523641353 crossrefType "report" @default.
- W1523641353 hasAuthorship W1523641353A5008090738 @default.
- W1523641353 hasBestOaLocation W15236413532 @default.
- W1523641353 hasConcept C111919701 @default.
- W1523641353 hasConcept C154945302 @default.
- W1523641353 hasConcept C186644900 @default.
- W1523641353 hasConcept C199360897 @default.
- W1523641353 hasConcept C2777904410 @default.
- W1523641353 hasConcept C38652104 @default.
- W1523641353 hasConcept C40305131 @default.
- W1523641353 hasConcept C41008148 @default.
- W1523641353 hasConcept C43364308 @default.
- W1523641353 hasConcept C541664917 @default.
- W1523641353 hasConcept C544833334 @default.
- W1523641353 hasConcept C84525096 @default.
- W1523641353 hasConcept C97250363 @default.
- W1523641353 hasConceptScore W1523641353C111919701 @default.
- W1523641353 hasConceptScore W1523641353C154945302 @default.
- W1523641353 hasConceptScore W1523641353C186644900 @default.
- W1523641353 hasConceptScore W1523641353C199360897 @default.
- W1523641353 hasConceptScore W1523641353C2777904410 @default.
- W1523641353 hasConceptScore W1523641353C38652104 @default.
- W1523641353 hasConceptScore W1523641353C40305131 @default.
- W1523641353 hasConceptScore W1523641353C41008148 @default.
- W1523641353 hasConceptScore W1523641353C43364308 @default.
- W1523641353 hasConceptScore W1523641353C541664917 @default.
- W1523641353 hasConceptScore W1523641353C544833334 @default.
- W1523641353 hasConceptScore W1523641353C84525096 @default.
- W1523641353 hasConceptScore W1523641353C97250363 @default.
- W1523641353 hasLocation W15236413531 @default.
- W1523641353 hasLocation W15236413532 @default.
- W1523641353 hasLocation W15236413533 @default.
- W1523641353 hasOpenAccess W1523641353 @default.
- W1523641353 hasPrimaryLocation W15236413531 @default.
- W1523641353 hasRelatedWork W1591058456 @default.
- W1523641353 hasRelatedWork W2051912542 @default.
- W1523641353 hasRelatedWork W2116761843 @default.
- W1523641353 hasRelatedWork W2171726649 @default.
- W1523641353 hasRelatedWork W2743459917 @default.
- W1523641353 hasRelatedWork W3087706721 @default.
- W1523641353 hasRelatedWork W3102852402 @default.
- W1523641353 hasRelatedWork W37608242 @default.
- W1523641353 hasRelatedWork W4286587341 @default.
- W1523641353 hasRelatedWork W4287664162 @default.
- W1523641353 isParatext "false" @default.
- W1523641353 isRetracted "false" @default.
- W1523641353 magId "1523641353" @default.
- W1523641353 workType "report" @default.
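
The abstract above notes that padding a malware payload with whitespace before compression can change many bytes of the final stream, which is why fixed byte signatures fail. A minimal sketch of that effect follows, assuming zlib's deflate stands in for a PDF stream filter; the payload string and the helper names `compress_payload` and `differing_positions` are hypothetical and are not taken from the report.

```python
import zlib


def compress_payload(payload: bytes, pad: int = 0) -> bytes:
    """Deflate the payload after prepending `pad` space characters."""
    return zlib.compress(b" " * pad + payload)


def differing_positions(a: bytes, b: bytes) -> int:
    """Count byte positions that differ, treating any length gap as differences."""
    mismatched = sum(x != y for x, y in zip(a, b))
    return mismatched + abs(len(a) - len(b))


# Hypothetical stand-in for an embedded script; not real malware.
payload = b"app.alert('hypothetical script body'); " * 4

plain = compress_payload(payload)
padded = compress_payload(payload, pad=3)

# The padded stream still decodes to the same script once leading
# whitespace is ignored, yet its compressed bytes are not identical.
assert zlib.decompress(padded).lstrip() == payload
print(differing_positions(plain, padded), "of", len(plain), "compressed bytes differ")
```

A signature built from the bytes of `plain` would not match `padded`, even though both decode to the same script. This illustrates only the byte-variation argument; it says nothing about the feature extraction or the decision tree ensemble used in the report.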