
Requirements for tools for comprehending highly specialized assembly language code and how to elicit these requirements

Abstract

Program comprehension tools used with assembly language (often for maintaining legacy software or reverse engineering malware threats) are dated and fail to provide rudimentary features found in tool support for higher-level languages. The need for people who can maintain these legacy systems is growing, as is the number of malicious cyberspace threats. To build new visualization and analysis tools within this domain, we need to understand the unique challenges these developers face. This paper presents the results of an exploratory case study that elicits requirements from two highly specialized groups of assembly language developers in an industrial setting: a large multinational company developing mainframe software, and a government defense facility analyzing malware and security flaws. In addition to surveys, observations, and interviews, this study applies social psychology and nominal group techniques. We rank and describe in detail the requirements elicited from each group, and we include additional requirements obtained from observational studies. We conclude that although the two groups' needs are similar at a high level, closer inspection shows that each group's tooling needs are quite distinct.


Fig. 1
Fig. 2
Fig. 3

Notes

  1. https://github.com/cbenning/idapro_dataflow.

  2. https://github.com/cbenning/idapro_comment, https://github.com/cbenning/idapro_comment_template.


Acknowledgments

The authors would like to thank the members of the Alpha group and Beta group for participating in our research. This work was partially funded by NSERC (Natural Sciences and Engineering Research Council of Canada).

Author information


Correspondence to Jennifer Baldwin.

Appendices

Appendix 1: Script used during the nominal group session

This script was used for the 2-hour session with the six participants of the Alpha group. The same script was also used with the Beta group; timings should be adjusted to the size of the participant group.

0 min: Introduction
  SAY: Hi, I’m RESEARCHER NAME from UNIVERSITY NAME. For my PhD in Computer Science, I am exploring how visualization and tool support for assembly language might be useful. My work is being funded by COMPANY NAMES.
       Since you are experts in the area, we really value your experience and expertise in defining the issues.
       This is SECOND RESEARCHER NAME, and I’ll let him introduce himself (university, degree, research interest).
       To get started, I’d like to collect the ethics forms that you were given yesterday.
  DO: Collect the ethics forms.
  SAY: This session should take no longer than 2 h, including a 20-min break. The aim is to discuss and critically rank all of the items from yesterday’s exercise. If you come up with new ideas during the session, please add them to your list. Feel free to be creative.
       Does everyone have the blue pages?
       First, I’d like to go around the table and have everyone introduce themselves and tell us about their job. We also know from the survey that your teams are expertise-centered, so it would be great to hear about that, as well as your interests.

10 min: Listing of ideas
  SAY: Now, to begin the group exercise, we will go around the table and each person will share one item from their list at a time. During this part, please avoid discussion or talking out of turn.
       After all of the items are listed, we will discuss them to clarify each one. If you have any new ideas, feel free to add them to your sheet. If you want to skip a turn, that is also fine.
  DO: Record word for word what each person says on the PowerPoint slide.

30 min: Discussion of ideas
  SAY: We will now have a 30-min discussion of all the ideas generated.
       Now is the time to ask for clarification or elaboration of an idea, or to dispute or defend an item.
       You are also welcome to suggest new items during this time, but no items can be eliminated.
       We’ll go through them item by item.
  DO: Announce each item on the list and ask what it means or how people feel about it. Record any new ideas on the PowerPoint slide.

60 min: Ranking to select the “top ten” ideas
  SAY: Now, could everyone take out their yellow sheet for the preliminary ranking?
       You can see there are 10 spaces to fill in. From all of the options, select the 10 items that are most important to you. Then rank them from 1 to 10, where 10 is the most important.
       Once you are finished, please turn your sheet face down on the table; you are then free to take a break for about 20 min.

70 min: Break
  DO: Go around the table, transcribe the points from the ranking sheets onto the PowerPoint slides, and sum them for each item. Then reorder the items on the slide from the greatest to the smallest number of points.
      Gather everyone after their break.

90 min: Discussion of the vote
  SAY: We have reordered the items by rank, and you can see the score for each one. We have also highlighted the top ten.
       We will now have an open discussion about the nature and content of the top ten.
       We would also like to hear which items you feel should have been included in or excluded from this list.

110 min: Re-ranking and rating the revised “top ten” items
  SAY: Now, could everyone take out their green sheet for the final ranking? Here you will again list the ten items that you think are the most important.
       This may be the same ten, or feel free to change which items are in your top ten.
       This ranking is different: the most important item receives 100 points, and every other item can receive any value between 0 and 100. Two items can receive the same score.
       Once you are finished, please hand in your sheets to me face down, and then we’re all done!
  DO: Collect the green sheets from everyone and tally the final scores based on the 0–100 ratings.

120 min: END CASE STUDY (START + 120 MIN)
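The scoring steps in the script (summing the preliminary 1–10 ranks during the break, then tallying the final 0–100 ratings) both reduce to a per-item sum followed by a sort. The following is a minimal Python sketch, assuming each paper sheet is transcribed as a mapping from item label to points. The function names and the validation rules (each preliminary rank used exactly once; at most ten items on a final sheet, with the top item at 100) are assumptions for illustration, not part of the original procedure.

```python
from collections import defaultdict


def tally(sheets):
    """Sum the points each item receives across all participants' sheets.

    Each sheet is a dict mapping an item label to the points one participant
    gave it: a rank from 1 to 10 on the preliminary (yellow) sheet, or a
    rating from 0 to 100 on the final (green) sheet.
    Returns (item, total) pairs ordered from most to fewest points.
    """
    totals = defaultdict(int)
    for sheet in sheets:
        for item, points in sheet.items():
            totals[item] += points
    # Highest total first; ties broken alphabetically so the order is stable.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


def valid_preliminary(sheet):
    """Assumed rule: ten items, each rank from 1 to 10 used exactly once."""
    return sorted(sheet.values()) == list(range(1, 11))


def valid_final(sheet):
    """Assumed rule: at most ten items, all rated 0..100, top item at 100.

    Ties are allowed, as the script states.
    """
    ratings = list(sheet.values())
    return (
        0 < len(ratings) <= 10
        and all(0 <= r <= 100 for r in ratings)
        and max(ratings) == 100
    )


if __name__ == "__main__":
    # Hypothetical item labels, shortened to three items per sheet for
    # brevity; these are not data from the study.
    yellow_sheets = [
        {"A": 10, "B": 9, "C": 8},
        {"B": 10, "C": 9, "D": 8},
    ]
    ranked = tally(yellow_sheets)
    print(ranked)  # [('B', 19), ('C', 17), ('A', 10), ('D', 8)]
    top_ten = ranked[:10]  # the items to highlight on the slide
```

The same `tally` function serves both rounds because only the point scale differs between the yellow and green sheets; the alphabetical tie-break is an arbitrary choice that keeps the reordered slide deterministic.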

Appendix 2: Issues observed at the Alpha group during activity-based protocol elicitation

Entries are grouped by session and requirement category; each entry gives the issue, followed by its description.

First session
  • Browsing and navigation
      - XREF works only on names up to 8 characters long: otherwise, search must be used, which finds matches only one at a time in the code.
      - Bookmarking lines of code: bookmarks have to be created as names such as “a” and “b”; if a name already exists, it is simply overwritten.
      - Lack of navigation: the user needs to scroll through many screens of code to find the right spot.
  • Build: none recorded
  • Control flow
      - Hard to find the main task.
      - Tools would need to support multi-threading.
  • Data: none recorded
  • Debugging
      - Timing issues were tricky: timing dumps are not useful because they are too complicated.
      - Could not work out what was causing the cancel: some way to trap the event is needed.
      - Using XREF plus the debugger to find the correct place to debug: step-through debugging might be helpful.
  • De-obfuscation
      - Redundant code makes the code confusing to read: for example, statements that branch to the next address, which are unnecessary because that code would be executed next anyway.
  • Documentation
      - Looking up a vendor error code in the CA documentation: the documentation is not indexable online, so the CA docs must be downloaded to search them. The CA error code is then used to look up the corresponding error code in the IBM manual. Codes are OS-version dependent.
      - The user prints off whole modules: the printout is portable and more comfortable to look at (easier on the eyes). The pages also carry sticky notes and handwriting, including variable names, addresses, and error codes.
      - The dump was scrolling off the page: there were so many errors that it did not fit. A way to condense it is needed.
  • Integration: none recorded
  • References: none recorded
  • Source control
      - Object module replacement: replacement overwrites the whole module, so care must be taken not to overwrite a change. The prerequisite chain, and which fixes supersede others, must be checked.
  • Source editing: none recorded

Second session
  • Browsing and navigation
      - *temp is used as a TODO marker: it shows up only when you dig into the module you are interested in. pdsman was used to scan for and find these markers; however, the scan does not show the active module.
      - Switching terminal screens constantly: the user needs to scroll through many screens of code to find the right spot, and kept many terminal screens open. It was hard to keep track of which one showed the right code.
  • Build
      - Register usage: the user waits for a compile error to report that a register is in use.
      - ? at the start of lines: used to ensure errors are produced without having to deal with the actual errors (a stub error).
      - Scanning software for changes he knows he has to make: otherwise, he waits for compile errors. Compile errors would be more useful if they were reported during editing, with context-aware correction suggestions (e.g., “does not exist, did you mean…?”). This includes calls out to code that no longer exists.
  • Control flow: none recorded
  • Data: none recorded
  • Debugging
      - No breakpoints in XDC: the user inserts code to make it fail instead.
  • De-obfuscation: none recorded
  • Documentation
      - IBM Principles of Hardware Manual: useful for double-checking some details.
  • Integration: none recorded
  • References
      - Code module fan-in and fan-out: the user wanted to know which modules call the code being examined and which code it depends on.
  • Source control: none recorded
  • Source editing
      - Tedious refactoring of modules: larger modules are split into smaller ones to use as templates. Templates are not very useful because they are not well maintained, although they help people starting from scratch. Instead, he copies something else he is working on and butchers it (side-by-side editing).
      - Forgot to save the file: no alert was given.
      - Code shortcuts: for tasks he performs more than once.
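The second-session References entry (code module fan-in and fan-out) asks for two views of the same call relation: which modules call a given module, and which modules it calls. The following is a minimal Python sketch of that computation, assuming the caller/callee pairs have already been extracted from the assembly source by some other means. The extraction step, the function name `fan_in_out`, and the module names are hypothetical.

```python
from collections import defaultdict


def fan_in_out(calls):
    """Compute fan-in and fan-out per module from (caller, callee) pairs.

    fan-in of M:  the distinct modules that call M (code that depends on M).
    fan-out of M: the distinct modules that M calls (code M depends on).
    Returns {module: (sorted callers, sorted callees)}.
    """
    callers = defaultdict(set)  # callee -> modules that call it
    callees = defaultdict(set)  # caller -> modules it calls
    for caller, callee in calls:
        callees[caller].add(callee)
        callers[callee].add(caller)
    modules = sorted(set(callers) | set(callees))
    return {m: (sorted(callers[m]), sorted(callees[m])) for m in modules}


if __name__ == "__main__":
    # Hypothetical module names and call pairs, for illustration only.
    calls = [("MODA", "MODB"), ("MODA", "MODC"), ("MODB", "MODC")]
    for module, (fan_in, fan_out) in fan_in_out(calls).items():
        print(f"{module}: called by {fan_in}, calls {fan_out}")
```

Using sets deduplicates repeated calls between the same pair of modules, so the counts reflect distinct dependencies rather than call sites.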


About this article


Cite this article

Baldwin, J., Teh, A., Baniassad, E. et al. Requirements for tools for comprehending highly specialized assembly language code and how to elicit these requirements. Requirements Eng 21, 131–159 (2016). https://doi.org/10.1007/s00766-014-0214-y


  • DOI: https://doi.org/10.1007/s00766-014-0214-y

Keywords

  • Requirements elicitation
  • Assembly language
  • Reverse engineering
  • Social psychology
  • Nominal group technique
  • Case study
  • Software visualization
  • Software analysis