Skip to main content
  • 506 Accesses

Abstract

This chapter provides a discussion and demonstration of creating datasets. The management of Excel and Stata datasets is also presented. These datasets include primary and secondary data. While this chapter discusses and demonstrates how to create datasets based on primary data, it focuses on the creation and management of the datasets based on secondary data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    It is assumed the reader is familiar with Excel.

  2. 2.

    Table 304.70—Total fall enrollment in degree-granting postsecondary institutions, by level of enrollment and state or jurisdiction: Selected years, 2000 through 2017. The table can be found at: https://nces.ed.gov/programs/digest/d18/tables/dt18_304.70.asp.

References

  • Simons, K. L. (2016). A sparser, speedier reshape. The Stata Journal, 16(3), 632–649.

    Article  Google Scholar 

  • Snyder, T. D., De Brey, C., & Dillow, S. A. (2019). Digest of Education Statistics 2017, NCES 2018-070. (53rd Edition). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

4.4 Appendix

4.4 Appendix

*Chapter 4 Syntax *Primary data *example below shows how data for three variables (variable_x, variable_y, /// and variable_z) can be entered in Stata input variable_x variable_y variable_z 31 57 18 25 68 12 35 60 13 38 59 17 30 59 15 end *To see the data that was entered above, type list *To save the above data, type: save "Example 1.0.dta" *To use the Stata editor to enter additional data in Example 1.0, type: edit *the data above may be imported from an Excel comma delimited file (csv) /// by typing in the following: insheet using "Example 1.csv", comma *Secondary data to change to the “working directory” which contains the Excel file /// is as follows: cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Excel files" *Using the same file from above, the Stata command is: import excel "tabn302.50 - reformatted.xls", firstrow *Using the user-created Stata program “statastates”, the FIPS codes /// and state abbreviations can be easily added to any state-level data /// set that includes the state name. (In our example from above, the /// state name is “States”.) This is demonstrated in the two steps below: ssc install statastates statastates, name(<State name>) *We can delete the variable _merge, which was created when we added /// the FIPS codes and state abbreviations. This is done by simply typing: drop _merge *We may also want to move the FIPS codes and state abbreviations /// somewhere near the front of our dataset. This can be accomplished /// typing the following Stata command: order state_abbrev state_fips, before( state) *Stata dataset, based on the modified Excel tabn302.50 describe *To create labels, based on the column names in the Excel file, /// we use the label variable (lab var) command for each variable. /// Here is an example: lab var Stateid “Stateid” lab var state_abbrev "State abbreviation" lab var state_fips "FIPS code" lab var state "State name" lab var total "Total number of graduates from HS located in the state" lab var public "Number of graduates from public HS located in the state" lab var private "Number of graduates from private HS located in the state" lab var anystate /// "Number of first-time freshmen graduating from HS 12 months enrolled in any state" *Labels cannot be more than 80 characters. So we have to shorten the label. lab var anystate /// "Number of 1st-time freshmen graduating from HS enrolled in any state" lab var homerate /// "Estimated rate of HS graduates going to college in home state" describe *we drop the cases with missing data. drop if var1==. *We rename var1 totalpct by typing: rename var1 totalpct gen year = 1959 + _n *We relocate the year variable to the beginning of the dataset by typing: order year, first *Then we declare the dataset to be a time series. tsset year, yearly *change the working directory to the one with our Stata files and save the file using. cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Stata" *Finally, we save the file with a descriptive name. save "Percent of US high school graduates in PSE, 1960 to 2016.dta" *import worksheet from Excel workbook into Stata, via the following syntax clear all cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Excel files" import excel "College enrollment data.xls",sheet("Ugrad") firstrow *save this Stata file with our panel data: cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Stata save "Undergraduate enrollment data - Wide.dta" *convert the data from a “wide” to a “long” format using the reshape /// or the much faster user-created sreshape (Simons 2016) *install sreshape net install dm0090.pkg, replace sreshape long Ugrad, i(id) j(year) *declare a panel dataset by typing: xtset id year, yearly *save our “declared” panel data file as follows: save "Undergraduate enrollment data - Long.dta" *cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Excel files" import excel "Example 4.xls", sheet("HSGrad") firstrow *change our working directory to where we want to save our Stata /// file and save it, by typing: cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 4\Stata" save "HSGrad - Wide.dta" *reformat our file from wide to long, declare it a panel data set, /// and save it to a new file sreshape long HSGrad, i(id) j(year) xtset id year, yearly save "HSGrad - Long.dta" *join the two datasets, based on id, into one dataset that would /// contain two variables joinby id year using "First-Time - Long.dta", unmatched(none) *join two or more Stata files that were reshaped from wide to long /// and have the variables State, id, and year. use "Undergraduate enrollment data - Long.dta", clear joinby id year using "Undergraduate state financial aid - need" joinby id year using "Undergraduate state financial aid - merit" xtset id year, yearly save "Example - 4.1.dta" *see how our new panel dataset is structured, by typing the command /// xtdescribe or the shortened version: xtdes *end

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Titus, M. (2021). Creating Datasets and Managing Data. In: Higher Education Policy Analysis Using Quantitative Techniques . Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-60831-6_4

Download citation

Publish with us

Policies and ethics