This video demonstrates a range of techniques used by forensic accountants and fraud examiners to uncover fraudulent journal entries and illegal activities. As data professionals, most of us will never unravel a Bernie Madoff scheme, but we can apply these same techniques in our own environments to learn more about our data. This video uses the R programming language to apply these fraud detection techniques and help you gain a better understanding of your data.
What You Will Learn
Summarize and review a new data set
Perform regression analysis using linear regression
Discover distributions of data, overall and between cohorts
Compare cohort behavior to discover outliers
Use distributions of first and last digits to test data set validity
Who This Video Is For
Data platform specialists and data scientists who are interested in identifying anomalies which may indicate fraud or the opportunity for deeper business insight. Viewers may have some experience with Python or R and some knowledge of statistics, but neither is necessary to get value from the video.
In this video, you will learn a variety of techniques for examining your data and drawing inferences that can help you detect fraud and malfeasance. You’ll begin with basic analytical techniques, including regression analysis. From there, you will learn how to use cohort analysis to find outliers between groups, leading you toward a data-driven approach to forensic investigation. Finally, you will review numeric techniques around data set validity, including rules around the distributions of the first and last digits in data sets.
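As a rough illustration of the digit-based checks mentioned above, here is a minimal sketch in base R. It assumes that the first-digit rule in question is Benford's law and that last digits of naturally occurring amounts are roughly uniform. The amounts are simulated, and `first_digit` and `last_digit` are hypothetical helper names, not functions from the video.

```r
# Hypothetical expense amounts so the sketch is self-contained
# (simulated data, not the data set used in the video).
set.seed(42)
amounts <- round(rlnorm(5000, meanlog = 4, sdlog = 1.2), 2)
amounts <- amounts[amounts > 0]

# First significant digit, read from scientific notation to avoid
# floating-point surprises such as 0.3 / 0.1 evaluating just below 3.
first_digit <- function(x) {
  as.integer(substr(formatC(abs(x), format = "e", digits = 10), 1, 1))
}

# Last digit of the amount when expressed in whole cents.
last_digit <- function(x) {
  as.integer(round(abs(x) * 100) %% 10)
}

# Benford's law: expected share of each leading digit 1 through 9.
benford <- log10(1 + 1 / (1:9))

first_counts <- table(factor(first_digit(amounts), levels = 1:9))
last_counts <- table(factor(last_digit(amounts), levels = 0:9))

# Observed versus expected leading-digit shares, side by side.
comparison <- data.frame(
  digit = 1:9,
  observed = as.numeric(first_counts) / sum(first_counts),
  expected = benford
)
print(round(comparison, 3))

# Chi-squared goodness-of-fit tests: first digits against Benford,
# last digits against a uniform distribution (the default for p).
first_test <- chisq.test(first_counts, p = benford)
last_test <- chisq.test(last_counts)
print(first_test$p.value)
print(last_test$p.value)
```

A small p-value here is only a prompt to look closer, not proof of fraud. Data with a narrow range or fixed price points can depart from these distributions for legitimate reasons.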
About The Author
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL and author of PolyBase Revealed (forthcoming). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather’s nice enough.
About this video
- Author: Kevin Feasel
- Online ISBN
- Total duration: 46 min
- Copyright information: © Kevin Feasel 2019
Welcome to Forensic Analysis with R. My name is Kevin Feasel. I’m the CTO of Envizage Technologies. I also run a predictive analytics team. In addition, I’m a Microsoft Data Platform MVP.
We are tasked with reviewing expense reports for our company as part of an audit. Our company has never gone through this kind of audit before. But we have a trustworthy group, so no problem, right?
Now, our data looks a bit like this. We have 12 people in our sales department, and they travel to different cities throughout the year selling our products. The only thing we have here is a set of nine years of individual expense reports. Each expense report looks like the ones sampled here. We have the type of city, employee name, date, and amount.
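To make that shape concrete, here is a minimal sketch of what such a data set might look like in R, followed by a first summary pass. Everything in it is an assumption for illustration: the column names, the employee labels, the city types, the 2010 to 2018 date range, and the simulated amounts are hypothetical and are not taken from the course data.

```r
# Simulated expense reports with the four fields described above.
# All names and values are hypothetical stand-ins.
set.seed(2019)
n <- 2000
employees <- paste0("Employee", sprintf("%02d", 1:12))

expenses <- data.frame(
  CityType = sample(c("Large", "Medium", "Small"), n, replace = TRUE),
  Employee = sample(employees, n, replace = TRUE),
  # Nine years of dates, starting from an assumed 2010-01-01.
  Date = as.Date("2010-01-01") + sample(0:(9 * 365 - 1), n, replace = TRUE),
  Amount = round(rgamma(n, shape = 2, scale = 40), 2),
  stringsAsFactors = FALSE
)

# Structure and five-number summary of the amounts.
str(expenses)
print(summary(expenses$Amount))

# Total spend per employee, largest first.
by_employee <- aggregate(Amount ~ Employee, data = expenses, FUN = sum)
by_employee <- by_employee[order(-by_employee$Amount), ]
print(by_employee)

# Total spend per year, with year-over-year growth as a fraction.
expenses$Year <- as.integer(format(expenses$Date, "%Y"))
by_year <- aggregate(Amount ~ Year, data = expenses, FUN = sum)
by_year$Growth <- c(NA, diff(by_year$Amount) / head(by_year$Amount, -1))
print(by_year)
```

Because the amounts are drawn at random, the per-employee and per-year totals should look unremarkable. With real data, these two tables are often the first place where something odd shows up.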
Armed with this data, we will use a series of tools to audit it. We will start with data analysis techniques around summary and growth analysis. Next, we will dive into the use of linear regression to review and predict results. From there, we will use cardinality and cohort analysis to compare groups with one another. Finally, we will look at digit analysis, particularly the last and first digits in sequences of numbers.
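Here is a minimal sketch of two of those steps, linear regression and cohort comparison, using base R only. The data is simulated, with one hypothetical employee ("Employee07") deliberately inflated so there is something to find. The employee labels, the years, the dollar figures, and the robust z-score used to rank cohorts are all assumptions for illustration and not necessarily the approach taken later in the course.

```r
# Simulated annual expense totals per employee (hypothetical values).
set.seed(7)
years <- 2010:2018
employees <- paste0("Employee", sprintf("%02d", 1:12))
annual <- expand.grid(Year = years, Employee = employees,
                      stringsAsFactors = FALSE)

# Baseline: spending grows by roughly 500 per year, plus noise.
annual$Total <- 10000 + 500 * (annual$Year - 2010) +
  rnorm(nrow(annual), sd = 400)

# Inject an anomaly: one employee spends 50% more than the baseline.
inflated <- annual$Employee == "Employee07"
annual$Total[inflated] <- annual$Total[inflated] * 1.5

# Linear regression of annual total on year, then a prediction
# for the year after the data ends.
fit <- lm(Total ~ Year, data = annual)
print(summary(fit))
print(predict(fit, newdata = data.frame(Year = 2019)))

# Cohort comparison: average annual total per employee, scored with a
# robust z-score (median and MAD) so one extreme cohort does not
# distort the baseline it is compared against.
cohort <- aggregate(Total ~ Employee, data = annual, FUN = mean)
cohort$Z <- (cohort$Total - median(cohort$Total)) / mad(cohort$Total)
cohort <- cohort[order(-cohort$Z), ]
print(cohort)

# The cohort that stands out the most from its peers.
top_outlier <- cohort$Employee[1]
print(top_outlier)
```

The regression describes the overall trend, while the cohort table shows who departs from it. The two views tend to be more useful together than either is alone.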
You will need a couple of skills going into this course. First, I expect some basic familiarity with statistics. We won’t get too deep into the weeds, and I will try to explain as much as I can along the way. But there may be terms I expect you to understand.
I also expect some familiarity with R. I won’t show you how to install R, and I do not intend this course as a primer on R. But the tools and techniques we use are pretty straightforward. So with that, it’s on with the show.