It’d be great if someone wanted to write some code to add local unemployment data to the GitHub repo. Unemployment data could be used as another predictive variable, though based on personal experience with subprime mortgage default models, I’d guess that the unemployment rate will not prove as predictive as home price-adjusted LTV.

You can convert a monthly default rate to an annualized default rate with the following formula:

annualized_rate = 1 - (1 - monthly_rate) ^ 12

The idea is that if a fraction monthly_rate of loans defaults every month, then a fraction (1 - monthly_rate) survives every month, so after 12 months you’d have (1 - monthly_rate) ^ 12 of your original population remaining, and 1 minus that quantity is the annualized default rate.
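
In code, a minimal Python sketch of that conversion might look like this (the function name annualize_default_rate is just illustrative, not something from the repo):

```python
def annualize_default_rate(monthly_rate):
    """Convert a monthly default (transition) rate to an annualized default rate.

    If a fraction monthly_rate of loans defaults each month, then
    (1 - monthly_rate) of them survive each month, (1 - monthly_rate) ** 12
    survive a full year, and 1 minus that is the annualized default rate.
    """
    return 1 - (1 - monthly_rate) ** 12
```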

Simple example: if a month started with 100 non-defaulted loans and 4 of them defaulted in that month, then that month’s transition rate was 4 / 100 = 4%.
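
To tie the two comments together, a short Python calculation (using only the illustrative numbers from the example above) annualizes that 4% monthly rate:

```python
# The illustration above: 4 of 100 non-defaulted loans default during the month
monthly_rate = 4 / 100                       # 0.04, i.e. a 4% monthly transition rate

# Annualize with the survival formula from the earlier comment
annualized_rate = 1 - (1 - monthly_rate) ** 12
print(round(annualized_rate, 3))             # 0.387, i.e. roughly a 38.7% annualized default rate
```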

The Phoenix and Las Vegas maps might look more interesting if Fannie and Freddie released detailed ZIP code data. Subprime mortgage data was typically available at the ZIP code level, and I remember from my days as a mortgage analyst that a map of Las Vegas ZIP codes showed concentric circles: the outermost exurban areas had the highest default rates, and default rates got progressively lower as you moved toward the city center.

Well, that’s not quite fair, because I can’t fit the entire dataset into RAM, which would make analyzing the data even faster. According to the website yourdatafitsinram.com, it would cost about $10,000 to buy a server with enough RAM to fit the whole dataset. $10k is certainly more than I’m going to spend, but it’s a drop in the bucket for any big company looking to analyze mortgage data.

“Big Data” has become a meaningless cliché, so much so that complaining about Big Data being a cliché is also a cliché… so what’s the hipster data scientist to do? Talk about “medium data”, I suppose!

I like this definition of data-bigness, and when I say “medium data” I’m thinking of a dataset that can be stored on a single machine, but is big enough that you have to think before executing queries. In this particular case, some of my queries took up to a few hours: not an insanely long time, but long enough that it’s best to avoid having to rerun things if possible.

Traditionally Fannie and Freddie have guaranteed the timely payment of principal and interest to MBS investors. This means that if an investor owns an agency mortgage bond, and some of the homeowners whose mortgages are part of the bond stop paying their loans, then Fannie and Freddie reimburse the mortgage investors for any losses.

In the new world of “potential risk-sharing initiatives”, investors themselves would take on at least some of the risk of loss should homeowners stop paying their mortgages. Before investors are willing to take that risk, though, it seems likely they would want to see historical data so that they can make some informed projections about future default rates for Fannie and Freddie loans.

What a pathetic excuse for a continent.

$1.2 billion in revenue last year… so if ISIS were a startup, it could get VC investment at around a $12 billion valuation (a 10x revenue multiple)?