College application moneyball

One of my hobby projects is a website for high schoolers to track their college application deadlines. The site would also show students their likelihood of admission at prospective schools. Here’s a brief explanation of the concept and my cursory attempt at market research.

I’m sure most high schoolers applying to college know well enough, or are told often enough, to pick a variety of institutions that includes “safety schools”, “reaches”, and schools in the middle. Those labels are really rough odds of admission. When I was graduating high school, the college counselor showed me scatter plots of how kids before me had fared when applying to college: one graph per college, with SAT scores on the y-axis and GPAs on the x-axis (it was a big book). Circles marked misses and x’s marked wins (acceptances).

That’s not what I have in mind, since that kind of data isn’t available. But this data is (for U Buffalo SUNY, for example):

[Images: three screenshots of the available data for U Buffalo SUNY]

I envision something like this for the website:

[Image: Skitch mockup of the graph]

I’m likening it to the way Billy Beane uses statistics to make a better baseball franchise in Moneyball (Michael Lewis).


I am not at all sure how to build a model from the data above to get the graph above. I figure assumptions have to be made about the general shape of the graph (linear, quadratic, exponential, and so on), and then we fit the score and GPA data on top of it. What do I know?
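To make the "assume a shape, then fit" idea a little more concrete, here is a minimal sketch in Python. It assumes an S-shaped (logistic) curve, which is my own choice here and not one of the shapes listed above, and it pins the curve down with two anchor points. The anchor values (a 25% chance at a score of 1000, a 75% chance at 1200) are made up for illustration and are not U Buffalo numbers. The function names are hypothetical too, and GPA is left out entirely.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_logistic(x1, p1, x2, p2):
    """Return (slope, intercept) of the logistic curve through two anchor points.

    A logistic curve is a straight line in log-odds space, so two
    (score, probability) points are enough to determine it.
    """
    slope = (logit(p2) - logit(p1)) / (x2 - x1)
    intercept = logit(p1) - slope * x1
    return slope, intercept


def admission_chance(score, slope, intercept):
    """Estimated probability of admission for a given test score."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


# Hypothetical anchor points, made up for illustration (not real school data).
slope, intercept = fit_logistic(1000, 0.25, 1200, 0.75)

for score in (900, 1000, 1100, 1200, 1300):
    print(score, round(admission_chance(score, slope, intercept), 2))
```

The hard part, of course, is justifying the anchor points from the published data. The code only shows that once a shape is assumed, very few numbers are needed to draw the curve.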

I’d gladly take a hint or help from some math majors out there. Or from anyone who likes a challenge.

I could wax on about why it would be cool and useful if kids could be this analytical about the application process, but so far, most people I talk to get it and can imagine the benefits.

Chime in, of course!