This little blurb will help you get started. The rest is in the documentation.
First, download some data from your MLS. Let's say it's all residential sales over the last two years for a defined neighborhood. You tell your MLS to export the search results as an Excel file.
Open the resulting download in Excel, and look at the results. Your first row should be a header, with names for each of the columns, e.g., Map ID, Property Address, Sale Date, Sale Price, GLA, etc. Why open it in Excel first? Well, because R is very picky about its diet. It won't eat just anything. We're going to use Excel to clean it up a little.
First, look at the header. You'll be typing those names into R commands later on, so now's the time to change them to something easier to read or type. My data service, for example, insists on sending "Last_Sale_Date" as one of the headers. I change that to "sdate" for convenience. These things are case-sensitive, too.
Next, make sure none of the cells are merged. Choose Edit->Select All and un-merge any cells. My service transmits data like this:
- Code: Select all
1.-> 091-011.07 ->1194 Sulfur Springs Rd->11/9/070->$205,000->2194 <-
: -> ->Selmer, TN 38375 <-
I remove the merged cells, then (since that second line is the owner's city and address and not needed) I sort on Column A. This brings all the "second" lines to the top, and I delete them.
Next thing to do is to select the column of sale prices. My data service sends these in the form "$140,000". R interprets anything with non-numeric characters as ALL non-numeric (called "factors" in R-speak.) So you want to change the number format to plain vanilla, thus: "140000". No commas, no dollar signs. Decimals are allowed, but are not practical. Make the same change to land values, improvement values, and assessed values if you downloaded those. **Note: To select the whole column, just click on the column letter at the top of the screen. E.g., clicking the "A" selects all rows in column A from Row 1 to Row 65535.
Finally, look at the date columns and make sure they conform to "DD/MM/YY" format. You can use any format you want, but I'll be explaining this on the basis of the format shown.
Scroll through the data from top to bottom quickly. You're looking for artifacts, extra rows, formatting glitches, or anything else that isn't simple, straightforward ASCII numbers and letters.
Now, save that file as MyData.csv, and choose "," for a separator. You're finished with the first step. In the next post, we'll examine how to pull this file into R and run some simple analyses.
