How’s my city doing on crimes? How fast are cities in my county growing and in which neighborhoods? How’s the electricity use is changing over years? These are some of the questions you can answer using Microsoft Excel 2013 and with the free add-on “Power Query”. I’ll give you an example walk-through of setting it up.
First, get Excel Power Query. Landing page is here. Available in most languages to match your main Excel 2013 installation. You have to have Excel 2013 installed first. You can get it easily from Office 365, here.
Then, you need to find your data set. Below I will provide an example from San Mateo county in California. As part of the open data initiative, they maintain a web service powered by Socrata, which exposes a number of interesting data sets in OData format. Many counties in the US does this – you can also visit http://opendata.socrata.com to get even broader sets of data.
Ok.. back to our example. We want to see the change in electricity usage by city within San Mateo county. The data for 2003-2010 is available for free, here:
Just so you know whether you installed Excel Power Query correctly; when installed, looks like this:
The data source that’s of interest to us at this point is OData Feed. Although it can pull from many different sources, including Facebook.
Establishing connection to an OData Feed is comprised of following simple steps:
- Obtain OData service URL
- Obtain credentials (some data sets are behind a pay-wall; so you may need to subscribe and use specific username/password)
- Go to Excel Power Query and add the feed using the data source “From OData Feed” as indicated above.
Now that you understand the basics, let’s construct our example query. On San Mateo county’s open data portal (the screenshot above), click on Export and expand OData, like below, and copy the OData service URL. In this case San Mateo county is not asking for special credentials so we will just access it anonymously.
Now to go Excel, while on a blank sheet, click on “From OData Feed” in the Get External Data|From Other Sources menu, like the screenshot I included earlier. Following dialog pops up:
Paste/enter the URL you obtained earlier, click OK, like this:
Excel then brings up the Query editor along with the preview of the data it could read from that URL, allowing you to make adjustments to the way data is imported. Like this:
Let me explain couple of things here…
The data set you see is downloaded from the source we have provided just in time, read the lower right corner – it’s a “preview”. For large data sets, this will include only first 100 records. In this screen, you can, among many other things:
- Remove columns (useful to remove columns you don’t need — speeds up refresh actions later)
- Add filters (filtering in the query will yield faster refresh while minimizing the load on the worksheet)
- Rename columns or re-order them, change sort preferences. (this way you don’t have to re-do them later inside Excel sheet)
Key difference you should conceptually understand here is that you are creating a “query”, not really working on a static data set. But you are doing that by looking at a sample, preview data from the actual source.
For this exercise, I chose not to make any changes and simply clicking on the Close & Load button on upper left corner. Here’s the result:
Now, this XLSX workbook file will contain the latest “data” from this source when you open it. However, you can always hit Data/Refresh button while in Excel to trigger a live data refresh from the source – which you can save of course. For frequently updating data sets, this gives you dynamic access to data to improve your decision making.
Hope this helped you getting some familiarity with this incredible tool, Power Query.
Categories: Data Analytics