Using R to gain insight from the data in your Mendix Applications

As mentioned in our blog post about OData support in Mendix, OData is also useful if you want to access data in your applications from R or RStudio. In this blog post, I’ll show you how to do this. I’ll also give some simple examples of how to use R aimed at people who are unfamiliar with it.

What is R?

R is a very popular Open Source Statistical Analysis programming language used by many data scientists. You can download a free, open source version, but you’ll also find R support in many commercial tools, including Microsoft Revolution R, Tibco Spitfire, Pivotal, Oracle, and Tableau. What’s interesting about R is that there are a large number of libraries that you can use. Libraries range from data manipulation algorithms, to graphing, to reporting, to machine learning.

Loading OData into R

To retrieve OData into R we’re going to use two packages: httr and XML. The first package enables you to read data from the web, the second can be used to parse XML documents.

With a small function using these packages, you can fetch a OData resource and turn it into R data.

dataset <- getODataResource(<url of the odata resource>,<username>,<password>)

Before we start we need to specify the libraries we need.

library('httr')
library('XML')
library('dplyr')
library('lubridate')

The function getODataResource first reads the OData resource and parses the returned XML document using the Httr package. Next, it determines the names of the attributes, gets the values, and builds a data frame containing the values.

getODataResource <- function(resourcePath,domain,usr,pwd){
  
  url <- paste(domain, resourcePath,sep="")
  # get the OData resource
  response <- GET(url,authenticate(usr,pwd))
  
  # parse xml docucument
  responseContent <- content(response,type="text/xml")
  
  # determine the names of the attributes
  xmlNames <- xpathSApply(responseContent,
      '//ns:entry[1]//m:properties[1]/d:*',xmlName, 
      namespaces = c(ns = "http://www.w3.org/2005/Atom",
        m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
        d="http://schemas.microsoft.com/ado/2007/08/dataservices"))
  
  # determine all attribute values
  properties <- xpathSApply(responseContent,'//m:properties/d:*',xmlValue)
  
  # cast the attributes values into a data frame
  propertiesDF <- 
      as.data.frame(matrix(properties,ncol=length(xmlNames),byrow=TRUE))
  
  # set the column names
  names(propertiesDF) <- xmlNames
  return(propertiesDF)
}

This is the basic functionality that you need to retrieve Mendix OData resources into R. In its current form, it doesn’t handle nested data, so for example, Orders with order lines nested in the orders. It also doesn’t use the datatype info that is included in the OData resource. Instead, we’ll use some R expressions to specify the data types manually, as illustrated below.

Example use case

The example Mendix application has resources for orders, customers and addresses. We’ll create an overview of the number of orders per city in R.

First, we’ll define how to connect to the OData resources in our Mendix application.

domain <- "http://localhost:8080/"
username <- "demo_reporter"
password <- "goSfcsDj00"[/bash]

The following will load all the objects in the customers and address entities into R data frames.

[bash]customers <- getODataResource("odata/Orders/Customers()",domain,username,password)
addresses <-  getODataResource("odata/Orders/Address()",domain,username,password)

For orders we’re just interested in the orders created since january 1st 2014. We could filter on order date in R, but specifying it on the OData resource url avoids the overhead of querying them from your Mendix database, and transporting all the data from the Mendix runtime to R.

orders <- getODataResource(
  "odata/Orders/Orders()/?$filter=OrderDate%20gt%20datetimeoffset'2014-01-01T00:00:00z'"
  ,domain,username,password)

The data frames contain all the Mendix objects, retrieved through OData resources. The first five records of orders look like this:

orders[1:5,c("OrderNumber","Order_Customer","OrderDate")]
##   OrderNumber   Order_Customer                OrderDate
## 1           1 2533274790395905 2015-01-18T10:32:00.000Z
## 2           2 2533274790395906 2014-01-30T07:01:00.000Z
## 3           3 2533274790395907 2014-04-16T13:15:00.000Z
## 4           4 2533274790395908 2014-09-19T07:52:00.000Z
## 5           5 2533274790395909 2015-01-14T07:28:00.000Z

Now that we have the data, we need to make sure we use the correct datatypes. Mendix IDs are implemented using longs in Mendix. In R, you can use doubles.

customers$DateOfBirth <- ymd_hms(customers$DateOfBirth)
customers$ID <- as.double(customers$ID)
customers$Billing_Address <- as.double(customers$Billing_Address)
customers$Delivery_Address <- as.double(customers$Delivery_Address)

addresses$ID <- as.double(addresses$ID)

orders$ID <- as.double(orders$ID)
orders$Order_Customer <- as.double(orders$Order_Customer)
# for easy access, create separate columns with date and date time
orders$OrderDateTime <- ymd_hms(orders$OrderDate)
orders$OrderDate <- ymd(format(ymd_hms(orders$OrderDate),"%Y-%m-%d"))

Next we’ll use the dplyr library to manipulate the data received. Dplyr enables you to filter data, add new columns, select certain columns, and order the rows, very much what you would do with SQL in a regular database.

The following dplyr expression joins the customers, addresses, and orders data frames.

customerOrders <- customers %>%
  rename(CustomerID=ID) %>%
  left_join(addresses, by=c("Delivery_Address"="ID")) %>%
  select(CustomerID,Firstname,Lastname,City,Country) %>%
  left_join(orders, by=c("CustomerID"="Order_Customer")) %>%
  select(Firstname,Lastname,City,Country,OrderNumber)

The first rows of this data frame look like this:

customerOrders[1:5,]
##   Firstname Lastname        City Country OrderNumber
## 1      Ivan  Freeman New Orleans      US           1
## 2      Ivan  Freeman New Orleans      US         499
## 3   Anthony Robinson    Brighton      US           2
## 4   Anthony Robinson    Brighton      US         500
## 5     Kaden Griffith  Bridgeport      US           3

Now we can count the number of orders per city as follows:

cityOrderCount <- customerOrders %>%
  group_by(City) %>%
  summarize(OrderCount = n()) %>%
  arrange(desc(OrderCount))

The result:

cityOrderCount[1:5,]
## Source: local data frame [5 x 2]
## 
##           City OrderCount
## 1     New York         28
## 2 Philadelphia         16
## 3      Chicago         15
## 4        Miami         12
## 5    Baltimore         10

Creating graphs

The following example uses ggplot2 to create graphs. You can also use other libraries for graphing, but ggplot2 is one of the more popular.

We’ll start with a simple bart chart plotting the number of orders per month.

library('ggplot2')

# Determine first day of month for every order, count number of orders per month

orderCount <- orders %>%
  # Determine date of first day of the month, so ggplot understands it's a date
  mutate(OrderMonth = ymd(format(OrderDate,"%Y-%m-01"))) %>%
  group_by(OrderMonth) %>%
  summarize(noOfOrders = n())

# generate barchart to display number of orders per month

ggplot() +
  geom_bar(data=orderCount,aes(x=OrderMonth,y=noOfOrders), stat="identity") +
  xlab("Month") + 
  ylab("Number of orders") + 
  ggtitle("Orders per month")

Orders per Month Chart

Next we’re interested in seeing when orders are placed during the week. First, we need the dataset, containing day of week and hour of day for every order.

ordersPerWeekHour <- orders %>%
  mutate(DayOfWeek = wday(OrderDate)) %>%
  mutate(HourOfDay = hour(OrderDateTime)) %>%
  group_by(DayOfWeek,HourOfDay) %>%
  summarize(noOfOrders = n())

Now for the actual graph, we can use tile plane to show the number of orders for every hour of every day. We put the day on the x-axis, the hour on the y-axis, and use number of orders to determine the color displayed.

ggplot(ordersPerWeekHour, aes(x=DayOfWeek,y=HourOfDay))+
  geom_tile(aes(fill=noOfOrders)) +
  scale_fill_gradient(low="green", high="red") + 
  scale_x_continuous(breaks=1:7,labels=c("Sun","Mon","Tues","Wed","Thurs","Fri","Sat")) + 
  scale_y_continuous(breaks=0:24) +
  labs(x="Day", y="Hour")

Number of orders heatmap

You can now easily see that most of the orders are placed between 16:00 and 22:00, evenly divided across all days of the week.

Generate reports using R

R has some interesting reporting facilities. Using Rmarkdown, you can generate HTML, Word, PDF or even slides straight from an R report.

This whole blogpost was actually written using rmarkdown, generating an MS-Word document. To update the Word document with the latest data from my Mendix application, I just have to rerun the rmarkdown script.

The easiest way to work with Rmarkdown is to use the built-in facilities of RStudio. Alternatively, you can generate an Rmarkdown report without RStudio with some R code:

library(knitr) # required for knitting from rmd to md
require(rmarkdown) # required for md to html
setwd('<location of your rmarkdown file>')
render("<name of the rmarkdown script>", "all")

Conclusion

The OData feature in Mendix opens up a lot of possibilities for Mendix users. R is a powerful tool for statistical analysis and data reporting. Using OData Mendix user can now easily benefit from all these facilities.