Wednesday, November 16, 2011

Scheduling and Workspace in SUGAR

With the recent addition of perspectives to the Pentaho User Console (PUC) we opened up a whole new way to integrate with the BI platform.  This will go a long way for customers and OEMs who want to add (or remove) functionality from PUC.  Having said this, we are currently developing new perspectives for the SUGAR release.  Sean Flatley has been developing a PDI admin perspective (based on CDF).  There are also plans to create an admin perspective (or add to the PDI admin) to replace the admin console (PAC and PEC).

Recently, I have been developing a total replacement for the PUC workspace, which was in dire need of TLC. When PDI added scheduling capabilities against our DI server, it did so against a brand new scheduling system.  Until now, we hadn't taken advantage of this in the BI server.  All of this changes in SUGAR: the old scheduler is completely removed and the new scheduler has taken over!  Rather than get our existing (pre-SUGAR) workspace to work against the new scheduler, we spent some time rewriting it.  The new workspace makes all scheduler interactions using REST.  This means that it will be easy for other developers to interact with the scheduler in their own interfaces.

Scheduling with REST
I mainly wanted to highlight the new workspace in this post, but I figured there might be a fair amount of outside interest in learning about scheduling + REST.  We have held up our end of REST purity in that GET, POST and DELETE HTTP methods are used where appropriate.  Simple results are returned as text/plain, while complex state (such as a list of jobs) can be returned as either XML or JSON.  Whatever your client-side technology of choice is, you can set the "accept" HTTP header to instruct the server to return the desired type back.  For example, myrequest.setHeader("accept", "application/json") will cause the scheduler REST service to return results back (if supported) as JSON.

The URLs listed in the examples below assume that your BI server is running on "localhost" port 8080.
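
As a rough sketch of the "accept" header negotiation described above, here is how a client might build (without sending) a GET request for the job list in Python.  The helper name build_get is made up for illustration, "application/xml" is assumed to be the media type for the XML variant, and authentication is not handled.

```python
import urllib.request

# Base URL used by the examples in this post (BI server on localhost:8080).
BASE_URL = "http://localhost:8080/pentaho/api/scheduler"


def build_get(path, accept="application/json"):
    """Build, but do not send, a GET request asking for the given media type.

    build_get is a hypothetical helper name, not part of any Pentaho client.
    """
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        headers={"Accept": accept},
        method="GET",
    )


json_request = build_get("jobs")
xml_request = build_get("jobs", accept="application/xml")  # assumed XML type
print(json_request.get_method(), json_request.full_url)
print(json_request.get_header("Accept"), xml_request.get_header("Accept"))
```

Passing one of these objects to urllib.request.urlopen would perform the actual call, assuming the server is running and credentials are supplied.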

Scheduler State
To get the state of the scheduler make a GET request to:
http://localhost:8080/pentaho/api/scheduler/state

The return type for this is text/plain and the result will be one of:
RUNNING, PAUSED or STOPPED

To control the state of the scheduler you must make a POST request.  In order to start or resume the scheduler as a whole:
http://localhost:8080/pentaho/api/scheduler/start

To pause the scheduler:
http://localhost:8080/pentaho/api/scheduler/pause

To shut down the scheduler (it must be rebooted after a shutdown):
http://localhost:8080/pentaho/api/scheduler/shutdown

Remember, these are POST requests; you cannot just paste the URL into a browser and expect it to work (that would issue a GET request).
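
To make the GET/POST split concrete, here is a minimal Python sketch that builds the scheduler-wide requests.  The function names are mine, the empty POST body is an assumption (the post mentions no payload for these actions), and the send helper only works against a running server with authentication sorted out separately.

```python
import urllib.request

BASE_URL = "http://localhost:8080/pentaho/api/scheduler"

# Actions named in the URLs above; each one changes server state, hence POST.
CONTROL_ACTIONS = ("start", "pause", "shutdown")


def state_request():
    """GET request for the scheduler state (RUNNING, PAUSED or STOPPED)."""
    return urllib.request.Request(
        f"{BASE_URL}/state", headers={"Accept": "text/plain"}, method="GET"
    )


def control_request(action):
    """POST request that starts, pauses or shuts down the scheduler."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unknown scheduler action: {action!r}")
    # Assumption: no payload is needed, so an empty body is sent.
    return urllib.request.Request(f"{BASE_URL}/{action}", data=b"", method="POST")


def send(request):
    """Send a request and return the text body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


pause = control_request("pause")
print(pause.get_method(), pause.full_url)
```

With a server available, send(control_request("pause")) followed by send(state_request()) would be one way to check that the state changed.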

Listing Jobs
Since listing jobs does not change any state on the server, the request for getting the list of jobs is a GET.

The following URL can be used as a GET request and will return XML or JSON.
http://localhost:8080/pentaho/api/scheduler/jobs

Job State
Like the scheduler itself, we can interact with the running state of an individual job.  Getting the state of a specific job requires that you submit the jobId (which is given in the list jobs REST call).  You can only get the state of a job that you created (own), unless you have administration privileges.

To get the state of a job, the REST url is:
http://localhost:8080/pentaho/api/scheduler/jobState

You must submit JSON or XML wrapping the jobId.  For example, the JSON request payload:
{"jobId":"joe:1685214344:1321154720424"}

The request header for the "Content-Type" is also set: myrequest.setHeader("Content-Type", "application/json");

The return type for this is text/plain and the result will be one of:
NORMAL, PAUSED, COMPLETE, ERROR, BLOCKED or UNKNOWN

Altering the state of a job is not much different from getting the state, except that a POST request must be made.  The REST URLs are:
http://localhost:8080/pentaho/api/scheduler/resumeJob
http://localhost:8080/pentaho/api/scheduler/pauseJob
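
A minimal sketch of pausing or resuming a single job, assuming the same JSON jobId wrapper and "Content-Type" header shown above for jobState.  The function name job_state_change is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/pentaho/api/scheduler"


def job_state_change(action, job_id):
    """Build a POST to pauseJob or resumeJob for a single job."""
    if action not in ("pauseJob", "resumeJob"):
        raise ValueError(f"unsupported action: {action!r}")
    # The jobId is wrapped in a small JSON object, as in the jobState example.
    body = json.dumps({"jobId": job_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = job_state_change("pauseJob", "joe:1685214344:1321154720424")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```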

Triggering a Job Immediately
To trigger the immediate execution of a job, you can invoke the triggerNow REST endpoint (POST) with the jobId wrapped with JSON or XML.  You must be authorized to execute the job in order to trigger it.


Deleting a Job

To remove a job from the scheduler, you can invoke the removeJob REST endpoint (DELETE) with the jobId wrapped with JSON or XML.  You must be authorized (job owner or admin) to delete the job from the scheduler.
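
Here is a small sketch of how triggerNow and removeJob differ only in HTTP method.  I'm assuming both endpoints live under the same /pentaho/api/scheduler/ base as the other URLs in this post; the helper name and the lookup table are mine.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/pentaho/api/scheduler"

# HTTP method per endpoint, as stated in the text: triggering a job is a POST
# and removing a job is a DELETE.
JOB_ENDPOINT_METHODS = {"triggerNow": "POST", "removeJob": "DELETE"}


def job_request(endpoint, job_id):
    """Build a request for one job; the URL layout is an assumption."""
    method = JOB_ENDPOINT_METHODS[endpoint]  # KeyError for unknown endpoints
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps({"jobId": job_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


job_id = "joe:1685214344:1321154720424"  # sample jobId from earlier in the post
for endpoint in ("triggerNow", "removeJob"):
    request = job_request(endpoint, job_id)
    print(request.get_method(), request.full_url)
```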

Creating a New Job

This is the most complex part of interacting with the scheduler.  In order to represent a new schedule, there are 3 "trigger" types: simple, complex and cron.

I'm just going to give some examples rather than document every possible combination.  First, let's use a simple schedule: run the Inventory.prpt every 4 hours until December 31, 2012.  The JSON payload would be:

{"inputFile":"/public/pentaho-solutions/steel-wheels/reports/Inventory.prpt", "outputFile":null, "simpleJobTrigger":{"repeatInterval":14400, "repeatCount":-1, "startTime":"2011-11-16T00:00:00.000-05:00", "endTime":"2012-12-31T23:59:59.000-05:00"}}

The inputFile is the full path to the Inventory.prpt resource.  We're using a simple trigger, meaning that we don't worry about special recurrence patterns; we just want to run every 4 hours until the "endTime" has been reached.  The repeatInterval is 14400 seconds, which equals 4 hours.  If you want to repeat a specific number of times until the trigger is no longer fired (in lieu of endTime) you can give a repeatCount.  A repeatCount of -1 means forever.
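
For anyone who would rather generate this payload than type it, here is a hedged Python sketch that derives the repeatInterval from a number of hours and formats the timestamps with a -05:00 offset.  The function names are made up; only the field names come from the payload above.

```python
import json
from datetime import datetime, timedelta, timezone

# The -05:00 offset used by the timestamps in this post.
EASTERN = timezone(timedelta(hours=-5))


def iso(moment):
    """Format a timezone-aware datetime like 2011-11-16T00:00:00.000-05:00."""
    return moment.isoformat(timespec="milliseconds")


def simple_job(input_file, every_hours, start, end=None, repeat_count=-1):
    """Build a simple-trigger payload; repeatInterval is given in seconds."""
    return {
        "inputFile": input_file,
        "outputFile": None,
        "simpleJobTrigger": {
            "repeatInterval": int(timedelta(hours=every_hours).total_seconds()),
            "repeatCount": repeat_count,
            "startTime": iso(start),
            "endTime": iso(end) if end else None,
        },
    }


payload = simple_job(
    "/public/pentaho-solutions/steel-wheels/reports/Inventory.prpt",
    every_hours=4,
    start=datetime(2011, 11, 16, tzinfo=EASTERN),
    end=datetime(2012, 12, 31, 23, 59, 59, tzinfo=EASTERN),
)
print(json.dumps(payload))
```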

Next, let's imagine we want to schedule the Product Line Sales.prpt every Sunday at 2am with no end date.

The REST endpoint is http://localhost:8080/pentaho/api/scheduler/createJob.  The JSON payload would be something like this:

{"inputFile":"/public/pentaho-solutions/steel-wheels/reports/Product Line Sales.prpt", "outputFile":null, "complexJobTrigger":{"daysOfWeek":["0"], "startTime":"2011-11-16T02:00:00.000-05:00", "endTime":null}}

Dissecting this, we can see the inputFile is set to the full path to the scheduled resource.  We are creating a "complex" job trigger with a recurrence pattern of "daysOfWeek" including just "0", meaning Sunday; the days range from 0-6.  If the trigger was going to be for multiple days of the week, this would be given as "daysOfWeek":["0","1"] (for Sunday/Monday).  All times are in ISO 8601 date format (this is true for dates coming out of the scheduler REST services as well).  The startTime specifies the "from" date and endTime refers to the date at which the schedule will no longer be run.  A null value for the endTime means it has no end.
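
As a sketch of the above, this Python snippet maps day names to the 0-6 indices and wraps the payload in a request to createJob.  The helper names are mine, and using POST for createJob is an assumption on my part (creating a job changes server state), as is sending the payload as application/json.

```python
import json
import urllib.request

CREATE_JOB_URL = "http://localhost:8080/pentaho/api/scheduler/createJob"

# Sunday is "0" and Saturday is "6", per the description above.
DAY_NAMES = ["sunday", "monday", "tuesday", "wednesday",
             "thursday", "friday", "saturday"]
DAY_INDEX = {name: str(index) for index, name in enumerate(DAY_NAMES)}


def weekly_job(input_file, days, start_time, end_time=None):
    """Build a complex-trigger payload that recurs on the given weekdays."""
    return {
        "inputFile": input_file,
        "outputFile": None,
        "complexJobTrigger": {
            "daysOfWeek": [DAY_INDEX[day.lower()] for day in days],
            "startTime": start_time,
            "endTime": end_time,  # None serializes to null: no end date
        },
    }


def create_job_request(payload):
    """Wrap a payload in a request to createJob (POST is assumed)."""
    return urllib.request.Request(
        CREATE_JOB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = weekly_job(
    "/public/pentaho-solutions/steel-wheels/reports/Product Line Sales.prpt",
    ["Sunday"],
    "2011-11-16T02:00:00.000-05:00",
)
request = create_job_request(payload)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```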

Another example, "The last Friday of every month at 4am" would have a JSON payload of:

{"inputFile":"/public/pentaho-solutions/steel-wheels/reports/Income Statement.prpt", "outputFile":null, "complexJobTrigger":{"weeksOfMonth":["4"], "daysOfWeek":["5"], "startTime":"2011-11-16T04:00:00.000-05:00", "endTime":null}}

Finally, a yearly schedule, "Every January 1st at midnight":

{"inputFile":"/public/pentaho-solutions/steel-wheels/reports/Invoice Statements.prpt", "outputFile":null, "complexJobTrigger":{"monthsOfYear":["0"], "daysOfMonth":["1"], "startTime":"2011-11-16T00:00:00.000-05:00", "endTime":null}}


The Workspace
With all the REST details behind me now, I can finally cover some new UI work that I've been doing over the past few weeks. As I said before, the new workspace interacts with the server exclusively through REST web services, meaning that it is possible for someone with better UI skills to replace it (by removing the default one from the default-plugin/plugin.xml).


The old workspace listed all content for each schedule, which was unbelievably unmanageable. It was also rather clunky when it came to starting/stopping/removing schedules and their output content, and it lacked the ability to manage the scheduler as a whole (start/stop).


The new workspace lists schedules (aka jobs), not content (output) from those jobs. You can start/stop the entire scheduler or pause/resume individual jobs. A human readable description of each schedule is provided. Each column in the table view can be sorted. If there are many schedules, the table will enter a "paging" mode. If there are still too many schedules to find what you are looking for you can easily add a filter. You can multi-select (with the help of CTRL or SHIFT keys) and manage many schedules at once. Selected jobs can be triggered to run immediately, paused, resumed or removed permanently. When you click on a cell in the file (resource) column you can view and manage (TBD) content from previous executions of that schedule.


Workspace View showing multi-select "pause" (notice state of selected items)

You can filter the list of jobs by file, state, user, schedule type and execution times

Selecting a file link will show past execution history and allow content to be viewed.

We're not done with the scheduling yet, but we've been making incredible progress. We still need to finish (WIP) parameter support and define (TBD) what content management can be done from the history (generated content dialog).



Tuesday, November 1, 2011

Plugin Overlays, PUC Layout

I was thinking about my previous post on PUC Perspectives and some of the things that were necessary to make that happen when it dawned on me that I hadn't really highlighted some really cool changes.  There are two primary changes that are worth talking about:  menubar and PUC layout.

Menubar
The PUC menubar was a GWT menubar, not much different from the one in the proof-of-concept that I did back in March 2008.  James Dixon later extended this by adding the ability to define "menu customizations" through our plugin system.  What we really needed, though, was a total rewrite of the menu system, but making such a drastic change was never a priority.  Fortunately, we were able to justify the rewrite with the fact that, without it, doing perspective overlays for the menubar was going to be pretty darn near impossible.  That is, without adding hacks upon existing hacks.  And so, the menubar was rewritten and XULified.

We now have a main_menubar.xul which defines the content and layout of the main menubar.  This will make it MUCH easier for 3rd party/OEMs to add/remove/update any/all of the behavior of the menubar simply by tweaking the XUL file.

This means that plugin.xml which used to contain "menu-items" will now use an overlay.  Most plugins which defined menu-items already had an overlay section to define toolbar tweaks.  The same overlay section is used for both menu and toolbar changes, for example,

<overlay id="startup.analyzer"  resourcebundle="content/analyzer/resources/messages">
    <toolbar id="mainToolbar">
      <toolbarbutton id="newAnalysisButton" image="../api/repos/xanalyzer/images/analyzer_toolbar_icon.png" onclick="mainToolbarHandler.openUrl('${tabName}','${tabName}','api/repos/xanalyzer/service/selectSchema')" tooltiptext="${openNewAnalyzerReport}" insertafter="dummyPluginContentButton"/>
    </toolbar>
    <menubar id="newmenu">
      <menuitem id="new-analyzer" label="${openNewAnalyzerReport}" command="mainMenubarHandler.openUrl('${tabName}','${tabName}','api/repos/xanalyzer/service/selectSchema')" />
    </menubar>
</overlay>

There are a few subtle differences here from pre-SUGAR overlay definitions.  I have fixed the annoying bug in the plugin system which required nesting an overlay inside of an overlay (simply due to an XML parsing bug), for example:

<overlay id="startup.analyzer"  resourcebundle="content/analyzer/resources/messages">
    <overlay id="startup.analyzer"  resourcebundle="content/analyzer/resources/messages">


Another benefit is that overlays can have resource bundles associated with them, while the old menu-item section did not.  This allows us to localize the display strings in the menu system.

PUC Layout
I was recently asked by James Dixon if it would be possible to update the entire layout of PUC with XUL.  I said we needed to get the story on our sprint backlog, which I ended up doing (BISERVER-6693).  Instead of using XUL, which might be a barrier to entry for some customers/OEMs, a much easier solution to the PUC layout actually exists:  just use HTML.  What if the PUC layout existed in HTML and we just injected widgets into various ids?  So, in SUGAR, I have done just this: the layout of PUC is based on DIV tags in the HTML (Mantle.jsp).  When PUC loads, it no longer "takes over" the page; it now looks for certain elements by id, such as "pucMenuBar" or "pucPerspectives", and then injects the widget at that location.  This will allow easier customization.  For example, in SUGAR we have actually removed the "logo panel" from PUC itself; with the DIV-based layout, we can easily add a logo panel back into the product by editing the HTML.  This is still a work in progress: the "pucContent" element is very high-level and refers to the entire bottom section of PUC (explorer + content).  The next phase will be to define the layout even further, but we've taken steps in this direction and what has been done is beyond concept: it's committed.

The layout of PUC can be defined as something as simple as this:

<div id="puc" style="height: 100%">

    <div id="pucTopBar" style="background-color: black; height: 28px">
        <div id="pucMenuBar" style="float: left">
        </div>

        <div id="pucPerspectives" style="float: right;">
        </div>
    </div>

    <div id="pucToolBar" style="clear: both; float: left; width: 100%">
    </div>

    <div id="pucContent" style="clear: both; height: 100%; width: 100%">
    </div>

</div>

Enjoy!