Turnkey solution design for large scale Microsoft Forms automation and Data collation

Jeffery Tay
6 min readFeb 15, 2020

--

Background

Now that both Jenzus Hsu and myself have demonstrated how you can leverage modern technologies to automate large scale creation of Microsoft Forms. Its time to revisit the solution architecture, design and costs considerations for such a turnkey solution.

I’ll recommend that you spend some time to read the following posts first to get a sense of the capabilities of Power Platform as compared to bespoke software development.

  1. Building Contact Tracing (Check-in) Canvas App using Power Apps (link)
  2. Automating Microsoft Forms Creation (link)
  3. 🤖Power Automate UI flows — Automating Microsoft Forms Creation (link)

Solution Design

This is a 2-part solution design, where the output of the first solution design will become one of the inputs for the second design

Solution design for automating creation of Microsoft Forms

The second solution design focuses on collating the responses from all of the forms into multiple associated excel files and also to automate the insertion of records into a SQL database

Solution design for data extraction and consolidation

Data Extraction via Power Platform

From a technical viewpoint, Microsoft Flow offers connectors to Microsoft Flow and Microsoft SQL server out of the box. Enabling them as output source is a simple matter of drag and drop and connecting the elements together. As demonstrated by Jenzus, Power Automate UI flows employs Selenium together with Edge(Chromium) to provide one of the best in class support for web browser automation.

If cost is not a consideration, this is perhaps one of the most elegant way to consolidate all of the responses from the forms.

Data Extraction via Bespoke software

Lets now consider the other angle where a developer decides to build upon the initial work done to now collate all of the form responses.

Considerations

  1. Throttling of Office365 services (link): Microsoft can and will throttle your requests when you send too many at one go. Understanding of this information is crucial to the solution design.
  2. Parallelism: Whilst one can go about collecting each of the form sequentially, parallelism can vastly improve the speed of data collection resulting in a quicker turnaround time.
  3. AzureAD vs ADFS: You will need to consider the difference in behavior between Azure AD and ADFS
  4. Forms-API: Fortunately as Microsoft Forms is a modern application, there are certain APIs that are used to provide data to the UI. You can call these APIs to obtain the raw data.
    Obviously this is NOT the recommended way, but it does make life alot simpler
  5. Checkpoint: Similar to High Performance Computing (HPC) jobs, checkpoint is important as it helps the program to keep track of state and resume where it last left off. This can also help with #1 as the program can then be smart enough to download those which are still outstanding rather than downloading the old ones all over again.
  6. Excel rows limit: Each excel file can have up to 1M rows only.

Developing the solution (code-map)

Solution code map

The program is split into 3 main functions

  1. Create batch files
  2. Extract Office365 User’s Forms Responses as Excel files
  3. Consolidate and combine responses into one or many Excel files

Create batch files

Batching is a common concept in HPCC where a task is split across multiple processors to reduce the run time.

Obviously this is not going to run on a supercomputer, so what it does is to create multiple batch files to maximize the available cores within the current host.

When the batch files are written to disk, the program will also generate a runall.bat which will trigger all of these jobs to run in parallel.

Code map for Generate Batch files

Extract Office365 User’s Forms Responses as Excel files

Extraction is split into 2 parts, with a primary function (ExtractO365UserMSForms) handling the back-off process from O365 and a secondary function (ExtractSpecificO365UserMSForms) that does the actual extraction.

The primary function takes in the inputs based on the solution design for data extraction and consolidation, loops thru each of the identified user accounts and tracks the success/failure for each of these accounts.

In the event any of these accounts hit the throttling limit, the application will proceed with other accounts first and loop back to those that got throttled after the back-off period is observed.

It is quite common for this process to take a while to run as you will encounter the throttling limits frequently during the downloads and have to wait for a certain period before retrying. i.e most of the time is spent waiting.

Code map for Extract Office365 Forms for multiple users

The secondary function will first obtain a current list of all of the specified user’s forms and associated no of responses. If there is an existing state file, the state file is updated with the delta changes from the current list. The function then continues from where it last left off and only attempts to download the responses for those that have at least 1 response. In the event the function hits an HTTP error 429 or 503, it will stop and defer execution back to the primary function.

Code map for Extract Office365 Forms for a specific user

Consolidate and combine responses into one or many Excel files

As there can potentially be thousands of responses, this program will loop thru all of the exported form responses and collate them into one or many excel files (bearing in mind the 1-mil row limits)

At the same time, this function will connect to a SQL database and attempt to update the delta changes into a primary table for ease of data analysis.

Code map for Combine Excel

Conclusions

Both of us have presented two different viewpoints on how you can approach a potentially challenging and tricky problem to solve. I’ve summarized the various considerations into the table below for ease of reading.

At the end of the day, both methods will work as long as the team has the relevant expertise and licenses.

That said typically RPA/low code platforms are meant for citizen developers to develop user-based applications i.e automate something they do frequently.

Bespoke development is usually miles ahead of any low code solution if the team is extremely well versed and strong in software development. A properly documented and maintained source code is a pleasure to read for any software developers and helps to build up the team’s capabilities as a whole.

The opinions and views expressed here are those of my own.

--

--

Jeffery Tay

Education is in my blood, partnership and coaching is my passion. ¬ L’essentiel est invisible pour les yeux