Turnkey solution design for large scale Microsoft Forms automation and Data collation
Background
Now that both Jenzus Hsu and myself have demonstrated how you can leverage modern technologies to automate large scale creation of Microsoft Forms. Its time to revisit the solution architecture, design and costs considerations for such a turnkey solution.
I’ll recommend that you spend some time to read the following posts first to get a sense of the capabilities of Power Platform as compared to bespoke software development.
- Building Contact Tracing (Check-in) Canvas App using Power Apps (link)
- Automating Microsoft Forms Creation (link)
- 🤖Power Automate UI flows — Automating Microsoft Forms Creation (link)
Solution Design
This is a 2-part solution design, where the output of the first solution design will become one of the inputs for the second design
The second solution design focuses on collating the responses from all of the forms into multiple associated excel files and also to automate the insertion of records into a SQL database
Data Extraction via Power Platform
From a technical viewpoint, Microsoft Flow offers connectors to Microsoft Flow and Microsoft SQL server out of the box. Enabling them as output source is a simple matter of drag and drop and connecting the elements together. As demonstrated by Jenzus, Power Automate UI flows employs Selenium together with Edge(Chromium) to provide one of the best in class support for web browser automation.
If cost is not a consideration, this is perhaps one of the most elegant way to consolidate all of the responses from the forms.
Data Extraction via Bespoke software
Lets now consider the other angle where a developer decides to build upon the initial work done to now collate all of the form responses.
Considerations
- Throttling of Office365 services (link): Microsoft can and will throttle your requests when you send too many at one go. Understanding of this information is crucial to the solution design.
- Parallelism: Whilst one can go about collecting each of the form sequentially, parallelism can vastly improve the speed of data collection resulting in a quicker turnaround time.
- AzureAD vs ADFS: You will need to consider the difference in behavior between Azure AD and ADFS
- Forms-API: Fortunately as Microsoft Forms is a modern application, there are certain APIs that are used to provide data to the UI. You can call these APIs to obtain the raw data.
Obviously this is NOT the recommended way, but it does make life alot simpler - Checkpoint: Similar to High Performance Computing (HPC) jobs, checkpoint is important as it helps the program to keep track of state and resume where it last left off. This can also help with #1 as the program can then be smart enough to download those which are still outstanding rather than downloading the old ones all over again.
- Excel rows limit: Each excel file can have up to 1M rows only.
Developing the solution (code-map)
The program is split into 3 main functions
- Create batch files
- Extract Office365 User’s Forms Responses as Excel files
- Consolidate and combine responses into one or many Excel files
Create batch files
Batching is a common concept in HPCC where a task is split across multiple processors to reduce the run time.
Obviously this is not going to run on a supercomputer, so what it does is to create multiple batch files to maximize the available cores within the current host.
When the batch files are written to disk, the program will also generate a runall.bat which will trigger all of these jobs to run in parallel.
Extract Office365 User’s Forms Responses as Excel files
Extraction is split into 2 parts, with a primary function (ExtractO365UserMSForms) handling the back-off process from O365 and a secondary function (ExtractSpecificO365UserMSForms) that does the actual extraction.
The primary function takes in the inputs based on the solution design for data extraction and consolidation, loops thru each of the identified user accounts and tracks the success/failure for each of these accounts.
In the event any of these accounts hit the throttling limit, the application will proceed with other accounts first and loop back to those that got throttled after the back-off period is observed.
It is quite common for this process to take a while to run as you will encounter the throttling limits frequently during the downloads and have to wait for a certain period before retrying. i.e most of the time is spent waiting.
The secondary function will first obtain a current list of all of the specified user’s forms and associated no of responses. If there is an existing state file, the state file is updated with the delta changes from the current list. The function then continues from where it last left off and only attempts to download the responses for those that have at least 1 response. In the event the function hits an HTTP error 429 or 503, it will stop and defer execution back to the primary function.
Consolidate and combine responses into one or many Excel files
As there can potentially be thousands of responses, this program will loop thru all of the exported form responses and collate them into one or many excel files (bearing in mind the 1-mil row limits)
At the same time, this function will connect to a SQL database and attempt to update the delta changes into a primary table for ease of data analysis.
Conclusions
Both of us have presented two different viewpoints on how you can approach a potentially challenging and tricky problem to solve. I’ve summarized the various considerations into the table below for ease of reading.
At the end of the day, both methods will work as long as the team has the relevant expertise and licenses.
That said typically RPA/low code platforms are meant for citizen developers to develop user-based applications i.e automate something they do frequently.
Bespoke development is usually miles ahead of any low code solution if the team is extremely well versed and strong in software development. A properly documented and maintained source code is a pleasure to read for any software developers and helps to build up the team’s capabilities as a whole.
The opinions and views expressed here are those of my own.