


In this post I'd like to share some knowledge, based on recent experience, about the performance of Azure Data Factory when loading data from Azure Data Lake into a database, specifically when using the Copy Activity.

What I'm talking about here comes down to the difference between loading data one file at a time and loading an entire set of files in a folder. The screenshot below shows a typical pattern we use: we start by getting a list of files that we want to load. We have a couple of tables behind this that tell us which files are available, along with a list of the files that may already have been loaded to our target. The other screenshot shows a typical pattern we run for each of those files: a stored procedure puts an entry into a table to say the load has started, the Copy Activity runs, and then we record whether it succeeded or failed at the end. If you're coming from an SSIS background, the ForEach Loop is a powerful technique and it's not a big deal to loop through hundreds of files.
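To make that file-by-file pattern a little more concrete, here is a minimal sketch of what such a pipeline could look like in ADF pipeline JSON. The dataset names, the control query and the logging stored procedures (etl.LogFileLoadStart, etl.LogFileLoadEnd) are hypothetical placeholders standing in for the objects shown in the screenshots, not our exact definitions.

```json
{
  "name": "LoadLakeFilesOneByOne",
  "properties": {
    "activities": [
      {
        "name": "GetFileList",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT FileName FROM etl.AvailableFiles WHERE LoadedDate IS NULL"
          },
          "dataset": { "referenceName": "ControlDatabase", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('GetFileList').output.value", "type": "Expression" },
          "isSequential": false,
          "activities": [
            {
              "name": "LogFileStart",
              "type": "SqlServerStoredProcedure",
              "linkedServiceName": { "referenceName": "ControlDatabase", "type": "LinkedServiceReference" },
              "typeProperties": {
                "storedProcedureName": "etl.LogFileLoadStart",
                "storedProcedureParameters": {
                  "FileName": { "value": "@item().FileName", "type": "String" }
                }
              }
            },
            {
              "name": "CopyOneFile",
              "type": "Copy",
              "dependsOn": [ { "activity": "LogFileStart", "dependencyConditions": [ "Succeeded" ] } ],
              "inputs": [ {
                "referenceName": "LakeDelimitedFile",
                "type": "DatasetReference",
                "parameters": { "fileName": "@item().FileName" }
              } ],
              "outputs": [ { "referenceName": "SqlTargetTable", "type": "DatasetReference" } ],
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "AzureSqlSink" }
              }
            },
            {
              "name": "LogFileEnd",
              "type": "SqlServerStoredProcedure",
              "dependsOn": [ { "activity": "CopyOneFile", "dependencyConditions": [ "Succeeded" ] } ],
              "linkedServiceName": { "referenceName": "ControlDatabase", "type": "LinkedServiceReference" },
              "typeProperties": {
                "storedProcedureName": "etl.LogFileLoadEnd",
                "storedProcedureParameters": {
                  "FileName": { "value": "@item().FileName", "type": "String" },
                  "Status": { "value": "Succeeded", "type": "String" }
                }
              }
            }
          ]
        }
      }
    ]
  }
}
```

Every activity inside that inner loop, the two logging calls and the copy itself, is scheduled and provisioned separately for every single file, which is where the overhead described next comes from.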

But in Azure Data Factory, the story is a bit different. Every one of the tasks we see here, even the logging, start and completion steps alongside the copy itself, requires some start-up effort in Data Factory. The mechanism used behind the scenes is quite different: resources must be provisioned, and initiating each of these tasks can take some time. If you're dealing with a long list of files, you're going to run into some severe performance problems.

That being said, we've recently shifted our approach in many cases away from loading data file by file and towards pointing the Copy Activity at a folder instead. If the files in that folder all have a structure that is identical to, or compatible with, your Copy Activity, we can copy all of those files at once rather than in a loop. In that case, our logic changes in terms of how we keep track of which files or folders we have and have not loaded, but in the end, making that change will give you some tremendous performance gains.
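For the folder-based approach, a rough sketch of the same Copy Activity pointed at a folder might look like the following, again with hypothetical dataset and folder names; the wildcard and recursive store settings are what let a single activity run pick up every matching file.

```json
{
  "name": "CopyWholeFolder",
  "type": "Copy",
  "inputs": [ { "referenceName": "LakeFolderDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SqlTargetTable", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "incoming/sales/2021-10",
        "wildcardFileName": "*.csv"
      }
    },
    "sink": { "type": "AzureSqlSink" }
  }
}
```

With this shape, one Copy Activity run ingests the whole folder, and the control tables only need to record which folders have been loaded rather than tracking individual file names.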

As with many things, how you make that decision will vary depending on several factors. For us, it came down to the number of files we were processing, which would have taken too long to loop through, so we preferred to load by folder. In summary, we're shifting more towards patterns where we load data from all the files in a folder, and then maybe loop through a smaller list of folders if needed, and away from patterns where we process things one file at a time.

If you have questions about Azure Data Factory, data warehousing or anything Azure related, we're here to help.
