Advanced Azure Data Factory (ADF) interview questions and answers

Sanjay Kumar PhD
Nov 23, 2024


1. How to Rerun a Pipeline from the Azure Data Factory Monitor?

To rerun a pipeline that has already been triggered:

  1. Navigate to the Azure Data Factory portal and select the ‘Monitor’ tab from the main menu.
  2. Under the Pipeline Runs section, locate the pipeline you wish to rerun.
  3. Click on the specific pipeline to open its details.
  4. Look for the rerun icon (red circular arrow) and click on it to restart the pipeline.
  • Note: Ensure that all prior pipeline conditions or data states are validated before rerunning.
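
For automation or auditing, the same pipeline runs shown in the Monitor tab can also be listed programmatically. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK, assuming the method names of recent SDK versions; the subscription, resource group, and factory names are placeholders.

```python
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

# Placeholders: substitute your own subscription, resource group, and factory names.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# List pipeline runs from the last 24 hours -- the same data the Monitor tab displays.
runs = client.pipeline_runs.query_by_factory(
    "<resource-group>",
    "<factory-name>",
    RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(days=1),
        last_updated_before=datetime.utcnow(),
    ),
)
for run in runs.value:
    print(run.run_id, run.pipeline_name, run.status)
```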

2. How can you debug only the first 10 activities in a pipeline with 15 activities?
Azure Data Factory allows selective debugging using the ‘Debug Until’ feature:

  1. Open the pipeline in the ADF authoring interface.
  2. Locate the activity where you want to stop debugging (in this case, the 10th activity).
  3. Click on the red circle located at the top of the selected activity.
  4. Choose the ‘Debug Until’ option, which runs the pipeline until that activity.
  • This method prevents executing unnecessary steps and helps in isolating issues efficiently.

3. How to Restart Failed Pipeline Jobs in Azure Data Factory?

When a pipeline fails, ADF provides flexible options to restart or continue execution:

Rerun from the Beginning:

  • This option restarts the pipeline from the first activity.
  • Note: All previously succeeded activities are re-executed as well, so verify downstream data states to avoid loading duplicate data.

Rerun from Failed Activity:

  • This option starts the pipeline directly from the activity that failed.
  • Useful when you want to resume the pipeline without repeating previous successful steps.

Rerun from a Specific Activity:

  • Open the pipeline run’s details view, select the activity you want to rerun from, and choose Rerun from that activity (a programmatic sketch of these rerun options follows below).
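
These rerun modes are also exposed through the Pipelines - Create Run API. A minimal sketch with the azure-mgmt-datafactory Python SDK is below, reusing the `client` object from the earlier example; the run ID and names are placeholders.

```python
# Rerun an existing (failed) run in recovery mode, starting from the failed activity.
# Pass start_activity_name instead of start_from_failure to rerun from a specific activity,
# or omit the recovery arguments entirely to rerun from the beginning.
run = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<factory-name>",
    pipeline_name="<pipeline-name>",
    reference_pipeline_run_id="<failed-run-id>",  # the original run being recovered
    is_recovery=True,
    start_from_failure=True,
)
print(run.run_id)
```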

4. How to Define and Use Activity Dependencies in Pipelines?

Activity dependencies determine how and when subsequent activities in a pipeline execute, based on the status of preceding activities. Dependency conditions include:

  1. Succeeded: Runs the next activity only if the previous one succeeded.
  2. Failed: Runs the next activity only if the previous one failed.
  3. Completed: Runs the next activity regardless of success or failure.
  4. Skipped: Runs the next activity if the preceding one was skipped.
  • Example: For a dependency Activity A -> Activity B, the condition attached to the dependency determines whether Activity B runs after Activity A (a JSON sketch follows this list):
  • Succeeded Condition: Activity B executes only when Activity A completes successfully.
  • Skipped Condition: Activity B executes only if Activity A was skipped.
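
Dependencies are declared in the pipeline JSON through each activity’s dependsOn block. The fragment below (written as a Python dict mirroring that JSON) is a sketch; the activity names and types are placeholders.

```python
# Activity B runs only if Activity A succeeds; Activity C runs once A finishes,
# whether A succeeded or failed (the "Completed" condition).
activities = [
    {"name": "ActivityA", "type": "Copy"},
    {
        "name": "ActivityB",
        "type": "Copy",
        "dependsOn": [{"activity": "ActivityA", "dependencyConditions": ["Succeeded"]}],
    },
    {
        "name": "ActivityC",
        "type": "Copy",
        "dependsOn": [{"activity": "ActivityA", "dependencyConditions": ["Completed"]}],
    },
]
```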

5. How can you share a Self-Hosted Integration Runtime (IR) between Data Factories?

Grant Permission to Share:

  • While setting up the original Self-Hosted IR, enable sharing via the ‘Grant Permission’ option.
  • Specify the target Data Factory with which the IR should be shared.

Create a Linked IR in the Target Data Factory:

  • In the target Data Factory, create a new Linked IR.
  • Provide the resource ID of the original shared IR during setup.

Finalize and Save:

  • After the configuration, the shared IR will be available for use in the target Data Factory.
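
The linked IR created in the target factory is an integration runtime definition whose linkedInfo points at the shared IR’s resource ID. A rough sketch of that JSON (shown as a Python dict) is below; the resource ID and names are placeholders, and RBAC-based authorization is assumed.

```python
# Linked self-hosted IR definition in the target Data Factory (sketch only).
linked_ir = {
    "name": "SharedSelfHostedIR-Linked",
    "properties": {
        "type": "SelfHosted",
        "typeProperties": {
            "linkedInfo": {
                "authorizationType": "Rbac",  # assumes RBAC-based sharing
                "resourceId": (
                    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                    "Microsoft.DataFactory/factories/<source-factory>/"
                    "integrationRuntimes/<shared-ir-name>"
                ),
            }
        },
    },
}
```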

6. How can you send email notifications in Azure Data Factory?

Set up a Logic App:

  • Create a new Logic App in the Azure portal with an HTTP request trigger.
  • Add an action that sends an email (e.g., via the Outlook, Gmail, or another email connector).
  • Obtain the trigger’s endpoint URL after saving the Logic App.

Integrate with ADF Using Web Activity:

  • Add a Web Activity to your pipeline.
  • Provide the Logic App endpoint URL in the settings of the Web Activity.
  • Use the Web Activity to send an HTTP POST request containing details about the pipeline failure (e.g., error message, pipeline name).
  • This enables automatic email notifications for pipeline issues.
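
The Web Activity’s body simply has to match the JSON schema defined on the Logic App’s HTTP trigger. A sketch of such a payload is below; the property names, receiver address, and the activity name 'CopyData' are illustrative, while the @{...} expressions are standard ADF system variables resolved at run time.

```python
# JSON body posted by the Web Activity to the Logic App endpoint (shown as a Python dict).
notification_body = {
    "factoryName": "@{pipeline().DataFactory}",
    "pipelineName": "@{pipeline().Pipeline}",
    "runId": "@{pipeline().RunId}",
    "errorMessage": "@{activity('CopyData').error.message}",  # 'CopyData' is a placeholder activity
    "receiver": "team-alerts@example.com",
}
```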

7. What is the Get Metadata Activity, and when should it be used?

The Get Metadata Activity retrieves metadata about datasets, such as file names and folder structures. Use cases include:

  1. Validating Data: Check for the presence of files or folders before proceeding.
  2. Triggering Pipelines: Automatically initiate a pipeline when specific data becomes available.
  3. Control Flow Logic: Use metadata outputs in conditional expressions or looping structures to dynamically control pipeline execution.
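
A sketch of a Get Metadata activity and how its output is consumed downstream is shown below; the activity, dataset, and folder names are placeholders chosen for illustration.

```python
# Get Metadata activity definition (JSON shape shown as a Python dict).
get_metadata_activity = {
    "name": "GetFolderMetadata",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {"referenceName": "InputFolderDataset", "type": "DatasetReference"},
        "fieldList": ["exists", "childItems", "lastModified"],
    },
}
# Downstream, an If Condition or ForEach can reference the output, for example:
#   @activity('GetFolderMetadata').output.exists
#   @activity('GetFolderMetadata').output.childItems
```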

8. What is the Lookup Activity, and how is it used?

The Lookup Activity fetches data from supported data sources in ADF. Use it for:

  1. Dynamically determining which objects (e.g., files, tables) to process in subsequent activities.
  2. Handling scenarios where object names are not hardcoded but retrieved dynamically based on a query.
  • Example: Retrieve the latest file name from a folder and pass it as a parameter to the next activity.
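
A sketch of that pattern is below: a Lookup activity reads the latest file name from a hypothetical control table, and a later activity picks it up through an expression. The query, dataset, and column names are placeholders.

```python
# Lookup activity definition (JSON shape shown as a Python dict).
lookup_activity = {
    "name": "LookupLatestFile",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT TOP 1 FileName FROM dbo.FileControl ORDER BY LoadDate DESC",
        },
        "dataset": {"referenceName": "ControlTableDataset", "type": "DatasetReference"},
        "firstRowOnly": True,
    },
}
# A subsequent activity can consume the value, e.g. as a parameter:
#   @activity('LookupLatestFile').output.firstRow.FileName
```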

9. How do you improve performance when running many pipelines that take too long?

  1. Batch Processing: Split pipelines into smaller, manageable batches.
  2. Use Multiple Integration Runtimes: Distribute the workload across multiple Integration Runtimes to balance resource utilization and improve execution speed.

10. How do you trigger a pipeline when a file arrives in Blob Storage?

Use an Event-Based Trigger:

  1. Create a new trigger and select the Event-Based type.
  2. Specify the conditions:
  • Blob Path Begins With: Define the folder structure.
  • Blob Path Ends With: Specify file names or extensions to monitor.

Choose the event type:

  • Blob Created: Trigger the pipeline when a new file is added.
  • Blob Deleted: Trigger the pipeline when a file is removed.
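
Put together, an event-based trigger definition looks roughly like the sketch below (JSON shape shown as a Python dict); the storage account resource ID, paths, and pipeline name are placeholders.

```python
# Event-based trigger that fires when a .csv file lands under /input-container/landing/.
blob_event_trigger = {
    "name": "OnFileArrival",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "scope": (
                "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                "Microsoft.Storage/storageAccounts/<storage-account>"
            ),
            "blobPathBeginsWith": "/input-container/blobs/landing/",
            "blobPathEndsWith": ".csv",
            "events": ["Microsoft.Storage.BlobCreated"],
        },
        "pipelines": [
            {"pipelineReference": {"referenceName": "IngestNewFile", "type": "PipelineReference"}}
        ],
    },
}
```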

11. How can you call a pipeline in another Data Factory?

Use a Web Activity with the Azure REST API:

  1. Add a Web Activity to the pipeline in the calling Data Factory.
  2. Configure the API URL to reference the target Data Factory and pipeline.
  3. Grant the Contributor role on the target Data Factory to the calling Data Factory’s managed identity (see the configuration sketch below).
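
A sketch of such a Web Activity is below (JSON shape shown as a Python dict). It calls the Pipelines - Create Run REST endpoint of the target factory and authenticates with the calling factory’s managed identity; the subscription, resource group, factory, and pipeline names are placeholders.

```python
# Web Activity that starts a pipeline in another Data Factory via the ADF REST API.
web_activity = {
    "name": "RunRemotePipeline",
    "type": "WebActivity",
    "typeProperties": {
        "url": (
            "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/<rg>/"
            "providers/Microsoft.DataFactory/factories/<target-factory>/"
            "pipelines/<target-pipeline>/createRun?api-version=2018-06-01"
        ),
        "method": "POST",
        "body": {},  # optionally pass pipeline parameters here
        "authentication": {"type": "MSI", "resource": "https://management.azure.com/"},
    },
}
```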
