Published on 10.09.2019

Engineering

Data Engineer Intern

 

Within Sigfox, the Data & Analytics team is responsible for collecting data and making it available for analytics and other services. Data freshness is becoming increasingly critical for the company. At the same time, as the volume of collected data grows, keeping the cost of processing it under control is becoming more important. Most data processing is currently done with Apache Spark. The aim of the internship is to fine-tune the sizing and configuration of our Spark clusters to offer the best trade-off between performance and cost. Some hybrid cloud challenges will also be explored during the internship.

 

WIN/WIN MISSIONS:

As part of the Data & Analytics team, the intern will work on the following tasks:

 

  • Get familiar with the existing data processing pipeline

  • Study the Spark documentation to learn configuration best practices

  • Define different Spark cluster architectures and data preparation plans to be tested

  • Carry out performance tests

  • Write test reports

  • Write technical documentation on how best to size and configure Spark clusters for Sigfox’s use case

 Technical environment: Python, Apache Spark, AWS S3, AWS EMR, Bitbucket, JIRA

 

KEY SKILLS & MINDSET:

The candidate should be in the final year of a master’s degree in computer engineering. Ideally, the candidate will demonstrate the following skills:

  • Proficient in Python programming

  • Knowledge of some open source tools for Big Data (Spark, Hadoop, Flink, Storm, Hive)

  • Interest in performance tuning

  • Proactive

  • Team player

 

 

"Sigfox, as a learning organization that is open-minded about Diversity, is ready to welcome Extra-ordinary people and adapt their Workplace."

Detail

Labège

Internship