Published on 12.05.2017

Engineering

Big Data developer internship

Here at Sigfox we have set up an analytics platform that allows the company's different services to build their own data analyses. The platform is based on a Datalake/Datamart architecture which centralises most of the company's data. The aim of the internship is to improve the ETL process that populates the datalake/datamart by implementing smart automatic schema change detection and adaptation.
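To give a concrete flavour of the problem, here is a minimal PySpark sketch of schema change detection plus one simple adaptation policy. It is an illustration only, not our actual pipeline: the paths, function names, and the Parquet append flow are assumptions made for the example.

    from pyspark.sql import SparkSession, DataFrame
    from pyspark.sql.functions import lit

    spark = SparkSession.builder.appName("schema-drift-demo").getOrCreate()

    def detect_schema_changes(incoming: DataFrame, existing: DataFrame) -> dict:
        # Map column name -> type for both schemas and diff them.
        old = {f.name: f.dataType for f in existing.schema.fields}
        new = {f.name: f.dataType for f in incoming.schema.fields}
        return {
            "added":   sorted(set(new) - set(old)),
            "removed": sorted(set(old) - set(new)),
            "retyped": sorted(c for c in set(old) & set(new) if old[c] != new[c]),
        }

    def align_to_existing(incoming: DataFrame, existing: DataFrame) -> DataFrame:
        # One simple adaptation policy: fill columns missing from the new
        # batch with nulls and keep the target column order, so appending
        # into the existing table remains valid.
        present = {f.name for f in incoming.schema.fields}
        for field in existing.schema.fields:
            if field.name not in present:
                incoming = incoming.withColumn(field.name,
                                               lit(None).cast(field.dataType))
        return incoming.select([f.name for f in existing.schema.fields])

    # Hypothetical datalake locations, for illustration only.
    existing = spark.read.parquet("s3://datalake/events/")
    incoming = spark.read.parquet("s3://landing/events/latest/")
    print(detect_schema_changes(incoming, existing))
    align_to_existing(incoming, existing) \
        .write.mode("append").parquet("s3://datalake/events/")

A production version would also have to evolve the target schema when the "added" list is non-empty; here new columns are only reported, which is exactly the kind of gap the internship sets out to close.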


MAIN RESPONSIBILITIES

As part of the Data Valorisation team, the intern will be assigned the following tasks:

  • Get familiar with our current ETL process
  • Design, implement, and validate a Spark library for automatic schema change detection
  • Study the option of publishing the library as an open-source project
  • Write the library's technical documentation

Technical environment: Apache Spark, Amazon Web Services (AWS), Python


KEY SKILLS

A computer science student specializing in big data technologies:

  • Knowledge of big data architectures and big-data-oriented DBMSs
  • Solid grasp of the relational model and relational algebra
  • Proficient in the following programming languages: SQL, Bash, Python
  • Knowledge of Scala or Java would be a plus
  • First experience with the MapReduce paradigm and its implementations (Hadoop, Spark, Hive) is a plus
  • Team player: autonomous, open-minded, with good communication skills

Fluent in French and English


DETAILS

Location: Toulouse

Contract type: Internship