Research Scientist — Big Data Integration

Job Description

The Biomedical Informatics Department at Emory University is looking for a Research Scientist in Big Data Management and Integration. You will be responsible for researching and developing data engineering solutions to integrate and federate data from various sources of cancer research at Internet scale. As a member of our research group, you will be expected to design, implement, and deploy distributed systems to manage data that is heterogeneous in nature, storage, and communication. We are looking for someone who has a solid understanding of, and experience in, distributed systems and cloud computing, and who is interested in learning and applying cutting-edge technologies to the challenges of performance and scale. The ideal candidate is independent yet a team player, with a keen interest in collaborating and coordinating with the team working on data sciences research.

This work is part of a large, multi-year, NIH-funded project. You will help develop an exciting piece of technology that will be part of a national infrastructure supporting cancer research, one that brings together disparate cancer datasets and gives the research community the ability to examine data like never before. Your work will have a direct impact on cancer research and will become widely known through national and international adoption, collaborative participation in large-scale projects and open source software efforts, and, most importantly, its contribution to advancing healthcare.

The ideal candidate is motivated by challenging applications and is interested in exploring how advances in distributed computing and cloud computing can benefit healthcare. We collaborate extensively with well-known research groups from prominent institutions, and draw upon the first-rate technical and scientific resources available at Emory and Georgia Tech.


You will primarily develop server-side applications using Java and a variety of open-source big data frameworks such as Apache Drill and Hadoop. You will build upon a suite of existing systems developed in our lab, adding new functionality and increasing the scale of storage, the complexity of data, and the variety of data types they support.


  • PhD in Computer Science or Computer Engineering with an emphasis on Distributed Computing/Cloud Computing/HPC.
  • Expert in developing enterprise applications with Java, with experience in concurrent computing and streams.
  • Experienced in distributed systems principles, execution frameworks such as Hazelcast or Infinispan, and distributed storage and processing frameworks such as Apache Hadoop.
  • Working knowledge of SQL and NoSQL databases.
  • Comfortable with public and private cloud deployments, including Amazon Web Services (AWS).
  • Experienced in web services engines and development of RESTful APIs and web services.
  • Experienced in various architectures and paradigms such as SOA and OSGi.
  • Familiar with unit and integration testing, version control (Git), and project management tools like Maven.
  • Track record of presenting at premier conferences and publishing in relevant journals.

Salary is commensurate with experience. These are full-time staff positions and are eligible for Emory's excellent benefits package, including health and retirement benefits.

How to Apply

For more information, please email a copy of your CV, a research statement that summarizes your interests and achievements, and two relevant papers with the subject “Research Scientist – Data Integration”.