At Hyperscience, our machine learning (ML) engineers are committed to collaborating on cutting-edge natural language processing (NLP) research and development. Since joining Hyperscience one year ago, Jocelyn Beauchesne has dedicated his time to our human-centered approach to automation, working with his teammates to create better and faster experiences for our customers.
Inspired by the limitless possibilities of ML, Jocelyn spent four months in late 2021 mentoring three NYU Center for Data Science graduate students on their NLP Capstone project, a course designed for graduate students to put their knowledge into practice and ultimately solve real-world problems. What transpired was a whirlwind of knowledge sharing among Di He, Congyun Jin, and Xinli Gu. We chatted with the teammates to learn more about their interest in ML, mentorship from Jocelyn, and future artificial intelligence (AI) predictions.
How did you first become interested in machine learning and artificial intelligence?
Congyun: When I was in my third year at the University of Toronto, I took my first ML course, which really opened my eyes to the power and complexity of computer algorithms. My first assignment in the course focused on handwritten digit recognition. After completing the assignment, I was utterly astonished that the program could recognize images humans could barely distinguish by using only a few code chunks.
As I dove deeper into my classes and research, I became fascinated by the power of AI and how it can solve a wide range of problems impacting our world today, including tumor classification, automatic image captioning, medical diagnosis, speech recognition, question answering, and robotic control.
Why did you choose to pursue a master’s degree in Data Science from NYU?
Congyun: I received my bachelor’s degree in computer science, statistics, and economics, so I chose to pursue a master’s in data science because it matched my background perfectly. Data science is an interdisciplinary field that uses scientific methods and algorithms to extract knowledge and insights from data and inform business decisions. In addition, the Center for Data Science within NYU is a world-renowned data science training and research facility, and it is rich in teaching resources with plentiful opportunities to work side-by-side with some of the brightest minds in AI.
Talk to us about your Capstone experience. How did you choose the topic? Can you describe the research process?
Di: There were roughly 50 Capstone topics this year at the Center for Data Science, including Machine Learning, Image Recognition, Signal De-noising, and Algorithmic Trading. Our group is very interested in the applications of NLP, and we found the topic “Question Answering on Long Context” to be both meaningful and challenging, so we made it our topic of choice.
We began our project by reviewing some of the literature and discussing our initial ideas with Jocelyn. Next, we implemented our baseline model to get the first round of results. Then we explored several different methods and implemented them to increase performance. The purpose of this stage was to keep testing new ideas and models until we found something useful. Finally, we built an automatic pipeline to be used for Question Answering.
What was the goal of the project?
Di: The general problem of answering questions from a context has been studied in depth in recent years. Specifically, SQuAD is a dataset that was created by selecting passages from Wikipedia articles, then pairing them with questions and corresponding answers from the selected passage. In the real world, however, we don’t typically know which passage in a document contains the answer to a question. The goal of this project was to remove that limitation and answer the same questions starting from the whole Wikipedia article, rather than the selected passages containing the answer.
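To make the difference concrete, here is a minimal sketch of one common way to approach long-context question answering: first retrieve the passage most likely to contain the answer, then hand only that passage to a reader model. This is an illustration under stated assumptions, not the method the team used. The function names (`split_into_passages`, `overlap_score`, `retrieve`) are hypothetical, the article is assumed to be split on blank lines, and simple word overlap stands in for a real retrieval model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_passages(article):
    """Split an article into passages on blank lines (an assumed convention)."""
    return [p.strip() for p in article.split("\n\n") if p.strip()]


def overlap_score(question, passage):
    """Count how many question tokens also appear in the passage."""
    q_counts = Counter(tokenize(question))
    p_counts = Counter(tokenize(passage))
    return sum(min(count, p_counts[token]) for token, count in q_counts.items())


def retrieve(question, article, top_k=1):
    """Return the top_k passages ranked by word overlap with the question."""
    passages = split_into_passages(article)
    ranked = sorted(
        passages,
        key=lambda p: overlap_score(question, p),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    article = (
        "Paris is the capital of France.\n\n"
        "The Nile is a river in Africa.\n\n"
        "Mount Everest is the tallest mountain on Earth."
    )
    # The retrieved passage would then be passed to a reader model
    # (for example, one trained on SQuAD) to extract the answer span.
    print(retrieve("Which river is in Africa?", article))
```

In a SQuAD-style setup the passage is given; here the retrieval step has to find it first, so any retrieval mistake limits what the reader can answer.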
What guidelines did Jocelyn give you throughout your four-month partnership?
Di: Before the project kicked off, Jocelyn explained the project objectives and made a very clear and detailed list of each project stage. We had weekly meetings throughout the lifecycle of the project to discuss research ideas, experiment results, and our next steps. We also found it invaluable that Jocelyn helped us sharpen our business storytelling and presentation skills.
What are some of the most exciting challenges you think machine learning will tackle over the next few years?
Xinli: I think autonomous vehicles will take over the automobile market. Currently, many companies are investing heavily in this space and have made great progress in avoiding obstacles when self-driving. Ultimately, the biggest questions facing autonomous vehicles relate to ethics and unbalanced infrastructure on a global scale. However, as AI and ML continue to develop and those issues are mitigated, autonomous vehicles can eventually appear on regular roads.
Furthermore, I believe ML will gain more of a spotlight in the healthcare sector. This is because ML and AI can help build clinical decision systems, enable smart record keeping, personalize medicine, and analyze medical images. NYU Langone Health, for example, works closely with the NYU Center for Data Science on many medical projects.
Is there anything else you’d like to add?
Di: Thanks to Jocelyn’s mentoring, we won the CDS’ Best Poster award, which was based on professor and student votes. It was truly rewarding to see our project resonate with so many people.
What was your favorite part of working with these graduate students?
Jocelyn: Our weekly meetings were the standout for me. We’d start with an update on the progress they made over the past week, followed by a team brainstorm. Generally speaking, students have a wealth of theoretical knowledge but less real-life experience, especially when it comes to solving ambiguous problems. In a business context, there are many ways to solve a problem. In order to be successful, we have to define our goals through the lens of the impact they help us deliver. Once I shared this mindset with the team, they were able to identify the most impactful next steps.