Code that Performs and Outperforms – Prepare for the Performance Requirements of Tomorrow

Learn how to harness the power and performance of Intel® technologies in your financial services solutions and compute-intensive applications—for free. Seats are limited, so register now to get the latest developments in Intel tools and technologies that can make your applications work better, faster, and smarter. The 2017 Intel® Software Developer Conference is an information-packed, one-day technical training event. Don’t miss the opportunity to share best practices and techniques that help realize the full potential of today’s and tomorrow’s compute environments.

Register today for the opportunity to:

  • Learn how to modernize existing or new code to maximize performance on current and future Intel® Xeon® and Intel® Xeon Phi™ processors
  • Gain deep insights into the latest programming techniques and tools for achieving the highest performance on Intel® architecture with C/C++ or Fortran
  • Accelerate your machine learning performance with a faster Python*
  • Get the latest information for enabling the future of Artificial Intelligence
  • Learn about FPGA-based acceleration of compute-intensive workloads in finance
  • And much more

 

Register here ›

When:
October 3, 2017
Where:

CCT Venues Plus Bank Street
Level 32
40 Bank Street
London
E14 5NR

October 3, 2017

8:00 - 9:30
Registration
9:30 - 10:00
Ways to Achieve Peak Performance on Intel® Hardware

High performance doesn’t just happen. To unlock the power of modern high-performance computers, software needs to be designed with modern coding techniques: vectorization, multithreading, memory optimization, and more. In this information-packed session, Jim will discuss how to optimize performance on the extensive variety of Intel® hardware, including an introduction to the latest innovations in code modernization tools and techniques that are helping developers achieve peak performance on Intel® hardware.
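As a small taste of the vectorization theme, here is a minimal, hypothetical sketch (not material from the session itself) of what "letting optimized native code do the work" looks like in Python: the same reduction written as an interpreted scalar loop and as one vectorized NumPy call.

```python
import numpy as np

# Hypothetical sketch of the vectorization theme: the same dot product
# written as an interpreted scalar loop and as one vectorized call.
a = np.arange(100_000, dtype=np.float64)
b = np.arange(100_000, dtype=np.float64)

# Scalar loop: one interpreted iteration per element.
total = 0.0
for x, y in zip(a, b):
    total += x * y

# Vectorized: a single call into optimized, SIMD-friendly native code.
vec_total = float(a @ b)
```

On a typical build the vectorized form is dramatically faster; `timeit` makes the comparison easy to reproduce.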

Jim Cownie
About the Speaker

Jim Cownie is an ACM Distinguished Engineer and Intel Principal Engineer. He has been involved with parallel computing since starting to work for Inmos in 1979. Along the way he owned the profiling chapter in the MPI-1 standard and has worked on parallel debuggers and OpenMP implementations. If he wasn't here, he would rather be skiing.

10:00 - 10:45
Intel® Xeon® Scalable Processor: The Foundation of Data Centre Innovation

The new Intel® Xeon® Scalable processor family represents a major leap forward in processor architecture and platform design, delivering workload-optimized performance for compute, network, and storage. In this talk, Toby will discuss the key technology attributes of the processor formerly codenamed Skylake-SP and the surrounding platform ingredients, highlighting the areas of the processor and system architecture most relevant to high-performance software.

Toby Smith
About the Speaker

Toby Smith is a Solution Architect in Intel’s end-user team, focusing on government and high-performance computing customers. He helps customers design and deploy leading-edge solutions based on Intel’s portfolio of compute, storage, and fabric/Ethernet technologies. Toby joined Intel in 2000 and has held a variety of data centre-related positions; prior to his current role he worked closely with a Europe-based OEM, primarily driving their long-range architecture and technology partnership.

10:45 - 11:15
Break
11:15 - 11:45
Optimizing for Latest Processors with Intel® Parallel Studio XE

Learn how to modernize your code for performance, portability, and scalability on the latest Intel® platforms. In particular, this talk will cover:

  • Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions on Intel® Xeon® and Intel® Xeon Phi™ processors and coprocessors
  • Intel® Advisor Roofline Analysis – Find high-impact, under-optimized loops
  • Intel® Distribution for Python* – Faster Python applications
  • Application Performance Snapshot – Get quick answers: Does my hybrid code need optimization?
  • Intel® VTune™ Amplifier – Profile private clouds with Docker* and Mesos* containers, Java* daemons

Michael Steyer & Cédric Andreolli
About the Speakers

Michael Steyer is a technical consulting engineer supporting technical and high-performance computing segments within the Software & Services Group at Intel. His main focus is software analysis tools for HPC applications at scale, as well as MPI runtime tuning. Prior to his position at Intel, he worked in the field of enterprise mainframe computing for financial services.

Cédric Andreolli joined Intel in 2013 as an Application Engineer for the oil and gas industry, spending most of his time optimizing applications for Intel architectures (Xeon and Xeon Phi). He also developed a tool based on genetic algorithms to optimize compilation and execution parameters. At the end of 2016, Cédric joined the EMEA Technical Consulting Engineering team, where he works on technologies such as the Intel® compilers and libraries, Intel® Advisor, and Intel® VTune™.

11:45 - 12:15
Visualizing and Finding Optimization Opportunities with the Intel® Advisor Roofline Feature

An Intel Advisor Roofline Analysis provides insight into:

  • Where your performance bottlenecks are
  • How much performance is left on the table because of them
  • Which bottlenecks are possible to address, and which ones are worth addressing
  • Why these bottlenecks are most likely occurring
  • What your next steps should be

While the Roofline chart is not a conversion table that tells you exactly what changes need to be made in your code, it is an incredibly useful diagnostic tool. Examples will be shown with OpenMP code to demonstrate best practices with this new, cutting-edge analysis tool.
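The model behind the chart is simple to state. As a back-of-the-envelope sketch (the peak numbers below are illustrative assumptions, not measurements from the session or from any Intel tool): a kernel's arithmetic intensity, compared to the machine's balance point, determines whether the memory roof or the compute roof caps its performance.

```python
# Back-of-the-envelope Roofline arithmetic with assumed (hypothetical)
# machine peaks: a kernel is memory-bound when its arithmetic intensity
# (FLOPs per byte moved) falls below the machine balance point.
peak_flops = 1.0e12      # assumed compute roof: 1 TFLOP/s
peak_bandwidth = 1.0e11  # assumed memory roof: 100 GB/s
machine_balance = peak_flops / peak_bandwidth  # ridge point, FLOPs/byte

# Triad-style kernel a[i] = b[i] + s * c[i] on float64 values:
# 2 FLOPs per element; 3 arrays * 8 bytes touched per element.
flops_per_elem = 2.0
bytes_per_elem = 3 * 8.0
intensity = flops_per_elem / bytes_per_elem  # arithmetic intensity

# Attainable performance is capped by the lower of the two roofs.
attainable_flops = min(peak_flops, peak_bandwidth * intensity)
memory_bound = intensity < machine_balance
print(f"intensity={intensity:.3f} FLOPs/byte, "
      f"attainable={attainable_flops:.2e} FLOP/s, "
      f"memory_bound={memory_bound}")
```

Intel Advisor automates exactly this comparison per loop, using measured rather than assumed roofs.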

Cédric Andreolli
About the Speaker

Cédric Andreolli joined Intel in 2013 as an Application Engineer for the oil and gas industry, spending most of his time optimizing applications for Intel architectures (Xeon and Xeon Phi). He also developed a tool based on genetic algorithms to optimize compilation and execution parameters. At the end of 2016, Cédric joined the EMEA Technical Consulting Engineering team, where he works on technologies such as the Intel® compilers and libraries, Intel® Advisor, and Intel® VTune™.

12:15 - 13:00
Intel® Distribution for Python* – Advantages and Acceleration of Machine Learning Workloads

Python is a popular open source scripting language known for its easy-to-learn syntax and an active developer community. Performance, however, remains a key drawback, because Python is an interpreted language and its reference implementation relies on the global interpreter lock (GIL). The Intel® Distribution for Python* is an easy-to-install, optimized Python distribution that includes the popular NumPy and SciPy stack packages used for scientific, engineering, and data analysis. It tunes and leverages the powerful Intel® Math Kernel Library to offer significant performance gains, enhancing the performance profile of your application. For example, DGEMM functions deliver 3x speedups on a single core and show impressive scalability on multiple cores. The distribution’s easy out-of-the-box installation saves you time and effort, so even a novice Python user can focus on the application at hand rather than on setting up the Python infrastructure.
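As an illustration of where such gains come from (a sketch assuming an MKL-backed NumPy build, not code from the session): the user-level NumPy code stays unchanged, and the matrix product below is dispatched to whichever BLAS DGEMM routine the distribution links in.

```python
import numpy as np

# Sketch assuming an MKL-backed NumPy build (an assumption about the
# install, not a claim): the Python code is unchanged either way, and
# A @ B on float64 matrices dispatches to the underlying BLAS DGEMM.
n = 256
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))  # float64 by default
B = rng.standard_normal((n, n))

C = A @ B  # double-precision general matrix multiply (DGEMM)

# np.show_config() reports which BLAS/LAPACK the build links against.
```

Swapping the interpreter for the Intel Distribution requires no change to this code; that transparency is the point of an optimized distribution.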

Michael Steyer
About the Speaker

Michael Steyer is a technical consulting engineer supporting technical and high-performance computing segments within the Software & Services Group at Intel. His main focus is software analysis tools for HPC applications at scale, as well as MPI runtime tuning. Prior to his position at Intel, he worked in the field of enterprise mainframe computing for financial services.

13:00 - 13:45
Lunch
13:45 - 14:15
Optimizing Threaded Code Performance and Scalability

Learn how Intel® VTune™ Amplifier can help you uncover common performance and scalability issues, and see effective solutions using OpenMP* 4.0 that can dramatically improve performance on modern processors. Intel VTune Amplifier provides critical performance insights to help you identify whether problems are due to imbalance, lock contention, creation overhead, or scheduling overhead. OpenMP 4.0 adds capabilities for explicit SIMD vectorization in addition to threading. Using both explicit SIMD vectorization and threading in an application allows for optimal use of the newest Intel® hardware.

Cédric Andreolli
About the Speaker

Cédric Andreolli joined Intel in 2013 as an Application Engineer for the oil and gas industry, spending most of his time optimizing applications for Intel architectures (Xeon and Xeon Phi). He also developed a tool based on genetic algorithms to optimize compilation and execution parameters. At the end of 2016, Cédric joined the EMEA Technical Consulting Engineering team, where he works on technologies such as the Intel® compilers and libraries, Intel® Advisor, and Intel® VTune™.

14:15 - 15:15
Performance Tuning an Exotic Derivatives Library

Within banking there is an ever-increasing demand for scenario, VaR, and risk calculations, and a consequent ever-increasing compute cost. This talk will look at the Intel tools and the methods used to significantly improve the performance of an exotic derivatives library. It will additionally discuss the particular issues affecting performance tuning within the constraints of existing hardware, software, and business environments.

Jason Charlesworth
About the Speaker

Jason Charlesworth runs the Roots team at Citi, focusing on numerical and high-performance computing. He’s spent the last 15 years working in banking and has previously worked in academia, secure computing, industrial research and fintech.

15:15 - 15:45
Break
15:45 - 16:15
Why would I use Threading Building Blocks?

Threading Building Blocks (TBB) is a powerful open source C++ template library for task parallelism. It lets you easily write parallel C++ programs that are portable, composable, and scalable, and can thus take full advantage of multicore CPUs. TBB is now ten years old but continues to evolve its capabilities, specifically to support additions to the C++ standard for threading and vectorization, and to handle the increasing importance of heterogeneous compute resources. Intel’s STAC-A2 benchmark code uses TBB, and we will discuss some details of that as additional motivation!

Jim Cownie
About the Speaker

Jim Cownie is an ACM Distinguished Engineer and Intel Principal Engineer. He has been involved with parallel computing since starting to work for Inmos in 1979. Along the way he owned the profiling chapter in the MPI-1 standard and has worked on parallel debuggers and OpenMP implementations. If he wasn't here, he would rather be skiing.

16:15 - 17:00
Enabling the Future of Artificial Intelligence

Artificial intelligence is unlocking tremendous economic value across various market sectors. Although individual data scientists can draw on several open source frameworks and basic hardware resources during the initial investigative phases, they quickly require significant hardware and software resources to build and deploy production models.
Intel offers various hardware to support a diversity of workloads and user needs, including a competitive deep-learning platform to help data scientists start from the iterative, investigatory phase and easily take models all the way to deployment. This platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include—but are not limited to—automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data.
This talk will detail what Intel has done, and plans to do, from hardware to software to state-of-the-art algorithms, to democratize AI.

Jacek Czaja
About the Speaker

Jacek Czaja is a machine learning engineer on the solutions enablement team in Intel’s Artificial Intelligence Product Group (AIPG). He is responsible for enabling and optimizing AI solutions on Intel platforms. He is a computer scientist with a passion for applied machine learning. Prior to joining Intel, he worked as a Developer Technology engineer at Imagination Technologies. He holds an M.Eng. from Gdansk University of Technology.

17:00 - 17:30
FPGA-Based Acceleration of Compute-Intensive Workloads in Finance

Differing levels of task and data parallelism on FPGAs can produce superior performance with a very small power budget and very low latency. To enable this acceleration, developers have historically been limited to a choice of either OpenCL or C language representations of their algorithms. Recent developments now provide a third alternative in the form of the FinLib library provided by Intel. In this talk, we will introduce the high-level design flows and library use models, and provide application examples to demonstrate the performance of FPGA-based implementations of a few relevant workloads.

Suleyman Demirsoy & Stephen Weston
About the Speakers

Suleyman Demirsoy is a Systems FAE in the Programmable Solutions Group covering the High Performance Computing engagements in EMEA. He has been actively working with FPGA technology for 15 years at various levels. He has a PhD in digital signal processing.

Stephen Weston is a Principal Engineer and Libraries Architect in the Programmable Solutions Group at Intel Corporation. He has over 25 years of experience in investment banking, spanning trading, risk management, and quantitative research, latterly as head credit quant at JP Morgan, where he developed a system for real-time credit derivatives risk. He is also a visiting professor in computational finance at Imperial College and holds a PhD in mathematical finance.

17:30 - 18:00
Closing and Networking