Ashwin Nanjappa


I am a research fellow at the Bioinformatics Institute in Singapore, working with Cheng Li on GPU-accelerated machine learning algorithms for computer vision.

I got my PhD at NUS under Tan Tiow Seng, where I developed GPU-accelerated algorithms for 3D Delaunay triangulation and regular triangulation.


My research focuses on developing GPU-accelerated algorithms for computer vision, machine learning and computational geometry. I value elegance in both algorithms and code, while exploiting the complex GPU architecture to the fullest.

Mouse pose estimation


I am developing a realtime algorithm for full-body pose estimation of mice from depth images. It estimates the pose of a simplified 24-joint mouse model, including the spine, limbs and paws. It works with different depth cameras and different types of rodents, enabling neuroscientists to perform behavioral phenotyping.

Hand pose estimation


GHand is a GPU-accelerated algorithm for realtime hand pose estimation from depth images. It estimates the full 3D hand pose with an average joint error of 20 mm. It runs entirely on the GPU at 64 FPS.

GHand has been demoed successfully to researchers at conferences and to the public at science festivals. It has been found to work well across different camera setups and for hands of all shapes, colors and sizes, with no prior calibration.


  • GHand: A GPU algorithm for realtime hand pose estimation using depth camera
    Eurographics, 2015

  • Estimate Hand Poses Efficiently from Single Depth Images
    International Journal of Computer Vision (IJCV), 2015

  • Real-time hand pose estimation from depth camera using GPU
    GPU Technology Conference 2014 (South East Asia)


  • Hand Pose Estimation Demo Booth
    Best Booth Award, A*STAR Scientific Conference (ASC) 2014

  • Efficient hand pose estimation from single depth images
    X-periment!, Singapore Science Festival, 2014

Delaunay triangulation


For my PhD thesis, I developed GPU-accelerated algorithms for 3D Delaunay triangulation and 3D regular triangulation. The overarching idea is to maximize utilization of the GPU's massively parallel resources through three techniques: dualizing the discrete Voronoi diagram, fixing the 4D convex hull using star splaying, and inserting points and fixing the triangulation in parallel. The CUDA implementations of these algorithms are highly optimized and run 5-10x faster than the venerable CGAL.


  • GeomGPU: Algorithms of computational geometry on the GPU
    Book website
    (Work in progress)

  • Delaunay mesh generation using the GPU
    Merit Award, NVIDIA Poster Contest,
    GPU Technology Conference 2014 (South East Asia)

  • A GPU accelerated algorithm for 3D Delaunay triangulation
    ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2014

  • gHull: A GPU algorithm for 3D Convex Hull
    ACM Transactions on Mathematical Software (TOMS), 2013

  • Delaunay triangulation in R³ on the GPU
    PhD Thesis, National University of Singapore, 2012
    Thesis • Code [1, 2] • BibTeX


Source code from my research, my PhD and other projects can be found on GitHub. Some of the popular projects are listed below:


gStar4D

The gStar4D algorithm computes the 3D Delaunay triangulation on the GPU. The CUDA implementation of gStar4D is robust and achieves a speedup of up to 5 times over the 3D Delaunay triangulator of CGAL.

The gStar4D algorithm uses neighbourhood information in the 3D digital Voronoi diagram as an approximation of the 3D Delaunay triangulation. From this approximation, it creates the star of each input point, lifted to 4D, in a massively parallel fashion. It then employs a unique star splaying approach to splay these 4D stars in parallel and make them consistent. The result is the 3D Delaunay triangulation of the input, constructed fully on the GPU.
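
The lifting mentioned above rests on a standard fact: mapping each point (x, y, z) to (x, y, z, x² + y² + z²) turns the 3D Delaunay triangulation into the lower convex hull of the lifted points in 4D. Below is a minimal Python sketch of that map and the in-sphere test it induces. It is illustrative only and is not the gStar4D CUDA code; the names `lift`, `det`, `orient3d` and `in_sphere` are hypothetical helpers introduced for this example.

```python
def lift(p):
    """Map a 3D point onto the paraboloid w = x^2 + y^2 + z^2 in 4D."""
    x, y, z = p
    return (x, y, z, x * x + y * y + z * z)


def det(m):
    """Determinant by Laplace expansion; adequate for 3x3 and 4x4 matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def orient3d(a, b, c, d):
    """Signed orientation of tetrahedron abcd (zero if the points are coplanar)."""
    return det([[p[i] - d[i] for i in range(3)] for p in (a, b, c)])


def in_sphere(a, b, c, d, e):
    """Positive if e is inside the circumsphere of abcd, negative if outside,
    zero if e is on the sphere (or abcd is degenerate).

    In 4D this asks on which side of the hyperplane through the lifted
    a, b, c, d the lifted e lies. Multiplying by orient3d makes the sign
    independent of the vertex ordering of the tetrahedron.
    """
    le = lift(e)
    rows = [[u - v for u, v in zip(lift(p), le)] for p in (a, b, c, d)]
    return det(rows) * orient3d(a, b, c, d)


# Tetrahedron whose circumsphere has center (1, 1, 1) and squared radius 3.
a, b, c, d = (0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)
inside = in_sphere(a, b, c, d, (1, 1, 1))   # positive
outside = in_sphere(a, b, c, d, (5, 5, 5))  # negative
on = in_sphere(a, b, c, d, (2, 2, 2))       # zero
```

The sketch uses integer coordinates so that the arithmetic is exact. A robust implementation on floating-point input needs exact or adaptive predicates, which this example does not attempt.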


gDel3D

The gDel3D algorithm constructs the Delaunay triangulation of a set of points in 3D using the GPU. The algorithm uses a novel combination of incremental insertion, flipping and star splaying to construct the triangulation. The CUDA implementation is robust and runs 10 times faster than the Delaunay triangulator of CGAL.


gReg3D

The gReg3D algorithm computes the 3D regular (weighted Delaunay) triangulation on the GPU. Our CUDA implementation of gReg3D is robust and achieves a speedup of up to 4 times over the 3D regular triangulator of CGAL.

The gReg3D algorithm extends the star splaying concepts of the gStar4D and gDel3D algorithms to construct the 3D regular (weighted Delaunay) triangulation on the GPU. The algorithm allows stars to die, finds their death certificates and propagates this information efficiently to other stars. The result is the 3D regular triangulation of the input, computed fully on the GPU.
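
For readers unfamiliar with regular triangulations: in the textbook formulation, a point (x, y, z) with weight w is lifted to (x, y, z, x² + y² + z² - w), and the regular triangulation is the projection of the lower convex hull of these lifted points. A point whose lifted image lies above the hull does not appear in the triangulation at all. The Python sketch below shows the weighted lift and the resulting power test. It is only the standard predicate under these assumptions, not the gReg3D implementation; `lift_weighted`, `det` and `power_test` are hypothetical names.

```python
def lift_weighted(p):
    """Lift a weighted point (x, y, z, w) to (x, y, z, x^2 + y^2 + z^2 - w)."""
    x, y, z, w = p
    return (x, y, z, x * x + y * y + z * z - w)


def det(m):
    """Determinant by Laplace expansion; adequate for 3x3 and 4x4 matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def power_test(a, b, c, d, e):
    """Positive if weighted point e conflicts with tetrahedron abcd, i.e. the
    lifted e lies below the hyperplane through the lifted a, b, c, d.
    Negative if there is no conflict, zero in the degenerate case.
    With all weights zero this reduces to the Delaunay in-sphere test.
    """
    le = lift_weighted(e)
    rows = [[u - v for u, v in zip(lift_weighted(p), le)] for p in (a, b, c, d)]
    orient = det([[p[i] - d[i] for i in range(3)] for p in (a, b, c)])
    return det(rows) * orient


# Unweighted tetrahedron; the query point sits well outside its circumsphere.
tet = [(0, 0, 0, 0), (2, 0, 0, 0), (0, 2, 0, 0), (0, 0, 2, 0)]
light = power_test(*tet, (5, 5, 5, 0))    # negative: no conflict
heavy = power_test(*tet, (5, 5, 5, 50))   # positive: the weight creates a conflict
```

The same query location gives opposite answers depending on its weight, which is what distinguishes the regular triangulation from the plain Delaunay case.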

GPU Coursera

I created this library of code to work offline on the assignments of Heterogeneous Parallel Programming, a GPU/CUDA course offered on Coursera. Many contributors have since chipped in and turned it into an easy-to-use library for the course.

Tech blog

choorucode is my tech blog, which I have updated regularly since 2009. I've written and shared over 1700 posts on topics such as CUDA, C++, Vim, Eclipse and Ubuntu. Google has been very generous, and my blog has received over 2.8 million visits from programmers all around the world.

Book reviews

Reading, writing about and discussing books is one of my passions. Genres that interest me are the classics, history, science fiction, manga and graphic novels.

Full reviews, ratings and excerpts of the hundreds of books I have read since 2004 can all be found online. If you have read any of these books and have comments or suggestions, I would love to hear from you.