Inquiries about solving the N-S equation, projection algorithm and parallel

I have written a FreeFEM code to compute vorticity formulation of incompressible flow using problem, which is a bit slow and can’t be parallelized.
So I tried to write a matrix form but met some problem.
I found an algorithm to solve this problem(JIAN-GUO LIU AND WEINAN E, 2000"SIMPLE FINITE ELEMENT METHOD IN VORTICITY FORMULATION FOR INCOMPRESSIBLE FLOWS"), which decouples the vorticity \omega and stream function \psi :




image
The X_{0,h}^k is the subspace of X_h^k with zero boundary value.
To compute an auxiliary term \bar{\omega}^{n+1}, I need to get a submatrix (A_{0,0} A_{0,b}) from the stiffness matrix A to compute right hand side of (I’), and compute the nonlinear term N.
In my code using problem, the explicit nonlinear term (\nabla \varphi,\omega \nabla^{\perp}\psi) is:

grad(v1)'*[-dy(psiold),dx(psiold)]*omegaold

How to write a varf of nonlinear term as (2.6)?
I know that I can use varf Stiffness(u,v)=int2d(Th)(grad(u)'*grad(v)); and set tgv=-10 to build a matrix of Stiffness with zero rows of label 1 , but I have no idea to build A_{0,b}.
I found the method to get list of DoF boundary in Frédéric Hecht’s reply:

If I get the list of DoF on boundary, should I permutate the rows and columns of Stiffness to get A?Or is there a better way to implement such algorithm?
Could you give me some advice?