Big data viewed from two directions:
- Us; the 3 Perspectives: Architecture/Algorithm/Programming (exam structured this way, multi-choice section must be answered sequentially)
- Clients, the 5 V Challenges: Variety/Velocity/Volume/Veracity/Value
Exam topics:
- Programming
Knowledge: awareness
Analysis: pros and cons, comparisons to others, is this an appropriate solution to this scenario?
Application (skill):
- Apply an algorithm
- Implement an algorithm
CodeBox environment.
Have access to course notes, sample solutions to labs, locked-down version of Colab. Bring phone for 2FA.
10-15 multiple choice questions
2 hours
Code questions on Spark
Analysis type questions on MPI
Multichoice on GPU programming, message passing, threads, locks and atomics, work queues, schedulers
Algorithms:
- Union, intersection, difference
- Distance/similarity: Jaccard/Cosine
- Matrix-Vector multiplication
- Hashing
- Graphs
Communication cost: Amdahl’s Law, Gustafson’s Law
Weak vs strong scalability
SAMPLE QUESTION
Caching (spatial/temporal locality) Time sensitive operation: logging, DDOS detection
sc.readFile().flatMap(lambda line: [strip(el) for el in line.split(" ") if len(strip(el)) and [next(char) for char in el if not isalpha(el)] is None]).
next((True for char in word if not char.isalpha()), None) is not None