Series Contents
- Java Multithreading Topic 1: Basic Concepts of Concurrency and Parallelism

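What are concurrency and parallelism in multithreading?

Concurrency
Here this refers specifically to a single core handling multiple tasks. The mechanism is implemented mainly at the operating-system level: by time-slicing the CPU, it makes full use of a single CPU to work on several tasks during the same period.
Parallelism
Here this refers specifically to using multiple cores to process a single task or multiple tasks. It has to be supported at both the operating-system level and the application level, and it makes full use of the combined performance of the CPUs in a multi-core environment by processing one task or several tasks in parallel.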
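To make the distinction concrete, here is a minimal sketch that splits one job (summing 1..n) into several tasks on a thread pool. On a single core the tasks can only interleave (concurrency); with several cores they can run at the same time (parallelism). The class `ParallelSum` and method `sum` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Sums 1..n by splitting the range into `parts` chunks, one task per chunk.
    static long sum(long n, int parts) throws Exception {
        // One worker thread per available core: tasks run in parallel on a
        // multi-core machine and are time-sliced on a single core.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long chunk = n / parts;
            for (int p = 0; p < parts; p++) {
                final long from = p * chunk + 1;
                final long to = (p == parts - 1) ? n : (p + 1) * chunk; // last chunk takes the remainder
                futures.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // waits for each task's partial sum
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cores = " + Runtime.getRuntime().availableProcessors());
        System.out.println(sum(10_000_000L, 4)); // 50000005000000
    }
}
```

The code is identical either way; whether the tasks run concurrently or in parallel depends on how many cores the operating system gives the pool.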

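What is a thread-safety problem?

A thread-safety problem arises when multiple threads access shared data at the same time without proper use of volatile or synchronized: one thread writes the shared data in its own working memory and writes it back to main memory, but the other threads do not know that the data has changed, so the program's result differs from what was expected.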

If two or more threads are sharing an object, without the proper use of either volatile declarations or synchronization, updates to the shared object made by one thread may not be visible to other threads, which leads to thread-safety problems.
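A minimal sketch of the problem, assuming four threads that each increment two shared counters 100,000 times: one counter is updated without any synchronization, the other inside a `synchronized` block. `RaceDemo` and `run` are hypothetical names.

```java
public class RaceDemo {
    static int unsafeCount = 0;               // shared, no synchronization
    static int safeCount = 0;                 // shared, guarded by LOCK
    static final Object LOCK = new Object();

    // Starts `threads` threads that each increment both counters `perThread` times,
    // waits for all of them, and returns {unsafeCount, safeCount}.
    static int[] run(int threads, int perThread) throws InterruptedException {
        unsafeCount = 0;
        safeCount = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;            // racy read-add-write: updates can be lost
                    synchronized (LOCK) {
                        safeCount++;          // one thread at a time; the result is visible to others
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();                         // join() makes the workers' writes visible here
        }
        return new int[] {unsafeCount, safeCount};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(4, 100_000);
        System.out.println("unsynchronized: " + r[0]); // often less than 400000
        System.out.println("synchronized:   " + r[1]); // always 400000
    }
}
```

The unsynchronized result varies from run to run and may even happen to be correct; the synchronized one is always 400000.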

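What is the memory visibility problem for shared variables?

To answer this, you need to read the Java Memory Model (JMM). It describes the access rules for variables in a Java program (variables shared between threads), as well as the low-level details of how the JVM stores variables to memory and reads them back.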

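- All variables are stored in main memory.
- Each thread has its own working memory, which holds copies of the variables the thread uses; each copy is a copy of that variable in main memory.
- A thread must perform all operations on shared variables in its own working memory; it cannot read or write main memory directly.
- A thread cannot directly access variables in another thread's working memory.
- Values are passed between threads through main memory.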

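For a modification that thread 1 makes to a shared variable to be seen by thread 2 in time, two steps must happen:

- Flush the updated shared variable from working memory 1 to main memory.
- Load the latest value of the shared variable from main memory into working memory 2.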

Imagine that the shared object is initially stored in main memory. A thread running on CPU one then reads the shared object into its CPU cache. There it makes a change to the shared object. As long as the CPU cache has not been flushed back to main memory, the changed version of the shared object is not visible to threads running on other CPUs. This way each thread may end up with its own copy of the shared object, each copy sitting in a different CPU cache.

For example, on a 2-core CPU, a thread running on the first core copies the shared object into its core's cache and changes its count variable to 2. This change is not visible to threads running on the second core, because the update to count has not been flushed back to main memory yet.

That is the memory visibility problem for shared variables. At the Java language level, visibility is supported in two ways: synchronized and volatile.
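A minimal sketch of volatile in action: a worker thread spins while a shared flag is true, and the main thread clears the flag. Because the flag is volatile, the worker is guaranteed to see the update. `VisibilityDemo` and `stopWorker` are hypothetical names; the 100 ms and 2 s delays are arbitrary.

```java
public class VisibilityDemo {
    // volatile: a write by one thread is guaranteed to be visible to later reads by others.
    // Without volatile, the JIT may hoist the read out of the loop and the worker may spin forever.
    static volatile boolean running = true;

    // Starts a worker that spins while `running` is true, then clears the flag from
    // the calling thread. Returns true if the worker noticed the change and exited.
    static boolean stopWorker() throws InterruptedException {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // busy-wait; each iteration re-reads the volatile field
            }
        });
        worker.setDaemon(true);   // never keep the JVM alive if the worker misbehaves
        worker.start();
        Thread.sleep(100);        // let the worker start spinning
        running = false;          // volatile write
        worker.join(2000);        // wait up to 2 seconds for the worker to exit
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopWorker());
    }
}
```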


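What is an atomic operation in Java?

An atomic operation is an indivisible unit of execution: it completes in one step, cannot be interrupted partway through, and never produces unexpected intermediate results.

Before answering, first check whether the system is 32-bit or 64-bit. On a 32-bit system, if i is an int, then i = 1 is an atomic operation, because it involves only a single assignment. i++, however, is not atomic: it is equivalent to i = i + 1, which consists of three steps: read i, compute i + 1, and write the result back to memory. When an operation is not atomic, the steps of different threads can interleave, the program produces wrong results, and a thread-safety problem follows.

Therefore, to keep multithreaded code thread-safe, you must guarantee the atomicity of its operations. There are two ways to do so: use a lock, or use the atomic classes in java.util.concurrent.atomic.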
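The two approaches can be sketched side by side, assuming four threads that each perform many increments: one counter turns the read-add-write of `i++` into a single critical section with a `ReentrantLock`, the other uses `AtomicInteger.incrementAndGet()`. `AtomicityDemo` and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class AtomicityDemo {
    static int lockedCount = 0;                               // guarded by `lock`
    static final ReentrantLock lock = new ReentrantLock();
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Option 1: a lock makes read i, compute i + 1, write i behave as one step.
    static void incrementWithLock() {
        lock.lock();
        try {
            lockedCount++;
        } finally {
            lock.unlock();                                    // always release, even on exceptions
        }
    }

    // Option 2: a single atomic read-modify-write, no lock needed.
    static void incrementAtomically() {
        atomicCount.incrementAndGet();
    }

    // Runs `threads` threads, each calling both increments `perThread` times.
    static int[] run(int threads, int perThread) throws InterruptedException {
        lockedCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithLock();
                    incrementAtomically();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return new int[] {lockedCount, atomicCount.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(4, 100_000);
        System.out.println("lock: " + r[0] + ", atomic: " + r[1]); // lock: 400000, atomic: 400000
    }
}
```

The lock works for any compound operation; the atomic class avoids blocking but only covers the operations it provides.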

Atomic operations in a 32-bit JVM

- all assignments of primitive types except long and double
- all assignments of references
- all operations of the Atomic* classes in java.util.concurrent.atomic
- all assignments to volatile longs and doubles

Why is assigning a long not an atomic operation?

Again, the answer depends on the platform: the claim holds for 32-bit systems.

long foo = 65465498L;

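A 32-bit JVM may write this long variable in two 32-bit halves, for example first the low 32 bits and then the high 32 bits. Another thread can run between the two writes and read a value that is half old and half new, so the assignment is not thread-safe. To make reads and writes of the variable atomic, declare it volatile: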

private volatile long foo;

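On a 64-bit JVM, a 64-bit value can be written with a single instruction, so in practice assignments to long and double are atomic as well. Strictly speaking, the JLS only encourages JVMs not to split non-volatile long and double writes; volatile is the portable guarantee.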
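A minimal sketch for checking word tearing: a writer flips a shared long between 0L (all bits 0) and -1L (all bits 1), and the reader counts any other value, which could only be half of one write combined with half of the other. With `volatile` the count must be 0 on any JVM; removing `volatile` is only expected to show torn reads, if at all, on a 32-bit JVM. `LongTearingCheck` and `countTornReads` are hypothetical names.

```java
public class LongTearingCheck {
    // volatile makes every read and write of this long atomic, even on a 32-bit JVM.
    // Without volatile, a 32-bit JVM may split each write into two 32-bit halves.
    static volatile long value = 0L;

    // The writer alternates between 0L and -1L; any other value seen by the reader
    // is a torn read. Returns how many torn reads were observed.
    static long countTornReads(int writes) throws InterruptedException {
        value = 0L;
        Thread writer = new Thread(() -> {
            for (int i = 0; i < writes; i++) {
                value = (i % 2 == 0) ? -1L : 0L;
            }
        });
        writer.start();
        long torn = 0;
        while (writer.isAlive()) {
            long v = value;               // one read of the shared long
            if (v != 0L && v != -1L) {
                torn++;                   // e.g. 0x00000000FFFFFFFF: halves from different writes
            }
        }
        writer.join();
        return torn;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("torn reads: " + countTornReads(10_000_000)); // torn reads: 0 (volatile)
    }
}
```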

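What is CAS (Compare And Swap) in Java, and how is AtomicLong implemented?

There are several ways to implement lock-free, non-blocking algorithms; CAS (Compare and Swap) is one of the most important.

Most processor architectures, including IA32 and SPARC, provide a CAS instruction. The semantics of CAS are: "I think the value of K should be A; if it is, update K to B; otherwise, leave it unchanged and tell me what the value of K actually is."

CAS is a form of optimistic locking. When several threads use CAS to update the same variable at the same time, only one of them succeeds and the others fail. The failing threads are not suspended; instead they retry, typically by spinning in a loop.

CAS has three operands: the memory location K, the old (expected) value A, and the new value B. If and only if the value at K equals A, CAS changes the value at K to B; otherwise it does nothing. A CAS-based update can be expressed in pseudocode as: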

    do {
        back up the old value;
        build the new value from the old value;
    } while (!CAS(memory address, old value, new value))
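The pseudocode maps directly onto `AtomicLong.compareAndSet`. As a sketch, here it is applied to an update that `AtomicLong` does not offer as a single call: raising a shared maximum. `CasMax`, `updateMax` and `runConcurrently` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasMax {
    // Raises `target` to `candidate` if candidate is larger, following the pseudocode:
    // back up the old value, build the new value from it, CAS, and retry on failure.
    static void updateMax(AtomicLong target, long candidate) {
        while (true) {
            long old = target.get();                 // back up the old value
            long next = Math.max(old, candidate);    // build the new value from the old one
            if (next == old) {
                return;                              // already large enough, nothing to write
            }
            if (target.compareAndSet(old, next)) {   // CAS(address, old value, new value)
                return;
            }
            // another thread changed target between get() and compareAndSet(): retry
        }
    }

    // Each of `threads` threads offers the values t*1000 .. t*1000+999.
    static long runConcurrently(int threads) throws InterruptedException {
        AtomicLong max = new AtomicLong(Long.MIN_VALUE);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * 1000;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    updateMax(max, base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return max.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(4)); // 3999
    }
}
```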

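CAS in Java: AtomicLong.getAndIncrement() as written in older JDKs such as JDK 7 (newer JDKs delegate the same loop to Unsafe.getAndAddLong)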

    public final long getAndIncrement() {
        // If the CAS fails, loop and try again until it succeeds
        while (true) {
            long current = get();
            long next = current + 1;
            // Call compareAndSet
            if (compareAndSet(current, next))
                return current;
        }
    }

    public final boolean compareAndSet(long expect, long update) {
        // valueOffset is the memory offset of the value field inside this AtomicLong object.
        // expect is the old value this thread read via get() in getAndIncrement().
        // update = expect + 1
        // The field is set to update only if its current value equals expect.
        return unsafe.compareAndSwapLong(this, valueOffset, expect, update);
    }
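The excerpt depends on JDK internals (`unsafe`, `valueOffset`), so it cannot be run on its own. A runnable sketch of the same retry loop, written against the public `AtomicLong.compareAndSet(expect, update)`, is shown here; `CasCounter` and `count` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    // Same retry loop as the excerpt, but using the public compareAndSet instead of Unsafe.
    public long getAndIncrement() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return current;       // the value before the increment
            }
            // CAS failed because another thread won the race: loop and retry
        }
    }

    public long get() {
        return value.get();
    }

    // Increments one shared counter from `threads` threads, `perThread` times each.
    static long count(int threads, int perThread) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.getAndIncrement();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(count(4, 100_000)); // 400000
    }
}
```

No increment is lost even though no thread ever blocks: a thread whose CAS fails simply rereads the value and tries again.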

Classes such as AtomicInteger are commonly used for counters and for generating unique IDs.
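A minimal sketch of the unique-ID use case, using AtomicLong (the same idea as AtomicInteger, with a larger range). `IdGenerator`, `nextId` and `distinctIds` are hypothetical names; the check collects every generated ID in a concurrent set to confirm that none repeats.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class IdGenerator {
    private final AtomicLong next = new AtomicLong(1);  // the first ID handed out is 1

    // getAndIncrement() is atomic, so no two calls return the same value,
    // even when they come from different threads.
    public long nextId() {
        return next.getAndIncrement();
    }

    // Generates threads * perThread IDs concurrently and returns how many are distinct.
    static int distinctIds(int threads, int perThread) throws InterruptedException {
        IdGenerator gen = new IdGenerator();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(gen.nextId());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(distinctIds(4, 10_000)); // 40000
    }
}
```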


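How are atomic operations implemented?

How is CAS implemented at the lowest level?

In interviews, the questions usually stop at Unsafe. For interviewers who dig all the way down, this part is worth knowing.

Different architectures implement CAS differently, but all of them ultimately invoke a processor instruction directly from assembly. Modern multi-core architectures support CAS directly in hardware, so a single instruction performs the whole CAS.

- On x86, the instruction is CMPXCHG, which performs a CAS on up to 4 bytes (32 bits); for an 8-byte (64-bit) CAS, 32-bit x86 uses CMPXCHG8B.
- On x86-64, CMPXCHG also accepts 8-byte (64-bit) operands (written CMPXCHGQ in AT&T syntax), which gives an 8-byte CAS.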

In the x86 (since 80486) and Itanium architectures this is implemented as the compare and exchange (CMPXCHG) instruction (on a multiprocessor the LOCK prefix must be used).

As of 2013, most multiprocessor architectures support CAS in hardware, and the compare-and-swap operation is the most popular synchronization primitive for implementing both lock-based and non-blocking concurrent data structures.

On older systems without such an instruction, CAS can be built from several instructions that are protected by a lock/unlock pair or a critical section.
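The lock-based fallback can be sketched in Java with `synchronized` standing in for the lock/unlock pair. This toy cell also follows the CAS semantics described earlier: it always reports the value that was actually there, which equals the expected value exactly when the swap happened. `LockedCasCell` and `compareAndExchange` are hypothetical names; this illustrates the idea and is not how the JVM implements CAS.

```java
public class LockedCasCell {
    private long value;

    public LockedCasCell(long initial) {
        this.value = initial;
    }

    // The whole compare-then-write runs inside one critical section, so no other
    // thread can change `value` between the comparison and the write.
    public synchronized long compareAndExchange(long expected, long update) {
        long actual = value;
        if (actual == expected) {
            value = update;
        }
        return actual;     // equals `expected` exactly when the swap happened
    }

    public synchronized long get() {
        return value;
    }

    public static void main(String[] args) {
        LockedCasCell cell = new LockedCasCell(5);
        System.out.println(cell.compareAndExchange(5, 6)); // 5 (swap done)
        System.out.println(cell.compareAndExchange(5, 7)); // 6 (no swap: actual value is 6)
        System.out.println(cell.get());                    // 6
    }
}
```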


What is instruction reordering in Java?

There are a number of cases in which accesses to program variables (object instance fields, class static fields, and array elements) may appear to execute in a different order than was specified by the program. The compiler is free to take liberties with the ordering of instructions in the name of optimization. Processors may execute instructions out of order under certain circumstances. Data may be moved between registers, processor caches, and main memory in different order than specified by the program.

For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, then the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are a number of potential sources of reordering, such as the compiler, the JIT, and the cache.

The compiler, runtime, and hardware are supposed to conspire to create the illusion of as-if-serial semantics, which means that in a single-threaded program, the program should not be able to observe the effects of reorderings. However, reorderings can come into play in incorrectly synchronized multithreaded programs, where one thread is able to observe the effects of other threads, and may be able to detect that variable accesses become visible to other threads in a different order than executed or specified in the program.

Most of the time, one thread doesn’t care what the other is doing. But when it does, that’s what synchronization is for.

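Instruction reordering happens at two levels:

At the virtual machine level
To reduce the time the CPU sits idle because memory operations are much slower than the CPU, the virtual machine may shuffle the order in which the code was written according to its own rules: code written later may execute earlier in time, and code written earlier may execute later, so that the CPU is used as fully as possible. Suppose a statement is not a = 1 but a = new byte[1024 * 1024] (allocating 1 MB); it will run slowly. Should the CPU wait for it to finish, or execute the next statement, flag = true, first? Clearly, executing flag = true first keeps the CPU busy and speeds things up overall, provided this does not produce a wrong result. There are two possible cases:
- Later code starts executing before earlier code.
- Earlier code starts first, but it runs slowly, so later code starts afterwards and finishes before it. Whichever starts first, in some situations the later code may finish first.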
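In the a / flag example, another thread that sees flag == true could still see the old value of a if both are plain fields. A minimal sketch of the standard fix, assuming a plain `data` field in the role of a and a volatile `ready` field in the role of flag: the JMM does not allow a write that comes before a volatile write to be moved after it, so a reader that sees `ready == true` also sees `data == 42`. `PublishDemo` and `publishAndRead` are hypothetical names.

```java
public class PublishDemo {
    static int data = 0;                    // plain field: plays the role of `a`
    static volatile boolean ready = false;  // volatile field: plays the role of `flag`

    static int publishAndRead() throws InterruptedException {
        data = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the writer publishes
            }
            seen[0] = data;                 // ordered after the write of 42, so it reads 42
        });
        reader.start();
        data = 42;                          // (1) ordinary write
        ready = true;                       // (2) volatile write: (1) cannot be moved after it
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndRead()); // 42
    }
}
```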
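At the hardware level
The CPU reorders each batch of instructions it receives according to its own rules. The cause is again that the CPU is much faster than the cache, and the purpose is similar to the point above. The difference is that the hardware can only reorder within the limited window of instructions it has received, whereas the virtual machine can reorder on a larger scale, across a much wider range of instructions. For the hardware reordering mechanism, see 《从JVM并发看CPU内存指令重排序(Memory Reordering)》 ("CPU Memory Reordering Seen Through JVM Concurrency").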