[Data Structures] Binary Indexed Tree Notes

Binary Indexed Tree (BIT, also called a Fenwick tree)

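  • In essence, a BIT groups the array into blocks by powers of two; both updates and queries run in O(log n)
  • BIT vs. segment tree: the two are similar, but almost every problem a BIT can solve can also be solved with a segment tree, while a segment tree can solve problems a BIT cannot. Both cost O(log n) per operation; in practice a BIT is usually shorter to write, uses less memory, and has a smaller constant factor.
  • lowbit
    • lowbit(x) = x & (-x)
    • lowbit(x) can also be read as the largest power of 2 that divides x
  • c[i] stores the sum of the lowbit(i) integers ending at index i, inclusive (that is, c[i] covers a span of length lowbit(i))
  • BIT indices must start at 1 (lowbit(0) is 0, so a loop starting at index 0 would never advance)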
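To make the block structure concrete, here is a minimal sketch of lowbit and of the index range each c[i] covers. The `Range` struct and the `coverage` helper are hypothetical names introduced only for illustration.

```cpp
// Lowest set bit of x (for x > 0): the largest power of two that divides x.
int lowbit(int x) { return x & (-x); }

// Hypothetical helper: the inclusive index range [lo, hi] summarized by c[i].
struct Range {
    int lo;
    int hi;
};

Range coverage(int i) {
    // c[i] covers lowbit(i) elements ending at index i.
    return Range{i - lowbit(i) + 1, i};
}
```

For example, lowbit(6) is 2, so c[6] covers indices 5 to 6, while c[8] covers indices 1 to 8.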

Point update, range query

int getsum(int x): returns the sum of the first x integers

  • To get the sum over [x, y], compute getsum(y) - getsum(x - 1)

void update(int x, int v): adds v to the x-th integer
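The two functions can be sketched as follows. Wrapping them in a struct named `Fenwick`, using `long long` sums, and the extra `rangesum` helper are assumptions made for this example; the notes themselves only name getsum and update.

```cpp
#include <vector>

// A minimal point-update / prefix-sum BIT over indices 1..n.
struct Fenwick {
    int n;
    std::vector<long long> c;  // 1-indexed; c[0] is unused

    explicit Fenwick(int size) : n(size), c(size + 1, 0) {}

    static int lowbit(int x) { return x & (-x); }

    // Add v to the x-th integer: walk upward through every block covering x.
    void update(int x, long long v) {
        for (int i = x; i <= n; i += lowbit(i)) c[i] += v;
    }

    // Sum of the first x integers: walk downward over disjoint blocks.
    long long getsum(int x) const {
        long long sum = 0;
        for (int i = x; i >= 1; i -= lowbit(i)) sum += c[i];
        return sum;
    }

    // Hypothetical convenience wrapper: sum over the inclusive range [x, y].
    long long rangesum(int x, int y) const { return getsum(y) - getsum(x - 1); }
};
```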

Classic application: for each element of a sequence, count the elements to its left that are smaller than it
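One way to sketch this application: treat c as a count of how often each value has been seen so far, and for every element query the prefix below its value before inserting it. The function name `countSmallerOnLeft` and the assumption that all values lie in [1, maxValue] are mine; larger or negative values would need coordinate compression first.

```cpp
#include <vector>

// For each a[j], count earlier elements strictly smaller than a[j].
// Assumes every value lies in [1, maxValue].
std::vector<int> countSmallerOnLeft(const std::vector<int>& a, int maxValue) {
    std::vector<int> c(maxValue + 1, 0);  // BIT over values, 1-indexed
    std::vector<int> result;
    result.reserve(a.size());
    for (int value : a) {
        // getsum(value - 1): how many earlier values are < value.
        int smaller = 0;
        for (int i = value - 1; i >= 1; i -= i & (-i)) smaller += c[i];
        result.push_back(smaller);
        // update(value, 1): record this value as seen.
        for (int i = value; i <= maxValue; i += i & (-i)) c[i] += 1;
    }
    return result;
}
```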

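To find the k-th smallest element of a sequence:

Let c count how many times each value occurs, then binary search for the first i with getsum(i) >= k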
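A rough sketch of that binary search, assuming the BIT counts occurrences of values in [1, maxValue] and that 1 <= k <= the number of stored values. The struct name `CountingBit` and the method name `kth` are hypothetical.

```cpp
#include <vector>

// BIT used as a multiset of values in [1, n].
struct CountingBit {
    int n;
    std::vector<int> c;

    explicit CountingBit(int maxValue) : n(maxValue), c(maxValue + 1, 0) {}

    // update(x, +1) inserts one copy of x; update(x, -1) removes one copy.
    void update(int x, int v) {
        for (int i = x; i <= n; i += i & (-i)) c[i] += v;
    }

    // Number of stored values <= x.
    int getsum(int x) const {
        int sum = 0;
        for (int i = x; i >= 1; i -= i & (-i)) sum += c[i];
        return sum;
    }

    // Smallest i with getsum(i) >= k, i.e. the k-th smallest stored value.
    int kth(int k) const {
        int lo = 1, hi = n;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (getsum(mid) >= k) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }
};
```

Each probe costs O(log n), so one query is O(log^2 n) with this approach.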

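Given a 2D integer matrix A, to compute the sum of all elements in the submatrix A[1][1]~A[x][y], and to add an integer v to a single cell A[x][y]:

Just turn the for loop in getsum and in update into two nested loops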
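The two-loop version might look like the sketch below. The struct name `Fenwick2D` and the `long long` element type are assumptions for this example.

```cpp
#include <vector>

// 2D BIT over cells (1..rows, 1..cols).
struct Fenwick2D {
    int rows, cols;
    std::vector<std::vector<long long>> c;

    Fenwick2D(int r, int k)
        : rows(r), cols(k), c(r + 1, std::vector<long long>(k + 1, 0)) {}

    // Add v to the single cell A[x][y].
    void update(int x, int y, long long v) {
        for (int i = x; i <= rows; i += i & (-i))
            for (int j = y; j <= cols; j += j & (-j))
                c[i][j] += v;
    }

    // Sum of the submatrix A[1][1]~A[x][y].
    long long getsum(int x, int y) const {
        long long sum = 0;
        for (int i = x; i >= 1; i -= i & (-i))
            for (int j = y; j >= 1; j -= j & (-j))
                sum += c[i][j];
        return sum;
    }
};
```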

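Range update, point query

  • Change getsum to walk in the direction of i increasing by lowbit(i)
  • Change update to walk in the direction of i decreasing by lowbit(i)
  • c[i] no longer holds the sum of its block; it holds how much has been added to every number in that block
  • int getsum(int x) returns the value of the x-th integer (that is, the total added to it, accumulated from small blocks up to large ones)
  • void update(int x, int v) adds v to each of the first x integers
  • So the roles are swapped relative to the point-update version: there, walking forward from x updates the blocks c[i] from small to large, and walking backward from x accumulates the blocks that cover the prefix; here, update walks backward over the blocks that together cover [1, x], and getsum walks forward over every block that covers x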
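The reversed directions can be sketched as below. The struct name `RangeAddBit` and the `addRange` helper (which combines two prefix updates to cover an arbitrary range [l, r]) are additions of mine; all values are assumed to start at 0.

```cpp
#include <vector>

// Range-update / point-query BIT: c[i] holds how much was added to
// every element of the block that c[i] covers.
struct RangeAddBit {
    int n;
    std::vector<long long> c;

    explicit RangeAddBit(int size) : n(size), c(size + 1, 0) {}

    // Add v to each of the first x integers: walk toward smaller i,
    // tagging the disjoint blocks that together cover [1, x].
    void update(int x, long long v) {
        for (int i = x; i >= 1; i -= i & (-i)) c[i] += v;
    }

    // Value of the x-th integer: walk toward larger i,
    // accumulating the tag of every block that covers x.
    long long getsum(int x) const {
        long long sum = 0;
        for (int i = x; i <= n; i += i & (-i)) sum += c[i];
        return sum;
    }

    // Hypothetical helper: add v to every integer in [l, r].
    void addRange(int l, int r, long long v) {
        update(r, v);
        update(l - 1, -v);
    }
};
```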

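1057. Stack (30) - PAT Advanced Level (Binary Indexed Tree)

  • Find the median of all elements in the stack: sorting on every query will time out, so use a BIT to find the k-th smallest number, with k = (s.size() + 1) / 2. Binary search on x, using the BIT to count the numbers less than or equal to x, finds the answer quickly.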

Stack is one of the most fundamental data structures, which is based on the principle of Last In First Out (LIFO). The basic operations include Push (inserting an element onto the top position) and Pop (deleting the top element). Now you are supposed to implement a stack with an extra operation: PeekMedian — return the median value of all the elements in the stack. With N elements, the median value is defined to be the (N/2)-th smallest element if N is even, or ((N+1)/2)-th if N is odd.

Input Specification:

Each input file contains one test case. For each case, the first line contains a positive integer N (<= 10^5). Then N lines follow, each contains a command in one of the following 3 formats:

Push key
Pop
PeekMedian
where key is a positive integer no more than 10^5.

Output Specification:

For each Push command, insert key into the stack and output nothing. For each Pop or PeekMedian command, print in a line the corresponding returned value. If the command is invalid, print “Invalid” instead.

Sample Input:
17
Pop
PeekMedian
Push 3
PeekMedian
Push 2
PeekMedian
Push 1
PeekMedian
Pop
Pop
Push 5
Push 4
PeekMedian
Pop
Pop
Pop
Pop
Sample Output:
Invalid
Invalid
3
2
2
1
2
4
4
5
3
Invalid

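Problem summary: implement a special stack with an extra operation, PeekMedian, which returns the median of all elements in the stack. For N elements, the median is the (N/2)-th smallest element if N is even, or the ((N+1)/2)-th smallest if N is odd.
Analysis: sorting on every query will time out. Instead, use a BIT in which c counts the occurrences of each key, and find the k-th smallest number with k = (s.size() + 1) / 2, which covers both the even and the odd case under integer division. Binary search for the smallest x such that the count of elements less than or equal to x reaches k.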
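Putting the analysis together, a sketch of the data structure could look like this. It is not the original solution: the class name `MedianStack`, the bool-returning interface (false stands for "Invalid"), and the constant `kMaxKey` are assumptions for illustration, and input parsing is left out.

```cpp
#include <stack>
#include <vector>

class MedianStack {
public:
    MedianStack() : c(kMaxKey + 1, 0) {}

    void push(int key) {
        s.push(key);
        update(key, 1);  // one more occurrence of key
    }

    // Returns false (the "Invalid" case) when the stack is empty.
    bool pop(int& out) {
        if (s.empty()) return false;
        out = s.top();
        s.pop();
        update(out, -1);  // one fewer occurrence of the popped key
        return true;
    }

    // Returns false (the "Invalid" case) when the stack is empty.
    bool peekMedian(int& out) const {
        if (s.empty()) return false;
        int k = (static_cast<int>(s.size()) + 1) / 2;
        int lo = 1, hi = kMaxKey;
        while (lo < hi) {  // smallest x with getsum(x) >= k
            int mid = (lo + hi) / 2;
            if (getsum(mid) >= k) hi = mid;
            else lo = mid + 1;
        }
        out = lo;
        return true;
    }

private:
    static constexpr int kMaxKey = 100000;  // keys are at most 10^5
    std::stack<int> s;
    std::vector<int> c;  // BIT counting occurrences of each key

    void update(int x, int v) {
        for (int i = x; i <= kMaxKey; i += i & (-i)) c[i] += v;
    }
    int getsum(int x) const {
        int sum = 0;
        for (int i = x; i >= 1; i -= i & (-i)) sum += c[i];
        return sum;
    }
};
```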

 

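L3-002. Stack - PAT Group Programming Ladder Tournament, GPLT (Binary Indexed Tree)

As everyone knows, a stack is a Last In First Out linear structure whose basic operations are push (insert a new element at the top) and pop (return the value of the top element and delete it from the stack). Now you are asked to implement a special stack with an extra operation, PeekMedian, which returns the median of all elements in the stack. For N elements, the median is defined as the (N/2)-th smallest element if N is even, or the ((N+1)/2)-th smallest element if N is odd.

Input format:

The first line gives a positive integer N (<= 10^5). Then N lines follow, each giving one command in one of the following 3 formats:

Push key
Pop
PeekMedian
where Push pushes onto the stack and key is a positive integer no more than 10^5; Pop pops from the stack; PeekMedian queries the median.

Output format:

For each Push command, push key onto the stack and output nothing. For each Pop or PeekMedian command, print the corresponding result on one line. If the command is invalid, print "Invalid".

Sample input: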
17
Pop
PeekMedian
Push 3
PeekMedian
Push 2
PeekMedian
Push 1
PeekMedian
Pop
Pop
Push 5
Push 4
PeekMedian
Pop
Pop
Pop
Pop
Sample output:
Invalid
Invalid
3
2
2
1
2
4
4
5
3
Invalid

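Analysis: sorting and searching on every query will time out. Use a BIT to find the k-th smallest number, with k = (s.size() + 1) / 2, and binary search for the smallest x such that the count of numbers less than or equal to x reaches k.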

 

1075. PAT Judge (25) - PAT Advanced Level

The ranklist of PAT is generated from the status list, which shows the scores of the submissions. This time you are supposed to generate the ranklist for PAT.

Input Specification:

Each input file contains one test case. For each case, the first line contains 3 positive integers, N (<=10^4), the total number of users, K (<=5), the total number of problems, and M (<=10^5), the total number of submissions. It is then assumed that the user id’s are 5-digit numbers from 00001 to N, and the problem id’s are from 1 to K. The next line contains K positive integers p[i] (i=1, …, K), where p[i] corresponds to the full mark of the i-th problem. Then M lines follow, each gives the information of a submission in the following format:

user_id problem_id partial_score_obtained

where partial_score_obtained is either -1 if the submission cannot even pass the compiler, or is an integer in the range [0, p[problem_id]]. All the numbers in a line are separated by a space.

Output Specification:

For each test case, you are supposed to output the ranklist in the following format:

rank user_id total_score s[1] … s[K]

where rank is calculated according to the total_score, and all the users with the same total_score obtain the same rank; and s[i] is the partial score obtained for the i-th problem. If a user has never submitted a solution for a problem, then “-” must be printed at the corresponding position. If a user has submitted several solutions to solve one problem, then the highest score will be counted.

The ranklist must be printed in non-decreasing order of the ranks. For those who have the same rank, users must be sorted in nonincreasing order according to the number of perfectly solved problems. And if there is still a tie, then they must be printed in increasing order of their id’s. For those who have never submitted any solution that can pass the compiler, or have never submitted any solution, they must NOT be shown on the ranklist. It is guaranteed that at least one user can be shown on the ranklist.

Sample Input:
7 4 20
20 25 25 30
00002 2 12
00007 4 17
00005 1 19
00007 2 25
00005 1 20
00002 2 2
00005 1 15
00001 1 18
00004 3 25
00002 2 25
00005 3 22
00006 4 -1
00001 2 18
00002 1 20
00004 1 15
00002 4 18
00001 3 4
00001 4 2
00005 2 -1
00004 2 0
Sample Output:
1 00002 63 20 25 - 18
2 00005 42 20 0 22 -
2 00007 42 - 25 - 17
2 00001 42 18 18 4 2
5 00004 40 15 0 25 -

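Analysis: in the array of structs, passnum counts the problems solved with full marks, and isshown is set to true once the user has any submission that passed the compiler (even one that scored 0). vector<int> score; records the highest score for each problem.
A submission that fails to compile counts as 0, while a problem never submitted is printed as "-". So a problem whose every submission failed to compile gets the score -2, and the score array is initialized to -1; the values -1 and -2 then distinguish a problem that was never submitted from one that was submitted but never compiled.
Note: in the last test case, a user first earns a score and later makes a submission that fails to compile. So after each max update, check if(v[id].score[num] == -1): this means the best result so far is only -1 (failed to compile or never submitted), and only then set v[id].score[num] = -2. Otherwise a user who already has a good score would have it wrongly overwritten with -2.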
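The update rule described above can be sketched as follows. The field names score and isshown come from the analysis; the `User` struct layout and the `recordSubmission` helper are hypothetical, and sorting and printing are omitted.

```cpp
#include <algorithm>
#include <vector>

struct User {
    // score[p]: -1 = never submitted, -2 = submitted but never compiled
    // (printed as 0), otherwise the best score so far for problem p.
    std::vector<int> score;
    bool isshown = false;

    explicit User(int k) : score(k + 1, -1) {}  // problems are 1..k
};

// Hypothetical helper: apply one submission (problem num, result s).
void recordSubmission(User& u, int num, int s) {
    u.score[num] = std::max(u.score[num], s);
    if (s != -1) {
        u.isshown = true;  // at least one submission passed the compiler
    } else if (u.score[num] == -1) {
        // The best result is still only "failed to compile": mark it -2.
        // A real score already stored is larger than -1 and is left alone.
        u.score[num] = -2;
    }
}
```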

 

1071. Speech Patterns (25) - PAT Advanced Level (map application)

People often have a preference among synonyms of the same word. For example, some may prefer “the police”, while others may prefer “the cops”. Analyzing such patterns can help to narrow down a speaker’s identity, which is useful when validating, for example, whether it’s still the same person behind an online avatar.

Now given a paragraph of text sampled from someone’s speech, can you find the person’s most commonly used word?

Input Specification:

Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a carriage return ‘\n’. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].

Output Specification:

For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there are more than one such words, print the lexicographically smallest one. The word should be printed in all lower case. Here a “word” is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.

Note that words are case insensitive.

Sample Input:
Can1: “Can a can can a can? It can!”
Sample Output:
can 5

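Problem summary: count words. A valid word is a run of letters (either case) and digits. Given a string, find the valid word that occurs most often, together with its count. On a tie, output the lexicographically smallest word.
Analysis: this is easy with a map, but there are a few points to watch:
1. Words are case insensitive, so apply s[i] = tolower(s[i]); before counting
2. The [0-9 A-Z a-z] check can be written with isalnum, a function from the cctype header
3. The line contains spaces, so it must be read with getline
4. Only do m[t]++ when t is not empty, otherwise the empty string gets counted too
5. Most important: when i reaches the last character, the current t must be put into the map whether or not that character is alphanumeric (as long as t is not empty)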
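The five points can be combined into a sketch like the one below. The function name `mostCommonWord` and its pair return type are assumptions; in a full program the input line would be read with std::getline(std::cin, s) first.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Returns the most frequent word (lowercased) and its count.
std::pair<std::string, int> mostCommonWord(const std::string& s) {
    std::map<std::string, int> m;
    std::string t;  // the word currently being built
    for (std::size_t i = 0; i < s.size(); i++) {
        unsigned char ch = static_cast<unsigned char>(s[i]);
        if (std::isalnum(ch)) t += static_cast<char>(std::tolower(ch));
        // Flush on a separator, and also on the last character (point 5).
        if (!std::isalnum(ch) || i + 1 == s.size()) {
            if (!t.empty()) m[t]++;  // never count the empty string (point 4)
            t.clear();
        }
    }
    // std::map iterates in lexicographic order, so a strict '>' keeps the
    // lexicographically smallest word among ties.
    std::pair<std::string, int> best("", 0);
    for (const auto& kv : m) {
        if (kv.second > best.second) {
            best.first = kv.first;
            best.second = kv.second;
        }
    }
    return best;
}
```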

 

1055. The World’s Richest (25) - PAT Advanced Level

Forbes magazine publishes every year its list of billionaires based on the annual ranking of the world’s wealthiest people. Now you are supposed to simulate this job, but concentrate only on the people in a certain range of ages. That is, given the net worths of N people, you must find the M richest people in a given range of their ages.

Input Specification:

Each input file contains one test case. For each case, the first line contains 2 positive integers: N (<=10^5) – the total number of people, and K (<=10^3) – the number of queries. Then N lines follow, each contains the name (string of no more than 8 characters without space), age (integer in (0, 200]), and the net worth (integer in [-10^6, 10^6]) of a person. Finally there are K lines of queries, each contains three positive integers: M (<= 100) – the maximum number of outputs, and [Amin, Amax] which are the range of ages. All the numbers in a line are separated by a space.

Output Specification:

For each query, first print in a line Case #X: where X is the query number starting from 1. Then output the M richest people with their ages in the range [Amin, Amax]. Each person’s information occupies a line, in the format

Name Age Net_Worth
The outputs must be in non-increasing order of the net worths. In case there are equal worths, it must be in non-decreasing order of the ages. If both worths and ages are the same, then the output must be in non-decreasing alphabetical order of the names. It is guaranteed that no two persons share all three pieces of information. In case no one is found, output “None”.
Sample Input:
12 4
Zoe_Bill 35 2333
Bob_Volk 24 5888
Anny_Cin 95 999999
Williams 30 -22
Cindy 76 76000
Alice 18 88888
Joe_Mike 32 3222
Michael 5 300000
Rosemary 40 5888
Dobby 24 5888
Billy 24 5888
Nobody 5 0
4 15 45
4 30 35
4 5 95
1 45 50
Sample Output:
Case #1:
Alice 18 88888
Billy 24 5888
Bob_Volk 24 5888
Dobby 24 5888
Case #2:
Joe_Mike 32 3222
Zoe_Bill 35 2333
Williams 30 -22
Case #3:
Anny_Cin 95 999999
Michael 5 300000
Alice 18 88888
Cindy 76 76000
Case #4:
None

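Problem summary: given the name, age, and net worth of n people, answer k queries; each query outputs the top m people by net worth, in descending order, among those whose ages fall within the given range. If net worths are equal, the younger person comes first; if ages are also equal, order by name lexicographically.
Analysis: sorting once and then, for each query, building a new array and sorting it again makes test point 2 time out, because n and m differ so much: n can reach 10^5 while m is at most 100. So first sort everyone by net worth (with the tie-breaks above), then use an array book to count how many people of each age have been kept. Traverse the sorted array and push a person into a new array only while fewer than 100 people of that age have been kept; the rest are skipped. In other words, keep only the top 100 of each age: the narrowest possible query range is a single age, and even then no more than 100 people are output. Each query then scans this new array and takes the people whose ages fall within the range.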
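A sketch of the prefiltering idea, under the assumption that ages lie in (0, 200] as the statement says. The array name book follows the analysis; the `Person` struct and the function names `richer`, `prefilter`, and `query` are hypothetical, and input and output handling is omitted.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
    int worth;
};

// Net worth descending, then age ascending, then name ascending.
bool richer(const Person& a, const Person& b) {
    if (a.worth != b.worth) return a.worth > b.worth;
    if (a.age != b.age) return a.age < b.age;
    return a.name < b.name;
}

// Sort once, then keep at most 100 people per age (M never exceeds 100).
std::vector<Person> prefilter(std::vector<Person> people) {
    std::sort(people.begin(), people.end(), richer);
    std::vector<int> book(201, 0);  // book[age]: people of that age kept so far
    std::vector<Person> kept;
    for (const Person& p : people) {
        if (book[p.age] < 100) {
            book[p.age]++;
            kept.push_back(p);
        }
    }
    return kept;
}

// Scan the prefiltered list in sorted order and take the first m matches.
std::vector<Person> query(const std::vector<Person>& kept, int m, int amin, int amax) {
    std::vector<Person> out;
    for (const Person& p : kept) {
        if (static_cast<int>(out.size()) == m) break;
        if (p.age >= amin && p.age <= amax) out.push_back(p);
    }
    return out;
}
```

An empty result from query corresponds to printing "None".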